Abstract
The killer-cell immunoglobulin-like receptor (KIR) proteins evolve to fight viruses and mediate the body's reaction to pregnancy. These roles provide selection pressure for variation at both the structural/haplotype and base/allele levels. At the same time, the genes have evolved relatively recently by tandem duplication and therefore exhibit very high sequence similarity over thousands of bases. These variation-homology patterns make it impossible to interpret KIR haplotypes from abundant short-read genome sequencing data at population scale using existing methods. Here, we developed an efficient computational approach for in silico KIR probe interpretation (KPI) to accurately interpret an individual's KIR genes and haplotype-pairs from KIR sequencing reads. We designed synthetic 25-base sequence probes by analyzing previously reported haplotype sequences, and we developed a bioinformatics pipeline to interpret the probes in the context of 16 KIR genes and 16 haplotype structures. We demonstrated its accuracy on a synthetic data set as well as on real whole-genome sequences from 748 individuals from The Genome of the Netherlands (GoNL). The GoNL predictions were compared with SNP-based predictions. Our results show a 100% accuracy rate for the synthetic tests and a 99.6% family-consistency rate in the GoNL tests. Agreement with the SNP-based calls on KIR genes ranges from 72% to 100% with a mean of 92%; most differences occur in genes KIR2DS2, KIR2DL2, KIR2DS3, and KIR2DL5, where KPI predicts presence and the SNP-based interpretation predicts absence. Overall, the evidence suggests that KPI's accuracy is 97% or greater for both KIR gene and haplotype-pair predictions, and that presence/absence genotyping leads to ambiguous haplotype-pair predictions with 16 reference KIR haplotype structures. KPI is free, open, and easily executable as a Nextflow workflow supported by a Docker environment at https://github.com/droeatumn/kpi.
Keywords: genotype; haplotype; interpretation; killer-cell immunoglobulin-like receptor; natural killer; whole genome sequencing (WGS)
Year: 2020 PMID: 33324401 PMCID: PMC7727328 DOI: 10.3389/fimmu.2020.583013
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
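The probe idea in the abstract (short 25-base sequences searched for in sequencing reads, then interpreted as gene presence or absence) can be illustrated with a minimal sketch. This is not KPI's implementation: the probe sequences, the gene names `GENE_A`/`GENE_B`, the function names, and the `min_fraction` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_probe_hits(reads, probes, k=25):
    """Count exact k-mer matches of each probe, on either strand, in the reads."""
    # Map both orientations of every probe back to the probe itself.
    lookup = {}
    for probe in probes:
        lookup[probe] = probe
        lookup[reverse_complement(probe)] = probe
    hits = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in lookup:
                hits[lookup[kmer]] += 1
    return hits


def call_genes(hits, gene_probes, min_fraction=0.5):
    """Call a gene present when enough of its probes were observed.

    min_fraction is an assumed, illustrative threshold.
    """
    calls = {}
    for gene, probes in gene_probes.items():
        seen = sum(1 for probe in probes if hits[probe] > 0)
        calls[gene] = seen / len(probes) >= min_fraction
    return calls


# Toy data: made-up 25-base probes, not real KIR sequences.
probe_a = "ACGTTGCAAGGCTTACCGATAGGCT"
probe_b = "TTGACCGTAAGCTAGGCATCGATCA"
gene_probes = {"GENE_A": [probe_a], "GENE_B": [probe_b]}
reads = [
    "GGG" + probe_a + "TTT",                     # forward-strand match
    "CC" + reverse_complement(probe_a) + "AA",   # reverse-strand match
]

hits = count_probe_hits(reads, [probe_a, probe_b])
calls = call_genes(hits, gene_probes)
print(calls)
```

Exact matching against a dictionary keeps the scan linear in the total read length, which is one plausible reason a probe-based approach can scale to whole-genome data; a production pipeline would also need to handle sequencing errors and read depth.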
Figure 1. Reference haplotype definitions. Haplotype numeric labels (Jiang et al. 2012) are shown with their definition via gene counts. Following the Jiang et al. convention, some haplotypes (e.g., 7, 9) are distinguished by KIR2DS3/KIR2DS5 alleles instead of structural differences. In this study, some haplotypes (e.g., 1, 2) are combined, as KIR2DS4 full/deleted alleles are not considered in KPI’s genotyping.
Reference haplotype names and frequencies.
| Jiang et al. 2012 # | informal names | Jiang et al. 2012 freq. |
|---|---|---|
| 1/2 | cA01~tA01 | 55.2% |
| 3 | cA01~tB01_2DS5 | 10.9% |
| 11 | cA01~tB01_2DS3 | 1.4% |
| 4/5 | cB02~tA01 | 12.8% |
| 6/10 | cB01~tA01_2DS3 | 6.9% |
| 25 | cB01~tA01_2DS5 | 0.1% |
| 7 | cB01~tB01_2DS3_2DS5 | 2.6% |
| 9 | cB01~tB01_2DS3_2DS3 | 2.1% |
| 8 | cB02~tB01_2DS5 | 2.1% |
| 17 | cB02~tB01_2DS3 | 0.3% |
| 12 | cB04~tB03_2DS5 | 0.8% |
| 18 | cB04~tB03_2DS3 | 0.3% |
| 13 | cB01~tB05 | 0.7% |
| 15 | cB05~tB01 | 0.4% |
| 16 | cA01~tB05 | 0.3% |
| 21 | cB05~tA01 | 0.2% |
| sum | | 97.0% |
The first column contains the numeric label assigned to haplotypes in Jiang et al. (2012). Column 2 contains the informal names along with a reference frequency in column 3.
Results of synthetic tests.
| haplotype 1 structure | haplotype 2 structure | haplotype 1 GenBank accession | haplotype 2 GenBank accession | KPI haplotype prediction | haplotypes consistent w/ truth? | gene prediction accuracy |
|---|---|---|---|---|---|---|
| cA01~tA01 | cA01~tA01 | GU182344 | GU182340 | cA01~tA01+cA01~tA01 | Y | 100% |
| cA01~tB01 | cA01~tA01 | KU645197 | GU182360 | cA01~tA01+cA01~tB01 or | Y | 100% |
| cB01~tA01 | cA01~tA01 | GU182351 | NC000019.10 | cA01~tA01+cB01~tA01 | Y | 100% |
| cB01~tB01 | cB02~tA01 | GU182339 | GU182353 | 10 possibilities, including | Y | 100% |
| cB02~tA01 | cA01~tA01 | GU182341 | KP420442 | cA01~tA01+cB02~tA01 | Y | 100% |
| cB02~tB01 | cA01~tA01 | GU182359 | KP420439 | 9 possibilities, including | Y | 100% |
The first four columns detail the sequences from which the tests (n=6 haplotype-pairs) were generated. The fifth column contains the haplotype predictions from killer-cell immunoglobulin-like receptor probe interpretation (KPI), some of which are summarized for display.
Summary of killer-cell immunoglobulin-like receptor (KIR)*IMP and KIR probe interpretation (KPI) gene predictions.
| gene | reference freq. | KIR*IMP freq. | KPI freq. | KIR*IMP - KPI | KIR*IMP & KPI agreement | KIR*IMP - reference | KPI - reference |
|---|---|---|---|---|---|---|---|
| 2DS2 | 53%–54% | 39% | 52% | −13% | | | |
| 2DL2 | 53%–54% | 39% | 51% | −12% | | | |
| 2DL3 | 90% | 96% | 92% | 4% | 92% | 6% | 2% |
| 2DP1 | 96% | 99% | 97% | 2% | 97% | 3% | 1% |
| 2DL1 | 96% | 99% | 97% | 2% | 97% | 3% | 1% |
| 3DP1 | 100% | 100% | 100% | 0% | 100% | 0% | 0% |
| 2DL4 | 100% | 100% | 100% | 0% | 100% | 0% | 0% |
| 3DL1 | 93%–94% | 96% | 96% | 0% | 100% | 2% | 2% |
| 3DS1 | 38%–44% | 33% | 35% | −2% | 97% | −5% | −3% |
| 2DL5 | 53%–56% | 38% | 47% | −9% | | | |
| 2DS3 | 30%–31% | 10% | 29% | −18% | | | |
| 2DS5 | 30%–36% | 25% | 27% | −2% | 96% | −5% | −3% |
| 2DS4 | 92%–94% | 96% | 96% | 0% | 100% | 2% | 2% |
| 2DS1 | 43%–44% | 33% | 34% | −1% | 98% | −10% | −9% |
| average | | | | | 92% | | |
Frequencies are relative to the Genome of the Netherlands (GoNL) cohort of 748 individuals. The abbreviated gene name is in column 1. Column 2 lists the reference frequencies from The Allele Frequency Net Database. The predicted frequencies from KIR*IMP and KPI are in columns 3 and 4, respectively. The delta between KIR*IMP and KPI is shown in column 5. Column 6 shows the agreement between KIR*IMP and KPI. Column 7 shows the delta between KIR*IMP and the reference. Column 8 shows the delta between KPI and the reference. Frequencies with differences >10% are in bold.
Confusion matrix of killer-cell immunoglobulin-like receptor (KIR)*IMP and KIR probe interpretation (KPI) gene predictions.
| gene | KIR*IMP:P KPI:A | KIR*IMP:A KPI:P | KIR*IMP:P KPI:P | KIR*IMP:A KPI:A |
|---|---|---|---|---|
| 2DS2 | 8% | | 32% | 40% |
| 2DL2 | 8% | | 31% | 41% |
| 2DL3 | 6% | 2% | 90% | 2% |
| 2DP1 | 2% | 0% | 97% | 0% |
| 2DL1 | 2% | 1% | 97% | 0% |
| 3DP1 | 0% | 0% | 100% | 0% |
| 2DL4 | 0% | 0% | 100% | 0% |
| 3DL1 | 0% | 0% | 96% | 4% |
| 3DS1 | 1% | 3% | 32% | 64% |
| 2DL5 | 2% | | 36% | 51% |
| 2DS3 | 1% | | 10% | 71% |
| 2DS5 | 1% | 3% | 24% | 72% |
| 2DS4 | 0% | 0% | 96% | 4% |
| 2DS1 | 1% | 1% | 33% | 66% |
Frequencies relative to GoNL cohort size of 748 individuals. The abbreviated gene names are in column 1. Column 2 lists the cases when KIR*IMP calls present (‘P’) and KPI calls absent (‘A’). Column 3 lists the cases when KIR*IMP calls absent (‘A’) and KPI calls present (‘P’). Column 4 is when they both call present. Column 5 is when they both call absent. Discrepancies >10% are in bold.
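The confusion matrix relates directly to the summary table: per gene, agreement is the sum of the both-present and both-absent cells, and each method's predicted frequency is the sum of the cells in which it calls the gene present. The sketch below checks this arithmetic on three rows copied from the table; the function names are illustrative. Because the published percentages are rounded, some other rows reproduce the summary values only to within about one percentage point.

```python
# Each row: (KIR*IMP:P KPI:A, KIR*IMP:A KPI:P, both P, both A), in percent,
# copied from rows of the confusion matrix that are complete.
confusion = {
    "2DL3": (6, 2, 90, 2),
    "3DL1": (0, 0, 96, 4),
    "2DS5": (1, 3, 24, 72),
}


def agreement(row):
    """Percent of individuals on which both methods make the same call."""
    return row[2] + row[3]


def kirimp_frequency(row):
    """Percent of individuals called present by KIR*IMP."""
    return row[0] + row[2]


def kpi_frequency(row):
    """Percent of individuals called present by KPI."""
    return row[1] + row[2]


summary = {
    gene: {
        "agreement": agreement(row),
        "KIR*IMP freq": kirimp_frequency(row),
        "KPI freq": kpi_frequency(row),
    }
    for gene, row in confusion.items()
}

for gene, values in summary.items():
    print(gene, values)
```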
Comparison of killer-cell immunoglobulin-like receptor (KIR)*IMP highest probability and EM-reduced haplotype prediction frequencies.
| hap # | reference frequency | KIR*IMP | KPI w/ EM | KIR*IMP - reference | KPI w/ EM - reference | KIR*IMP - KPI w/ EM |
|---|---|---|---|---|---|---|
| 1/2 | 55.20% | 71.86% | 59.70% | 16.70% | 4.50% | 12.17% |
| 3 | 10.90% | 12.57% | 9.60% | 1.70% | −1.29% | 2.95% |
| 11 | 1.40% | 1.47% | 0.60% | 0.00% | −0.82% | 0.85% |
| 4/5 | 12.80% | 7.49% | 15.30% | −5.30% | 2.55% | −7.83% |
| 9 | 2.10% | 3.41% | 2.90% | 1.30% | 0.78% | 0.52% |
| 7 | 2.60% | 0.33% | 3.60% | −2.20% | 1.00% | −3.24% |
| 6/10 | 6.90% | 1.80% | 5.90% | −5.10% | −1.08% | −4.05% |
| 8 | 2.10% | 0.60% | 0.50% | −1.50% | −1.65% | 0.12% |
| 17 | 0.30% | 0.00% | 0.00% | −0.30% | −0.23% | −0.03% |
| 14* | 2.40% | 0.40% | 0.00% | −2.00% | 0.00% | 0.00% |
| 18* | 0.30% | 0.07% | 0.00% | −0.20% | 0.00% | 0.00% |
| 12* | 0.80% | 0.00% | 0.00% | −0.80% | 0.00% | 0.00% |
| mean | 97.00% | 100.00% | 98.10% | | | |
The table compares the predictions of the two methods with each other and with reference European frequencies from Jiang et al. 2012 (column 2), which is also the source of the haplotype numbers (column 1). KIR*IMP’s haplotype frequencies for the 1496 GoNL haplotypes are in column 3; some haplotypes are combined, as the haplotype numbers distinguish KIR2DS4 alleles. Column 4 contains frequencies for EM-reduced KIR probe interpretation (KPI) haplotype predictions. Column 5 compares KIR*IMP frequencies with the reference, and column 6 does the same for the EM-reduced predictions. Finally, column 7 compares the frequencies of KIR*IMP and the EM-reduced predictions. Haplotypes with a predicted frequency of 0 in both KIR*IMP and KPI are not shown. Haplotypes 14, 18, and 12 are in KIR*IMP’s set of reference haplotypes, but not KPI’s.
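The record does not describe how the EM reduction was carried out. Assuming "EM" denotes expectation-maximization, the sketch below shows one generic way ambiguous haplotype-pair lists could be reduced: estimate population haplotype frequencies under Hardy-Weinberg proportions, then keep each individual's most probable pair. The input data, function names, and iteration count are hypothetical, and this is a textbook EM, not necessarily the procedure used for the table above.

```python
from collections import defaultdict


def pair_weight(pair, freq):
    """Hardy-Weinberg probability of an unordered haplotype pair."""
    a, b = pair
    return freq[a] * freq[b] * (1 if a == b else 2)


def em_haplotype_frequencies(candidates, n_iter=50):
    """Estimate haplotype frequencies from per-individual candidate pairs.

    candidates: one list per individual of possible (hap1, hap2) pairs.
    """
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: share each individual among its candidate pairs.
            weights = [pair_weight(pair, freq) for pair in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: each individual carries two haplotypes.
        n_haplotypes = 2 * len(candidates)
        freq = {h: counts[h] / n_haplotypes for h in haps}
    return freq


def most_likely_pair(pairs, freq):
    """Pick the candidate pair with the highest estimated probability."""
    return max(pairs, key=lambda pair: pair_weight(pair, freq))


# Toy input (not GoNL data): three unambiguous individuals and one whose
# presence/absence genotype is assumed compatible with two different pairs.
candidates = [
    [("cA01~tA01", "cA01~tA01")],
    [("cA01~tA01", "cA01~tA01")],
    [("cA01~tA01", "cB02~tA01")],
    [("cA01~tA01", "cB01~tB01"), ("cB01~tA01", "cA01~tB01")],
]

freq = em_haplotype_frequencies(candidates)
resolved = most_likely_pair(candidates[3], freq)
print(resolved)
```

In this toy input the ambiguous individual resolves toward the pair that contains the haplotype already common among the unambiguous individuals, which is the general behavior expected of a frequency-driven reduction.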