Genelle F Harrison, Laura Ann Leaton, Erica A Harrison, Katherine M Kichula, Marte K Viken, Jonathan Shortt, Christopher R Gignoux, Benedicte A Lie, Damjan Vukcevic, Stephen Leslie, Paul J Norman.
Abstract
Highly polymorphic interaction of KIR3DL1 and KIR3DS1 with HLA class I ligands modulates the effector functions of natural killer (NK) cells and some T cells. This genetically determined diversity affects the severity of infections, immune-mediated diseases, and some cancers, and impacts the course of immunotherapies, including transplantation. KIR3DL1 is an inhibitory receptor and KIR3DS1 is an activating receptor; both are encoded by the KIR3DL1/S1 gene, which has more than 200 diverse and divergent alleles. Determination of KIR3DL1/S1 genotypes for medical application is hampered by complex sequence and structural variation, requiring targeted approaches to generate and analyze high-resolution allele data. To overcome these obstacles, we developed and optimized a model for imputing KIR3DL1/S1 alleles at high resolution from whole-genome SNP data. We designed the model to represent a substantial component of human genetic diversity. Our Global imputation model is effective at genotyping KIR3DL1/S1 alleles, with an accuracy ranging from 88% in Africans to 97% in East Asians, and with mean specificity of 99% and sensitivity of 95% for alleles with >1% frequency. We used the established algorithm of the HIBAG program, in a modification named Pulling Out Natural killer cell Genomics (PONG). Because HIBAG was also designed to impute HLA alleles from whole-genome SNP data, PONG allows combinatorial diversity of KIR3DL1/S1 with HLA-A and -B to be analyzed using complementary techniques on a single data source. The use of PONG thus negates the need for targeted sequencing data in very large-scale association studies where such methods might not be tractable.
Year: 2022 PMID: 35192601 PMCID: PMC8896733 DOI: 10.1371/journal.pcbi.1009059
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Genomic location of KIR3DL1/S1 and overview of allele imputation workflow.
A. Shows the location of the KIR3DL1/S1 gene on five examples of KIR haplotypes. KIR3DL1/S1 is shaded in blue, and other KIR genes are shaded grey. The KIR3DL1/S1 gene can be absent (haplotype 4) or fused in-frame with KIR3DL2 (haplotype 5) [92]. The human genome coordinates (build hg19) from which classifiers were drawn for imputation are given at the top. B. Schematic of model building, testing and output for the imputation of KIR3DL1/S1 alleles using PONG. Shown are the required input files and their format for model building (blue) and testing (green). Red boxes give an example of the output from the imputation.
Fig 2. Optimization of KIR3DL1/S1 allele imputation using data from Europeans.
A. Bar graph shows the KIR3DL1/S1 allele frequencies in the combined EUR population group [78], comprising 353 individuals from Italy, Finland, the United Kingdom, Spain, or Utah. The alleles were determined from short-read sequence data [79]. B. Summary of the results obtained using models tested during optimization. From left to right are the filtering criteria (SNPs or KIR3DL1/S1 alleles), the filtering threshold values, the resulting model build time, and the accuracy of the imputed genotypes. The grey dotted arrow indicates that the final model was built using MAC < 3 for SNPs and for KIR3DL1/S1 alleles. C. Shows the imputation accuracy for each KIR3DL1/S1 allele present in the final filtered EUR data set. Blue bars indicate the sensitivity (% of times a given allele was called as present when known to be present). Red line indicates specificity (% of times a given allele was called as absent when known to be absent).
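The accuracy, sensitivity, and specificity measures described in this caption can be illustrated with a minimal Python sketch. It is not the PONG implementation: the function names, the toy allele labels, and the choice to score presence or absence once per individual (rather than per chromosome) are all assumptions made for illustration.

```python
def genotype_accuracy(truth, called):
    """Fraction of individuals whose unordered allele pair is imputed exactly."""
    matches = sum(sorted(truth[s]) == sorted(called[s]) for s in truth)
    return matches / len(truth)


def per_allele_metrics(truth, called):
    """Return {allele: (sensitivity, specificity)}.

    Assumption: an allele counts as 'present' in an individual if it
    appears in at least one of the two genotype slots.
    """
    alleles = {a for g in truth.values() for a in g}
    alleles |= {a for g in called.values() for a in g}
    metrics = {}
    for allele in sorted(alleles):
        tp = fn = tn = fp = 0
        for sample, true_gt in truth.items():
            present = allele in true_gt
            predicted = allele in called[sample]
            if present and predicted:
                tp += 1
            elif present:
                fn += 1
            elif predicted:
                fp += 1
            else:
                tn += 1
        # Undefined ratios (no positives or no negatives) are reported as None.
        sensitivity = tp / (tp + fn) if (tp + fn) else None
        specificity = tn / (tn + fp) if (tn + fp) else None
        metrics[allele] = (sensitivity, specificity)
    return metrics


# Toy data (hypothetical samples; allele labels are illustrative only).
truth = {
    "s1": ("*001", "*002"),
    "s2": ("*001", "*013"),
    "s3": ("*002", "*002"),
    "s4": ("*013", "*005"),
}
called = {
    "s1": ("*002", "*001"),  # same pair, different order: counts as correct
    "s2": ("*001", "*013"),
    "s3": ("*002", "*001"),  # one allele miscalled
    "s4": ("*013", "*005"),
}

accuracy = genotype_accuracy(truth, called)
metrics = per_allele_metrics(truth, called)
```

In this toy example three of four genotypes match, and the single miscall lowers the specificity of `*001` without affecting its sensitivity, which is why the two measures are reported separately for each allele.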
Number of KIR3DL1/S1 alleles and individuals in data sets.
| 1000 Genomes Population Group | All | Global | In Model Set | In Test Set |
|---|---|---|---|---|
| Africa (AFR) | 558 | 541 | 272 | 269 |
| Americas (AMR) | 298 | 292 | 146 | 146 |
| East Asia (EAS) | 406 | 389 | 196 | 193 |
| Europe (EUR) | 353 | 345 | 174 | 171 |
| South Asia (SAS) | 467 | 457 | 229 | 228 |
| Global | 2,082 | 2,024 | 1,017 | 1,007 |
Fig 3. Accurate imputation of KIR3DL1/S1 alleles using a Global population model.
A. Bar graph shows the number of KIR3DL1/S1 alleles present in each of the five broad population groups of the 1000 Genomes database. The bar colors indicate the number of alleles present (pink) before filtering, (ruby) after MAC < 3 filtering, and (burgundy) after combining the five groups to form a Global population and then MAC < 3 filtering. The population groups are East Asian (EAS), European (EUR), South Asian (SAS), American (AMR) and African (AFR). B. Shows the imputation accuracy obtained for each of the population groups and the Global models. (Within group) the model was built using 50% of the indicated group and tested on the other 50%. (Global) the model was built using 50% of all individuals and tested on the remaining 50% of the specified group. C. and D. Show the imputation efficacy for each allele present in the final Global data set. Blue bars indicate the sensitivity (% of times a given allele was called as present when known to be present). Red line indicates specificity (% of times a given allele was called as absent when known to be absent). Blue dots indicate the KIR3DL1/S1 allele frequencies in the Global population.
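The filtering and 50/50 model/test design described in this caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: it assumes that "MAC < 3 filtering" removes alleles observed fewer than three times, that individuals carrying a removed allele are dropped in a single pass, and that the split is a simple seeded random halving. The function names and toy data are hypothetical.

```python
import random
from collections import Counter


def filter_rare_alleles(genotypes, min_count=3):
    """Drop alleles seen fewer than min_count times and the individuals carrying them.

    Assumption: one filtering pass; counts are not recomputed after
    individuals are removed.
    """
    counts = Counter(a for pair in genotypes.values() for a in pair)
    kept_alleles = {a for a, n in counts.items() if n >= min_count}
    kept = {
        sample: pair
        for sample, pair in genotypes.items()
        if all(a in kept_alleles for a in pair)
    }
    return kept, kept_alleles


def split_half(sample_ids, seed=0):
    """Randomly split sample IDs into model-building and test halves."""
    ids = sorted(sample_ids)  # sort first so the split is reproducible
    random.Random(seed).shuffle(ids)
    mid = (len(ids) + 1) // 2
    return ids[:mid], ids[mid:]


# Toy data (hypothetical samples and illustrative allele labels).
genotypes = {
    "a": ("*001", "*002"),
    "b": ("*001", "*002"),
    "c": ("*001", "*002"),
    "d": ("*001", "*007"),  # "*007" is seen once, so "d" is dropped
    "e": ("*002", "*001"),
}

kept, kept_alleles = filter_rare_alleles(genotypes)
model_ids, test_ids = split_half(kept)
```

A within-group model would apply `split_half` to one population group only, whereas the Global design would split all individuals and then evaluate on the test-half members of each group.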
Accuracy of PONG across SNP arrays (% of genotypes imputed correctly in each test set).
| Population | SNP Imputation | Illumina Omni 2.5 | Infinium | Immunochip | MEGA |
|---|---|---|---|---|---|
| EUR | no | 94% | 49% | 55% | 58% |
| EUR | yes | -- | 91% | 91% | 92% |
| EAS | no | 96% | 61% | 72% | 78% |
| EAS | yes | -- | 81% | 89% | 92% |
| AFR | no | 88% | 28% | 54% | 37% |
| AFR | yes | -- | 78% | 80% | 84% |
Genome window chr19: 55,100,000–55,500,000 (hg19).
Model and test sets supplemented with imputed Illumina Omni 2.5 genotypes.
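Restricting an array's SNPs to the genome window given above is a simple coordinate filter, sketched here in Python. The SNP identifiers and tuple layout are hypothetical, and treating both window endpoints as inclusive is an assumption; positions must be on build hg19 for the window to apply.

```python
# Genome window from the table note (hg19): chr19:55,100,000-55,500,000.
WINDOW = ("19", 55_100_000, 55_500_000)


def in_window(chrom, pos, window=WINDOW):
    """True if a SNP lies in the window; accepts '19' or 'chr19' naming."""
    w_chrom, start, end = window
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name == w_chrom and start <= pos <= end  # endpoints inclusive (assumed)


# Hypothetical SNP records: (identifier, chromosome, hg19 position).
snps = [
    ("rs_a", "chr19", 55_099_999),  # just upstream of the window
    ("rs_b", "19", 55_100_000),     # on the lower boundary
    ("rs_c", "chr19", 55_281_035),
    ("rs_d", "chr19", 55_500_001),  # just downstream of the window
    ("rs_e", "chr1", 55_300_000),   # right position, wrong chromosome
]

selected = [snp for snp in snps if in_window(snp[1], snp[2])]
```

Counting how many of an array's SNPs survive this filter gives a quick indication of how much information that array contributes from the region before any SNP imputation is applied.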
Fig 4. Accurate imputation of KIR3DL1/S1 alleles from Immunochip SNP data.
Bar graph shows the efficiency of KIR3DL1/S1 allele imputation using a model built and tested on a cohort from Norway who also had their KIR3DL1/S1 alleles genotyped to high resolution. Blue bars indicate the sensitivity (% of times a given allele was called as present when known to be present). Red line indicates specificity (% of times a given allele was called as absent when known to be absent).