Albert Rosenberger1,2, Viola Tozzi3, Heike Bickeböller3.
Abstract
BACKGROUND: Imputation of untyped markers is a standard tool in genome-wide association studies to close the gap between directly genotyped and other known DNA variants. However, it is fundamental that genotypes are imputed with high accuracy. Several accuracy measures have been proposed and some are implemented in imputation software, but unfortunately they differ across platforms. In the present paper, we introduce Iam hiQ, an independent pair of accuracy measures that can be applied to dosage files, the output of all imputation software. Iam (imputation accuracy measure) quantifies the average amount of individual-specific versus population-specific genotype information in a linear manner. hiQ (heterogeneity in quantities of dosages) addresses the inter-individual heterogeneity between dosages of a marker across the sample at hand.
Keywords: Accuracy measures; GWAS; Genotype imputation; High-throughput genotyping
Year: 2022 PMID: 35073846 PMCID: PMC8785528 DOI: 10.1186/s12859-022-04568-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Iam by hiQ. Main panel: all markers by Iam vs. hiQ; blue dots: variants with info < 0.5; red dots: variants with 0.5 ≤ info < 0.8; green dots: variants with info ≥ 0.8; dotted line: robust 99.9999999% bivariate normal random interval (assuming a two-dimensional normal distribution). The oversized grey bubble in the top right corner represents the vast majority of almost fully informative markers with Iam ≥ 0.99 and hiQ ≥ 0.99. Inserted panel: like the main panel, but markers are divided according to the minor allele frequency
Classification of markers by Iam and hiQ
| Subgroup | hiQ | Iam < 0.47: N | Iam < 0.47: %a | Iam ≥ 0.47: N | Iam ≥ 0.47: %a |
|---|---|---|---|---|---|
| All markers | | | | | |
| | < 0.97 | 59,077 | 0.6% | 59,130 | 0.6% |
| | ≥ 0.97 | 1,214,620 | 11.7% | 9,094,772 | 87.2% |
| Quality defined by | | | | | |
| | < 0.97 | 59,077 | 1.2% | 58,592 | 1.1% |
| | ≥ 0.97 | 1,214,612 | 23.8% | 3,777,566 | 73.9% |
| | < 0.97 | – | | 538 | < 0.1% |
| | ≥ 0.97 | 8 | < 0.1% | 5,317,206 | 99.9% |
| Minor allele frequency (MAF) | | | | | |
| < 1% | < 0.97 | 15,366 | 0.3% | 136 | < 0.1% |
| | ≥ 0.97 | 1,210,505 | 23.2% | 4,000,616 | 76.5% |
| 1% to < 5% | < 0.97 | 13,448 | 0.9% | 12,742 | 0.8% |
| | ≥ 0.97 | 2,317 | 0.2% | 1,472,328 | 98.1% |
| 5% to < 10% | < 0.97 | 4,007 | 0.6% | 12,117 | 1.8% |
| | ≥ 0.97 | 7 | < 0.1% | 638,931 | 97.5% |
| 10% to < 30% | < 0.97 | 8,441 | 0.6% | 18,576 | 1.4% |
| | ≥ 0.97 | 10 | < 0.1% | 1,288,714 | 97.9% |
| 30% to 50% | < 0.97 | 9,283 | 1.3% | 4,188 | 0.6% |
| | ≥ 0.97 | 261 | < 0.1% | 721,679 | 98.1% |
| > 50% | < 0.97 | 8,532 | 0.9% | 11,371 | 1.1% |
| | ≥ 0.97 | 1,520 | 0.2% | 972,504 | 97.8% |
Thresholds for Iam (0.47) and hiQ (0.97) were defined according to a robust 99.9999999% bivariate normal random interval (assuming a two-dimensional normal distribution)
a Proportion within tabulated subgroup of markers
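The cross-classification in the table above can be sketched in a few lines. This is a minimal illustration, assuming each marker already has an Iam and a hiQ value and that a value exactly at a cutoff counts as "≥"; the function names and the example values are hypothetical and not taken from the study.

```python
from collections import Counter

# Cutoffs quoted in the table footnote.
IAM_CUTOFF = 0.47
HIQ_CUTOFF = 0.97


def classify(iam, hiq):
    """Return the (hiQ row, Iam column) cell a marker falls into."""
    row = "hiQ >= 0.97" if hiq >= HIQ_CUTOFF else "hiQ < 0.97"
    col = "Iam >= 0.47" if iam >= IAM_CUTOFF else "Iam < 0.47"
    return row, col


def tabulate(markers):
    """Count markers per cell and give each cell's share of the subgroup."""
    counts = Counter(classify(iam, hiq) for iam, hiq in markers)
    total = sum(counts.values())
    return {cell: (n, n / total) for cell, n in counts.items()}


# Hypothetical (Iam, hiQ) pairs for illustration only.
example = [(0.99, 0.995), (0.30, 0.98), (0.60, 0.90), (0.10, 0.50)]
table = tabulate(example)
for cell, (n, share) in sorted(table.items()):
    print(cell, n, f"{share:.1%}")
```

The share is computed within the list passed to `tabulate`, which mirrors the footnote's "proportion within tabulated subgroup".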
Fig. 2 Manhattan-like plot: Iam hiQ. Upper panel: (low Q.: = 0; high Q.: = 1; threshold hiQ cutoff = 0.97); lower panel: (low Q.: = 0; high Q.: = 1; threshold Iam cutoff = 0.47). Thresholds were defined according to a robust 99.9999999% bivariate normal random interval (assuming a two-dimensional normal distribution)
Fig. 3 Manhattan-like plot: Iam hiQ: chromosome 1. Upper panel: (low Q.: = 0; high Q.: = 1; threshold hiQ cutoff = 0.97); lower panel: (low Q.: = 0; high Q.: = 1; threshold Iam cutoff = 0.47). Thresholds were defined according to a robust 99.9999999% bivariate normal random interval (assuming a two-dimensional normal distribution); red flames indicate “very hot” regions; orange flames indicate “hot” regions
Correlation between accuracy measures
| – | 0.684 | 0.944 | 0.484 | |
| 0.405 | – | 0.367 | 0.156 | |
| 0.976 | 0.686 | – | 0.050 | |
| 0.305 | 0.335 | 0.051 | – |
Right upper triangle: Pearson’s correlation coefficient; left lower triangle: Spearman’s rank correlation coefficient
Fig. 4 From dosages to Iam-indices. MAF/f: minor allele frequency; HWE: Hardy–Weinberg equilibrium; Iam: imputation accuracy measure
Q and Iam by MAF
| MAF | Qchance | QHWE | |
|---|---|---|---|
| 50% | 0.667 | 0.625 | 0.0625 |
| 40% | 0.667 | 0.614 | 0.0784 |
| 30% | 0.667 | 0.575 | 0.1369 |
| 20% | 0.667 | 0.486 | 0.2704 |
| 10% | 0.667 | 0.311 | 0.5329 |
| 5% | 0.667 | 0.176 | 0.7353 |
| 1% | 0.667 | 0.039 | 0.9415 |
| 0.1% | 0.667 | 0.0040 | 0.9940 |
| 0.01% | 0.667 | 0.0004 | 0.9994 |
| 0.001% | 0.667 | 0.00004 | 0.9999 |
| 0.0001% | 0.667 | 0.000004 | 1.0000 |
MAF: minor allele frequency (f); Qchance refers to a dosage of (1/3, 1/3, 1/3); QHWE refers to a dosage of ((1 − f)², 2f(1 − f), f²)
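The tabulated values can be reproduced numerically under one assumption that is not stated in this excerpt: that Q of a dosage (p0, p1, p2) is 1 − (p0² + p1² + p2²). With that assumed definition, a uniform dosage gives Qchance = 0.667 and the Hardy–Weinberg dosage gives the QHWE column; the unlabelled last column then matches 1 − QHWE/Qchance. The sketch below is an illustration of that arithmetic, with hypothetical function names, not the authors' implementation.

```python
def q_index(probs):
    """Assumed definition: one minus the sum of squared genotype probabilities."""
    return 1.0 - sum(p * p for p in probs)


def hwe_dosage(f):
    """Genotype probabilities under Hardy-Weinberg equilibrium for MAF f."""
    return ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)


# Q for a dosage that carries no information (all genotypes equally likely).
Q_CHANCE = q_index((1 / 3, 1 / 3, 1 / 3))


def table_row(f):
    """Return (Qchance, QHWE, 1 - QHWE/Qchance) for minor allele frequency f."""
    q_hwe = q_index(hwe_dosage(f))
    return Q_CHANCE, q_hwe, 1.0 - q_hwe / Q_CHANCE


for maf in (0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01):
    q_chance, q_hwe, last = table_row(maf)
    print(f"{maf:>5.0%}  {q_chance:.3f}  {q_hwe:.3f}  {last:.4f}")
```

Under this assumption the last column simplifies to (1 − 3f(1 − f))², which explains why it approaches 1 as the MAF approaches 0.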
Inter-individual heterogeneity of dosages: example
| ID | Marker 1 | | | Marker 2 | | |
|---|---|---|---|---|---|---|
| 1 | 0.6 | 0.3 | 0.1 | 1 | 0 | 0 |
| 2 | 0.6 | 0.3 | 0.1 | 1 | 0 | 0 |
| 3 | 0.6 | 0.3 | 0.1 | 1 | 0 | 0 |
| 4 | 0.6 | 0.3 | 0.1 | 1 | 0 | 0 |
| 5 | 0.6 | 0.3 | 0.1 | 1 | 0 | 0 |
| 6 | 0.6 | 0.3 | 0.1 | 1 | 0 | 0 |
| 7 | 0.6 | 0.3 | 0.1 | 0 | 1 | 0 |
| 8 | 0.6 | 0.3 | 0.1 | 0 | 1 | 0 |
| 9 | 0.6 | 0.3 | 0.1 | 0 | 1 | 0 |
| 10 | 0.6 | 0.3 | 0.1 | 0 | 0 | 1 |
| Avg | 0.6 | 0.3 | 0.1 | 0.6 | 0.3 | 0.1 |
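The point of this example can be checked numerically: both markers have the same average dosage (0.6, 0.3, 0.1), yet only Marker 2 differs between individuals. The sketch below is an illustration only. It uses a plain per-genotype population variance as a stand-in for heterogeneity; this is an assumption and not the hiQ formula, which is not given in this excerpt. The function names are hypothetical.

```python
from statistics import mean, pvariance

# Dosages (three genotype probabilities per individual) copied from the table.
marker1 = [(0.6, 0.3, 0.1)] * 10
marker2 = [(1.0, 0.0, 0.0)] * 6 + [(0.0, 1.0, 0.0)] * 3 + [(0.0, 0.0, 1.0)]


def column_means(dosages):
    """Average each genotype probability across individuals (the 'Avg' row)."""
    return tuple(mean(col) for col in zip(*dosages))


def column_variances(dosages):
    """Between-individual variance of each genotype probability."""
    return tuple(pvariance(col) for col in zip(*dosages))


print(column_means(marker1), column_variances(marker1))
print(column_means(marker2), column_variances(marker2))
```

Marker 1 assigns every individual the population-level probabilities, so its between-individual variance is zero, whereas Marker 2 assigns each individual a definite genotype and varies across the sample.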