Kaitlyn Cook, Alejandra Benitez, Casey Fu, Nathan Tintle.
Abstract
The new class of rare variant tests has usually been evaluated assuming perfect genotype information. In reality, rare variant genotypes may be incorrect, and so rare variant tests should be robust to imperfect data. Errors and uncertainty in SNP genotyping are already known to dramatically reduce statistical power for single marker tests on common variants and, in some cases, to inflate the type I error rate. Recent results show that uncertainty in genotype calls derived from sequencing reads depends on several factors, including read depth, calling algorithm, number of alleles present in the sample, and the frequency at which an allele segregates in the population. We have recently proposed a general framework for the evaluation and investigation of rare variant tests of association, classifying most rare variant tests into one of two broad categories (length or joint tests). We use this framework to relate factors affecting genotype uncertainty to the power and type I error rate of rare variant tests. We find that non-differential genotype errors (an error process that occurs independently of phenotype) decrease power, with larger decreases for extremely rare variants and for common homozygote to heterozygote errors. Differential genotype errors (an error process that is associated with phenotype status) lead to inflated type I error rates, which are more likely to occur at sites with more common homozygote to heterozygote errors than vice versa. Finally, our work suggests that certain rare variant tests and study designs may be more robust to the inclusion of genotype errors. Further work is needed to directly integrate genotype calling algorithm decisions, study costs, and test statistic choices to provide comprehensive design and analysis advice that appropriately accounts for the impact of genotype errors.
Keywords: SKAT; dosage; gene-based; genotype uncertainty; misclassification
Year: 2014 PMID: 24744770 PMCID: PMC3978329 DOI: 10.3389/fgene.2014.00062
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
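The abstract's distinction between non-differential and differential genotype errors can be illustrated with a small expected-value calculation. This is only a sketch and not the authors' simulation: it assumes a single site with two genotype classes (common homozygote and heterozygote, ignoring rare homozygotes), and the function names, carrier frequencies, and error rates are all hypothetical values chosen for illustration.

```python
def observed_carrier_freq(true_freq, hom_to_het, het_to_hom):
    """Expected frequency of observed heterozygote calls after misclassification.

    true_freq  : true heterozygote (carrier) frequency in the group
    hom_to_het : P(called heterozygote | truly common homozygote)
    het_to_hom : P(called common homozygote | truly heterozygote)
    """
    true_carriers_kept = true_freq * (1.0 - het_to_hom)
    false_carriers_added = (1.0 - true_freq) * hom_to_het
    return true_carriers_kept + false_carriers_added


def false_carrier_fraction(true_freq, hom_to_het, het_to_hom):
    """Share of observed carriers that are truly common homozygotes."""
    observed = observed_carrier_freq(true_freq, hom_to_het, het_to_hom)
    if observed == 0.0:
        return 0.0
    return (1.0 - true_freq) * hom_to_het / observed


def observed_difference(case_freq, control_freq, case_err, control_err):
    """Observed case-minus-control carrier frequency.

    Each *_err argument is a (hom_to_het, het_to_hom) pair of error rates.
    """
    return (observed_carrier_freq(case_freq, *case_err)
            - observed_carrier_freq(control_freq, *control_err))


# All numbers below are hypothetical, not taken from the paper.

# Non-differential errors: same error rates in cases and controls.
# A true difference of 0.01 is attenuated, which costs power.
nondiff = observed_difference(0.02, 0.01, (0.01, 0.01), (0.01, 0.01))

# Differential errors under the null: identical true frequencies, but the
# hom-to-het error rate is 20% higher in cases, so a spurious difference appears.
diff_null = observed_difference(0.01, 0.01, (0.012, 0.0), (0.010, 0.0))

# The same hom-to-het error rate contaminates a rare site far more than a
# more common one.
rare_noise = false_carrier_fraction(0.01, 0.01, 0.0)
common_noise = false_carrier_fraction(0.20, 0.01, 0.0)
```

In this toy setting, non-differential errors shrink a true case-control difference, differential errors create a difference where none exists, and false carriers make up a larger share of the observed carriers when the variant is rarer. These are the three qualitative patterns the abstract reports, shown here only under the simplifying assumptions stated above.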
Regression model coefficients relating power loss/gain to simulation parameters.
| 1 | Length | −0.04 | −0.02 | −0.05 | 0.015 | 0.010 | 0.011 | 0.053 | 0.070 | 0.046 | −0.007 | −0.010 | −0.006 |
| | Joint | −0.05 | −0.04 | −0.05 | 0.028 | 0.031 | 0.023 | 0.034 | 0.049 | 0.030 | 0.017 | 0.024 | 0.015 |
| 2 | Length | −0.04 | −0.03 | −0.05 | 0.017 | 0.014 | 0.013 | 0.050 | 0.062 | 0.043 | −0.009 | −0.011 | −0.007 |
| | Joint | −0.05 | −0.04 | −0.05 | 0.028 | 0.032 | 0.024 | 0.030 | 0.040 | 0.026 | 0.014 | 0.018 | 0.012 |
| 4 | Length | −0.04 | −0.03 | −0.05 | 0.021 | 0.020 | 0.017 | 0.045 | 0.054 | 0.039 | −0.009 | −0.013 | −0.008 |
| | Joint | −0.04 | −0.03 | −0.05 | 0.024 | 0.027 | 0.019 | 0.023 | 0.029 | 0.020 | 0.009 | 0.012 | 0.008 |
| 8 | Length | −0.03 | −0.03 | −0.04 | 0.017 | 0.018 | 0.014 | 0.028 | 0.039 | 0.026 | −0.010 | 0.011 | −0.007 |
| | Joint | −0.03 | −0.03 | −0.03 | 0.017 | 0.020 | 0.014 | 0.015 | 0.017 | 0.013 | 0.005 | 0.005 | 0.005 |
p < 0.05;
p < 0.01;
p < 0.001.
Increase in power for a 1% increase in error rate for .
Increase in power for a 0.1% increase in average MAF across all variant sites in the gene. For example, for J.
Increase in power for a 10% point increase in the number of risk increasing SNPs. For example, for J.
Increase in power for a 10% point increase in the number of risk reducing SNPs. For example, for J.
Figure 1. Power for . Non-differential genotype errors at a locus reduce power, and the loss grows as the error rate increases. The power curves shown are for a site with eight causal variants. Power loss is most substantial when the minor allele frequency is lowest.
Regression model coefficients relating type I error loss/gain to simulation parameters.
| 1 | Length | 0.08 | 0.01 | 0.07 | −0.004 | 0.008 | −0.005 | 0.12 | −0.004 | 0.11 |
| | Joint | 0.13 | 0.02 | 0.12 | −0.005 | 0.009 | −0.009 | 0.10 | 0.022 | 0.10 |
| 2 | Length | 0.08 | 0.01 | 0.07 | −0.003 | 0.007 | −0.005 | 0.11 | −0.006 | 0.11 |
| | Joint | 0.13 | 0.02 | 0.12 | −0.005 | 0.009 | −0.009 | 0.10 | 0.022 | 0.10 |
| 4 | Length | 0.08 | 0.01 | 0.08 | −0.002 | 0.007 | −0.004 | 0.10 | −0.005 | 0.10 |
| | Joint | 0.12 | 0.01 | 0.11 | −0.004 | 0.008 | −0.008 | 0.10 | 0.020 | 0.09 |
| 8 | Length | 0.06 | 0.005 | 0.06 | −0.001 | 0.004 | −0.003 | 0.10 | −0.003 | 0.10 |
| | Joint | 0.10 | 0.01 | 0.10 | −0.003 | 0.005 | −0.007 | 0.09 | 0.013 | 0.09 |
p < 0.05;
p < 0.01;
p < 0.001.
Increase in type I error rate for a 1% increase in error rate for .
Increase in type I error rate for a 0.1% increase in average MAF across all variant sites in the gene. For example, for J.
Increase in type I error for a 10% increase in the relative difference between the ratio of the case to control error rate. For example, for J.
Figure 2. Type I error rate for . The type I error rate is illustrated at a site with eight non-causal variants. As differential genotype error rates (20% higher in cases) increase, the type I error rate increases, and the effect is larger when the MAF is low.
Figure 3. Higher norms are more robust to genotype errors when the proportion of non-causal variants is larger: length tests. The figure illustrates power of four different (norm) length statistics, under varying error models. All test statistics experience power loss in the presence of errors. However, power loss can be mitigated through the use of high norm test statistics.
Figure 4. Higher norms are more robust to genotype errors when the proportion of non-causal variants is larger: joint tests. The figure illustrates power of four different (norm) joint statistics, under varying error models. All test statistics experience power loss in the presence of errors. However, power loss can be mitigated through the use of high norm test statistics.
Proportion of simulation settings and average MAF, within each absolute difference subcategory.
| <0.05 | Percentage of settings (Count/Total) | 97.9% (137/140) | 83.6% (117/140) | 88.0% (154/175) | 73.1% (128/175) |
| | Mean control MAF | 0.6% | 0.6% | 0.6% | 0.7% |
| 0.05–0.1 | Percentage of settings (Count/Total) | 2.1% (3/140) | 13.6% (19/140) | 5.7% (10/175) | 15.4% (27/175) |
| | Mean control MAF | 0.01% | 0.1% | 0.3% | 0.2% |
| >0.1 | Percentage of settings (Count/Total) | 0 | 2.9% (4/140) | 6.3% (11/175) | 11.4% (20/175) |
| | Mean control MAF | – | 0.3% | 0.004% | 0.1% |