Qi Yan, Rui Chen, James S Sutcliffe, Edwin H Cook, Daniel E Weeks, Bingshan Li, Wei Chen.
Abstract
Family-based sequencing studies have unique advantages in enriching rare variants, controlling population stratification, and improving genotype calling. Standard genotype calling algorithms are less likely to call rare variants correctly, often mistakenly calling heterozygotes as reference homozygotes. The consequences of such non-random errors on association tests for rare variants are unclear, particularly in transmission-based tests. In this study, we investigated the impact of genotyping errors on rare variant association tests of family-based sequence data. We performed a comprehensive analysis to study how genotype calling errors affect type I error and statistical power of transmission-based association tests using a variety of realistic parameters in family-based sequencing studies. In simulation studies, we found that biased genotype calling errors yielded not only an inflation of type I error but also a power loss of association tests. We further confirmed our observation using exome sequence data from an autism project. We concluded that non-symmetric genotype calling errors need careful consideration in the analysis of family-based sequence data and we provided practical guidance on ameliorating the test bias.
Year: 2016 PMID: 27328765 PMCID: PMC4916415 DOI: 10.1038/srep28323
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
The total transmitted and non-transmitted alleles over all 182,799 SNPs for single SNP TDT test in type I error rate simulation studies (each SNP could have none or multiple transmitted and non-transmitted alleles).
| | Null | r2 = 0; r1 = 1% Parents | r2 = 0; r1 = 5% Parents | r2 = 0; r1 = 10% Parents |
|---|---|---|---|---|
| Transmitted | 374,502 (47%) | 370,753 (47%) | 355,759 (47%) | 337,142 (47%) |
| Non-transmitted | 427,972 (53%) | 423,684 (53%) | 406,135 (53%) | 384,324 (53%) |
| | r2 = 0; r1 = 1% Offspring | r2 = 0; r1 = 5% Offspring | r2 = 0; r1 = 10% Offspring | |
| Transmitted | 370,880 (46%) | 356,113 (44%) | 338,061 (42%) | |
| Non-transmitted | 431,594 (54%) | 446,354 (56%) | 464,397 (58%) | |
| | r1 = 0; r2 = 0.1% Parents | r1 = 0; r2 = 0.5% Parents | r1 = 0; r2 = 1% Parents | |
| Transmitted | 374,502 (45%) | 374,502 (38%) | 374,502 (32%) | |
| Non-transmitted | 463,784 (55%) | 607,270 (62%) | 785,472 (68%) | |
| | r1 = 0; r2 = 0.1% Offspring | r1 = 0; r2 = 0.5% Offspring | r1 = 0; r2 = 1% Offspring | |
| Transmitted | 374,912 (47%) | 376,561 (47%) | 378,642 (47%) | |
| Non-transmitted | 427,562 (53%) | 425,913 (53%) | 423,832 (53%) | |
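The percentages in the table are the share of transmitted alleles among all counted alleles, and the per-SNP counts are what a single-SNP TDT compares. The sketch below recomputes two of the tabulated percentages and shows the classical McNemar-type TDT statistic for one SNP. The function names and the example per-SNP counts are illustrative assumptions, not taken from the study, and the study's own implementation may differ.

```python
import math


def transmitted_fraction(transmitted, non_transmitted):
    """Share of counted alleles that were transmitted to offspring."""
    return transmitted / (transmitted + non_transmitted)


def tdt_statistic(b, c):
    """Classical TDT chi-square (1 df) for one SNP.

    b: times heterozygous parents transmitted the allele of interest
    c: times heterozygous parents did not transmit it
    """
    if b + c == 0:
        return 0.0  # no informative transmissions at this SNP
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Totals from the table: the null column, and r1 = 10% in offspring.
print(round(100 * transmitted_fraction(374_502, 427_972)))  # 47
print(round(100 * transmitted_fraction(338_061, 464_397)))  # 42

# Hypothetical per-SNP counts, for illustration only.
stat = tdt_statistic(15, 5)
print(stat, chi2_1df_pvalue(stat))
```

Note that the table reports totals pooled over all SNPs, whereas the test statistic is computed SNP by SNP, so the pooled totals should not be plugged into `tdt_statistic` directly.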
Figure 1. QQ plots for type I error rate simulation studies (gTDT results) with different scenarios of error patterns.
We considered four scenarios to mimic this error pattern: 1. r2 (the error rate of calling homozygote 0/0 as heterozygote 0/1) = 0; r1 (the error rate of calling heterozygote 0/1 as homozygote 0/0) = 1%, 5% or 10% in parents; 2. r2 = 0; r1 = 1%, 5% or 10% in offspring; 3. r1 = 0; r2 = 0.1%, 0.5% or 1% in parents; 4. r1 = 0; r2 = 0.1%, 0.5% or 1% in offspring. The 95% point-wise confidence band (gray area) is computed under the assumption of the p-values being drawn independently from a uniform [0, 1] distribution.
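The four scenarios can be illustrated with a small sketch that injects r1 and r2 errors into trio genotypes and then tallies transmitted and non-transmitted alleles. Several things here are assumptions made for illustration: genotypes are coded as alternate-allele counts (0, 1, 2), trios that become Mendelian-inconsistent after error injection are dropped from the tally, and all function names are hypothetical. The study does not spell out these details in the text above.

```python
import random


def inject_errors(genotype, r1, r2, rng):
    """Apply one-directional calling errors to a single genotype.

    r1: probability of calling a heterozygote 0/1 as homozygote 0/0
    r2: probability of calling a homozygote 0/0 as heterozygote 0/1
    """
    if genotype == 1 and rng.random() < r1:
        return 0
    if genotype == 0 and rng.random() < r2:
        return 1
    return genotype


def count_transmissions(father, mother, child):
    """Return (transmitted, non_transmitted) alternate alleles from
    heterozygous parents; (0, 0) if the trio is Mendelian-inconsistent
    (dropping such trios is an assumption of this sketch)."""
    het_parents = (father == 1) + (mother == 1)
    hom_alt_parents = (father == 2) + (mother == 2)
    transmitted = child - hom_alt_parents
    if not 0 <= transmitted <= het_parents:
        return (0, 0)
    return (transmitted, het_parents - transmitted)


def tally(trios, r1=0.0, r2=0.0, target="parents", seed=0):
    """Sum transmissions over trios after injecting errors into the
    parents or the offspring, matching the four scenarios above."""
    rng = random.Random(seed)
    t_total = nt_total = 0
    for father, mother, child in trios:
        if target == "parents":
            father = inject_errors(father, r1, r2, rng)
            mother = inject_errors(mother, r1, r2, rng)
        else:
            child = inject_errors(child, r1, r2, rng)
        t, nt = count_transmissions(father, mother, child)
        t_total += t
        nt_total += nt
    return t_total, nt_total


# A heterozygous father who transmitted the alternate allele.
print(tally([(1, 0, 1)]))                                 # (1, 0)
# Scenario 2 with r1 = 100%: the child's 0/1 is called 0/0.
print(tally([(1, 0, 1)], r1=1.0, target="offspring"))     # (0, 1)
```

Under these assumptions, an r1 error in the offspring turns a transmission into a non-transmission, and an r2 error in a parent of a 0/0 child adds a spurious non-transmission. Both push the counts in the same direction as the totals reported in the tables.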
The total transmitted and non-transmitted alleles over all 19,103 SNPs for single SNP TDT test in power simulation studies (each SNP could have none or multiple transmitted and non-transmitted alleles).
| | Original | r2 = 0; r1 = 1% Parents | r2 = 0; r1 = 5% Parents | r2 = 0; r1 = 10% Parents |
|---|---|---|---|---|
| Transmitted | 17,225 (61%) | 17,050 (61%) | 16,361 (61%) | 15,463 (61%) |
| Non-transmitted | 11,112 (39%) | 11,000 (39%) | 10,518 (39%) | 9,981 (39%) |
| | r2 = 0; r1 = 1% Offspring | r2 = 0; r1 = 5% Offspring | r2 = 0; r1 = 10% Offspring | |
| Transmitted | 17,032 (60%) | 16,349 (58%) | 15,559 (55%) | |
| Non-transmitted | 11,305 (40%) | 11,988 (42%) | 12,778 (45%) | |
| | r1 = 0; r2 = 0.1% Parents | r1 = 0; r2 = 0.5% Parents | r1 = 0; r2 = 1% Parents | |
| Transmitted | 17,225 (54%) | 17,225 (36%) | 17,224 (26%) | |
| Non-transmitted | 14,938 (46%) | 30,096 (64%) | 48,833 (74%) | |
| | r1 = 0; r2 = 0.1% Offspring | r1 = 0; r2 = 0.5% Offspring | r1 = 0; r2 = 1% Offspring | |
| Transmitted | 17,236 (61%) | 17,817 (63%) | 17,349 (61%) | |
| Non-transmitted | 11,101 (39%) | 10,520 (37%) | 10,988 (39%) | |
The total transmitted and non-transmitted alleles for single SNP TDT test in chromosome 1 from 116 parent-offspring trios from the autism study.
| | 60x | 12x | 6x |
|---|---|---|---|
| Transmitted | 108,467 (47%) | 48,769 (40%) | 19,454 (32%) |
| Non-transmitted | 124,184 (53%) | 72,287 (60%) | 41,744 (68%) |
Figure 2. QQ plots for genes (gTDT results) in chromosome 1 from 116 parent-offspring trios from the autism study; only genotypes with GQ > 5 are used.
The 95% point-wise confidence band (gray area) is computed under the assumption of the p-values being drawn independently from a uniform [0, 1] distribution. (A) Variant calling was carried out by GATK best-practice pipeline with different depths; (B) Variant calling was carried out by GATK best-practice pipeline, Beagle4 and Polymutt with the same depth of 6x.
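One common way to obtain such a point-wise band is sketched below. If n p-values are independent and uniform on [0, 1], the k-th smallest follows a Beta(k, n - k + 1) distribution, whose CDF at x equals the probability that at least k of n uniform draws fall below x. The sketch uses that binomial identity and bisection to find the 2.5% and 97.5% quantiles for each rank. The function names are hypothetical, and this is not necessarily how the figures were produced.

```python
import math


def order_stat_cdf(x, k, n):
    """P(k-th smallest of n uniform p-values <= x), via the binomial tail."""
    return sum(
        math.comb(n, j) * x**j * (1.0 - x) ** (n - j)
        for j in range(k, n + 1)
    )


def order_stat_quantile(q, k, n, tol=1e-12):
    """Invert the CDF by bisection (it is increasing in x on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if order_stat_cdf(mid, k, n) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def qq_band(n, level=0.95):
    """Return (expected, lower, upper) p-values for ranks 1..n."""
    alpha = 1.0 - level
    band = []
    for k in range(1, n + 1):
        expected = k / (n + 1)  # mean of Beta(k, n - k + 1)
        lower = order_stat_quantile(alpha / 2.0, k, n)
        upper = order_stat_quantile(1.0 - alpha / 2.0, k, n)
        band.append((expected, lower, upper))
    return band


for expected, lower, upper in qq_band(5):
    print(f"{expected:.3f}  [{lower:.4f}, {upper:.4f}]")
```

For a QQ plot on the usual scale, each value would be transformed with `-math.log10(...)` before plotting. The exact binomial sum is only practical for a few hundred tests; for genome-wide numbers of genes or SNPs, a library Beta quantile function would be the more realistic choice.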
Figure 3. The impact of genotyping bias on different lengths of genes (gTDT results).
(A) QQ plots for genes including more than 100 variants with different depths; (B) QQ plots for genes including fewer than 50 variants with different depths.