Anthony L Hinrichs, Robert C Culverhouse, Brian K Suarez.
Abstract
The ideal genetic analysis of family data would include whole genome sequence on all family members. A strategy of combining sequence data from a subset of key individuals with inexpensive, genome-wide association study (GWAS) chip genotypes on all individuals to infer sequence level genotypes throughout the families has been suggested as a highly accurate alternative. This strategy was followed by the Genetic Analysis Workshop 18 data providers. We examined the quality of the imputation to identify potential consequences of this strategy by comparing discrepancies between GWAS genotype calls and imputed calls for the same variants. Overall, the inference and imputation process worked very well. However, we find that discrepancies occurred at an increased rate when imputation was used to infer missing data in sequenced individuals. Although this may be an artifact of this particular instantiation of these analytic methods, there may be general genetic or algorithmic reasons to avoid trying to fill in missing sequence data. This is especially true given the risk of false positives and reduction in power for family-based transmission tests when founders are incorrectly imputed as heterozygotes. Finally, we note a higher rate of discrepancies when unsequenced individuals are inferred using sequenced individuals from other pedigrees drawn from the same admixed population.
Year: 2014 PMID: 25519370 PMCID: PMC4143754 DOI: 10.1186/1753-6561-8-S1-S17
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
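The core comparison in the abstract sets GWAS chip calls against the imputed or inferred calls for the same variants. It can be illustrated with a minimal sketch, which is not the authors' pipeline. The sketch makes several assumptions:

- Genotypes are coded as minor-allele counts (0, 1, 2), with `None` for a missing call.
- Calls are keyed by hypothetical (individual, variant) pairs.
- The function name `count_discrepancies` is hypothetical.

It also tallies discrepancies where the GENO call is heterozygous but the GWAS call is homozygous. This is the error type the abstract flags as risky when it affects founders in family-based transmission tests.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: (individual_id, variant_id) -> minor-allele count
# (0, 1 or 2), or None when the genotype call is missing.
Calls = Dict[Tuple[str, str], Optional[int]]


def count_discrepancies(gwas: Calls, geno: Calls) -> Tuple[int, int, int]:
    """Compare two call sets for the same variants.

    Returns (compared, discrepant, spurious_het), where spurious_het counts
    discrepancies in which GENO calls a heterozygote (1) but GWAS calls a
    homozygote (0 or 2). Keys present only in `geno` are ignored.
    """
    compared = discrepant = spurious_het = 0
    for key, gwas_call in gwas.items():
        geno_call = geno.get(key)
        # Only genotypes called in both data sets are comparable.
        if gwas_call is None or geno_call is None:
            continue
        compared += 1
        if gwas_call != geno_call:
            discrepant += 1
            if geno_call == 1:
                spurious_het += 1
    return compared, discrepant, spurious_het


if __name__ == "__main__":
    gwas = {("ind1", "rs1"): 0, ("ind1", "rs2"): 1, ("ind2", "rs1"): 2,
            ("ind2", "rs2"): None, ("ind2", "rs3"): 0}
    geno = {("ind1", "rs1"): 0, ("ind1", "rs2"): 2, ("ind2", "rs1"): 2,
            ("ind2", "rs2"): 1, ("ind2", "rs3"): 1}
    compared, discrepant, spurious_het = count_discrepancies(gwas, geno)
    print(compared, discrepant, spurious_het, f"{100 * discrepant / compared:.2f}%")
    # prints: 4 2 1 50.00%
```

The denominator counts only genotypes called in both sources. A real comparison would also need to align alleles and strands between platforms, which this sketch does not attempt.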
Discrepancies by family
| Family | Sequenced N | Sequenced D | Sequenced D/N | Nonsequenced N | Nonsequenced D | Nonsequenced D/N | All N | All D | All D/N |
|---|---|---|---|---|---|---|---|---|---|
| T2DG23 | 32 | 14678 |  |  |  |  | 32 | 14678 |  |
| T2DG15 | 41 | 17431 |  |  |  |  | 41 | 17431 |  |
| T2DG14 | 40 | 15459 |  |  |  |  | 40 | 15459 |  |
| T2DG25 | 33 | 12714 |  |  |  |  | 33 | 12714 |  |
| T2DG17 | 20 | 5287 |  | 22 | 5639 | 256.3 | 42 | 10926 | 260.1 |
| T2DG20 | 20 | 4977 |  | 16 | 2943 | 183.9 | 36 | 7920 | 220.0 |
| T2DG08 | 25 | 5461 |  | 43 | 9112 | 211.9 | 68 | 14573 | 214.3 |
| T2DG27 | 17 | 3686 |  | 18 | 3074 | 170.8 | 35 | 6760 | 193.1 |
| T2DG02 | 43 | 9108 |  | 43 | 7922 | 184.2 | 86 | 17030 | 198.0 |
| T2DG21 | 19 | 3915 |  | 16 | 2630 | 164.4 | 35 | 6545 | 187.0 |
| T2DG04 | 38 | 7245 | 190.7 | 25 | 4155 | 166.2 | 63 | 11400 | 181.0 |
| T2DG06 | 39 | 6976 |  | 25 | 3174 | 127.0 | 64 | 10150 | 158.6 |
| T2DG03 | 38 | 4675 | 123.0 | 39 | 6943 |  | 77 | 11618 | 150.9 |
| T2DG11 | 29 | 5132 |  | 6 | 774 | 129.0 | 35 | 5906 | 168.7 |
| T2DG10 | 40 | 5127 | 128.2 | 24 | 4058 |  | 64 | 9185 | 143.5 |
| T2DG16 | 26 | 3211 | 123.5 | 22 | 3434 |  | 48 | 6645 | 138.4 |
| T2DG47 | 12 | 1785 | 148.8 | 10 | 1547 |  | 22 | 3332 | 151.5 |
| T2DG09 | 27 | 3182 | 117.9 | 6 | 878 |  | 33 | 4060 | 123.0 |
| T2DG07 | 30 | 3378 | 112.6 | 6 | 867 |  | 36 | 4245 | 117.9 |
| T2DG05 | 40 | 4349 | 108.7 | 28 | 3058 |  | 68 | 7407 | 108.9 |
Discrepancies in the full comparison single-nucleotide polymorphism (SNP) set between GWAS data and GENO data sets, by family, for sequenced individuals and for nonsequenced (inferred) individuals. Bold indicates highest discrepancy rate by subsample.
N = number of individuals in the family within the given subsample
D = number of discrepancies observed among those individuals
D/N = average number of discrepancies per individual
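D/N is just D divided by N, so the rates missing from several rows of the family table can be recomputed from the N and D columns. The All-individuals rate is the pooled total D over total N, not the mean of the two subsample rates. A minimal sketch follows, with these assumptions:

- The helper names `rate` and `pooled_rate` are hypothetical.
- Half-up rounding to one decimal place is assumed. It reproduces published values such as 148.8 (T2DG47) and 128.2 (T2DG10).

Any recomputed value is derived here, not taken from the paper.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Tuple


def rate(n: int, d: int) -> Decimal:
    """D/N: average discrepancies per individual, rounded half-up to 0.1."""
    return (Decimal(d) / Decimal(n)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def pooled_rate(*groups: Tuple[int, int]) -> Decimal:
    """Rate for the union of subsamples: total D over total N (not a mean of rates)."""
    n = sum(g[0] for g in groups)
    d = sum(g[1] for g in groups)
    return rate(n, d)


# (family, N, D) for sequenced individuals, transcribed from the table;
# the D/N value for each of these rows is missing from the extracted table.
missing_sequenced = [("T2DG23", 32, 14678), ("T2DG17", 20, 5287), ("T2DG06", 39, 6976)]
for family, n, d in missing_sequenced:
    print(family, rate(n, d))  # T2DG23 458.7, T2DG17 264.4, T2DG06 178.9

# T2DG17: sequenced (20, 5287) pooled with nonsequenced (22, 5639)
# reproduces the published All-individuals rate of 260.1.
print(pooled_rate((20, 5287), (22, 5639)))
```

Using `Decimal` avoids binary floating-point surprises on exact halves such as 5127/40 = 128.175.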
Discrepancies by process type
| Discrepancy type | Subjects | High call rate SNPs: genotypes | High call rate SNPs: % discrepant | Full comparison SNPs (451,279): genotypes | Full comparison SNPs (451,279): % discrepant |
|---|---|---|---|---|---|
| Imputation | 463 | 205,962 | 25.16 | 1,864,804 | 28.82 |
| Sequencing | 463 | 107,780,325 | 0.03 | 197,178,315 | 0.06 |
| Inference | 495 | 116,463,033 | 0.10 | 222,926,764 | 0.20 |
| Inference | 349 | 82,103,861 | 0.07 | 157,186,202 | 0.18 |
| Inference | 146 | 34,359,172 | 0.18 | 65,740,562 | 0.26 |
Discrepancies between genome-wide association study (GWAS) data and GENO data sets, divided by analytical process. "Imputation" fills in missing genotypes in the sequence data of sequenced individuals. "Sequencing" covers genotypes called directly from sequence data in those same individuals. "Inference" infers phased sequence data on unsequenced individuals based on GWAS data. SNP, single-nucleotide polymorphism.
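The percentages in the process-type table relate directly to counts. The number of discrepant genotypes is roughly the genotypes compared times % discrepant / 100. The two unlabeled Inference rows (349 + 146 = 495 subjects) also sum to the combined Inference row.

The sketch below checks both points using the full-comparison columns. The helper names are hypothetical, and counts recovered from two-decimal percentages are only approximate. What distinguishes the two Inference subrows is not stated in this excerpt.

```python
from typing import List, Tuple

# (process, subjects, genotypes compared, % discrepant), transcribed from
# the full-comparison columns of the process-type table.
Row = Tuple[str, int, int, float]
ROWS: List[Row] = [
    ("Imputation", 463, 1_864_804, 28.82),
    ("Sequencing", 463, 197_178_315, 0.06),
    ("Inference", 495, 222_926_764, 0.20),
    ("Inference", 349, 157_186_202, 0.18),
    ("Inference", 146, 65_740_562, 0.26),
]


def approx_discrepant(genotypes: int, percent: float) -> int:
    """Approximate discrepant-genotype count implied by a rounded percentage."""
    return round(genotypes * percent / 100)


def pooled_percent(rows: List[Row]) -> float:
    """Genotype-weighted percent discrepant across several rows."""
    total = sum(r[2] for r in rows)
    return sum(r[2] * r[3] for r in rows) / total


combined, sub_a, sub_b = ROWS[2], ROWS[3], ROWS[4]
# The two Inference subrows partition the combined Inference row...
assert sub_a[1] + sub_b[1] == combined[1]
assert sub_a[2] + sub_b[2] == combined[2]
# ...and their genotype-weighted rate rounds to the combined 0.20%.
print(f"{pooled_percent([sub_a, sub_b]):.2f}")  # 0.20

for process, subjects, genotypes, pct in ROWS:
    print(f"{process:<11}{subjects:>5}{approx_discrepant(genotypes, pct):>10,}")
```

Weighting by genotypes compared, rather than averaging the two percentages, is what makes the subrows agree with the combined row.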