Sunah Song1, Robert Shields1, Xin Li2, Jing Li1.
Abstract
We developed a general framework for family-based imputation using single-nucleotide polymorphism (SNP) data and sequence data distributed by Genetic Analysis Workshop 18. Using PedIBD, we first inferred the haplotypes and inheritance patterns of each family from SNP data. New variants in unsequenced family members can then be imputed from sequenced relatives through their shared haplotypes. We compared the results of our method against the imputation results provided by the Genetic Analysis Workshop organizers. The results showed that our strategy uncovered more variants for more unsequenced relatives. We also showed that recombination breakpoints inferred by PedIBD have much higher resolution than those inferred in previous studies.
Year: 2014 PMID: 25519372 PMCID: PMC4143700 DOI: 10.1186/1753-6561-8-S1-S20
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1. The imputation framework (left) and an example illustrating the imputation procedure (right). In the example, individuals shown in grey have single-nucleotide polymorphism (SNP) chip data from genome-wide association studies (GWAS), and individuals shown in black have both chip data and sequence data. The haplotypes in this segment are inferred from GWAS data and labelled with different colors. Both haplotypes of individual 949 and one haplotype of individual 957 can be recovered from the information of their children (the missing haplotype is shown as a thin black bar); only one haplotype can be recovered for 957 because he has only one child. The two variants are from sequence data (1 and 2 are alleles, and 0 is missing). For the first variant, because member 974 has a homozygous genotype (1, 1), the alleles on its two haplotypes (pink and dark blue) can be assigned. Subsequently, the alleles on the light blue haplotype of member 940, the yellow haplotype of member 956, and the green haplotype of member 939 can be resolved (all three have sequence data). For all other members, alleles can be imputed based on the colors of their haplotypes. However, the light green haplotype (in members 949, 959, and 960) cannot be imputed because it does not occur in any sequenced individual, so one allele remains missing for these members. For the second variant, our algorithm identifies a conflict because member 974 assigns allele 2 to the pink haplotype, whereas member 939 assigns allele 1 to the pink haplotype.
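The procedure described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `impute_variant`, the individual IDs (`s1`, `u1`, ...), the haplotype labels, and the genotypes are all hypothetical toy data, and the haplotype labels are assumed to have been inferred already (for example by PedIBD). Alleles are coded 1 and 2, with 0 for missing, as in the caption.

```python
def impute_variant(haplotypes, genotypes):
    """Impute one variant from shared haplotype labels (illustrative sketch).

    haplotypes: {individual: (label_a, label_b)} for everyone in the segment.
    genotypes:  {individual: (allele, allele)} for sequenced individuals only.
    Returns (imputed, conflict); imputed is None when a conflict is found.
    """
    allele_of = {}  # haplotype label -> allele carried on that haplotype
    progress = True
    while progress:  # repeat until no new haplotype can be resolved
        progress = False
        for person, (a1, a2) in genotypes.items():
            if 0 in (a1, a2):
                continue  # missing sequence call, nothing to learn
            h1, h2 = haplotypes[person]
            if a1 == a2:
                # Homozygous: both haplotypes carry the same allele.
                wanted = {h1: a1, h2: a1}
            else:
                # Heterozygous: phase is only known once one haplotype is resolved.
                if h1 in allele_of:
                    known, other = h1, h2
                elif h2 in allele_of:
                    known, other = h2, h1
                else:
                    continue  # try again on a later pass
                if allele_of[known] not in (a1, a2):
                    return None, True  # resolved allele contradicts this genotype
                wanted = {other: a1 if allele_of[known] == a2 else a2}
            for label, allele in wanted.items():
                if label not in allele_of:
                    allele_of[label] = allele
                    progress = True
                elif allele_of[label] != allele:
                    return None, True  # two individuals disagree on one haplotype
    # Every individual receives the alleles of its two haplotypes; 0 if unresolved.
    imputed = {p: tuple(allele_of.get(h, 0) for h in pair)
               for p, pair in haplotypes.items()}
    return imputed, False


# Hypothetical toy family: s* are sequenced, u* have chip data only.
haplotypes = {
    "s1": ("pink", "blue"),
    "s2": ("pink", "yellow"),
    "s3": ("blue", "green"),
    "u1": ("yellow", "green"),
    "u2": ("pink", "grey"),  # "grey" occurs in no sequenced individual
}

# First variant: a homozygous individual seeds the assignment.
imputed1, conflict1 = impute_variant(haplotypes, {"s1": (1, 1), "s2": (1, 2), "s3": (1, 2)})

# Second variant: two sequenced individuals disagree about the pink haplotype.
imputed2, conflict2 = impute_variant(haplotypes, {"s1": (2, 2), "s2": (1, 1)})
```

In this toy data, `u1` is fully imputed because both of its haplotypes occur in sequenced relatives, while `u2` keeps one missing allele because its second haplotype was never sequenced. The second call reports a conflict rather than guessing which assignment is correct.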
Figure 2. The pedigree structure of family 21. The figure shows the pedigree information of family 21 and one haplotype segment inferred by PedIBD for each individual. It also shows which individuals have been imputed and marks with asterisks the 5 masked individuals who were selected for assessing imputation accuracy. The legends are the same as those in Figure 1 (right).
Missing rate and inconsistency between whole genome sequencing (WGS) and genome-wide association studies (GWAS)
| Family ID | Total number of individuals: all | Total number of individuals: GWAS | Total number of individuals: WGS | Missing rate: GWAS | Missing rate: WGS | Genotype inconsistency between GWAS and WGS | Cause of inconsistency: missing in GWAS | Cause of inconsistency: missing in WGS | Cause of inconsistency: mismatch |
|---|---|---|---|---|---|---|---|---|---|
| | | | | 65,500 | 1,697,985 | 63,803 | | | |
| 2 | 107 | 86 | 43 | 1.18% | 2.31% | 3.20% | 67.55% | 29.72% | 2.73% |
| 3 | 98 | 77 | 38 | 0.17% | 1.66% | 0.84% | 19.98% | 72.31% | 7.71% |
| 4 | 97 | 64 | 39 | 0.17% | 2.34% | 1.38% | 15.80% | 73.59% | 10.60% |
| 5 | 91 | 68 | 40 | 0.13% | 1.56% | 0.82% | 20.41% | 72.47% | 7.12% |
| 6 | 88 | 64 | 39 | 0.84% | 2.06% | 2.07% | 59.85% | 36.38% | 3.77% |
| 7 | 89 | 36 | 30 | 14.59% | 1.58% | 17.58% | 96.60% | 2.95% | 0.45% |
| 8 | 84 | 68 | 25 | 3.31% | 2.18% | 9.23% | 89.44% | 9.33% | 1.23% |
| 9 | 81 | 33 | 27 | 13.34% | 1.50% | 16.38% | 96.63% | 2.90% | 0.47% |
| 10 | 83 | 64 | 40 | 2.42% | 1.86% | 4.43% | 81.79% | 16.71% | 1.50% |
| 11 | 76 | 35 | 29 | 20.57% | 1.88% | 24.81% | 96.92% | 2.63% | 0.45% |
| 16 | 59 | 48 | 26 | 0.10% | 1.62% | 0.82% | 14.04% | 78.04% | 7.93% |
| 17 | 57 | 42 | 20 | 0.30% | 2.58% | 1.45% | 18.22% | 75.43% | 6.35% |
| 20 | 51 | 36 | 20 | 0.36% | 2.12% | 1.43% | 28.55% | 65.70% | 5.76% |
| 21 | 50 | 35 | 19 | 0.11% | 2.49% | 1.26% | 4.83% | 90.21% | 4.96% |
| 27 | 44 | 35 | 17 | 0.16% | 2.22% | 1.27% | 14.98% | 77.98% | 7.04% |
| 47 | 27 | 22 | 12 | 0.09% | 2.06% | 1.04% | 9.89% | 84.35% | 5.76% |
| Total | 1182 | 813 | 464 | *3.61% | *2.00% | *5.50% | *45.97% | *49.42% | *4.61% |
| | | | | ^0.72% | ^2.08% | ^2.25% | ^34.26% | ^60.17% | ^5.57% |
Asterisks (*) and carets (^) indicate numbers averaged across families, with (*) and without (^) families 7, 9, and 11.
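As a quick check of how the two summary rows relate, the GWAS missing rates from the table can be averaged with and without families 7, 9, and 11. The sketch below assumes an unweighted mean across families, which is an interpretation of the footnote rather than something the source states explicitly; the variable names are illustrative.

```python
# GWAS missing rate (%) per family ID, copied from the table above.
gwas_missing = {
    2: 1.18, 3: 0.17, 4: 0.17, 5: 0.13, 6: 0.84, 7: 14.59, 8: 3.31, 9: 13.34,
    10: 2.42, 11: 20.57, 16: 0.10, 17: 0.30, 20: 0.36, 21: 0.11, 27: 0.16, 47: 0.09,
}
excluded = {7, 9, 11}  # families with markedly higher GWAS missing rates

# Unweighted mean across all families (assumed meaning of the asterisk row).
mean_all = sum(gwas_missing.values()) / len(gwas_missing)

# Unweighted mean after dropping the excluded families (caret row).
kept = [rate for fam, rate in gwas_missing.items() if fam not in excluded]
mean_without = sum(kept) / len(kept)

print(f"all families: {mean_all:.3f}%  without 7, 9, 11: {mean_without:.3f}%")
```

Under this assumption the two means agree with the reported 3.61% and 0.72% to within rounding of the per-family values.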
Figure 3. Imputation results from our approach and from the data provided by the Genetic Analysis Workshop (GAW). All numbers are averaged per individual. A) Imputation results of our method and GAW, as the number of imputed variants and individuals. B) Overlaps between the two sets. C) Itemized comparison between the two sets.
Pedigree information of masked individuals and imputation accuracy
| Masked individual ID | First-degree: parents | First-degree: children | First-degree: siblings | Second-degree: half-sibling, grandparent, grandchildren, aunt and uncle, niece and nephew | Accuracy (%) |
|---|---|---|---|---|---|
| | 2(U) | 3(GS) | 0 | 5(G) + 1(GS) | 99.43 |
| | 2(U) | 0 | 1(U) + 1(G) + 1(GS) | 10(G) + 2(GS) | 98.33 |
| | 1(U) + 1(GS) | 0 | 0 | 3(U) + 1(G) + 1(GS) | 96.33 |
| | 1(U) + 1(GS) | 0 | 0 | 1(U) + 3(G) + 2(GS) | 95.72 |
| | 2(GS) | 0 | 4(G) | 3(G) + 1(GS) | 91.37 |
G, genotyped only; GS, genotyped + sequenced; U, untyped.