Claire Bardel, Pascal Croiseau, Emmanuelle Génin.
Abstract
We recently described a new method to identify disease susceptibility loci, based on the analysis of the evolutionary relationships between haplotypes of cases and controls. However, haplotypes are often unknown, and the problem of phase inference is even more crucial when there are missing data. In this work, we suggest using a multiple imputation algorithm to deal with missing phase and missing data, prior to a phylogeny-based analysis. We used the simulated data of Genetic Analysis Workshop 15 (Problem 3, answer known) to assess the power of the phylogeny-based analysis to detect disease susceptibility loci after reconstruction of haplotypes by a multiple imputation method. We compare, for various rates of missing data, the performance of the multiple imputation method with the performance achieved when considering only the most probable haplotypic configurations or the true phase. When only the phase is unknown, all methods perform approximately equally well in identifying disease susceptibility sites. In the presence of missing data, however, the detection of disease susceptibility sites is significantly better when reconstructing haplotypes by multiple imputation than when considering only the best haplotype configurations.
Year: 2007 PMID: 18466519 PMCID: PMC2367603 DOI: 10.1186/1753-6561-1-s1-s22
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Power to identify the true susceptibility loci using different methods to infer phases
| | % Power (95% confidence interval) | | |
| Method | Phased (a) | Most likely haplotypes (b) | Imputation (c) |
| 2 sites | 81 (73.3–88.7) | 84 (76.8–91.2) | 78 (69.9–86.1) |
| 1 site only (d) | 12 (5.6–18.4) | 8 (2.7–13.3) | 13 (6.4–19.6) |
| 1 site + 1 error | 7 (2.0–12.0) | 8 (2.7–13.3) | 9 (3.4–14.6) |
a The phase given in the data is used.
b The phase is inferred using ZAPLO (selection of the most likely haplotypes).
c The phase is inferred using ZAPLO and multiple imputation.
d Only one site has V > 0, and it is a true DS site.
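The confidence intervals in the table are numerically consistent with a normal-approximation (Wald) interval for a proportion estimated over 100 replicates. The record does not state the replicate count or the interval method, so both are assumptions in the sketch below; the function name `wald_interval` is hypothetical and chosen for illustration only.

```python
import math


def wald_interval(power_pct, n_replicates=100, z=1.96):
    """Return a (lower, upper) 95% interval in percent for an estimated power.

    Assumptions (not stated in the source): power is a proportion over
    n_replicates independent replicates, and the interval is the usual
    normal approximation p +/- z * sqrt(p * (1 - p) / n).
    """
    p = power_pct / 100.0
    half_width = z * math.sqrt(p * (1.0 - p) / n_replicates)
    lower = round(100.0 * (p - half_width), 1)
    upper = round(100.0 * (p + half_width), 1)
    return (lower, upper)


# Compare against the "2 sites" row of the table.
for power in (81, 84, 78):
    print(power, wald_interval(power))
```

Under these assumptions, the three values in the "2 sites" row (81, 84, 78) give back the intervals printed in the table. Their overlap matches the statement that all methods perform approximately the same when only the phase is unknown.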
Figure 1. Power to identify one of the two susceptibility sites for different rates of missing data. Missing data and missing phases are reconstructed using a multiple imputation method (in red) or the most likely haplotypes obtained with ZAPLO (in black). The percentage of replicates in which the site with the highest Vi is one of the simulated DS sites is shown according to the properties of the second-best site (if any): i) no second-best site is identified with V > 0 (striped bars); ii) the second-best site is a DS site (open bars); iii) the second-best site is not a DS site (colored bars).
Figure 2. Power to identify the two susceptibility sites for different rates of missing data. Missing data and missing phases are reconstructed using a multiple imputation method (in red) or the most likely haplotypes obtained with ZAPLO (in black). The percentage of replicates in which the two sites with the highest Vi values are DR and locus C is reported for two situations: there are other sites with V > 0 (open bars), or there is no other site with V > 0 (colored bars).
Figure 3. Error in the identification of the susceptibility loci for different rates of missing data. Missing data and missing phases are reconstructed using a multiple imputation method (in red) or the most likely haplotypes obtained with ZAPLO (in black). Colored bars: the best site (with the highest Vi) is neither DR nor locus C. Empty bars: sum of two error rates, error on the best site and error on the second-best site only (i.e., the site with the highest Vi is either locus C or DR, but the site with the second-highest Vi is neither locus C nor DR).