Kimmo Palin, Harry Campbell, Alan F Wright, James F Wilson, Richard Durbin.
Abstract
Accurate knowledge of haplotypes, the combination of alleles co-residing on a single copy of a chromosome, enables powerful gene mapping and sequence imputation methods. Since humans are diploid, haplotypes must be derived from genotypes by a phasing process. In this study, we present a new computational model for haplotype phasing based on pairwise sharing of haplotypes inferred to be Identical-By-Descent (IBD). We apply the Bayesian-network-based model in a new phasing algorithm, called systematic long-range phasing (SLRP), that can capitalize on the close genetic relationships in isolated founder populations, and show with simulated and real genome-wide genotype data that SLRP substantially reduces the rate of phasing errors compared to previous phasing algorithms. Furthermore, the method accurately identifies regions of IBD, enabling linkage-like studies without pedigrees, and can be used to impute most genotypes with a very low error rate.
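The intuition behind phasing from IBD sharing can be illustrated with a deliberately simplified sketch. It is not the SLRP Bayesian network model: it assumes error-free genotypes coded as alternate-allele counts (0, 1, 2), and a single relative already known to share exactly one haplotype IBD across the whole region. The function name `phase_with_ibd_partner` is hypothetical. Wherever the relative is homozygous, the shared haplotype must carry that allele, which fixes the phase of a heterozygous site in the focal individual.

```python
def phase_with_ibd_partner(genotype, partner_genotype):
    """Phase one individual using a relative assumed to share one haplotype IBD.

    Both arguments are lists of alternate-allele counts (0, 1 or 2) per marker.
    Returns one entry per marker: a tuple (allele on shared haplotype,
    allele on the other haplotype), or None where the phase stays unresolved.
    Assumption: genotypes are error-free and the IBD segment covers all markers.
    """
    phased = []
    for g, p in zip(genotype, partner_genotype):
        if g in (0, 2):
            # Homozygous site: both haplotypes carry the same allele.
            allele = g // 2
            phased.append((allele, allele))
        elif p in (0, 2):
            # Partner is homozygous, so the shared haplotype carries that allele.
            shared = p // 2
            phased.append((shared, 1 - shared))
        else:
            # Both individuals heterozygous: this pair alone is uninformative.
            phased.append(None)
    return phased


example = phase_with_ibd_partner([1, 1, 0, 1, 2], [0, 2, 1, 1, 1])
print(example)
```

Sites left as `None` here are the ones where a single pairwise comparison cannot decide the phase; combining evidence from many inferred IBD partners is what a full model would need to do.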
Year: 2011 PMID: 22006673 PMCID: PMC3368215 DOI: 10.1002/gepi.20635
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
Fig. 1. Bayesian network for the SLRP model of haplotype phasing and IBD inference. The observed genotype of an individual a at marker j is held in one variable, which depends on that individual's diplotype at the marker. A further variable indicates the type of IBD between a pair of individuals a and b at marker j. IBD, identity-by-descent; SLRP, systematic long-range phasing.
Mean number of switch errors per Morgan on simulated chromosome 20 data at sites where SLRP calls a phase
| | Perfect | 1% Missing | 5% Missing | 0.2% Errors | 2% Errors |
|---|---|---|---|---|---|
| Beagle 3.0.4 | 63.5 | 68.3 | 98.5 | 73.1 | 182.0 |
| Mach1 | 11.9 | 11.8 | 12.4 | 13.3 | 29.0 |
| SLRP | 2.7 | 2.8 | 3.4 | 5.1 | 28.8 |
| SLRP yield (%) | 92.8 | 92.7 | 91.9 | 87.8 | 45.3 |
| SLRP within phased segments | 1.7 | 1.7 | 2.1 | 3.6 | 21.0 |
SLRP, systematic long-range phasing.
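For readers unfamiliar with the metric, a switch error is a change, between two consecutive compared heterozygous sites, in which inferred haplotype matches the true one. The sketch below counts them under stated assumptions: each phase is encoded as the allele on the first haplotype at a heterozygous site, `None` marks sites that are homozygous or left unphased, and the orientation is carried across skipped sites. How the study itself handled gaps between phased segments is not stated in this record, and the function names are hypothetical.

```python
def count_switch_errors(true_first, inferred_first):
    """Count switch errors between true and inferred phase.

    Each list gives, per marker, the allele (0 or 1) on the first haplotype
    at a heterozygous site, or None if the site is skipped.
    Assumption: orientation is carried across skipped sites.
    """
    switches = 0
    previous = None  # orientation at the previous compared site
    for t, i in zip(true_first, inferred_first):
        if t is None or i is None:
            continue
        orientation = (t == i)  # True if the inferred first haplotype matches truth
        if previous is not None and orientation != previous:
            switches += 1
        previous = orientation
    return switches


def switch_errors_per_morgan(true_first, inferred_first, length_morgans):
    """Normalize the switch count by the genetic length of the region."""
    return count_switch_errors(true_first, inferred_first) / length_morgans


truth = [0, 1, 0, 1, 1, 0]
calls = [0, 1, 1, 0, None, 0]
print(count_switch_errors(truth, calls))
print(switch_errors_per_morgan(truth, calls, 0.5))
```

Because uncalled sites are skipped, a method with lower yield is only scored where it commits to a phase, which is why the yield row is reported alongside the error rates.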
Median false discovery rate and sensitivity for detecting IBD on simulated chromosome 20 data
| | | Perfect (%) | 1% Missing (%) | 5% Missing (%) | 0.2% Errors (%) | 2% Errors (%) |
|---|---|---|---|---|---|---|
| Beagle | FDR | 1.0 | 1.1 | 1.3 | 1.1 | 1.5 |
| Beagle fastIBD | FDR | 8.4 | 8.4 | 8.7 | 8.3 | 6.7 |
| Germline | FDR | 10.0 | 9.9 | 9.5 | 9.5 | 6.0 |
| SLRP | FDR | 1.5 | 1.5 | 1.5 | 1.3 | 1.7 |
| Beagle | Sensitivity | 12.1 | 13.2 | 22.0 | 19.3 | 13.4 |
| Beagle fastIBD | Sensitivity | 87.1 | 87.0 | 87.0 | 85.8 | 71.6 |
| Germline | Sensitivity | 69.8 | 68.7 | 62.6 | 65.4 | 10.1 |
| SLRP | Sensitivity | 68.5 | 68.3 | 67.6 | 55.0 | 17.6 |
SLRP, systematic long-range phasing; FDR, false-discovery rate; IBD, identity-by-descent.
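The two quantities in this table follow the usual definitions: FDR is the fraction of called IBD that is not truly IBD, and sensitivity is the fraction of true IBD that is called. A minimal sketch for one pair of individuals is given below. Scoring per marker with boolean flags is an assumption made for illustration, the function name is hypothetical, and the record does not say over which units the reported medians were taken.

```python
def ibd_fdr_and_sensitivity(true_ibd, called_ibd):
    """Return (FDR, sensitivity) for per-marker IBD calls on one pair.

    Both arguments are equal-length sequences of truthy/falsy flags.
    Assumption: each marker counts equally; empty denominators give 0.0.
    """
    tp = sum(1 for t, c in zip(true_ibd, called_ibd) if t and c)
    fp = sum(1 for t, c in zip(true_ibd, called_ibd) if not t and c)
    fn = sum(1 for t, c in zip(true_ibd, called_ibd) if t and not c)
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return fdr, sensitivity


truth = [1, 1, 1, 1, 0, 0, 0, 0]
calls = [1, 1, 1, 0, 1, 0, 0, 0]
print(ibd_fdr_and_sensitivity(truth, calls))
```

The two measures trade off against each other, so a method is best judged on both rows of the table together.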
Output genotype error rate on simulated chromosome 20 data at sites where SLRP imputes
| | 1% Missing (%) | 5% Missing (%) |
|---|---|---|
| Beagle 3.0.4 | 4.04 | 5.34 |
| Mach1 | 1.65 | 1.85 |
| SLRP | 0.09 | 0.11 |
| Yield for alleles | 76 | 74 |
SLRP, systematic long-range phasing.
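The error rate here is conditional on a call being made, so it has to be read together with the yield. The following sketch shows one way to compute both, assuming masked genotypes with known true values and `None` for sites the method declines to impute. It scores whole genotypes, whereas the table reports yield per allele, and the function name is hypothetical.

```python
def imputation_error_and_yield(true_genotypes, imputed_genotypes):
    """Return (error rate among imputed sites, fraction of sites imputed).

    true_genotypes: true alternate-allele counts at the masked sites.
    imputed_genotypes: imputed counts, or None where no call was made.
    Assumption: scoring is per genotype rather than per allele.
    """
    called = [(t, g) for t, g in zip(true_genotypes, imputed_genotypes)
              if g is not None]
    total = len(true_genotypes)
    yield_fraction = len(called) / total if total else 0.0
    errors = sum(1 for t, g in called if t != g)
    error_rate = errors / len(called) if called else 0.0
    return error_rate, yield_fraction


print(imputation_error_and_yield([0, 1, 2, 1, 0], [0, 1, None, 2, 0]))
```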
Switch errors per Morgan on the ORCADES data over all chromosomes
| | Full data | Distantly related |
|---|---|---|
| Beagle 3.0.4 | 23.5 | 62.5 |
| Mach1 | 17.2 | 23.3 |
| SLRP | 3.6 | 3.8 |
| SLRP yield (%) | 92 | 74 |
SLRP, systematic long-range phasing; ORCADES, Orkney Complex Disease Study.