Laurent Excoffier, Guillaume Laval, David Balding.
Abstract
The authors present ELB, an easy-to-programme and computationally fast algorithm for inferring gametic phase in population samples of multilocus genotypes. Phase updates are based on a window of neighbouring loci whose size varies with the local level of linkage disequilibrium. ELB is therefore particularly well suited to problems involving many loci and/or relatively large genomic regions, including regions with variable recombination rates. The authors simulated population samples of single nucleotide polymorphism (SNP) genotypes with varying levels of recombination and marker density, and found that ELB provides better local estimates of gametic phase than the PHASE or HTYPER programs, while its global accuracy is broadly similar. The relative improvement in local accuracy grows with both recombination rate and marker density. Simulation studies of short tandem repeats (STRs, or microsatellites) show ELB outperforming PHASE both globally and locally. ELB handles missing data: simulations show that phase recovery is virtually unaffected by up to 2 per cent missing data but is noticeably impaired beyond that level. The authors also applied ELB to datasets obtained from random pairings of 42 human X chromosomes typed at 97 diallelic markers in a 200 kb low-recombination region. Again, ELB had consistently better local accuracy than PHASE or HTYPER, while its global accuracy was close to the best.
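The abstract states that ELB's window size adapts to the local level of linkage disequilibrium (LD), but it does not say how LD is measured or how the window is sized. Purely as a hypothetical illustration of the general idea, the sketch below computes the standard r² measure between two diallelic loci from a set of haplotypes and grows a window around a focal locus while each new neighbour keeps r² with the focal locus above an arbitrary threshold. The functions `r_squared` and `ld_window` and the threshold of 0.2 are our own assumptions; this is not ELB's actual updating rule.

```python
from typing import List, Sequence


def r_squared(locus_a: Sequence[int], locus_b: Sequence[int]) -> float:
    """Squared allelic correlation r^2 between two diallelic loci.

    locus_a[i] and locus_b[i] are the alleles (coded 0/1) carried by gamete i.
    Returns 0.0 if either locus is monomorphic (r^2 is undefined there).
    """
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def ld_window(haps: List[List[int]], focal: int, threshold: float = 0.2) -> range:
    """HYPOTHETICAL window rule (not ELB's): starting at `focal`, extend left and
    right while the next locus has r^2 >= threshold with the focal locus.

    haps[i][j] is the allele of gamete i at locus j.
    """
    def column(j: int) -> List[int]:
        return [h[j] for h in haps]

    n_loci = len(haps[0])
    focal_col = column(focal)
    lo = hi = focal
    while lo > 0 and r_squared(focal_col, column(lo - 1)) >= threshold:
        lo -= 1
    while hi < n_loci - 1 and r_squared(focal_col, column(hi + 1)) >= threshold:
        hi += 1
    return range(lo, hi + 1)


# Toy data: loci 0-2 are in complete LD; locus 3 is independent of them.
haps = [[0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
print(list(ld_window(haps, 0)))  # [0, 1, 2]
print(list(ld_window(haps, 3)))  # [3]
```

In a phasing sampler, `haps` could be the current haplotype estimates: a locus in a low-LD stretch then gets a narrow window, while one inside a block of strong LD gets a wide one.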
Year: 2003; PMID: 15601529; PMCID: PMC3525008; DOI: 10.1186/1479-7364-1-1-7
Source DB: PubMed; Journal: Hum Genomics; ISSN: 1473-9542; Impact factor: 4.639
Figure 1. Mean global and local accuracy of the ELB, PHASE (Ver. 1) and HTYPER algorithms when inferring gametic phase in 100 simulated single nucleotide polymorphism datasets. The lines at the top of each histogram bar show the standard error of the mean. PHASE was run with: burn-in 5,000 steps; thinning interval 100; number of samples 5,000. ELB was run with: burn-in 400,000 steps; thinning interval 1,000; number of samples 2,000; α = 0.01; ε = 0.01; γ = 0.01. HTYPER results are those reported after 20 independent runs, as recommended by its authors.
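The run settings in this caption determine how long each sampler runs. Assuming the usual convention that sampling starts immediately after burn-in and one state is retained every `thinning` steps, the total chain length is burn-in + thinning × number of samples. The small `McmcSchedule` class below is a hypothetical helper, not part of ELB or PHASE, that does this arithmetic for the Figure 1 settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McmcSchedule:
    """Hypothetical helper: chain-length arithmetic for a burn-in/thinning schedule."""
    burn_in: int    # steps discarded before any sample is kept
    thinning: int   # steps between successive retained samples
    n_samples: int  # number of retained samples

    def total_steps(self) -> int:
        # ASSUMPTION: sampling starts right after burn-in and one state is
        # retained every `thinning` steps thereafter.
        return self.burn_in + self.thinning * self.n_samples


elb_fig1 = McmcSchedule(burn_in=400_000, thinning=1_000, n_samples=2_000)
phase_fig1 = McmcSchedule(burn_in=5_000, thinning=100, n_samples=5_000)
print(elb_fig1.total_steps())    # 2400000
print(phase_fig1.total_steps())  # 505000
```

Under this assumption, ELB ran for 2,400,000 steps and PHASE for 505,000 steps in Figure 1. A "step" need not involve the same amount of work in the two programs, so these totals are not a like-for-like comparison of computing time.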
Figure 4. Global and local accuracies of ELB, PHASE (Ver. 1) and HTYPER in each of 100 datasets obtained by random pairings of 42 human male X chromosomes typed at 97 diallelic polymorphisms (predominantly single nucleotide polymorphisms). Datasets are sorted by increasing ELB accuracy, separately for global and local accuracy. The mean value for each algorithm is given in parentheses. ELB parameters: burn-in 300,000 steps; thinning interval 200; number of samples 10,000; α = 0.01; ε = 0.01; γ = 0.01. The one missing value for PHASE corresponds to an unexplained program crash. HTYPER results are those reported after 20 independent runs, as recommended by its authors.
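At X-specific loci a male carries a single X chromosome, so each male X is a directly observed haplotype, and pairing them at random yields diploid genotypes whose true phase is known. The sketch below shows one way to build such datasets from 42 haplotypes, giving 21 pseudo-individuals per dataset. The function names and the shuffle-then-pair scheme are our assumptions about how "random pairings" could be implemented, not a description of the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def random_pairing(n_haplotypes: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Randomly partition haplotype indices 0..n-1 into disjoint pairs."""
    if n_haplotypes % 2:
        raise ValueError("need an even number of haplotypes")
    order = list(range(n_haplotypes))
    rng.shuffle(order)
    # Consecutive entries of the shuffled order form one pseudo-individual.
    return [(order[i], order[i + 1]) for i in range(0, n_haplotypes, 2)]


def to_genotype(h1: Sequence[int], h2: Sequence[int]) -> List[int]:
    """Unphased genotype: copies of allele 1 at each diallelic marker (0, 1 or 2)."""
    return [a + b for a, b in zip(h1, h2)]


def make_datasets(haplotypes: Sequence[Sequence[int]], n_datasets: int, seed: int = 0):
    """Build `n_datasets` independent random pairings of the same haplotypes.

    Each dataset is a list of (genotype, true_pair) items; true_pair keeps the
    indices of the two source haplotypes so inferred phase can be scored later.
    """
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_datasets):
        pairs = random_pairing(len(haplotypes), rng)
        datasets.append([(to_genotype(haplotypes[i], haplotypes[j]), (i, j))
                         for i, j in pairs])
    return datasets
```

With 42 haplotypes of 97 markers, `make_datasets(haps, 100)` returns 100 datasets of 21 genotypes each, matching the design described in the caption.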
Properties of simulated samples
| Case | Data type | θ² | R³ | L⁴ | π⁵ |
|---|---|---|---|---|---|
| 1 | SNP | 5 | 40 | 25 [14-39] | 4.8 |
| 2 | SNP | 5 | 100 | 25 [13-44] | 4.8 |
| 3 | SNP | 5 | 200 | 25 [10-38] | 4.9 |
| 4 | SNP | 10 | 40 | 49 [33-69] | 9.9 |
| 5 | SNP | 10 | 100 | 48 [31-70] | 9.6 |
| 6 | SNP | 10 | 200 | 48 [30-61] | 9.6 |
| 7 | SNP | 20 | 40 | 90 [65-127] | 18.3 |
| 8 | SNP | 20 | 100 | 90 [65-109] | 18.7 |
| 9 | SNP | 20 | 200 | 89 [65-119] | 18.5 |
| 10 | STR | | 40 | 10 | 7.8 |
| 11 | STR | | 100 | 10 | 7.9 |
| 12 | STR | | 200 | 10 | 7.8 |
| 13 | STR | | 40 | 20 | 15.7 |
| 14 | STR | | 100 | 20 | 15.6 |
| 15 | STR | | 200 | 20 | 15.6 |
| 16 | STR | | 40 | 50 | 39.1 |
| 17 | STR | | 100 | 50 | 39.1 |
| 18 | STR | | 200 | 50 | 39.1 |
1. All simulations were performed in stationary random-mating populations. Each sample consisted of 50 diploid individuals.
2. θ = 4Nu, where N is the population size and u is the mutation rate per generation for the whole chromosomal segment.
3. R = 4Nr, where r is the recombination rate for the whole chromosomal segment.
4. L is the number of polymorphic sites in the sample. For SNPs, we report the average number among 100 replicates, as well as the minimum and maximum numbers in brackets.
5. π is the average number of discordant sites between two gametes.
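Footnotes 2 to 5 define the simulation parameters and a summary statistic of the samples. The sketch below writes them out as functions: θ = 4Nu and R = 4Nr as simple products, and π as the mean number of discordant sites over all pairs of gametes in a sample. The function names are ours, and the numbers in the usage note are illustrative only, not values used in the simulations.

```python
from itertools import combinations
from typing import Sequence


def theta_4nu(n: float, u: float) -> float:
    """theta = 4Nu; u is the per-generation mutation rate for the whole segment."""
    return 4 * n * u


def r_4nr(n: float, r: float) -> float:
    """R = 4Nr; r is the recombination rate for the whole segment."""
    return 4 * n * r


def mean_pairwise_differences(gametes: Sequence[Sequence[object]]) -> float:
    """pi: average number of discordant sites between two gametes.

    Alleles may be SNP states or STR repeat counts; a site counts as
    discordant whenever the two gametes carry different alleles there.
    """
    pairs = list(combinations(gametes, 2))
    if not pairs:
        raise ValueError("need at least two gametes")
    total = sum(sum(a != b for a, b in zip(g1, g2)) for g1, g2 in pairs)
    return total / len(pairs)
```

For example, `mean_pairwise_differences([[0, 0, 1], [0, 1, 1], [1, 1, 0]])` averages 1, 3 and 2 differences over the three pairs of gametes, giving 2.0. A sample of 50 diploid individuals contains 100 gametes, so π would be averaged over 4,950 pairs.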
Figure 2. Mean (and standard error of the mean) global and local accuracy of the ELB and PHASE (Ver. 1) algorithms when inferring gametic phase in 100 simulated short tandem repeat (microsatellite) datasets. PHASE was run with: burn-in 5,000 steps; thinning interval 100; number of samples 5,000. ELB was run with: burn-in 400,000 steps; thinning interval 1,000; number of samples 2,000; α = 0.01; ε = 0.1; γ = 0. HTYPER results are those reported after 20 independent runs, as recommended by its authors.
Figure 3. Mean (and standard error of the mean) global and local accuracy of ELB for single nucleotide polymorphism genotypes with varying amounts of missing data. (A) Data are missing at a uniform rate across all individuals. (B) Data are missing at a uniform rate among only ten of the 50 individuals.
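The two missing-data schemes in Figure 3 can be reproduced in outline by masking genotype calls. The sketch below is an assumption-laden reconstruction: it treats "uniform rate" as each call being dropped independently with probability `rate`, and for scheme (B) it applies that per-call rate only within ten randomly chosen individuals. The caption does not say whether the rate in (B) refers to the whole sample or only to the affected individuals, nor whether calls are dropped independently or as an exact fraction, so both choices here are ours, as are the names `mask_uniform` and `mask_subset`.

```python
import random
from typing import List, Optional

Genotype = List[Optional[int]]  # one call per locus; None marks a missing call


def mask_uniform(genos: List[List[int]], rate: float, rng: random.Random) -> List[Genotype]:
    """Scheme (A), as assumed here: every call in every individual is
    independently set to missing with probability `rate`."""
    return [[None if rng.random() < rate else g for g in ind] for ind in genos]


def mask_subset(genos: List[List[int]], rate: float, n_affected: int,
                rng: random.Random) -> List[Genotype]:
    """Scheme (B), as assumed here: `n_affected` individuals are drawn at
    random, and only their calls are set to missing, each with probability `rate`."""
    affected = set(rng.sample(range(len(genos)), n_affected))
    return [[None if i in affected and rng.random() < rate else g for g in ind]
            for i, ind in enumerate(genos)]


rng = random.Random(2003)
sample = [[1] * 40 for _ in range(50)]  # toy data: 50 individuals, 40 loci
scheme_a = mask_uniform(sample, 0.02, rng)
scheme_b = mask_subset(sample, 0.02, 10, rng)
```

A rate of 0.02 in `mask_uniform` corresponds to the 2 per cent level below which, according to the abstract, phase recovery is virtually unaffected. If the rate in scheme (B) is meant to refer to the whole sample, the per-call rate within the ten affected individuals would have to be five times larger (50/10).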