Abstract
BACKGROUND: The amount of genome-wide molecular data is increasing rapidly, as is interest in developing methods appropriate for such data. There is a consequent need for methods that can efficiently simulate such data. In this paper we implement the sequentially Markovian coalescent (SMC) algorithm described by McVean and Cardin and present a further modification to that algorithm (SMC') which slightly improves the closeness of the approximation to the full coalescent model. The algorithm ignores a class of recombination events known to affect the behavior of the genealogy of the sample, but which do not appear to affect the behavior of generated samples to any substantial degree.
Year: 2006 PMID: 16539698 PMCID: PMC1458357 DOI: 10.1186/1471-2156-7-16
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Figure 1. The various categories of recombination. Illustration of the different types of recombination event. Ancestral material is shown as solid red lines, while non-ancestral material is shown as dotted red lines. The location of each recombination is shown below and to the left of the event, and its type is indicated by a blue numeral above the event.
Figure 2. Illustration of . This figure shows how the algorithm forms the next tree along the chromosome, moving from left to right, given the state of the current tree.
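To make the "next tree" step concrete, here is a minimal sketch for a sample of n = 2, written under our own assumptions rather than taken from the paper: time is measured in units where a pair of lineages coalesces at rate 1, recombination falls on the tree's branches at rate ρ/2 per unit branch length per unit sequence length (ρ = 4Nr for the whole region), and the break point's height is uniform on the tree. Under SMC the detached lineage may only rejoin a different branch; under SMC' it may also rejoin the branch it was detached from, in which case the tree is unchanged. All function names are hypothetical.

```python
import random


def next_height_smc(T, u, rng):
    """SMC step for n = 2 (T is unused but kept for a uniform signature).

    The detached lineage may only rejoin the other branch, which for n = 2
    persists indefinitely, so the new height is u plus an Exp(1) wait.
    """
    return u + rng.expovariate(1.0)


def next_height_smc_prime(T, u, rng):
    """SMC' step for n = 2: the lineage may also rejoin its own old branch.

    Between u and the old root height T two lineages are available
    (its own old branch and the other branch), so the coalescence rate is 2.
    """
    t = u + rng.expovariate(2.0)
    if t < T:
        # Both available branches are equally likely; rejoining its own
        # branch leaves the tree, and hence its height, unchanged.
        return T if rng.random() < 0.5 else t
    # Above T only the single root lineage remains (rate 1); by
    # memorylessness the extra waiting time is Exp(1) measured from T.
    return T + rng.expovariate(1.0)


def simulate_heights(rho, step, rng):
    """Return the TMRCA of each distinct tree along a region of length 1.

    rho is the (assumed) population-scaled recombination rate for the region.
    """
    T = rng.expovariate(1.0)  # first tree: TMRCA ~ Exp(1) for n = 2
    heights = [T]
    x = 0.0
    while True:
        # Total branch length is 2T and recombination hits branches at rate
        # rho/2 per unit length, so the gap to the next event is Exp(rho * T).
        x += rng.expovariate(rho * T)
        if x >= 1.0:
            return heights
        u = rng.uniform(0.0, T)  # height of the break point on the tree
        new_T = step(T, u, rng)
        if new_T != T:  # SMC' events that rejoin the same branch change nothing
            T = new_T
            heights.append(T)


if __name__ == "__main__":
    rng = random.Random(1)
    for name, step in [("SMC", next_height_smc), ("SMC'", next_height_smc_prime)]:
        runs = [simulate_heights(5.0, step, rng) for _ in range(2000)]
        first = sum(r[0] for r in runs) / len(runs)
        print(name, "mean first-tree height:", round(first, 3))
```

Because both schemes leave the tree at any fixed position distributed as under the coalescent, the first and last trees in this sketch should average about 1 in these units; trees in between are not fixed-position samples and need not.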
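Mean height of the ith tree for SMC. We show the mean TMRCA (time to the most recent common ancestor) of the ith tree along the chromosome, when such a tree exists, as a function of the recombination rate. Data were simulated for a sample size of n = 2. Results are given for ms, SMC and SMC'; the two groups of three columns correspond to two different recombination rates.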
| i | ms | SMC | SMC' | ms | SMC | SMC' |
|---|---|---|---|---|---|---|
| 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | 1.68 | 1.75 | 1.68 | 1.41 | 1.51 | 1.41 |
| 3 | 2.06 | 2.11 | 2.08 | 1.63 | 1.76 | 1.64 |
| 4 | 2.39 | 2.33 | 2.40 | 1.75 | 1.88 | 1.77 |
| 5 | 2.68 | 2.47 | 2.66 | 1.81 | 1.94 | 1.85 |
| 6 | 2.95 | 2.57 | 2.90 | 1.87 | 1.97 | 1.90 |
| 7 | 3.19 | 2.65 | 3.12 | 1.89 | 1.99 | 1.93 |
| 8 | 3.44 | 2.72 | 3.33 | 1.91 | 1.99 | 1.95 |
| 9 | 3.67 | 2.79 | 3.54 | 1.93 | 1.99 | 1.97 |
| 10 | 3.92 | 2.84 | 3.69 | 1.94 | 2.00 | 1.98 |
| last tree | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Mean height of the ith tree for SMC. We show the mean TMRCA of the ith tree along the chromosome, when such a tree exists, for a sample size of n = 20. Results are given for ms, SMC and SMC'.
| i | ms | SMC | SMC' |
|---|---|---|---|
| 1 | 1.90 | 1.90 | 1.90 |
| 2 | 1.96 | 1.99 | 1.96 |
| 3 | 1.99 | 2.05 | 2.01 |
| 4 | 2.05 | 2.10 | 2.05 |
| 5 | 2.06 | 2.14 | 2.08 |
| 6 | 2.09 | 2.16 | 2.11 |
| 7 | 2.11 | 2.19 | 2.13 |
| 8 | 2.12 | 2.20 | 2.15 |
| 9 | 2.13 | 2.22 | 2.16 |
| 10 | 2.15 | 2.22 | 2.18 |
| last tree | 1.90 | 1.90 | 1.90 |
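In both tables the first and last trees have mean heights of 1.00 (n = 2) and 1.90 (n = 20) for every method. These values agree with the standard-coalescent expectation E[T_MRCA] = 2(1 − 1/n), assuming time is measured in units of 2N generations so that each pair of lineages coalesces at rate 1. The hypothetical helper below checks this with exact fractions; it is an illustration, not code from the paper.

```python
from fractions import Fraction


def expected_tmrca(n):
    """Expected TMRCA of n lineages under the standard coalescent.

    With k lineages the waiting time to the next coalescence is
    Exp(k(k-1)/2), with mean 2/(k(k-1)). Summing over k = n, ..., 2
    telescopes to 2(1 - 1/n).
    """
    return sum(Fraction(2, k * (k - 1)) for k in range(2, n + 1))


for n in (2, 20):
    print(n, float(expected_tmrca(n)))  # prints "2 1.0" then "20 1.9"
```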
Run times. Average time per simulation as a function of sample size n and region length, based on 20 trials, assuming θ = 10⁻⁴/bp and ρ = 5 × 10⁻⁴/bp. Simulations were run on a 2.8 GHz Intel Xeon processor. Dashes mark simulations that could not be completed because they required too much memory (more than 3 GB of RAM).
| n | Length (Mb) | SMC | ms |
|---|---|---|---|
| 1000 | 2 | 0.9 | 7.2 |
| 1000 | 5 | 2.1 | 62.6 |
| 1000 | 10 | 4.3 | 473.6 |
| 1000 | 20 | 8.3 | 6459.6 |
| 1000 | 50 | 20.9 | - |
| 1000 | 100 | 41.6 | - |
| 1000 | 200 | 83.9 | - |
| 4000 | 2 | 4.0 | 10.6 |
| 4000 | 5 | 10.4 | - |
| 4000 | 10 | 22.2 | - |
| 4000 | 20 | 40.7 | - |
| 4000 | 50 | 105.8 | - |
| 4000 | 100 | 201.5 | - |
| 4000 | 200 | 406.1 | - |
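One rough way to read the run-time table is to fit the slope of log(time) against log(length): a slope near 1 means time grows roughly in proportion to region length, and larger slopes mean faster-than-linear growth. The sketch below does this for the numbers transcribed from the table; the slope is unit-free, so the unstated time unit does not matter. This is our own descriptive reading of a handful of points, not an analysis reported in the paper, and `loglog_slope` is a hypothetical helper.

```python
import math

# Run times transcribed from the table above; "-" entries (ms ran out of
# memory) are omitted, so ms has only the first four lengths for n = 1000.
lengths_mb = [2, 5, 10, 20, 50, 100, 200]
smc_n1000 = [0.9, 2.1, 4.3, 8.3, 20.9, 41.6, 83.9]
smc_n4000 = [4.0, 10.4, 22.2, 40.7, 105.8, 201.5, 406.1]
ms_n1000 = [7.2, 62.6, 473.6, 6459.6]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) regressed on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var


print("SMC, n = 1000:", round(loglog_slope(lengths_mb, smc_n1000), 2))
print("SMC, n = 4000:", round(loglog_slope(lengths_mb, smc_n4000), 2))
print("ms,  n = 1000:", round(loglog_slope(lengths_mb[:4], ms_n1000), 2))
```

On these numbers the SMC slopes come out close to 1, while the ms slope is well above 2, matching the visibly superlinear growth of the ms column.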
Figure 3. Decay of r². This figure shows how r² decays as a function of distance for the SMC and SMC' algorithms and for an exact coalescent model (simulated using ms). Data were simulated for a 2 Mb region and a sample size of n = 20.
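Figure 3 summarizes LD with r². As a reference point, here is a minimal sketch of the standard r² computation for one pair of biallelic markers, assuming phased haplotypes with alleles coded 0/1; the paper's own pipeline (for example, how pairs are binned by distance) is not described here, and `r_squared` is a hypothetical helper.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic loci from phased haplotypes coded 0/1.

    r^2 = D^2 / (pA (1 - pA) pB (1 - pB)) with D = pAB - pA pB, where pA, pB
    are the frequencies of allele 1 at each locus and pAB is the frequency
    of the 1-1 haplotype. Returns None if either locus is monomorphic.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two equal-length, non-empty allele lists")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b
    return d * d / denom


print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0: perfectly associated
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0: no association
```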
Mean of LD summaries. We show the mean values of several summaries of LD for ms and the SMC and SMC' algorithms. We simulated a 2 Mb region for a sample of size n = 500. Values are averaged over 1000 replicates. Markers with minor allele frequency (MAF) less than 0.05 were excluded.
| Statistic | ms | SMC | SMC' |
|---|---|---|---|
|  | 404 | 404 | 404 |
|  | 236 | 232 | 236 |
| % of sequence in haplotype blocks | 41.1 | 41.2 | 41.2 |
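The caption excludes markers with MAF below 0.05. A minimal sketch of that filter, assuming each marker is a list of 0/1 alleles across the sampled haplotypes (the helper names are hypothetical), keeps a marker exactly when its MAF is at least the threshold:

```python
def minor_allele_frequency(alleles):
    """MAF of a biallelic marker whose alleles are coded 0/1 across haplotypes."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def filter_by_maf(markers, threshold=0.05):
    """Drop markers with MAF below `threshold`; keep MAF >= threshold."""
    return [m for m in markers if minor_allele_frequency(m) >= threshold]


markers = [
    [1] + [0] * 19,       # MAF = 1/20 = 0.05, kept
    [0] * 20,             # monomorphic, MAF = 0, dropped
    [1] * 10 + [0] * 10,  # MAF = 0.5, kept
]
print(len(filter_by_maf(markers)))  # 2
```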