Allen G Rodrigo, Peter Tsai, Helen Shearman.
Abstract
Coalescent-based Bayesian Markov chain Monte Carlo (MCMC) inference generates estimates of evolutionary parameters and their posterior probability distributions. As the number of sequences increases, so does the time needed to complete an MCMC analysis. Here, we investigate an approach that distributes the MCMC analysis across a cluster of computers. To do this, we use bootstrapped topologies as fixed genealogies, perform a single MCMC analysis on each genealogy without topological rearrangements, and pool the results across all MCMC analyses. We show, through simulations, that although the standard MCMC performs better than the bootstrap-MCMC at estimating the effective population size (scaled by mutation rate), the bootstrap-MCMC returns better estimates of growth rates. Additionally, we find that our bootstrap-MCMC analyses are, on average, 37 times faster for equivalent effective sample sizes.
Keywords: Bayesian inference; MCMC; bootstrap; coalescent; effective population size
Year: 2009 PMID: 19812730 PMCID: PMC2747130 DOI: 10.4137/ebo.s2765
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
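The pooling step described in the abstract can be illustrated with a minimal sketch. It assumes that each fixed-genealogy MCMC run yields a list of post-burn-in samples of θ and that all runs are weighted equally when pooled; the abstract does not specify the weighting, so this is an assumption. The names `pool_chains` and `hpd_interval` are hypothetical, and the Gaussian draws below are only a stand-in for real MCMC output.

```python
import math
import random
import statistics


def pool_chains(chains):
    """Concatenate samples from independent chains (equal weighting is an assumption)."""
    return [x for chain in chains for x in chain]


def hpd_interval(samples, mass=0.95):
    """Empirical HPD: the narrowest interval holding `mass` of the sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Stand-in for per-genealogy MCMC output: three chains of 500 theta samples each.
rng = random.Random(0)
chains = [[rng.gauss(mu, 0.02) for _ in range(500)] for mu in (0.14, 0.15, 0.16)]

pooled = pool_chains(chains)
lo, hi = hpd_interval(pooled)
print("mean:", statistics.mean(pooled))
print("median:", statistics.median(pooled))
print("95% HPD:", (lo, hi))
```

Because each chain conditions on a different genealogy, the pooled sample is wider than any single chain, which is one plausible reason a pooled HPD interval can be broader than that of a single run.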
Parameter estimated from sequences under constant growth rate using both bootstrap-MCMC and standard-MCMC. The true value of Θ is Nμ = 0.15.
| Simulation | Est. θ, bootstrap: mean (median) | Est. θ, full: mean (median) | θ 95% HPD: bootstrap (standard) | Posterior ESS: bootstrap (standard) |
|---|---|---|---|---|
| 70 Sequences 0 | 0.171 (0.169) | 0.167 (0.165) | 0.130, 0.217 (0.126, 0.210) | 17650 (22910) |
| 70 Sequences 1 | 0.157 (0.155) | 0.143 (0.142) | 0.118, 0.199 (0.108, 0.181) | 29820 (32180) |
| 70 Sequences 2 | 0.166 (0.164) | 0.150 (0.148) | 0.126, 0.210 (0.114, 0.189) | 26390 (23610) |
| 70 Sequences 3 | 0.175 (0.173) | 0.141 (0.140) | 0.125, 0.228 (0.108, 0.179) | 13760 (11630) |
| 70 Sequences 4 | 0.196 (0.194) | 0.175 (0.173) | 0.148, 0.248 (0.131, 0.218) | 3002 (3648) |
| 70 Sequences 5 | 0.169 (0.166) | 0.146 (0.145) | 0.123, 0.220 (0.111, 0.185) | 25470 (25110) |
| 70 Sequences 6 | 0.155 (0.153) | 0.154 (0.152) | 0.117, 0.194 (0.117, 0.194) | 42420 (36840) |
| 70 Sequences 7 | 0.125 (0.124) | 0.124 (0.122) | 0.095, 0.159 (0.093, 0.156) | 31040 (33920) |
| 70 Sequences 8 | 0.130 (0.128) | 0.128 (0.126) | 0.098, 0.164 (0.097, 0.162) | 40670 (38470) |
| 70 Sequences 9 | 0.158 (0.156) | 0.149 (0.147) | 0.117, 0.199 (0.111, 0.186) | 35480 (34940) |
| 140 Sequences 0 | 0.153 (0.152) | 0.147 (0.146) | 0.125, 0.182 (0.120, 0.175) | 25550 (27850) |
| 140 Sequences 1 | 0.141 (0.140) | 0.119 (0.118) | 0.112, 0.172 (0.097, 0.142) | 12230 (13000) |
| 140 Sequences 2 | 0.151 (0.150) | 0.145 (0.145) | 0.124, 0.181 (0.119, 0.172) | 26640 (22830) |
| 140 Sequences 3 | 0.191 (0.189) | 0.169 (0.168) | 0.154, 0.228 (0.139, 0.201) | 17660 (18690) |
| 140 Sequences 4 | 0.158 (0.157) | 0.153 (0.152) | 0.129, 0.189 (0.125, 0.182) | 26390 (27460) |
| 140 Sequences 5 | 0.133 (0.132) | 0.128 (0.127) | 0.108, 0.160 (0.105, 0.153) | 22860 (19510) |
| 140 Sequences 6 | 0.171 (0.170) | 0.135 (0.135) | 0.134, 0.209 (0.112, 0.162) | 8927 (9467) |
| 140 Sequences 7 | 0.180 (0.178) | 0.159 (0.158) | 0.146, 0.217 (0.129, 0.189) | 16210 (16420) |
| 140 Sequences 8 | 0.187 (0.185) | 0.174 (0.173) | 0.151, 0.225 (0.144, 0.207) | 10200 (10780) |
| 140 Sequences 9 | 0.172 (0.171) | 0.152 (0.151) | 0.140, 0.208 (0.123, 0.180) | 10240 (11550) |
| 210 Sequences 0 | 0.176 (0.175) | 0.150 (0.150) | 0.147, 0.206 (0.128, 0.175) | 4032 (3953) |
| 210 Sequences 1 | 0.159 (0.158) | 0.147 (0.146) | 0.134, 0.185 (0.124, 0.170) | 14850 (23130) |
| 210 Sequences 2 | 0.174 (0.172) | 0.147 (0.147) | 0.141, 0.211 (0.125, 0.171) | 8089 (8350) |
| 210 Sequences 3 | 0.159 (0.158) | 0.150 (0.149) | 0.134, 0.185 (0.127, 0.174) | 16700 (25630) |
| 210 Sequences 4 | 0.186 (0.185) | 0.174 (0.173) | 0.156, 0.215 (0.146, 0.200) | 3325 (3984) |
| 210 Sequences 5 | 0.160 (0.158) | 0.142 (0.141) | 0.129, 0.196 (0.121, 0.165) | 14150 (14650) |
| 210 Sequences 6 | 0.168 (0.167) | 0.159 (0.159) | 0.142, 0.195 (0.134, 0.183) | 16160 (15320) |
| 210 Sequences 7 | 0.166 (0.165) | 0.158 (0.158) | 0.140, 0.193 (0.134, 0.185) | 17630 (18850) |
| 210 Sequences 8 | 0.180 (0.179) | 0.163 (0.162) | 0.152, 0.209 (0.139, 0.188) | 15220 (18880) |
| 210 Sequences 9 | 0.160 (0.159) | 0.151 (0.151) | 0.135, 0.186 (0.127, 0.174) | 17120 (16750) |
Parameters estimated from sequences under exponential growth rate using both bootstrap-MCMC and standard-MCMC. The true value of Θ is Nμ = 0.15, and the true value of G is Ng = 500.
| Simulation | Est. θ, bootstrap: mean (median) | Est. θ, full: mean (median) | Est. G, bootstrap: mean (median) | Est. G, full: mean (median) | θ 95% HPD: bootstrap (standard) | G 95% HPD: bootstrap (standard) | Posterior ESS: bootstrap (standard) |
|---|---|---|---|---|---|---|---|
| 70 GSequences 0 | 0.181 (0.168) | 0.132 (0.126) | 434.679 (425.746) | 370.959 (366.815) | 0.085, 0.311 (0.071, 0.201) | 251.860, 631.251 (221.913, 520.012) | 16794 (12050) |
| 70 GSequences 1 | 0.254 (0.240) | 0.202 (0.193) | 434.812 (430.352) | 397.602 (394.314) | 0.121, 0.411 (0.107, 0.311) | 285.788, 597.187 (262.828, 534.213) | 14380 (11520) |
| 70 GSequences 2 | 0.238 (0.224) | 0.201 (0.191) | 484.164 (478.187) | 465.142 (461.059) | 0.113, 0.391 (0.103, 0.320) | 319.461, 667.636 (301.304, 632.454) | 27940 (32550) |
| 70 GSequences 3 | 0.174 (0.165) | 0.141 (0.135) | 380.629 (376.169) | 352.246 (348.381) | 0.086, 0.282 (0.077, 0.215) | 232.665, 540.039 (216.199, 494.499) | 30690 (37260) |
| 70 GSequences 4 | 0.279 (0.262) | 0.234 (0.222) | 494.104 (489.128) | 471.440 (467.070) | 0.132, 0.468 (0.118, 0.376) | 329.325, 672.809 (311.627, 636.543) | 24700 (18650) |
| 70 GSequences 5 | 0.236 (0.222) | 0.189 (0.180) | 499.410 (493.513) | 465.967 (460.956) | 0.114, 0.393 (0.098, 0.303) | 323.513, 690.004 (305.06, 641.080) | 26370 (24340) |
| 70 GSequences 6 | 0.192 (0.183) | 0.179 (0.171) | 422.840 (418.716) | 429.507 (425.558) | 0.092, 0.303 (0.094, 0.278) | 270.336, 582.810 (284.913, 591.731) | 12920 (11840) |
| 70 GSequences 7 | 0.214 (0.201) | 0.176 (0.168) | 438.542 (432.191) | 406.301 (401.038) | 0.101, 0.359 (0.093, 0.276) | 263.157, 616.865 (250.038, 566.817) | 16480 (12074) |
| 70 GSequences 8 | 0.336 (0.307) | 0.236 (0.223) | 585.284 (577.157) | 512.506 (506.613) | 0.133, 0.606 (0.113, 0.385) | 371.421, 809.052 (330.425, 692.880) | 10490 (9643) |
| 70 GSequences 9 | 0.252 (0.239) | 0.207 (0.199) | 416.132 (412.133) | 392.367 (388.522) | 0.126, 0.404 (0.109, 0.317) | 273.342, 561.178 (262.454, 527.924) | 13290 (10250) |
| 140 GSequences 0 | 0.227 (0.220) | 0.161 (0.158) | 433.741 (429.693) | 360.669 (358.054) | 0.142, 0.327 (0.112, 0.217) | 294.335, 576.537 (250.416, 479.588) | 6995 (6744) |
| 140 GSequences 1 | 0.228 (0.222) | 0.160 (0.158) | 436.595 (432.981) | 361.241 (358.454) | 0.246, 0.326 (0.109, 0.217) | 299.991, 577.365 (248.455, 480.345) | 7989 (8188) |
| 140 GSequences 2 | 0.240 (0.233) | 0.167 (0.164) | 473.610 (469.566) | 384.907 (382.058) | 0.148, 0.346 (0.116, 0.229) | 330.771, 622.735 (270.311, 507.224) | 6509 (6968) |
| 140 GSequences 3 | 0.254 (0.248) | 0.200 (0.196) | 448.674 (445.572) | 409.473 (406.482) | 0.162, 0.360 (0.135, 0.273) | 321.012, 582.177 (290.712, 533.648) | 13600 (15300) |
| 140 GSequences 4 | 0.199 (0.194) | 0.157 (0.154) | 365.991 (363.410) | 323.510 (321.658) | 0.131, 0.276 (0.109, 0.210) | 254.410, 483.394 (221.039, 426.570) | 14790 (16150) |
| 140 GSequences 5 | 0.192 (0.188) | 0.155 (0.152) | 318.666 (316.574) | 285.561 (283.529) | 0.130, 0.262 (0.109, 0.204) | 221.148, 419.749 (195.243, 376.003) | 7522 (7017) |
| 140 GSequences 6 | 0.160 (0.157) | 0.127 (0.125) | 354.315 (350.023) | 318.985 (316.645) | 0.106, 0.223 (0.085, 0.171) | 230.170, 480.825 (212.019, 431.861) | 9345 (13690) |
| 140 GSequences 7 | 0.218 (0.212) | 0.166 (0.163) | 367.546 (365.015) | 323.169 (321.146) | 0.137, 0.306 (0.116, 0.222) | 251.669, 493.619 (226.859, 422.724) | 7939 (10050) |
| 140 GSequences 8 | 0.265 (0.257) | 0.197 (0.193) | 421.904 (418.548) | 366.210 (363.716) | 0.163, 0.379 (0.132, 0.268) | 293.196, 549.427 (259.492, 480.724) | 13890 (15680) |
| 140 GSequences 9 | 0.245 (0.239) | 0.179 (0.175) | 418.412 (415.02) | 347.147 (344.787) | 0.155, 0.348 (0.122, 0.241) | 292.941, 547.342 (239.858, 454.721) | 12920 (14450) |
| 210 GSequences 0 | 0.256 (0.251) | 0.182 (0.180) | 482.733 (479.288) | 407.266 (404.506) | 0.173, 0.346 (0.135, 0.237) | 354.561, 622.448 (294.541, 523.521) | 5947 (7109) |
| 210 GSequences 1 | 0.225 (0.219) | 0.154 (0.153) | 387.966 (384.962) | 314.526 (313.176) | 0.149, 0.308 (0.115, 0.196) | 259.056, 526.567 (223.560, 412.111) | 3683 (3272) |
| 210 GSequences 2 | 0.199 (0.195) | 0.150 (0.148) | 336.966 (334.123) | 289.065 (287.505) | 0.142, 0.264 (0.114, 0.190) | 231.906, 440.254 (205.937, 367.878) | 6124 (6282) |
| 210 GSequences 3 | 0.155 (0.153) | 0.124 (0.123) | 293.805 (291.854) | 262.915 (261.456) | 0.113, 0.200 (0.094, 0.156) | 291.854, 390.218 (179.046, 350.422) | 11250 (11110) |
| 210 GSequences 4 | 0.207 (0.202) | 0.140 (0.138) | 465.902 (462.193) | 367.540 (365.828) | 0.139, 0.285 (0.103, 0.179) | 323.570, 616.338 (256.036, 481.543) | 8031 (7640) |
| 210 GSequences 5 | 0.239 (0.232) | 0.166 (0.163) | 544.848 (540.033) | 476.366 (473.416) | 0.149, 0.341 (0.120, 0.216) | 375.516, 724.607 (342.172, 610.708) | 3740 (4139) |
| 210 GSequences 6 | 0.273 (0.267) | 0.207 (0.205) | 473.015 (470.211) | 424.283 (422.115) | 0.187, 0.368 (0.150, 0.268) | 345.418, 602.284 (313.863, 540.195) | 9274 (7088) |
| 210 GSequences 7 | 0.294 (0.287) | 0.200 (0.187) | 530.482 (527.17) | 442.856 (440.409) | 0.195, 0.407 (0.144, 0.259) | 384.004, 682.460 (322.246, 565.085) | 7618 (7488) |
| 210 GSequences 8 | 0.205 (0.200) | 0.151 (0.150) | 390.942 (386.986) | 333.692 (332.128) | 0.138, 0.280 (0.113, 0.193) | 267.469, 525.530 (235.079, 434.877) | 6268 (5803) |
| 210 GSequences 9 | 0.199 (0.195) | 0.144 (0.142) | 364.931 (362.487) | 294.125 (292.573) | 0.139, 0.264 (0.106, 0.180) | 255.888, 475.197 (204.590, 385.250) | 7895 (8876) |
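The posterior ESS column reports how many effectively independent draws each analysis produced. As a rough illustration, the sketch below uses a common autocorrelation-based estimator, ESS = N / (1 + 2 Σ ρ_k), truncating the sum at the first non-positive autocorrelation. This is an assumption for illustration only: the source does not say which estimator or software produced the tabulated values, and the function name `effective_sample_size` is hypothetical.

```python
import random


def effective_sample_size(samples):
    """Estimate ESS as N / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)  # constant chain: no autocorrelation can be estimated
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


rng = random.Random(1)

# Independent draws: ESS should be close to the chain length.
iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Strongly autocorrelated AR(1) chain: ESS should be far below the chain length.
ar1 = [0.0]
for _ in range(1999):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0.0, 1.0))

ess_iid = effective_sample_size(iid)
ess_ar1 = effective_sample_size(ar1)
print("ESS (independent):", ess_iid)
print("ESS (AR(1), phi=0.9):", ess_ar1)
```

Comparing run times at equal ESS, as the abstract does, corrects for the fact that a chain with strongly correlated samples carries less information than its raw length suggests.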
Figure 1. Posterior distribution from bootstrap-MCMC and standard-MCMC. Example of the log-posterior probability distribution from both bootstrap-MCMC (top) and standard-MCMC (bottom), obtained with 210 sequences simulated with a constant population size. Note the difference in scales of the horizontal axes.