Liang Liu, Lili Yu, Venugopal Kalavacharla, Zhanji Liu.
Abstract
BACKGROUND: A birth and death process is frequently used for modeling the size of a gene family that may vary along the branches of a phylogenetic tree. Under the birth and death model, maximum likelihood methods have been developed to estimate the birth and death rate and the sizes of ancient gene families (numbers of gene copies at the internodes of the phylogenetic tree). This paper aims to provide a Bayesian approach for estimating parameters in the birth and death model.
Year: 2011 PMID: 22044581 PMCID: PMC3774087 DOI: 10.1186/1471-2105-12-426
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A birth and death process along the lineages of a phylogenetic tree. The branch lengths t of the phylogenetic tree are given in millions of years. In the phylogenetic tree, (x1, x2, x3, x4, x5) are the sizes of gene family i for species 1, 2, 3, 4, and 5, while (θ1, θ2, θ3, θ4) are the sizes of gene family i at the four internal nodes.
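The process in this caption can be illustrated with a small simulation of a single branch. This is only a sketch under stated assumptions: it assumes that each gene copy is duplicated and lost at the same rate λ per million years, and the function name `simulate_branch` and the example values are hypothetical rather than taken from the paper.

```python
import random

def simulate_branch(n0, lam, t, rng):
    """Simulate a linear birth and death process along one branch.

    n0  : family size at the parent node
    lam : per-copy rate, assumed equal for births and deaths
    t   : branch length in millions of years
    Returns the family size at the child node.
    """
    n, elapsed = n0, 0.0
    while n > 0:
        # Each copy is duplicated at rate lam and lost at rate lam,
        # so the waiting time to the next event has rate 2 * lam * n.
        elapsed += rng.expovariate(2.0 * lam * n)
        if elapsed > t:
            break
        # With equal rates, a birth and a death are equally likely.
        n += 1 if rng.random() < 0.5 else -1
    return n  # size 0 is absorbing: a lost family cannot reappear

rng = random.Random(1)
# Hypothetical example: a family of 10 copies evolving for 12 million years.
sizes = [simulate_branch(n0=10, lam=0.005, t=12.0, rng=rng) for _ in range(5)]
print(sizes)
```

Applying the function branch by branch from the root to the tips, with each child starting from its parent's simulated size, would give one simulated gene family for the whole tree.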
Figure 2. The phylogenetic tree used in the simulation study.
Figure 3. Simulation results. The estimation errors of the Bayesian and ML estimates of λ are calculated for the simulations with a) λ = 0.001, b) λ = 0.005, and c) λ = 0.01. The proportion of trials yielding the true unlikely gene family is reported when the unlikely gene family is simulated with d) a fast birth and death rate or with e) a slow birth and death rate.
The estimation error of the proportions of gene families that showed expansions, contractions, and no change.
| # of gene families | Bayesian (λ = 0.001) | CAFE (λ = 0.001) | Bayesian (λ = 0.005) | CAFE (λ = 0.005) | Bayesian (λ = 0.01) | CAFE (λ = 0.01) |
|---|---|---|---|---|---|---|
| 20 | 0.07 | 0.138 | 0.088 | 0.184 | 0.089 | 0.214 |
| 40 | 0.048 | 0.105 | 0.062 | 0.148 | 0.063 | 0.179 |
| 60 | 0.032 | 0.089 | 0.051 | 0.134 | 0.052 | 0.170 |
| 80 | 0.032 | 0.084 | 0.045 | 0.130 | 0.045 | 0.170 |
| 100 | 0.03 | 0.081 | 0.039 | 0.126 | 0.032 | 0.164 |
Gene family data were simulated from the birth and death model with λ = 0.001, 0.005, 0.01 respectively. The Bayesian model and CAFE were then applied to the simulated data to estimate the proportions of gene families that showed expansions, contractions, or no change. The estimation error is equal to the square root of the mean squared error of the estimated proportions of expansions, contractions, and no change. In general, the estimation error decreases as the number of gene families increases.
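The estimation error described here is a root mean squared error, which can be sketched as follows. The sketch assumes the error is pooled over all simulation replicates and over the three categories; the exact averaging used in the paper is not stated in this record, and the numbers below are hypothetical.

```python
import math

def estimation_error(estimates, truths):
    """Root mean squared error between estimated and true proportions.

    Each element of `estimates` and `truths` is a triple of proportions
    (expansions, contractions, no change) for one simulated dataset.
    """
    squared = [(e - t) ** 2
               for est, tru in zip(estimates, truths)
               for e, t in zip(est, tru)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical proportions for two simulated datasets.
true_props = [(0.20, 0.30, 0.50), (0.10, 0.10, 0.80)]
est_props = [(0.25, 0.25, 0.50), (0.10, 0.20, 0.70)]
print(estimation_error(est_props, true_props))
```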
Figure 4. The estimates of the birth and death rates on the branches of the phylogenetic tree for 1254 gene families of the five yeast species. The birth and death rates were estimated under the Bayesian heterogeneous rate model. The interval on each branch is the 95% credible interval for the birth and death rate λ. The branch lengths t in the tree are given in millions of years [4]. The branch numbers are highlighted in red.
The Bayesian estimates of the numbers of gene families in the reduced yeast dataset (1254 gene families) that showed expansions, no change, or contractions on the eight branches of the phylogenetic tree in Fig. 4.
| Branch number | Expansions | No change | Contractions |
|---|---|---|---|
| 1 (t = 12) | 84 | 1120 | 50 |
| 2 (t = 12) | 48 | 1129 | 77 |
| 3 (t = 22) | 616 | 510 | 128 |
| 4 (t = 27) | 496 | 635 | 123 |
| 5 (t = 32) | 51 | 1107 | 96 |
| 6 (t = 10) | 36 | 1126 | 92 |
| 7 (t = 5) | 3 | 1146 | 5 |
| 8 (t = 5) | 50 | 1134 | 70 |
Numbers in the first column are the branch numbers highlighted in Fig. 4.
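To compare these counts with the proportions used in the simulation study, each row can be converted to proportions. The sketch below copies the counts from the table as printed and normalizes each row by its own total, which is an assumption about how the proportions would be formed; the helper name `proportions` is hypothetical.

```python
# Counts copied from the table: branch -> (expansions, no change, contractions)
counts = {
    1: (84, 1120, 50),
    2: (48, 1129, 77),
    3: (616, 510, 128),
    4: (496, 635, 123),
    5: (51, 1107, 96),
    6: (36, 1126, 92),
    7: (3, 1146, 5),
    8: (50, 1134, 70),
}

def proportions(row):
    """Divide each count by the row total so the three values sum to 1."""
    total = sum(row)
    return tuple(c / total for c in row)

for branch, row in counts.items():
    exp, same, con = proportions(row)
    print(f"branch {branch}: expansions {exp:.3f}, "
          f"no change {same:.3f}, contractions {con:.3f}")
```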
The most unlikely gene families identified by the Bayesian hypothesis test.
| Family ID | Gene family | PPP |
|---|---|---|
| 3 | (2 (8 (15 (34 83)))) | 0.000 |
| 18 | (17 (14 (15 (1 5)))) | 0.000 |
| 28 | (1 (3 (3 (2 34)))) | 0.000 |
| 13 | (7 (16 (7 (20 17)))) | 0.002 |
| 34 | (5 (11 (14 (4 2)))) | 0.003 |
| 6 | (15 (33 (24 (30 31)))) | 0.004 |
| 397 | (1 (1 (2 (1 5)))) | 0.006 |
| 77 | (2 (5 (4 (7 4)))) | 0.019 |
| 256 | (1 (2 (7 (1 1)))) | 0.019 |
| 89 | (2 (9 (4 (2 2)))) | 0.021 |
| 262 | (1 (4 (4 (1 1)))) | 0.025 |
The computational time (in seconds) for running the Bayesian analysis (10000 iterations) on a Lenovo T61 notebook (Intel Core 2 Duo CPU, 2.4 GHz, 2.48 GB of RAM).
| number of gene families | 5 species | 10 species | 20 species |
|---|---|---|---|
| 10 | 11 | 22 | 42 |
| 20 | 20 | 38 | 52 |
| 40 | 40 | 64 | 104 |