Frank Grønlund Jørgensen, Asger Hobolth, Henrik Hornshøj, Christian Bendixen, Merete Fredholm, Mikkel Heide Schierup.
Abstract
BACKGROUND: The availability of abundant sequence data from key model organisms has made large-scale studies of molecular evolution an exciting possibility. Here we use full-length cDNA alignments comprising more than 700,000 nucleotides from human, mouse, pig and the Japanese pufferfish Fugu rubripes to investigate 1) the relationships between three major lineages of mammals: rodents, artiodactyls and primates, and 2) the rate of evolution and the occurrence of positive Darwinian selection using codon-based models of sequence evolution.
Year: 2005 PMID: 15679890 PMCID: PMC549206 DOI: 10.1186/1741-7007-3-2
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Overview of the codon models used in the analyses.
| Model | No. of parameters | Parameters |
| M0: One Ratio | 5 | κ, τpig, τhuman, τmouse, ω |
| M1a: Free Ratio | 7 | κ, τpig, τhuman, τmouse, ωpig, ωhuman, ωmouse |
| M1b: Neutral | 6 | κ, τpig, τhuman, τmouse, |
| M2a: Model A | 8 | κ, τpig, τhuman, τmouse, |
The parameters used are (κ) transition / transversion ratio, (τ) branch length, (ω) dN/dS ratio, (p) fraction of codons that fall into the specified ω category.
Figure 1. Phylogenetic tree of key mammalian species. A schematic drawing showing the topologies considered in our study compared to a recent study on human, chimpanzee and mouse trios [23]. Branch a shows the branch considered in the study by Clark et al. (2003), while branch a+b represents the evolutionary time scale studied here.
Figure 2. Distribution of sequence alignment lengths. Histogram showing the distribution of sequence lengths in the three-species alignments.
Figure 3. Conflicting mammalian phylogenies. A schematic drawing of the three conflicting bifurcating topologies (a-c) as well as a multifurcating alternative (d). The divergence times shown in (a) are in millions of years before present [31].
Comparison of topologies.
| Top | No. genes | Log likelihood | Branch lengths | | | | |
| A | 245 | -921354 | 0.0227 | 0.0280 | 0.0554 | 0.0083 | 0.3294 |
| B | 440 | -920090 | 0.0292 | 0.0281 | 0.0403 | 0.0171 | 0.3229 |
| C | 180 | -921703 | 0.0292 | 0.0241 | 0.0555 | 0.0055 | 0.3304 |
| A | 215 | -570181 | 0.0189 | 0.0235 | 0.0524 | 0.0088 | 0.2692 |
| B | 386 | -568900 | 0.0265 | 0.0237 | 0.0341 | 0.0195 | 0.2600 |
| C | 208 | -570504 | 0.0264 | 0.0190 | 0.0525 | 0.0058 | 0.2708 |
| A | 215 | -498689 | 0.0124 | 0.0156 | 0.0323 | 0.0053 | 0.1680 |
| B | 545 | -498005 | 0.0167 | 0.0157 | 0.0229 | 0.0102 | 0.1642 |
| C | 175 | -498925 | 0.0167 | 0.0130 | 0.0324 | 0.0034 | 0.1687 |
Top (A-C) refers to the three different topologies shown in Figure 3a–c. No. genes is the number of individual genes that favour each topology. The likelihood and the branch lengths shown are based on the concatenated super gene of the 988 individual four-species alignments; the average values of the branch lengths from the individual genes are highly similar to these results. Branch lengths are shown in number of substitutions per site.
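As a small worked reading of the first block of rows in the table above, the sketch below ranks the three topologies by the reported likelihood of the concatenated super gene and computes the difference to the best one. It assumes the reported values are log likelihoods; the variable names are illustrative and not taken from the paper.

```python
# Likelihood values for topologies A-C, copied from the first block of rows
# in the table above (ASSUMPTION: these are log likelihoods).
log_likelihood = {"A": -921354, "B": -920090, "C": -921703}

# The preferred topology is the one with the highest (least negative) value.
best = max(log_likelihood, key=log_likelihood.get)

# Difference in log likelihood between the best topology and each alternative.
delta = {top: log_likelihood[best] - lnl for top, lnl in log_likelihood.items()}

print(best)
print(delta)
```

In the table, topology B is also the one favoured by the largest number of individual genes in each block of rows (440, 386 and 545).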
The rates of evolution.
| Human | 0.115 | 0.017 | 0.126 | 0.118 | 0.003 | 0.018 | 0.010 | 0.0006 |
| Pig | 0.165 | 0.020 | 0.183 | 0.176 | 0.006 | 0.020 | 0.011 | 0.0005 |
| Mouse | 0.329 | 0.035 | 0.365 | 0.360 | 0.015 | 0.035 | 0.023 | 0.0013 |
Estimated rates of evolution on the super gene and the individual alignments. (dS) synonymous substitutions per codon, (dN) nonsynonymous substitutions per codon.
Figure 4. Evolutionary rates. Histograms of key parameters in the codon models. (a-c) The rate of synonymous substitutions per synonymous site (dS) in the pig, human and mouse lineage, respectively. (d-f) The rate of nonsynonymous substitutions per nonsynonymous site (dN) in the pig, human and mouse lineage, respectively. (g-i) The ratio of nonsynonymous to synonymous substitutions (dN/dS ratio) in the pig, human and mouse lineage, respectively. The horizontal line represents the mean of the distributions.
Genes where all branches have ω > 1 based on the one ratio model.
| Gene | L | τpig | τmouse | τhuman | κ | ω | P(N) | P(S) | M(N) | M(S) | H(N) | H(S) |
| NM_031268 | 72 | 0.096 | 0.102 | 0.128 | 2.844 | 1.481 | 5.7 | 1.3 | 6.0 | 1.3 | 7.6 | 1.6 |
| NM_032353 | 97 | 0.105 | 0.269 | 0.104 | 3.183 | 1.206 | 7.6 | 2.6 | 19.5 | 6.6 | 7.5 | 2.6 |
| XM_165930a | 102 | 0.231 | 0.370 | 0.162 | 8.593 | 2.127 | 20.5 | 3.1 | 32.8 | 4.9 | 14.4 | 2.2 |
| XM_168460a | 74 | 0.176 | 0.527 | 0.132 | 2.665 | 1.121 | 8.7 | 4.3 | 26.1 | 12.9 | 6.5 | 3.2 |
Three-species alignments where the average dN/dS ratio over sites and lineages is larger than one. (Gene) GenBank accession number of the human gene. (L) Length of the sequence alignment in codons. P(N) and P(S) are the numbers of nonsynonymous and synonymous substitutions in the pig lineage; M(N), M(S) and H(N), H(S) are the corresponding numbers for the mouse and human lineages, respectively. (a) Possible pseudogene in the human lineage.
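The numbers in this table can be cross-checked with a little arithmetic. The sketch below assumes that the three values following L in each row are branch lengths in substitutions per codon, in the order pig, mouse, human; that reading is an inference from the numbers and is not stated in the caption. Under that assumption, each branch length should be close to (N + S) / L for the corresponding lineage.

```python
# Rows copied from the table above:
# (gene, L, three values following L, (P(N), P(S), M(N), M(S), H(N), H(S))).
# ASSUMPTION: the three values following L are branch lengths per codon
# in the order pig, mouse, human.
rows = [
    ("NM_031268", 72, (0.096, 0.102, 0.128), (5.7, 1.3, 6.0, 1.3, 7.6, 1.6)),
    ("NM_032353", 97, (0.105, 0.269, 0.104), (7.6, 2.6, 19.5, 6.6, 7.5, 2.6)),
    ("XM_165930", 102, (0.231, 0.370, 0.162), (20.5, 3.1, 32.8, 4.9, 14.4, 2.2)),
    ("XM_168460", 74, (0.176, 0.527, 0.132), (8.7, 4.3, 26.1, 12.9, 6.5, 3.2)),
]


def implied_branch_lengths(length, counts):
    """Return (N + S) / L for the pig, mouse and human lineages."""
    pairs = zip(counts[0::2], counts[1::2])
    return tuple((n + s) / length for n, s in pairs)


# Track the largest gap between the reported and the implied values.
max_abs_diff = 0.0
for gene, length, reported, counts in rows:
    implied = implied_branch_lengths(length, counts)
    for rep, imp in zip(reported, implied):
        max_abs_diff = max(max_abs_diff, abs(rep - imp))
    print(gene, [round(x, 3) for x in implied])

print(max_abs_diff)
```

The implied and reported values agree to within rounding for all four genes, which supports the assumed column order but does not prove it.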
Genes with branches where ω > 1 based on the free ratio model.
| Gene | L | ωpig | ωmouse | ωhuman | P(N) | P(S) | M(N) | M(S) | H(N) | H(S) | P-value (ω > 1) |
| NM_001866 | 80 | 2.130 | 0.434 | 0.535 | 9.8 | 1.9 | 11.5 | 11.0 | 12.0 | 9.3 | 0.4775 |
| NM_004085 | 97 | 1.088 | 0.053 | 0.000 | 2.0 | 0.5 | 4.9 | 24.1 | 0.0 | 8.8 | 0.9863 |
| NM_004549 | 122 | 0.570 | 0.276 | 1.615 | 15.2 | 10.2 | 25.7 | 35.5 | 28.9 | 6.8 | 0.4229 |
| NM_004891 | 65 | 0.092 | 0.116 | 2.560 | 2.9 | 11.6 | 8.1 | 25.1 | 10.1 | 1.4 | 0.5318 |
| NM_006607 | 187 | 0.261 | 0.281 | 2.117 | 24.6 | 38.0 | 54.7 | 78.4 | 31.0 | 5.9 | 0.3625 |
| NM_012198 | 216 | 0.341 | 0.073 | 1.849 | 24.9 | 26.5 | 18.3 | 91.6 | 29.4 | 5.8 | 0.3888 |
| NM_017425 | 147 | 0.594 | 0.424 | 1.110 | 33.5 | 18.7 | 39.3 | 30.8 | 21.2 | 6.3 | 0.8750 |
| NM_022978 | 60 | 0.228 | 0.143 | 1.102 | 2.8 | 4.4 | 3.3 | 8.0 | 38.9 | 12.5 | 0.8547 |
| NM_031268 | 72 | 1.307 | 1.009 | 2.307 | 5.5 | 1.3 | 5.6 | 1.8 | 8.2 | 1.1 | 0.4009 |
| NM_032353 | 97 | 0.539 | 2.100 | 0.964 | 5.71 | 4.3 | 22.0 | 4.3 | 7.1 | 3.0 | 0.1606 |
| NM_032731 | 123 | 1.576 | 0.299 | 0.146 | 12.5 | 2.9 | 27.9 | 33.6 | 5.6 | 13.8 | 0.6854 |
| XM_003044a | 118 | 1.022 | 0.172 | 0.042 | 22.1 | 9.1 | 21.7 | 53.1 | 1.6 | 15.6 | 0.9743 |
| XM_016532 | 155 | 0.000 | 0.125 | 1.193 | 0.0 | 34.0 | 16.0 | 49.0 | 11.4 | 3.6 | 0.8863 |
| XM_041680a | 168 | 0.415 | 0.186 | 1.079 | 29.4 | 25.5 | 33.1 | 64.0 | 6.7 | 2.2 | 0.9673 |
| XM_062742a | 110 | 0.000 | 0.011 | 1.661 | 0.0 | 19.0 | 1.0 | 36.4 | 22.9 | 5.4 | 0.5119 |
| XM_069411 | 187 | 0.085 | 0.058 | 1.108 | 3.7 | 15.9 | 5.5 | 34.9 | 119.9 | 39.6 | 0.6943 |
| XM_092681 | 81 | 0.187 | 0.040 | 1.167 | 5.3 | 10.0 | 3.0 | 26.0 | 15.9 | 4.8 | 0.8551 |
| XM_165930a | 102 | ∞ | 2.061 | 0.691 | 24.7 | 0.0 | 32.9 | 5.1 | 10.2 | 4.7 | 0.0020* |
| XM_166695a | 190 | 0.235 | 0.067 | 1.183 | 19.8 | 30.9 | 12.5 | 38.3 | 79.6 | 24.7 | 0.6657 |
| XM_168460a | 74 | 2.672 | 0.848 | 1.085 | 10.4 | 2.3 | 23.9 | 15.6 | 6.6 | 3.4 | 0.9234 |
| XM_172026a | 72 | 0.035 | 0.042 | 2.213 | 0.9 | 8.8 | 2.1 | 17.3 | 30.1 | 4.7 | 0.2564 |
| XM_172342a | 143 | 0.000 | 0.032 | 1.320 | 0.0 | 3.0 | 2.0 | 20.3 | 27.3 | 6.6 | 0.6075 |
| XM_172363a | 77 | 0.139 | 0.132 | 1.077 | 8.7 | 21.0 | 8.6 | 22.0 | 14.4 | 4.5 | 0.9422 |
Three-species alignments where one or more lineages have a dN/dS ratio larger than one. (Gene) GenBank accession number of the human gene. (L) Length of the sequence alignment in codons. P(N) and P(S) are the numbers of nonsynonymous and synonymous substitutions in the pig lineage; M(N), M(S) and H(N), H(S) are the corresponding numbers for the mouse and human lineages, respectively. (a) Possible pseudogene in the human lineage. (ω > 1) If more than one branch has ω > 1, only the significance of the branch with the largest value of ω is shown. (*) LRT (ω > 1) significant at the 0.01 level.
Genes predicted to be under positive selection with the branch-site models.
| Gene | ω | p | Description |
| NM_001785 | 3.8 | 0.27 | Cytidine deaminase |
| NM_001867 | 5.3 | 0.20 | Cytochrome c oxidase subunit VIIc |
| NM_004846 | 10.7 | 0.05 | Eukaryotic translation initiation factor 4E-like 3 |
| NM_006607 | 27.9 | 0.23 | Pituitary tumor-transforming 2 |
| NM_012198 | 4.8 | 0.32 | Grancalcin, EF-hand calcium binding protein |
| NM_021167 | 6.9 | 0.13 | Ocular development-associated gene (Interim) |
| NM_022978 | 22.0 | 0.28 | Small EDRK-rich factor 1B (centromeric) |
| NM_080915 | 28.1 | 0.12 | Deoxyguanosine kinase |
| XM_039644 | 49.5 | 0.02 | Unclassified |
| XM_059906 | 8.2 | 0.07 | Unclassified |
| XM_062742 | 2.0 | 0.84 | Unclassified |
| XM_069411 | 6.8 | 0.38 | Similar to RIKEN cDNA 1300003K24 (Interim) |
| XM_166695 | 6.9 | 0.31 | Unclassified |
| XM_167131 | 3.8 | 0.28 | Unclassified |
| XM_172026 | 16.9 | 0.38 | Unclassified |
| NM_000509 | 2.4 | 0.13 | Fibrinogen, gamma polypeptide |
| NM_000520 | 10.3 | 0.06 | Hexosaminidase A (alpha polypeptide) |
| NM_002979 | 2.0 | 0.15 | Sterol carrier protein 2 |
| NM_003142 | 3.1 | 0.14 | Sjogren syndrome antigen B (autoantigen La) * |
| NM_016489 | 2.4 | 0.32 | 5'-nucleotidase, cytosolic III |
| NM_005731 | 8.6 | 0.074 | Actin related protein 2/3 complex, subunit 2, 34 kDa |
| NM_013342 | 1.3 | 0.152 | TCF3 (E2A) fusion partner (in childhood leukaemia) |
| NM_023935 | 3.0 | 0.171 | Chromosome 20 open reading frame 116 |
| XM_007076 | 2.5 | 0.248 | Unclassified |
Genes shown here all have a significant LRT at the 0.001 level. (ω) the predicted dN/dS ratio in the foreground lineage. (p) the proportion of sites predicted to be under positive selection.
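For readers unfamiliar with the likelihood ratio test (LRT) mentioned in the caption, the sketch below shows the general calculation: twice the log-likelihood difference between the alternative and the null model is compared against a chi-square distribution. The log-likelihood values and the choice of degrees of freedom are hypothetical; the caption does not state which reference distribution the authors used, and the function names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        # For df = 1 the chi-square tail equals a two-sided normal tail.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # For df = 2 the chi-square distribution is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def lrt_p_value(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# HYPOTHETICAL log likelihoods for one gene under a null and an alternative model.
stat, p = lrt_p_value(lnl_null=-1250.0, lnl_alt=-1242.0, df=2)
significant = p < 0.001  # the threshold quoted in the caption

print(stat, p, significant)
```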
Heterogeneity in dN/dS ratios over sites.
| CT | 0.691 | 0.238 | 0.055 | 8 × 10⁻⁵ | 0.016 | 0.221 | 0.418 | 0.163 | 2.494 |
| F3 × 4 | 0.681 | 0.245 | 0.058 | 1 × 10⁻⁵ | 0.017 | 0.219 | 0.410 | 0.162 | 2.658 |
The concatenated super gene is used to estimate the distribution of dN/dS ratios over sites. Each codon is allowed to fall into one of the five predefined dN/dS ratio classes. The branch lengths are expressed as number of substitutions per codon.
Codon usage in the three mammalian species.
| 2.16 | 2.00 | 2.06 | 1.43 | 1.44 | 1.38 | 1.61 | 1.40 | 1.48 | 1.03 | 0.99 | 0.97 | ||||
| 1.85 | 2.01 | 1.93 | 1.30 | 1.40 | 1.37 | 1.48 | 1.64 | 1.60 | 0.92 | 0.97 | 0.96 | ||||
| 0.98 | 0.81 | 0.89 | 1.12 | 1.01 | 1.04 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
| 1.50 | 1.42 | 1.46 | 0 | 0 | 0 | 1.24 | 1.23 | 1.23 | |||||||
| 1.53 | 1.40 | 1.44 | 1.66 | 1.62 | 1.60 | 1.17 | 1.05 | 1.07 | |||||||
| 1.55 | 1.67 | 1.70 | 1.35 | 1.37 | 1.48 | 1.15 | 1.31 | 1.25 | |||||||
| 0.83 | 0.78 | 0.74 | 1.63 | 1.58 | 1.53 | 1.30 | 1.18 | 1.19 | |||||||
| 3.32 | 3.60 | 3.48 | 3.03 | 3.17 | 3.13 | ||||||||||
| 2.16 | 1.90 | 2.01 | 1.42 | 1.30 | 1.32 | 2.05 | 1.73 | 1.88 | 1.18 | 1.12 | 1.13 | ||||
| 2.06 | 2.24 | 2.20 | 1.53 | 1.61 | 1.62 | 1.76 | 1.99 | 1.86 | 1.39 | 1.52 | 1.44 | ||||
| 0.92 | 0.84 | 0.88 | 1.64 | 1.59 | 1.50 | 3.35 | 2.95 | 3.23 | 1.46 | 1.40 | 1.42 | ||||
| 2.22 | 2.20 | 2.22 | 3.74 | 3.98 | 3.65 | 1.04 | 1.14 | 1.08 | |||||||
| 1.49 | 1.37 | 1.41 | 2.21 | 2.22 | 2.11 | 2.93 | 2.59 | 2.79 | 1.32 | 1.23 | 1.26 | ||||
| 1.33 | 1.50 | 1.43 | 2.35 | 2.43 | 2.54 | 2.35 | 2.70 | 2.51 | 1.95 | 2.08 | 0.06 | ||||
| 0.99 | 0.84 | 0.88 | 1.89 | 1.77 | 1.81 | 3.77 | 3.44 | 3.62 | 2.05 | 1.95 | 1.97 | ||||
| 2.66 | 2.82 | 2.75 | 3.51 | 3.79 | 3.66 | 1.28 | 1.35 | 1.34 | |||||||
The frequencies are expressed as a percentage of the 240,048 codons in each of the three species. Human(H), Mouse(M), Pig(P). Stop codons are not allowed in the analyses (*).
Evaluation of the choice of codon equilibrium frequencies.
| Model | df | Branch lengths | | | κ | ω | Log likelihood | X2 |
| FQ | 1 | 0.136 | 0.340 | 0.178 | 2.862 | 0.125 | -1502578 | 249354 |
| F1 × 4 | 3 | 0.134 | 0.335 | 0.175 | 2.776 | 0.122 | -1500436 | 231560 |
| F3 × 4 | 9 | 0.136 | 0.343 | 0.178 | 2.692 | 0.119 | -1495232 | 133363 |
| F3 × 4 + CpG | 10 | 0.138 | 0.351 | 0.181 | 2.593 | 0.114 | -1493996 | 74214 |
| Codon Table | 60 | 0.136 | 0.348 | 0.179 | 2.497 | 0.113 | -1478877 | 0 |
The values are estimated with the concatenated gene comprising 240,048 codons using the one ratio model. (df) Number of parameters. (κ) Transition/transversion ratio. (ω) dN/dS ratio. (X2) A chi-square test statistic comparing the expected frequency of each codon to the observed codon counts.
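The X2 column can be read as a Pearson-type goodness-of-fit statistic. The sketch below shows that calculation on a hypothetical three-category example; the counts and frequencies are made up for illustration, and the paper may have computed its statistic with details that differ from this minimal form.

```python
def pearson_x2(observed_counts, expected_freqs):
    """Sum of (observed - expected)^2 / expected over all categories.

    expected_freqs are model frequencies that sum to one; the expected
    count of each category is its frequency times the total observed count.
    """
    total = sum(observed_counts)
    x2 = 0.0
    for obs, freq in zip(observed_counts, expected_freqs):
        expected = total * freq
        x2 += (obs - expected) ** 2 / expected
    return x2


# HYPOTHETICAL toy data: three categories instead of the sense codons.
observed = [10, 20, 30]
expected_freqs = [0.2, 0.3, 0.5]
x2 = pearson_x2(observed, expected_freqs)

print(x2)
```

When the expected frequencies are taken directly from the observed counts, the statistic is zero, which is presumably why the Codon Table row reports X2 = 0.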