Naoko Takezaki, Hidenori Nishihara.
Abstract
Determining the phylogenetic relationship of two extant lineages of lobe-finned fish, coelacanths and lungfishes, and tetrapods is important for understanding the origin of tetrapods. We analyzed data sets from two previous studies along with a newly collected data set, each of which had varying numbers of species and genes and varying extent of missing sites. We found that in all the data sets the sister relationship of lungfish and tetrapods was constructed with the use of cartilaginous fish as the outgroup with a high degree of statistical support. In contrast, when ray-finned fish were used as the outgroup, which is taxonomically an immediate outgroup of lobe-finned fish and tetrapods, the sister relationship of coelacanth and tetrapods was supported most strongly, although the statistical support was weaker. Even though it is generally accepted that the closest relative is an appropriate outgroup, our analysis suggested that the large divergence of the ray-finned fish as indicated by their long branch lengths and different amino acid frequencies made them less suitable as an outgroup than cartilaginous fish.
Keywords: cartilaginous fish; lungfish; missing data; phylogenomics; ray-finned fish
Year: 2016 PMID: 27026053 PMCID: PMC4860700 DOI: 10.1093/gbe/evw071
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure. The phylogenetic relationship of the major lineages in jawed vertebrates and lobe-finned vertebrates. (a) The relationship of major lineages in jawed vertebrates. (b) Three possible relationships for the three extant lineages of lobe-finned vertebrates: Sister relationships of lungfishes and tetrapods (Tree 1), coelacanths and tetrapods (Tree 2), and lungfishes and coelacanths (Tree 3).
Data Sets Analyzed in this Study
| Data Set | Source | Genes | Amino Acid Sites | Species | Missing Sites (%) |
|---|---|---|---|---|---|
| I | Amemiya et al. | 251 | 112,212 | 20 | 14.2 |
| II | Liang et al. | 1,288 | 618,946 | 10 | 6.5 |
| III | This study | 831 | 242,475 | 25 | 0 |
Note.—In data set I, only concatenated sequence was available, and two shark species were missing. In data set II, genes with <50 amino acid sites were excluded.
Figure. Maximum-likelihood trees constructed for concatenated sequences of the three data sets. (a)–(c) CF and RF were used as the outgroup. (d)–(f) RF was used as the outgroup. (a) and (d) Data set I from Amemiya et al. (2013). (b) and (e) Data set II from Liang et al. (2013). (c) and (f) Data set III collected in this study. The numbers on the branches are BPs from 500 replications. The trees were constructed with the JTTFG4 setting by PhyML.
Log-Likelihood Values of Concatenated Sequences
| Data Set | Tree | CF + RF: L or ΔL | CF + RF: AU | CF: L or ΔL | CF: AU | RF: L or ΔL | RF: AU |
|---|---|---|---|---|---|---|---|
| I | 1 | −1,241,832.7 (Best) | 0.738 | −1,016,570.7 (Best) | 0.624 | −56.8 | 0.094 |
| I | 2 | −82.6 | 0.028 | −80.8 | 0.044 | −1,157,666.6 (Best) | 0.621 |
| I | 3 | −53.6 | 0.146 | −134.5 | 1 × 10⁻⁵ | −3.1 | 0.574 |
| II | 1 | −5,726,397.0 (Best) | 1 | −4,779,270.5 (Best) | 1 | −278.3 | 0.001 |
| II | 2 | −593.6 | 2 × 10⁻⁵ | −404.9 | 8 × 10⁻⁶ | −5,243,076.5 (Best) | 0.936 |
| II | 3 | −503.2 | 1 × 10⁻³⁵ | −697.3 | 7 × 10⁻⁵ | −121.6 | 0.001 |
| III | 1 | −3,086,859.0 (Best) | 0.855 | −2,296,640.0 (Best) | 0.832 | −5.0 | 0.680 |
| III | 2 | −308.3 | 5 × 10⁻⁶ | −224.9 | 0.001 | −2,715,213.2 (Best) | 0.765 |
| III | 3 | −270.7 | 3 × 10⁻⁴ | −415.2 | 6 × 10⁻⁵ | −103.2 | 0.073 |
Note.—“AU” refers to the P-value from the AU test; JTTFG4 was assumed. Tree topologies for data sets I and III take into account the ambiguities in relationship of elephant, armadillo, and the other eutherian mammals, and those for data set III the ambiguities in pufferfish and stickleback and the cluster of tilapia, platyfish, and medaka (supplementary table S6, Supplementary Material online). The results for the topologies with the highest likelihood value among those with Trees 1, 2, and 3 are shown; the results of all tree topologies are shown in supplementary table S8, Supplementary Material online. L = log-likelihood for the best tree. ΔL = the difference in the log-likelihood values relative to the best tree.
Summary of Tree Topologies and their Statistical Support
| Data Set | Method | Substitution Model | CF + RF: Tree | CF + RF: BP or PP | CF: Tree | CF: BP or PP | RF: Tree | RF: BP or PP |
|---|---|---|---|---|---|---|---|---|
| I | ML | JTTFG4 | 1 | 87 | 1 | 97 | 2 | 54 |
| I | ML | GTRG4 | 1 | 89 | 1 | 100 | 3 | 51 |
| I | Bayesian | JTTFG4 | 1 | 1.00 | 1 | 1.00 | 2 | 1.00 |
| I | Bayesian | GTRG4 | 1 | 1.00 | 1 | 1.00 | 3 | 1.00 |
| II | ML | JTTFG4 | 1 | 100 | 1 | 100 | 2 | 95 |
| II | ML | GTRG4 | 1 | 100 | 1 | 100 | 2 | 88 |
| II | Bayesian | JTTFG4 | 1 | 1.00 | 1 | 1.00 | 2 | 1.00 |
| II | Bayesian | GTRG4 | 1 | 1.00 | 1 | 1.00 | 2 | 1.00 |
| II | MSC | JTTFG4 | 1 | 100 | 1 | 99.8 | 2 | 83.3 |
| | | | | | | | 3 | 15.8 |
| III | ML | JTTFG4 | 1 | 100 | 1 | 97 | 2 | 52 |
| III | ML | GTRG4 | 1 | 100 | 1 | 100 | 1 | 59 |
| III | Bayesian | JTTFG4 | 1 | 1.00 | 1 | 1.00 | 2 | 1.00 |
| III | Bayesian | GTRG4 | 1 | 1.00 | 1 | 1.00 | 1 | 1.00 |
| III | MSC | JTTFG4 | 1 | 99.4 | 1 | 98.4 | 2 | 48.1 |
Note.—BP: bootstrap probability (for ML and MSC methods). PP: posterior probability (for Bayesian analyses).
Number of Genes that Supported Trees 1, 2, and 3
| Data Set | Tree | CF + RF | CF | RF |
|---|---|---|---|---|
| II | 1 | 517 | 522 | 429 |
| II | 2 | 342 | 401 | 425 |
| II | 3 | 429 | 365 | 434 |
| II | P value | 0.0001 | 0.0005 | 0.9767 |
| III | 1 | 335 | 326 | 278 |
| III | 2 | 229 | 273 | 319 |
| III | 3 | 267 | 232 | 234 |
| III | P value | 0.006 | 0.0157 | 0.0373 |
Note.—“P value” indicates that from a chi-square test.
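The note does not say which null hypothesis the chi-square test used. As a rough sketch only, the Python below assumes the simplest one: that the three trees are supported by equal numbers of genes, giving a goodness-of-fit statistic with 2 degrees of freedom. The names `GENE_COUNTS`, `equal_support_chi_square`, and `p_value_df2` are illustrative and do not come from the study, and because the authors' exact test is not described here, the P values this sketch prints should not be expected to match those in the table.

```python
import math

# Gene counts supporting Trees 1, 2, and 3, copied from the table above.
# Keys are (data set, outgroup); values are (Tree 1, Tree 2, Tree 3).
GENE_COUNTS = {
    ("II", "CF + RF"): (517, 342, 429),
    ("II", "CF"): (522, 401, 365),
    ("II", "RF"): (429, 425, 434),
    ("III", "CF + RF"): (335, 229, 267),
    ("III", "CF"): (326, 273, 232),
    ("III", "RF"): (278, 319, 234),
}


def equal_support_chi_square(counts):
    """Goodness-of-fit statistic against equal expected counts (assumed null)."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For 2 degrees of freedom the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-stat / 2.0)


if __name__ == "__main__":
    for (data_set, outgroup), counts in GENE_COUNTS.items():
        stat = equal_support_chi_square(counts)
        print(f"data set {data_set}, outgroup {outgroup}: "
              f"chi-square = {stat:.2f}, P = {p_value_df2(stat):.3g}")
```

Under this assumed null, the qualitative pattern agrees with the table for data set II: the counts depart strongly from equal support with the CF + RF and CF outgroups and hardly at all with the RF outgroup.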
Figure. Network trees. (a) Data set I. (b) Data set II. (c) Data set III. The neighbor-net with JTTG distance was used.
Average Branch Lengths of Trees of the Five Taxonomic Groups
| Tree | Data Set | Tetrapod | Lungfish | Coelacanth | RF | CF | b1 | b2 | r1 | r2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | I | 0.211 | 0.162 | 0.119 | 0.284 | 0.209 | 0.009 | 0.020 | 1.37 | 1.36 |
| 1 | II | 0.205 | 0.171 | 0.146 | 0.291 | 0.207 | 0.010 | 0.018 | 1.17 | 1.40 |
| 1 | III | 0.177 | 0.153 | 0.114 | 0.292 | 0.196 | 0.009 | 0.017 | 1.34 | 1.49 |
| 2 | I | 0.213 | 0.161 | 0.120 | 0.283 | 0.209 | 0.008 | 0.019 | 1.34 | 1.35 |
| 2 | II | 0.208 | 0.172 | 0.147 | 0.291 | 0.207 | 0.007 | 0.018 | 1.17 | 1.40 |
| 2 | III | 0.179 | 0.154 | 0.115 | 0.292 | 0.196 | 0.007 | 0.017 | 1.33 | 1.49 |
| 3 | I | 0.213 | 0.160 | 0.116 | 0.284 | 0.210 | 0.010 | 0.021 | 1.38 | 1.35 |
| 3 | II | 0.209 | 0.169 | 0.143 | 0.291 | 0.208 | 0.009 | 0.020 | 1.18 | 1.40 |
| 3 | III | 0.180 | 0.152 | 0.112 | 0.292 | 0.196 | 0.008 | 0.018 | 1.35 | 1.49 |
Note.—Tetrapod, lungfish, coelacanth, RF, and CF refer to the branch lengths leading to the taxa (supplementary fig. S5, Supplementary Material online); r1 and r2 refer to the ratio of branch lengths of lungfish to coelacanth and of RF to CF, respectively; the average of the lengths to all species was taken for the length of the tetrapod, RF, and CF branches; b1 and b2 refer to internal branches connecting coelacanth and lungfish and RF and CF, respectively (supplementary fig. S5, Supplementary Material online); branch lengths were estimated by the ML method with JTTFG4.
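The ratios r1 and r2 defined in the note can be recomputed from the branch lengths in the same rows. The Python below is a minimal sketch using the Tree 1 rows; `TREE1_ROWS`, `branch_ratios`, and `recompute_all` are illustrative names, not part of the study. Because the tabulated branch lengths are rounded to three decimals, the recomputed ratios can differ from the reported ones in the second decimal place.

```python
# Branch lengths copied from the Tree 1 rows of the table above.
# Keys are data sets; values are
# (lungfish, coelacanth, RF, CF, reported r1, reported r2).
TREE1_ROWS = {
    "I": (0.162, 0.119, 0.284, 0.209, 1.37, 1.36),
    "II": (0.171, 0.146, 0.291, 0.207, 1.17, 1.40),
    "III": (0.153, 0.114, 0.292, 0.196, 1.34, 1.49),
}


def branch_ratios(lungfish, coelacanth, rf, cf):
    """Return (r1, r2) = (lungfish / coelacanth, RF / CF), as defined in the note."""
    return lungfish / coelacanth, rf / cf


def recompute_all(rows=TREE1_ROWS):
    """Recompute r1 and r2 for every data set from the tabulated branch lengths."""
    return {
        name: branch_ratios(lungfish, coelacanth, rf, cf)
        for name, (lungfish, coelacanth, rf, cf, _r1, _r2) in rows.items()
    }


if __name__ == "__main__":
    for name, (r1, r2) in recompute_all().items():
        _, _, _, _, reported_r1, reported_r2 = TREE1_ROWS[name]
        print(f"data set {name}: r1 = {r1:.3f} (reported {reported_r1}), "
              f"r2 = {r2:.3f} (reported {reported_r2})")
```

Both ratios exceed 1 in every data set, which is the sense in which the lungfish branch is longer than the coelacanth branch and the RF branch is longer than the CF branch.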
Differences in Amino Acid Frequencies among the Taxonomic Groups
| Data Set | Group | Tetrapod | Lungfish | Coelacanth | RF | CF |
|---|---|---|---|---|---|---|
| I | Tetrapod | — | 39.8* | 17.1 | 41.7* | 40.4* |
| I | Lungfish | 6 | — | 14.1 | 92.3* | 22.2 |
| I | Coelacanth | 0 | 0 | — | 71.1* | 31.5 |
| I | RF | 11 | 7 | 7 | — | 147.3* |
| I | CF | 5 | 0 | 1 | 7 | — |
| II | Tetrapod | — | 92.8* | 72.0* | 137.2* | 41.3* |
| II | Lungfish | 9 | — | 31.6 | 305.1* | 31.0 |
| II | Coelacanth | 4 | 1 | — | 301.8* | 42.5* |
| II | RF | 12 | 13 | 12 | — | 174.6* |
| II | CF | 9 | 5 | 6 | 10 | — |
| III | Tetrapod | — | 27.0 | 19.9 | 48.6* | 23.4 |
| III | Lungfish | 1 | — | 7.5 | 73.8* | 9.8 |
| III | Coelacanth | 1 | 0 | — | 78.4* | 12.6 |
| III | RF | 15 | 10 | 12 | — | 165.3* |
| III | CF | 10 | 0 | 0 | 12 | — |
Note.—Upper diagonal elements are chi-square values between the taxonomic groups. An asterisk indicates that the value is significant at the 1% level. Lower diagonal elements show the number of amino acids for which the z-test was significant (at the 1% level); for data set II, only shared sites were used.
The Number of Replications in which Tree 1–3 Topologies Were Obtained in a Simulation When Tree 1 Was Assumed
| Data Set | Tree Constructed | Estimated, CF + RF | Estimated, RF | Estimated, CF | bR × 2, CF + RF | bR × 2, RF | bR × 2, CF | b1/10, CF + RF | b1/10, RF | b1/10, CF |
|---|---|---|---|---|---|---|---|---|---|---|
| I | 1 | 597.5 | 524.0 | 536.8 | 572.0 | 474.7 | 553.8 | 383.8 | 334.2 | 358.2 |
| I | 2 | 213.5 | 217.5 | 230.8 | 237.0 | 279.7 | 237.3 | 305.3 | 353.2 | 309.2 |
| I | 3 | 189.0 | 258.5 | 232.3 | 191.0 | 245.7 | 208.8 | 310.8 | 312.7 | 332.7 |
| II | 1 | 603.3 | 531.0 | 547.0 | 590.0 | 460.0 | 563.0 | 366.2 | 335.5 | 369.8 |
| II | 2 | 210.3 | 238.5 | 226.5 | 235.5 | 294.0 | 223.5 | 341.7 | 349.0 | 304.3 |
| II | 3 | 186.3 | 230.5 | 226.5 | 174.5 | 246.0 | 213.5 | 292.2 | 315.5 | 325.8 |
| III | 1 | 599.8 | 520.5 | 534.3 | 551.7 | 462.5 | 516.5 | 389.8 | 338.5 | 346.0 |
| III | 2 | 196.8 | 247.0 | 238.3 | 232.7 | 278.5 | 239.5 | 306.8 | 342.5 | 317.5 |
| III | 3 | 203.3 | 232.5 | 227.3 | 215.7 | 259.0 | 244.0 | 303.3 | 319.0 | 336.5 |
Note.—bR × 2 refers to the case in which the length of the branch to RF was elongated to two times the estimated value when sequences were generated. b1/10 refers to the case in which the length of the internal branch was reduced to one-tenth of the estimated value; when CF + RF was used as the outgroup, the likelihood values of the three topologies shown in supplementary figure S5, Supplementary Material online, were compared. The JTTG model was assumed to generate sequences using the estimated gamma parameter values 0.461, 0.501, and 0.394 for data sets I, II, and III, respectively; 1,000 replications were carried out.
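Since Tree 1 was the topology assumed when generating the sequences, the Tree 1 row gives the number of replications in which the true tree was recovered. The Python below is a small sketch that converts the data set I counts into recovery proportions; `DATASET_I`, `recovery_rate`, and `recovery_table` are illustrative names, and the fractional counts are used exactly as tabulated.

```python
# Counts of replications yielding Trees 1, 2, and 3 for data set I,
# copied from the table above. Outer keys are the simulation conditions,
# inner keys the outgroups; values are (Tree 1, Tree 2, Tree 3).
DATASET_I = {
    "estimated": {
        "CF + RF": (597.5, 213.5, 189.0),
        "RF": (524.0, 217.5, 258.5),
        "CF": (536.8, 230.8, 232.3),
    },
    "bR x 2": {
        "CF + RF": (572.0, 237.0, 191.0),
        "RF": (474.7, 279.7, 245.7),
        "CF": (553.8, 237.3, 208.8),
    },
    "b1/10": {
        "CF + RF": (383.8, 305.3, 310.8),
        "RF": (334.2, 353.2, 312.7),
        "CF": (358.2, 309.2, 332.7),
    },
}


def recovery_rate(counts):
    """Proportion of replications that returned Tree 1 (the assumed tree).

    Dividing by the row sum rather than a fixed 1,000 absorbs the small
    rounding differences in the tabulated counts.
    """
    return counts[0] / sum(counts)


def recovery_table(data=DATASET_I):
    """Map each (condition, outgroup) pair to its Tree 1 recovery proportion."""
    return {
        (condition, outgroup): recovery_rate(counts)
        for condition, by_outgroup in data.items()
        for outgroup, counts in by_outgroup.items()
    }


if __name__ == "__main__":
    for (condition, outgroup), rate in recovery_table().items():
        print(f"{condition:>9} | {outgroup:<7} | Tree 1 recovered in {rate:.1%}")
```

For data set I, the recovery proportion with the RF outgroup is lower when the RF branch is doubled than with the estimated branch lengths, and it approaches one third for every outgroup when the internal branch is reduced to one-tenth.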