Bin Lu, Weizhao Yang, Qiang Dai, Jinzhong Fu.
Abstract
The phylogenetic position of turtles within the vertebrate tree of life remains controversial. Conflicting conclusions from different studies are likely a consequence of systematic error in the tree construction process, rather than random error from small amounts of data. Using genomic data, we evaluate the phylogenetic position of turtles with both conventional concatenated data analysis and a "genes as characters" approach. Two datasets were constructed, one with seven species (human, opossum, zebra finch, chicken, green anole, Chinese pond turtle, and western clawed frog) and 4584 orthologous genes, and the second with four additional species (soft-shelled turtle, Nile crocodile, royal python, and tuatara) but only 1638 genes. Our concatenated data analysis strongly supported turtle as the sister-group to archosaurs (the archosaur hypothesis), similar to several recent genomic data based studies using similar methods. When using genes as characters and gene trees as character-state trees with equal weighting for each gene, however, our parsimony analysis suggested that turtles are possibly sister-group to diapsids, archosaurs, or lepidosaurs. None of these resolutions were strongly supported by bootstraps. Furthermore, our incongruence analysis clearly demonstrated that there is a large amount of inconsistency among genes and most of the conflict relates to the placement of turtles. We conclude that the uncertain placement of turtles is a reflection of the true state of nature. Concatenated data analysis of large and heterogeneous datasets likely suffers from systematic error and over-estimates of confidence as a consequence of a large number of characters. Using genes as characters offers an alternative for phylogenomic analysis. It has potential to reduce systematic error, such as data heterogeneity and long-branch attraction, and it can also avoid problems associated with computation time and model selection. 
Finally, treating genes as characters provides a convenient method for examining gene and genome evolution.
Year: 2013 PMID: 24278129 PMCID: PMC3836853 DOI: 10.1371/journal.pone.0079348
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Alternative placements of turtles in the current phylogeny of living tetrapods.
Figure 2. The procedural flowchart of using genes as characters in phylogenomic analysis.
Figure 3. The phylogenetic hypotheses derived from the 7-species data.
Amino-acid and nucleotide sequences were analyzed by maximum parsimony (MP) and maximum likelihood (ML) methods, respectively. Numbers near the nodes are bootstrap values.
Figure 4. The phylogenetic hypotheses derived from the 11-species data.
Amino-acid and nucleotide sequences were analyzed by maximum parsimony (MP) and maximum likelihood (ML) methods, respectively. A Bayesian tree from nucleotide sequences and an MP tree from the 1st and 2nd codon position sequences are also presented. Numbers near the nodes are bootstrap values or Bayesian posterior probabilities (E).
A simple tally of genes that support alternative hypotheses.
| Hypothesis | Tree topology | Number of genes (nucleotides) | Number of proteins (amino acids) |
| 7-species dataset | | | |
| Archosaur Hypothesis | ((ZF,Ch),PT) | 2117 (46%) | 1868 (41%) |
| | (((ZF,Ch),PT),An) | 1502 | 1327 |
| | (((((ZF,Ch),PT),An),(Hu,Op)),Fr) | 1281 | 1149 |
| Lepidosaur Hypothesis | (PT,An) | 798 (17%) | 811 (18%) |
| | ((PT,An),(ZF,Ch)) | 618 | 597 |
| | ((((PT,An),(ZF,Ch)),(Hu,Op)),Fr) | 500 | 501 |
| Diapsid Hypothesis | ((ZF,Ch,An),PT) | 460 (10%) | 535 (12%) |
| | (((((ZF,Ch),An),PT),(Hu,Op)),Fr) | 379 | 450 |
| Other topologies | | 1209 (26%) | 1370 (30%) |
| 11-species dataset | | | |
| Archosaur Hypothesis | (((ZF,Ch),Cr),(PT,ST)) | 222 (14%) | 80 (5%) |
| | ((((ZF,Ch),Cr),(PT,ST)),((An,Py),Tu)) | 106 | 32 |
| Lepidosaur Hypothesis | (((An,Py),Tu),(PT,ST)) | 121 (7%) | 80 (5%) |
| | ((((An,Py),Tu),(PT,ST)),((ZF,Ch),Cr)) | 32 | 21 |
| Crocodilian Hypothesis | ((Cr,(PT,ST)),(ZF,Ch)) | 90 (5%) | 22 (1%) |
| Bird Hypothesis | (((ZF,Ch),(PT,ST)),Cr) | 47 (3%) | 17 (1%) |
| Other topologies | | 1158 (71%) | 1439 (88%) |
Hu = Human (Homo sapiens), Fr = Western Clawed Frog (Xenopus tropicalis), ZF = Zebra Finch (Taeniopygia guttata), Ch = Chicken (Gallus gallus), An = Green Anole (Anolis carolinensis), PT = Chinese Pond Turtle (Mauremys reevesii), ST = Soft-shelled Turtle (Pelodiscus sinensis), Py = Royal Python (Python regius), Tu = Tuatara (Sphenodon punctatus), Op = Opossum (Monodelphis domestica), Cr = Nile Crocodile (Crocodylus niloticus).
For example, the clade ((ZF,Ch),PT) appears on 2117 gene trees, regardless of the other relationships on those trees. The 7-species dataset includes 4,584 putatively orthologous proteins and coding genes; the 11-species dataset includes 1,638 putatively orthologous proteins and coding genes.
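The tally described in the note above amounts to asking, for each gene tree, whether a given set of taxa forms a clade. A minimal Python sketch follows, assuming rooted Newick topologies without branch lengths. The helper names `clades` and `count_support` and the four toy gene trees are hypothetical illustrations, not the authors' pipeline or data.

```python
def clades(newick):
    """Return every clade of a rooted Newick topology as a frozenset of taxa.

    Assumes plain topologies without branch lengths or support values,
    e.g. '(((ZF,Ch),PT),An)'.
    """
    found = set()
    stack = []   # one set of taxon labels per currently open parenthesis
    label = ""
    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            stack.append(set())
        elif ch in ",)":
            if label:                 # finish the taxon label just read
                stack[-1].add(label)
                label = ""
            if ch == ")":             # closing a group defines one clade
                group = stack.pop()
                found.add(frozenset(group))
                if stack:             # its taxa also belong to the parent group
                    stack[-1] |= group
        elif not ch.isspace():
            label += ch
    return found


def count_support(gene_trees, taxa):
    """Count gene trees in which `taxa` form a clade, ignoring all other relationships."""
    target = frozenset(taxa)
    return sum(target in clades(tree) for tree in gene_trees)


# Toy gene trees (hypothetical), using the abbreviations of the table above.
toy_trees = [
    "(((((ZF,Ch),PT),An),(Hu,Op)),Fr)",
    "((((PT,An),(ZF,Ch)),(Hu,Op)),Fr)",
    "(((((ZF,Ch),An),PT),(Hu,Op)),Fr)",
    "(((((ZF,Ch),PT),An),(Hu,Op)),Fr)",
]

archosaur = count_support(toy_trees, {"ZF", "Ch", "PT"})
lepidosaur = count_support(toy_trees, {"PT", "An"})
birds_plus_anole = count_support(toy_trees, {"ZF", "Ch", "An"})
print(archosaur, lepidosaur, birds_plus_anole)
```

Comparing sets of taxa rather than strings means the order in which sister taxa are written does not matter, so `(Ch,ZF)` and `(ZF,Ch)` count as the same clade.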
Figure 5. The phylogenetic hypotheses derived from using genes as characters.
Individual gene trees were first estimated and used as character-state trees. Each gene was then treated as a character and a parsimony analysis was used to construct the species trees. Amino-acid and nucleotide sequences were analyzed separately. Numbers near the nodes are bootstrap values. Note the different placements of turtles and the low bootstrap values for the associated nodes.
Figure 6. Results of the SplitsTree analysis.
Loops or boxes indicate the location of incongruence on the networks. The analysis examined four datasets, and all suggested strong incongruence regarding the phylogenetic position of turtles.
Figure 7. Heatmaps from the Phylcon analysis.
Each vertical line represents a gene and different colors represent different p values from the AU tests. A small p value (dark green) indicates that a gene rejects a topology. Topologies examined: 7-species data: 1. (((((ZF, Ch), PT),An),(Hu, Op)), Fr); 2. ((((ZF, Ch),(PT, An)),(Hu, Op)), Fr); 3. (((((ZF, Ch),An), PT),(Hu, Op)), Fr). 11-species data: 1. (((Hu, Op),((((ZF, Ch), Cr),(PT, ST)),((An,Py),Tu))),Fr); 2. (((Hu, Op),(((ZF, Ch), Cr),((PT, ST),((An,Py),Tu)))),Fr); 3. (((Hu, Op),(((ZF, Ch),(Cr,(PT, ST))),((An,Py),Tu))),Fr); 4. (((Hu, Op),((((ZF, Ch),(PT, ST)), Cr),((An,Py),Tu))),Fr). Hu = Human (Homo sapiens), Fr = Western Clawed Frog (Xenopus tropicalis), ZF = Zebra Finch (Taeniopygia guttata), Ch = Chicken (Gallus gallus), An = Green Anole (Anolis carolinensis), PT = Chinese Pond Turtle (Mauremys reevesii), ST = Soft-shelled Turtle (Pelodiscus sinensis), Py = Royal Python (Python regius), Tu = Tuatara (Sphenodon punctatus), Op = Opossum (Monodelphis domestica), Cr = Nile Crocodile (Crocodylus niloticus).
Chi-square test of the relationship between genes under positive selection and alternative hypotheses.
| Number of genes supporting a clade | ((ZF, Ch), PT) | (PT, An) |
| Under positive selection | 143 (130.66) | 37 (49.34) |
| Neutral | 1917 (1929.35) | 741 (728.66) |
| Total | 2060 | 778 |
| Chi-square value | 4.54 | |
| P | | |
ZF = Zebra Finch (Taeniopygia guttata), Ch = Chicken (Gallus gallus), PT = Chinese Pond Turtle (Mauremys reevesii), An = Green Anole (Anolis carolinensis).
Numbers in parentheses are expected numbers of genes under random distribution. Significantly more positively selected genes support the archosaur hypothesis.
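The chi-square value in the table above can be checked directly from the four observed counts. The sketch below uses the standard shortcut formula for a 2×2 table and, assuming one degree of freedom and no continuity correction, obtains the P value from the complementary error function. The function name `chi_square_2x2` is hypothetical, and this is not necessarily the software the authors used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability for 1 degree
    of freedom: p = erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Observed counts from the table above:
# rows = under positive selection / neutral,
# columns = genes supporting ((ZF, Ch), PT) / genes supporting (PT, An).
chi2, p = chi_square_2x2(143, 37, 1917, 741)
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")
```

With these counts the statistic rounds to the reported 4.54, and the corresponding P value is below 0.05, consistent with the note that significantly more positively selected genes support the archosaur hypothesis.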
Chi-square tests of the relationship between gene functions and phylogenetic hypotheses.
| GO/KEGG category | A+ | A− | O+ | O− | χ² | P |
| Archosaur hypothesis | | | | | | |
| Biological process | | | | | | |
| organophosphate metabolic process (GO:0019637) | 61 (49) | 1975 (1987) | 46 (58) | 2342 (2330) | 5.3 | 0.0210 |
| hormone metabolic process (GO:0042445) | 49 (39) | 1987 (1997) | 35 (45) | 2353 (2343) | 5.2 | 0.0223 |
| macromolecule metabolic process (GO:0043170) | 1180 (1148) | 856 (888) | 1314 (1346) | 1074 (1042) | 3.8 | 0.0500 |
| primary metabolic process (GO:0044238) | 1419 (1385) | 617 (651) | 1590 (1624) | 798 (764) | 4.9 | 0.0269 |
| Cellular component | | | | | | |
| organelle lumen (GO:0043233) | 682 (648) | 1354 (1388) | 726 (760) | 1662 (1628) | 4.8 | 0.0276 |
| extracellular matrix (GO:0031012) | 77 (65) | 1959 (1971) | 64 (76) | 2324 (2312) | 4.3 | 0.0376 |
| extracellular matrix part (GO:0044420) | 37 (25) | 1999 (2011) | 18 (30) | 2370 (2358) | 10.1 | 0.0015 |
| Molecular function | | | | | | |
| nucleotide binding (GO:0000166) | 409 (381) | 1627 (1655) | 419 (447) | 1969 (1941) | 4.7 | 0.0307 |
| hormone binding (GO:0042562) | 25 (18) | 2011 (2018) | 15 (22) | 2373 (2366) | 4.4 | 0.0357 |
| cofactor binding (GO:0048037) | 72 (60) | 1964 (1976) | 59 (71) | 2329 (2317) | 4.3 | 0.0372 |
| KEGG pathway | | | | | | |
| inositol phosphate metabolism (map00562) | 28 (18) | 318 (328) | 10 (20) | 375 (365) | 11.2 | 0.0008 |
| phosphatidylinositol signaling system (map04070) | 31 (22) | 315 (324) | 15 (24) | 370 (361) | 7.9 | 0.0049 |
| arginine and proline metabolism (map00330) | 24 (18) | 322 (328) | 14 (20) | 371 (365) | 4.0 | 0.0448 |
| Lepidosaur hypothesis | L+ | L− | O+ | O− | | |
| Biological process | | | | | | |
| cell activation (GO:0001775) | 76 (61) | 701 (716) | 274 (289) | 3373 (3358) | 4.5 | 0.0334 |
| translational initiation (GO:0006413) | 17 (10) | 760 (767) | 40 (47) | 3607 (3600) | 6.0 | 0.0143 |
| cellular component disassembly (GO:0022411) | 45 (34) | 732 (743) | 149 (160) | 3498 (3487) | 4.4 | 0.0350 |
| fertilization (GO:0009566) | 14 (8) | 763 (769) | 33 (39) | 3614 (3608) | 4.9 | 0.0268 |
| pollination (GO:0009856) | 16 (10) | 761 (767) | 40 (46) | 3607 (3601) | 4.7 | 0.0293 |
| system process (GO:0003008) | 156 (135) | 621 (642) | 610 (631) | 3037 (3016) | 5.0 | 0.0250 |
| digestion (GO:0007586) | 13 (6) | 764 (771) | 23 (30) | 3624 (3617) | 8.6 | 0.0033 |
| stem cell maintenance (GO:0019827) | 14 (8) | 763 (769) | 34 (40) | 3613 (3607) | 4.5 | 0.0336 |
| regulation of multi-organism process (GO:0043900) | 15 (9) | 762 (768) | 34 (40) | 3613 (3607) | 5.8 | 0.0158 |
| Cellular component | | | | | | |
| cell projection (GO:0042995) | 130 (110) | 647 (667) | 499 (519) | 3148 (3128) | 4.9 | 0.0271 |
| membrane part (GO:0044425) | 329 (296) | 448 (481) | 1355 (1388) | 2292 (2259) | 7.3 | 0.0068 |
| Molecular function | | | | | | |
| protein complex scaffold (GO:0032947) | 12 (5.6) | 765 (771) | 20 (26) | 3627 (3621) | 8.8 | 0.0029 |
| KEGG pathway | | | | | | |
| glycerolipid metabolism (map00561) | 14 (7) | 119 (126) | 25 (32) | 573 (566) | 8.7 | 0.0032 |
| glycerophospholipid metabolism (map00564) | 13 (7) | 120 (126) | 28 (34) | 570 (564) | 5.3 | 0.0210 |
Only significant results are presented. A: the archosaur hypothesis; L: the lepidosaur hypothesis; O: other hypotheses. +: number of genes which have a particular GO category or KEGG pathway; −: number of genes which do not have the GO category or KEGG pathway. Numbers in parentheses are expected numbers of genes under random distribution. For example, of the 2117 genes that support the Archosaur hypothesis, 2036 genes have the GO category of “biological process”; among them, 61 have the GO category of “organophosphate metabolic process” and 1975 genes do not have the term.
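The expected numbers in parentheses follow from the row and column totals of each 2×2 table under the assumption that gene function and supported hypothesis are independent. A small Python sketch of that calculation is given here for the "hormone metabolic process" row; the function name `expected_counts` is a hypothetical helper, not part of the authors' workflow.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts for hormone metabolic process (GO:0042445):
# rows = archosaur hypothesis (A) / other hypotheses (O),
# columns = genes with the term (+) / genes without it (−).
observed = [[49, 1987], [35, 2353]]
expected = expected_counts(observed)
rounded = [[round(value) for value in row] for row in expected]
print(rounded)
```

Rounded to whole genes, the result matches the parenthesized values given for that row: 39, 1997, 45 and 2343.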