Blaise Li, João S Lopes, Peter G Foster, T Martin Embley, Cymon J Cox.
Abstract
Archaeplastida (=Kingdom Plantae) are primary plastid-bearing organisms that evolved via the endosymbiotic association of a heterotrophic eukaryote host cell and a cyanobacterial endosymbiont approximately 1,400 Ma. Here, we present analyses of cyanobacterial and plastid genomes that show strongly conflicting phylogenies based on 75 plastid (or nuclear plastid-targeted) protein-coding genes and their direct translations to proteins. The conflict between genes and proteins is largely robust to the use of sophisticated data- and tree-heterogeneous composition models. However, by using nucleotide ambiguity codes to eliminate synonymous substitutions due to codon-degeneracy, we identify a composition bias, and dependent codon-usage bias, resulting from synonymous substitutions at all third codon positions and first codon positions of leucine and arginine, as the main cause for the conflicting phylogenetic signals. We argue that the protein-coding gene data analyses are likely misleading due to artifacts induced by convergent composition biases at first codon positions of leucine and arginine and at all third codon positions. Our analyses corroborate previous studies based on gene sequence analysis that suggest Cyanobacteria evolved by the early paraphyletic splitting of Gloeobacter and a specific Synechococcus strain (JA33Ab), with all other remaining cyanobacterial groups, including both unicellular and filamentous species, forming the sister-group to the Archaeplastida lineage. In addition, our analyses using better-fitting models suggest (but without statistically strong support) an early divergence of Glaucophyta within Archaeplastida, with the Rhodophyta (red algae), and Viridiplantae (green algae and land plants) forming a separate lineage.
Keywords: Archaeplastida; Cyanobacteria; origin of plastids; phylogeny
Year: 2014 PMID: 24795089 PMCID: PMC4069611 DOI: 10.1093/molbev/msu105
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Summary of Phylogenetic Support Values.
| Analysis | Plastids Sister to | SO-6 | UNIT+ | Glaucophyta Sister to | |||
|---|---|---|---|---|---|---|---|
| “core” | OSC-2 | Monophyletic | Monophyletic | Rhodophyta | Viridiplantae | ||
| cg75_mlboot | −0.72 | 0.72 | −0.72 | 0.72 | −0.70 | 0.59 | −0.59 |
| cp75_mlboot | 0.99 | −0.99 | −1.00 | 1.00 | 1.00 | 0.70 | −0.70 |
| cp75_stat | 1.00 | −1.00 | −1.00 | 1.00 | 1.00 | 1.00 | −1.00 |
| cp75_CAT | 1.00 | −1.00 | −1.00 | 1.00 | 1.00 | −0.94 | −0.94 |
| cp75_NDCH | 1.00 | −1.00 | −1.00 | 1.00 | 1.00 | −0.71 | −0.71 |
| cg75_stat | −1.00 | 1.00 | −1.00 | 1.00 | −1.00 | 1.00 | −1.00 |
| cg75_NDCH | −1.00 | 1.00 | −1.00 | 1.00 | 1.00 | 1.00 | −1.00 |
| cg75_degen3 | −1.00 | 0.89 | −1.00 | 1.00 | 1.00 | 0.98 | −0.98 |
| cg75_degen | 0.98 | −0.98 | −1.00 | 1.00 | 1.00 | 0.80 | −0.80 |
| cg75_degenLR3 | 0.95 | −0.95 | −1.00 | 1.00 | 1.00 | 0.83 | −0.83 |
| cg75_degen1LR | −0.91 | 0.91 | −0.91 | 0.91 | −0.74 | 0.45 | −0.45 |
| cg75_degen12S | −0.92 | −0.92 | 0.78 | −0.92 | −0.91 | −0.82 | 0.82 |
Note: BPs (“mlboot”) or posterior probabilities are shown for relationships (columns) for selected analyses (rows). A positive value is the support for the relationship; a negative value is the support for its most strongly supported conflicting node, where appropriate. When the relationship is a sister-group relationship, the support value reported is the lowest among those of the two monophyletic sister groups and of the clade formed by those two groups. The “core” refers to the “core-cyanobacteria” group, defined as all Cyanobacteria present in our taxonomic sampling except the early-diverging GBACT taxa. The “degen*” analyses are performed under standard ML, but a proportion of the signal associated with codon synonymy is suppressed by recoding some of the codon positions where codon degeneracy exists (supplementary table S1, Supplementary Material online): “1LR” designates the signal associated with first codon position synonymy among leucine and arginine codons; “12S” designates the signal associated with first and second codon position synonymy among serine codons; “3” designates the signal associated with third codon position synonymy among all codon families. “CAT” designates the site-heterogeneous composition model implemented in PhyloBayes. “stat” indicates a stationary composition model, and “NDCH” designates the nonstationary (tree-heterogeneous) composition model implemented in P4.
Fig. 1. ML bootstrap analysis of the protein-coding gene data set “cg75” and 50% majority-rule consensus tree of 200 ML bootstrap trees. Values above the branches are BPs. Colors indicate taxonomic groups (supplementary table S1, Supplementary Material online): Bacteria (purple), Cyanobacteria (blue), Glaucophyta (orange), Rhodophyta (red), and Viridiplantae (green). Note that Prochlorococcus is attracted to the Archaeplastida clade, causing lower support values between the two points of attachment.
Fig. ML bootstrap analysis of the protein data set “cp75” and 50% majority-rule consensus tree of 200 ML bootstrap trees. Values above branches are BPs. Colors indicate taxonomic group (refer to the legend of fig. 1).
Fig. Simplified ML bootstrap tree for the recoded protein-coding gene data set “cg75_degen12S” and 50% majority-rule consensus tree of 200 ML bootstrap trees. Clades are labeled by their group label where possible. The codon usage bias and G + C proportions at the three codon positions of the original “cg75” data set (i.e., without recoding) are presented to the right of the taxa (average values are given for summarized groups). This tree was chosen to display codon usage biases and G + C proportions because it seems to exemplify reconstruction errors induced by compositional effects. The topology of this tree somewhat correlates with composition and codon usage biases. Codon usage bias among Leu, Ser, and Arg is measured as the of the unbiased ratio between the usage of the two families of codons, where the number of occurrences of codons of a family is divided by the number of possible codons in that family (2 or 4). The codon bias representation is inspired by figure 1 of Inagaki and Roger (2006). Values above branches are BPs. Colors indicate taxonomic group (refer to the legend of fig. 1). *Prochlorococcus is an abbreviation of Prochlorococcus marinus (SO-6).
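The codon usage bias described in this legend rests on a ratio of per-codon usage between the two codon families of an amino acid. A minimal sketch follows, assuming the standard genetic code families for Leu (TTR vs. CTN), Ser (AGY vs. TCN), and Arg (AGR vs. CGN). The function name `family_usage_ratio`, the direction of the ratio (two-codon family over four-codon family), and the handling of an unused four-codon family are assumptions made here. The transformation that the legend applies to this ratio is not reproduced, because it is missing from the text.

```python
from collections import Counter

# Two-codon and four-codon families for the three sixfold-degenerate amino acids
# (standard genetic code; an assumption of this sketch).
FAMILIES = {
    "Leu": (("TTA", "TTG"), ("CTT", "CTC", "CTA", "CTG")),
    "Ser": (("AGT", "AGC"), ("TCT", "TCC", "TCA", "TCG")),
    "Arg": (("AGA", "AGG"), ("CGT", "CGC", "CGA", "CGG")),
}


def family_usage_ratio(seq, amino_acid):
    """Ratio of per-codon usage between the two codon families of `amino_acid`
    in an in-frame, uppercase coding sequence.
    Each family count is divided by the number of codons in the family (2 or 4),
    so that equal per-codon usage gives a ratio of 1.
    Returns None when the four-codon family is not used at all."""
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    two, four = FAMILIES[amino_acid]
    per_codon_two = sum(counts[c] for c in two) / len(two)
    per_codon_four = sum(counts[c] for c in four) / len(four)
    if per_codon_four == 0:
        return None
    return per_codon_two / per_codon_four
```

For example, a sequence using each of the six leucine codons once gives a ratio of 1, meaning that neither family is favored once family size is accounted for.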