Monique Turmel, Jean-Charles de Cambiaire, Christian Otis, Claude Lemieux.
Abstract
The Chlorodendrophyceae is a small class of green algae belonging to the core Chlorophyta, an assemblage that also comprises the Pedinophyceae, Trebouxiophyceae, Ulvophyceae and Chlorophyceae. Here we describe for the first time the chloroplast genomes of chlorodendrophycean algae (Scherffelia dubia, 137,161 bp; Tetraselmis sp. CCMP 881, 100,264 bp). Characterized by a very small single-copy (SSC) region devoid of any gene and an unusually large inverted repeat (IR), the quadripartite structures of the Scherffelia and Tetraselmis genomes are unique among all core chlorophytes examined thus far. The lack of genes in the SSC region is offset by the rich and atypical gene complement of the IR, which includes genes from the SSC and large single-copy regions of prasinophyte and streptophyte chloroplast genomes having retained an ancestral quadripartite structure. Remarkably, seven of the atypical IR-encoded genes have also been observed in the IRs of pedinophycean and trebouxiophycean chloroplast genomes, suggesting that they were already present in the IR of the common ancestor of all core chlorophytes. Considering that the relationships among the main lineages of the core Chlorophyta are still unresolved, we evaluated the impact of including the Chlorodendrophyceae in chloroplast phylogenomic analyses. The trees we inferred using data sets of 79 and 108 genes from 71 chlorophytes indicate that the Chlorodendrophyceae is a deep-diverging lineage of the core Chlorophyta, although the placement of this class relative to the Pedinophyceae remains ambiguous. Interestingly, some of our phylogenomic trees together with our comparative analysis of gene order data support the monophyly of the Trebouxiophyceae, thus offering further evidence that the previously observed affiliation between the Chlorellales and Pedinophyceae is the result of systematic errors in phylogenetic reconstruction.
Year: 2016 PMID: 26849226 PMCID: PMC4743939 DOI: 10.1371/journal.pone.0148934
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Gene maps of the Scherffelia and Tetraselmis chloroplast genomes.
Filled boxes represent genes, with colors denoting gene categories as indicated in the legend. Genes on the outside of each map are transcribed counterclockwise; those on the inside are transcribed clockwise. The middle ring indicates the positions of the IR, LSC and SSC regions. Thick lines in the innermost ring represent the gene clusters conserved between the two chlorodendrophycean cpDNAs.
General features of Scherffelia, Tetraselmis and other core chlorophyte chloroplast genomes.
| A+T (%) | Genome size (bp) | IR size (bp) | SSC size (bp) | Genes (no.) | Coding sequences (%) | Group I introns (no.) | Group II introns (no.) | Intron sequences (%) | Repeats (%) | |
|---|---|---|---|---|---|---|---|---|---|---|
| 67.4 | 137,161 | 32,310 | 3,385 | 104 | 58.5 | 3 | 4 | 8.4 | 0.3 | |
| 66.0 | 100,264 | 21,342 | 392 | 99 | 76.5 | 0 | ||||
| 59.7 | 94,262 | 9,926 | 6,225 | 105 | 75.3 | 0.3 | ||||
| 66.6 | 126,694 | 16,074 | 7,927 | 106 | 55.8 | 5 | 5 | 9.9 | 1.9 | |
| 70.0 | 123,994 | 10,913 | 13,871 | 112 | 63.3 | 1 | 0.2 | 4.0 | ||
| 63.3 | 109,775 | 12,798 | 17,968 | 113 | 74.1 | 1 | 0.2 | 4.2 | ||
| 67.3 | 187,843 | 18,786 | 10,954 | 109 | 42.5 | 1 | 1 | 1.0 | 22.7 | |
| 72.0 | 117,543 | 15,891 | 8,415 | 105 | 61.8 | 8 | 12.3 | 11.6 | ||
| 66.8 | 114,128 | 10,577 | 11,068 | 111 | 67.1 | 1 | 0.2 | 7.3 | ||
| 68.5 | 167,972 | 6,835 | 33,215 | 110 | 47.6 | 5.5 | ||||
| 68.4 | 145,947 | 6,786 | 16,399 | 109 | 52.5 | 10.2 | ||||
| 59.5 | 151,933 | 18,510 | 33,610 | 104 | 53.5 | 5 | 6.8 | 11.1 | ||
| 68.5 | 195,867 | 6,039 | 42,875 | 105 | 43.2 | 27 | 15.3 | 5.3 | ||
| 69.2 | 106,859 | 108 | 61.9 | 7 | 6 | 8.3 | 2.4 | |||
| 70.5 | 196,547 | 35,492 | 45,200 | 99 | 52.6 | 17 | 4 | 17.9 | 1.3 | |
| 73.1 | 161,452 | 12,023 | 64,967 | 97 | 56.1 | 7 | 2 | 7.9 | 2.6 | |
| 65.5 | 203,826 | 22,211 | 78,099 | 94 | 44.1 | 5 | 2 | 6.8 | 16.5 | |
a Intronic genes and freestanding ORFs not usually found in green plant chloroplast genomes are not included in these values. Duplicated genes were counted only once. The proportion of coding sequences in the genome is also provided.
b Number of group I (GI) and group II (GII) introns is given. The proportion of intron sequences in the genome is also provided.
c Nonoverlapping repeat elements were mapped on each genome with RepeatMasker using as input sequences the repeats of at least 30 bp identified with REPuter. The proportion of the estimated repeat sequences in the genome is given.
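The table reports genome, IR and SSC sizes but not the size of the large single-copy (LSC) region. As a minimal sketch, assuming a quadripartite genome made of one LSC region, one SSC region and exactly two IR copies, the LSC size can be derived by subtraction. The function name `lsc_size` is hypothetical, and the derived LSC values are not stated in the source; the input sizes are those listed for the two genomes whose totals match the abstract (137,161 bp and 100,264 bp).

```python
def lsc_size(genome_bp: int, ir_bp: int, ssc_bp: int) -> int:
    """Return the LSC size, assuming genome = LSC + SSC + 2 * IR."""
    lsc = genome_bp - 2 * ir_bp - ssc_bp
    if lsc < 0:
        raise ValueError("region sizes exceed the genome size")
    return lsc


# (genome, IR, SSC) sizes in bp, taken from the first two rows of the table.
genomes = {
    "Scherffelia": (137_161, 32_310, 3_385),
    "Tetraselmis": (100_264, 21_342, 392),
}

for name, (genome_bp, ir_bp, ssc_bp) in genomes.items():
    lsc = lsc_size(genome_bp, ir_bp, ssc_bp)
    ir_share = 2 * ir_bp / genome_bp  # fraction of the genome in both IR copies
    print(f"{name}: LSC = {lsc:,} bp; two IR copies = {ir_share:.1%} of genome")
```

Under this assumption the two IR copies together account for a large share of each genome, which is consistent with the abstract's description of an unusually large IR and a very small SSC region.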
Fig 2. Gene repertoires of the chloroplast genomes compared in this study.
Only the conserved genes that are missing in one or more genomes are indicated. The presence of a gene is denoted by a blue box. A total of 85 genes are shared by all compared genomes: atpA, B, E, F, H, I, cemA, clpP, ftsH, petB, D, G, L, psaA, B, C, J, psbA, B, C, D, E, F, H, I, J, K, L, N, T, Z, rbcL, rpl2, 5, 14, 16, 20, 23, 36, rpoA, B, C1, C2, rps2, 3, 7, 8, 9, 11, 12, 18, 19, rrf, rrl, rrs, tufA, ycf1, 3, 4, 12, trnA(ugc), C(gca), D(guc), E(uuc), F(gaa), G(gcc), G(ucc), H(gug), I(gau), K(uuu), L(uaa), L(uag), Me(cau), Mf(cau), N(guu), P(ugg), Q(uug), R(acg), R(ucu), S(gcu), S(uga), T(ugu), V(uac), W(cca), Y(gua).
Introns in the Scherffelia chloroplast genome.
| Intron designation | Subgroup | ORF location | ORF type | ORF size (codons) |
|---|---|---|---|---|
| | IB4 | L8 | LAGLIDADG (2) | 315 |
| | IA2 | L6 | GIY-YIG | 195 |
| | IA3 | L6 | LAGLIDADG (1) | 167 |
| | IIB | Domain IV | RT-X | 470 |
| | IIB | – | – | – |
| | IIB | Domain IV | RT-X | 459 |
| | IIB | Domain IV | RT-X | 241 |
a The insertion sites of the introns in protein-coding genes are given relative to the corresponding genes in Mesostigma cpDNA whereas the insertion site of the rrl intron is given relative to the E. coli 23S rRNA. For each insertion site, the position corresponds to the nucleotide immediately preceding the intron.
b Group I introns were classified according to Michel and Westhof [31], whereas classification of group II introns was according to Michel et al. [32].
c L followed by a number refers to the loop extending the base-paired region identified by the number; Domain refers to a domain of the group II intron secondary structure.
d For the group I intron ORFs, the conserved motif in the predicted homing endonuclease is given, with the number of copies of the LAGLIDADG motif indicated in parentheses. For the group II intron ORFs, RT and X refer to the reverse transcriptase and maturase domains, respectively.
Fig 3. Gene partitioning patterns of the Scherffelia, Tetraselmis and other chlorophyte chloroplast genomes.
For each genome, one copy of the IR (thick vertical lines) and the entire SSC region are represented, but only the portion of the LSC region in the vicinity of the IR is displayed. The five genes composing the rDNA operon are highlighted in light green. The color assigned to each of the remaining genes is dependent upon the position of the corresponding gene relative to the rDNA operon in the cpDNA of the streptophyte alga Mesostigma viride, a genome displaying an ancestral gene partitioning pattern [56]. The genes highlighted in blue are found within or near the SSC region in this streptophyte genome (downstream of the rDNA operon), whereas those highlighted in light orange are found within or near the LSC region (upstream of the rDNA operon). The dark orange boxes denote the genes of LSC origin that have been acquired by the IRs of core chlorophytes (pedinophyceans, chlorodendrophyceans and core trebouxiophyceans). Note that, to simplify the comparison of gene order, some genomes are represented in their alternative isomeric form as compared to that used for the genome sequence deposited in GenBank.
Fig 4. Extent of rearrangements between the Scherffelia and Tetraselmis chloroplast genomes.
These genomes were aligned using Mauve 2.3.1. Only one copy of the IR (pink boxes) is shown for each genome. The blocks of colinear sequences containing two or more genes are numbered as in Fig 1. Gene clusters 5 and 6 were retrieved as a single locally colinear block because their very small sizes did not allow them to be resolved in Mauve. Conversely, the gene cluster spanning the LSC/IR junction (cluster 1) was fragmented into three colinear blocks in Mauve because only one copy of the IR was included in this analysis and also because the two genomes were treated as linear instead of circular molecules (the genomes were linearized at the LSC/IR junction).
Fig 5. ML phylogeny of chlorophytes inferred using the amino acid and nucleotide data sets assembled from 79 protein-coding genes.
The best-scoring RAxML tree inferred from the amino acid (PCG-AA) data set under the GTR+Γ4 model is presented. Bootstrap support (BS) values are reported on the nodes; from top to bottom or left to right, they correspond to the analyses of the PCG-AA, nucleotide PCG123degen and nucleotide PCG12 data sets. A black dot indicates that the corresponding branch received a BS value of 100% in all three analyses; a dash represents a BS value < 50%. The scale bar denotes the estimated number of amino acid substitutions per site.
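The data set names are not defined in this section. As a hedged illustration only, assuming that PCG12 denotes a nucleotide alignment restricted to the first and second codon positions of the protein-coding genes, such a data set could be derived from an in-frame codon alignment as sketched below. The function name `first_two_codon_positions` and the toy sequences are hypothetical and are not the authors' pipeline.

```python
def first_two_codon_positions(alignment: dict[str, str]) -> dict[str, str]:
    """Drop every third codon position from an in-frame codon alignment.

    `alignment` maps a taxon name to its aligned coding sequence; all
    sequences are assumed to start at codon position 1.
    """
    reduced = {}
    for taxon, seq in alignment.items():
        if len(seq) % 3 != 0:
            raise ValueError(f"{taxon}: length {len(seq)} is not a multiple of 3")
        # Keep positions 1 and 2 of each codon, skip position 3.
        reduced[taxon] = "".join(seq[i:i + 2] for i in range(0, len(seq), 3))
    return reduced


# Toy two-codon alignment (made-up sequences, for illustration only).
toy = {"taxon_1": "ATGGCC", "taxon_2": "ATGGCA"}
print(first_two_codon_positions(toy))
```

In the toy input the two sequences differ only at a third codon position, so they become identical once that position is removed.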
Fig 6. Bayesian phylogeny of chlorophytes inferred using the PCG-AA data set assembled from 79 cpDNA-encoded proteins.
The majority-rule posterior consensus tree inferred with Phylobayes under the CAT+Γ4 model is presented. Posterior probability values are reported on the nodes: a black dot indicates that the corresponding branch received a value of 1.00 whereas a dash indicates a value < 0.95. The scale bar denotes the estimated number of amino acid substitutions per site.
Fig 7. ML phylogeny of chlorophytes inferred using the nucleotide PCG12RNA and PCG123degenRNA data sets assembled from 79 protein-coding and 29 RNA-coding genes.
The best-scoring RAxML tree inferred from the PCG12RNA data set under the GTR+Γ4 model is presented. BS values are reported on the nodes; from top to bottom or left to right, they correspond to the analyses of the PCG12RNA and PCG123degenRNA data sets. A black dot indicates that the corresponding branch received a BS value of 100% in both analyses; a dash represents a BS value < 50%. The scale bar denotes the estimated number of nucleotide substitutions per site.
Fig 8. Shared gene pairs in chlorophyte chloroplast genomes.
The gene pairs that are shared by at least three taxa were identified among all possible signed gene pairs in the compared genomes. The presence of a gene pair is denoted by a blue box; a gray box refers to a gene pair in which at least one gene is missing due to gene loss. (A) Retention of prasinophyte gene pairs among core chlorophytes. The tree topology shown in Fig 7 was used to map losses of prasinophyte gene pairs. The characters indicated on the branches are restricted to those involving no gene losses; the characters denoted by triangles and rectangles represent homoplasic and synapomorphic losses, respectively. The full names of the gene pairs corresponding to the character numbers are given above the distribution matrix. The three chlorodendrophycean gene pairs highlighted in green and the pedinophycean gene pair highlighted in cyan are shared exclusively with prasinophyte genomes. (B) Gain of derived gene pairs among core chlorophytes. The six gene pairs highlighted in magenta denote synapomorphic characters uniting the Chlorellales and core trebouxiophyceans. Note that seven gene pairs (3'psaM-5'trnQ(uug), 3'trnQ(uug)-3'ycf47, 5'chlB-5'psbK, 3'chlB-5'psaA, 3'ftsH-3'trnL(caa), 3'rps4-5'trnS(gga) and 3'minD-5'trnN(guu)) could not be unambiguously included in this list of synapomorphies because at least one gene in each pair is missing in some taxa. Also note that the synapomorphic signatures of all highlighted gene pairs were confirmed using a larger data set including the gene pairs of all currently available chlorophyte chloroplast genomes.
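The identification of signed gene pairs shared by at least three taxa can be sketched as follows. This is an illustrative reconstruction, not the authors' program. It assumes that each genome is given as a circular list of (gene, strand) tuples and that a label such as 3'psaM-5'trnQ names the gene ends that face each other across an adjacency. The function names and the toy gene orders are hypothetical.

```python
from collections import Counter


def signed_pairs(gene_order, circular=True):
    """Return the set of signed gene pairs for one genome.

    `gene_order` is a list of (gene, strand) tuples with strand "+" or "-".
    Each adjacency is recorded as the two gene ends that face each other,
    sorted so that reading the genome on either strand gives the same pair.
    """
    pairs = set()
    n = len(gene_order)
    last = n if circular else n - 1
    for i in range(last):
        (a, strand_a), (b, strand_b) = gene_order[i], gene_order[(i + 1) % n]
        end_a = ("3'" if strand_a == "+" else "5'") + a  # end of a facing b
        end_b = ("5'" if strand_b == "+" else "3'") + b  # end of b facing a
        pairs.add(tuple(sorted((end_a, end_b))))
    return pairs


def shared_pairs(genomes, min_taxa=3):
    """Return the signed gene pairs present in at least `min_taxa` genomes."""
    counts = Counter(p for order in genomes.values() for p in signed_pairs(order))
    return {pair for pair, count in counts.items() if count >= min_taxa}


# Toy gene orders (made up for illustration, not taken from the paper).
toy = {
    "taxon_A": [("psaM", "+"), ("trnQ", "+"), ("ycf47", "-"), ("chlB", "+")],
    # taxon_B is taxon_A read on the opposite strand.
    "taxon_B": [("chlB", "-"), ("ycf47", "+"), ("trnQ", "-"), ("psaM", "-")],
    "taxon_C": [("psaM", "+"), ("trnQ", "+"), ("chlB", "+"), ("ycf47", "-")],
}
print(shared_pairs(toy))
```

Sorting the two ends makes each pair independent of the strand on which the genome is read, so a genome and its reverse complement yield identical pair sets. Pairs involving a gene that is absent from a genome simply never appear for that genome; distinguishing true adjacency loss from gene loss, as the figure does with gray boxes, would need an additional presence/absence check.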