Jean-Simon Brouard, Christian Otis, Claude Lemieux, Monique Turmel.
Abstract
The Chlorophyceae, an advanced class of chlorophyte green algae, comprises five lineages that form two major clades (Chlamydomonadales + Sphaeropleales and Oedogoniales + Chaetopeltidales + Chaetophorales). The four complete chloroplast DNA (cpDNA) sequences currently available for chlorophyceans uncovered an extraordinarily fluid genome architecture as well as many structural features distinguishing this group from other green algae. We report here the 521,168-bp cpDNA sequence from a member of the Chaetopeltidales (Floydiella terrestris), the sole chlorophycean lineage not previously sampled for chloroplast genome analysis. This genome, which contains 97 conserved genes and 26 introns (19 group I and 7 group II introns), is the largest chloroplast genome ever sequenced. Intergenic regions account for 77.8% of the genome size and are populated by short repeats. Numerous genomic features are shared with the cpDNA of the chaetophoralean Stigeoclonium helveticum, notably the absence of a large inverted repeat and the presence of unique gene clusters and trans-spliced group II introns. Although only one of the Floydiella group I introns encodes a homing endonuclease gene, our finding of five free-standing reading frames having similarity with such genes suggests that chloroplast group I introns endowed with mobility were once more abundant in the Floydiella lineage. Parsimony analysis of structural genomic features and phylogenetic analysis of chloroplast sequence data unambiguously resolved the Oedogoniales as sister to the Chaetopeltidales and Chaetophorales. An evolutionary scenario of the molecular events that shaped the chloroplast genome in the Chlorophyceae is presented.
Year: 2010 PMID: 20624729 PMCID: PMC2997540 DOI: 10.1093/gbe/evq014
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Gene map of the Floydiella chloroplast genome. Genes are colored according to their function. Coding sequences on the outside of the map are transcribed in a clockwise direction. Introns are represented by open boxes; the single intron ORF (in rrl) is denoted by a narrow, blue box. The rpoB gene consists of two separate ORFs (rpoBa and rpoBb) that are not associated with sequences typical of group I or group II introns; the rpoBb fragment contains the Fte RPB2 intein. The three ORFs display sequence similarity with group I intron-encoded HNH homing endonucleases. tRNA genes are indicated by the one-letter amino acid code followed by the anticodon in parentheses (Me, elongator methionine; Mf, initiator methionine).
Table 1. General Features of Floydiella and Other Chlorophycean cpDNAs
| Feature | Oedogoniales (OCC clade) | Chaetopeltidales (OCC clade) | Chaetophorales (OCC clade) | Chlamydomonadales (CS clade) | Sphaeropleales (CS clade) |
| Size (bp) | | | | | |
| Total | 196,547 | 521,168 | 223,902 | 203,827 | 161,452 |
| IR | 35,492 | — | — | 22,211 | 12,022 |
| SC1 | 80,363 | — | — | 81,307 | 72,440 |
| SC2 | 45,200 | — | — | 78,088 | 64,968 |
| A + T (%) | 70.5 | 65.5 | 71.1 | 65.5 | 73.1 |
| Sidedness index | 0.74 | 0.91 | 0.95 | 0.87 | 0.88 |
| Conserved genes (no.) | 99 | 97 | 97 | 94 | 96 |
| Introns | | | | | |
| Fraction of genome (%) | 17.9 | 4.3 | 7.9 | 6.8 | 8.6 |
| Group I (no.) | 17 | 19 | 16 | 5 | 7 |
| Group II (no.) | 4 | 7 | 5 | 2 | 2 |
| Intergenic sequences | | | | | |
| Fraction of genome (%) | 22.6 | 77.8 | 46.7 | 49.2 | 34.3 |
| Average size (bp) | 370 | 3,824 | 1,026 | 937 | 517 |
| Short repeated sequences | | | | | |
| Fraction of genome (%) | 1.3 | 49.9 | 17.8 | 15.8 | 3.0 |
Size: because the Floydiella (Chaetopeltidales) and Stigeoclonium (Chaetophorales) cpDNAs lack an IR, only the total size of these genomes is given.
SC1: single-copy region with the larger size.
SC2: single-copy region with the smaller size.
Conserved genes: free-standing coding sequences usually present in chloroplast genomes. Genes present in the IR were counted only once.
Intergenic sequences: ORFs showing no sequence similarity with known genes were considered as intergenic sequences.
Short repeated sequences: nonoverlapping repeated elements ≥ 30 bp were identified as described in the Materials and Methods.
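To make the percentages in this table easier to compare, the short sketch below converts the intergenic fraction of each genome into an approximate number of base pairs. The values are transcribed from the table; the dictionary layout and the helper name `fraction_to_bp` are illustrative assumptions, and the results are only as precise as the one-decimal percentages allow.

```python
# Genome sizes and intergenic fractions transcribed from the table above.
# The data layout and function name are hypothetical, chosen for illustration.
GENOMES = {
    "Oedogoniales":      {"total_bp": 196_547, "intergenic_pct": 22.6},
    "Chaetopeltidales":  {"total_bp": 521_168, "intergenic_pct": 77.8},
    "Chaetophorales":    {"total_bp": 223_902, "intergenic_pct": 46.7},
    "Chlamydomonadales": {"total_bp": 203_827, "intergenic_pct": 49.2},
    "Sphaeropleales":    {"total_bp": 161_452, "intergenic_pct": 34.3},
}


def fraction_to_bp(total_bp, pct):
    """Convert a percentage of the genome into a rounded base-pair count."""
    return round(total_bp * pct / 100)


# Approximate intergenic sequence per genome, in bp.
intergenic_bp = {
    name: fraction_to_bp(g["total_bp"], g["intergenic_pct"])
    for name, g in GENOMES.items()
}

# Size of the Floydiella (Chaetopeltidales) genome relative to the
# next-largest chlorophycean cpDNA in the table (Chaetophorales).
size_ratio = (
    GENOMES["Chaetopeltidales"]["total_bp"] / GENOMES["Chaetophorales"]["total_bp"]
)

for name, bp in intergenic_bp.items():
    print(f"{name:18s} {bp:>9,d} bp intergenic")
print(f"Chaetopeltidales / Chaetophorales size ratio: {size_ratio:.2f}")
```

Under these figures, intergenic sequence in the Floydiella genome alone comes to roughly 405,469 bp, which exceeds the total size of any other genome in the table.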
Table 2. Differences between the Repertoires of Conserved Genes in Floydiella and Other Chlorophycean cpDNAs
| OCC Clade | CS Clade | ||||
| Oedogoniales | Chaetopeltidales | Chaetophorales | Chlamydomonadales | Sphaeropleales | |
| Gene | |||||
| + | − | − | + | ||
| − | − | + | + | ||
| + | + | − | − | ||
| − | − | − | + | ||
| + | + | − | − | ||
| + | + | − | − | ||
| + | − | − | − | ||
| + | − | − | − | ||
| − | + | − | − | ||
Only the genes that are missing in one or more genomes are indicated. Plus and minus signs denote the presence and absence of genes, respectively. A total of 93 genes are shared by all compared cpDNAs: atpA, B, E, F, H, I, ccsA, cemA, chlB, L, N, clpP, ftsH, petB, D, G, L, psaA, B, C, J, psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z, rbcL, rpl2, 5, 14, 16, 20, 23, 36, rpoA, B, C1, C2, rps2, 3, 4, 7, 8, 9, 11, 12, 14, 18, 19, rrf, rrl, rrs, tufA, ycf1, 3, 4, 12, trnA(ugc), C(gca), D(guc), E(uuc), F(gaa), G(gcc), G(ucc), H(gug), I(cau), I(gau), K(uuu), L(uaa), L(uag), Me(cau), Mf(cau), N(guu), P(ugg), Q(uug), R(acg), R(ucu), S(gcu), S(uga), T(ugu), V(uac), W(cca), and Y(gua).
Among all completely sequenced chlorophyte cpDNAs, the Oedogonium genome is unique in encoding trnR(ucg). In a BlastN search against the NCBI database, this chloroplast gene gave its best hit with the mitochondrial trnR(ucg) gene of the fern Asplenium nidus (E value = 9 × 10⁻¹⁸), followed by hits with numerous bacterial trnR(ucg) and trnR(acg) genes (E values ranging from 5 × 10⁻¹⁵ to 6 × 10⁻⁷), suggesting that the Oedogonium trnR(ucg) was acquired through horizontal transfer from a mitochondrial or bacterial donor. Interestingly, a mitochondrial origin has previously been reported for two other genes (int and dpoB) unique to the Oedogonium chloroplast (Brouard et al. 2008).
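The list of 93 shared genes in the table note above uses a compact shorthand in which a bare suffix (for example "B" after "atpA") inherits the prefix of the preceding full gene name. The sketch below expands that shorthand and checks the stated total. The function name `expand_gene_list` and the parsing rule (a token starting with a lowercase letter is a full name; anything else is a suffix) are assumptions made for illustration, not a convention defined by the source.

```python
import re

# Shared-gene list copied from the table note; only the parsing is new.
SHARED_GENES = (
    "atpA, B, E, F, H, I, ccsA, cemA, chlB, L, N, clpP, ftsH, petB, D, G, L, "
    "psaA, B, C, J, psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z, rbcL, "
    "rpl2, 5, 14, 16, 20, 23, 36, rpoA, B, C1, C2, "
    "rps2, 3, 4, 7, 8, 9, 11, 12, 14, 18, 19, rrf, rrl, rrs, tufA, "
    "ycf1, 3, 4, 12, trnA(ugc), C(gca), D(guc), E(uuc), F(gaa), G(gcc), "
    "G(ucc), H(gug), I(cau), I(gau), K(uuu), L(uaa), L(uag), Me(cau), "
    "Mf(cau), N(guu), P(ugg), Q(uug), R(acg), R(ucu), S(gcu), S(uga), "
    "T(ugu), V(uac), W(cca), and Y(gua)."
)


def expand_gene_list(text):
    """Expand 'atpA, B, E' style shorthand into ['atpA', 'atpB', 'atpE']."""
    genes = []
    prefix = ""
    for token in text.split(","):
        # Drop surrounding whitespace, a trailing period, and a leading "and".
        token = re.sub(r"^and\s+", "", token.strip().rstrip("."))
        if not token:
            continue
        if token[0].islower():
            # Full gene name: remember its leading lowercase run as the prefix.
            prefix = re.match(r"[a-z]+", token).group(0)
            genes.append(token)
        else:
            # Bare suffix: attach it to the most recent prefix.
            genes.append(prefix + token)
    return genes


shared = expand_gene_list(SHARED_GENES)
print(len(shared), "genes;", "first:", shared[0], "last:", shared[-1])
```

Expanding the list this way gives 93 distinct names, in agreement with the count stated in the note.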
Table 3. Most Abundant Repeat Families in Floydiella cpDNA
| Designation | Prototype Sequence | Size (bp) | Copy Number |
| A | ACCCGAGCAGAGCTCGGGCAAAAGCCCTTT | 30 | 141 |
| B | CGGGGCCCAAAADAGAKAAAAGGCCTGAAC | 30 | 112 |
| C | MAMGKAGYTCTTTAAAAAGCAGGGG | 25 | 94 |
| D | AAAKAGGGCTTTTTAAAAGGTTGCACCC | 28 | 91 |
| E | TTTTTTCCTTTTTTTACWAAGAAAGGGGAAAGR | 33 | 62 |
| F | GCTTTTGCCCGAGCTCTGCTTTTTAAAGAGGGT | 33 | 60 |
| G | CCTYTTAAAKAKTTCTTTAAAAAGCCCYSK | 30 | 55 |
| H | TAAAAACCCTCAGAAAGGGCTCAAATTTGCTTC | 33 | 53 |
| I | CCCCGTCCTCTTCTTTTTTTGGAAAAGAAAA | 31 | 44 |
| J | TTTTTTTCTYTTATGATAGATTYTMYCTTTT | 31 | 44 |
| K | AAAAAATGGCCCCCTCTGTTTAAAGAAGGGCY | 32 | 36 |
| L | GYKTTTTCYTTTTAAAAKAGGGCTTTTTAAA | 31 | 35 |
| M | AAATTTTTTGGGTTCAGGTTCGGGTTRCAC | 30 | 33 |
| N | AGAGGCCTTTTTAAAGAAAAAGAGCTCCGC | 30 | 29 |
| O | CCTGAACCCAAAAAATTTTAAGGTTCAGGCC | 31 | 29 |
| P | GGCCCTCACCCAAAAAATTTGAAAGTTC | 28 | 28 |
| Q | AAAAGAGGGCTTTTTCCTTTTAAAAGAGGG | 30 | 26 |
| R | AAAGGGTGCAACCCGAACCCCGTCCAAAAA | 30 | 24 |
| S | GGGCTTTTTAAAAAGCCCGCCCTCTTTTTT | 30 | 23 |
| T | GGGCCTTTCAAATTTTTTGGCCTGAACY | 28 | 23 |
| U | AACCCGAACCTTAAAATTTTTTGGGTTCGGG | 31 | 19 |
| V | GAAAAAACCCGAACRGAGTTCGGGCAGGGGCC | 32 | 18 |
| W | GGTTGCACTCCTCTCTCTTTTAAAAGRAA | 29 | 17 |
| X | TTTTCCTTTTAAAAKAGGGTGGGGTTGCAC | 30 | 16 |
| Y | AGCTCCGCCCTTCTTTTTTACAGAAAAA | 28 | 16 |
| Z | RDRAGGGCCCCTGCTTTTTAAAGAACT | 27 | 15 |
Families of nonoverlapping repeats sharing ≥ 90% sequence identities were identified as described in the Materials and Methods.
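Several prototype sequences in this table contain IUPAC ambiguity codes (for example K, M, Y, W, R, S, and D). As a rough illustration of how such a prototype can be read, the sketch below turns one into a regular expression and counts nonoverlapping exact matches on a single strand. This is a simplification and not the authors' procedure: the table groups repeats sharing ≥ 90% identity, whereas this sketch allows no mismatches beyond the ambiguity codes. The function names and the toy sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def prototype_to_regex(prototype):
    """Compile a degenerate prototype sequence into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in prototype.upper()))


def count_nonoverlapping(prototype, sequence):
    """Count nonoverlapping matches of the prototype on the given strand only."""
    # re.findall scans left to right and never returns overlapping matches.
    return len(prototype_to_regex(prototype).findall(sequence.upper()))


# Prototype of repeat family C from the table (25 bp).
FAMILY_C = "MAMGKAGYTCTTTAAAAAGCAGGGG"

# Toy sequence (hypothetical, not from the genome): two instances of the
# family C prototype separated by a short spacer.
toy = "AACGTAGCTCTTTAAAAAGCAGGGG" + "TTT" + "CAAGGAGTTCTTTAAAAAGCAGGGG"

print(len(FAMILY_C), count_nonoverlapping(FAMILY_C, toy))
```

A fuller analysis would also search the reverse complement and tolerate mismatches up to the identity threshold described in the Materials and Methods.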
Table 4. Introns in Floydiella cpDNA
| Designation | Predicted Insertion Site | Subgroup | Size (bp) |
| Group I introns | | | |
| Ft. | 1769 | IA1 | 851 |
| Ft. | 276 | IA1 | 372 |
| Ft. | 333 | IB | 331 |
| Ft. | 414 | IA1 | 412 |
| Ft. | 790 | IB | 425 |
| Ft. | 579 | IA2 | 695 |
| Ft. | 1089 | IA1 | 820 |
| Ft. | 508 | IA3 | 257 |
| Ft. | 531 | IA3 | 339 |
| Ft. | 692 | IA1 | 256 |
| Ft. | 958 | IA1 | 321 |
| Ft. | 1065 | IA1 | 1725 |
| Ft. | 1766 | IA1 | 449 |
| Ft. | 1931 | IB | 406 |
| Ft. | 2449 | IA1 | 375 |
| Ft. | 2500 | IB | 381 |
| Ft. | 2511 | IA3 | 402 |
| Ft. | 2596 | IA3 | 431 |
| Ft. | 35 | IC3 | 997 |
| Group II introns | | | |
| Cis-spliced | | | |
| Ft. | 80 | IIA | 876 |
| Ft. | 285 | IIA | 1672 |
| Ft. | 1225 | IIB | 940 |
| Trans-spliced | | | |
| Ft. | 25 | IIB (I) | 2532 |
| Ft. | 4 | IIB (I) | 1313 |
| Ft. | 67 | IIB (I) | 2672 |
| Ft. | 120 | IIA (II) | 1598 |
Insertion sites of introns in genes coding for tRNAs and proteins are given relative to the corresponding genes in Mesostigma cpDNA, whereas those in rrs and rrl are given relative to Escherichia coli 16S and 23S rRNAs, respectively. For each insertion site, the position corresponding to the nucleotide immediately preceding the intron is reported.
Group I introns were classified according to Michel and Westhof (1990), whereas classification of group II introns was according to Michel et al. (1989). For each trans-spliced intron, the domain containing the site of discontinuity is indicated in parentheses.
A homing endonuclease of 431 amino acids with two copies of the LAGLIDADG motif is encoded in loop L9 of the Ft.rrl.2 intron.
Fig. 2. Distributions of introns in Floydiella and other chlorophycean chloroplast genomes. Circles denote group I introns, squares represent group II introns, and divided squares denote trans-spliced group II introns. Open symbols indicate the absence of intron ORFs, whereas filled symbols indicate their presence. Unique insertion sites, that is, sites that have not been identified in any other green plants, are denoted by colored numbers. The last column indicates the introns of Chlamydomonas species other than Chlamydomonas reinhardtii that have homologs in completely sequenced chlorophycean algal genomes. References for the latter introns are as follows: psaB (Turmel, Mercier, and Côté 1993), psbA (Turmel et al. 1989), psbC (Turmel, Mercier, and Côté 1993), rrs (Durocher et al. 1989; Turmel, Mercier, et al. 1995), and rrl (Turmel et al. 1991; Côté et al. 1993; Turmel, Gutell, et al. 1993; Turmel, Côté, et al. 1995). An asterisk denotes the absence of the ORF in some Chlamydomonas species. Intron insertion sites are designated as indicated in table 4. Oc, Oedogonium cardiacum; Ft, Floydiella terrestris; Sh, Stigeoclonium helveticum; So, Scenedesmus obliquus; Cr, Chlamydomonas reinhardtii; C, Chlamydomonas species.
Fig. 3. Conservation of ancestral and derived gene pairs in fully sequenced chlorophycean chloroplast genomes. (A) Conserved gene pairs dating back to a distant chlorophyte ancestor (3′psaJ-5′rps12) or to the last common ancestor of all green plants (all other gene pairs). (B) Conserved gene pairs that emerged during the evolution of the Chlorophyceae. For each gene pair, adjoining termini of the genes are indicated. Dark boxes indicate the presence of gene pairs with the same polarities in two or more genomes, whereas light or open boxes indicate the absence of gene pairs. A light box indicates that the two genes associated with a gene pair are found in the genome but are unlinked. An open box indicates that one or both genes associated with a gene pair are absent from the genome. Gene pairs linked by brackets are contiguous on the genome. Six categories of derived gene pairs were distinguished according to their distribution: 1) those present in all three lineages of the OCC clade (OCC), 2) those supporting a sister relationship between the Chaetophorales and Chaetopeltidales (T1), 3) those supporting a sister relationship between the Oedogoniales and Chaetophorales (T2), 4) the single gene pair supporting a sister relationship between the Oedogoniales and Chaetopeltidales (T3), 5) those present in both lineages of the CS clade (CS), and 6) the three remaining gene pairs found in some lineages of the OCC and CS clades.
Fig. 4. Phylogenies inferred from 69 concatenated chloroplast genes (first two codon positions) and their deduced amino acid sequences. (A) Best ML tree inferred from the amino acid data set. (B) Best ML tree inferred from the nucleotide data set. ML bootstrap support values are shown on the corresponding nodes. CS, CS clade; OCC, OCC clade; T, Trebouxiophyceae; U, Ulvophyceae.
Fig. 5. Scenarios of gains/losses of chloroplast genomic features predicted by the three possible branching orders of the OCC lineages (T1, T2, and T3). Gains of derived gene pairs, trans-spliced rbcL introns (rbcL_i67 and rbcL_i120), and the RPB2 intein are denoted by blue symbols, whereas losses of the IR, derived gene pairs, and the RPB2 intein are denoted by orange symbols. Characters supporting a clade are denoted by squares, whereas homoplasic characters are denoted by triangles.
Fig. 6. Inferred gains and losses of chloroplast genomic features during the evolution of chlorophyceans. Genomic characters were mapped on the tree identifying the Oedogoniales as sister to the Chaetophorales and Chaetopeltidales. Gains and losses of 2-state characters are indicated by blue and orange symbols, respectively. Characters supporting a clade or unique to a lineage are denoted by squares, whereas homoplasic characters are denoted by triangles. The 3-state characters related to the rps4 and rpoB gene structures are indicated by red diamonds. Ancestral gene pairs are denoted by orange dots. Each trans-spliced group II intron is designated by the name of the gene in which it resides followed by its insertion position (see fig. 2).