Monique Turmel, Christian Otis, Claude Lemieux.
Abstract
Previous studies of trebouxiophycean chloroplast genomes revealed little information regarding the evolutionary dynamics of this genome because taxon sampling was too sparse and the relationships between the sampled taxa were unknown. We recently sequenced the chloroplast genomes of 27 trebouxiophycean and 2 pedinophycean green algae to resolve the relationships among the main lineages recognized for the Trebouxiophyceae. These taxa and the previously sampled members of the Pedinophyceae and Trebouxiophyceae are included in the comparative chloroplast genome analysis we report here. The 38 genomes examined display considerable variability at all levels, except gene content. Our results highlight the high propensity of the rDNA-containing large inverted repeat (IR) to vary in size, gene content and gene order as well as the repeated losses it experienced during trebouxiophycean evolution. Of the seven predicted IR losses, one event demarcates a superclade of 11 taxa representing 5 late-diverging lineages. IR expansions/contractions account not only for changes in gene content in this region but also for changes in gene order and gene duplications. Inversions also led to gene rearrangements within the IR, including the reversal or disruption of the rDNA operon in some lineages. Most of the 20 IR-less genomes are more rearranged compared with their IR-containing homologs and tend to show an accelerated rate of sequence evolution. In the IR-less superclade, several ancestral operons were disrupted, a few genes were fragmented, and a subgroup of taxa features a G+C-biased nucleotide composition. Our analyses also unveiled putative cases of gene acquisitions through horizontal transfer.
Keywords: Pedinophyceae; Trebouxiophyceae; genome rearrangements; horizontal transfer; introns; inverted repeat; plastid genomics; repeats
Year: 2015 PMID: 26139832 PMCID: PMC4524492 DOI: 10.1093/gbe/evv130
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure. Phylogenetic relationships among the 38 core chlorophytes examined in this study and total lengths of coding, intronic, intergenic, and small repeated sequences (>30 bp) in their chloroplast genomes. The presence of a large IR encoding rRNA genes is also indicated. The best-scoring ML tree that Lemieux et al. (2014a) inferred from 79 cpDNA-encoded proteins under the GTR+Γ4 model is presented. Note that intron-encoded genes were not considered as coding sequences but rather as intron sequences and that the O. solitaria, P. brevispinosa, and T. aggregata genome sequences are not complete.
GenBank Accession Numbers and Main Features of the Chloroplast Genomes Examined in This Study
| Taxon | Accession No. | A+T (%) | Genome size (bp) | IR size (bp) | LSC size (bp) | SSC size (bp) | Genes (no.) | Introns (GI, GII) | Repeats (%) |
|---|---|---|---|---|---|---|---|---|---|
| | KM462870* | 59.7 | 94,262 | 9,926 | 68,185 | 6,225 | 106 | | 0.3 |
| | KM462867* | 66.6 | 126,694 | 16,074 | 86,619 | 7,927 | 107 | 5, 5 | 2.1 |
| | NC_016733 | 65.2 | 98,340 | 10,639 | 70,398 | 6,664 | 106 | | 0 |
| | KM462885* | 70.0 | 169,201 | 22,061 | 87,535 | 37,544 | 112 | 6 | 5.4 |
| | NC_012978 | 70.0 | 123,994 | 10,913 | 88,297 | 13,871 | 112 | 1 | 4.0 |
| | KM462886* | 63.3 | 109,775 | 12,798 | 66,211 | 17,968 | 113 | 1 | 4.2 |
| | KM462888* | 61.8 | 108,470 | | | | 113 | 1 | 3.0 |
| | NC_001865 | 68.4 | 150,613 | | | | 113 | 3 | 7.3 |
| | NC_015359 | 65.9 | 124,579 | | | | 113 | 3 | 2.4 |
| | KM462874* | 72.0 | 117,543 | 15,891 | 77,346 | 8,415 | 105 | 8 | 11.6 |
| | KM462881* | 67.3 | 187,843 | 18,786 | 139,317 | 10,954 | 109 | 1, 1 | 22.7 |
| | KM462883* | 72.1 | 129,187 | 11,970 | 95,317 | 9,930 | 108 | 1, 1 | 3.2 |
| | KM462877* | 70.5 | 132,626 | 13,730 | 95,069 | 10,097 | 109 | 2, 1 | 5.1 |
| Oocystis solitaria | FJ968739 | 71.0 | >96,287 | >378 | 71,295 | | 110 | 1, 1 | 0.7 |
| | KM462880* | 66.8 | 114,128 | 10,577 | 81,906 | 11,068 | 111 | 1 | 7.3 |
| Pleurastrosarcina brevispinosa | KM462875* | 65.5 | >295,314 | 45,468 | >194,027 | 10,351 | 111 | 16, 3 | 21.3 |
| | KM462873* | 68.6 | 211,747 | | | | 112 | 5 | 19.8 |
| | KM462864* | 68.1 | 116,952 | 8,272 | 51,357 | 49,051 | 107 | 4, 1 | 4.3 |
| | KM462862* | 64.9 | 306,152 | | | | 108 | 7, 1 | 23.1 |
| | KM462865* | 68.5 | 167,972 | 6,835 | 121,087 | 33,215 | 110 | | 5.5 |
| | KM462868* | 68.6 | 197,094 | 10,619 | 141,677 | 34,179 | 111 | | 4.0 |
| | KM462866* | 66.6 | 236,463 | 27,336 | 141,652 | 40,139 | 111 | | 20.0 |
| | KM462869* | 68.4 | 145,947 | 6,786 | 115,976 | 16,399 | 109 | | 6.8 |
| | NC_009681 | 72.7 | 195,081 | | | | 107 | 4 | 4.8 |
| | KM462872* | 60.3 | 181,542 | 28,473 | 76,371 | 48,225 | 110 | 15 | 7.1 |
| | KM462876* | 65.3 | 158,609 | | | | 107 | | 16.7 |
| | KM462882* | 64.9 | 148,459 | | | | 107 | | 3.5 |
| Trebouxia aggregata | EU123962–EU124002 | 65.2 | >245,724 | | | | 100 | 8 | 42.7 |
| | KM462861* | 69.6 | 146,596 | | | | 112 | | 3.6 |
| | KM462871* | 72.2 | 156,031 | | | | 111 | 1 | 1.4 |
| | KM462860* | 64.1 | 289,394 | | | | 111 | 3, 5 | 19.7 |
| | KM462863* | 58.8 | 201,425 | | | | 110 | 6, 1 | 23.0 |
| | KM462878* | 54.6 | 94,206 | | | | 111 | | 0 |
| | KM462884* | 57.6 | 172,826 | | | | 112 | 1, 2 | 9.8 |
| | KM462887* | 54.2 | 134,677 | | | | 110 | 3 | 15.1 |
| | NC_018569 | 42.3 | 149,707 | | | | 114 | 4 | 0.9 |
| | NC_015084 | 49.2 | 175,731 | | | | 114 | 1 | 10.6 |
| | KM462879* | 49.4 | 183,394 | | | | 114 | 14 | 18.6 |
a. The asterisks denote the 29 genomes sequenced by Lemieux et al. (2014a) and described here for the first time.
b. Intronic genes and freestanding ORFs not usually found in green plant chloroplast genomes are not included in these values. Duplicated genes were counted only once.
c. Number of group I (GI) and group II (GII) introns is given.
d. Nonoverlapping repeat elements were mapped on each genome with RepeatMasker using the repeats ≥30 bp identified with REPuter as input sequences.
e. Because the Oocystis solitaria, Pleurastrosarcina brevispinosa, and Trebouxia aggregata chloroplast genomes are partially sequenced, the values reported for their sizes represent underestimates and those corresponding to other genomic features may be inaccurate.
f. The exact sizes of the O. solitaria IR and SSC regions could not be determined because the IR/SSC junction has not been identified.
g. The size of the P. brevispinosa LSC region was underestimated because this region contains a sequencing gap.
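The region sizes reported for a completely sequenced IR-containing genome can be checked for internal consistency: because the IR is present in two copies, the genome size should equal LSC + SSC + 2 × IR. The short Python sketch below applies this check to three rows copied from the table above. The dictionary layout and the function names are illustrative assumptions, not part of the original study.

```python
# Consistency check for quadripartite chloroplast genomes:
#   genome size = LSC + SSC + 2 * IR   (the IR occurs in two copies)
# Sizes (bp) are copied from the table above and keyed by accession number.
# The data layout and function names are hypothetical, chosen for illustration.

ROWS = {
    "KM462870": {"genome": 94262, "ir": 9926, "lsc": 68185, "ssc": 6225},
    "KM462867": {"genome": 126694, "ir": 16074, "lsc": 86619, "ssc": 7927},
    "NC_016733": {"genome": 98340, "ir": 10639, "lsc": 70398, "ssc": 6664},
}


def expected_genome_size(ir: int, lsc: int, ssc: int) -> int:
    """Return the genome size implied by the three region sizes."""
    return lsc + ssc + 2 * ir


def size_discrepancies(rows: dict) -> dict:
    """Map each accession to (reported genome size - implied genome size)."""
    return {
        accession: row["genome"]
        - expected_genome_size(row["ir"], row["lsc"], row["ssc"])
        for accession, row in rows.items()
    }


if __name__ == "__main__":
    for accession, delta in size_discrepancies(ROWS).items():
        print(f"{accession}: discrepancy = {delta} bp")
```

A discrepancy of zero means the four reported sizes agree. The check does not apply to the partially sequenced genomes, whose sizes are given as lower bounds.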
Figure. Gene organization of the large IRs in the chloroplast genomes examined in this study. Coding sequences of the rRNA genes are represented in red and, for all the IRs featuring an ancestral rDNA operon, the direction of transcription of this operon is shown by an arrow. The O. solitaria IR is not represented because its extent remains unknown. All gene maps are drawn to scale.
Figure. Gene repertoires of the chloroplast genomes examined in this study. Only the genes that are missing in one or more genomes are indicated. The presence of a standard gene is denoted by a blue box. A total of 91 genes are shared by all compared genomes that have been completely sequenced: accD, atpA, B, E, F, H, I, cemA, clpP, ftsH, petA, B, D, G, psaA, B, C, I, J, M, psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z, rbcL, rpl2, 5, 12, 14, 16, 19, 20, 23, 36, rpoA, B, C1, C2, rps2, 3, 7, 8, 9, 11, 12, 18, 19, rrf, rrl, rrs, tufA, ycf1, 3, 4, 20, trnA(ugc), C(gca), D(guc), E(uuc), F(gaa), G(gcc), G(ucc), H(gug), I(gau), K(uuu), L(uaa), L(uag), Me(cau), Mf(cau), N(guu), P(ugg), Q(uug), R(ucu), R(acg), S(gcu), S(uga), T(ugu), V(uac), W(cca), and Y(gua). Eight of these genes (petG, psbI, trnI(gau), L(uaa), P(ugg), R(ucu), S(gcu), T(ugu)) have not been identified in the partial chloroplast genome sequence of T. aggregata. Note that ycf12 (psb30) codes for a subunit of the photosystem II complex (Kashino et al. 2007).
Nonstandard Genes Identified as Freestanding ORFs in the Chloroplast Genomes Examined in This Study
| Taxon | ORF | Genomic Coordinates | Conserved Domain |
|---|---|---|---|
| | 148 | 110000–109554 | DNA breaking-rejoining enzymes, C-terminal catalytic domain (cd00397) |
| | 119 | 105729–106088 | DNA breaking-rejoining enzymes, C-terminal catalytic domain (cd00397) |
| | 298 | 35212–36108 | DNA breaking-rejoining enzymes, C-terminal catalytic domain (cd00397) |
| | 154 | 277159–277623 | Integrase core domain (pfam00665) |
| | 298 | 296164–297060 | Integrase core domain (pfam00665) |
| | 200 | 274101–274703 | Putative integrase/recombinase |
| | 102 | 51790–51482 | Serine recombinase family, resolvase and invertase subfamily, catalytic domain (cd03768) |
| | 117 | 24161–23808 | Phage-associated DNA primase (COG3378) |
| | 653 | 183296–185257 | Phage/plasmid primase, P4 family, C-terminal domain (TIGR01613) |
| | 403 | 111049–112260 | Primase C terminal 1 (smart00942) |
| | 153 | 116412–116873 | DNA polymerase type-B family catalytic domain (cd00145) |
| | 328 | 93575–94561 | DNA polymerase type-B alpha subfamily catalytic domain (cd05532) |
| | 242 | 8571–7843 | Deoxyribonucleoside kinase (cd01673) |
| | 139 | 12286–12705 | Type II restriction endonuclease |
| | 214 | 109148–109792 | |
| Chlorella variabilis | 123 | 99567–99938 | N-6 DNA methylase (pfam02384) |
| Chlorella variabilis | 338 | 100401–101417 | N-6 DNA methylase (pfam02384) |
| Chlorella variabilis | 152 | 101377–101835 | N-6 DNA methylase (pfam02384) |
| | 175 | 19229–19756 | LAGLIDADG DNA endonuclease family (pfam00961) |
| | 331 | 21453–22448 | LAGLIDADG DNA endonuclease family (pfam03161) |
| | 119 | 7868–8227 | LAGLIDADG DNA endonuclease family (pfam03161) |
| | 671 | 127765–129780 | Reverse transcriptase with group II intron origin (cd01651); Group II intron, maturase-specific domain (pfam08388) |
| | 214 | 282883–283527 | Reverse transcriptase with group II intron origin (cd01651) |
a. Reported here are the freestanding ORFs larger than 100 codons that revealed similarity (E-value threshold of 1e-06) with proteins of known function and/or recognized protein domains in our BLASTP searches. Each ORF is identified by the number of amino acid residues in the encoded protein.
b. The orf123, orf338, and orf152 of Chlorella variabilis may be part of a larger ORF considering that they are contiguous on the genome sequence and all show similarity to N-6 DNA methylases.
c. These ORFs are not encoded within recognizable group I and group II intron sequences and thus appear to be free-standing.
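The ORF sizes and genomic coordinates in the table above are related by simple arithmetic, which the following Python sketch makes explicit. It rests on two assumptions that are not stated in the source: that a start coordinate larger than the end coordinate indicates an ORF on the reverse strand, and that the coordinate span includes the stop codon (the listed sizes are consistent with this). The function name is hypothetical.

```python
# Derive strand, nucleotide length, and protein length from a coordinate
# string such as "110000–109554" (start–end, 1-based, inclusive).
# Assumptions (not from the source): start > end means reverse strand,
# and the span includes the stop codon.


def describe_orf(coords: str) -> tuple:
    """Return (strand, length in nt, number of encoded residues)."""
    start_text, end_text = coords.replace("–", "-").split("-")
    start, end = int(start_text), int(end_text)
    strand = "+" if start <= end else "-"
    length_nt = abs(end - start) + 1
    if length_nt % 3 != 0:
        raise ValueError(f"span of {length_nt} nt is not a whole number of codons")
    residues = length_nt // 3 - 1  # subtract the stop codon
    return strand, length_nt, residues


if __name__ == "__main__":
    for coords in ("110000–109554", "127765–129780"):
        print(coords, describe_orf(coords))
```

Under these assumptions, the first entry of the table (110000–109554) spans 447 nt on the reverse strand, that is, 149 codons including the stop, which matches the listed size of 148 residues.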
Figure. Gene partitioning patterns of the IR-containing chloroplast genomes examined in this study. The IRs span the sequence delimited by thick vertical lines; only the IR/LSC junction was identified in the O. solitaria genome, with the sequence corresponding to the dotted lines being most likely part of the IR. Note that the gene sequences spanning the IR/SSC or IR/LSC junction are represented in the SSC or LSC region, respectively. The five genes composing the rDNA operon are highlighted in yellow. The color assigned to each of the remaining genes is dependent upon the position of the corresponding gene relative to the rDNA operon in previously reported IR-containing prasinophycean and streptophyte cpDNAs displaying an ancestral gene partitioning pattern. The genes highlighted in blue are found within or near the SSC region in ancestral genomes (downstream of the rDNA operon), whereas those highlighted in orange are found within or near the LSC region (upstream of the rDNA operon).
Figure. Extent of gene rearrangements in the chloroplast genomes examined in this study. A signed gene-order matrix of the 91 genes shared by all compared genomes was used to predict the number of sequence reversals on each branch of the best-scoring ML tree inferred from 79 cpDNA-encoded proteins (Lemieux et al. 2014a). For comparison of branch lengths, both the genome rearrangement and protein trees are represented; the genome rearrangement tree was scaled using Ktreedist (Soria-Carrasco et al. 2007) so that its global divergence is as similar as possible to that of the protein tree. The gray circles denote the genomes containing a large IR. The partially sequenced T. aggregata chloroplast genome was not included in this analysis because it is available as multiple contigs and lacks several genes.
Figure. Distribution of shared gene pairs in the chloroplast genomes examined in this study. Among all possible gene pairs in the signed gene-order matrix of the 91 genes common to all compared taxa, we selected those that are shared between at least five taxa. The presence of a gene pair is denoted by a blue box. A gray box refers to a gene pair in which at least one gene is missing due to gene loss. Gene pairs were organized in blocks of contiguous gene pairs (shown as alternating colors) to facilitate the identification of conserved gene clusters. The partially sequenced T. aggregata chloroplast genome was not included in this analysis because it is available as multiple contigs and lacks several genes.
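The idea of selecting gene pairs shared by a minimum number of taxa from signed gene orders can be illustrated with a minimal Python sketch. This is not the authors' software. The three gene orders below are invented toy data, the genomes are treated as circular (an assumption), and all function names are hypothetical. A signed adjacency a→b is considered identical to its reverse-complement reading (−b)→(−a).

```python
from collections import Counter

# Each gene is (name, sign): +1 for one strand, -1 for the other.
# The gene orders below are invented toy data, not taken from the study.
ORDERS = {
    "taxon_1": [("psbA", 1), ("rbcL", 1), ("atpA", 1), ("atpB", 1)],
    "taxon_2": [("atpB", -1), ("atpA", -1), ("rbcL", -1), ("psbA", -1)],
    "taxon_3": [("psbA", 1), ("rbcL", -1), ("atpA", 1), ("atpB", 1)],
}


def canonical_pair(a, b):
    """Return one key for the adjacency a -> b and its reverse complement."""
    forward = (a, b)
    reverse = ((b[0], -b[1]), (a[0], -a[1]))  # same adjacency, other strand
    return min(forward, reverse)


def signed_pairs(order):
    """Collect the signed adjacencies of a gene order treated as circular."""
    n = len(order)
    return {canonical_pair(order[i], order[(i + 1) % n]) for i in range(n)}


def shared_pairs(orders, min_taxa):
    """Return the gene pairs present in at least `min_taxa` gene orders."""
    counts = Counter()
    for order in orders.values():
        counts.update(signed_pairs(order))
    return {pair for pair, n in counts.items() if n >= min_taxa}


if __name__ == "__main__":
    for pair in sorted(shared_pairs(ORDERS, min_taxa=3)):
        print(pair)
```

In this toy data set, taxon_2 is the reverse complement of taxon_1 and therefore contributes the same pairs, whereas the inversion of rbcL in taxon_3 breaks the two adjacencies that flank it. The study applied a threshold of five taxa to the 91 shared genes; the threshold of three here only suits the toy example.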
Figure. Distribution of group I and group II introns among the chloroplast genomes examined in this study. A light blue box represents an intron lacking an ORF, whereas a colored box represents an intron containing an ORF (see the color code at the bottom of the figure for the type of intron-encoded protein). Intron insertion sites in protein-coding and tRNA genes are given relative to the corresponding genes in Mesostigma cpDNA; insertion sites in rrs and rrl are given relative to the Escherichia coli 16S and 23S rRNAs, respectively. For each insertion site, the position corresponding to the nucleotide immediately preceding the intron is reported. Abbreviations: LAGLIDADG, LAGLIDADG homing endonuclease; GIY-YIG, GIY-YIG homing endonuclease; H-N-H, H-N-H homing endonuclease; RT, reverse transcriptase and/or intron maturase and/or H-N-H endonuclease.
Figure. Inferred gains and losses of chloroplast genomic features during the evolution of trebouxiophyceans. Note that conserved gene pairs could not be inferred for the T. aggregata chloroplast genome because the sequence of this genome is partial and fragmented into 41 contigs. The gene pairs corresponding to the numbered characters are listed at the bottom of the figure.