Claude Lemieux, Christian Otis, Monique Turmel.
Abstract
BACKGROUND: Because they represent the earliest divergences of the Chlorophyta, the morphologically diverse unicellular green algae making up the prasinophytes hold the key to understanding the nature of the first viridiplants and the evolutionary patterns that accompanied the radiation of chlorophytes. Nuclear-encoded 18S rDNA phylogenies have unveiled nine prasinophyte clades (clades I through IX), but their branching order is still uncertain. We present here the newly sequenced chloroplast genomes of Nephroselmis astigmatica (clade III) and of five picoplanktonic species from clade VI (Prasinococcus sp. CCMP 1194, Prasinophyceae sp. MBIC 106222 and Prasinoderma coloniale) and clade VII (Picocystis salinarum and Prasinophyceae sp. CCMP 1205). These chloroplast DNAs (cpDNAs) were compared with those of the six previously sampled prasinophytes (clades I, II, III and V) to gain information both on the relationships among prasinophyte lineages and on chloroplast genome evolution.
Year: 2014 PMID: 25281016 PMCID: PMC4194372 DOI: 10.1186/1471-2164-15-857
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Gene maps of the prasinococcalean chloroplast genomes. (A) Prasinococcus, (B) Prasinophyceae sp. MBIC 10622 and (C) Prasinoderma. Filled boxes represent genes, with colors denoting gene categories as indicated in the legend at the bottom of the figure. Genes on the outside of each map are transcribed counterclockwise; those on the inside are transcribed clockwise. On panels A and B, thick lines in the inner rings denote the gene clusters found to be conserved in pairwise comparisons of the three clade-VI genomes. In panel A, the inner rings, from the inside to the outside, indicate the levels of synteny between the Prasinococcus cpDNA and those of Prasinophyceae sp. and Prasinoderma, respectively; panel B shows the level of synteny between the latter two genomes.
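The caption does not state exactly how conserved clusters were called. As a minimal sketch, assuming a cluster is a maximal run of genes whose adjacencies and relative polarities are preserved in the other genome (so an inverted block still counts), a pairwise comparison of two circular gene maps could look like this. The `(name, strand)` encoding, the function names and the gene orders are hypothetical toy inputs, not data or code from the study.

```python
from typing import List, Set, Tuple

Gene = Tuple[str, int]  # hypothetical encoding: (gene name, strand), strand is +1 or -1


def signed_adjacencies(order: List[Gene]) -> Set[Tuple[Gene, Gene]]:
    """Adjacent gene pairs on a circular map, recorded for both DNA strands."""
    adj = set()
    n = len(order)
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]
        adj.add((a, b))
        # Read on the other strand, the pair appears reversed with both polarities flipped.
        adj.add(((b[0], -b[1]), (a[0], -a[1])))
    return adj


def conserved_clusters(g1: List[Gene], g2: List[Gene]) -> List[List[str]]:
    """Maximal runs of g1 genes whose signed adjacencies also occur in g2."""
    shared = signed_adjacencies(g2)
    n = len(g1)
    conserved = [(g1[i], g1[(i + 1) % n]) in shared for i in range(n)]
    if all(conserved):
        return [[name for name, _ in g1]]
    # Start just after a broken adjacency so no cluster is split at the map origin.
    start = (conserved.index(False) + 1) % n
    clusters: List[List[str]] = []
    current = [g1[start][0]]
    for k in range(n):
        i = (start + k) % n
        nxt = g1[(i + 1) % n][0]
        if conserved[i]:
            current.append(nxt)
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [nxt]
    return clusters


# Toy maps: the B-C segment is inverted in the second genome.
g1 = [("A", 1), ("B", 1), ("C", 1), ("D", 1)]
g2 = [("A", 1), ("C", -1), ("B", -1), ("D", 1)]
print(conserved_clusters(g1, g2))  # [['B', 'C'], ['D', 'A']]
```

The D-A cluster wraps around the arbitrary start of the first map, which is why the sketch walks the circle from a broken adjacency rather than from index 0.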
Figure 2. Gene map of the Nephroselmis astigmatica chloroplast genome. Filled boxes represent genes, with colors denoting categories as indicated in Figure 1. Genes on the outside of the map are transcribed counterclockwise; those on the inside are transcribed clockwise. The outermost inner ring indicates the positions of the IR and SC regions. Thick lines in the innermost ring represent the gene clusters conserved between the Nephroselmis astigmatica and Nephroselmis olivacea cpDNAs.
Figure 3. Gene maps of the Picocystis and Prasinophyceae sp. CCMP 1205 chloroplast genomes. Filled boxes represent genes, with colors denoting categories as indicated in Figure 1. Genes on the outside of each map are transcribed counterclockwise; those on the inside are transcribed clockwise. The intron sequences bordering the Picocystis ycf3 exons (ycf3a and ycf3b) are spliced in trans at the RNA level. The IR and SC regions of the Picocystis genome are shown on the inner ring. The gene clusters shared by the Picocystis and Prasinophyceae sp. CCMP 1205 genomes are displayed on the ring inside the gene map of the latter genome.
Table 1. General features of the prasinophyte cpDNAs compared in this study
| Clade | Name | Label | Total size (bp) | IR (bp) | A + T (%) | Genes a | Introns b | Intergenic c (%) | Repeats d (%) |
|---|---|---|---|---|---|---|---|---|---|
| VI | Prasinococcus sp. CCMP 1194 | PCUS | 85,590 | | 67.9 | 115 | | 14.3 | 0.9 |
| VI | Prasinophyceae sp. MBIC 106222 | MBIC | 72,423 | | 62.1 | 103 | | 13.4 | 1.1 |
| VI | Prasinoderma coloniale | PRMA | 77,750 | | 65.9 | 106 | | 16.0 | 0.4 |
| I | Pyramimonas parkeae | PYRA | 101,605 | 13,057 | 65.3 | 110 | 1 | 22.4 | 0.5 |
| II | Micromonas | MICR | 72,585 | 7,307 | 61.2 | 86 e | | 19.0 | 0 |
| II | Monomastix opisthostigma | MONO | 114,528 | | 61.0 | 94 | 5 GI, 1 GII | 44.6 | 16.9 |
| II | Ostreococcus tauri | OSTR | 71,666 | 6,824 | 60.1 | 88 | 1 | 15.1 | 0 |
| III | Nephroselmis astigmatica | NAST | 125,042 | 13,742 | 59.5 | 123 | 2 | 18.6 | 0.3 |
| III | Nephroselmis olivacea | NOLI | 200,801 | 46,137 | 57.9 | 128 f | | 45.6 | 0.5 |
| V | Pycnococcus provasolii | PYCN | 80,211 | | 60.5 | 99 g | 1 | 14.0 | 0 |
| VII | Picocystis salinarum | PICO | 81,133 | 10,364 | 62.7 | 114 | 1 | 9.2 | 0 |
| VII | Prasinophyceae sp. CCMP 1205 | 1205 | 64,335 | | 63.3 | 100 | | 10.0 | 0 |
a Duplicated genes were counted only once.
b Number of group I (GI) and group II (GII) introns.
c Only the ORFs coding for proteins of known functions or having recognized domains were considered as genes.
d Nonoverlapping repeat elements were mapped on each genome with RepeatMasker [70] using the repeats ≥30 bp identified with REPuter [69] as input sequences.
e This value is probably an underestimate because the genome sequence appears to be incomplete and missing three genes (see the legend of Figure 4).
f The ycf20 pseudogene, which corresponds to the annotated orf111, was not counted.
g This value differs from that reported previously [25] because an additional gene, rrf (coordinates 33313–33429 in [GenBank:NC_012097]), was identified using RNAmmer [64] in the course of this study.
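The A + T, intergenic and repeat columns are all simple proportions of genome length. A minimal sketch of how such percentages can be computed is shown below, assuming features are given as 1-based inclusive coordinates that do not wrap around the origin of the circular genome; the function names are hypothetical. Overlapping features are merged first, in the spirit of footnote d's nonoverlapping repeat elements, and, following footnote c, ORFs without known function would simply be left out of the gene intervals so that they count as intergenic.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) with start <= end


def at_percent(seq: str) -> float:
    """A + T content (%) computed over unambiguous bases only."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return 100.0 * (s.count("A") + s.count("T")) / acgt


def covered_length(intervals: Iterable[Interval]) -> int:
    """Length of the union of intervals, so overlapping features are counted once."""
    total = 0
    cur = None  # interval currently being merged
    for start, end in sorted(intervals):
        if cur is not None and start <= cur[1] + 1:
            cur = (cur[0], max(cur[1], end))  # overlapping or abutting: extend
        else:
            if cur is not None:
                total += cur[1] - cur[0] + 1
            cur = (start, end)
    if cur is not None:
        total += cur[1] - cur[0] + 1
    return total


def percent_covered(intervals: Iterable[Interval], genome_len: int) -> float:
    """Share of the genome (%) covered by the given features, e.g. repeats."""
    return 100.0 * covered_length(intervals) / genome_len


def percent_intergenic(gene_intervals: Iterable[Interval], genome_len: int) -> float:
    """Share of the genome (%) not covered by any gene interval."""
    return 100.0 - percent_covered(gene_intervals, genome_len)


# The rrf coordinates cited in footnote g span 117 bp.
print(covered_length([(33313, 33429)]))
```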
Figure 4. Differences between the chloroplast gene repertoires of prasinophytes and those of the deep-branching streptophytes Mesostigma and Chlorokybus. The figure shows the conserved genes missing from one or more prasinophyte genomes, as well as the six conserved genes found in Mesostigma and/or Chlorokybus but absent from all prasinophytes; streptophyte-specific genes are denoted by filled circles. The presence of a gene is indicated by a dark blue box and the presence of a pseudogene by a light blue box. Species names are abbreviated as in Table 1. Although rpl36, trnH (gug) and trnV (uac) are recorded as missing in Micromonas, all three genes are probably present, because three lines of evidence suggest that the genome sequence in the [GenBank:NC_012575] accession is partial and that a missing segment contains these genes: 1) the three genes are conserved in all other compared green algae; 2) given that chloroplast gene order is colinear in Ostreococcus and Micromonas, they are predicted to be contiguous and located between psbB and trnG (ucc); and 3) these predicted positions correspond to the circularization endpoints of the genome assembly deposited in [GenBank:NC_012575]. A total of 75 genes are shared by all compared prasinophyte cpDNAs: atpA, B, E, F, H, I, clpP, ftsH, petA, B, G, psaA, B, psbA, B, C, D, E, F, H, I, J, K, L, N, T, Z, rbcL, rpl2, 5, 14, 16, 20, 23, rpoA, C1, C2, rps2, 3, 4, 7, 8, 11, 12, 14, 18, 19, rrl, rrs, tufA, ycf1, 3, 12, trnA (ugc), C (gca), D (guc), E (uuc), F (gaa), G (ucc), I (gau), K (uuu), L (uaa), L (uag), Me (cau), Mf (cau), N (guu), P (ugg), Q (uug), R (acg), R (ucu), S (gcu), S (uga), T (ugu), W (cca) and Y (gua).
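The comparisons summarized in this figure reduce to set operations on gene repertoires. As a hedged sketch, assuming each genome is represented by the set of its intact genes (the sketch ignores the gene/pseudogene distinction the figure draws), the genes shared by all prasinophytes, the genes missing from each genome, and the genes present only in the streptophyte outgroup could be derived as follows. Function names, genome labels and gene sets are hypothetical toy inputs.

```python
from typing import Dict, Set

Repertoire = Dict[str, Set[str]]  # genome label -> names of intact genes


def shared_genes(genomes: Repertoire) -> Set[str]:
    """Genes present in every genome of the set."""
    return set.intersection(*genomes.values())


def missing_by_genome(genomes: Repertoire) -> Dict[str, Set[str]]:
    """For each genome, the genes seen elsewhere in the set but absent from it."""
    union = set.union(*genomes.values())
    return {label: union - genes for label, genes in genomes.items()}


def outgroup_only(ingroup: Repertoire, outgroup: Repertoire) -> Set[str]:
    """Genes found in at least one outgroup genome but in no ingroup genome."""
    return set.union(*outgroup.values()) - set.union(*ingroup.values())


# Toy repertoires (not the study's data).
prasinophytes = {"P1": {"psbA", "rpl36", "trnV(uac)"}, "P2": {"psbA"}}
streptophytes = {"S1": {"psbA", "rpl36", "ndhA"}}
print(sorted(shared_genes(prasinophytes)))
print(sorted(outgroup_only(prasinophytes, streptophytes)))
```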
Figure 5. Distribution of ancestral gene pairs among the 12 prasinophyte cpDNAs examined in this study. We selected all the gene pairs shared by at least two prasinophytes from distinct lineages and also by one or both of the streptophytes Mesostigma and Chlorokybus. In addition, when one of the genes in a given pair was missing from several prasinophyte lineages or from both streptophytes, gene pairs conserved in a single prasinophyte lineage or missing from the streptophytes were also selected. The presence of a gene pair is denoted by a dark blue box; a gray box indicates that at least one gene of the pair is missing due to gene loss. Gene pairs forming larger conserved clusters are grouped, and individual genes that were cleanly deleted from some of these clusters are indicated on the right of the figure. Species names are abbreviated as in Table 1.
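Each cell of this figure assigns a genome and a gene pair to one of a few states. A minimal sketch of that scoring follows, assuming a pair counts as present when its two genes are neighbors on the circular map in either order (polarity is ignored here, which is a simplification); a third state is added for pairs whose genes both survive but are no longer adjacent. All names and gene orders are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[str, str]


def adjacent_pairs(order: List[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of neighboring genes on a circular gene map."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def pair_status(order: List[str], pair: Pair) -> str:
    """Classify one gene pair in one genome."""
    if not set(pair) <= set(order):
        return "lost"     # gray box: at least one gene is absent
    if frozenset(pair) in adjacent_pairs(order):
        return "present"  # dark blue box
    return "broken"       # both genes present but no longer adjacent


def status_matrix(genomes: Dict[str, List[str]], pairs: List[Pair]) -> Dict[str, Dict[Pair, str]]:
    """Genome-by-pair table of states, analogous to the boxes in the figure."""
    return {label: {p: pair_status(order, p) for p in pairs} for label, order in genomes.items()}


# Toy gene orders (not the study's data).
genomes = {
    "X": ["psbB", "psbT", "psbN", "psbH"],
    "Y": ["psbB", "psbN", "psbT", "psbH"],
    "Z": ["psbB", "psbH"],
}
print(status_matrix(genomes, [("psbB", "psbT")]))
```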
Figure 6. Relationships among prasinophyte lineages inferred using a data set of 14,382 positions assembled from 71 cpDNA-encoded proteins of 47 green plant taxa. Trees were inferred using PhyloBayes under the CATGTR + Γ4 model and RAxML under the LG4X and gcpREV + Γ4 models. In the ML analyses, the data set was partitioned by gene, with the model applied to each of the 71 partitions. The Bayesian majority-rule consensus tree is presented. Support values are reported on the nodes; from top to bottom, or from left to right, are shown the BP values for the CATGTR + Γ4 analyses, the PP values for the CATGTR + Γ4 analyses, and the BP values for the LG4X and gcpREV + Γ4 analyses. Dashes (−) indicate support values below 0.95 PP or 40% BP; black dots indicate that the corresponding nodes received 1.00 PP and 100% BP support. The histograms on the left indicate the proportion of missing genes and missing sites for each taxon. The scale bar denotes the estimated number of amino acid substitutions per site.
Figure 7. Relationships among prasinophyte lineages inferred using a data set of 15,549 positions assembled from 79 cpDNA-encoded proteins of 34 green plant taxa. Trees were inferred using PhyloBayes under the CATGTR + Γ4 model and RAxML under the LG4X, GTR + Γ4 and gcpREV + Γ4 models. In the ML analyses, the data set was partitioned by gene, with the model applied to each of the 79 partitions. The Bayesian majority-rule consensus tree is presented. Support values are reported on the nodes; from top to bottom, or from left to right, are shown the BP values for the LG4X, GTR + Γ4 and gcpREV + Γ4 analyses, and the PP values for the CATGTR + Γ4 analyses. Dashes (−) indicate support values below 0.95 PP or 40% BP; black dots indicate that the corresponding nodes received 1.00 PP and 100% BP support. The scale bar denotes the estimated number of amino acid substitutions per site.
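The node-labeling convention shared by Figures 6 and 7 can be stated as a small rule. The sketch below is one reading of it, under stated assumptions: values are supplied in the order the caption lists them, each tagged as PP or BP; a node at maximal support in every analysis collapses to a black dot; and any other value falls to a dash when it is below its threshold. Joining the values with "/" is an assumption standing in for the stacked layout on the trees, and the function name is hypothetical.

```python
from typing import List, Tuple

Support = Tuple[str, float]  # ("PP", posterior probability) or ("BP", bootstrap %)


def format_node(values: List[Support], pp_min: float = 0.95, bp_min: float = 40.0) -> str:
    """Render one node's support values using the captions' dash and dot conventions."""
    if all(v >= (1.0 if kind == "PP" else 100.0) for kind, v in values):
        return "●"  # 1.00 PP and 100% BP in every analysis
    cells = []
    for kind, v in values:
        threshold = pp_min if kind == "PP" else bp_min
        if v < threshold:
            cells.append("−")  # below 0.95 PP or 40% BP
        else:
            cells.append(f"{v:.2f}" if kind == "PP" else f"{v:.0f}")
    return "/".join(cells)


# Figure 6 order: BP (CATGTR), PP (CATGTR), BP (LG4X), BP (gcpREV); values are made up.
print(format_node([("BP", 85.0), ("PP", 0.90), ("BP", 39.0), ("BP", 62.0)]))
```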
Figure 8. Analysis of the trans-spliced group II intron in Picocystis ycf3. (A) Potential secondary structure of the intron, modeled according to Michel et al. (1989) [66]. Exon sequences are shown in lowercase letters. Roman numerals specify the six major structural domains. Tertiary interactions are denoted by dashed lines, arrows, or Greek letters. EBS and IBS are exon-binding and intron-binding sites, respectively. The asterisk indicates the site of lariat formation. Note that the precise position of the breakpoint within domain IV is unknown. (B) Confirmation of intron trans-splicing by RT-PCR analysis. The diagrams on the left display the genomic configuration of the Picocystis ycf3 exons (solid color), with the trans-spliced intron sequences shown as color gradients. Primer locations are indicated by numbered arrows (see Methods for primer sequences); the numbers in parentheses denote the nucleotide positions corresponding to the 5′ ends of the primers on the ycf3 coding sequence after intron removal. Coding regions shown above or below the horizontal line are transcribed to the right or to the left, respectively. PCR assays were carried out on cDNA or genomic DNA (gDNA), with the numbers above the gel lanes indicating the combinations of primers used. The amplicon derived from the PCR assay on cDNA is of the size expected if intron trans-splicing occurs to produce the ycf3 RNA. The identity of this amplicon, as well as the insertion position of the intron in the ycf3 gene, was confirmed by DNA sequencing. The amplicons derived from the PCR assays on gDNA have the sizes predicted by the genome map.
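Because primer positions in panel B are given as 5′-end coordinates on the spliced ycf3 coding sequence, the expected cDNA product size is a simple difference. A minimal sketch follows, assuming 1-based coordinates with the forward primer upstream of the reverse primer; the positions and the exon junction used in the example are hypothetical, not the study's primers. A cDNA product of this size from a primer pair that straddles the ycf3a/ycf3b junction is consistent with the two separately encoded exons having been joined in the RNA.

```python
def spliced_amplicon_length(fwd_5p: int, rev_5p: int) -> int:
    """Expected RT-PCR product size (bp) on spliced RNA.

    fwd_5p and rev_5p are 1-based positions of the primers' 5' ends on the
    coding sequence after intron removal, with the forward primer upstream.
    """
    if rev_5p <= fwd_5p:
        raise ValueError("reverse primer 5' end must be downstream of the forward primer")
    return rev_5p - fwd_5p + 1


def spans_exon_junction(fwd_5p: int, rev_5p: int, exon1_end: int) -> bool:
    """True if the amplicon crosses the junction that follows position exon1_end."""
    return fwd_5p <= exon1_end < rev_5p


# Hypothetical primer positions and junction, for illustration only.
print(spliced_amplicon_length(120, 480), spans_exon_junction(120, 480, 300))
```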