Elizabeth C Ruck, Teofil Nakov, Robert K Jansen, Edward C Theriot, Andrew J Alverson.
Abstract
Photosynthesis by diatoms accounts for roughly one-fifth of global primary production, but despite this, relatively little is known about their plastid genomes. We report the completely sequenced plastid genomes for eight phylogenetically diverse diatoms and show them to be variable in size, gene and foreign sequence content, and gene order. The genomes contain a core set of 122 protein-coding genes, with 15 additional genes exhibiting complex patterns of 1) gene losses at varying phylogenetic scales, 2) functional transfers to the nucleus, 3) gene duplication, divergence, and differential retention of paralogs, and 4) acquisitions of putatively functional recombinase genes from resident plasmids. The newly sequenced genomes also contain several previously unreported genes, highlighting how poorly characterized diatom plastid genomes are overall. Genome size variation reflects major expansions of the inverted repeat region in some cases but, more commonly, large-scale expansions of intergenic regions, many of which contain unique open reading frames of likely foreign origin. Although many gene clusters are conserved across species, rearrangements appear to be frequent in most lineages.
Keywords: chloroplast; diatoms; genomes; horizontal gene transfer; plastid
Year: 2014 PMID: 24567305 PMCID: PMC3971590 DOI: 10.1093/gbe/evu039
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Culturing, Sequencing, and Assembly Information for the Eight Newly Sequenced Diatom Plastid Genomes
| Taxon | GenBank Accession | Culture Collection | Strain ID | Growth Medium | Sequencing Platform | Sequence Assembler |
|---|---|---|---|---|---|---|
| | KC509524 | NCMA | CCMP1856 | F/2 | Roche 454, Illumina HiSeq | Newbler, ABySS |
| | KC509521 | NCMA | CCMP310 | F/2 | Roche 454 | Newbler |
| | KC509525 | NCMA | CCMP1797 | F/2 | Roche 454 | Newbler |
| | KC509520 | NCMA | CCMP1717 | F/2 | Roche 454 | Newbler |
| | KC509519 | CPCC | UTCC605 | COMBO | Roche 454 | Newbler |
| | KF733443 | UTEX | FD354 | COMBO | Illumina MiSeq | ABySS, Ray |
| | KC509522 | NCMA | CCMP1855 | F/2 | Roche 454 | Newbler |
| | KC509523 | NA | BCCO11 | F/2 | Illumina HiSeq | ABySS, Ray |
Note: NCMA, Provasoli-Guillard National Center for Marine Algae and Microbiota; CPCC, Canadian Phycological Culture Centre at the University of Toronto; UTEX, The Culture Collection of Algae at The University of Texas at Austin.
Footnote a: Environmental sample, Boulder Creek, Colorado, USA, April 2011.
Figure. Sequence coverage by protein-coding, intergenic, and expanded (≥1 kb in length) intergenic regions in diatom plastid genomes. Bars are drawn proportional to the genome size and show the fraction of the genome occupied by these three sequence categories. Taxa in boldface identify genomes sequenced for this study. Phylogenetic relationships were redrawn from Theriot et al. (2010) and unpublished data. Taxa marked with a superscript “a” are dinoflagellates with diatom-derived plastids (Imanian et al. 2010).
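The three sequence categories in this caption can be illustrated with a small calculation. The following is a minimal sketch, not the authors' method: the function name `coverage_fractions`, the interval format, and the example coordinates are all hypothetical. It assumes a circular genome, 0-based half-open gene coordinates, no gene spanning the coordinate origin, and that everything outside the supplied gene intervals counts as intergenic. It also assumes that "intergenic" and "expanded intergenic" are disjoint categories, split at the 1 kb threshold.

```python
from typing import Dict, List, Tuple


def coverage_fractions(
    genome_len: int,
    genes: List[Tuple[int, int]],
    min_expanded: int = 1000,  # threshold for an "expanded" intergenic region (>= 1 kb)
) -> Dict[str, float]:
    """Fraction of a circular genome in coding, short intergenic, and expanded intergenic DNA.

    `genes` holds hypothetical 0-based, half-open (start, end) intervals.
    Assumption: no interval wraps around the coordinate origin.
    """
    # Merge overlapping or abutting gene intervals so no base is counted twice.
    merged: List[List[int]] = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    coding = sum(end - start for start, end in merged)

    gaps: List[int] = []
    if not merged:
        # No genes supplied: the whole genome is a single gap.
        gaps.append(genome_len)
    else:
        # Gaps between consecutive merged genes.
        for (_, prev_end), (next_start, _) in zip(merged, merged[1:]):
            gaps.append(next_start - prev_end)
        # On a circular genome the leading and trailing gaps form one region.
        wrap_gap = merged[0][0] + (genome_len - merged[-1][1])
        if wrap_gap > 0:
            gaps.append(wrap_gap)

    expanded = sum(g for g in gaps if g >= min_expanded)
    short = sum(gaps) - expanded

    return {
        "protein_coding": coding / genome_len,
        "intergenic": short / genome_len,
        "expanded_intergenic": expanded / genome_len,
    }


# Made-up coordinates for illustration only, not data from the study.
example = coverage_fractions(10_000, [(500, 2_000), (2_500, 4_000), (6_000, 9_000)])
print(example)
```

The three fractions sum to one by construction, which matches the caption's description of bars partitioned into three categories. Real annotations would also need a decision on how to treat rRNA and tRNA genes, which this sketch leaves to the caller.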
Figure. Size variation of the inverted repeat region (IRa) across 16 fully sequenced plastid genomes in diatoms. Colored boxes circumscribe genes in various functional categories, with genes above the line transcribed on the forward strand and genes below the line transcribed on the reverse strand. Maps are drawn to scale, and the gray box demarcates the core rns-trnI-trnA-rnl-rrn5 gene cluster conserved across the 16 genomes. Newly sequenced taxa from this study are in boldface, and the nucleotide length of IRa is given in parentheses beneath each taxon name. Double arrows delimit large putatively foreign sequence insertions.
Figure. Evolutionary patterns of pseudogenization, loss, and gain of genes in diatom plastid genomes. The matrix shows 15 genes variably present among the sequenced genomes. The presence of nuclear gene copies is almost certainly underreported for lack of nuclear genome data in most species. Taxa in boldface identify genomes sequenced for this study. Phylogenetic relationships were redrawn from Theriot et al. (2010) and unpublished data. Taxa marked with a superscript “a” are dinoflagellates with diatom-derived plastids (Imanian et al. 2010), and genes marked with a superscript “b” are of plasmid (serC) or unknown (tyrC) origin.
Figure. Gene phylogenies for acpP and tsf, both of which exhibit complex phylogenetic distributions across diatoms. Numbers are Bayesian posterior probability values. Genes located in the diatom plastid genome are marked with a circle, whereas genes in the nuclear genome are marked with a diamond. Taxa whose plastid genomes have multiple copies of the gene are boldfaced, with the −1 and −2 suffixes denoting the different ortholog groups. The taxon marked with a superscript “a” is not the same Cylindrotheca strain sequenced as part of this study.
Figure. Plastid-genome alignments for three representative diatoms. One copy of the inverted repeat region was removed from each genome prior to aligning the genomes. Colored blocks indicate gene clusters with conserved gene order across the three taxa. Blocks below the black center line are inverted relative to the Leptocylindrus reference genome. Two blocks (a and b) contain the largest conserved gene clusters, consisting of the RNA polymerase genes (a) or the large ribosomal operon in the small single copy region (b).