Melinda P Simmons, Charles Bachy, Sebastian Sudek, Marijke J van Baren, Lisa Sudek, Manuel Ares, Alexandra Z Worden.
Abstract
Spliceosomal introns are a hallmark of eukaryotic genes that are hypothesized to play important roles in genome evolution but have poorly understood origins. Although most introns lack sequence homology to each other, new families of spliceosomal introns that are repeated hundreds of times in individual genomes have recently been discovered in a few organisms. The prevalence and conservation of these introner elements (IEs) or introner-like elements in other taxa, as well as their evolutionary relationships to regular spliceosomal introns, are still unknown. Here, we systematically investigate introns in the widespread marine green alga Micromonas and report new families of IEs, numerous intron presence-absence polymorphisms, and potential intron insertion hot-spots. The new families enabled identification of conserved IE secondary structure features and establishment of a novel general model for repetitive intron proliferation across genomes. Despite shared secondary structure, the IE families from each Micromonas lineage bear no obvious sequence similarity to those in the other lineages, suggesting that their appearance is intimately linked with the process of speciation. Two of the new IE families come from an Arctic culture (Micromonas Clade E2) isolated from a polar region where abundance of this alga is increasing due to climate-induced changes. The same two families were detected in metagenomic data from Antarctica, a system where Micromonas has never before been reported. Strikingly high identity between the Arctic isolate and Antarctic coding sequences that flank the IEs suggests connectivity between populations in the two polar systems that we postulate occurs through deep-sea currents. Recovery of Clade E2 sequences in North Atlantic Deep Waters beneath the Gulf Stream supports this hypothesis. Our research illuminates the dynamic relationships between an unusual class of repetitive introns, genome evolution, speciation, and global distribution of this sentinel marine alga.
Keywords: Introner Elements; introns; marine algae; phytoplankton; polar systems; repetitive introns
Year: 2015 PMID: 25998521 PMCID: PMC4540971 DOI: 10.1093/molbev/msv122
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure. Molecular phylogeny of Micromonas and insertion sequences in gene homologs from cultured clades. (a) Bayesian reconstruction of the 18S rRNA gene sequences from the Mamiellophyceae and other prasinophytes, using 1,646 unambiguously aligned positions and the GTR + Γ + I model of substitution. Micromonas clades (blue) are highlighted. Clade names are designated with letters, as in Slapeta et al. (2006), and Roman numerals, as in Worden et al. (2009). Differentiation of Clade E.III to Clades E1 and E2 (black labeling) was achieved herein using new data. Sequences from environmental clone library studies were included for Clade -.IV (an uncultured clade) and groups with sparse representation in culture collections, such as the E2 Clade. Other widespread Mamiellophyceae genera shown, Ostreococcus (pink) and Bathycoccus (green), also have genome-sequenced representatives used in primer design for the IE PCR study. The tree is rooted by the prasinophyte Pycnococcus-clade for display purposes. (b–e) Architecture of amplified regions of protein-encoding genes investigated in cultured Micromonas clades (table 1). Thick bars (blue) represent exons, vertical turquoise lines denote loci where introns are present (accompanied by thin horizontal intron lines) or absent (vertical line only). Thin horizontal lines represent Clade D IEs (yellow) and newly identified introns in Clade C (blue) and Clade E2 (red, purple) homologs of the Transporter. The first two Clade E2 introns (red) are highly identical (alignment under panel [e]). Note that E1 and E2 data are lacking for the ATPase, as are Actin data for E2, presumably due to primer mismatches later identified using transcriptome assemblies.
Micromonas Isolates Grown and Number of Assembled Sequences Obtained from Clones for Each Gene Homolog Investigated.
| Isolate | Clade | Actin | ATPase | Transporter | Dehydrogenase |
|---|---|---|---|---|---|
| RCC299 | A | 2 | 2 | 1 | 2 |
| CCMP492 | A | 2 | 2 | 0 | 2 |
| CCMP1764 | B | 2 | 2 | 2 | 2 |
| NEPCC29 | C | 2 | 0 | 2 | 2 |
| CS222 | C | 2 | 2 | 2 | 2 |
| CCMP1195 | C | 2 | 2 | 2 | 2 |
| CCMP490 | D | 2 | 2 | 2 | 2 |
| CCMP1545 | D | 2 | 2 | 2 | 2 |
| CCMP1646 | E | 2 | 0 | 2 | 1 |
| CCMP2099 | E | 0‡ | 0 | 2 | 2 |
Note.—Clade designations based on Slapeta et al. (2006).
a. RCC299 was not included in Slapeta’s analyses; therefore this assignment is based on phylogenetic analyses herein.
b. The primers produced sequences from a different predicted ABC Transporter in Clade A strain CCMP492 and in RCC299; the correct RCC299 gene homolog was obtained from the sequenced genome and the CCMP492 amplicon was discarded from further analyses.
c. While successful for other Clade C strains, the correct ATPase homolog was not retrieved in cloned NEPCC29 sequences.
d. Comparison to transcript sequences, obtained later from Clade E2 isolate CCMP2099, revealed extensive primer mismatches for these genes, likely explaining unsuccessful PCR results.
IE Families and Their Distribution in the Micromonas Clades.
| This Study | Worden et al. (2009) | Verhelst et al. (2013) | Strain or Metagenomic Read |
|---|---|---|---|
| D-IE1 | IE1 | IEA1 | CCMP1545, CCMP490, temperate & tropical metagenomes |
| D-IE2 | IE2 | IEA2 | CCMP1545, CCMP490, temperate & tropical metagenomes |
| D-IE3 | IE3 | IEA3 | CCMP1545, CCMP490 (metagenomes not searched) |
| D-IE4 | IE4 | IEA4 | CCMP1545 (CCMP490 & metagenomes not searched) |
| Unconf. | Not reported | IEB | CCMP1545 |
| Unconf. | Not reported | IED | CCMP1545 |
| ABC-IE | Not reported | IEC, seen in RCC299 | NEPCC29, CS222, CCMP1195, CCMP1764, RCC299, temperate metagenomes |
| E2-IEt1 | Not reported | Not reported | CCMP2099, NADW, Antarctic metagenomes |
| E2-IEt2 | Not reported | Not reported | CCMP2099, NADW, Antarctic metagenomes |
Note.—Families IEB and IED reported in Verhelst et al. (2013) are considered unconfirmed; these groups have very few members that are very diverged and most are not spliced as annotated in RNA-seq data.
Figure. The global distribution of Micromonas introns in available metagenomes and discovery of new IE families. (a) Isolation sites for cultured Micromonas strains (circles), the sample site for environmental clone libraries generated herein (star), and sites where multiple BLASTn hits were recovered in public metagenomic data (symbols and color-codes as indicated on legend). Inset borders are color-coded to show corresponding map regions. Note that red triangles (representing E2-IEt1) lie beneath every purple triangle (E2-IEt2 sequences) and the location of the deep profile (supplementary table S3, Supplementary Material online) is not shown. (b) E2-IEt1 consensus sequence from Antarctic metagenomic reads encoding eight different proteins. (c) Aligned E2-IEt1 (and 12 exonic flanking nucleotides at each end) from Antarctic reads, including two from the same gene present in different samples (two such examples, bottom four E2-IEs; excluded from [b] to avoid overrepresenting element conservation) and from the CCMP2099 Transporter gene. CCMP2099 transcript contigs (nonbold numbers) are shown beneath each DNA sequence. Regions flanking the Arctic CCMP2099 E2-IEt1.a and Antarctic metagenomic E2-IEt1.24 and E2-IEt1.25 (from different Antarctic samples) in the Transporter gene were identical, as were the E2-IEs themselves except for a single “T” at different positions in E2-IEt1.24 and E2-IEt1.25 (potentially representing 454 homopolymer accuracy issues).
Figure. Intron presence–absence patterns in Pacific Ocean environmental clones. Architecture for regions of the genes encoding (a) the putative Calcium ATPase and (b) Actin are shown. Thick bars (blue) represent exons, vertical turquoise lines denote loci where introns are present (accompanied by thin horizontal intron lines) or absent (vertical line only). Thin horizontal lines represent D-IEs (yellow) and a newly identified presence–absence polymorphism (green) in environmental clones similar to Clade D. ATPase Cluster B consists of six environmental sequences, whereas Actin Cluster S4 and the RSI-bearing Clade D-like type have one and two clones, respectively. (c) Nucleotide polymorphisms in the amplified region of IE-bearing ATPase homologs. Coding region (black) and D-IE (orange) lengths are indicated above the top bar, and numbering below corresponds to SNP positions. The number of sequences (100% identical) in each cluster from cultures and environmental clones from spring or fall Pacific clone libraries is indicated. Dots represent nucleotides identical to those of the first sequence, and letters denote variant nucleotides. Only positions with polymorphisms are shown. The asterisk (orange) represents a 5′-IE (184 nt) in Env. Cluster B sequences, absent from all other ATPase sequences (variant nucleotide numbering does not include the Cluster B 5′-IE).
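The dot-style polymorphism display described in panel (c) can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length; the function names and the toy sequences are hypothetical and are not taken from the study's data or methods.

```python
def variable_sites(aligned):
    """Return 1-based alignment columns at which the sequences differ."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i in range(length)
            if len({seq[i] for seq in aligned}) > 1]


def dot_matrix(aligned):
    """Show only polymorphic columns; dots mark identity to the first sequence."""
    sites = variable_sites(aligned)
    ref = aligned[0]
    # The first (reference) row is written out in full at the variable sites.
    rows = ["".join(ref[p - 1] for p in sites)]
    for seq in aligned[1:]:
        rows.append("".join("." if seq[p - 1] == ref[p - 1] else seq[p - 1]
                            for p in sites))
    return sites, rows


# Toy alignment (hypothetical sequences, for illustration only).
toy = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
sites, rows = dot_matrix(toy)
print(sites)
for row in rows:
    print(row)
```

Sequences that are 100% identical would collapse into one row with a count, as in the figure; that clustering step is omitted here for brevity.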
Figure. Secondary structure models for Introner lariat RNAs. A sequence complementary to the 5′ splice site is found in several IE types. The feature does not appear as a conserved primary sequence element because its sequence varies to maintain pairing with the 5′ ss. 2′-5′ linkage between the branchpoint A residue and the G at the 5′-end of the intron is shown with an asterisk and bases are numbered from the beginning of the intron. Additional sequences between the 5′ splice site and the branchpoint are represented by a line, and sequences downstream from the branchpoint are not shown. (a) ABC-IE in the Transporter gene of NEPCC29. (b) ABC-IE in the Transporter gene of RCC299 (two exact copies in this genome). (c) An example D-IE3 from an NADH dehydrogenase subunit that is present in 32 identical copies in CCMP1545, with another 28 copies that have single base changes. The loop is larger than in panels (a) and (b) and other structures can form but a 5′ ss complementary sequence is present. (d) Secondary structure of the Type 1 E2-IE from the CCMP2099 Transporter gene. (e) A generalized structure for IE lariats showing the 5′ ss paired with the sequence downstream.
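As a rough illustration of how a downstream sequence complementary to the 5′ splice site might be located, the following Python sketch searches an intron for the reverse complement of its first few nucleotides. It is a simplification under stated assumptions: only exact Watson-Crick matches are considered (no G-U wobble pairs or mismatches), the six-nucleotide window is an arbitrary choice, and the example intron is hypothetical rather than an actual IE sequence. It is not the structure-prediction approach used in the study.

```python
# DNA-alphabet complement table (the intron is given as its DNA sequence).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_5ss_pairing(intron, ss_len=6):
    """Find downstream sites that could base-pair with the 5' splice site.

    The first `ss_len` nucleotides of the intron are treated as the 5' ss
    (an assumed window). Returns the target motif and the 1-based start
    positions of exact matches downstream of the 5' ss.
    """
    intron = intron.upper()
    target = reverse_complement(intron[:ss_len])
    hits = []
    start = intron.find(target, ss_len)
    while start != -1:
        hits.append(start + 1)  # report 1-based positions from intron start
        start = intron.find(target, start + 1)
    return target, hits


# Hypothetical intron: 5' ss, spacer, complementary element, tail.
toy_intron = "GTGAGC" + "TTTT" + "GCTCAC" + "AAAA"
motif, positions = find_5ss_pairing(toy_intron)
print(motif, positions)
```

Because the complementary element covaries with the 5′ ss, a search of this kind has to be run per intron rather than with a single fixed motif, which is consistent with the caption's point that the feature is not a conserved primary sequence element.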
Figure. Proposed model for IE reverse splicing into ssDNA generated at R-loops. (a) Diagram of a stalled RNA polymerase II complex behind which an R-loop has formed by pairing of the nascent transcript with the template strand of DNA. A spliceosome that carries the lariat intron product of a recent splicing event binds to the displaced nontemplate DNA strand. RNA (red) and DNA (black) are shown along with nucleosomes (discs) and spliceosome (blue oval). The lightning bolt indicates where the first step of reverse splicing (the reverse of the second step of forward splicing) might occur on the DNA. (b) Detailed description of a possible reverse splicing mechanism for IE transposition at R-loops. See text for additional details.