Nicolas J Tourasse, Anne-Brit Kolstø.
Abstract
Group I and group II introns are different catalytic self-splicing and mobile RNA elements that contribute to genome dynamics. In this study, we have analyzed their distribution and evolution in 29 sequenced genomes from the Bacillus cereus group of bacteria. Introns were of different structural classes and evolutionary origins, and a large number of nearly identical elements are shared between multiple strains of different sources, suggesting recent lateral transfers and/or that introns are under a strong selection pressure. Altogether, 73 group I introns were identified, inserted in essential genes from the chromosome or newly described prophages, including the first elements found within phages in bacterial plasmids. Notably, bacteriophages are an important source for spreading group I introns between strains. Furthermore, 77 group II introns were found within a diverse set of chromosomal and plasmidic genes. Unusual findings include elements located within conserved DNA metabolism and repair genes and one intron inserted within a novel retroelement. Group II introns are mainly disseminated via plasmids and can subsequently invade the host genome, in particular by coupling mobility with host cell replication. This study reveals a very high diversity and variability of mobile introns in B. cereus group strains.
Year: 2008 PMID: 18587153 PMCID: PMC2504315 DOI: 10.1093/nar/gkn372
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 2. Predicted secondary structures of the group I introns inserted within the nrdE gene of B. cereus group strains. The figure shows that the introns belong to different structural classes and are thus from multiple origins. Structure models were predicted using MFOLD (86) and redrawn according to the format defined in (87) using RnaViz (88). Intron sequences are in uppercase letters and exon nucleotides are in lowercase letters. Base pairs are linked by dots. Labels P1 to P9 indicate the group I intron domains. The P1 stem represents the internal guide sequence (IGS) used for recognition of the 5′ splice site. Nucleotides involved in formation of the P10 pairing for 3′ splice-site selection are boxed. Intron-encoded HEG ORFs are not included. Base numbering does not include full HEG sequences; however, it includes ORF remnants. Bacillus cereus group host strains are given in parentheses after the intron names (other strains sharing the introns are listed in Table 2). Note that the nrdE-IVS3 intron previously reported as belonging to the A2 subclass (49,54) has been reclassified here in the A3 subclass mainly due to the presence of a single P9 stem followed by a short 3′-end.
Figure 5. Predicted secondary structures of the group I introns inserted within the TMP gene of B. cereus group prophages. The figure illustrates intron adaptation to specific target sites. (A and B) Similar introns inserted in different sites (IVS1 and IVS2) within TMP; (C and D) different introns inserted within the same site (IVS3). Introns are represented as described in the legend to Figure 2 and, in addition, the P1 domain and bases forming the P10 pairing (boxed) involved in splice-site recognition are shaded in gray. The P1 and P10 pairings are different between the IVS1 and IVS2 introns, while they are identical between IVS3a and IVS3b. Bacillus cereus group host strains are given in parentheses after the intron names (other strains sharing the introns are listed in Table 2).
Figure 6. Predicted secondary structures of selected group II introns from B. cereus group bacteria, illustrating various intron classes found in these organisms. Bacillus cereus group host strains are given in parentheses after the intron names (other strains sharing B.c.I10 are listed in Table 2). Roman letters I–VI indicate the six functional RNA domains. The intron-encoded multifunctional RT ORF, located within domain IV, is not included, thus base numbering does not include the ORF sequence. Intron sequences are in uppercase letters and exon nucleotides are in lowercase letters. Base pairs are linked by dots. Potential exon-binding sites (EBS1, EBS2 and EBS3 or δ′) and their corresponding intron-binding sites (IBS1, IBS2 and IBS3 or δ) involved in base-pairings used for splice-site recognition are boxed. Note that the δ–δ′ pairing in class A introns is analogous to the EBS3–IBS3 pairing in other classes and that there is no EBS2–IBS2 pairing in class C elements.
Strain information and numbers of full-length group I and group II introns in B. cereus group genomes
| Species/strain | Origin/source | Genome status | GenBank accession number | Group I introns | Group II introns |
|---|---|---|---|---|---|
| | Cow (Texas, USA) | Finished | AE017334, AE017336 (pXO1), AE017335 (pXO2) | 3 | 2 |
| | Farm (USA, 1916) | Finished | AE016877, AE016878 (pBClin15) | 0 | 1 |
| | Dairy cheese (Canada, 1930) | Finished | AE017194, AE017195 (pBc10987) | 2 | 7 |
| | Human, sputum and blood (Louisiana, USA, 1994) | 12X shotgun | AAEK00000000, DQ889680 (pBCXO1), DQ889679 (pBC210) | 1 | 3 |
| | Human, severe tissue necrosis (Yugoslavia, 1995) | Finished | AE017355, CP000047 (pBT9727) | 2 | 0 |
| | Sewage (Israel) | 8X shotgun | AAJM01000000 | 0 | 2 |
| | Dead zebra (Namibia, 1996) | Finished | CP000001, CP000040 (pE33L466), CP000041 (pE33L5), CP000042 (pE33L54), CP000043 (pE33L8), CP000044 (pE33L9) | 5 | 5 |
| | Forest soil (France, 2000) | Finished | CP000903, CP000904 (pBWB401), CP000905 (pBWB402), CP000906 (pBWB403), CP000907 (pBWB404) | 5 | 0 |
| | Suspected bioweapons facility (Iraq) | Finished | CP000485, CP000486 (pALH1) | 1 | 1 |
| | Vegetable puree (France, 1998) | Finished | CP000764, CP000765 (pBC9801) | 1 | 2 |
| | Human, vomit (UK, 1972) | 8X shotgun | AAUF00000000, DQ889676 (pCER270) | 4 | 12 |
| | Human, periodontitis (Norway, 1995) | 8X shotgun | AAUE00000000, DQ889677 (pPER272) | 4 | 4 |
| | Human, eye (Oklahoma, USA) | 8X shotgun | ABDA00000000 | 2 | 1 |
| | Human, stool (Nebraska, USA, 1996) | 8X shotgun | ABDJ00000000 | 3 | 4–5 |
| | Human, blood and pleural fluid (1969) | 8X shotgun | ABDI00000000 | 1 | 0 |
| | Spice mix (Norway, 1999) | 8X shotgun | ABDK00000000 | 2 | 0 |
| | Dust (Texas, USA, 2003) | 8X shotgun | ABDM00000000 | 2 | 4 |
| | Food (USA, 1997) | 8X shotgun | ABDL00000000 | 3 | 13–14 |
| | Soil | 8X shotgun | ABCZ00000000 | 2 | 0 |
(a) The first accession number given is that of the chromosome for finished genomes or the full set of sequence contigs for unfinished genomes. Following numbers are for complete plasmids, when applicable (plasmid names are given in parentheses).
(b) Individual intron information and sequence data will be deposited in the GISSD (http://www.rna.whu.edu.cn/gissd/index.html) and the group II intron database (http://www.fp.ucalgary.ca/group2introns/), and are also available from the authors upon request.
(c) Eleven B. anthracis strains have been sequenced (see http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=1392). Since B. anthracis isolates are highly monomorphic and virtually identical, data for the ‘Ames Ancestor’ strain only are presented in this article.
(d) Strain NVH391-98 has a reduced size genome of 4.1 Mb, whereas the estimated genome size of all other strains is 5.2–5.9 Mb.
(e) No gene annotation available for these genomes at the time of study.
(f) Due to incomplete sequence data the assignment of a few intron fragments to the same or separate elements could not be confirmed.
Full-length introns shared between B. cereus group genomes
| Intron | Strains sharing the intron | Location |
|---|---|---|
| Group I | | |
| recA | | S |
| nrdE-IVS2 | KBAB4 and NVH391-98 | S |
| nrdE-IVS3 | | S |
| nrdE-IVS4 | ATCC 10987, E33L, G9241, AH187 and H3081.97 | S |
| nrdE-IVS5 | 03BB108 and Al Hakam | S |
| nrdF-IVS6 | NVH0597-99, AH1134, AH187 and H3081.97 | S |
| TMP-IVS1 | E33L, AH820, AH187 and G9842 (2 copies) | S |
| TMP-IVS2 | | S |
| TMP-IVS3a | KBAB4 and E33L | S |
| TMP-IVS3b | KBAB4, G9842 and AH820 | S |
| tail tube | KBAB4 and B4264 | S |
| Group II | | |
| | | S |
| | | S |
| | ATCC 14579, ATCC 10987 (2 copies), E33L, AH820, AH187 (3 copies) and H3081.97 | S and D |
| | ATCC 10987, AH820, AH187 and H3081.97 (2 copies in each strain) | S |
| | ATCC 10987, Al Hakam and 03BB108 | S |
| | ATCC 10987, AH187 and H3081.97 | S |
| | E33L (4 copies) and AH1134 | D |
| | AH187 (6 copies), H3081.97 (7 copies) and 03BB108 | S and D |
| | AH820 and H3081.97 | S |
(a) S, D: same or different host gene (or insertion site), respectively.
(b) The B. cereus AH1134 recA intron encodes a full homing endonuclease gene (HEG), while only the last 39 bp of the HEG remain in the other strains.
(c) The B. thuringiensis konkukian 97-27 intron lacks the HEG (only the first 72 bp and the last 66 bp remain), while the other strains carry a full HEG.
(d) The B. cereus AH187 intron lacks the HEG (only the first 10 bp and the last 104 bp remain), while the other strains carry a full HEG.
(e) The B.a.I1, B.a.I2, B.c.I5 and B.c.I11 introns are found exclusively on plasmids.
(f) One copy on the chromosome and one on a plasmid.
(g) Bacillus thuringiensis konkukian 97-27 and B. cereus subsp. cytotoxis NVH391-98 only carry a truncated copy of the intron. Three of the four copies in B. cereus E33L lack a full RT ORF (only the first 13 bp and the last 28 bp remain). B.c.I7 is related to the full-length intron (B.th.I1) in the pAW63 plasmid of B. thuringiensis kurstaki HD73 (60% nucleotide sequence identity).
(h) The B.c.I10 intron is highly similar to the ORF-less intron (B.th.I2) in the pAW63 plasmid of B. thuringiensis kurstaki HD73 (83% nucleotide sequence identity).
Figure 1. Multiple sequence alignments of the group I intron insertion sites within the recA, nrdE and nrdF genes of sequenced B. cereus group strains. When appropriate, the sequences from other organisms were included for comparison and their GenBank accession numbers are given in the last part of the strain identifiers. For the exons, only the differences to an arbitrary reference sequence are shown. Positions identical to the reference are displayed as dots. The reference sequence is shown on the top and bottom lines of each alignment. For the introns, only the first few nucleotides at the 5′- and 3′-ends are shown and are separated by square brackets. A ‘g’ or an ‘h’ in-between the brackets indicates that the intron encodes a HEG of the GIY-YIG or H-N-H family, respectively. Complementary nucleotide sets predicted to be involved in the formation of the P1 pairing (P1-P1′) and the P10 pairing (P10-P10′) for recognition of the 5′ and 3′ splice sites, respectively, are boxed and indicated by arrows. For introns that do not start with TAA or TAG or are not inserted in-frame within the host gene, the stop codon that would terminate translation coming from the 5′ exon is in bold. The B. cereus AH1134 recA intron encodes a putative HEG located within P1. For simplicity, the HEG is not shown in full and the central part is replaced with a parenthesized asterisk, indicating that the HEG family is unknown. The predicted stop codon of the HEG is underlined. Note that the coconversion tracts in the 5′ exon of nrdE-IVS5 in B. thuringiensis Al Hakam and B. cereus 03BB108 actually extend to 145 bp; however, only the last 60 bp are shown here due to space limitations.
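The stop-codon rule in the legend above (the first stop codon met when translation continues from the 5′ exon into the intron) can be illustrated with a minimal Python sketch. This is not the authors' procedure. It assumes the 5′ exon string starts at codon position 1, it treats TGA as a stop codon in addition to the TAA and TAG named in the legend, and the function name and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard bacterial stop codons


def first_inframe_stop(exon5, intron):
    """Return (offset, codon) for the first stop codon reached when the
    reading frame of the 5' exon is continued into the intron.

    The offset is relative to the first intron base; it is negative when
    the stop codon starts in the last bases of the exon. Returns None if
    no in-frame stop codon is found. Assumes exon5 starts in frame.
    """
    exon5 = exon5.upper()
    intron = intron.upper()
    # Bases of the last, incomplete exon codon that spill over the junction.
    carry = len(exon5) % 3
    seq = exon5[len(exon5) - carry:] + intron
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return i - carry, codon
    return None


# Hypothetical sequences, for illustration only.
print(first_inframe_stop("ATGGCA", "TAAGGG"))    # intron starts with an in-frame TAA
print(first_inframe_stop("ATGGCA", "GGGTAGCC"))  # stop codon further into the intron
```

An offset of 0 corresponds to an intron that starts with an in-frame stop codon. Any other value marks the kind of downstream or frame-shifted stop codon that the legend says is shown in bold.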
Sequence similarities between B. cereus group and bacteriophage group I introns
| Intron | Partially matching bacteriophage intron |
|---|---|
| recA | |
| nrdE-IVS3 | |
| nrdE-IVS4 | Enterobacterial phage T4 (RIR, nrdB) |
| nrdF-IVS6 | |
| TMP-IVS1 | |
| | Enterobacterial phage K1E (large terminase) |
| TMP-IVS2 | |
| | Enterobacterial phage K1E (large terminase) |
| | Enterobacterial phage RB3 (RIR, nrdB) |
| | Enterobacterial phage U5 (RIR, nrdB) |
| TMP-IVS3a | |
| Term-IVS1 | |
| Term-IVS2 | |
| Tail tube | |
(a) The hits reported here are continuous matches of 80 nt or more to intron catalytic RNAs (i.e. HEG not included) obtained by a BLASTN search with increased match reward (-r 2) and no filtering for low complexity regions (-F F) using the B. cereus group introns as queries.
(b) The intron host gene function is given in parentheses, when known. RIR, ribonucleotide reductase.
Figure 3. Unrooted phylogenetic tree of the sequenced B. cereus group strains. Strains are colored by source of isolation. For each isolate, the presence or absence of group I introns within the recA, nrdE and nrdF genes is indicated. The 11 B. anthracis isolates form a clonal complex and are represented here as a single lineage. The tree is based on the concatenation of the sequences conserved among the chromosomes of all strains (a total of 2 128 496 bp, gaps removed) and was built using the Neighbor-Joining method applied to a distance matrix of pairwise percentages of nucleotide differences between the sequences. All nodes in the tree have a bootstrap support of 100%, based on 1000 replicates.
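The distance matrix named in the legend above (pairwise percentages of nucleotide differences over a gap-free alignment) can be sketched in a few lines of Python. The sketch covers only the distance step, not the Neighbor-Joining tree building or the bootstrap analysis, and it is not the authors' software. The function names, strain labels and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Percentage of differing positions between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    diffs = sum(x != y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * diffs / len(a)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances, keyed by (name1, name2)."""
    names = sorted(seqs)
    dist = {(n, n): 0.0 for n in names}
    for n1, n2 in combinations(names, 2):
        d = p_distance(seqs[n1], seqs[n2])
        dist[(n1, n2)] = dist[(n2, n1)] = d
    return dist


# Toy alignment with hypothetical strain labels, for illustration only.
toy = {
    "strainA": "ACGTACGTAC",
    "strainB": "ACGTACGTAA",
    "strainC": "TCGTACGTAA",
}
for pair, d in sorted(distance_matrix(toy).items()):
    print(pair, d)
```

A matrix of this form is the usual input to a Neighbor-Joining implementation. Resampling alignment columns with replacement and recomputing the matrix gives one bootstrap replicate.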
Figure 4. Group I introns within the TMP gene of B. cereus group prophages. (A) Schematic representation of the distribution of group I introns within the TMP gene of B. cereus group prophages. Introns are present at three insertion sites, IVS1, IVS2 and IVS3, indicated by arrows. Identical introns are represented by the same symbol. Homologous prophages are drawn in the same color or hatching pattern. Contig names are given after the strain identifiers. (B) Multiple sequence alignment of the intron insertion sites within the TMP gene. The sequences of intron-less TMPs were included for comparison. Gene or contig names are given after the strain identifiers. Sequences are represented as described in the legend to Figure 1. The central part of the TMP gene is not shown and is replaced with empty parentheses.
Figure 7. Insertion sites and genomic distribution of the B.c.I1 and B.c.I10 group II introns in B. cereus group genomes. (A and B) Multiple sequence alignments of the insertion sites of B.c.I1 and B.c.I10, respectively. The intron-binding sites (IBS1, IBS2 and IBS3) in the exons involved in base-pairings with the complementary exon-binding sites (EBS1, EBS2 and EBS3) in the intron RNA upon splicing and reverse splicing are indicated (IBS1, IBS2 and IBS3 are boxed in black). EBS1, EBS2 and EBS3 are identical in all intron copies. For the introns, delimited by a red box, only the first few nucleotides at the 5′- and 3′-ends are shown and are separated by ‘[…]’. Positions identical in all sequences are marked with asterisks below the alignments. Multiple chromosomal intron copies are distinguished by a letter in parentheses after the strain identifiers and for plasmidic copies plasmid names are given after an underscore. For B.c.I10, the related ORF-less intron B.th.I2 from the pAW63 plasmid of B. thuringiensis kurstaki HD73 (65) has been included for comparison. (C) Circular representations of the chromosomes of the emetic B. cereus AH187 and H3081.97 strains showing the locations of the group II introns present in these strains. Since the chromosomal sequences of these strains are unfinished, pseudochromosomes were assembled using the B. anthracis Ames Ancestor chromosome as a reference (see the ‘Strain phylogeny reconstruction’ section in Supplementary Material for details). Introns inserted in the forward and reverse DNA strands are represented by red and blue arrowheads, respectively. Multiple copies of the same intron are distinguished by a letter in parentheses. OriC indicates the putative origin of replication. The circular representations were generated using CGView (89).
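The EBS–IBS base-pairing described in the legend above can be checked with a short Python sketch. It assumes antiparallel pairing between two equal-length segments, both written 5′ to 3′, and it counts G–U wobble pairs alongside Watson-Crick pairs. The function names and example sequences are hypothetical and are not taken from the alignments in the figure.

```python
# Watson-Crick pairs plus the G-U wobble pair found in RNA helices.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def to_rna(seq):
    """Upper-case a sequence and write it in the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def count_pairs(ebs, ibs):
    """Count paired positions between an EBS and an IBS, both given 5'->3'.

    The two segments pair in antiparallel orientation, so the EBS is
    compared base by base with the reversed IBS.
    """
    ebs, ibs = to_rna(ebs), to_rna(ibs)
    if len(ebs) != len(ibs):
        raise ValueError("EBS and IBS must have the same length")
    return sum((e, i) in PAIRS for e, i in zip(ebs, reversed(ibs)))


def fully_paired(ebs, ibs):
    """True if every EBS position pairs with the corresponding IBS position."""
    return count_pairs(ebs, ibs) == len(ebs)


# Hypothetical 6-nt segments, for illustration only.
print(fully_paired("GUGUCG", "CGACAC"))  # all Watson-Crick pairs
print(count_pairs("GUGUAG", "CGACAC"))   # one mismatch
```

Because the exon alignments are given as DNA, the helper converts T to U before comparing. A stricter check could drop the wobble pairs from `PAIRS`.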
Figure 8. Schematic representation illustrating the B.th.I3 group II intron in B. thuringiensis israelensis ATCC 35646, which is inserted within a newly described retroelement. Predicted RT ORFs are drawn as boxes containing the gene names. The retroelement and the intron are delimited by arrows. The intron is in gray and is inserted at the catalytic site of the retroelement's RT (RBTH_06733 + RBTH_06731), corresponding to the RYADD motif at the amino-acid level. The 25-bp direct repeats flanking the retroelement are indicated.