Sukrit Silas, Kira S Makarova, Sergey Shmakov, David Páez-Espino, Georg Mohr, Yi Liu, Michelle Davison, Simon Roux, Siddharth R Krishnamurthy, Becky Xu Hua Fu, Loren L Hansen, David Wang, Matthew B Sullivan, Andrew Millard, Martha R Clokie, Devaki Bhaya, Alan M Lambowitz, Nikos C Kyrpides, Eugene V Koonin, Andrew Z Fire.
Abstract
Cas1 integrase is the key enzyme of the clustered regularly interspaced short palindromic repeat (CRISPR)-Cas adaptation module that mediates acquisition of spacers derived from foreign DNA by CRISPR arrays. In diverse bacteria, the cas1 gene is fused (or adjacent) to a gene encoding a reverse transcriptase (RT) related to group II intron RTs. An RT-Cas1 fusion protein has been recently shown to enable acquisition of CRISPR spacers from RNA. Phylogenetic analysis of the CRISPR-associated RTs demonstrates monophyly of the RT-Cas1 fusion, and coevolution of the RT and Cas1 domains. Nearly all such RTs are present within type III CRISPR-Cas loci, but their phylogeny does not parallel the CRISPR-Cas type classification, indicating that RT-Cas1 is an autonomous functional module that is disseminated by horizontal gene transfer and can function with diverse type III systems. To compare the sequence pools sampled by RT-Cas1-associated and RT-lacking CRISPR-Cas systems, we obtained samples of a commercially grown cyanobacterium, Arthrospira platensis. Sequencing of the CRISPR arrays uncovered a highly diverse population of spacers. Spacer diversity was particularly striking for the RT-Cas1-containing type III-B system, where no saturation was evident even with millions of sequences analyzed. In contrast, analysis of the RT-lacking type III-D system yielded a highly diverse pool but reached a point where fewer novel spacers were recovered as sequencing depth was increased. Matches could be identified for a small fraction of the non-RT-Cas1-associated spacers, and for only a single RT-Cas1-associated spacer. Thus, the principal source(s) of the spacers, particularly the hypervariable spacer repertoire of the RT-associated arrays, remains unknown.
IMPORTANCE While the majority of CRISPR-Cas immune systems adapt to foreign genetic elements by capturing segments of invasive DNA, some systems carry reverse transcriptases (RTs) that enable adaptation to RNA molecules. From analysis of available bacterial sequence data, we find evidence that RT-based RNA adaptation machinery has been able to join with CRISPR-Cas immune systems in many, diverse bacterial species. To investigate whether the abilities to adapt to DNA and RNA molecules are utilized for defense against distinct classes of invaders in nature, we sequenced CRISPR arrays from samples of commercial-scale open-air cultures of Arthrospira platensis, a cyanobacterium that contains both RT-lacking and RT-containing CRISPR-Cas systems. We uncovered a diverse pool of naturally occurring immune memories, with the RT-lacking locus acquiring a number of segments matching known viral or bacterial genes, while the RT-containing locus has acquired spacers from a distinct sequence pool for which the source remains enigmatic.
Keywords: CRISPR; RNA spacer acquisition; cyanobacteria; deep sequencing; horizontal gene transfer; host-parasite relationship; phylogeny; reverse transcriptase
Year: 2017 PMID: 28698278 PMCID: PMC5513706 DOI: 10.1128/mBio.00897-17
Source DB: PubMed Journal: mBio Impact factor: 7.867
FIG 1 Phylogeny of a representative set of reverse transcriptases encoded within CRISPR-cas loci. A maximum likelihood phylogenetic tree was reconstructed for 134 RT sequences using the FastTree program. SH (Shimodaira-Hasegawa)-like node support values calculated by the same program are shown if they are greater than 70%; node support values for key nodes are highlighted. Major well-supported distinct branches are shown by blue rectangles. Each sequence in the tree is shown with a local numeric identifier (ID) and species name; these are also provided in Table S1 (https://figshare.com/s/3a8dab8ed7138922f693) for comparison. RT protein domain architecture is coded in each sequence description as follows: Cas6_RT_Cas1 and RT_Cas1 for the respective fusions, RT for the systems with known subtypes, and NA_RT for all other cases. A typical domain or gene organization for each branch and for selected sequences is shown to the right of the tree. Independent genes are shown with distinct arrows, while fused genes are displayed as single arrows with multiple colors. The text is color-coded to denote CRISPR-Cas system subtypes as follows: III-A, dark blue; III-B, magenta; III-D, sky blue; I-E, orange. The outgroup is collapsed and is indicated by a triangle. The details for the outgroup branch are provided in Fig. S1. For the sequences that were classified previously (15), the respective groups are indicated in green.
FIG 2 Architectures of selected RT-associated CRISPR-Cas loci. For each locus, the species name, genome accession number, and respective nucleotide coordinates are indicated. Genes are shown roughly to scale; CRISPR arrays are indicated in brackets and are not shown to scale. Homologous genes are color-coded, with the exception of numerous ancillary genes, which are all shown in light green with a green outline, and unknown proteins are shown in gray. The gene names largely follow the nomenclature in reference 7, but the RAMP proteins of groups 5 and 7 are denoted gr5 and gr7, respectively. The CRISPR-Cas system subtype is indicated for the loci encoding the respective effector genes.
FIG 3 CRISPR-Cas systems in Arthrospira platensis. (A) Distribution of CRISPR-Cas systems by phylogenetic type in three sequenced reference strains of A. platensis ("Others": types I, II, IV, V, and VI). The tree at the right side of the panel shows the evolutionary relationship between type III subtypes. (B) List of CRISPR-Cas systems and arrays in type strain A. platensis NIES-39 (left panel). The approximate location in megabases (Mb) on the circular chromosome in the direction of the arrow is indicated (right panel). The cas10 cmr2 and cmr6 family genes in the III-B–RT system (but not the III-B system) show signs of mutational atrophy. (C) Alignments of CRISPR direct repeats from the various CRISPR arrays. (D) Gel image showing PCR products amplified from CRISPR arrays. The first lane shows a 25-bp DNA ladder. (E) List of Spirulina brands used in this study.
FIG 4 Sequencing of spacers from Spirulina purchased from grocery stores. (A) Numbers of unique spacers recovered from type III-D and type III-B CRISPR arrays before and after clustering. The last two columns show Chao-1 estimates of species richness/diversity (57). The Chao-1 estimate corresponds to an approximate lower bound for the total number of unique spacers in the sample. The 95% confidence lower bound for the Chao-1 estimate has also been calculated. (B) Histogram of spacer frequency before and after clustering. The last bin contains all sequences observed 100 times or more. (C) Histogram of spacer lengths for III-D and III-B spacers. (D) Saturation curves calculated for III-D and III-B spacer pools using the clustering algorithm described for panel A showing the number of sequence clusters obtained as progressively larger subsets of the spacer datasets were considered.
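The two quantities named in the FIG 4 caption, the Chao-1 richness estimate and the saturation curve, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the bias-corrected form of Chao-1 (the caption does not say which variant was used), treats spacers as exact strings rather than clustering similar sequences as the study did, and runs on toy reads invented for illustration. The function names `chao1` and `saturation_curve` are hypothetical.

```python
import random
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao-1 estimate of total richness.

    counts: per-spacer observation counts. The estimate is an approximate
    lower bound on the number of distinct spacers in the sampled pool.
    Assumption: bias-corrected variant, S_obs + F1*(F1-1) / (2*(F2+1)).
    """
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                      # distinct spacers observed
    f1 = sum(1 for c in counts if c == 1)    # spacers seen exactly once
    f2 = sum(1 for c in counts if c == 2)    # spacers seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def saturation_curve(reads, steps=4, seed=0):
    """Distinct spacers found in progressively larger random subsets.

    Returns a list of (subset_size, distinct_count) pairs. A curve that
    flattens suggests the pool is approaching saturation at this depth.
    """
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for i in range(1, steps + 1):
        n = len(shuffled) * i // steps
        curve.append((n, len(set(shuffled[:n]))))
    return curve


# Toy reads (hypothetical, not data from the study).
reads = ["A"] * 5 + ["B"] * 2 + ["C"] * 2 + ["D", "E", "F"]
estimate = chao1(Counter(reads).values())
curve = saturation_curve(reads)
```

In the toy data, six distinct spacers are observed, three of them once and two of them twice, so the bias-corrected estimate is 6 + (3 × 2) / (2 × 3) = 7. On real data, identical-string counting would overstate diversity wherever sequencing errors create near-duplicate spacers, which is presumably why the caption reports values both before and after clustering.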
FIG 5 Summary of Spirulina spacer search attempts. (A and B) BLAST results matching ~2 million CRISPR spacers to reads and contigs from (A) various virome datasets and (B) Spirulina metagenomic data. The “unfiltered hits” columns show the number of matches returned at e-value stringency cutoffs based on the size of the database. The “Unfiltered alignments” column in panel B denotes the subsets of reads and contigs from the unfiltered hits that could in turn be identified in public protein sequence databases (NR) at the protein level. The “Satisfactory alignments” columns list the manually curated hits remaining after low-complexity and low-confidence matches were eliminated.