Gisle Vestergaard, Roger A Garrett, Shiraz A Shah.
Abstract
CRISPR adaptive immune systems were analyzed for all available completed genomes of archaea, which included representatives of each of the main archaeal phyla. Initially, all proteins encoded within, and proximal to, CRISPR-cas loci were clustered and analyzed using a profile-profile approach. Then cas genes were assigned to gene cassettes and to functional modules for adaptation and interference. CRISPR systems were then classified primarily on the basis of their concatenated Cas protein sequences and gene synteny of the interference modules. With few exceptions, they could be assigned to the universal Type I or Type III systems. For Type I, subtypes I-A, I-B, and I-D dominate but the data support the division of subtype I-B into two subtypes, designated I-B and I-G. About 70% of the Type III systems fall into the universal subtypes III-A and III-B but the remainder, some of which are phylum-specific, diverge significantly in Cas protein sequences, and/or gene synteny, and they are classified separately. Furthermore, a few CRISPR systems that could not be assigned to Type I or Type III are categorized as variant systems. Criteria are presented for assigning newly sequenced archaeal CRISPR systems to the different subtypes. Several accessory proteins were identified that show a specific gene linkage, especially to Type III interference modules, and these may be cofunctional with the CRISPR systems. Evidence is presented for extensive exchange having occurred between adaptation and interference modules of different archaeal CRISPR systems, indicating the wide compatibility of the functionally diverse interference complexes with the relatively conserved adaptation modules.
Keywords: CRISPR; Cas; Type I; Type III; archaea
Year: 2014 PMID: 24531374 PMCID: PMC3973734 DOI: 10.4161/rna.27990
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652

Figure 1. 16S rRNA phylogenetic tree for all archaeal genomes included in the study. The tree shows the primary archaeal phyla with the total number of genomes analyzed in each phylum indicated (single genomes for a given phylum are not numbered). The putative kingdoms are color-coded on the bar on the far left: Crenarchaeota, green; Aigarchaeota, light green; Korarchaeota, yellow; Thaumarchaeota, light blue; Euryarchaeota, dark blue. Environmental habitats are color-coded on vertical lines to the right of the kingdom bar: hydrothermal, orange; marine, blue; wetland sediment, brown; and hypersaline lake, pink. For each phylum, the Type I and Type III CRISPR subtypes are color-coded on the right and the total numbers of each subtype are given. The branch length ruler corresponds to a 5% difference in 16S rRNA sequence.
Table 1. Archaeal CRISPR systems. Total numbers of integral CRISPR immune systems and of independent interference modules
| CRISPR system | Total adaptation + interference | Interference |
|---|---|---|
| Type I | 106 | 11 |
| Type III | 16 | 59 |
| Type I+III | 57 | 4 |
| Variants | 11 | 20 |
Table 2. Cas proteins associated with archaeal Type I and Type III CRISPR functional modules
| Function | Type I | Type III-A | Type III-B |
|---|---|---|---|
| Adaptation | | | |
| | Cas1 | Cas1 | Cas1 |
| | Cas2 | Cas2 | Cas2 |
| | Cas4 | Cas4 | |
| | Cas4' | | |

Figure 2. Dendrogram and gene syntenies of archaeal adaptation modules of different CRISPR subtypes. The total numbers of identified subtypes are given on the tree. The dendrogram represents a simplified version of the full dendrogram in Figure S1 showing the different classes of adaptation modules. Gene contents (indicated by bold white numbers), gene sizes, and gene syntenies are shown for representative adaptation modules. Further, the subtypes of interference modules associated with each class of adaptation module are shown. The threshold used to classify the variants is indicated by the light blue area toward the base of the tree, and it was selected so that the resulting classes would match optimally with the associated interference subtypes, which are mainly Type I. Nevertheless, some adaptation module classes are associated with interference modules from different subtypes.
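The threshold-based classification described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the module names and pairwise distances are hypothetical, and average-linkage clustering is an assumed stand-in for whatever method produced the published dendrogram. The only point shown is that cutting a tree at a fixed distance threshold yields discrete classes.

```python
from itertools import combinations


def cut_by_threshold(names, dist, threshold):
    """Average-linkage agglomerative clustering, stopped at `threshold`.

    names: list of item labels.
    dist: dict mapping frozenset({a, b}) -> distance (hypothetical values).
    Returns a sorted list of clusters, each a sorted list of labels.
    """
    clusters = [[n] for n in names]

    def avg_dist(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        (i, j), d = min(
            (((i, j), avg_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break  # Everything left branches beyond the threshold.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical pairwise distances between four adaptation modules.
names = ["mod1", "mod2", "mod3", "mod4"]
dist = {
    frozenset(("mod1", "mod2")): 0.10,
    frozenset(("mod1", "mod3")): 0.80,
    frozenset(("mod1", "mod4")): 0.85,
    frozenset(("mod2", "mod3")): 0.75,
    frozenset(("mod2", "mod4")): 0.90,
    frozenset(("mod3", "mod4")): 0.20,
}
classes = cut_by_threshold(names, dist, threshold=0.5)
print(classes)
```

Moving the threshold toward the tips splits classes and moving it toward the root merges them, which is why the caption describes the threshold as chosen to match the interference subtypes.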
Table 3. Number and kingdom distribution of the major subtypes of archaeal Type I and Type III systems
| Type I subtype | number | cren | eury | Type III subtype | number | cren | eury |
|---|---|---|---|---|---|---|---|
| A | 69 | 53 | 14 | A | 32 | 11 | 21 |
| B | 26 | 1 | 25 | B | 49 | 39 | 9 |
| C | 2 | 0 | 2 | C | 8 | 0 | 8 |
| D | 18 | 7 | 11 | D | 16 | 14 | 1 |
| E | 5 | 0 | 5 | | | | |
| G | 27 | 0 | 25 | | | | |
cren, crenarchaea; eury, euryarchaea.
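As a small arithmetic check on Table 3, the crenarchaeal and euryarchaeal counts do not always add up to the subtype total. The sketch below computes that remainder per subtype from the values in the table. Reading the remainder as systems from the other archaeal phyla is an assumption; the table itself does not say so.

```python
# Rows copied from Table 3: subtype -> (number, cren, eury).
TABLE3 = {
    "I-A": (69, 53, 14),
    "I-B": (26, 1, 25),
    "I-C": (2, 0, 2),
    "I-D": (18, 7, 11),
    "I-E": (5, 0, 5),
    "I-G": (27, 0, 25),
    "III-A": (32, 11, 21),
    "III-B": (49, 39, 9),
    "III-C": (8, 0, 8),
    "III-D": (16, 14, 1),
}

# Systems not accounted for by the cren and eury columns
# (assumed, not stated, to come from other phyla).
remainders = {
    subtype: number - cren - eury
    for subtype, (number, cren, eury) in TABLE3.items()
    if number - cren - eury != 0
}
print(remainders)
```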
Table 4. Number and kingdom distribution of variant archaeal interference subtypes
| Type I variant subtype | number | cren | eury |
|---|---|---|---|
| VI-1 | 5 | 0 | 5 |
| VI-2 | 2 | 0 | 2 |
| VI-3 | 2 | 0 | 2 |
| VI-4 | 1 | 0 | 1 |
cren, crenarchaea; eury, euryarchaea.

Figure 3. Dendrogram and gene syntenies of archaeal interference modules of different Type I subtypes. The total numbers of identified subtypes are given on the tree. Gene contents (indicated by bold numbers), gene sizes, and gene syntenies are shown for representative interference modules of each subtype. The dendrogram is a simplified version of the top half of the full dendrogram in Figure S2. The subtype threshold is indicated by the light blue vertical line near the base of the tree. Most of the identified modules fall within already established subtypes. The new subtype I-G was separated from the earlier subtype I-B, and variant subtypes VI-1, VI-2, and VI-3 constitute exceptional subtypes with few members.

Figure 4. Dendrogram of the classified Type III subtypes. Total numbers of identified subtypes are given on the tree. Gene contents, gene sizes, and gene syntenies are shown for representative interference modules of the different subtypes. The dendrogram represents a simplified and inverted bottom half of the full dendrogram in Figure S2. Gene contents are indicated by bold numbers within genes, and for subtypes III-A and III-B, csm/cmr gene names are also given above the genes. The subtypes have distinct gene syntenies and branch before the defined threshold indicated by the light blue vertical line. Subtypes III-C and III-D are newly defined. VIII-1 was found only in some members of the Sulfolobales and is therefore classified as a variant subtype. All subtypes have cas10, protein S, cas5, and multiple paralogs of cas7. asp denotes the gene of a putative aspartate protease. The subtype I-D gene cassette (in yellow) that branches at the junction of the Type III and Type I subtypes in Figure S2 is included as an outgroup.
Table 5. Accessory proteins encoded by non-core cas genes associated primarily with Type III systems
| Name | Number | Example | Description |
|---|---|---|---|
| Type III-specific | | | |
| Csx1 proper | 65 | TTX_1228 | protein (~450 aa) matching TF Cas_DxTHG and CDD Csx1_III_U models |
| Csx1 superfamily (Csx1s) | 18 | TON_0318 | diverse group of proteins matching Csx1_III_U, including TF Cas02710. Does not include CasR and Csm6 |
| RecF-associated protein (RFas) | 16 | P186_0873 | diverse group of small proteins (~170 aa), always associated with Cas_RecF. |
| Cas_RecF | 12 | P186_0874 | protein with RecF domain found near crenarchaeal Type III systems. Always associated with RFas |
| Cas ABC ATPase (ABC_ATPase) | 11 | YN1551_2137 | protein with ABC ATPase domain found near crenarchaeal Type III systems. Always associated with sRRMs |
| small RRM protein (sRRM) | 12 | YN1551_2138 | diverse group of proteins (~150 aa) containing RRM domains and always associated with Cas ABC ATPase |
| Aspartic protease (Asp_prot) | 11 | Saci_1895 | small protein (~100 aa) with retropepsin domain, associated with Type III-D and VIII-1 subtypes in Sulfolobales |
| Cmr7 | 7 | SSO1986 | protein (~200 aa) with metalloprotease domain, associated with certain variants of Type III-B subtypes |
| Csx3 | 5 | Mhar_1706 | small (~100 aa) protein matching the TF Cas_Csx3 model. Unrelated to Csx1 |
| Membrane dipeptidase (Zn2+_pep) | 4 | MJ_1673 | small (~130 aa) protein with Zn2+ dipeptidase domain, associated with some Type III-A systems in Methanococcales |
| Mvol_05XX-fam | 5 | Mvol_0536 | five proteins spanning three families associated with Type III-B systems in Methanococcales. Also found in |
| Cas HerA helicase (HerA) | 14 | Tagg_0815 | always associated with NurA |
| Cas NurA nuclease (NurA) | 14 | Tagg_0814 | always associated with HerA |
| Csx1 minimal (Csx1m) | 13 | Pyrfu_0517 | small (~180 aa) protein containing a minimal Csx1 domain matching TF Cas_NE0113 and CDD Csx1_III_U |
| Csx1 with PIN toxin (Csx1p) | 8 | Metig_1253 | ~400 aa protein consisting of a minimal Csx1 domain fused to a PIN toxin domain |
| Tneu_1160-fam | 3 | Tneu_1160 | large protein of unknown function, associated with CRISPR systems of Thermoproteales |
| SbcC repair ATPase (SbcC) | 3 | CSUB_C0986 | large protein with SbcC domain found in |
| SbcD nuclease (SbcD) | 3 | CSUB_C0987 | protein with SbcD domain, associated with SbcC |
| Cas ABC ATPase (ABC_ATPase) | 5 | TTX_1256 | specific for Type I systems but related to corresponding protein for Type III systems |
| small RRM protein (sRRM) | 4 | TTX_1257 | specific for Type I systems but related to corresponding protein for Type III systems |
| Thermococcales specific | 53 | PF1132–1135 | 11 protein families found in Thermococcales adjacent to Type I systems regardless of subtype |
Most of the accessory proteins are specific to CRISPR systems and are not normally found encoded outside of cas gene cassettes. In addition to the most widespread Csx1-type accessory proteins, numerous accessory proteins have helicase and nuclease domains, which tend to be found associated with DNA replication, recombination, and repair. Protease domains, toxin domains, and RRM domains are also found in many proteins, while some larger proteins have non-identifiable domains.

Figure 5. Examples of Type III-A, III-B, and III-D subtypes in archaea from diverse phyla that are associated with multiple and diverse accessory protein genes. Type III gene cassettes are color-coded red and cmr and csm gene labels are given. Type I-A gene cassettes are color-coded for adaptation (light blue) and interference (yellow). Some of the subtype III-B systems are directly linked to subtype I-A systems. All core cas genes are identified by bold numbers. Non-core cas-associated genes are color-coded violet and are labeled by names above the genes (see also Table 5). Genes for putative regulatory proteins (green) and Cas6 (orange) are also indicated together with CRISPR loci (dark blue) where the number indicates the number of repeat-spacer units. In some Type III gene cassettes, the number of accessory protein genes exceeds the number of core cas genes.

Figure 6. A putative example of modular exchange of Type I adaptation and interference modules. Gene cassettes for adaptation (coded blue) and interference (coded yellow) for Methanocaldococcus vulcanius and Methanocaldococcus fervens genomes are depicted showing the adaptation modules with highly similar sequences (93% concatenated amino acid sequence identity) that are associated with three different subtypes (I-A, I-D, and I-G) of Type I interference modules. Gene contents are indicated by bold numbers within genes. R (green) denotes genes of putative transcriptional regulators and 6 (orange) indicates genes of the Cas6 RNA processing enzyme.
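The 93% figure in this caption is an identity computed over concatenated amino acid sequences. A minimal sketch of that kind of calculation follows, assuming each protein pair has already been aligned to equal length with `-` marking gaps. The sequences are hypothetical, and skipping gapped columns is an assumed convention rather than the authors' documented procedure.

```python
def concatenated_identity(aligned_pairs):
    """Percent identity over the concatenation of aligned protein pairs.

    aligned_pairs: list of (seq_a, seq_b) tuples, each pair pre-aligned to
    equal length with '-' marking gaps. Columns with a gap in either
    sequence are skipped (an assumed convention).
    """
    compared = 0
    matches = 0
    for seq_a, seq_b in aligned_pairs:
        if len(seq_a) != len(seq_b):
            raise ValueError("each aligned pair must have equal length")
        for x, y in zip(seq_a, seq_b):
            if x == "-" or y == "-":
                continue  # Ignore gapped columns.
            compared += 1
            if x == y:
                matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments standing in for two adaptation proteins.
pairs = [
    ("MKTLYIQERG", "MKTLYVQERG"),  # 9 of 10 columns identical
    ("MSL-VVTYDI", "MSLAVVTYDV"),  # 8 of 9 ungapped columns identical
]
identity = concatenated_identity(pairs)
print(round(identity, 1))
```

Pooling matches and compared columns across all proteins before dividing weights each protein by its length, which is what concatenation implies; averaging per-protein identities would give a different number.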

Figure 7. Schematic comparison of (A) a subtype I-E interference complex from E. coli with (B) a subtype III-B interference complex from Pyrococcus furiosus. The two structures share homologous protein components consistent with the related compositions of their gene cassettes. In (A), the Cas6 protein is considered to be part of the interference complex and Cas3 is essential for the interference mechanism. In (B), Cas6 has not been shown to be part of the interference complex. Moreover, the two proteins shown in green (Cmr1) and orange (Cmr6) (see Fig. 4) are Cas7 paralogs. The estimated binding sites of the crRNAs are color-coded brown.