Shiraz A Shah1,2, Omer S Alkhnbashi3, Juliane Behler4, Wenyuan Han2, Qunxin She2, Wolfgang R Hess4,5, Roger A Garrett2, Rolf Backofen3,6.
Abstract
A study was undertaken to identify conserved proteins that are encoded adjacent to cas gene cassettes of Type III CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats - CRISPR associated) interference modules. Type III modules have been shown to target and degrade dsDNA, ssDNA and ssRNA and are frequently intertwined with cofunctional accessory genes, including genes encoding CRISPR-associated Rossmann fold (CARF) domains. Using a comparative genomics approach, and defining a Type III association score accounting for coevolution and specificity of flanking genes, we identified and classified 39 new Type III associated gene families. Most archaeal and bacterial Type III modules were seen to be flanked by several accessory genes, around half of which did not encode CARF domains and remain of unknown function. Northern blotting and interference assays in Synechocystis confirmed that one particular non-CARF accessory protein family was involved in crRNA maturation. Non-CARF accessory genes were generally diverse, encoding nuclease, helicase, protease, ATPase, transporter and transmembrane domains with some encoding no known domains. We infer that additional families of non-CARF accessory proteins remain to be found. The method employed is scalable for potential application to metagenomic data once automated pipelines for annotation of CRISPR-Cas systems have been developed. All accessory genes found in this study are presented online in a readily accessible and searchable format for researchers to audit their model organism of choice: http://accessory.crispr.dk.
Keywords: CARF; CRISPR; accessory; ancillary; archaea; auxiliary; bacteria; csx1; csx3; helicase; nuclease; protease; type III
Year: 2018 PMID: 29911924 PMCID: PMC6546367 DOI: 10.1080/15476286.2018.1483685
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652
Summary of results from the current study compared to previous studies that identified accessory cas genes, divided between archaea (Ar) and bacteria (Ba). The previous studies were limited to (a) archaeal genomes [1] and (b) a representative subset of genomes [12]. The present study is more comprehensive.
| | Present study (Ar) | Present study (Ba) | Makarova 2014 (Ar) | Makarova 2014 (Ba) | Vestergaard 2014 (Ar) |
|---|---|---|---|---|---|
| Number of genomes surveyed | 217 | 2534 | 172 | 484 | 159 |
| Number of genomes with CRISPR-Cas systems | 131 | 1131 | 50 | 133 | 124 |
| Number of genomes with Type III systems | 78 | 304 | 34 | 114 | 83 |
| Total number of Type III systems | 110 | 402 | 61 | 251 | 125 |
| Number of genomes with accessory genes | 67 | 262 | 78 | 180 | 76 |
| Total number of putative accessory genes | 248 | 734 | 190 | 454 | 239 |
| Accessory genes found outside | 0 | 0 | 104 | 152 | 0 |
| Accessory genes associated to Type III systems | 248 | 734 | 61 | 251 | 210 |
| Total number of non-CARF accessory genes found | 136 | 369 | 0 | 0 | 135 |
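The abstract's statement that around half of the accessory genes do not encode CARF domains can be checked against the present-study columns of this table. The short Python sketch below does only that arithmetic; the variable and function names are illustrative and do not come from the paper.

```python
# Present-study counts copied from the table above (Ar = archaea, Ba = bacteria).
accessory_total = {"Ar": 248, "Ba": 734}
non_carf = {"Ar": 136, "Ba": 369}


def non_carf_fraction(domain: str) -> float:
    """Fraction of putative accessory genes that do not encode a CARF domain."""
    return non_carf[domain] / accessory_total[domain]


# Pooled fraction across archaea and bacteria.
combined = sum(non_carf.values()) / sum(accessory_total.values())

for domain in ("Ar", "Ba"):
    print(f"{domain}: {non_carf_fraction(domain):.1%}")
print(f"combined: {combined:.1%}")
```

With these counts the non-CARF share is roughly 55% for archaea and 50% for bacteria, which is consistent with the "around half" wording.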
Figure 1. Gene maps of example Type III gene cassettes including accessory genes. Core Type III genes are drawn in red with csm/cmr numbers for Type III-A/B modules, and cas numbers for Type III-D modules. cas6 and the adaptation module genes are orange and blue respectively, also with cas gene numbers. Accessory genes are drawn in purple with the gene cluster number indicated.
List of putative accessory gene clusters conserved near Type III genetic modules which passed the Type III association score cut-off (>24). A few gene families with lower scores are also included because they have been confirmed as accessory proteins in earlier studies. For each putative accessory protein family, the cluster id, the size (i.e. number of members per cluster), and the calculated Type III association score are listed. An example (gene-) locus id is also provided for reference. Names are provided for accessory protein families identified in earlier studies [1,10,12], while those identified in the present study are indicated in bold with C3a numbers (Cas type 3 Associated). 39 of 76 putative accessory protein families are newly identified. Commonly associated Type III subtypes are listed for each putative accessory protein family. Unclassified variant subtypes are indicated by ‘III’. Most accessory gene families can function with different Type III subtypes. A predicted function is given in the last column.
| cluster | size | score | example | name | subtype | annotation |
|---|---|---|---|---|---|---|
| 1 | 116 | 73.88 | JTY_2831 | Csx1/Csm6 | A,B,D | CARF+HEPN |
| 2 | 94 | 69.25 | Selin_1039 | Csx1 | A,B,D | CARF+HEPN |
| 3 | 69 | 62.33 | FFONT_0074 | Csm6 | A,B,D | CARF+RelE |
| 5 | 64 | 59.36 | ERE_12150 | Csx19/24 | (A,B),D | core gene ( |
| 6 | 61 | 58.51 | TOPB45_1115 | Csx1 | A,B,(C,D) | CARF+HEPN |
| 7 | 51 | 42.59 | Tph_c24580 | WYL/Csx1 | A,B,D | CARF+WYL |
| 9 | 49 | 37.91 | Vdis_1148 | Csx1 | (A),B,(C)D | CARF+Nuclease |
| 11 | 41 | 35.59 | Thebr_0941 | Csx15/20 | A,B,D | peptidase |
| 17 | 31 | 24.7 | VMUT_1481 | Cas_RecF | A,B,(C,D) | SMC ATPase |
| 23 | 20 | 4.61 | CYB_0598 | Csx1 | (A),B,D | found elsewhere |
| 25 | 18 | 55.83 | Dtur_0613 | Csx3 | A,B,D | Csx3 (CARF) |
| 27 | 17 | 50.07 | Cyan10605_3521 | Csx18 | B,(D) | adaptation associated |
| 28 | 17 | 66.76 | CYB_0586 | Csx21 | 17 | Type III-D associated |
| 29 | 14 | 45.91 | CBO2177 | CorA | B,(C),D | CorA-like |
| 33 | 13 | 10.47 | M1425_0883 | Csa3 | (A),B,C | Type I-A associated |
| 35 | 12 | 61.07 | CYB_0599 | Csx3 | (A),B,D | Csx3-AAA |
| 36 | 11 | 50.71 | Nos7107_4284 | WYL | (A,B),D | HTH-WYL |
| 37 | 11 | 56.62 | SacN8_09300 | Csx26 | III,A,(B) | HNH nuclease |
| 38 | 10 | 54.35 | Mefer_0963 | | A,B | unknown |
| 42 | 9 | 74.4 | CFT03427_1632 | Csx19 | D | core gene ( |
| 43 | 9 | 58.83 | Ferpe_1557 | | A,B | Lon protease |
| 45 | 9 | 28.54 | Marme_0670 | PD-DExK | A,B,D | possible nuclease |
| 47 | 9 | 27.09 | SacN8_09290 | protease | III,A,B,D | peptidase |
| 50 | 8 | 61.07 | Swol_2530 | Csx13 | (A),B,D | TM+CARF |
| 55 | 7 | 44.44 | Nos7107_2836 | | B,D | ABC permease |
| 57 | 7 | 20.46 | Calkr_2542 | HerA | A,B,(C) | Type III coevolved |
| 58 | 7 | 33.01 | VMUT_1471 | NurA | A,B | Type III coevolved |
| 59 | 7 | 55.84 | Nos7107_2826 | | A,B,D | trans-membrane (TM) |
| 64 | 7 | 43.07 | SacRon12I_09300 | | III,A | unknown |
| 67 | 6 | 40.03 | Caur_2269 | | (A),B | AAA-Csx3 |
| 69 | 6 | 29.69 | B005_5545 | | D | unknown |
| 76 | 6 | 39 | LS215_0703 | | III,A,D | unknown |
| 77 | 6 | 29.73 | NE0116 | Csx16 | A,B,D | a.k.a. cas_VVA1548 |
| 80 | 6 | 27.48 | PTH_0706 | | B,C,D | AAA+ ATPase |
| 81 | 6 | −9.48 | YN1551_2131 | Csx1 | (A,B),D | Type I associated |
| 83 | 6 | 88.69 | VMUT_1493 | cas_RFas | A,(B) | cluster 17 associated |
| 84 | 6 | 66.57 | TTX_1229 | cas_RFas | A,(B) | cluster 17 associated |
| 87 | 6 | 74.65 | SSO1986 | Cmr7 | B | Cmr7 |
| 93 | 5 | 30.57 | Thebr_0950 | | B,C,D | poss. AAA ATPase |
| 96 | 5 | 42.25 | Mvol_0529 | Mvol_0529-fam | B | DNA binding C-ter. |
| 104 | 5 | 39.74 | Msed_1167 | Csx1 | A,B,D | CARF+PIN |
| 107 | 5 | 84.37 | B005_5544 | Csx1 | B,D | CARF |
| 108 | 5 | 78.8 | Pcal_0278 | cas_RFas | B | cluster 17 associated |
| 116 | 4 | 57.92 | Rru_A0181 | | B,D | cluster 29 associated |
| 121 | 4 | 62.04 | Desac_1715 | Csx23 | A,B,D | unknown |
| 123 | 4 | 37.09 | Caur_2303 | | B,D | unknown |
| 124 | 4 | 44.42 | PCC7418_1341 | | B | unknown |
| 139 | 4 | 28.91 | Tthe_0931 | | A,B,D | HKD+Snf2 |
| 146 | 4 | 68.43 | SSO1421 | | D | DNA binding HTH |
| 152 | 3 | 25.85 | Cylst_6373 | | C | Type III-C specific |
| 156 | 3 | 28.1 | Metin_0159 | | B,C | oxidoreductase |
| 159 | 3 | 27.06 | Calkr_2554 | | A,B,C | methyl transferase |
| 162 | 3 | 40.43 | SYO3AOP1_0653 | | III,A,B | unknown |
| 166 | 3 | 32.1 | Hbut_0719 | | B,D | poss. crRNA proc. |
| 168 | 3 | 31.1 | slr7083 | | B | unknown |
| 173 | 3 | 70 | Adeg_0988 | Csx21 | III,B,D | unknown |
| 174 | 3 | 74.3 | Adeg_0809 | | B,C | nucleotidyl trans. |
| 178 | 3 | 37.97 | Csac_0071 | | A,B,D | putative invertase |
| 181 | 3 | 25.55 | Tpen_1359 | | A,B,C | Diadenylate cyclase |
| 186 | 3 | 28.77 | RoseRS_2594 | | A,C,D | C-3ʹ,4ʹ desaturase |
| 187 | 3 | 39 | Dole_0738 | | A,B | NERD+UvrD |
| 189 | 3 | 50.73 | Hoch_5581 | | B,D | kinase |
| 193 | 3 | 43.78 | Thebr_0949 | | B,D | poss. alt. Cas3 |
| 194 | 3 | 26.83 | Mcup_1132 | | B,D | AA transporter |
| 196 | 3 | 46.29 | TREPR_1099 | | III,A,B | unknown |
| 198 | 3 | 40.67 | Mrub_1490 | | B | TPR protein |
| 205 | 3 | 67.5 | MLP_11360 | | D | unknown |
| 206 | 3 | 29.39 | Ndas_2980 | | D | TAP-like protein |
| 211 | 3 | 51.67 | Pars_1112 | cas_RFas | B | unknown |
| 212 | 3 | 39.67 | RoseRS_0371 | | C,D | SNc+LTD |
| 214 | 3 | 30.16 | Rcas_4246 | | B,C | GlgB |
| 216 | 3 | 25.6 | SSA_1254 | | A,B | adaptation associated |
| 222 | 3 | 44 | Vpar_1800 | | A,D | unknown |
| 223 | 3 | 43.33 | YG5714_0635 | | III,D | unknown |
| 227 | 3 | 36.73 | Sfum_1354 | PrimPol | A,B | adaptation polymerase |
| 230 | 3 | 28.76 | TVNIR_1454 | | B | RecX family protein |
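The caption states a score cut-off of >24 and notes that a few lower-scoring families are listed anyway because earlier studies confirmed them. A minimal Python sketch of that split is shown below. It uses only a handful of rows copied from the table, and the `Cluster` type and `split_by_cutoff` helper are hypothetical names introduced for illustration; how the score itself is computed is not reproduced here.

```python
from typing import NamedTuple

CUTOFF = 24.0  # Type III association score cut-off stated in the caption


class Cluster(NamedTuple):
    cluster_id: int
    score: float
    name: str


# A few rows copied from the table above; this is not the full list.
rows = [
    Cluster(1, 73.88, "Csx1/Csm6"),
    Cluster(17, 24.7, "Cas_RecF"),
    Cluster(23, 4.61, "Csx1"),
    Cluster(33, 10.47, "Csa3"),
    Cluster(57, 20.46, "HerA"),
    Cluster(81, -9.48, "Csx1"),
    Cluster(87, 74.65, "Cmr7"),
]


def split_by_cutoff(clusters, cutoff=CUTOFF):
    """Separate clusters that pass the cut-off from those listed on other grounds."""
    passed = [c for c in clusters if c.score > cutoff]
    below = [c for c in clusters if c.score <= cutoff]
    return passed, below


passed, below = split_by_cutoff(rows)
print("pass cut-off:", [c.cluster_id for c in passed])
print("below cut-off, retained on earlier evidence:", [c.cluster_id for c in below])
```

In the full table, clusters 23, 33, 57 and 81 are the entries whose scores fall below the cut-off.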
The ten lowest ranking genes in terms of significance of association to Type III modules, based on the Type III association score. Even though these gene clusters are often found near genomic Type III systems, they are inferred not to bear any functional link with them, and, therefore, were not considered accessory.
| cluster | size | score | example locus | domain matches | comments |
|---|---|---|---|---|---|
| 209 | 3 | −88.9 | CFBS_2969 | InsA | putative transposase |
| 82 | 6 | −72.2 | UDA_2812 | rve/Tra5 | integrase/transposase |
| 144 | 4 | −66.3 | Pcal_0278 | COG5552/DUF2277 | unknown function |
| 12 | 39 | −60.4 | SSO1518 | Tnp 1/InsE | transposase |
| 54 | 8 | −46.7 | MAF_28180 | Int C/XerC | tyrosine recombinase |
| 48 | 9 | −42.6 | YN1551_2375 | InsG/IS 4 | transposase |
| 169 | 3 | −33.3 | Cagg_3808 | Ftn | nonheme ferritin |
| 8 | 51 | −33.0 | Smar_0302 | PotA/MalK | ABC transporter |
| 153 | 3 | −32.6 | Athe_0143 | AmyAc MTase | alpha amylase |
| 154 | 3 | −30.33 | Athe_0142 | YqiI | trans-membrane |
Figure 2. Gene map of the Synechocystis sp. PCC 6803 pSYSA Type III-Bv module with flanking genes. Core Type III genes are coloured red and denoted with Cmr numbers. Adaptation module genes are coloured blue and marked with Cas protein numbers. Accessory genes found in this study are coloured purple and indicated with cluster numbers. Genes deleted in mutants subject to the interference assay are marked by a dot below. None of the individual deletions resulted in a marked decrease in interference activity.
Figure 3. Experimental investigation of accessory gene knock-out mutants in Synechocystis sp. PCC 6803. a) Interference activity of subtype III-Bv-associated accessory gene knock-out mutants. Conjugation efficiencies were calculated as the ratio of the plasmid target (pT) to the plasmid non-target (pNT, control): the conjugation efficiency of the control plasmid was set to 1 and the number of colonies for the plasmid targets was normalized to the control plasmid. Data points represent mean values, with standard deviations calculated from three independent biological replicates. The accessory gene slr7080 belongs to cluster 35, slr7083 to cluster 168 and slr7088 to cluster 11. ‘s’, invader plasmid with protospacer in sense orientation; ‘as’, in antisense orientation; ‘c’, control plasmid without protospacer. b) Northern hybridization using a radioactively labelled transcript probe spanning CRISPR3 spacers 1–4. The knock-out mutant ∆slr7088 shows decreased CRISPR3 crRNA accumulation compared to the wild-type (WT) strain. After normalization against 5S rRNA, the WT clones accumulated on average 1.24 times more CRISPR3 crRNA than the slr7088 deletion mutants. A representative of two independent experiments is shown.
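The normalization described for panel a) can be sketched in a few lines of Python. The colony counts below are hypothetical placeholders, not data from the study, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def conjugation_efficiency(target_colonies: float, control_colonies: float) -> float:
    """Colonies with the target plasmid (pT) relative to the non-target control (pNT)."""
    if control_colonies <= 0:
        raise ValueError("control colony count must be positive")
    return target_colonies / control_colonies


# Hypothetical (pT, pNT) colony counts for three biological replicates.
replicates = [(12, 400), (9, 360), (15, 500)]

efficiencies = [conjugation_efficiency(pt, pnt) for pt, pnt in replicates]

# Sample standard deviation (n - 1 denominator); an assumption, see the note above.
print(f"mean = {mean(efficiencies):.3f}, sd = {stdev(efficiencies):.3f}")
```

By construction the control plasmid has an efficiency of 1, so a value well below 1 for a target plasmid indicates interference.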
Figure 4. Three subtrees from a neighbor-joining tree of all CARF proteins found in this study. Gene (locus) ids are shown along with the subtype of the associated Type III system. The branch length corresponding to a 25 % dissimilarity at the amino acid sequence level is indicated with the ruler. Closely similar CARF proteins can associate with, and cofunction with, different subtypes of Type III systems. SisCsx1 is included in the final subtree (in bold).
Comparisons of reaction efficiencies of different CARF proteins with respect to RNase activity, and for Type III effector complexes with respect to cOA synthesis. Substrates comprise RNA for CARF proteins and ATP for Type III complexes. Reaction times, substrate and effector concentrations shown are the minimum required for digestion of 50% of the RNA substrate or for converting more than 80% of ATP into cOA. The concentration of cOA required for activation of CARF proteins differs by several orders of magnitude, as does the efficiency with which the Type III complexes synthesize cOA. The c-A6 required to activate StCsm’s cognate Csm6 protein comprises a minor species (only 0.5% total cOA synthesized), with the major species being c-A3. In contrast, almost all cOA synthesized by SisCmr and EiCsm* was of the type required by cognate CARF proteins. Table contents were compiled from published data [23,24]. EiCsm* contained dEiCsm3, a nuclease-dead mutant protein. The SisCsx1 and SisCmr data were produced in the Copenhagen laboratory (WH, QS unpublished).
| | SisCsx1 | TtCsm6 | StCsm6 | StCsm6’ | EiCsm6 | SisCmr | StCsm | EiCsm* |
|---|---|---|---|---|---|---|---|---|
| class | Csx1 | Csm6 | Csm6 | Csm6 | Csm6 | III-B | III-A | III-A |
| active cOA species | c-A4 | c-A4 | c-A6 | c-A6 | c-A6 | c-A4 | c-A6 | c-A6 |
| effector conc (nM) | 100 | 10 | 0.1 | 1 | 0.15 | 10 | 200 | 160 |
| Substrate | 520 nM | 10 nM | 10 nM | 10 nM | 40 nM | 100 µM | 50 µM | 500 µM |
| cOA (nM) | 20 | 500 | 0.5 | 5 | 5 | ~70% | 0.5% | ~100% |
| reaction time (min) | 20 | 30 | 30 | 30 | 4 | 20 | 10 | 30 |
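The caption's remark that the cOA concentration required for activation differs by several orders of magnitude can be made concrete with the CARF-protein columns of the table. The Python sketch below reads the "cOA (nM)" row as the activating concentration, which is an interpretation of the caption, and the helper name `orders_of_magnitude` is illustrative.

```python
import math

# cOA concentrations (nM) listed for each CARF protein in the table above.
activating_coa_nm = {
    "SisCsx1": 20.0,
    "TtCsm6": 500.0,
    "StCsm6": 0.5,
    "StCsm6'": 5.0,
    "EiCsm6": 5.0,
}


def orders_of_magnitude(values) -> float:
    """log10 of the ratio between the largest and the smallest value."""
    values = list(values)
    return math.log10(max(values) / min(values))


span = orders_of_magnitude(activating_coa_nm.values())
most_sensitive = min(activating_coa_nm, key=activating_coa_nm.get)
least_sensitive = max(activating_coa_nm, key=activating_coa_nm.get)
print(f"{most_sensitive} vs {least_sensitive}: about {span:.1f} orders of magnitude")
```

The extremes are StCsm6 (0.5 nM) and TtCsm6 (500 nM), a thousand-fold difference.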