Kira S Makarova, Nick V Grishin, Svetlana A Shabalina, Yuri I Wolf, Eugene V Koonin.
Abstract
BACKGROUND: All archaeal and many bacterial genomes contain Clustered Regularly Interspaced Short Palindrome Repeats (CRISPR) and variable arrays of the CRISPR-associated (cas) genes, which have previously been implicated in a novel form of DNA repair on the basis of comparative analysis of their protein product sequences. However, the proximity of CRISPR and cas genes strongly suggests that they have related functions, which is hard to reconcile with the repair hypothesis.
Year: 2006 PMID: 16545108 PMCID: PMC1462988 DOI: 10.1186/1745-6150-1-7
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Figure 1. Distribution of COG1518 genes and, by implication, CASS among prokaryotic lineages.
Figure 2. Phylogenies of the key cas genes and organization of cas operons. (a) Phylogenetic tree for COG1518 proteins; (b) phylogenetic tree for COG1203 proteins (predicted helicase) from the CASS versions lacking COG1518; (c) phylogenetic tree for the predicted CASS polymerase (COG1353). Prokaryotic lineages are color-coded: orange, archaea; blue, Proteobacteria; green, low-GC Gram-positive bacteria; black, other bacteria. In the operon organization cartoons, orthologous genes are color-coded and denoted by either the predicted function or the COG number. Exclamation points denote previously undetected RAMPs. The names of species that have a reverse transcriptase gene within one of the cas operons are underlined in red. In the left panel, the distinct versions of CASS are numbered, and in the right panel, these numbers are given at tree leaves to indicate the helicase cassette(s) that co-occur with the given polymerase cassette.
Protein components of CASS
| # | Family | Subfamilies (A) | Phyletic distribution (B) | Comments |
| 1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase; Mostly α-helical protein |
| 2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD, fused to the helicase (COG1203) in y1723-like proteins |
| 3 | COG1203 | COG1203 (cas3) | All | DNA helicase; Most proteins have fusion to HD nuclease |
| 4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease; Contains three-cysteine C-terminal cluster |
| 5 | RAMP | COG1688, COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" superfamily, possibly RNA-binding protein, structurally related to a duplicated ferredoxin fold (PDB: |
| 6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein; probable enzymatic activity, possibly a nuclease |
| 7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease |
| 8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger-containing proteins, possibly nucleases (nuclease activity has been reported for MTH1090 [75]) |
| 9 | ygcL | ygcL | Bacteria, mostly PROTEO | Large Zn-finger-containing proteins |
| 10 | COG1353 | COG1353, MTH326-like, alr1562, slr7011 | All, mostly Archaea | Putative novel polymerase; Multidomain protein with permuted HD nuclease domain, palm domain, polymerase-thumb-like domain and Zn-ribbon; MTH326-like has inactivated polymerase catalytic domain; alr1562 and slr7011 – predicted only on the basis of size, presence of HD domain, and location with RAMPs in one operon |
| 11 | COG1517/HTH | COG2462 | Archaea | Former COG2462; Fusion of COG1517-like domain to HTH-type transcriptional regulator; Possible regulator of the system expression in archaea |
| 12 | COG1421 | COG1421 | All, mostly Archaea | ~150 aa protein; Has a few motifs similar to ygcK-like; mostly α-helical protein |
| 13 | ygcK | ygcK-like | Bacteria, mostly PROTEO | ~180 aa protein; has a few motifs similar to COG1421; mostly α-helical protein |
| 14 | COG3337 | COG3337 | All, mostly Archaea | ~110 aa; mostly α-helical protein |
| 15 | COG1517 | COG1517 | All, mostly Archaea | Some are fused to an HTH domain (see COG1517/HTH), some proteins have a domain duplication; structure is available (1XMX); the domain appears to have a Rossmann-like fold. |
| 16 | COG3513 | COG3513 | Bacteria, mostly PROTEO | Huge protein; contains McrA/HNH-nuclease related domain and RuvC-like nuclease domain |
| 17 | PH0918 | PH0918-like | All, mostly Archaea | Specific for |
| 18 | AF1870 | AF1870-like | Archaea | Former COG3574; ~150 aa protein. |
| 19 | AF0070 | AF0070-like | Archaea | ~420 aa protein, no prediction |
| 20 | y1724 | y1724-like | Bacteria, mostly PROTEO | ~450 aa protein, no prediction |
| 21 | SPy1049 | Spy1049-like | Bacteria, mostly FIRM | ~220 aa protein, no prediction |
| 22 | TTE2665 | TTE2665-like | Bacteria, mostly CHLOR | ~130 aa protein, no prediction |
| 23 | LA3191 | LA3191-like | Few bacteria | ~650 aa, no prediction |
A – Subfamilies are named by the corresponding COG number or by a protein ID.
B – "All" indicates that the family is widespread in all major prokaryotic lineages; PROTEO, Proteobacteria; FIRM, Firmicutes; CHLOR, Chlorobia.
Figure 3. The RAMPs. (A) The conserved motifs of the RAMP superfamily and individual RAMP families. h designates a hydrophobic residue, p a polar residue, t a residue with high turn-forming propensity, and + a positively charged residue. (B) A ribbon model of the structure of a RAMP protein from Thermus thermophilus (PDB entry 1wj9). The two ferredoxin-like domains are rainbow-colored from N- to C-terminus such that corresponding strands in the two domains receive the same color. The G-rich conserved loop in the C-terminal domain is colored black, structurally disordered regions are shown by dots, and α-helices and β-strands are numbered consecutively throughout the sequence from α1 to α4 and from β1 to β8.
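Consensus motifs written in the h/p/t/+ notation of Figure 3 can be searched mechanically by expanding each symbol into a character class. The sketch below is illustrative only: the residue sets assigned to h, p, t and + are an assumption (the caption does not list them), and `consensus_to_regex` and `find_motif` are hypothetical helper names.

```python
import re

# ASSUMPTION: hypothetical residue sets for the consensus symbols used in
# Figure 3; the exact sets behind the figure are not given in the text.
RESIDUE_CLASSES = {
    "h": "ACFILMVWY",     # hydrophobic
    "p": "DEHKNQRST",     # polar
    "t": "ACDEGHKNQRST",  # high turn-forming propensity
    "+": "HKR",           # positively charged
}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile a consensus string such as 'hGxG+' into a regular expression.

    Uppercase letters are literal residues, 'x' matches any residue, and
    'h', 'p', 't' and '+' expand to the character classes above.
    """
    parts = []
    for symbol in consensus:
        if symbol in RESIDUE_CLASSES:
            parts.append("[" + RESIDUE_CLASSES[symbol] + "]")
        elif symbol == "x":
            parts.append(".")
        elif symbol.isalpha() and symbol.isupper():
            parts.append(symbol)
        else:
            raise ValueError(f"unknown consensus symbol: {symbol!r}")
    return re.compile("".join(parts))


def find_motif(consensus: str, protein: str) -> list:
    """Return 0-based start positions of non-overlapping motif matches."""
    pattern = consensus_to_regex(consensus)
    return [m.start() for m in pattern.finditer(protein.upper())]
```

For example, `find_motif("hGxG", "MAVGKGLLGRG")` returns `[2, 7]`. A regex scan like this only finds exact pattern hits; the motifs in the figure were derived from alignments, which tolerate deviations that a strict pattern would miss.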
Figure 4. A ribbon model for the structure of a COG1517 protein, Vc1899 (PDB entry 1xmx). The structure is rainbow-colored from N- to C-terminus such that each of the three domains is assigned a visually distinct region of the color spectrum: blue, the modified Rossmann-like fold; green, the winged helix-turn-helix (HTH) domain; yellow-orange, the endonuclease-like domain. The T-turn in the HTH is colored black, a structurally disordered region is shown by dots, and α-helices and β-strands are numbered consecutively throughout the sequence from α1 to α14 and from β1 to β16.
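Functional and structural parallels between CASS and eukaryotic RNAi machinery
| Eukaryotic RNAi component | Domains and function | CASS counterpart | Shared features and predicted CASS function |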
| Dicers | Helicase/RNase III; processing of long dsRNA into siRNA and of pre-miRNA into miRNA, which involves unwinding | Helicase (COG1203) + HD nuclease (COG2254), encoded by fused or adjacent genes | SFII helicase + HD nuclease |
| Argonautes/slicers | Ferredoxin-fold-PAZ-PIWI; endonuclease, target degradation | RecB-family nuclease (COG1468, COG4343); COG1857, a novel nuclease? | Target degradation |
| R2D2/RDE-4 | dsRNA-binding domain, interacts with Dicer | RAMPs | Ferredoxin-fold duplication. Size-specific psiRNA-binding, pre-psiRNA-binding, other RNA-binding functions? |
| Fmr1/Fxr | RGG, KH-ssRNA-binding | RAMPs | Ferredoxin-fold duplication. Size-specific psiRNA-binding, pre-psiRNA-binding, other RNA-binding functions? |
| Tsn | Tudor, SN – RNA-binding | RAMPs | Ferredoxin-fold duplication. Size-specific psiRNA-binding, pre-psiRNA-binding, other RNA-binding functions? |
| Vig | RGG – RNA-binding | RAMPs | Ferredoxin-fold duplication. Size-specific psiRNA-binding, pre-psiRNA-binding, other RNA-binding functions? |
| RNA-dependent RNA polymerase | RdRp domain related to DdRp; 2nd-strand synthesis for siRNA production | Predicted RdRp/RT (COG1353) | Palm polymerase domain. 2nd strand synthesis for psiRNA production, reverse transcription for CRISPR formation |
Figure 5. The current hypothetical model for CASS functioning and CRISPR formation. (A) The basic model of CASS functioning; (B) the variant of CASS functioning involving the CASS polymerase; (C) formation of new CRISPR with unique inserts.
Genes loosely associated with CASS
| # | Gene or domain | Representative proteins | Comments |
| 1 | Reverse transcriptase (RT) | VVA1544, PG1982, alr1468 | Fused to COG1518 on three occasions and a remnant of RT (Mbar_A1351 and MM3360) in |
| 2 | PIN-domain | alr1560, ST0017, Ava_4168 | Ribonuclease |
| 3 | COG2442 | Ava_4167 | HTH domain, component of toxin-antitoxin system, probably targeting mRNA |
| 4 | COG1432 | MS0983 | Large family of proteins shared by all three domains of life, predicted to be a phosphatase or a nuclease on the basis of sequence motifs. In plant multidomain proteins, it is associated with a C2H2 Zn-finger domain |
| 5 | PA2117-like | MS0982, MS0989 | An enzymatic domain that is located in an operon with restriction-modification systems or in association with a diverged helicase |
| 6 | COG3645-like | ACIAD2479 | Homologs of the phage anti-repressor Ant, which is known to be inhibited by an antisense RNA |
| 7 | argonaute | MK1311 | Homolog of the eukaryotic Argonaute proteins, which are key players in RNA-guided posttranscriptional regulation by siRNA and miRNA |
| 8 | COG1598/COG4226/HicB | MCA0653, MTH321 | Probably has an RNase H-like fold, often fused to CopG-family transcriptional regulators; forms a conserved operon with COG1724/hicA, which has the dsRBD-like fold; possible novel toxin-antitoxin module targeting mRNA |
| 9 | PUA-domain | LIC10933 | RNA binding domain |
| 10 | 3'-5' exonuclease | LcasA01001274 | Fused to COG1343 in |
| 11 | COG1652 | TK0459 | Regulatory ATPase of AAA family fused to RecB-family nuclease; Predicted regulator of RNA metabolism |
| 12 | AbrB/MazE domain | TK0457, PAE0118 | DNA-binding domain; belongs to the same fold as MazE, which is involved in toxin-antitoxin systems |
| 13 | S1-domain | CaurDRAFT_2121 | Ribosomal protein S1-like RNA-binding domain, fused to RAMP domain |
| 14 | CSP-like | Rrub02003211 | Cold shock protein-like RNA-binding domain, fused to RAMP domain |
Figure 6. CRISPR and RAMPs. (A) Correlation between the number of encoded RAMPs and the number of CRISPR units in prokaryotic genomes; (B) correlation between the number of encoded RAMPs and the variance of unique insert lengths of CRISPR-related spacers in prokaryotic genomes.
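Spacer counts and the spread of spacer lengths, as used in Figure 6 and the correlation table, require first cutting each CRISPR array into its unique inserts. A minimal sketch, assuming the repeat unit is already known and every copy is an exact match (real arrays often contain degenerate repeats that need approximate matching); `extract_spacers` and `spacer_length_sd` are hypothetical names, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def extract_spacers(array_seq: str, repeat: str) -> list:
    """Split a CRISPR array into the unique inserts between repeat copies.

    ASSUMPTION: the repeat is known and every copy matches it exactly.
    """
    pieces = array_seq.upper().split(repeat.upper())
    # Sequence before the first repeat and after the last one is flanking
    # DNA, not a spacer, so only interior (non-empty) pieces are kept.
    return [p for p in pieces[1:-1] if p]


def spacer_length_sd(spacers: list) -> float:
    """Sample standard deviation of spacer lengths (0.0 if fewer than two)."""
    lengths = [len(s) for s in spacers]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths)
```

With a toy repeat `"GTTTCA"` and array `"AAAGTTTCAACGTACGTGTTTCATTTTGTTTCAGGGG"`, the spacers are `["ACGTACGT", "TTTT"]` and their length S.D. is √8 ≈ 2.83. Splitting on exact copies is the simplest correct approach under the stated assumption; it silently merges neighbouring spacers if a repeat copy carries a mutation.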
Rank correlation coefficients between CRISPR spacers and selected Cas proteins
| Values selected for correlation | Rank correlation coefficient (a) |
| Number of spacers vs S.D. of spacer lengths | |
| Number of spacers vs number of COG1518 proteins | |
| Number of spacers vs number of RAMPs | |
| Number of spacers vs number of COG1517 proteins | |
| Number of spacers vs total number of Cas proteins minus RAMPs | |
| Number of spacers vs total number of Cas proteins minus RAMPs and COG1517 proteins | |
| Number of RAMPs vs S.D. of spacer lengths | |
(a) All correlation coefficients are highly statistically significant (P < 10⁻³).
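The table reports rank correlations across genomes but does not name the statistic. A minimal sketch, assuming Spearman's rho computed as the Pearson correlation of average ranks (tied values share the mean of their ranks, which matters for small integer counts such as numbers of RAMPs per genome). The inputs would be hypothetical per-genome counts; `average_ranks` and `spearman_rho` are illustrative names.

```python
def average_ranks(values) -> list:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5
```

A perfectly monotone pair such as `spearman_rho([1, 2, 3, 4], [10, 20, 30, 40])` gives `1.0`. Using ranks rather than raw counts makes the coefficient insensitive to a few genomes with very large CRISPR arrays; a P value would additionally need a permutation test or the usual t approximation, which this sketch omits.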
A selection of CRISPR inserts homologous to phage, plasmid and prokaryotic genes
| Species | Total spacers | Phage/plasmid | Sense | Antisense | Other prokaryotes |
| 413 | 22 | 9 | 12 | 4 | |
| 222 | 0 | 1 | |||
| 449 | 4 | 4 | 0 | 2 | |
| 59 | 14 | 13 | 1 | 0 | |
| 174 | 0 | 1 | |||
| 169 | 10 | 4 | 4 | 0 | |
| 185 | 0 | 1 | |||
| 89 | 3 | 1 | 1 | 0 | |
| 41 | 20 | 15 | 5 | 0 | |
| 37 | 10 | 5 | 5 | 0 | |
| 24 | 0 | 1 |
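The sense/antisense split in the table above depends on the orientation of a spacer relative to the gene it matches. A minimal sketch of that classification, assuming exact string matches against the gene's coding strand and taking the spacer as written in the array as its reference orientation; the study itself relied on similarity searches, so this is a simplification, and `spacer_orientation` is a hypothetical helper.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spacer_orientation(spacer: str, coding_strand: str) -> Optional[str]:
    """Classify an exact spacer hit against a gene's coding strand.

    'sense'     - the spacer, as written, occurs in the coding strand
    'antisense' - only its reverse complement occurs, so an RNA copy of the
                  spacer could base-pair with the gene's mRNA
    None        - no exact hit on either strand
    ASSUMPTION: exact matching and the spacer's written orientation stand in
    for the similarity searches used in the study.
    """
    spacer, gene = spacer.upper(), coding_strand.upper()
    if spacer in gene:
        return "sense"
    if reverse_complement(spacer) in gene:
        return "antisense"
    return None
```

For a toy gene `"ATGAAACCCGGGTTTTAG"`, the spacer `"TTTCAT"` is classified as `"antisense"` because its reverse complement `"ATGAAA"` occurs in the coding strand. Only antisense hits would be expected to guide base-pairing with an mRNA under an RNAi-like mechanism, which is why the two orientations are counted separately.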
Figure 7. Folding free energy distributions for the putative psiRNA precursors. (a) GC-rich psiRNA precursors compared with the corresponding shuffled sequences and miRNAs; (b) AT-rich psiRNA precursors compared with the corresponding shuffled sequences and mRNAs. X-axis: folding energy.
Figure 8. Two predicted structures of putative psiRNA precursors. The unique inserts are shown in red, and the CRISPR sequence is shown in boldface.
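Comparing a precursor's folding energy with that of shuffled copies asks whether it folds more stably than its base composition alone would predict. A minimal sketch, assuming a plain mononucleotide shuffle (preserves composition but not dinucleotide content; the text does not state which shuffling scheme was used) and a caller-supplied `fold_energy` function, for example a wrapper around an RNA folding program. The z-score summary and all names here are illustrative assumptions, not the study's procedure.

```python
import random
import statistics
from typing import Callable, List


def shuffled_controls(seq: str, n: int, seed: int = 0) -> List[str]:
    """Return n random permutations of seq (mononucleotide shuffle)."""
    rng = random.Random(seed)  # seeded for reproducible controls
    controls = []
    for _ in range(n):
        bases = list(seq)
        rng.shuffle(bases)
        controls.append("".join(bases))
    return controls


def folding_z_score(seq: str, fold_energy: Callable[[str], float],
                    n_shuffles: int = 100, seed: int = 0) -> float:
    """z = (E_native - mean(E_shuffled)) / sd(E_shuffled).

    fold_energy is a placeholder for any function returning a minimum free
    energy; a negative z means the native sequence folds more stably than
    its composition-matched shuffles.
    """
    native = fold_energy(seq)
    energies = [fold_energy(s) for s in shuffled_controls(seq, n_shuffles, seed)]
    sd = statistics.stdev(energies)
    if sd == 0:
        raise ValueError("shuffled energies have zero spread")
    return (native - statistics.mean(energies)) / sd
```

As a toy stand-in for a real folding model, `folding_z_score("GCGCGCGCAAAAAAAA", lambda s: -float(s.count("GC")))` returns a negative z, since the native sequence contains the maximum possible number of GC steps. Running GC-rich and AT-rich precursors separately, as in panels (a) and (b), keeps composition differences from dominating the comparison.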