Jochem N A Vink, Jan H L Baijens, Stan J J Brouns.
Abstract
BACKGROUND: The adaptive CRISPR-Cas immune system stores sequences from past invaders as spacers in CRISPR arrays and thereby provides direct evidence that links invaders to hosts. Mapping CRISPR spacers has revealed many aspects of CRISPR-Cas biology, including target requirements such as the protospacer adjacent motif (PAM). However, studies have so far been limited by a low number of mapped spacers in the database.
Year: 2021 PMID: 34593010 PMCID: PMC8482600 DOI: 10.1186/s13059-021-02495-9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Spacer targets found with BLAST. A Computational pipeline for finding spacer targets. Targets of 72,099 spacers were found using blastn and filtered based on the fraction of spacer nucleotides matching a target sequence (see the “Methods” section). B Venn diagram of spacers with matches in the NCBI nucleotide database vs metagenomic databases. C Number of unique spacers (total 72,099) for which a match was found. Spacers with fewer than 4 mismatches generally fall above the 90% identity threshold and were selected directly; spacers with 4 or more mismatches generally fall between the 80% and 90% thresholds and were selected only if another spacer from the same genus targeted the same sequence. D Number of sequences targeted by each spacer. Due to redundancy in the datasets, some of these sequences can be identical. E Fraction of spacers with hits for the ten genera with the highest and the ten genera with the lowest fraction of hits. Only genera with at least 500 spacers are shown. F Number of spacers per subtype. The subtype of a spacer was predicted based on the similarity of the repeat sequence to repeats with a known subtype (see the “Methods” section). G Fraction of spacers with hits per subtype
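The two-tier selection described for panel C of Fig. 1 can be illustrated with a minimal sketch. It is not the authors' pipeline: the record keys (`spacer_id`, `genus`, `target_id`, `matches`, `spacer_length`) are hypothetical, and it assumes that the "other spacer from the same genus" must itself be a directly selected (> 90% identity) hit, which the caption does not state.

```python
from collections import defaultdict

def select_hits(hits, direct=0.90, rescue=0.80):
    """Keep high-identity hits, then rescue weaker hits supported by another spacer.

    Each hit is a dict with hypothetical keys:
    spacer_id, genus, target_id, matches, spacer_length.
    """
    def identity(hit):
        # Fraction of spacer nucleotides that match the target sequence.
        return hit["matches"] / hit["spacer_length"]

    # Tier 1: hits above the direct threshold are selected as they are.
    selected = [h for h in hits if identity(h) > direct]

    # Index which spacers directly hit each (genus, target) pair.
    supporters = defaultdict(set)
    for h in selected:
        supporters[(h["genus"], h["target_id"])].add(h["spacer_id"])

    # Tier 2 (assumption): a weaker hit is rescued only if a different spacer
    # from the same genus has a directly selected hit on the same target.
    for h in hits:
        if rescue < identity(h) <= direct:
            others = supporters[(h["genus"], h["target_id"])] - {h["spacer_id"]}
            if others:
                selected.append(h)
    return selected
```

In this sketch a 32-nt spacer with 5 mismatches (27/32, about 84% identity) is kept only when a second spacer from the same genus matches the same target at more than 90% identity.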
Fig. 2 PAM determination of repeat clusters. A Sequence logos of the upstream flanks of hits to spacers from type I repeat clusters. Logos of the protospacer flanking regions are shown per repeat cluster. Y-axes show the information content per nucleotide position. The label includes the subtype of the repeat cluster and a representative genus in which this repeat cluster is found. The PAMs of the I-E and I-F repeat clusters depicted here have been determined previously in model systems containing the same repeat [49, 50]. B The same as A but for downstream flanks of spacers from type II repeat clusters. The PAMs of the type II-A (Streptococcus) and type II-C (Listeria) systems have been previously described in model systems that are closely related to the strains studied here [51, 52]. C The same as A but for upstream flanks from type III repeat clusters. D Frequency of PAM-determined repeat clusters with more than 25 hits. Nucleotide positions were considered part of the PAM if their bitscore was at least 0.4 and at least 10 times the median bitscore of the 23 nucleotides surrounding the hits. PAM size was at least 2 nucleotides. E Frequency of PAM-determined repeat clusters for type III systems that contain Cas1-2 vs type III systems that lack Cas1-2. Additional file 5 contains the PAM for each strain-subtype combination (Additional file 2)
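The PAM-calling rule in panel D of Fig. 2 can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the "bitscore" is taken to be the per-position information content of a sequence logo (2 bits minus the Shannon entropy, without small-sample correction), all flanks are assumed to have equal length and contain only A, C, G, and T, and the median is taken over the positions supplied. The function names are hypothetical.

```python
import math
from collections import Counter
from statistics import median

def information_content(column):
    """Information content in bits of one alignment column of DNA."""
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2.0 - entropy  # 2 bits is the maximum for a 4-letter alphabet

def call_pam_positions(flanks, min_bits=0.4, fold_over_median=10.0, min_size=2):
    """Return 0-based flank positions called as PAM, or [] if fewer than min_size."""
    length = len(flanks[0])
    bits = [information_content([f[i] for f in flanks]) for i in range(length)]
    background = median(bits)  # assumed background: median over all flank positions
    called = [
        i for i, b in enumerate(bits)
        if b >= min_bits and b >= fold_over_median * background
    ]
    return called if len(called) >= min_size else []
```

To follow the caption, `flanks` would hold the 23 nucleotides surrounding each hit, and only repeat clusters with more than 25 hits would be passed in.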
Unique type I PAM sequences. Table of all unique type I PAMs found for the different subtypes and representative genera that contain the repeat cluster for which each PAM was determined. For previously described PAMs, a reference ID has been added, which corresponds to the following: 1 [60], 2 [61], 3 [62], 4 [63], 5 [45], 6 [64], 7 [65], 8 [46], 9 [14], 10 [66], 11 [67], 12 [49], 13 [68], 14 [69], and 15 [70]
| Type | PAM | Genus | Ref |
|---|---|---|---|
| | ATG | | |
| | CCN | | 2 |
| | TTA | | |
| | TCN | | 3 |
| | ATN | | |
| | CCA | | |
| | CCN | | 4 |
| | CCT | | 4 |
| | TAC | | |
| | TCA | | |
| | TCN | | |
| | TTA | | |
| | TTC | | |
| | TTN | | |
| | TTTA | | |
| | TTG | | |
| | CTN | | |
| | CCN | | |
| | CTT | | |
| | TTC | | 8 |
| | TTN | | |
| | TTT | | |
| | GCN | | |
| | GGTG | | |
| | GTN | | 9 |
| | GTT | | |
| | GTG | | 10 |
| | AAC | | 11 |
| | AAG | | 12 |
| | AAN | | |
| | AAT | | |
| | AAA | | 13 |
| | AC | | |
| | AG | | |
| | AWG | | |
| | ACC | | |
| | CC | | 14 |
| | CCA | | |
| | TAC | | |
| | TAN | | 11 |
| | TTN | | 15 |
| | AAN | | |
| | TTC | | |
Fig. 3 Relationship between repeat and PAM sequence. A Schematic of the analysis of PAM and repeat sequence. The nucleotide identity of the PAM in each position is compared to the nucleotide of the repeat. B PAM nucleotide frequency for type I repeats. For each repeat nucleotide position (indicated with colored boxes), the pie chart shows the PAM nucleotide for each unique PAM-repeat combination in our database. The number of occurrences (n) is indicated above the pie chart. C The frequency of matches (red) and mismatches (gray) between the PAM and the corresponding repeat nucleotide for each position in relation to the spacer. For type II, the positions are compared on the other side of the spacer
Fig. 4 Template and coding strand targeting of spacers. A Schematic representation of a spacer targeting the template strand and a spacer targeting the coding strand inside an ORF. Spacers targeting the coding strand are also able to base pair with and target transcribed RNA. B Fraction of Escherichia spacers targeting the template (blue) and coding (orange) strand by subtype. C Fraction of Moraxella spacers targeting the template and coding strand by subtype. D Fraction of spacers targeting the template and coding strand for type I and type IV subtypes. E Fraction of spacers targeting the template and coding strand for type II and type III subtypes. F Fraction of spacers targeting the template and coding strand for type I. Spacers are grouped based on which other types of Cas effector genes are present in the genome. G The same as F but for type II spacers. The significance of strand bias was calculated with a binomial test, and a p-value < 0.01 is indicated with an asterisk. Additional file 2 lists the strand targeted by each spacer, from which the strand bias for each taxon can be extracted
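The strand-bias significance test mentioned in the Fig. 4 caption can be sketched with the standard library alone. The caption does not state the null proportion or whether the test is one- or two-sided; this sketch assumes an exact two-sided binomial test against an equal (0.5) chance of targeting either strand. The function names are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n trials.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance keeps ties from being lost to rounding.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

def has_strand_bias(template_hits, coding_hits, alpha=0.01):
    """True if the template/coding split deviates from 50:50 at the given alpha."""
    n = template_hits + coding_hits
    return binomial_two_sided_p(template_hits, n) < alpha
```

Building the full probability table is only practical for modest counts; with many hundreds of spacers or more, a library routine such as `scipy.stats.binomtest` is a better choice.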
Fig. 5 Different organizations of subtypes containing compatible spacer sequences. A Pie charts of the frequency of genomes in each category of organization, based on the subtype combination involved. The total number of genomes in which each category was found (n) is noted in each chart. B–D Genome representations of examples of the different organization categories. B Type I-type I compatibility. C Type I-type III compatibility (different repeat sequences). D Type I-type III compatibility (same repeat sequences). Genes involved in interference (blue) and adaptation (red) are shown for the different subtypes within the genome. The PAM logo and strand bias of each associated repeat cluster are depicted below the genomic representations