CRISPRTarget: bioinformatic prediction and analysis of crRNA targets.

Ambarish Biswas, Joshua N Gagnon, Stan J J Brouns, Peter C Fineran, Chris M Brown.

Abstract

The bacterial and archaeal CRISPR/Cas adaptive immune system targets specific protospacer nucleotide sequences in invading organisms. This requires base pairing between processed CRISPR RNA and the target protospacer. For type I and II CRISPR/Cas systems, protospacer adjacent motifs (PAM) are essential for target recognition, and for type III, mismatches in the flanking sequences are important in the antiviral response. In this study, we examine the properties of each class of CRISPR. We use this information to provide a tool (CRISPRTarget) that predicts the most likely targets of CRISPR RNAs (http://bioanalysis.otago.ac.nz/CRISPRTarget). This can be used to discover targets in newly sequenced genomic or metagenomic data. To test its utility, we discover features and targets of well-characterized Streptococcus thermophilus and Sulfolobus solfataricus type II and III CRISPR/Cas systems. Finally, in Pectobacterium species, we identify new CRISPR targets and propose a model of temperate phage exposure and subsequent inhibition by the type I CRISPR/Cas systems.

Keywords: CRISPR; Cas; R-loop; bioinformatics; crRNA; horizontal gene transfer; phage resistance; small RNA targets

Year: 2013 PMID: 23492433 PMCID: PMC3737339 DOI: 10.4161/rna.24046

Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652

Introduction

The CRISPR (clustered regularly interspaced short palindromic repeats)/Cas (CRISPR associated) system has evolved to defend microorganisms against foreign invading nucleic acids, principally DNA from bacteriophages (phages), plasmids and other mobile elements (reviewed in refs. 1‒5). CRISPR/Cas systems have been identified in 47% and 86% of complete bacterial and archaeal genomes, respectively. Resistance develops when a short sequence is acquired from the phage or plasmid genome and added, as a new spacer, to the CRISPR arrays (reviewed in ref. 7), which consist of short repeats separated by spacers. In CRISPR systems, a CRISPR RNA (crRNA) containing a “spacer” (or guide) is generated from a longer precursor (pre-crRNA) and incorporated into a ribonucleoprotein complex of one or more Cas proteins. These ribonucleoprotein complexes bind to, and trigger the destruction of, complementary DNA or RNA from invading elements. Typically, organisms have several CRISPR arrays containing a range of spacers with different sequences derived from previous exposure to phages and plasmids. The largest predicted bacterial array, from Haliangium ochraceum DSM 14365, has 587 spacers, only two of which are identical. Despite experimental proof that CRISPR/Cas systems target phages or plasmids, the targets of most spacers have not been identified. For example, of 926 spacers identified for E. coli and Salmonella, Touchon and Rocha were only able to predict the likely targets of 8%; similarly, a parallel study discovered the targets of 12% of spacers. There are many contributing reasons for the lack of identified crRNA targets. One is the relative paucity of studies that investigate phage sequences when compared with phage abundance and genetic diversity. Furthermore, many phage sequences are not easily accessible in databases such as GenBank, although many more exist in viral metagenome or virome studies.
Large proportions of phage sequences have no similarity to any known phage or other sequences. Therefore, most metagenomic data remain unannotated. For example, in a recent study, the metagenomes of phages purified from thermal ocean vents were sequenced. The method targeted lambdoid viruses and resulted in the sequencing of a new lambdoid virus; however, 45–55% of sequences had no database matches. Another study of marine viromes found that only 10% of genes were related to known phages. The lack of identified crRNA targets in plasmids results from a similar dearth of sequence data relative to their abundance and diversity. Like phages, plasmids are mobile, have mosaic sequence structures and are rapidly evolving. Recent efforts have begun to sequence populations of plasmids using metagenomics, which should start to alleviate this shortage of plasmid data. CRISPR/Cas systems are divided into three major types (I-III), and further into subtypes (e.g., types III-A and III-B). Different types share similarities, yet can differ, for example in crRNA generation or the nature of the target (RNA or DNA). Recent studies have begun to elucidate the process of recognition of target protospacers in the major types of CRISPR/Cas systems. Early studies were interpreted as showing that exact pairing along the length of the spacer RNA was required, but recent results indicate that some mismatches are tolerated, at least for some systems. For type I and II systems, protospacer adjacent motifs (PAM) are required for recognition, and a short seed sequence within the match is required in particular subtypes. For type III systems, it is unclear if a seed sequence exists and no PAMs have been identified. Instead, the base-pairing potential between the 5′ repeat-derived portion of the crRNA (termed the 5′ handle) and the sequence flanking the protospacer target is important in enabling interference and in preventing self-targeting in type III-A systems.
CRISPR/Cas systems have similarities to, yet differ from, RNAi in mammals, which can also provide protection from viral infection. RNAi utilizes miRNAs of ~20–22 bases that recognize specific mRNA targets. However, the key seed determinant is only six to eight bases. Therefore, predictive tools to discover functional binding sites have been developed that use the properties of known sites to predict new ones. The critical factor is distinguishing true sites from false positives, and a large number of algorithms have been implemented for miRNA target discovery. A number of bioinformatic tools are available for the identification of CRISPR arrays and their spacer sequences. In contrast, few approaches have been developed to discover the targets of CRISPR. For predicted spacers in CRISPRdb (November 2012), the mean length is 36 bases (range: 16–100), which is consistent with typical lengths for experimentally confirmed spacers of 24–37 bases. The input used when searching for targets using CRISPRFinder is a small number of these spacer sequences without adjacent repeats. These spacers are used for a BLAST search of the nucleotide sequence database from GenBank using the default parameters. The discovery of new spacer targets would be facilitated by tools that allow flexibility in these areas and enable searches of recent metagenomic data sets. Furthermore, existing tools cannot score or visualize PAMs or base pairing in flanking sequences, or define seed regions. These properties assist in biological interpretation of putative targets. In this study, we have incorporated known features of the CRISPR/Cas system into a target discovery tool. We have also allowed flexibility to enable the incorporation of new features, to generate testable hypotheses as to the targets of CRISPR systems.

Results

CRISPRTarget: Development of a tool for discovery of crRNA targets

The lack of tools for prediction of the protospacer targets of crRNAs led us to develop a web application called CRISPRTarget. We have summarized the current state of knowledge about the three major CRISPR/Cas types, their PAMs, handles and seed regions (Table 1) and used this information when developing CRISPRTarget. Users can provide their input either as spacers in FASTA format, or as CRISPRFinder, PILER-CR or CRISPR Recognition Tool (CRT) output files (following CRISPR prediction via one of these methods). Putative protospacer targets are identified by a BLASTn search of the spacer input against a number of databases or user-uploaded sequences. These databases include ACLAME genes, GenBank-nt, GenBank-Environmental, GenBank-Phage, RefSeq-Microbial, RefSeq-Plasmid, RefSeq-Viral and parts of CAMERA. Although the default settings allow the sensitive detection of potential targets, users can modify the search parameters, such as E-value, word size and penalties for gaps and match/mismatch. This flexibility enables the stringency of the search to be adjusted, depending on the user requirements. Either on the initial input screen, or following the BLAST search, targets can be displayed, scored for flanking sequences and PAMs, and filtered by exact matching seed regions. These are important parameters when considering biological details about the predicted target, such as which CRISPR/Cas type is involved. The information in Table 1 can assist users in choosing the appropriate parameters for their particular target search. The output is provided visually in HTML format, and can also be saved as text and opened in a spreadsheet. The target sequence is typically displayed as an R-loop, depicting a specified part of the crRNA, as well as both the target and non-target strand of the double-stranded target DNA.
The target sequence R-loop can be fully reverse complemented when users suspect that transcription of the CRISPR array instead starts from the downstream end.
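
The reverse-complement option described above can be illustrated with a minimal Python sketch. It is not taken from the CRISPRTarget code; the function name and the example spacer are hypothetical and serve only to show what searching the opposite orientation of a spacer means.

```python
# Illustrative helper, not part of CRISPRTarget.
# Maps each base to its complement; U (RNA) is read as pairing with A.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence as DNA."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical spacer as read from a CRISPR prediction file.
spacer = "AAGCTTGACCA"

# If the array was predicted on the wrong strand, the crRNA spacer
# corresponds to the reverse complement of the predicted spacer.
print(reverse_complement(spacer))
```

Applying the function twice to a DNA sequence returns the original sequence, which is a convenient sanity check when the orientation of an array is uncertain.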

Table 1. Summary of features of CRISPR/Cas systems, including PAMs, repeats, seed regions and handles*

Type	Target	Representative species	PAM (5′-3′)^§	Typical repeat	CRISPR family⁵⁹	Seed region	5′/3′ handles (nt)
Type I (PAMs 3′ of protospacer)					Seed adjacent to PAM
I-A	DNA	Sulfolobus solfataricus P2	Protospacer-NGG⁴⁰^,⁴¹^,⁷⁸	GATAATCTCTTATAGAATTGAAAG^¶	CRISPR-7	Unknown	8/16–17¹⁶
I-B	DNA	Clostridium thermocellum ATCC 27405	Unknown	GTTTTTATCGTACCTATGAGGAATTGAAAC^¶	CRISPR-6	Unknown	8/4, 10–12⁷⁹
		C. thermocellum ATCC 27405	Unknown	GTTGAAGTGGTACTTCCAGTAAAACAAGGATTGAAAC^¶	CRISPR-9		8/2–6⁷⁹
		Haloferax volcanii H26	Protospacer-GAA, AGT, TTA, ATA, CTA, GTG⁸⁰	GTTTCAGACGAACCCTTGTGGGDTTGAAGC^¶	CRISPR-6†
		Listeria monocytogenes	Protospacer-NGG⁴¹	GTTTTAACTACTTATTATGAAATCTAAAT	CRISPR-1
I-C	?	Xanthomonas oryzae	Protospacer-GAA⁴¹^,⁸¹	GTCGCGTCCTCACGGGCGCGTGGATTGAAAC^¶	CRISPR-3	Unknown
		Bacillus halodurans	Protospacer-GAA⁴¹	GTCGCACTCTTCATGGGTGCGTGGATTGAAAT	CRISPR-3		11/21¹¹
I-D	?	Unknown	Unknown		Unknown	Unknown
I-E	DNA	Escherichia coli K12	Protospacer-CTT, CAT, CCT, CTC⁴¹^,⁷³^,⁷⁴^,⁸²	GWGTTCCCCGCGCCAGCGGGGATAAACCG^¶	CRISPR-2	1–5, 7–8³⁹	8/21⁹^,¹⁷
I-F	DNA	Pseudomonas aeruginosa PA14	Protospacer-GG²⁵^,⁴¹	GTTCACTGCCGTGTAGGCAGCTAAGAAA^¶	CRISPR-4	1–8¹⁵	8/20¹⁰^,¹⁵
		Pectobacterium atrosepticum SCRI1043	Protospacer-GG⁴¹	GTTCACTGCCGTACAGGCAGCTTAGAAA^¶	CRISPR-4
Type II (PAMs 5′ of protospacer)					Seed adjacent to PAM
II-A	DNA	Streptococcus thermophilus	WTTCTNN-protospacer⁵⁸	GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC	CRISPR-10	Unknown
		Streptococcus thermophilus	TTTYRNNN-protospacer⁸³	GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC	CRISPR-10
II-B	DNA	Streptococcus thermophilus	CNCCN-protospacer⁵⁸^,⁸⁴	GTTTTAGAGCTGTGTTGTTTCGAATGGTTCCAAAAC	CRISPR-10
		Streptococcus pyogenes	CCN-protospacer²⁰^,⁴¹	GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC^¶	CRISPR-10	13²⁰	None/19–22¹³
Type III (no PAM)
III-A	DNA	Staphylococcus epidermidis	No PAM⁴³	GATCGATACCCACCCCGAAGAAAAGGGGACGAGAAC^¶	CRISPR-8	Unknown	8/(37/43 entire length)⁴³^,⁷⁶
III-B	RNA	Pyrococcus furiosus	No PAM⁸⁵	GTTCCAATAAGACTAAAATAGAATTGAAAG^¶	CRISPR-6	Unknown	8/(39/45 entire length)¹⁸^,⁸⁵
		Sulfolobus solfataricus	No PAM¹⁴	GATTAATCCCAAAAGGAATTGAAAG^¶	CRISPR-7	Unknown	8/uncertain¹⁴

* Adapted from Westra et al., 2012.

§ PAM and protospacer are denoted as sequence on target strand that base pairs with crRNA.

¶ Direction of CRISPR transcription confirmed.

† Described as CRISPR-9, but length and sequence suggest degenerate CRISPR-6.

A general model of the match between a spacer and protospacer target as the output from CRISPRTarget is shown in Figure 1 for types I, II and III. The differing features, such as 5′ or 3′ handles and the presence or absence of PAMs, can be specified, searched, sorted and displayed in CRISPRTarget. Furthermore, the parameters can be manually adjusted to incorporate new functional information (e.g., a new PAM). For clarity, we define the protospacer as the DNA strand complementary to the crRNA, and PAMs are denoted 5′-3′ on the protospacer DNA (e.g., type I-E PAM is CTT, Table 1). In addition, we refer to the flanking sequences as being 5′ or 3′ of this protospacer and handles as 5′ or 3′ of the crRNA spacer. CRISPRTarget enables detection of the most likely complements of spacers in target sequences (Fig. 2).

Figure 1. Example annotated CRISPRTarget outputs of representatives of type I, II and III CRISPR/Cas systems. The protospacer is the DNA target complementary to the crRNA spacer. The crRNA is displayed as RNA 5′ to 3′ and the base paired protospacer is 3′ to 5′. (A) The predicted spacer 6 crRNA from the type I-F CRISPR1 (CRISPR1_6) in P. aeruginosa PA14 targets Pseudomonas phage JBD67. The output visualizes the 5′-protospacer-GG-3′ PAM and the crRNA with 8 and 20 nt 5′ and 3′ handles, respectively. (B) The CRISPR1_15 from the type II system from Streptococcus thermophilus DGCC7710 WTphi858phi2972+S13S14 matched to Streptococcus phage 5093. The output shows the predicted length of the 3′ handle, based on Streptococcus pyogenes, and the 5′-WTTCTNN-protospacer-3′ PAM. (C) Spacer 1 from the type III-A system from Staphylococcus epidermidis RP62a targeting plasmid pGO1. The output was adjusted to display the 8 nt 5′ handle with an entire mature crRNA length of 43 nt and no PAMs were scored. Yellow sequences include spacer and protospacer, blue indicates flanking sequences and PAMs are shown in green.

Figure 2. Flowchart of the steps in CRISPRTarget (details are in the Materials and Methods). Input is predictions of the CRISPR arrays, selected databases and initial parameters. This input is processed and the spacers screened using BLASTn for matches against the databases. The flanks of these matches are extended and PAMs and handles analyzed in an interactive manner. Output is as a text/spreadsheet format, or as a graphical display (HTML).

Proof of principle: Phage protospacers for Streptococcus thermophilus type II CRISPR/Cas

As an initial test, we used the well-characterized type II CRISPR1 array from Streptococcus thermophilus DGCC7710. This strain is economically important in the dairy industry and has active CRISPR/Cas systems. The sequences of arrays with recently acquired spacers are available for strains WTphi858phi2972+S9S10S11S12 (GenBank accession: EF434477) and WTphi858phi2972+S13S14 (EF434478), as are many Streptococcus thermophilus phage sequences (114 sequences of 6,800 in the phage division of GenBank). These two strains have become resistant to ϕ858 and ϕ2972, whereas the WT strain (EF434469) is sensitive. We expect that spacers from the resistant strains will be predicted to target ϕ858 and ϕ2972, whereas those from the WT will not, but might target other mobile elements. Spacers were predicted from these CRISPR sequences using CRISPRFinder, CRT and PILER-CR. CRISPRFinder is the most cited CRISPR prediction tool; however, a combination of CRT and PILER-CR is used in the DOE-JGI standard pipeline for bacterial genome annotation. CRT and CRISPRFinder predicted the published array of 32 spacers in the WT and an additional two or four spacers in the resistant strains, whereas PILER-CR with default parameters split the array into two, consisting of 22 and three spacers. The CRT predictions were used as input (Fig. 3), as these include information about small variations in the repeats (all inputs in this study are provided in the supplementary material). These spacers were searched against the phage division of GenBank and plasmid division of RefSeq. With the default settings, there were matches from 24 of 32 spacers to 84 sequences of mobile elements in the initial output, of which 81 were Streptococcus spp. phages (HTML and text outputs are provided as supplementary files). This has been designated a type II-A system with a requirement for a PAM 5′ of the protospacer (5′-WTTCTNN-protospacer-3′); 38/84 had the consensus PAM.
The additional spacers in strains WTphi858phi2972+S9S10S11S12 and WTphi858phi2972+S13S14 targeted ϕ2972 and ϕ858 as expected. Interestingly, the WT has a spacer (CRISPR1_14, uniquely identified as EF434469_1_14 in the text output, Fig. S3) with just one mismatch (protospacer +7) to bases 31869–31897 of ϕ2972. Additionally, the 5′ region of the target differs by one base from the PAM consensus (WTcCTNN) (Fig. 4). Experimentally, this strain is sensitive to ϕ2972, so the system appears to have a functional requirement for the conserved consensus PAM and/or an exact match near the 5′ end of the protospacer, which corresponds to the 13 nt seed region in type II systems (Table 1). In summary, CRISPRTarget can accurately identify protospacers for crRNAs and display these with details of match/mismatch and PAMs.
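
The PAM comparison in this example can be sketched in Python by expanding the IUPAC ambiguity codes of the consensus (W = A or T, N = any base). This is an illustrative sketch rather than the CRISPRTarget implementation; the function name is hypothetical, and the two bases filling the N positions in the example flanks are arbitrary placeholders, not the ϕ2972 sequence.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_mismatches(pam: str, flank: str) -> list[int]:
    """Return 0-based positions where `flank` violates the IUPAC `pam`."""
    if len(pam) != len(flank):
        raise ValueError("PAM and flank must have the same length")
    return [
        i
        for i, (code, base) in enumerate(zip(pam.upper(), flank.upper()))
        if base not in IUPAC[code]
    ]


CONSENSUS = "WTTCTNN"  # type II-A PAM, 5' of the protospacer (Table 1)

# Hypothetical flanks: only the first five bases matter for this PAM.
print(pam_mismatches(CONSENSUS, "ATTCTGA"))  # consensus-like flank
print(pam_mismatches(CONSENSUS, "ATCCTGA"))  # WTcCTNN-like flank
```

A flank of the WTcCTNN form yields a single mismatch at the third position, which is the one-base deviation from the consensus noted for the WT spacer.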

Figure 3. CRISPRTarget input. Several formats are accepted. The BLASTn parameters for the initial screen are defined at this step. They default to values that favor a gapless match but allow some mismatches. The output may be refined and reordered (Fig. 4) after it is obtained.

Figure 4. Graphical output of CRISPRTarget. The output of a search for the targets of the Streptococcus thermophilus DGCC7710 CRISPR array. The direction of transcription is known; however, both strands are shown in the diagram, as if the direction of transcription were unknown. Two relatively low-scoring matches using these interactive settings are shown (rank 44–45). They have good spacer-protospacer base pairing but lack a WTTCTNN PAM. Match 45 is to a phage to which this strain is sensitive (Φ2972). Yellow indicates spacer/protospacer, blue shows flanking sequences and mismatches between the crRNA and the target DNA protospacer are indicated in red.

Identification of targets for the RNA-targeting Sulfolobus solfataricus type III CRISPR/Cas system

The S. solfataricus P2 CRISPR/Cas system has been well characterized and, recently, the structure of the type III-B ribonucleoprotein Cmr complex was published. This study also demonstrated that crRNAs derived from all six CRISPR arrays are detected in the Cmr complex, which targets RNAs complementary to the crRNA spacer sequences. CRISPRdb lists a total of 255 spacers from seven detected arrays, which belong to the CRISPR-7 (and possibly CRISPR-11) families. Putative protospacers were discovered using CRISPRTarget with the default settings and all predicted S. solfataricus CRISPRs as input. Of the 254 unique spacers used, 517 hits were detected for 57 spacers from five of the seven arrays (471 hits when the E-value was lowered to 0.1). An earlier study identified the targets of 29 spacers. The top hit was a perfect match from spacer 28 in locus A (NC_002754_3_28 in output) to an Acidianus two-tailed virus (AJ888457). The majority of top hits are to Sulfolobus, Stygiolobus and Acidianus viral sequences, but there are examples of plasmid matches (e.g., Sulfolobus pNOB8). One spacer in locus B (spacer 23 from leader end; NC_002754_4_73 in output) accounts for 393 hits, due to a very A-rich sequence. Because no PAM has been identified for Cmr, and because this system targets RNA so that self-DNA cannot be targeted, penalizing flanking matches or searching for PAMs was not required. When analyzing type III-A, rather than III-B, systems, mismatches between the 5′ crRNA handle and the 3′ flank of the protospacer DNA are important for interference and can be scored appropriately. In either case, the ability to view the pairing between the handle and protospacer flanks allows matches to different CRISPR arrays to be easily distinguished.
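
Viewing the pairing between a 5′ handle and the protospacer flank can be sketched as follows. The sketch is illustrative only: the function name is hypothetical, the handle is assumed to be the last 8 nt of the S. epidermidis repeat in Table 1, the foreign flank is invented, and G-U wobble pairs are deliberately ignored.

```python
# Watson-Crick partner (DNA base) for each crRNA base.
# Assumption: G-U wobble pairing is not counted.
WC_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def pairing_track(handle_rna_5to3: str, flank_dna_3to5: str) -> str:
    """Draw '|' where the handle base pairs with the flank base, else '.'.

    The flank is given 3'->5', i.e. as it is drawn beneath the crRNA
    in an R-loop diagram, so bases are compared position by position.
    """
    if len(handle_rna_5to3) != len(flank_dna_3to5):
        raise ValueError("handle and flank must have the same length")
    return "".join(
        "|" if WC_PARTNER.get(r) == d else "."
        for r, d in zip(handle_rna_5to3.upper(), flank_dna_3to5.upper())
    )


handle = "ACGAGAAC"          # assumed 8 nt 5' handle (end of the repeat)
self_flank = "TGCTCTTG"      # repeat-derived flank in the host array
foreign_flank = "GATTACAG"   # hypothetical flank in a plasmid target

print(pairing_track(handle, self_flank))     # fully paired: self-like
print(pairing_track(handle, foreign_flank))  # mostly unpaired: target-like
```

A fully paired track is what would be expected at the CRISPR array itself, whereas a largely unpaired track is consistent with a foreign target in a type III-A system.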

The P. atrosepticum type I-F system targets a prophage in Pectobacterium carotovorum

Members of the genus Pectobacterium are economically important phytopathogens that cause a range of plant diseases. CRISPR/Cas systems in plant pathogens have not been well examined to date (reviewed in ref. 62). Previously, we analyzed the type I-F system of P. atrosepticum SCRI1043 (previously known as Erwinia carotovora subsp atrosepticum), which causes soft-rot and blackleg disease in potato. The cas genes and CRISPRs are transcribed and crRNAs are generated by the Cas6f endoribonuclease. Furthermore, the P. atrosepticum Csy1, Csy2, Csy3 and Cas6f proteins form a complex, which interacts with the Cas2-Cas3 nuclease. Cas1 and the Cas2-Cas3 hybrid protein also interact, suggesting a role in acquisition. The P. atrosepticum SCRI1043 type I-F system contains three CRISPR arrays with a consensus repeat belonging to the CRISPR-4 type (Table 1). These arrays contain 41 spacers, with 28, 10 and three spacers present in CRISPR1, 2 and 3, respectively (Table 2). Our previous analyses using BLAST failed to identify potential viral targets of the 41 spacers. However, spacer 6 in CRISPR2 showed 100% identity to the eca0560 gene in its own genome. To test CRISPRTarget, we searched for potential targets of all spacers. CRISPRFinder output files for each array were searched against ACLAME, GenBank-Environmental, GenBank-Phage, RefSeq-Microbial, RefSeq-Plasmid, RefSeq-Viral and a subset of the CAMERA metagenomic databases in CRISPRTarget (default settings, but -1/1 match/mismatch scores to penalize self matches with the 8 nt handles).

Table 2. Predicted CRISPR arrays in Pectobacterium species

Name§	Type	P. atrosepticum SCRI1043(NC_004547)	P. carotovorum subsp carotovorum PCC21(NC_018525)	P. wasabiae(NC_013421)
CRISPR1	I-F	28	38	17
CRISPR2	I-F	10	3	25
CRISPR3	I-F	3	3
CRISPR4	I-E		14*	16
CRISPR5	I-E		14*	6

§ Names do not indicate CRISPR relationship between strains.

* Likely to be one array of 29 spacers, with a 76 base spacer in the middle.

The CRISPR1 array identified by CRISPRFinder was in the incorrect orientation, so CRISPRTarget was adjusted for a reverse complemented output (e.g., see Fig. 4). CRISPRTarget gave 67 hits from 13/28 spacers from CRISPR1, compared with only two hits when CRISPRFinder was utilized. Selection of the I-F PAM in CRISPRTarget enabled visualization and scoring of targets that contained a consensus CRISPR-4/I-F PAM. Furthermore, the site of crRNA processing by Cas6f in type I-F systems is known, so 8 nt of the 5′ (handle) and 20 nt of the 3′ flanking regions were displayed for the crRNAs. By scoring flanks with penalties (e.g., -1/1 match/mismatch), self-targets can be penalized and moved down the output list. Usually the default cut-off score of 20 eliminates the self-matching results when default 8 nt handles are used (with -1/1 match/mismatch scores), while retaining bona fide targets. Using the same databases and increasing the E-value to 10 increased the number of hits to 406 and resulted in the identification of putative targets for 19 of the 28 spacers. A search with CRISPR1 against the GenBank-nt database with the same settings identified 21 hits for eight spacers when an E-value of 1 was used. When the E-value was increased to 10, 24 spacers gave 85 hits scoring 20 or more, but there were some false positive sequences (eukaryotic). CRISPR1_19 matched a putative phage gene in Pectobacterium carotovorum subsp carotovorum PCC21. Note that we denote spacer 1 as the leader-proximal spacer, but the spacers in the CRISPRTarget output are numbered according to the input file. For example, since CRISPR1 was reversed in the output, spacer 19 of 28 (relative to the leader) is numbered spacer 10 in the CRISPRFinder input file. Comparing P. carotovorum PCC21 and P. atrosepticum SCRI1043 revealed that the spacer 19 target is within a 45 kb prophage containing 54 predicted coding sequences (here designated ΦPCC21_1; Fig. 5A and B). ΦPCC21_1 is inserted in ryeAB, but is absent in P. atrosepticum SCRI1043. The ryeAB genes encode two overlapping small non-coding RNAs. In Salmonella, this locus is an important insertion site for prophages that have influenced this pathogen’s evolution. Interestingly, CRISPR1 spacer 2 also matched ΦPCC21_1, albeit ~32 kb from the spacer 19 target (Fig. 5A). Mismatches in the predicted RNA-DNA hybrid suggest that these spacers might no longer target this particular prophage, but it is also possible that they derived from a related phage. We propose that P. atrosepticum has been exposed to this, or a related, phage in the past, but lysogenization has been inhibited by CRISPR/Cas.
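
The effect of penalizing flank pairing on self-matches can be illustrated with a short Python sketch. How CRISPRTarget combines its component scores is not specified here, so the scheme below is an assumption: the spacer is scored +1 per paired base and -1 per mismatch, each 8 nt flank is scored -1 per paired base and +1 per mismatch, and the sum is compared with the cut-off of 20. The spacer and foreign flanks are hypothetical; the handles are taken from the ends of the P. atrosepticum repeat in Table 1.

```python
# Watson-Crick partner (DNA base) for each crRNA base; G-U wobble ignored.
WC_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}
CUTOFF = 20  # default cut-off score quoted in the text


def count_pairs(rna_5to3: str, dna_3to5: str) -> int:
    """Count position-wise Watson-Crick pairs between RNA and DNA."""
    if len(rna_5to3) != len(dna_3to5):
        raise ValueError("sequences must have the same length")
    return sum(WC_PARTNER.get(r) == d for r, d in zip(rna_5to3, dna_3to5))


def hit_score(spacer: str, protospacer: str, flanks) -> int:
    """Assumed scoring: spacer +1/-1, flanks -1/+1 (match/mismatch)."""
    paired = count_pairs(spacer, protospacer)
    score = paired - (len(spacer) - paired)
    for handle, flank in flanks:
        paired = count_pairs(handle, flank)
        score += (len(handle) - paired) - paired
    return score


spacer = "ACGU" * 8        # hypothetical 32 nt spacer
protospacer = "TGCA" * 8   # perfectly complementary, written 3'->5'

# Self-match: flanks in the array pair fully with both handles.
self_flanks = [("CUUAGAAA", "GAATCTTT"), ("GUUCACUG", "CAAGTGAC")]
# Foreign target: hypothetical flanks with no pairing to the handles.
foreign_flanks = [("CUUAGAAA", "AGGAAGGG"), ("GUUCACUG", "ATTTATTT")]

for name, flanks in [("self", self_flanks), ("foreign", foreign_flanks)]:
    score = hit_score(spacer, protospacer, flanks)
    print(name, score, "kept" if score >= CUTOFF else "filtered")
```

Under these assumptions the self-match scores 32 - 16 = 16 and falls below the cut-off, whereas the foreign target scores 32 + 16 = 48 and is retained, which mirrors the behavior described for the default settings.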

Figure 5. Pectobacterium prophages are targeted by CRISPR/Cas. (A) Prophage ϕPCC21_1 is targeted by spacers in P. atrosepticum. (B) P. atrosepticum SCRI1043 (top, 2761697–2811697) compared with ϕPCC21_1 in P. carotovorum subsp carotovorum PCC21 (bottom, phage coordinates: PCC21_018470–019020 from 2092807–2135244. PCC21 is reversed for clarity). (C) Prophage ϕECA29 is targeted by spacers in P. carotovorum subsp carotovorum PCC21. (D) P. carotovorum subsp carotovorum PCC21 (top, PCC21_017190–017500 from 1936500–1976500. PCC21 is reversed) compared with ϕECA29 (HAI9) in P. atrosepticum SCRI1043 (bottom, ECA2598-ECA2637 from 2935264–2966783). (E) Prophage ϕPC1_1 is targeted by a spacer in P. carotovorum subsp carotovorum PCC21. (F) P. carotovorum subsp carotovorum PCC21 (top, PCC21_027150–027460 from 3058299–3095299) compared with ϕPC1_1 in P. carotovorum subsp carotovorum PC1 (bottom, PC1_2622–2666 from 2989228–3022511). (G) Prophage ϕPCC21_1 is targeted by spacers in P. wasabiae. (H) P. wasabiae WPP163 (top, 2291600–2341600) compared with ϕPCC21_1 in P. carotovorum subsp carotovorum PCC21 (bottom, phage coordinates: PCC21_018470–019020 from 2092807–2135244). (I) Prophage ϕPC1_2 is targeted by spacers in P. wasabiae. (J) P. wasabiae WPP163 (top, 1192372–1236372) compared with ϕPC1_2 in P. carotovorum subsp carotovorum PC1 (bottom, phage coordinates: PC1_3152–3199 from 3573374–3608557. PC1 is reversed). Prophages (K) ϕECA29 and (L) ϕPC1_2 are targeted by P. wasabiae spacers. Genome comparisons were generated using Easyfig; genes are cyan arrows, putative prophage regions are purple and spacer target locations are indicated with asterisks. Homologous regions by BLASTn are shown in shades of gray.

Pectobacterium carotovorum crRNAs match prophages in P. atrosepticum and P. carotovorum

As P. atrosepticum spacers matched a prophage in a related strain, we examined CRISPR targets in other representative Pectobacterium genomes. First, we uploaded the genome of P. carotovorum subsp carotovorum PCC21 into CRISPRFinder and identified five arrays: three CRISPR-4/type I-F arrays containing 38, 3 and 3 spacers and two CRISPR-2/type I-E arrays with 14 spacers each. Two spacers in CRISPR1 (type I-F with 38 spacers) matched different regions of eca2627 in the P. atrosepticum SCRI1043 ΦECA29 prophage (also termed HAI9; Fig. 5C). Comparison of P. carotovorum subsp carotovorum PCC21 and P. atrosepticum SCRI1043 demonstrated the absence of a ΦECA29 prophage in PCC21 (Fig. 5D). Spacer 34 also matched a putative prophage (here designated ΦPC1_1) in P. carotovorum subsp carotovorum PC1 (Fig. 5E and F). The two type I-E arrays are separated by 76 bp, so it is possible that these are one large array with 29 spacers. Spacer 8 within CRISPR4 was self-matching to its own ΦPCC21_1 prophage, but this is expected to be non-targeting due to a seed mutation at position 2. Spacer 3 in CRISPR4 matches a transposase gene in Pectobacterium wasabiae WPP163 (Pecwa_0911), which is not predicted to be part of an island.
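
The reasoning about the position 2 seed mutation can be made explicit with a small Python sketch. It assumes the type I-E seed positions listed in Table 1 (1–5 and 7–8, counted from the PAM-proximal end of the protospacer) and treats any mismatch within the seed as abolishing targeting; the function name is hypothetical and the rule is a simplification for illustration.

```python
# Type I-E seed positions from Table 1 (1-based, PAM-proximal end first).
TYPE_I_E_SEED = frozenset({1, 2, 3, 4, 5, 7, 8})


def seed_intact(mismatch_positions, seed=TYPE_I_E_SEED) -> bool:
    """True if none of the 1-based mismatch positions fall in the seed."""
    return seed.isdisjoint(mismatch_positions)


# A single mismatch at position 2 disrupts the seed, so the
# self-matching spacer would be predicted to be non-targeting.
print(seed_intact({2}))

# Mismatches at position 6 (outside the seed) or far from the PAM
# leave the seed intact under this simplified rule.
print(seed_intact({6, 20}))
```

Filtering on seed integrity in this way is one reason a perfect or near-perfect spacer match to the host genome need not imply autoimmunity.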

P. wasabiae CRISPRs have targets against multiple prophages

Next, the CRISPRs of P. wasabiae WPP163 were analyzed. P. wasabiae has four CRISPRs: two CRISPR-4/type I-F arrays with 17 and 25 spacers and two CRISPR-2/type I-E arrays containing 16 and six spacers (Table 2). Spacers 2 and 10 from CRISPR1 (I-F array with 17 spacers) match ΦPCC21_1 (Fig. 5G and H), which is also targeted by the P. atrosepticum type I-F system (Fig. 5A and B). ΦPCC21_1 is absent in P. wasabiae, but in this location are Pecwa_2124 (a pseudogene homologous to the ΦPCC21_1 integrase) and Pecwa_2125-9. Remarkably, spacers 3, 4, 5 and 6 from CRISPR2 (I-F array with 25 spacers) targeted genes PC1_3175, PC1_3187, PC1_3191 and PC1_3182, respectively, in a putative prophage in P. carotovorum subsp carotovorum PC1 (here designated ΦPC1_2) that is absent in P. wasabiae (Fig. 5I and J). In addition, spacer 5 matches the P2-type tail fiber protein H gene, eca2608, in ΦECA29 (Fig. 5K) and spacer 20 targeted ΦPC1_1 (Fig. 5F and L), which is also absent in P. wasabiae (data not shown). Therefore, P. wasabiae appears to have previously encountered phages similar to ΦPCC21_1, ΦECA29, ΦPC1_1 and ΦPC1_2, and has developed CRISPR/Cas immunity to these elements. Overall, this analysis indicates that CRISPRTarget can reveal new targets of spacers in CRISPR arrays and demonstrates, with the example of Pectobacterium, that novel biologically relevant information can be obtained. Specifically, the results suggest inter-species prophage exclusion by Pectobacterium type I CRISPR/Cas systems.

Discussion

We have developed a tool designed to detect, and interactively explore, the targets of CRISPR RNA spacers. This is the first tool designed for this purpose. The inputs into CRISPRTarget are predicted CRISPR arrays or spacer sequences. The CRISPR and spacer prediction methods were initially developed in 2007–2009 and, thus, do not incorporate recent refinements. Current CRISPR predictions do not take into account the direction of CRISPR transcription or the errors that can occur when defining spacer and repeat boundaries. CRISPRTarget enables the user to search for matches in either or both orientations of a given input and display adjacent PAM and flanking sequences. These features provide the flexibility to discover targets with PAMs and also any adjacent pairing potential, ensuring greater power in predicting biologically relevant protospacer targets. The initial screen for database matches in CRISPRTarget is done by BLASTn, with a range of parameters able to be defined. The defaults chosen penalize gaps with -10. We know of no publications that indicate that insertions/deletions are permitted in the RNA/DNA hybrid, although in some systems, mismatches are tolerated. The use of BLASTn allows for a smaller exact seed match (word size 7) than MegaBLAST (minimum word size of 28). However, BLASTn is slower. Specific databases are provided; the use of databases of mobile elements (e.g., phage, plasmid, ACLAME) reduces the execution time and increases the number of biologically relevant positives. Because the E-value scales with database size, hits that would have high expect (E) values (e.g., > 1) in larger databases fall below the same E-value threshold in a smaller database. Not using the “nt” database as the default also avoids displaying high-scoring self-matches in the source or related genomes. Selected parts of the CAMERA databases, enriched in phage sequences, are provided, and the user can upload custom data (e.g., new genomic or metagenomic data) for searching.
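
The dependence of the E-value on database size can be illustrated with the Karlin-Altschul relation, E = K·m·n·exp(−λS), where m is the query length, n the database length and S the raw alignment score. The Python sketch below is illustrative only: the values of K and λ are placeholders rather than the parameters used by BLASTn, the database sizes are invented, and edge-effect corrections to m and n are ignored.

```python
import math


def expect_value(raw_score: float, query_len: int, db_len: float,
                 lam: float = 1.0, k: float = 0.5) -> float:
    """Karlin-Altschul expect value; lam and k are placeholder values."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


query_len = 32   # a typical spacer length
raw_score = 24   # the same alignment score in both searches

large_db = 1e10  # hypothetical size of a general nucleotide database
small_db = 1e7   # hypothetical size of a phage-only database

e_large = expect_value(raw_score, query_len, large_db)
e_small = expect_value(raw_score, query_len, small_db)

# The same hit has an E-value 1000-fold lower in the smaller database.
print(f"large database: E = {e_large:.3g}")
print(f"small database: E = {e_small:.3g}")
print(f"ratio: {e_large / e_small:.0f}")
```

Because E is linear in n, restricting the search to mobile-element databases lowers the E-value of every hit by the ratio of the database sizes, whatever the true values of K and λ.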
Following the initial BLAST screen, the user can interactively refine and reduce the putative targets shown. Some systems require PAMs or seed sequences. These can be weighted so that only targets with these features are displayed. In the case of S. thermophilus DGCC7710 WT spacer 14, there is a one-base mismatch to ϕ2972 and a T to C substitution in the PAM. The consensus PAM for this S. thermophilus type II system is WTTCTNN (or NNAGAAW on the other strand). This T was conserved in experimentally confirmed protospacers. Recent reports have demonstrated that pre-existing spacers that match a target imperfectly, with subtle mutations that abolish interference, still increase the acquisition of new spacers in a process termed priming. It is tempting to speculate that this spacer might increase the spacer acquisition activity of this CRISPR array against ϕ2972 and related phages. The ability to detect potential targets for the type III-B system of S. solfataricus P2 was also demonstrated and resulted in putative targets for ~20% of the > 250 spacers. Most of these were matches to archaeal viruses and plasmids, demonstrating potentially relevant crRNA targets. To demonstrate the utility and functionality of CRISPRTarget, we investigated possible protospacer targets in Pectobacterium species. This analysis suggested a history of prophage exposure that is reflected in CRISPR content, indicative of adaptive immunity against prophages. In other words, the presence of CRISPR arrays containing spacers matching prophages in other Pectobacterium genomes correlated with the absence of these mobile elements. The current role, if any, of these prophages is not clear. However, in the case of ΦECA29 in P. atrosepticum SCRI1043, this prophage was shown to excise from the chromosome and circularize.
Furthermore, deletion of this entire prophage led to a reduction in motility and phytopathogenicity and, hence, CRISPR/Cas might limit the acquisition or retention of prophage-encoded virulence determinants. In our study, the detection of protospacer targets also led to the identification of new putative prophages (ΦPCC21_1, ΦPC1_1 and ΦPC1_2) in recently sequenced genomes. Thus, these CRISPRTarget hits increase confidence in the prediction of mobile regions of bacterial genomes, which are often poorly annotated. Pectobacterium strains PCC21 and WPP163 also contained spacers that matched phage ZF40 (JQ177065), a “dwarf” Myoviridae, suggesting previous exposure to this, or a related, temperate phage. Given the phage and prophage interactions detected, it is of interest that strains WPP163, PCC21 and SCRI1043 were isolated from the USA, Korea and Scotland, respectively, over 20 years apart. In conclusion, we have developed and tested CRISPRTarget, a flexible, interactive tool for the discovery of the targets of crRNAs in diverse databases. There is currently no comparable web server available and, thus, CRISPRTarget will provide a valuable resource for the growing CRISPR research community.

Materials and Methods

Target databases

Selected databases are provided in CRISPRTarget.

GenBank databases: BLAST nucleotide databases. (1) The nr/nt collection, ~43 billion bases (15/10/2012, GenBank 192). This database contains “All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS or phase 0, 1 or 2 HTGS sequences).” (2) env_nt, 8.5 billion bases (15/10/2012). This contains “Sequences from environmental samples, such as uncultured bacterial samples isolated from soil or marine samples. The largest single source is Sargasso Sea project. This does not overlap with nucleotide nr.” This is part of the whole genome shotgun (wgs) division, but these sequences have no taxonomic classification other than metagenome. (3) Phage division (phg). This is one of the smallest GenBank divisions, containing 6,800 sequences totalling 88 million bases.

RefSeq databases: Several relevant divisions of the NCBI Reference Sequence databases are available, which contain better annotated (by NCBI) versions of GenBank sequences. (1) RefSeq-Plasmid. 3,707 sequences, 282 million bases. (2) RefSeq-Viral. 4,279 sequences, 95 million bases. (3) RefSeq-Microbial. 5,234 complete microbial genomes, 7 billion bases.

We also included parts of the CAMERA databases. 913,9883 sequences, 1 billion bases. ACLAME. 125,190 sequences, 96 million bases. (4) User defined. Users can upload sequences of up to 50 Mb.

CRISPR array sequences

CRISPR arrays were taken from published studies or CRISPRdb. They were also predicted with CRISPRFinder, PILER-CR or CRT using the default parameters. The current prediction tools have some limitations, notably the lack of prediction of the transcribed strand, the imprecise definition of the DR/spacer junctions and the splitting of a single array into several sub-arrays.

Algorithm

Input data

Spacer sequences are extracted from the input CRISPR arrays using the locations specified and converted to FASTA format. Alternatively, spacer sequences can be uploaded directly, without repeat sequences; however, this limits subsequent processing.
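As an illustration of this input step, the following is a minimal sketch of spacer extraction, not the CRISPRTarget implementation. It assumes that spacer locations are already available as 1-based, inclusive (start, end) pairs; the function name, the FASTA header layout and the toy sequences are hypothetical, and parsing of CRISPRFinder, PILER-CR or CRT output is not shown.

```python
def extract_spacers(sequence, spacer_coords, array_id="array1"):
    """Return FASTA text for the spacers of one CRISPR array.

    spacer_coords holds 1-based, inclusive (start, end) pairs (an assumed layout).
    """
    records = []
    for number, (start, end) in enumerate(spacer_coords, start=1):
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"spacer {number} lies outside the sequence: {start}-{end}")
        spacer = sequence[start - 1:end]  # convert to 0-based, end-exclusive slicing
        records.append(f">{array_id}_spacer{number} {start}-{end}\n{spacer}\n")
    return "".join(records)


# Toy array: repeat, spacer 1, repeat, spacer 2, repeat (all sequences invented).
REPEAT = "GTTTTAGA"
toy_sequence = "CC" + REPEAT + "ACGTACGTAA" + REPEAT + "TTGACCGGTA" + REPEAT + "GG"
fasta = extract_spacers(toy_sequence, [(11, 20), (29, 38)])
print(fasta, end="")
```

Keeping the coordinates in the FASTA header is one way to retain the link between each spacer and its flanking repeats, which is the information that is lost when spacers are uploaded without the array.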

BLAST screen

Each spacer sequence is used to query the selected databases. Multiple databases can be selected, except where there are identical accession identifiers (nt + phg). The default values used by NCBI BLASTn for short sequences (< 30 bases; defaults for long sequences are in brackets) are: gap open -5 (-5), gap extend -2 (-2), match +1 (+1), mismatch -3 (-3), word size 7 (11), expect (E) 1,000 (10), filter no (yes). The initial CRISPRTarget defaults are the same, except that a gap is penalized more heavily (-10), the mismatch penalty is -1 and the E filter is 1. In addition, there is no filter or masking for low complexity. The CRISPRTarget BLASTn parameters favor gapless matches but allow a number of mismatches at this screening stage. BLAST calculates the score over the length of the local match, and only this match is shown. For example, a spacer of 32 bases that matches a target in 17 of 20 bases would score 17 - 3 = 14 (17 matches at +1 and 3 mismatches at -1), and 20 bases would be output. A match is more likely to pass the E filter if smaller databases are used (e.g., the default phg and plasmid). The hits are converted into GFF format.
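The effect of database size on the E filter can be illustrated with a short calculation. This is a sketch under stated assumptions rather than a description of BLAST internals: it uses the standard Karlin-Altschul relationship, in which the expect value of a match with a given score is proportional to the size of the search space, and it ignores effective-length corrections. The function name and the example E-value of 100 are hypothetical; the database sizes are those listed under Target databases.

```python
def rescale_evalue(evalue, from_db_bases, to_db_bases):
    """Approximate the E-value the same match would have in another database.

    Assumes E is proportional to database size (E = K * m * n * exp(-lambda * S)),
    with the query length m and the score S unchanged.
    """
    if from_db_bases <= 0 or to_db_bases <= 0:
        raise ValueError("database sizes must be positive")
    return evalue * to_db_bases / from_db_bases


NT_BASES = 43e9    # nr/nt collection, ~43 billion bases
PHG_BASES = 88e6   # GenBank phage division, 88 million bases
E_FILTER = 1.0     # initial CRISPRTarget E filter

e_in_nt = 100.0    # hypothetical hit that fails the filter when searching nt
e_in_phg = rescale_evalue(e_in_nt, NT_BASES, PHG_BASES)

print(f"E in nt:  {e_in_nt:g}  passes filter: {e_in_nt <= E_FILTER}")
print(f"E in phg: {e_in_phg:.3g}  passes filter: {e_in_phg <= E_FILTER}")
```

Under these assumptions, the same alignment is roughly 500-fold more significant in the phage division than in nt, which is why a short, mismatched spacer hit can be retained in one search and discarded in the other.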

Extension of the BLAST match

The full spacer and handles are extracted from the input sequences. In the case of CRISPRFinder input, only a single repeat is in the input and this is used for all spacer handles. Both CRT and PILER-CR outputs enable small differences in the repeat to be used. If the user wishes to extract more sequence than provided in the array files, e.g., the sequence following the final repeat, this can be extracted from a FASTA file (if provided by the user). Extension of the spacer is not possible if only spacer sequences are in the input. The protospacer target is extended by extracting the user-specified length of sequence from the BLAST database.
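The protospacer extension can be sketched as follows. This is an illustrative example only: it assumes an ungapped, plus-strand hit described by 1-based, inclusive query and subject coordinates (as in BLAST tabular output), and the function name, the default flank length and the toy sequences are hypothetical. A minus-strand hit could be handled by reverse complementing the subject sequence and its coordinates first, which is not shown.

```python
def extend_protospacer(subject, q_start, q_end, s_start, s_end, spacer_len, flank=8):
    """Extend a partial BLAST match to the full spacer length and add flanks.

    Coordinates are 1-based and inclusive. Returns (5' flank, protospacer, 3' flank).
    """
    if (q_end - q_start) != (s_end - s_start):
        raise ValueError("only ungapped matches are supported in this sketch")
    left_missing = q_start - 1            # spacer bases before the matched region
    right_missing = spacer_len - q_end    # spacer bases after the matched region
    proto_start = s_start - left_missing
    proto_end = s_end + right_missing
    if proto_start < 1 or proto_end > len(subject):
        raise ValueError("full-length protospacer runs off the subject sequence")
    flank5 = subject[max(0, proto_start - 1 - flank):proto_start - 1]
    protospacer = subject[proto_start - 1:proto_end]
    flank3 = subject[proto_end:proto_end + flank]  # may be shorter near the end
    return flank5, protospacer, flank3


# Toy subject: 2 nt, 8 nt flank, 20 nt protospacer, 8 nt flank, 2 nt (invented).
toy_subject = "GG" + "TTTTGGGG" + "ACGTACGTACGTACGTACGT" + "CCCCAAAA" + "TT"
# BLAST reported only spacer positions 4-18, aligned to subject positions 14-28.
flank5, protospacer, flank3 = extend_protospacer(toy_subject, 4, 18, 14, 28, spacer_len=20)
print(flank5, protospacer, flank3)
```

Extending by the unaligned spacer bases on each side recovers the positions that BLAST trimmed from the local match, so that the whole spacer and its flanks can be compared in the subsequent scoring step.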

CRISPRTarget interactive scoring

All putative spacer/protospacer targets passing the BLAST screen are displayed in an interactive manner. An initial score is calculated by scoring matches (+1) and mismatches (-1) across the whole length of the spacer without gaps. Specific user-defined “seed” regions can be required to match at either or both ends of the protospacer. A match to pre-defined, or novel user-defined, PAM sequences can increase the score. In order to penalize self-matches that would match 100% in both spacers and flanking handles (e.g., to the original genomic array sequence), a score can be used that penalizes matches (e.g., -1) in the flanking handles. Mismatch penalties can also be used to identify targeting that is facilitated by mismatches in the handles (e.g., type III-A). Finally, a cutoff score can be applied to display only those matches with the best scores.
