Literature DB >> 27577199

Comparative genomic analysis identifies structural features of CRISPR-Cas systems in Riemerella anatipestifer.

De-Kang Zhu^1,2, Xue-Qin Yang^1,2, Yang He^1,2, Wang-Shu Zhou^1,2, Xiao-Heng Song^1,2, Jiang-Bo Wang^1,2, Yu Zhang^1,2, Ma-Feng Liu^2,3, Ming-Shu Wang^2,3, Ren-Yong Jia^1,2, Shun Chen^2,3, Kun-Feng Sun^2,3, Qiao Yang^2,3, Ying Wu^2,3, Xiao-Yue Chen^1,2, An-Chun Cheng^4,5.

Abstract

BACKGROUND: Riemerella anatipestifer infection is a contagious disease that has resulted in major economic losses in the duck industry worldwide. This study attempted to characterize CRISPR-Cas systems in the disease-causing agent, Riemerella anatipestifer (R. anatipestifer). The CRISPR-Cas system provides adaptive immunity against foreign genetic elements in prokaryotes and CRISPR-cas loci extensively exist in the genomes of archaea and bacteria. However, the structure characteristics of R. anatipestifer CRISPR-Cas systems remains to be elucidated due to the limited availability of genomic data.
RESULTS: To identify the structure and components associated with CRISPR-Cas systems in R. anatipestifer, we performed comparative genomic analysis of CRISPR-Cas systems in 25 R. anatipestifer strains using high-throughput sequencing. The results showed that most of the R. anatipestifer strains (20/25) that were analyzed have two CRISPR loci (CRISPR1 and CRISPR2). CRISPR1 was shown to be flanked on one side by cas genes, while CRISPR2 was designated as an orphan. The other analyzed strains harbored only one locus, either CRISPR1 or CRISPR2. The length and content of consensus direct repeat sequences, as well as the length of spacer sequences associated with the two loci, differed from each other. Only three cas genes (cas1, cas2 and cas9) were located upstream of CRISPR1. CRISPR1 was also shown to be flanked by a 107 bp-long putative leader sequence and a 16 nt-long anti-repeat sequence. Combined with analysis of spacer organization similarity and phylogenetic tree of the R. anatipestifer strains, CRISPR arrays can be divided into different subgroups. The diversity of spacer organization was observed in the same subgroup. In general, spacer organization in CRISPR1 was more divergent than that in CRISPR2. Additionally, only 8 % of spacers (13/153) were homologous with phage or plasmid sequences. The cas operon flanking CRISPR1 was observed to be relatively conserved based on multiple sequence alignments of Cas amino acid sequences. The phylogenetic analysis associated with Cas9 showed Cas9 sequence from R. anatipestifer was closely related to that of Bacteroides fragilis and formed part of the subtype II-C subcluster.
CONCLUSIONS: Our data revealed for the first time the structural features of R. anatipestifer CRISPR-Cas systems. The illumination of structural features of CRISPR-Cas system may assist in studying the specific mechanism associated with CRISPR-mediated adaptive immunity and other biological functions in R. anatipestifer.

Entities: Chemical Disease Gene Mutation Species

Keywords: CRISPR-Cas system; Riemerella anatipestifer; cas gene; phylogenetic analysis; repeat sequence; spacer sequence

Mesh：

Year: 2016 PMID： 27577199 PMCID： PMC5006608 DOI： 10.1186/s12864-016-3040-4

Source DB: PubMed Journal: BMC Genomics ISSN： 1471-2164 Impact factor: 3.969

Background

Riemerella anatipestifer (R. anatipestifer) is a Gram-negative bacterium belonging to the Flavobacteriaceae family. This bacterium is the causative agent of R. anatipestifer infection, which result in septicemic disease in ducks, geese, turkeys, and other birds. The associated disease represents a major problem for the duck industry worldwide, and is the cause of significant economic losses [1]. Clustered regularly interspaced short palindromic repeats (CRISPR) and the CRISPR-associated (Cas) proteins constitute the CRISPR-Cas system. This system provides a novel adaptive immunologic mechanism against exogenous nucleic acid invasion in archaea and bacteria [2-4]. The CRISPR-cas locus is found to be widespread in prokaryotes, being present in ~50 % of sequenced bacterial genomes and ~87 % of sequenced archaea genomes [5, 6]. The CRISPR-cas locus consists of a CRISPR array and a set of cas genes [2-4]. Moreover, a leader sequence is involved occasionally [7-11]. The CRISPR array is composed of direct repeat sequences (repeats) and spacer sequences (spacers) that are derived from phages, plasmids or other mobile genetic elements [2-4]. The corresponding sequences in foreign genetic elements are called protospacers [2–4, 12, 13]. The leader sequence is AT-rich, and is located upstream of CRISPR array [7-9]. This sequence is considered the promoter for the CRISPR locus in some strains [9-11]. The immunologic mechanism employed by the CRISPR-Cas system commonly consists of three steps. In the acquisition step, a fragment of exogenous genetic elements integrates into the CRISPR array, generating a new repeat-spacer unit [14-17]. During the expression step, the CRISPR array is transcribed and processed into CRISPR RNA (crRNA), which is bound to a single Cas protein or to a multi-subunit complex comprised of multiple Cas proteins [18-20]. Then the crRNA guides the Cas protein(s) to target and cleave the protospacer of cognate exogenous nucleic acids in the interference step [21-24]. However, the specific molecular mechanisms vary depending on the system used. Additionally, trans-encoded CRISPR RNAs (tracrRNA) have been identified and are considered an additional factor of type II systems [20, 25]. The tracrRNAs are encoded by sequences found close to the cas operon and CRISPR array and are distinguished by the presence of an anti-repeat sequence complement with cognate repeats [20, 25]. During interference, in addition to complementarity between protospacer and the spacer portion of crRNA, a defined sequence (known as protospacer adjacent motif or PAM) musts flank on one side of the protospacer to avoid self-targeting. Moreover, PAM is involved in spacer acquisition [26-29]. Nowadays, the CRISPR-Cas systems can be divided into two classes, five types and 16 subtypes [30]. Class 1 systems, with multi-subunit effector complexes made up of multiple Cas proteins, include type I, type III and putative type IV. Class 2 systems, with a single Cas protein effector, encompass type II and putative type V. Based on distinct Cas protein composition and architecture of cas operon, the five main types can be further classified into sixteen subtypes. Each of the five types has its own signature cas gene, which is cas3 (or its variant cas3’), cas9, cas10, csf1 and cpf1 respectively [30]. The latest research has discovered that there are three novel Class 2 systems, namely, two subtypes of putative type V (characterized by C2c1 and C2c3 effector proteins respectively) and putative type VI (C2c2 is its effector protein) [31]. Each grouping harnesses specific molecular mechanism although these systems share main functional modules and play a role in immunity [30-32]. To date, because of limitations pertaining to genomic sequence data for R. anatipestifer, structural analyses of R. anatipestifer CRISPR-Cas systems have not been performed. In this study, we performed sequence analysis of the CRISPR-Cas systems from 25 different strains, including 18 clinical strains that had previously been isolated by our research group.

Methods

R. anatipestifer strains

Eighteen clinical strains were isolated from the livers of infected ducks from seven different provinces in China. The information of bacterial strains utilized is listed in Additional file 1: Table S1. Six complete genomes of R. anatipestifer strains (DSM 15868, CP002346.1; RA-GD, CP002562.1; CH3, CP006649.1; strain 153, CP007504.1; Yb2, CP007204.1; and strain 17, CP007503.1) were retrieved from National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/genome) [33]. The accession numbers of these complete genomes were listed in Additional file 1: Table S1.

Genomic DNA extraction and sequencing

Genomic DNA was extracted with a TIANamp Bacteria DNA Kit (Tiangen Biotech Co, Ltd., Beijing, China). Then the sequences of these strains analyzed were determined using a high-throughput sequencing platform (Illumina HiSeq 2500) with an average genome coverage of 100X. Velvet version 1.2.09 [34] was used for de novo assembly. NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (http://www.ncbi.nlm.nih.gov/genome/annotation_prok) was utilized to facilitate genome annotation. All of the genome sequences were submitted to NCBI, with the most relevant information listed in Additional file 1: Table S1.

Phylogenic analysis of R. anatipestifer strains

A phylogenetic tree (Neighbor Joining Tree) of the nineteen sequenced genomes and six NCBI deposited complete genomes was constructed using a pangenome analysis tool-BPGA [35] based on pangenome of R. anatipestifer. This analysis utilized all default parameters.

CRISPR-cas locus sequence analysis and prediction

The information pertaining to the CRISPR locus including position, length and content were acquired from CRISPRfinder (http://crispr.i2bc.paris-saclay.fr/) [36] and CRISPRI (http://crispi.genouest.org/) [37]. Loci with less than five repeat numbers were termed “questionable CRISPR loci”, and those which were not termed “confirmed CRISPR loci”, the former were not used as part of this analysis. Additionally, each of the loci obtained was manually checked as previously reported. Briefly, CRISPR loci located in coding regions should be discarded. Likewise, the loci which possesses repeats larger than 48 nt ought to be abandoned. The online tool Weblogo (http://weblogo.berkeley.edu/logo.cgi) [38] was used to analyze the conservation of the associated repeats, and secondary structure prediction of the repeats was performed using Mfold (http://unafold.rna.albany.edu/?q=mfold/RNA-Folding-Form) [39]. The spacers for each locus were manually identified and compared. CRISPRTarget (http://bioanalysis.otago.ac.nz/CRISPRTarget/crispr_analysis.html) [40] was utilized to predict the presence of possible protospacers. Protospacers were identified if the associated sequences contained four or fewer SNPs (≥87 % identity or a minimum of 26/30 matching nucleotides) between spacers and phage or plasmid sequences. We aligned the protospacers with the flanking sequences comprising 20 bp on each side to find PAM by using Weblogo. The non-coding sequences upstream of the first repeat were selected as the putative leader sequences and aligned using ClustalX software (version 1.83) [41]. Subsequently, the putative leader sequences were screened for promoters using the BDGP Neural Network Promoter Prediction tool (http://www.fruitfly.org/seq_tools/promoter.html) [42]. The 10 kb-long sequences flanking the CRISPR array were extracted and analyzed using the BLAST program (http://blast.ncbi.nlm.nih.gov/Blast.cgi) [43] to identify cas genes. The composition of cas genes and Cas proteins in R. anatipestifer was analyzed using SnapGene software (version 2.3.2, from GSL Biotech; available at snapgene.com). Cas protein sequences of other bacteria were retrieved from the NCBI protein database. MEGA software (version 6) [44] was utilized to conduct the phylogenetic analysis of Cas proteins, and the phylogenetic tree associated with Cas proteins was constructed by neighbor-joining method. The default parameter values were selected when computed.

Results

Architecture of CRISPR-cas loci in R. anatipestifer strains

In the 25 strains that were analyzed, 45 CRISPR loci were identified. These loci were divided into two groups, CRISPR1 and CRISPR2. CRISPR1 was flanked by cas genes which encode for Cas proteins. However, no cas genes were in the vicinity of CRISPR2. Twenty-four and 21 different arrays were observed for CRISPR1 and CRISPR2, respectively. Twenty of the analyzed strains contained both loci (CRISPR1 and CRISPR2), while the other strains contained only one locus, ie, CRISPR1 or CRISPR2 (RA-CH-1, CH3, RCAD0131 and RCAD0111 did not contain CRISPR2, and RCAD0133 harbored only CRISPR2) (Additional file 1: Table S2, Fig. 1). The smallest CRISPR1 locus observed in a single strain (RCAD0111) encompassed four spacers and five direct repeats, while the largest CRISPR1 locus occurred in RCAD0188 and contained 24 spacers and 25 repeats. On average, CRISPR1 locus was comprised of 14 spacers and 15 repeats. The smallest CRISPR2 locus contained eight spacers and nine repeats (Yb2), while the largest CRISPR2 locus, which was observed in five strains (RA-GD, DSM 15868, ATCC 11845, RCAD0122 and RCAD0124), contained 18 spacers and 19 repeats. CRISPR2 locus consisted of 15 spacers and 16 repeats averagely. The CRISPR locus information is displayed in Additional file 1: Table S2. The distance between CRISPR1 and CRISPR2 varied from strain to strain, approximately from 200 to 800 kb. There were only three cas genes (cas1, cas2 and cas9) located upstream of the CRISPR1 locus (Fig. 1). The region between the cas2 gene and the first direct repeat was a 107 bp-long non-coding sequence and was defined as the putative leader sequence of CRISPR1 (Fig. 1). We found a 16 nt-long anti-repeat sequence, ‘TGAAAGCAATTCACAA’, which was located upstream of cas9 gene in all CRISPR1-containg strains and was considered to be a part of the sequence that encodes for a putative tracrRNA (Fig. 1).

Fig. 1

Graphic representation of CRISPR-cas loci in R. anatipestifer strains. The black blocks represent CRISPR loci. AT-rich leader sequences (light grey blocks) are located upstream of CRISPR1. There are three cas genes (cas1, cas2 and cas9) which are closed to CRISPR1, shown as white boxed arrows. Upstream of cas9, there is a 16 nt-long anti-repeat sequence, which might be considered as a portion of the sequence that encodes for a putative tracrRNA (dark grey block). The distance between CRISPR1 and CRISPR2 is long and varies from strain to strain, thus the sequence between the two loci is omitted and is represented by a double-slash

The prediction of leader sequences

According to previous reports, the leader sequence located upstream of the first CRISPR repeat is rich in A/T bases and is up to 650 bp long [7-9]. In addition, the leader sequence lacks an ORF and contains a palindromic sequence. The leader sequence has also been shown to be conserved in the same species, and is a probable recognition sequence for the insertion of new spacers [7, 8]. It may also act as a promoter for CRISPR locus [9-11]. We defined the 107 bp-long non-coding sequence between the cas2 gene and the first repeat as the putative leader sequence of CRISPR1. All of the predicted leader sequences were compared using ClustalX (Fig. 2). The putative leader sequences showed significant similarities between each of the analyzed strains, apart from RA-CH-1, CH3, RCAD0111, and RCAD0121. The latter strains harbored 19 SNPs in each of the putative leaders, and the SNPs that were hosted in these strains were identical. The putative leaders contained an obviously long palindromic sequence, ‘AAACCCATTGCAATGGGTTT’, separated by 5 bp (Fig. 2). The GC content of the putative leaders was 25 %, while that of R. anatipestifer genomes was observed to be 35 % (on average). Several promotors were predicted, and the promoter that was predicted to have the highest score (1.00, the range was 0–1) was located between 17 and 62 bp in the 107 bp putative leader sequence.

Fig. 2

Multiple sequence alignment of putative leader sequences and the first repeats. The putative leader sequence (107 bp) and the first repeat (47 bp) of each CRISPR1 locus in analyzed strains were selected for multiple sequence alignment. The nucleotide sequences of the first repeats are framed by black box. The grey shadow and arrow indicate the palindromic sequence in leaders. The asterisk represents the conserved base in all of the sequences for alignment

Characteristics of direct repeat sequences

It has previously been reported that the length of conserved repeats varies from 21 to 48 nt [5]. These sequences have also been shown to contain palindromes. The 5’ terminal portion of a repeat is normally composed of the sequence GTTT (G) and the 3’ terminus contains the sequence GAAA (C/G) [45, 46]. Generally, repeats associated with the type II system are weakly palindromic, and are typically 36 nt in length [25, 47]. The two CRISPR loci in R. anatipestifer strains had different consensus direct repeat sequences and varied with respect to both length and content (Additional file 1: Table S2, Fig. 3). The length of the CRISPR1 consensus direct repeats was 47 nt, while the length of the CRISPR2 repeats was 36 nt. The consensus direct repeats associated with CRISPR1 contained a conserved 5’-GTTG terminus and a 3’-GCAAC terminus. A conserved 5’-GTTG terminus and a 3’-CCAAC terminus were present in CRISPR2. The direct repeat variants (DRVs) were observed as part of this analysis. Two cases of DRVs (ie T23C and G39A) occurred in the consensus direct repeats of CRISPR1. Both variants were observed in the CRISPR1 repeats of Yb2, RA-GD, and RCAD0147. Similarly, A20G and C27T DRVs were observed in CRISPR2. A20G appeared in the CRISPR2 repeats of strain 153, strain 17, RA-CH-2, RCAD0122, RCAD0125, RCAD0134, RCAD0188, RCAD0124, and RCAD0183, while C27T occurred in RCAD0133, RCAD0121, and RCAD0181.

Fig. 3

Features of repeats in R. anatipestifer CRISPR loci. a and d show sequence logo for consensus direct repeats associated with CRISPR1 in 20 R. anatipestifer strains and sequence logo for consensus direct repeats associated with CRISPR2 in five R. anatipestifer strains, respectively. The height of the letters indicates the relative frequency of the corresponding base at that position. b and c display predicted secondary structure of the CRISPR1 consensus direct repeats. The putative secondary structure of consensus direct repeats containing the T23C and G39A variants is shown in c, e, f and g display predicted secondary structure of the CRISPR2 consensus direct repeats. The putative secondary structures of consensus direct repeats containing the C27T variant and A20G variant are shown in (f) and (g), respectively The predicted secondary structures of the consensus direct repeats of CRISPR1 and CRISPR2 are shown in Fig. 3. A large number of the repeats in CRISPR1 were involved in a long stem-loop structure that was interrupted by a shorter stem-loop. Repeats containing the T23C and G39A variants produced similar secondary structures with only minimal size deviations noted for the two previously mentioned stem-loop structures. The majority of repeats in CRISPR2 had a large loop next to a small loop, while repeats containing the C27T DRV produced a small loop with a long stem only. Repeats in CRISPR2 that had been shown to contain the A20G DRV, showed more complicated stem-loop structures, which were characterized by two stem-loops occurring at a larger loop (containing a short stem).

Spacer organization

The main differences of CRISPR loci occurred due to spacer organization diversity. Spacer sequences derived from foreign nucleic acid sequences (ie bacteriophages and plasmids), have been shown to insert into CRISPR locus in the acquisition step. Moreover, it has been confirmed that new spacers generally integrate into the leader-proximal terminus [7, 48, 49]. Thus, the organization of spacers is likely to be reflective of the degree of evolution associated with CRISPR-Cas immune defense systems. Synthetically considering the similarity of the CRISPR arrays and the phylogeny of the strains (Fig. 4), CRISPR1 arrays can be divided into seven subgroups, and CRISPR2 arrays into five subgroups. And then we analyzed spacer organization in the same subgroup. The CRISPR1 and CRISPR2 loci with repeats eliminated are indicated in Figs. 5 and 6.

Fig. 4

Phylogenetic analysis by BPGA based on pangenome for 25 R. anatipestifer strains

Fig. 5

Spacer organization in CRISPR1. The CRISPR1 arrays from 24 R. anatipestifer strains are graphic represented. In order to analyze conveniently, the repeats have been eliminated and only the spacers are shown. The different subgroups are separated by the dotted lines. All arrays in the same subgroup are aligned manually. The abscissa indicates serial numbers of spacers, and ordinate displays names of analyzed R. anatipestifer strains. The direction of spacers is consistent with serial numbers (ie spacer 1 is located in 5’ terminal). Identical spacers are displayed by the same color, and are aligned such that they have the same number (apart from duplicate spacer). Additionally, unique spacer is displayed by a unique combination of color and number

Fig. 6

Spacer organization in CRISPR2. The CRISPR2 arrays from 21 R. anatipestifer strains are graphic represented as in Fig. 5. The asterisk represents the spacer with large size

Phylogenetic analysis by BPGA based on pangenome for 25 R. anatipestifer strains Spacer organization in CRISPR1. The CRISPR1 arrays from 24 R. anatipestifer strains are graphic represented. In order to analyze conveniently, the repeats have been eliminated and only the spacers are shown. The different subgroups are separated by the dotted lines. All arrays in the same subgroup are aligned manually. The abscissa indicates serial numbers of spacers, and ordinate displays names of analyzed R. anatipestifer strains. The direction of spacers is consistent with serial numbers (ie spacer 1 is located in 5’ terminal). Identical spacers are displayed by the same color, and are aligned such that they have the same number (apart from duplicate spacer). Additionally, unique spacer is displayed by a unique combination of color and number Spacer organization in CRISPR2. The CRISPR2 arrays from 21 R. anatipestifer strains are graphic represented as in Fig. 5. The asterisk represents the spacer with large size Of the CRISPR loci that were analyzed as part of this study, 153 different spacers were observed. On average, the number of spacers observed in a locus was 14 and 15 for CRISPR1 and CRISPR2, respectively. The CRISPR1 spacers were observed to be 29–30 nt in length, while the CRISPR2 spacers were 29–82 nt (Additional file 1: Table S2). Apart from Yb2, the CRISPR2 arrays possessed spacers with extreme sizes occurred in all CRISPR2-containing strains. Interestingly, distinct 3–6 CRISPR2 spacers with large size usually localized in terminal of the array in succession (Fig. 6). Spacer loss occurred in both CRISPR1 and CRISPR2 (Figs. 5 and 6). Moreover, the majority of disparities associated with CRISPR2 arrays resulted from the loss of one or more spacers. Spacer loss was observed to occur in two different formats: loss of two or more successive spacers (eg in CRISPR1 of RCAD0111, RCAD0183 and RCAD0181; in CRISPR2 of RCAD0150 and Yb2), and discontinuous loss (eg in CRISPR1 of RCAD0142, RCAD0123 and RCAD0125; in CRISPR2 of RCAD0122 and RCAD0123). Given the mechanisms that underpin spacer acquisition, the spacers in a CRISPR array differ from each other. However, there is an exception to this occurrence. On occasion, two or more tandem spacers are identical because of duplication events. Duplication of spacers was observed in both CRISPR1 and CRISPR2 (Figs. 5 and 6). Spacer duplication resulted in either a singularly copied unit (CRISPR1 spacer 2 of RCAD0134, strain 153, RCAD0188, RCAD0122, and RCAD0183; CRISPR1 spacer 78 of RCAD0123), or a single spacer duplicated multiple times (CRISPR2 spacer 8 of RCAD0122). As part of this analysis, we observed the presence of unique spacers that only occur in individual strains in the same subgroup. Sixty-three unique spacers were found in the CRISPR loci that were analyzed in total, 60 and 3 for CRISPR1 and CRISPR2, respectively (Figs. 5 and 6). Unique CRISPR2 spacers were present in RCAD0124 (spacer 23), RCAD0150 (spacer 1) and RCAD0142 (spacer 14). Each of these spacers was composed of a unique nucleotide sequence. In CRISPR1, we observed 60 unique spacers, characterized by the presence of either two or more contiguous spacers (eg spacer 45–46 of RCAD0111 and spacer 33–42 of RCAD0121; spacer 56–57 of RA-CH-2, spacer 97–102 of RCAD0188, spacer 58–63 of RCAD0122 and spacer 104–109 of RCAD0183; spacer 47–51 of strain 17; spacer 71–72 of RCAD0142 and spacer 73–76 of RCAD0131; spacer 83–86 of RCAD0123, spacer 89–90 of RCAD0125 and spacer 87–88 of RCAD0181; spacer 17–19 of RCAD0150), or a single sequence (eg spacer 44 of RA-CH-1 and spacer 43 of CH3; spacer 94 of RCAD0134, spacer 93 of strain 153 and spacer 103 of RCAD0124; spacer 53 of Yb2). The majority of the unique spacers were located at the leader-proximal terminus. This would suggest that these spacers were derived following spacer acquisition. Interestingly, the spacer organization of RCAD0133 CRISPR2 was identical to RCAD0181, Additionally, the same case was observed in RCAD0134, RA-CH-2, strain 153, RCAD0188 and RCAD0183; RA-GD, DSM15868 and ATCC11845; strain 17 and RCAD0125 (Fig. 6). Conversely, this phenomenon was rare in CRISPR1, ie, only the spacer organization of DSM15868 CRISPR1 and that of ATCC11845 were identical (Fig. 5). This finding was in agreement with the number of unique spacers of the two loci. Overall, the spacer organization of CRISPR1 was more diverse than that of CRISPR2. Furthermore, the spacer organization was varied from subgroup to subgroup. Combined with analysis of the phylogenetic tree of these strains (Fig. 4), we found the subgroups of CRISPR1 spacer organization could be correlated roughly with the clusters of the phylogenetic tree (Figs. 4 and 5), while this connection was weak in CRISPR2 (Figs. 4 and 6). The subgroups of spacer organization in CRISPR1 was not completely consistent with phylogeny on account of some exceptions as follow: first, from the phylogenetic tree, it was clear to see RCAD0133 was closely related to RCAD0121 which possessed CRISPR1 locus, but it did not harbor the locus; second, the spacer organization was similar in RCAD0125, RCAD0123, RCAD0181 and RCAD0127, but the four strains were distributed in distinct clades; third, RCAD0124, RCAD0122, RCAD0183 and Yb2 were clustered in the phylogenetic tree, while their spacer organization were not grouped in the same subgroup, the same observation was made for strain 17 and RA-GD.

Identification of phage/plasmid protospacers

Given that spacers play a significant role in defense mechanisms that are elicited against cognate foreign nucleic acids (ie bacteriophages and plasmids), we next analyzed whether R. anatipestifer spacers contained homology with phage or plasmid sequences. CRISPRTarget was used to identify putative protospacers [40], and hits that showed ≥87 % identity (≥26/30 nt) were deemed to be significant. As mentioned above, 153 spacers were identified in the 45 loci that were analyzed. Of these, only 8 % (13/153) of the spacer were similar to known phage or plasmid nucleotide sequences. Twelve of these thirteen spacers were located in CRISPR1 (Table 1). The putative protospacers were present in the genomes of Enterobacteria phage phi92, Riemerella phage RAP44 and plasmids of Azospirillum brasilense, Deferribacter desulfuricans and Pantoea sp. At-9b. The sequences that showed homology with Riemerella phage RAP44 usually displayed a perfect match, with more than one match being observed in one locus (eg spacer 80 and 89 of RCAD0125 CRISPR1; spacer 34, 36 and 38 of RCAD0121 CRISPR1; spacer 80 and 88 of RCAD0181 CRISPR1) (Table 1 and Fig. 5). There were two putative protospacers with identical identity for spacer 20. The same observation was made for spacer 48.

Table 1

Identification of phage/plasmid protospacers

Spacer Number	Description of protospacer	Position of protospacer	Similarity
¹6	Enterobacteria phage phi92	146275-146304	87 %
¹16	Riemerella phage RAP44	42519–42548	100 %
¹20	Riemerella phage RAP44/Deferribacter desulfuricans SSM1 plasmid megaplasmid pDF308 DNA	42842–42871/80620–80649	87 %/87 %
¹34	Riemerella phage RAP44	8954–8983	100 %
¹36	Riemerella phage RAP44	27500–27529	90 %
¹38	Riemerella phage RAP44	15935–15964	100 %
¹45	Riemerella phage RAP44	13831–13860	100 %
¹48	Azospirillum brasilense strain Az39 plasmid AbAZ39_p4/Azospirillum brasilense Sp245 plasmid AZOBR_p4	213466–213495/224417–224446	87 %/87 %
¹55	Riemerella phage RAP44	24539–24568	100 %
¹80	Riemerella phage RAP44	28468–28497	97 %
¹88	Riemerella phage RAP44	1300–1329	100 %
¹89	Riemerella phage RAP44	2509–2538	97 %
¹95	Pantoea sp. At-9b plasmid pPAT9B01	317614–317642	93 %
²38	Riemerella phage RAP44	32595–32623	97 %

The superscript “1” represents CRISPR1, and “2” represents CRISPR2. The spacer number is corresponding to that in Figs. 5 and 6

Identification of phage/plasmid protospacers The superscript “1” represents CRISPR1, and “2” represents CRISPR2. The spacer number is corresponding to that in Figs. 5 and 6 We also analyzed whether the putative protospacers displayed conserved PAM sequences. The protospacers with flanking sequence (20 bp on each side) were extracted and aligned. Unfortunately, whether in 5’ terminal or 3’ terminal, there was no obviously conserved motif. The T at 54th position was relatively conserved (Fig. 7).

Fig. 7

Prediction of PAM consensus sequence. The numbers associated with protospacer sequence are framed by red box. The height of the letters indicates the relative frequency of the corresponding base at that position

Characteristics of the cas operon

As previously mentioned, only three cas genes were observed upstream of CRISPR1, and no cas genes were found proximal to CRISPR2. Therefore, this analysis focused on the cas1, cas2, and cas9 genes that adjacent to CRISPR1. Previous reports have published analysis of the cas1 [50] and cas2 [51] genes in the R. anatipestifer reference strain, ATCC 11845. In this study, we performed an extensive analysis on the cas operon of all of the analyzed R. anatipestifer strains. The cas operon was composed of cas9, cas1 and cas2. This was consistent with a type-II system (subtype C) [47] (Fig. 1). There was a 12 bp-long interval between cas9 and cas1, while cas1 and cas2 had an 11 bp-long overlap. The length of the cas1 gene was observed to be identical in all of the strains analyzed. The same observation was made for cas2. On average, the size of predicted ORFs for cas1 genes was 897 bp and the GC content was 36 %. The cas1 gene was predicted to translate a protein composed of 299 amino acids. The predicted ORF for cas2 was 336 bp in length and the GC content was 36 %. The putative protein encoded by cas2 constituted a protein of 112 amino acids. Cas1 and Cas2 are the core proteins of the CRISPR-Cas system, and both are conserved in one species [5]. To determine whether the two Cas proteins are conserved in the R. anatipestifer strains, we performed multiple sequence alignments with the predicted amino acid sequences for Cas1 and Cas2, respectively. The results indicated that Cas2 was identical in the R. anatipestifer strains that were analyzed (Additional file 1: Figure S1). Cas1 proteins can be divided into two groups with identical sequences. The Cas1 of the four strains (RA-CH-1, CH3, RCAD0111 and RCAD0121) were the same, and the rest of the strains had identical amino acid sequence for Cas1. But in general, Cas1 was still remarkably similar in these analyzed strains (Additional file 1: Figure S2). Differences in the cas operon predominantly occurred in the cas9 gene. The majority of cas9 genes were 4200 bp in length, and encoded proteins composed of 1400 amino acids. However, exceptions existed. For example, the cas9 gene of RA-CH-1, CH3, and RCAD0121 was 4215 bp. This gene translated into a protein of 1405 amino acids. Additionally, the RCAD0111 cas9 gene was 4227 bp and the associated translation product was 1409 amino acids sequence. Similar to the observations pertaining to Cas1, diversity in Cas9 predicted amino acid sequences occurred in the four strains, ie RA-CH-1, CH3, RCAD0111 and RCAD0121 (Additional file 1: Figure S3). Although multiple sequence alignments associated with this protein indicated that there was so much variation in the four strains that were mentioned above, intact HNH and RuvC-like domains with conserved catalytic sites were present in all of the analyzed Cas9 protein sequences (Additional file 1: Figure S3). It has been reported that a phylogenetic analysis of Cas9 in 75 representative type II systems is consistent with the classification of type II systems characterized to date [25]. In order to observe the evolutionary relationship of Cas9 protein sequences between R. anatipestifer and other type II system-containing strains and determine the type of R. anatipestifer CRISPR-Cas system, we selected 39 Cas9 amino acid sequences from Gram-negative type II system-containing bacteria and RA-CH-2 for analysis. Next, we constructed a multiple sequence alignment and a phylogenetic tree for Cas9 (Additional file 1: Table S3, Figure S4, and Fig. 8). The phylogenetic tree (Fig. 8) showed that the Cas9 sequence from R. anatipestifer was closely related to that of Bacteroides fragilis NCTC 9343 and formed part of the subtype II-C subcluster.

Fig. 8

Phylogenetic tree for Cas9 from 40 Gram-negative type II system-containing bacteria. Cas9 proteins from 39 Gram-negative type II system-containing bacteria were selected for analysis (see Table S3). A phylogenetic tree based on Cas9 proteins from RA-CH-2 and 39 Gram-negative type II system-containing bacteria was constructed by neighbor-joining method using MEGA software (version 6). The bootstrap values of each node are shown. Different subclusters are enclosed by distinct brackets, and is in accordance with subtypes of type II system. The R. anatipestifer strain, RA-CH-2, is in red

Discussion

In this study, the structural characteristics and components associated with R. anatipestifer CRISPR-Cas systems were analyzed. Based on previous studies, we observed that the CRISPR1-cas locus in R. anatipestifer is highly similar to subtype II-C. Firstly, the CRISPR1-cas locus possessed the minimal number of cas genes required to formulate the cas operon, which is a characteristic of subtype II-C [30]. Secondly, repeats of type II systems are predominantly 36 nt in length, while the repeats of R. anatipestifer CRISPR1 were 47 nt. This is in agreement with the finding that unusually long repeats (up to 48 nt) are exclusively present in type II-C systems, especially in the Bacteroidetes phylum [25, 47]. Additionally, the R. anatipestifer Cas9 protein was obviously a member of the subtype II-C subcluster (as borne out by the phylogenetic tree). These observations suggest that the CRISPR1-Cas system of R. anatipestifer is a type II-C system. Compared with the relatively conserved repeats and cas genes, the spacer organization in R. anatipestifer was more diverse as a consequence of spacer acquisition. In CRISPR1, new spacers were typically inserted into the putative leader sequence-proximal end. This occurrence resulted in older and more conserved spacers being located in the leader-distal end, and numerous unique spacers were located in the leader-proximal end. By contrast, the spacer organization in CRISPR2 appears to be more stable. By inference, the CRISPR2 loci of R. anatipestifer are likely to have lost adaptive abilities during evolution. Although the reasons that have led to the occurrence of this phenomenon are currently unknown, similar observations have been made in Salmonella [52]. Intriguingly, the spacer organization of CRISPR arrays were not completely in agreement with the phylogenetic analysis of analyzed R. anatipestifer strains, suggesting that horizontal gene transfer (HGT) associated with CRISPR loci among R. anatipestifer strains had occurred. Interestingly, the four strains (RCAD0111, RA-CH-1, CH3 and RCAD0121) were distinguished from other analyzed strains in putative leader sequence and amino acid sequences of Cas1 and Cas9. This finding is consistent with the phylogeny of R. anatipestifer strains, namely, the four strains were clustered in the phylogenetic tree of R. anatipestifer strains. However, we cannot explain why there were minor variations and whether these variations impact biological activity of these elements at present. Nevertheless, two distinct endonuclease domains – HNH and RuvC-like, which both are essential for the cleavage function of Cas9 [21, 53] were present in each analyzed Cas9 protein sequence, though there was so much variation in the sequences. The tracrRNA is critical for processing and interference activities associated with type II systems [25, 54]. The repeat portion of pre-crRNA and the anti-repeat portion of tracrRNA pair up to form tracrRNA:pre-crRNA duplex in the presence of Cas9, and then the duplex is co-processed by the host factor RNase III. The mature crRNA (maintains in a complex with tracrRNA) guides Cas9 to cleavage target DNA during the interference step [20, 21, 54]. However, the tracrRNA is difficult to identify due to the diversity of sequence length, content, structure and localization [20, 25]. Therefore, we can only predict the anti-repeat sequence of tracrRNA-encoding DNA by analyzing the conservative form of base-pairing that is utilized by these repeat sequences. Upstream of the cas operon, where CRISPR1 was located, there was a 16 nt-long anti-repeat sequence, which might be regarded as a portion of the sequence that encodes for a putative tracrRNA. As mentioned above, the repeat sequence was comprised of 47 nucleotides, so the mismatches in repeat sequence may be combined with other factors or the 16 nt-long sequence is just sufficient for following activities. Likewise, the mismatches in tracrRNA-encoding sequence may be bound to other factors or form secondary structure to maintain structural stability. Given the complementarity between pre-crRNA and tracrRNA, the repeat sequences of the type II system are weakly palindromic. In other words, the repeat itself is not likely to form stem-loop structure that is deemed essential to pre-crRNA maturation in type I and III systems [20, 47]. Intriguingly, the repeat sequences in R. anatipestifer showed ordinary stem-loop structures, though there were some differences in the secondary structures resulting from DRVs. However, the molecular mechanisms elicited by subtype II-C are likely to be different from those that underpin type II systems. It has been confirmed that a streamlined type II-C system occurred in Neisseria meningitides [55]. In this system, the CRISPR locus was devoid of identifiable leader sequences containing external promoters, but possessed functional promoters embedded within each repeat. Moreover, RNase III-mediated processing was dispensable due to immediate transcription of short crRNAs from internal promoters in the repeats. Conversely, potential promoters were not found in the repeats of R. anatipestifer CRISPR1. Additionally, we predicted (with high scores) the occurrence of 107 bp-long leader sequences containing several putative promoters upstream of the CRISPR1 arrays. Palindromic sequences were also found in the putative leaders. Taken together, we speculate that the pre-crRNA processing pathway of the R. anatipestifer type II-C system is distinguishable from that of N. meningitides. Further analysis of RNA-seq is required to provide a more detailed insight into the actual molecular mechanism that underpins the type II-C system in R. anatipestifer. The CRISPR-Cas system is an adaptive immune system in prokaryotes that promotes the protection of bacteria and archaea from invasion by phages and plasmids [14-16]. Thus, we identified putative protospacers homologous with the spacers of R. anatipestifer. The only phage to share complete identity with the analyzed spacers was the phage of R. anatipestifer (RAP44) [56]. This suggests that the R. anatipestifer strains which contain the spacers homologous to RAP44 were likely infected by a phage related to RAP44, or a common ancestor have been attacked by the phage and transferred the spacer to its descendants. Therefore, these strains are likely resistant to the phage related to RAP44, and we need to identify the biological activity by experiment. It is possible that the spacers harbor identity with other R. anatipestifer bacteriophages, but many of these bacteriophages have not yet been characterized. This is likely because CRISPR-Cas system-mediated immunity leads to difficult isolation of R. anatipestifer phages. Other analyzed spacers matched foreign fragments derived from plasmids of Azospirillum brasilense, Deferribacter desulfuricans and Pantoea sp. At-9b, suggesting that R. anatipestifer might has hosted to these genetic elements or similar elements. Unfortunately, we found no consensus sequence associated with PAM in silico, which perhaps resulted from insufficient putative protospacers, and we may be able to discover functional PAM via experimental measures [57].

Conclusions

We have identified the structural characteristics and components of R. anatipestifer CRISPR-Cas system with comparative genomics analysis. The CRISPR1-Cas system of R. anatipestifer can be classified as subtype II-C, while the neighborless CRISPR2 with smaller variance could be inactive. The polymorphism of spacer organization may reflect the process of adaptation in R. anatipestifer. The features of the components associated with CRISPR1-Cas system suggest a distinct molecular mechanism utilized by R. anatipestifer. This research may assist in studying the specific mechanism associated with CRISPR-mediated adaptive immunity and other biological functions in R. anatipestifer.

54 in total

1. The National Center for Biotechnology Information.

Authors: D Benson; M Boguski; D Lipman; J Ostell
Journal: Genomics Date: 1990-02 Impact factor: 5.736

2. Multiple sequence alignment using ClustalW and ClustalX.

Authors: Julie D Thompson; Toby J Gibson; Des G Higgins
Journal: Curr Protoc Bioinformatics Date: 2002-08

3. Interaction of the Cas6 riboendonuclease with CRISPR RNAs: recognition and cleavage.

Authors: Ruiying Wang; Gan Preamplume; Michael P Terns; Rebecca M Terns; Hong Li
Journal: Structure Date: 2011-02-09 Impact factor: 5.006

4. Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems.

Authors: Sergey Shmakov; Omar O Abudayyeh; Kira S Makarova; Yuri I Wolf; Jonathan S Gootenberg; Ekaterina Semenova; Leonid Minakhin; Julia Joung; Silvana Konermann; Konstantin Severinov; Feng Zhang; Eugene V Koonin
Journal: Mol Cell Date: 2015-10-22 Impact factor: 17.970

5. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus.

Authors: Hélène Deveau; Rodolphe Barrangou; Josiane E Garneau; Jessica Labonté; Christophe Fremaux; Patrick Boyaval; Dennis A Romero; Philippe Horvath; Sylvain Moineau
Journal: J Bacteriol Date: 2007-12-07 Impact factor: 3.490

Review 6. An updated evolutionary classification of CRISPR-Cas systems.

Authors: Kira S Makarova; Yuri I Wolf; Omer S Alkhnbashi; Fabrizio Costa; Shiraz A Shah; Sita J Saunders; Rodolphe Barrangou; Stan J J Brouns; Emmanuelle Charpentier; Daniel H Haft; Philippe Horvath; Sylvain Moineau; Francisco J M Mojica; Rebecca M Terns; Michael P Terns; Malcolm F White; Alexander F Yakunin; Roger A Garrett; John van der Oost; Rolf Backofen; Eugene V Koonin
Journal: Nat Rev Microbiol Date: 2015-09-28 Impact factor: 60.633

7. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.

Authors: Elitza Deltcheva; Krzysztof Chylinski; Cynthia M Sharma; Karine Gonzales; Yanjie Chao; Zaid A Pirzada; Maria R Eckert; Jörg Vogel; Emmanuelle Charpentier
Journal: Nature Date: 2011-03-31 Impact factor: 49.962

8. Regulation of the Type I-F CRISPR-Cas system by CRP-cAMP and GalM controls spacer acquisition and interference.

Authors: Adrian G Patterson; James T Chang; Corinda Taylor; Peter C Fineran
Journal: Nucleic Acids Res Date: 2015-05-24 Impact factor: 16.971

Review 9. Comparative genomics of defense systems in archaea and bacteria.

Authors: Kira S Makarova; Yuri I Wolf; Eugene V Koonin
Journal: Nucleic Acids Res Date: 2013-03-06 Impact factor: 16.971

10. Protospacer recognition motifs: mixed identities and functional diversity.

Authors: Shiraz A Shah; Susanne Erdmann; Francisco J M Mojica; Roger A Garrett
Journal: RNA Biol Date: 2013-02-12 Impact factor: 4.652

7 in total

1. Pan-genome analysis of Riemerella anatipestifer reveals its genomic diversity and acquired antibiotic resistance associated with genomic islands.

Authors: Dekang Zhu; Zhishuang Yang; Jinge Xu; Mingshu Wang; Renyong Jia; Shun Chen; Mafeng Liu; Xinxin Zhao; Qiao Yang; Ying Wu; Shaqiu Zhang; Yunya Liu; Ling Zhang; Yanling Yu; Xiaoyue Chen; Anchun Cheng
Journal: Funct Integr Genomics Date: 2019-10-25 Impact factor: 3.410

2. Restriction-modification mediated barriers to exogenous DNA uptake and incorporation employed by Prevotella intermedia.

Authors: Christopher D Johnston; Chelsey A Skeete; Alexey Fomenkov; Richard J Roberts; Susan R Rittling
Journal: PLoS One Date: 2017-09-21 Impact factor: 3.240

3. A Novel RAYM_RS09735/RAYM_RS09740 Two-Component Signaling System Regulates Gene Expression and Virulence in Riemerella anatipestifer.

Authors: Ying Wang; Ti Lu; Xuehuan Yin; Zutao Zhou; Shaowen Li; Mei Liu; Sishun Hu; Dingren Bi; Zili Li
Journal: Front Microbiol Date: 2017-04-21 Impact factor: 5.640

4. Cas1 and Cas2 From the Type II-C CRISPR-Cas System of Riemerella anatipestifer Are Required for Spacer Acquisition.

Authors: Yang He; Mingshu Wang; Mafeng Liu; Li Huang; Chaoyue Liu; Xin Zhang; Haibo Yi; Anchun Cheng; Dekang Zhu; Qiao Yang; Ying Wu; Xinxin Zhao; Shun Chen; Renyong Jia; Shaqiu Zhang; Yunya Liu; Yanling Yu; Ling Zhang
Journal: Front Cell Infect Microbiol Date: 2018-06-12 Impact factor: 5.293

5. Comparative genomics and metabolomics analysis of Riemerella anatipestifer strain CH-1 and CH-2.

Authors: Jibin Liu; Anchun Cheng; Mingshu Wang; Mafeng Liu; Dekang Zhu; Qiao Yang; Ying Wu; Renyong Jia; Shun Chen; Xinxin Zhao; Shaqiu Zhang; Juan Huang; Xumin Ou; Sai Mao; Qun Gao; Xingjian Wen; Ling Zhang; Yunya Liu; Yanling Yu; Bin Tian; Leichang Pan; Mujeeb Ur Rehman; Xiaoyue Chen
Journal: Sci Rep Date: 2021-01-12 Impact factor: 4.379

6. CRISPR-Cas Systems in Bacteroides fragilis, an Important Pathobiont in the Human Gut Microbiome.

Authors: Mehrdad Tajkarimi; Hannah M Wexler
Journal: Front Microbiol Date: 2017-11-23 Impact factor: 5.640

7. Diversity and microevolution of CRISPR loci in Helicobacter cinaedi.

Authors: Junko Tomida; Yuji Morita; Keigo Shibayama; Ken Kikuchi; Tomohiro Sawa; Takaaki Akaike; Yoshiaki Kawamura
Journal: PLoS One Date: 2017-10-13 Impact factor: 3.240

7 in total