
Comparative genomic and transcriptional analyses of CRISPR systems across the genus Pyrobaculum.

David L Bernick, Courtney L Cox, Patrick P Dennis, Todd M Lowe.

Abstract

Within the domain Archaea, the CRISPR immune system appears to be nearly ubiquitous based on computational genome analyses. Initial studies in bacteria demonstrated that the CRISPR system targets invading plasmid and viral DNA. Recent experiments in the model archaeon Pyrococcus furiosus have uncovered a novel RNA-targeting variant of the CRISPR system. Because our understanding of CRISPR system evolution in other archaea is limited, we have taken a comparative genomic and transcriptomic view of the CRISPR arrays across six diverse species within the crenarchaeal genus Pyrobaculum. We present transcriptional data from each of four species in the genus (P. aerophilum, P. islandicum, P. calidifontis, P. arsenaticum), analyzing mature CRISPR-associated small RNA abundance from over 20 arrays. Within the genus, there is remarkable conservation of CRISPR array structure, as well as unique features that have not been studied in other archaeal systems. These unique features include: a nearly invariant CRISPR promoter, conservation of direct repeat families, the 5' polarity of CRISPR-associated small RNA abundance, and a novel CRISPR-specific association with homologues of nurA and herA. These analyses provide a genus-level evolutionary perspective on archaeal CRISPR systems, broadening our understanding beyond existing non-comparative model systems.


Keywords: CRISPR; Pyrobaculum; RNAseq; crRNA; repeat; sRNA

Year: 2012 PMID: 22811677 PMCID: PMC3396285 DOI: 10.3389/fmicb.2012.00251

Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640

Introduction

CRISPR immunity systems, like the vertebrate adaptive immune system (Boehm, 2011), include mechanisms to adapt to new pathogens, surveillance methods for detecting previously encountered pathogens, and means to inactivate those pathogens. In the case of the CRISPR system, the targeted molecule is a nucleic acid sequence, and the sequence of events moves from adaptation, where the invading nucleic acid sequence is recognized and acquired, to expression, where the CRISPR-specific small RNA recognition molecules (crRNA) are transcribed, processed and loaded by the CAScade complex of CRISPR-specific proteins (Brouns et al., 2008; Jore et al., 2011). The third phase, interference, is initiated upon detection of a targeted nucleic acid sequence and results in specific inactivation of the recognized nucleic acid from the invading “pathogen.” DNA of viral or plasmid origin has been shown to be the target of CRISPR defense in bacteria (Barrangou et al., 2007; Marraffini and Sontheimer, 2008) and the archaeon Sulfolobus solfataricus (Manica et al., 2011). RNA sequences are targeted in the CRISPR system present in Pyrococcus furiosus (Hale et al., 2009, 2012), opening the possibility of endogenous targeting of messenger RNA sequence. Most archaeal and many bacterial genomes contain one or more loci that encode the CRISPR system. Each CRISPR locus consists of an array of short DNA sequences, and frequently includes a cluster of CRISPR-associated (CAS) protein coding genes (Haft et al., 2005). The DNA arrays are composed of a leader sequence, followed by a set of 24–47 nucleotide (nt) direct repeats (DR) that form the delimiting punctuation of the array. The sequences between DR, termed spacers, are found to be 26–72 nt in length and encode small RNAs that are the stored immune memory for the system. The transcriptional promoter for the array is likely to be encoded within the leader sequence (Haft et al., 2005; Lillestol et al., 2009; Horvath and Barrangou, 2010). 
In Escherichia coli, the specific promoters for the array and associated CAS genes have been identified (Pul et al., 2010). CRISPR arrays are dynamic structures, some containing only a single sequence while others may be quite large; for example, crispr4 in Metallosphaera sedula is over 10,000 nt in length and contains over 160 spacer sequences (Grissa et al., 2007). The genomes of most strains of Methanococcus maripaludis contain only one CRISPR array locus whereas the genome of strain S2 has no CRISPR array present. In contrast, the genomes of Methanocaldococcus strains encode between seven and 20 individual CRISPR arrays. In Sulfolobus, recent work has shown that selective pressure can be introduced in vivo, which results in deletion of genomic loci containing all or part of the CRISPR/CAS system (Gudbergsdottir et al., 2011). Individual spacer elements in CRISPR arrays are acquired in the adaptation phase, during exposure to an invading genetic element. Evidence from surviving, phage-challenged cells shows an addition of one or more spacer sequences at the leader-proximal end of the array. These new spacer sequences are identical to phage sequence, can be from either phage genome strand, and confer immunity to survivor progeny (Barrangou et al., 2007). During this spacer acquisition phase, the target sequence is integrated into the array, likely through the action of CAS1, CAS2, and possibly other CAS proteins (for example, CSN2 in the Streptococcus thermophilus Type II system). This adaptation process only requires a single direct repeat in the array (Yosef et al., 2012). It is unclear if the acquired DNA spacer is derived directly from invading DNA, or if the DNA spacer is a copy produced during the adaptation process. The mechanism of immunity is still incompletely understood, but immunity is dependent on CAS genes (Barrangou et al., 2007; Brouns et al., 2008), usually located near one or more CRISPR arrays. 
Early studies showed that four CAS genes (cas1–4) were frequently associated with CRISPR arrays (Jansen et al., 2002; Haft et al., 2005). A role in CRISPR adaptation (acquisition of new spacers) has been proposed for cas1 and cas2 (Wiedenheft et al., 2012). Potentially, CAS4 is also involved during the acquisition phase; this hypothesis is based on the frequent cas4 genomic proximity to cas1 (Makarova et al., 2011). The CAS genes have recently been reclassified into three main families based on gene content and mode of action of the associated system (Makarova et al., 2011). In Type I, II, and III-A CRISPR systems (Makarova et al., 2011), the target of the CRISPR immunity system is invading DNA (Marraffini and Sontheimer, 2008). In contrast, Type III-B systems target RNA instead of DNA (Hale et al., 2009, 2012). Type I systems have been studied in both bacteria and archaea, and have recently yielded low-resolution structures of the multimeric CAScade complex in both E. coli (Jore et al., 2011; Wiedenheft et al., 2011) and in the archaeon Sulfolobus solfataricus (Lintner et al., 2011). In Type I systems, the CAScade complex is required for maturing of CRISPR RNA (crRNA) that guide protective immunity during subsequent invasion by foreign DNA elements. This crRNA-enabled complex is also responsible for surveillance and eventual interference by recruiting additional CAS proteins (Wiedenheft et al., 2011). The primary transcript of the CRISPR array, pre-crRNA, is cleaved within the DR to generate the individual crRNA segments. In the Sulfolobus variant of CAScade, CAS6 is responsible for cleavage of pre-crRNA, while in E. coli this role is carried out by CAS6e, also known as CasE (Brouns et al., 2008). The short RNA segments that are released from pre-crRNA processing retain an 8 nt 5′ “handle” sequence from the upstream DR as part of the mature crRNA (Brouns et al., 2008). 
Processing of pre-crRNA transcripts in Sulfolobus has been reported to proceed from the 3′ distal end toward the 5′ leader sequence (Lillestol et al., 2009). It is unclear how this 3′ to 5′ directionality is established, given the site-specific endonucleolytic nature of CAS6 (Carte et al., 2008). The Type III-B RNA-targeting CRISPR systems have been investigated in Pyrococcus furiosus (Hale et al., 2009, 2012) and in Sulfolobus solfataricus (Zhang et al., 2012). These systems include the cmr family of CAS genes along with the nearly ubiquitous cas1, cas2, and cas6. The cmr complex is composed of the protein products of cmr1, cas10, and cmr3–cmr6, plus the cas6-derived crRNA. In Sulfolobus, an additional cmr component, cmr7, joins the complex. All CRISPR systems examined to date load crRNAs with 5′ OH ends, although the crRNA length and mature state of the 3′ end vary by CRISPR type and by species. We have therefore utilized a cloning strategy that is independent of 5′ end chemistry and partially independent of 3′ end chemistry. In this study, we show linkage of CAS protein types with families of CRISPR arrays, conservation of CRISPR array elements across the genus, and a novel nurA-csm6-herA gene cluster associated with Pyrobaculum CRISPR arrays, and we provide transcriptional support for polarity in crRNA abundance.

Methods

Culture conditions

P. aerophilum cells were grown anaerobically in media containing 0.5 g/L yeast extract, 1X DSM390 salts, 10 g/L NaCl, 1X DSM141 trace elements, 0.5 mg/L Fe(SO4)2(NH4)2, pH 6.5, with 10 mM NaNO3. P. islandicum and P. arsenaticum cells were grown anaerobically in media containing 10 g/L tryptone, 2 g/L yeast extract, 1X DSM390 salts, 1X DSM88 trace elements, and 20 mM Na2S2O3. P. calidifontis cells were grown aerobically in 1 L flasks using 500 ml media containing 10 g/L tryptone, 2 g/L yeast extract, 1X DSM88 trace metals, 15 mM Na2S2O3, pH 6.8, loosely capped with moderate shaking at 125 rpm. Anaerobic cultures were grown in 2 L flasks with 1 L media, prepared under nitrogen with resazurin as a redox indicator at 0.5 mg/L; 0.25 mM Na2S was added as a reductant. All cultures were grown at 95°C to late log or stationary phase, monitored at OD600. The 10X DSM390 salts are comprised of (per liter ddH2O) 1.3 g (NH4)2SO4, 2.8 g KH2PO4, 2.5 g MgSO4·7H2O. The 100X DSM88 trace metal solution is comprised of (per liter 0.12 N HCl) 0.9 mM MnCl2, 4.7 mM Na2B4O7, 76 μM ZnSO4, 25 μM CuCl2, 12.4 μM Na2MoO4, 18 μM VOSO4, 6 μM CoSO4. The 100X DSM141 trace metal solution is comprised of 7.85 mM nitrilotriacetic acid, 12.2 mM MgSO4, 2.96 mM MnSO4, 17.1 mM NaCl, 0.36 mM FeSO4, 0.63 mM CoSO4, 0.68 mM CaCl2, 0.63 mM ZnSO4, 40 μM CuSO4, 42 μM KAl(SO4)2, 0.16 mM H3BO3, 41 μM Na2MoO4, 0.1 mM NiCl2, 1.14 μM Na2SeO3.
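The stock recipes above are given at 100X, so the working (1X) concentrations follow by simple dilution. The sketch below illustrates that arithmetic for a subset of the DSM88 components; the function names are hypothetical and the 500 ml volume is the P. calidifontis culture volume quoted above.

```python
# Stock concentrations (micromolar) for part of the 100X DSM88 trace metal
# solution, copied from the recipe in the text.
DSM88_100X_UM = {
    "MnCl2": 900.0,     # 0.9 mM
    "Na2B4O7": 4700.0,  # 4.7 mM
    "ZnSO4": 76.0,
    "CuCl2": 25.0,
    "VOSO4": 18.0,
    "CoSO4": 6.0,
}


def working_concentrations(stock_um, fold=100.0):
    """Concentrations (micromolar) after diluting a `fold`X stock to 1X."""
    return {name: conc / fold for name, conc in stock_um.items()}


def stock_volume_ml(final_volume_ml, fold=100.0):
    """Volume of `fold`X stock needed to reach 1X in the final volume."""
    return final_volume_ml / fold


for name, conc in working_concentrations(DSM88_100X_UM).items():
    print(f"{name}: {conc:g} uM at 1X")
print(f"100X stock for a 500 ml culture: {stock_volume_ml(500):g} ml")
```

The same two helpers apply to the 10X DSM390 salts by passing `fold=10`.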

cDNA library preparation

The cDNA libraries were prepared using small RNA fractions collected from cells grown to stationary and exponential phase, using methods previously described (Bernick et al., 2012), with brief details given in Results. These two preparations were constructed for each of P. aerophilum, P. islandicum, P. arsenaticum, and P. calidifontis cultures, yielding a total of eight cDNA libraries. The 3′ end chemistries of crRNA have been reported as either 2′–3′ cyclic phosphate (Hale et al., 2012; Jore et al., 2011), or as 3′ OH (Hatoum-Aslan et al., 2011; Zhang et al., 2012). Under the acidic conditions (pH 5) used in RNA preparation in this study, we expect an equilibrium population of 3′ OH terminated RNA to exist under either scenario, providing a cloning method that is semi-independent of 3′ end chemistry.

Sequencing and read mapping

Sequencing was performed using a Roche/454 GS FLX sequencer, and the GS emPCR Kit II (Roche). Sequencing reads in support of this work are provided online via the UCSC Archaeal Genome Browser (http://archaea.ucsc.edu) (Chan et al., 2012). Reads that included barcodes and sequencing linkers were selected from the raw sequencing data and used to identify reads from each of the eight pooled cDNA libraries. Reads were further consolidated, combining identical sequences with associated counts for viewing with the Archaeal Genome Browser. Reads were mapped to the appropriate genome [P. aerophilum (NC_003364.1); P. arsenaticum (NC_009376.1); P. calidifontis (NC_009073.1); P. islandicum (NC_008701.1); P. oguniense (NC_016885.1); P. neutrophilum (T. neutrophilus: NC_010525.1)] using BLAT (Kent, 2002), requiring a minimum of 90% identity (-minIdentity), a maximal gap of 3 (-maxIntron) and a minimum score (matches minus mismatches) of 16 (-minScore), using alignment parameters suited to this size range (-tileSize=8 -stepSize=4). Reads that mapped equally well to multiple positions in the genome were excluded from this study. The remaining, uniquely mapped reads were formatted and visualized as BED tracks within the UCSC Archaeal Genome Browser.
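The mapping and filtering steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and file names are hypothetical, the BLAT options are the ones quoted in the text, and the filter assumes standard tab-separated PSL output, scoring each hit as matches minus mismatches.

```python
from collections import defaultdict


def blat_command(genome_fa, reads_fa, out_psl):
    """Assemble a BLAT invocation using the thresholds quoted in the text."""
    return [
        "blat", genome_fa, reads_fa, out_psl,
        "-minIdentity=90", "-maxIntron=3", "-minScore=16",
        "-tileSize=8", "-stepSize=4",
    ]


def unique_hits(psl_lines):
    """Keep reads whose best score is achieved by exactly one alignment.

    Returns {read_name: (target, t_start, t_end, strand)}.
    """
    by_read = defaultdict(list)
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # skip PSL header lines and malformed rows
        score = int(fields[0]) - int(fields[1])  # matches minus mismatches
        hit = (fields[13], int(fields[15]), int(fields[16]), fields[8])
        by_read[fields[9]].append((score, hit))

    unique = {}
    for read, hits in by_read.items():
        best = max(score for score, _ in hits)
        top = [hit for score, hit in hits if score == best]
        if len(top) == 1:  # reads tied across several loci are dropped
            unique[read] = top[0]
    return unique
```

The command list can be passed to `subprocess.run` once BLAT is installed; the resulting PSL file can then be read line by line into `unique_hits`.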

Computational prediction of orthologous gene clusters

Computational prediction of orthologous groups was established by computing reciprocal best BLASTP (Altschul et al., 1990) (RBB) protein coding gene-pairs among pairs of four Pyrobaculum species. When at least three RBB gene-pairs select the same inter-species gene set (for example A pairs with B, B pairs with C, and C pairs with A), the cluster is considered an orthologous gene cluster.
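One way to read the clustering rule above is as a search for closed triangles of reciprocal best hits across three species. The sketch below takes that interpretation as an assumption; the input layout (a dictionary of best-hit tables keyed by ordered species pair) and the function names are hypothetical, and the BLASTP step that would produce the best-hit tables is not shown.

```python
from itertools import combinations


def reciprocal_best(best_ab, best_ba):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def ortholog_clusters(best_hits):
    """Find gene triples linked by RBB pairs in all three species pairings.

    best_hits[(sp1, sp2)] maps each sp1 gene to its best BLASTP hit in sp2.
    """
    species = sorted({s for pair in best_hits for s in pair})
    rbb = {}
    for s1, s2 in combinations(species, 2):
        pairs = reciprocal_best(best_hits.get((s1, s2), {}),
                                best_hits.get((s2, s1), {}))
        rbb[(s1, s2)] = dict(pairs)  # sp1 gene -> sp2 gene

    clusters = set()
    for a, b, c in combinations(species, 3):
        for gene_a, gene_b in rbb[(a, b)].items():
            gene_c = rbb[(b, c)].get(gene_b)
            # A pairs with B, B pairs with C, and C pairs back with A.
            if gene_c is not None and rbb[(a, c)].get(gene_a) == gene_c:
                clusters.add(frozenset([gene_a, gene_b, gene_c]))
    return clusters
```

With four species, a gene family conserved in all of them yields several overlapping triples, which could be merged in a later step if larger clusters are wanted.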

CRISPR array mapping

Arrays were predicted using CRISPRfinder (Grissa et al., 2007). In some cases, arrays were merged based on sequencing data evidence.

Results

CRISPR/CAS protein families

Three distinct types of CAS gene clusters exist within the six Pyrobaculum species examined (Figure 1 and Table A1) (Makarova et al., 2011). In most Pyrobaculum species, the Type I system is present, organized in submodules. Typically we find a submodule that includes: cas1, cas2, cas4, and a cas4 variant herein referred to as cas4′, previously described as csa1 (Haft et al., 2005) (submodule abbreviation cas4′-1-2-4). A second submodule is found nearby, comprising cas6, cas7, cas5, cas3′, cas3″, and cas8a2 (abbreviated cas6-7-5-3′−3″-8a2) (Figure 1). With the exception of P. islandicum, each species in the genus has these submodules or close variants, and one or more submodules may be duplicated. In some cases, terminal members of the submodule may be relocated, such as cas6 in P. calidifontis or P. neutrophilum. Type I subtypes are defined by the presence of specific genes: cas8a1 or cas8a2 (subtype I-A); cas8b (subtype I-B); cas8c (subtype I-C); cas10d (subtype I-D); cse1 (subtype I-E); and csy1 (subtype I-F) (Makarova et al., 2011). P. aerophilum, P. oguniense, and P. neutrophilum contain cas8a2, so fall within the definitive Type I-A subtype. P. arsenaticum and P. calidifontis do not appear to contain any recognized signature genes, so the subtype remains indeterminate. Notably, the Type I system is completely absent from P. islandicum.
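The signature-gene rule for Type I subtypes amounts to a lookup table. The sketch below encodes the gene-to-subtype assignments listed above; the function name and the gene lists in the usage lines are illustrative only.

```python
# Signature genes for Type I subtypes, as listed in the text
# (after Makarova et al., 2011).
SIGNATURE_GENES = {
    "cas8a1": "I-A", "cas8a2": "I-A",
    "cas8b": "I-B",
    "cas8c": "I-C",
    "cas10d": "I-D",
    "cse1": "I-E",
    "csy1": "I-F",
}


def type_i_subtype(genes):
    """Return the subtype implied by the signature genes present, else None."""
    found = {SIGNATURE_GENES[g] for g in genes if g in SIGNATURE_GENES}
    if len(found) == 1:
        return found.pop()
    return None  # no signature gene, or conflicting signatures


# A module containing cas8a2 is assigned to subtype I-A.
print(type_i_subtype(["cas6", "cas7", "cas5", "cas3", "cas8a2"]))
# A module without any signature gene stays indeterminate.
print(type_i_subtype(["cas6", "cas7", "cas5", "cas3"]))
```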

Figure 1

Table A1

Gene reannotations.

Locus	Function	Strand	Start	End
Pyrobaculum aerophilum
PAE0067	cas3″	−
PAE0068	cas3	−
Crispr1		−	39812	40776
PAE0075	cas6	−
PAE0077	csx1	+
PAE0079	cas4	−
PAE0080	cas2	−
PAE0081	cas1	−
PAE0082	cas4′	−
Crispr2		+	45503	46687
PAE0109	cas10	−
PAE0111	csm5	−
PAE0112	csm4	−
PAE0114	csm3	−
PAE0115	csm2	−
PAE0117	csx1	−
PAE0119	csx1	+
PAE0122	nurA	+
PAE0124	csm6	+
PAE0126	csm6	+
PAE0128	herA	+
PAE0131	cas6	+
PAE0181	cas6	−
Crispr3		+	95531	101005
PAE0198	cas4	−
PAE0199	cas2	−
PAE0200	cas1	−
PAE0201	cas4′	−
PAE0202	csx1	−
PAE0205	cas8a2	−
PAE0207	cas3″	−
PAE0208	cas3	−
PAE0209	cas5	−
PAE0210	cas7	−
PAE0212	cas6	+
Crispr4		+	268866	269081
Crispr5		−	591745	592220
Crispr6/7		−	1898722	1899654
Pyrobaculum arsenaticum
Pars_1108	herA	−
Pars_1109/10	csm6	−
Pars_1111	nurA	−
Crispr2		+	999187	1001495
Pars_1114	cmr6	−
Pars_1115	cmr1	−
Pars_1116	cmr5	−
Pars_1117	cmr4	−
Pars_1118	cas10	+
Pars_1119	cmr3	+
Pars_1120	csx1	+
Crispr3		+	1012951	1018930
Pars_1121	cas4	−
Pars_1122	cas2	−
Pars_1123	cas1	−
Pars_1124	cas4′	+
Pars_1127	cas7	+
Pars_1128	cas5	+
Pars_1130	cas3	+
Pars_1131	cas3″	+
Pars_1133	cas6	+
Pars_1134	csx1	−
Pars_1145	cas2	−
Pars_1147	cas4′	−
Crispr5		+	1039190	1039289
Crispr6		−	1307876	1308104
Pyrobaculum calidifontis
Pcal_0261	cas6	−
Pcal_0263	cas1	−
Pcal_0265	cas4′	−
Crispr1		−	260542	260703
Pcal_0266	csx1	−
Crispr2		+	264904	265204
Pcal_0270	csx1	−
Pcal_0271	cmr3	−
Pcal_0272	cas10	−
Pcal_0273	cmr4	+
Pcal_0274	cmr5	+
Pcal_0275	cmr1	+
Pcal_0276	cmr6	+
Pcal_0277	csx1	+
Crispr3		−	277746	277908
Pcal_1267	cas3″	−
Pcal_1268	cas3	−
Pcal_1270	cas5	−
Pcal_1271	cas7	−
Pcal_1273	csx1	−
Pcal_1274	cas4′	+
Pcal_1275	cas1	+
Pcal_1276	cas2	+
Pcal_1277	cas4	+
Crispr4		−	1185256	1185816
Pcal_1278	cas6	−
Crispr5		−	1188156	1190531
Pcal_1280	csx1	−
Pcal_1281	csm6	−
Pcal_1283	cmr3	−
Pcal_1284	cas10	−
Pcal_1285	cmr4	+
Pcal_1286	cmr1	+
Pcal_1287	cmr6	+
Crispr6		+	1203351	1205855
Pcal_1294	nurA	+
Pcal_1295	csm6	+
Pcal_1296	herA	−
Crispr7		−	1669194	1669346
Pyrobaculum islandicum
Crispr1		−	34	1216
Crispr2		+	38866	39842
Crispr3		+	1404032	1404192
Pisl_1541	cas10	−
Pisl_1542	csm5	−
Pisl_1543	csm4	−
Pisl_1544	csm3	−
Pisl_1545	csm2	−
Crispr4		−	1413797	1414026
Pisl_1932	cas6	−
Crispr5		−	1756971	1759456
Pyrobaculum oguniense
Crispr1			937975	938897
Pogu_1100	cas4′	+
Pogu_1101	cas1	+
Pogu_1102	cas2	+
Pogu_1106	cas6	+
Crispr2			945613	946605
Crispr3			952361	953217
Pogu_1118	cmr6	−
Pogu_1119	cmr1	−
Pogu_1125	cas10	−
Pogu_1126	csm5	−
Pogu_1127	csx1	−
Pogu_1128	csm3	−
Pogu_1135	csx1	+
Pogu_1138	cas6	−
Pogu_1143	cas8a2	−
Pogu_1144	cas3″	−
Pogu_1145	cas3	−
Pogu_1146	cas5	−
Pogu_1147	cas7	−
Pogu_1149	csx1	−
Pogu_1150	cas4′	+
Pogu_1151	cas1	+
Pogu_1152	cas2	+
Pogu_1153	cas4	+
Crispr4			986121	987397
Pogu_1154	csx1	−
Pogu_1155	cmr3	−
Pogu_1156	cas10	−
Pogu_1157	cmr4	+
Pogu_1158	cmr5	+
Pogu_1159	cmr1	+
Pogu_1160	cmr6	+
Crispr5			999562	1002179
Pogu_1165	nurA	+
Pogu_1166/7	csm6	+
Pogu_1168	herA	+
Pyrobaculum neutrophilum
Crispr1		+	511830	513709
Tneu_0562	cmr3	−
Tneu_0563	cas10	−
Tneu_0564	cmr4	+
Tneu_0565	cmr5	+
Tneu_0566	cmr1	+
Tneu_0567	cmr6	+
Tneu_0572	csx1	+
Crispr2		−	526375	526738
Tneu_0576	cas4	−
Tneu_0577	cas2	−
Tneu_0578	cas1	−
Tneu_0579	cas4′	−
Crispr3		+	530828	531454
Crispr4		+	849844	851759
Crispr5		+	856227	857471
Crispr6		+	883097	885730
Tneu_0994	cas7	+
Tneu_0995	cas5	+
Tneu_0997	cas3	+
Tneu_0998	cas3″	+
Tneu_0999	cas6	+
Crispr7		+	994025	995068
Tneu_1114	cas4	+
Tneu_1128	cas6	+
Crispr8		+	1017598	1019142
Tneu_1132	cas8a2	−
Tneu_1133	cas3″	−
Tneu_1134	cas3	−
Tneu_1135	cas5	−
Tneu_1136	cas7	−
Tneu_1138	csx1	−
Tneu_1139	cas4′	+
Tneu_1140	cas1	+
Tneu_1141	cas2	+
Tneu_1142	cas4	+
Crispr9		−	1030559	1032170
Tneu_1143	nurA	+
Tneu_1144	csm6	+
Tneu_1145	herA	−
Crispr10		+	1035988	1038486
Tneu_1149	csm3	+
Tneu_1150	csm4	+
Tneu_1151	csm5	+
Tneu_1152	cas10	+
Tneu_1154	csx1	+

Genomic arrangement of CRISPR modules. Colors indicate Type I CAS modules (red), Type III CAS modules (blue) and CRISPR-associated nurA-csm6-herA clusters (green). CRISPR DNA arrays are represented as vertical bars. Arrangements of multiple genes in the same CAS family are indicated using filled triangles (cas1-cas2 is indicated as cas12). Genomic distances greater than 10 Kb are indicated using diagonal slashes (“/”). Gene strand is indicated relative to the solid black line for positive (above) and negative (below) orientations. Most Pyrobaculum species encode both Type I and III CAS modules; P. islandicum encodes only a Type III CAS module.

A second CAS group, the Type III-B family of RNA-targeting CAS genes, is present in four Pyrobaculum species but not in P. aerophilum or P. islandicum. Again, this second family is present as submodules, with cmr4, cmr5, cmr1, and cmr6 (cmr4-5-1-6) adjacent to, but on the opposite strand of, the cmr3-cas10 submodule. One or both of these submodules include csx1, and are currently classified as members of Type III-U (unclassified Type III). We find that csx1 also appears in the Type I modules, which suggests a broader role for csx1 among Pyrobaculum CAS modules. The third kind of module found in the genus, Type III-A (csm), appears to be complete in P. aerophilum, and is the only apparent CAS family found in P. islandicum. Previously, Makarova suggested that CRISPR adaptation for Type III families may require use of cas1 and cas2 in trans from a resident Type I family member (Makarova et al., 2011). However, this option is unavailable in P. islandicum, suggesting that adaptation for Pyrobaculum Type III systems may not require cas1-cas2. Possibly, an undescribed enzyme fulfills this role, or P. islandicum may have lost the ability to further adapt its CRISPR arrays. Curiously, csm6 is absent from P. islandicum, but is present in every other species examined in this study.
This is notable because csm6 would be expected to be part of the Type III-A system in P. islandicum, and would not be expected in species that do not encode a complete Type III-A module. Both P. oguniense and P. neutrophilum encode a portion of the Type III-A module (csm3-csm5-cas10-csx1) but both species are missing csm2 and csm4. Where csm6 is present, it is located next to a conserved paralog of nurA and herA; these genes are near a CRISPR array in species of Pyrobaculum, Thermoproteus, and Vulcanisaeta, suggesting that this arrangement is widespread among the Thermoproteales. The nurA-herA protein complex is comprised of a 5′–3′ DNA exonuclease (nurA) and a bidirectional helicase (herA) with probable involvement in homologous recombination (HR) (Constantinesco et al., 2004). HR processing requires a 3′ single stranded DNA (ssDNA) resection of chromosomal ends resulting from a double-strand break, and in thermophilic archaea, that resection is carried out by the helicase-nuclease complex of HerA-NurA (Blackwood et al., 2012). In most Pyrobaculum spp., there are three or more paralogs of this gene-pair, one of which is clustered with csm6 and near a CRISPR array (Figure 1). Computationally predicted orthologs of the CRISPR-associated nurA-herA genes (RBB) show that this pair has been retained throughout the Pyrobaculum genus and more broadly among the Thermoproteales (Figure 2 and Table A2). In P. islandicum, however, the CRISPR-associated nurA-herA pair and csm6 are absent. We propose that the nurA-csm6-herA complex may be associated with adaptation in Pyrobaculum species. Three possibilities arise from this proposal: (1) adaptation in P. islandicum may have been lost; (2) adaptation in P. islandicum may occur using an alternative mechanism, possibly one of the nurA-herA paralogs; or (3) the nurA-csm6-herA trio may only be required in Type I CRISPR systems (Yosef et al., 2012).

Figure 2

NurA and HerA gene families. Pyrobaculum species encode at least three nurA-herA gene-pairs, shown above to display parallel phylogenetic structure (left and right panels), with genomically adjacent genes matched by color. Within each gene clade (red, green or blue), the relationship of genes follows the expected species tree, indicating robust ortholog groups. The CRISPR-associated nurA-herA pairs (shown in red above) maintain the paired relationship throughout the Thermoproteales. An unmatched paralogous group of herA genes appears in black. The Vulcanisaeta distributa nurA-herA genes are provided as outgroups of the Pyrobaculum genus; note that only the CRISPR-associated nurA-herA pair (Vdis_1157-Vdis_1158) is collocated in V. distributa, further supporting the tight functional association. Bootstrap percentages are included in black (100% unless otherwise noted). Gene name prefixes: Vdis, Vulcanisaeta distributa; PAE, Pyrobaculum aerophilum; Pars, P. arsenaticum; Pisl, P. islandicum; Pcal, P. calidifontis; Pogu, P. oguniense; Tneu, P. neutrophilum. Multiple alignment of amino-acid sequences was performed using Clustal-Omega (v 1.04), maximum-likelihood trees were computed using Tree-puzzle (v 5.02), and the final tree was visualized using newicktops (v 1.0), available through the Pasteur Institute's Mobyle portal (http://mobyle.pasteur.fr/).

Table A2

NurA and HerA paralogs.

NurA Family	Pfam E-value	HerA Family	Blastp E-value
PAE0122	2.2E-16	PAE0128	7.0E-06
Pcal_1294	7.9E-11	Pcal_1296	4.0E-05
Pisl	NA	Pisl_	NA
Pars_1111	5.3E-12	Pars_1108	4.0E-05
Pogu_1165	7.3E-11	Pogu_1168	1.0E-05
Tneu_1143	1.4E-09	Tneu_1145	2.0E-05
Vdis_1157	5.0E-07	Vdis_1158	3.0E-05

PAE2154	3.3E-43	PAE2155	9.0E-05
Pcal_1069	1.1E-28	Pcal_1070	1.0E-05
Pisl_0942	2.9E-33	Pisl_0941	8.0E-06
Pars_0817	2.5E-31	Pars_0816	3.0E-05
Pogu_1515	2.4E-31	Pogu_1516	6.0E-05
Tneu_1343	1.5E-31	Tneu_1344	4.0E-05
Vdis_1272	1.2E-18	Vdis_1306	5.0E-12

PAE2902	1.6E-65	PAE2903	5.0E-40
Pcal_0359	6.1E-47	Pcal_0358	3.0E-38
Pisl_0299	1.7E-43	Pisl_0300	1.0E-37
Pars_0982	1.2E-45	Pars_0983	4.0E-40
Pogu_1338	2.9E-46	Pogu_1337	5.0E-40
Tneu_1822	1.9E-50	Tneu_1821	2.0E-38
Vdis_0977	5.0E-21	Vdis_0630	1.0E-34

		PAE2998	9.0E-23
		Pcal_1112	4.0E-22
		Pisl_0395	1.0E-24
		Pars_1361	4.0E-22
		Pogu_0853	7.0E-22
		Tneu_1732	5.0E-21
		Vdis_0944	3.0E-19

E-values for NurA family paralogs are established using Pfam 26.0 (November 2011) (Punta et al., 2012); E-values for HerA family paralogs are established with Blastp (Altschul et al., 1990), using Sulfolobus solfataricus HerA (SSO2251) as the query and the specific species as the target (word size 2, Blosum45 score matrix, gap existence 13, gap extension 3). The CRISPR-associated NurA-HerA paralogs are shown in red, and the putative ortholog of NurA-HerA involved in homologous recombination is shown in blue.

CRISPR arrays

We have characterized three distinct families of CRISPR arrays present among six sequenced Pyrobaculum genomes (Table 1). These three families are defined by the sequences central to the DR and typically contain an A-rich core of 3–5 nt. These central motifs are flanked by short reverse complement (RC)-palindromes. The DR is terminated by an 8 nt-long sequence that becomes the 5′ handle of the mature crRNAs (Brouns et al., 2008). The various Pyrobaculum species encode between four and seven CRISPR arrays within their respective genomes. Except for P. islandicum, all species contain one or more representatives of family I and at least one additional representative from family III.

Table 1

Type	5′ motif	p	core	p′		5′ crRNA handle	P. aerophilum	P. arsenaticum	P. islandicum	P. neutrophilum	P. calidifontis	P. oguniense
I	GAAT	CTC	AAAAA	GAG	G	ATTGAAAG	1	3				2
	GAAT	CTC	AAGAA	GAG	G	ATTGAAAG					4
	GAAT	CTC	AAAGA	GAG	G	ATTGAAAG				2
	GAAT	CTC	AAGTT	GAG	G	ATTGAAAG				2*
	GATT	CTC	AGATA	GAG	A	TTTGAAGG				1
III-B	GAGAAT	CCCC	AAA	GGGG		GTAGAAAC					3
III-A	CCAGAA	ATC	AAAA	GAT	A	GTTGAAAC	4					1
III	CCAGAA	ATC	AAAA	GAT	A	GTAGAAAC			5	5
III-B	GTCAAA	ATC	AAAA	GAT	A	GTTGAAAC		1				1*

Alignment of direct repeats across known Pyrobaculum species. Pyrobaculum DR sequences include a variable length 5′ motif, two short inverted repeats (p and p′) surrounding an A-rich core region, followed by one or zero nucleotides, and ending in what will become the 5′ handle of processed crRNA. Identical motifs are shown in gray below first instance. Numbers in species columns refer to number of CRISPR arrays harboring DRs of that type. Asterisk (*) indicates DR mixing has occurred in one of the CRISPR arrays in this species. The associated CAS type is inferred by adjacency to an array using that DR family.
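The DR layout described in this caption can be checked mechanically: the two flanking elements should be reverse complements of each other, and the last 8 nt of the assembled repeat should equal the 5′ crRNA handle. The sketch below does this for three rows of Table 1; the function names and the label "linker" for the optional single nucleotide are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# (5' motif, p, core, p', linker, 5' crRNA handle), copied from Table 1.
DR_ROWS = [
    ("GAAT", "CTC", "AAAAA", "GAG", "G", "ATTGAAAG"),
    ("GAGAAT", "CCCC", "AAA", "GGGG", "", "GTAGAAAC"),
    ("CCAGAA", "ATC", "AAAA", "GAT", "A", "GTTGAAAC"),
]


def assemble_dr(row):
    """Concatenate the table columns into the full direct repeat."""
    return "".join(row)


def stem_is_palindromic(row):
    """True when p' is the reverse complement of p."""
    _, p, _, p_prime, _, _ = row
    return reverse_complement(p) == p_prime


for row in DR_ROWS:
    dr = assemble_dr(row)
    print(dr, len(dr), stem_is_palindromic(row), dr[-8:] == row[-1])
```

Repeats assembled this way come out at 24 to 25 nt for the rows shown, at the short end of the 24–47 nt range quoted in the Introduction.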

A single array may include multiple families of DR sequences, as found in crispr1 of P. oguniense and crispr5 of P. neutrophilum. In these unusual cases, the DRs are clustered; for example in the P. neutrophilum case, the type I DR array begins with 11 repeats using the “AAGTT” core, followed by a set of four repeats mixing “AAAAA” with “AAAGA” cores, and terminating with three “AAAGA” core repeats. In P. oguniense, crispr1 has eight repeats with a 5′ motif of “GTCAAA” and five repeats with a 5′ motif of “CCAGAA.” In both cases where DR mixing was observed, the array type (based on CAS proteins) is maintained (Table 1). Previous studies in E. coli have shown that new DRs are added to an array during adaptation, by copying the first DR in the array (leader-proximal) (Yosef et al., 2012). We note that non-mixed arrays exist in P. neutrophilum whose leader-proximal repeats include the “AAAGA” and “AAGTT” cores. Potentially, DR mixing may come about through HR (duplication) events, or possibly by copying a leader-proximal DR from another array during adaptation.

A 5′ promoter-like sequence (AAAAACTTAAAAA) is ultra-conserved with only three single nt polymorphisms among all 37 CRISPR arrays in the six Pyrobaculum species studied. The same promoter-like element is also associated with some tRNA genes in these genomes.
The sequence variation in the corresponding promoter elements for other genes is commonly much more diverse. This finding suggests that the invariant CRISPR promoter sequence is maintained either through strong purifying selection or through frequent gene-conversion (Liao, 2000). CRISPR/CAS protein families appear to be associated with arrays of a given sequence family. This association is upheld to the CAS type, but does not extend to the subtype. For example, in P. islandicum, the only CAS family present is Type III-A (Figure 1) and the five encoded arrays in that species use a single DR type (Table 1). This same DR is also found in P. neutrophilum next to a Type III-B CAS cluster. In a second example, the mixed crispr1 in P. oguniense is made up of DRs associated with Type III-A CAS clusters as found in P. aerophilum, and Type III-B CAS clusters, as found in P. arsenaticum. Both of these examples demonstrate the association of CAS types (not subtypes) with CRISPR array families in the Pyrobaculum genus. Pre-crRNA transcripts are subjected to endonucleolytic processing to yield individual crRNA sequences, which we detect within small-RNA libraries. Deep sequencing from four Pyrobaculum species yielded thousands of sequencing reads, representing between 3% (P. arsenaticum) and 20% (P. islandicum) of the total sequencing reads in the 20–70 nt size range (Table 2).
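A scan for the conserved promoter-like element (AAAAACTTAAAAA) that tolerates a single-nucleotide polymorphism can be sketched as below. The one-mismatch budget is an assumption, motivated by the report of only single-nt polymorphisms across arrays; the function names and the leader sequence in the usage line are made up for illustration.

```python
MOTIF = "AAAAACTTAAAAA"


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq, motif=MOTIF, max_mismatches=1):
    """Return (position, mismatches) for each window within the budget."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        distance = hamming(seq[i:i + len(motif)], motif)
        if distance <= max_mismatches:
            hits.append((i, distance))
    return hits


# Hypothetical leader fragment, not taken from any Pyrobaculum genome.
print(find_motif("GGGAAAAACTTAAAAACC"))
```

Only the given strand is scanned; a search on the opposite strand would pass the reverse complement of the sequence.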

Table 2

CRISPR crRNA abundance (counts).

Species	CRISPR id	Type	Size	expo	stat	total
P. aerophilum	crispr1	III	13	361	146	507
	crispr2	III	17	342	91	433
	crispr3	I	80	1298	417	1715
	crispr5			degenerate array
	crispr7/6	III	11	305	101	406
	sum			2306	755	3061
	Total RNA			17,785	13,042	30,827
	crispr%			13.0%	5.8%	9.9%
P. arsenaticum	crispr2	I	34	178	339	517
	crispr3	I	84	183	230	413
	crispr4			degenerate array
	crispr5	III		degenerate array
	crispr6	I	6	5	10	15
	sum			366	579	945
	Total RNA			14,854	16,352	31,206
	crispr%			2.5%	3.5%	3.0%
P. islandicum	crispr1	III	17	691	455	1146
	crispr2	III	14	635	349	984
	crispr3	III	2	627	586	1213
	crispr4	III	3	594	416	1010
	crispr5	III	34	2363	1661	4024
	sum			4910	3467	8377
	Total RNA			28,128	14,823	42,951
	crispr%			17.5%	23.4%	19.5%
P. calidifontis	crispr1	III	2	545	340	885
	crispr2	III	3	302	226	528
	crispr3	III	2	156	150	306
	crispr4	I	8	180	85	265
	crispr5	I	35	233	270	503
	crispr6	I	36	274	248	522
	crispr7	I	2	12	13	25
	sum			1702	1332	3034
	Total RNA			22,102	17,192	39,294
	crispr%			7.7%	7.7%	7.7%
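The crispr% rows of Table 2 are the summed crRNA counts divided by the total small-RNA reads for each library. The sketch below recomputes them for two of the four species using the counts from the table; the dictionary layout and function name are hypothetical.

```python
# (exponential, stationary) read counts per array, copied from Table 2.
COUNTS = {
    "P. aerophilum": {
        "crispr1": (361, 146), "crispr2": (342, 91),
        "crispr3": (1298, 417), "crispr7/6": (305, 101),
    },
    "P. arsenaticum": {
        "crispr2": (178, 339), "crispr3": (183, 230), "crispr6": (5, 10),
    },
}
# Total small-RNA reads per library (exponential, stationary), from Table 2.
TOTAL_RNA = {
    "P. aerophilum": (17785, 13042),
    "P. arsenaticum": (14854, 16352),
}


def crispr_percent(species):
    """Percentage of reads mapping to CRISPR arrays, per phase and overall."""
    expo = sum(e for e, _ in COUNTS[species].values())
    stat = sum(s for _, s in COUNTS[species].values())
    total_expo, total_stat = TOTAL_RNA[species]
    return {
        "expo": 100.0 * expo / total_expo,
        "stat": 100.0 * stat / total_stat,
        "total": 100.0 * (expo + stat) / (total_expo + total_stat),
    }


for name in COUNTS:
    pct = crispr_percent(name)
    print(name, {phase: round(value, 1) for phase, value in pct.items()})
```

Degenerate arrays carry no counts in the table and are therefore left out of the sums.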

The abundance of individual crRNAs appears to be related to their position within the array (Figure 3). Abundance is generally highest when the spacer is located in the leader-proximal (5′) portion of the array, and decays distally (3′) (Figure 4), as seen in Pyrococcus (Hale et al., 2012). This pattern is evident in most Pyrobaculum arrays that contain more than five spacers. We also see significant variation in crRNA abundance against this decaying background pattern as described for Sulfolobus species (Zhang et al., 2012).

Figure 3

Figure 4

CRISPR small RNA abundance density. Abundance is depicted on the vertical axes, in each species at every spacer position in the array. The horizontal axes plot crRNA genetic origin in terms of spacer number within CRISPR arrays, starting at the 5′ end of arrays. The abundance profile appears higher at the 5′ portion of CRISPR arrays, with considerable variation deeper (3′) in the array. Multiple arrays of the same type within a species are plotted as adjacent bars by spacer position.

Small RNA expressed from a typical CRISPR array in The crispr2 array is shown, depicting the near-identical repeat regions (brown rectangles) with intervening spacer sequences. Mature crRNA reads (black bars) that map uniquely to individual spacers within CRISPR arrays are shown with strand indicated by interior arrowheads. The 5′ ends of crRNA are sharply terminated and includes an 8-base sequence (5′ handle boxed in red) derived from the upstream DR (inset panel). Images generated from the UCSC Archaeal browser (Chan et al., 2012). CRISPR annotation derived from CRISPRFinder (Grissa et al., 2007). CRISPR small RNA abundance density in Abundance is depicted on the vertical axes, in each species at every spacer position in the array. The horizontal axes plot crRNA genetic origin in terms of spacer number within CRISPR arrays, starting at the 5′ end of arrays. The abundance profile appears higher at the 5′ portion CRISPR arrays, with considerable variation deeper (3′) in the array. Multiple arrays of the same type within a species are plotted as adjacent bars by spacer position. The majority of terminal positions of sequencing reads found in Pyrobaculum species include an 8-base portion of the upstream DR at the 5′ end (Figure A1); this corresponds to the 5′ handle (Brouns et al., 2008) (Figure 3). We also see a minority population of sequencing reads that include a 5-base portion of the upstream DR (Figure A1), though these are not present in P. islandicum.

Figure A1

Distribution of mapped 5′ ends of crRNA-associated reads within CRISPR arrays in Pyrobaculum. The majority of sequencing reads begin at position −8 relative to the beginning of the associated spacer (position 0). This finding implies that most crRNA-associated sequencing reads include the 8-nucleotide 5′ handle sequence. A minority population of reads begins at position −5. A third population of sequencing reads begins at position −9; these may be an artifact of the terminal transferase activity of MMLV-derived reverse transcriptases. This activity most often adds a terminal cytosine residue to the 3′ end of the cDNA, yielding an implied “G” at the 5′ end of the sequencing read (Chenchik et al., 1996).

We tested two models for 3′ maturation: a ruler mechanism measured from the upstream DR, as seen in Staphylococcus species (Hatoum-Aslan et al., 2011), and a wrap-around model involving the downstream DR, as described for Pyrococcus furiosus (Wang et al., 2011). Because spacer sizes are not uniform in these species, we examined 3′ processing by testing distributions of 3′ end positions as measured from either the upstream DR or the downstream DR, under the assumption that spacer size variation would add noise to the incorrect model. Under the ruler-mechanism model, the 3′ distribution of end positions in P. aerophilum, P. arsenaticum, and P. calidifontis includes majority peaks at positions 40–41, and a minority peak at position 32 in P. aerophilum (Figure A2). Under the downstream DR-based wrap-around model (Figure A3), P. aerophilum has a reduced peak at −25 (corresponding to position 40 in the ruler-mechanism model) and the minority peak seen previously at position 32 is absent. We consider this evidence consistent with a ruler mechanism for P. aerophilum CRISPR systems. In the remaining species, this analysis was inconclusive.
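The logic of this comparison can be illustrated with a minimal Python sketch. It is not the authors' analysis: the function names, the `dr_len` value of 25 nt, and the toy reads are all hypothetical, chosen only to show why variable spacer lengths smear the distribution under the wrong reference frame.

```python
from collections import Counter

def peak_fraction(positions):
    """Fraction of reads that fall on the single most common position."""
    counts = Counter(positions)
    return max(counts.values()) / len(positions)

def compare_models(reads, dr_len):
    """Score the same 3' ends in two reference frames.

    reads: list of (spacer_len, end_from_spacer_start) tuples.
    Upstream (ruler) frame: 3' end measured from the start of the read's own spacer.
    Downstream frame: the same 3' end measured from the start of the next spacer,
    which lies spacer_len + dr_len further along (so values are negative).
    """
    upstream = [end for _, end in reads]
    downstream = [end - (spacer_len + dr_len) for spacer_len, end in reads]
    return peak_fraction(upstream), peak_fraction(downstream)

# Hypothetical toy input: a pure ruler mechanism that always cuts 40 nt
# after the spacer start, applied to spacers of varying length.
toy_reads = [(spacer_len, 40) for spacer_len in (38, 40, 40, 42, 44, 45)]
up, down = compare_models(toy_reads, dr_len=25)  # dr_len is an assumed value
print(up, down)
```

With these toy reads, every 3′ end coincides in the upstream frame, while the downstream frame spreads them across several positions. Note that a 40 nt spacer with the assumed 25 nt DR places upstream position 40 at downstream position −25, matching the correspondence described above.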

Figure A2

Distribution of mapped 3′ ends of crRNA associated sequencing reads, relative to the 5′ end of the associated spacer. This model proposes that cleavage of the 3′ end of crRNA associated reads utilizes a ruler-mechanism measured from the upstream DR. In P. aerophilum, P. arsenaticum, and P. calidifontis, the majority population of crRNA associated sequencing reads have a 3′ end centered around positions 40–41. A second minority population has a 3′ end centered around positions 32–35. In P. islandicum, the majority 3′ end is centered at positions 36–38, and a second minority 3′ end is centered around positions 42–46.

Figure A3

Distribution of mapped 3′ ends of crRNA associated sequencing reads, relative to the downstream direct repeat (position 0 is the start of the downstream spacer). This alternative model proposes that the downstream DR establishes the 3′ cut site. In P. aerophilum, the major population of 3′ ends shown for crispr1–3 (Figure A2, position 40) is much more diffuse when measured in relation to the downstream direct repeat; this suggests that the 3′ cleavage of crRNA is better modeled using the upstream DR as reference rather than the downstream DR. A single spacer region dominates the abundance of crRNA in the P. aerophilum crispr7 array; this abundance provides the peak at −25 (corresponding to position 40 in Figure A2). In the remaining species, an attempt to distinguish between models of the underlying 3′ cleavage position was inconclusive.

We find limited evidence for bidirectional CRISPR transcription as reported in Sulfolobus (Lillestol et al., 2009). Across all four Pyrobaculum species, in the selected 16–70 nt size range, fewer than 1% of the 15,417 CRISPR reads map to the reverse strand of the array. Where those antisense reads are present, they appear to originate within the spacers and terminate at poly-T motifs within the DR regions. With the limited number of reverse reads seen in this size range, it appears that transcription from the opposite strand is either not processed down to the size range studied, or that reverse transcripts are much less abundant in Pyrobaculum. Potentially, this negative finding could be the result of the ubiquitous poly-A sequence present in every DR studied in this genus (Table 1). We anticipate that the poly-A sequence could mimic a poly-T terminator on the reverse strand, and thereby prevent significant reverse strand transcription.
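As a cross-check, the 15,417 figure and the crispr% rows of the crRNA abundance table can be re-derived from the per-species "sum" and "Total RNA" rows. The Python sketch below uses the final count column of that table, which is the sum of the two preceding count columns. The dictionary layout and function names are illustrative, and labeling the first block of the table as P. aerophilum is an assumption based on its array names.

```python
# "sum" and "Total RNA" values from the final count column of the abundance table.
# The "P. aerophilum" label for the first block is assumed from its array names.
TABLE = {
    "P. aerophilum": {"crispr": 3061, "total": 30827},
    "P. arsenaticum": {"crispr": 945, "total": 31206},
    "P. islandicum": {"crispr": 8377, "total": 42951},
    "P. calidifontis": {"crispr": 3034, "total": 39294},
}

def crispr_percent(crispr_reads, total_reads):
    """CRISPR-derived reads as a percentage of all small RNA reads, to 0.1%."""
    return round(100.0 * crispr_reads / total_reads, 1)

def summarize(table):
    """Return per-species crispr%, the pooled CRISPR read count, and the
    largest whole number of reads that is still no more than 1% of that pool."""
    per_species = {
        name: crispr_percent(row["crispr"], row["total"])
        for name, row in table.items()
    }
    all_crispr = sum(row["crispr"] for row in table.values())
    one_percent_cap = all_crispr // 100
    return per_species, all_crispr, one_percent_cap

per_species, all_crispr, one_percent_cap = summarize(TABLE)
print(per_species, all_crispr, one_percent_cap)
```

The pooled count reproduces the 15,417 CRISPR reads quoted above, so "fewer than 1%" corresponds to at most 154 reverse-strand reads across the four species.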

Discussion

Within CRISPR arrays, we see an overabundance of reads emanating from the 5′ proximal portion in larger arrays, where transcription from these arrays is likely initiated from a single promoter. The polarity is not perfect given that the abundance of some distal spacers is greater in comparison to more proximal spacer positions. Clearly, there are a number of mechanisms or phenomena that could contribute to crRNA abundance across the array, including: (1) simple stochastic termination of the pre-crRNA transcript, (2) differential efficiency in the endonucleolytic processing of individual crRNAs, (3) transcriptional polarity within the array, (4) differential stability of individual crRNAs, (5) selective recovery and amplification of certain crRNA sequences during library preparation, and (6) recently evolved changes in spacer content (gain or loss or rearrangements) between the reference genome strain and the cultured strains used in our RNA-seq experiments. It is unknown which or how many of the six possibilities are most relevant, although our data do not equally favor all. If we consider a model of passive, stochastic termination of the primary transcript, we could explain the 5′ polarity but fail to account for the intermediate crRNA variation. Alternatively, a model where individual spacers are matured (excised) from pre-crRNA with varying efficiency might explain the variation in spacer abundance, but the 5′ polarity would be more difficult to accommodate. Instead, we tend toward a model that relies on coupling of pre-crRNA transcription with processing of the transcript, which might explain both polarity and the intermediate variation; for example, if transcription is aborted under conditions of limiting processing capability. 
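To make the limitation of the passive-termination model concrete, the following Python sketch computes the abundance profile it would predict. It is an illustration under simplifying assumptions that are not taken from our data: a constant, hypothetical per-spacer termination probability `p_terminate`, and equal processing efficiency and stability for every crRNA.

```python
def expected_abundance(n_spacers, p_terminate):
    """Relative abundance of each spacer, leader-proximal first, if the
    polymerase terminates with probability p_terminate before each
    additional spacer. The first spacer is normalized to 1.0."""
    return [(1.0 - p_terminate) ** k for k in range(n_spacers)]

def is_strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))

profile = expected_abundance(n_spacers=5, p_terminate=0.1)  # hypothetical values
print(profile, is_strictly_decreasing(profile))
```

Under these assumptions the predicted profile is a geometric decay that can only decrease along the array, so this model alone cannot produce a distal spacer that is more abundant than a proximal one.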
We note that some bacterial systems make use of rho-mediated termination, coupling transcription and translation in a manner that aborts transcription under conditions of limiting polysomes; this process yields an abundance polarity favoring genes that are near the 5′ end of an operon transcript. Recently, operon polarity has been described in the archaeon Thermococcus kodakaraensis (Santangelo et al., 2008). In a polarity model that couples CRISPR pre-crRNA transcription with crRNA processing, we hypothesize that given a limitation in processing by the CRISPR CAScade complex (or cmr-processing complex), the pre-crRNA transcript might be prematurely aborted, yielding an abundance of 5′ crRNA. Compelling evidence exists for incremental, endonucleolytic processing of the primary transcript in other species (Brouns et al., 2008; Hale et al., 2008). Under this 5′ polarity model, we would expect to see both polarity and a degree of variation in individual spacer abundance, which most closely matches our data. This model is necessarily incompatible with the 3′-to-5′ directional processing that has been suggested previously (Lillestol et al., 2006).

Within the Pyrobaculum genus, one of the conserved nurA-herA clusters of syntenic orthologs is always found next to a CRISPR array (Figure 2). This cluster includes csm6, a gene classified with the Type III-A CRISPR/CAS family. In every case observed, nurA-csm6 appear to be co-transcribed, in some cases with herA. The studied function of nurA-herA involves preparation of dsDNA ends as part of homologous recombination (HR) repair. If these genes participate in CRISPR processing, we suggest that they may be part of new spacer acquisition. That process requires the creation of a new DR and the integration of a novel spacer sequence into an existing array. Generally, this process yields an array with perfect DR copies, suggesting that the source of the novel DR sequence is an existing array element.
In this model, a nurA-herA protein complex could provide the HR activity required to repair the array incision. The phylogeny of the nurA-herA orthologous pairs suggests that they have been inherited vertically (Figure 2). Furthermore, a parsimonious interpretation of these gene trees indicates that the CRISPR-specific pair predates the divergence of Pyrobaculum species, and is well-represented across the Thermoproteaceae.

The DR sequences that are in use throughout Pyrobaculum are also remarkably conserved, with only three major sequence variants found, corresponding to the CAS proteins that make use of these structures. The structural conservation of the CAS operons is consistent across the Pyrobaculum clade, though not quite as invariant as seen in other archaeal or bacterial models. Finally, we find an ultra-conserved Pyrobaculum-specific promoter-like sequence across every CRISPR array examined. Taken together, we infer that the CRISPR system is endemic in the Pyrobaculum clade, and is unlikely to have been horizontally acquired through independent events for each of its members.

Cas6 is presumed to be responsible for cleavage of pre-crRNA, and through its association with the Cas complex is likely responsible for the association of Cas protein Types with CRISPR array families. Cas6 is believed to be responsible for recognition and cleavage of pre-crRNA (Hale et al., 2008). In Type I complexes, CAScade (Brouns et al., 2008) and aCAScade (Lintner et al., 2011), Cas6 is a co-purifying member of the complex. In Type III systems where Cas6 does not appear to be part of the Cas complex, specific proteins that are members of the complex are required for maturation of crRNA (Hatoum-Aslan et al., 2011). Furthermore, the binding of Cas6 in Pseudomonas aeruginosa has been shown to be quite specific (Sternberg et al., 2012), and in S. solfataricus, there are five distinct Cas6 proteins possibly specialized for specific repeats (Zhang et al., 2012).
Taken together, we suggest that Cas6 mediates the association between Cas protein families and CRISPR array families in Pyrobaculum species. This mediation may be by direct participation in the Cas complex (Type I systems), or through an indirect association as suggested for Type III systems.

Our transcriptional data clearly show that the P. islandicum Type III-A system is capable of generating mature crRNA from each of its five arrays. This Type III-A system is operating without cas1, cas2, or csm6. In Pyrococcus abyssi, the Type I-A system generates crRNA (Phok et al., 2011) and is also missing cas1 and cas2. Possibly one or both of these systems has an alternative enzymatic method for incorporating novel spacers without Cas1, or one or both of these systems may be incapable of CRISPR adaptation. The missing csm6 in P. islandicum is equally surprising given that it has been considered essential in the Type III-A (csm) system, the only system present in this species. Establishing whether P. islandicum is still capable of CRISPR adaptation could be a first step in identifying an alternative mechanism for spacer incorporation.

The classification system of Makarova et al. (2011) has been instrumental in coordinating diverse efforts across the field of CRISPR research. As we examine new phylogenetic clades in detail, we have a convenient mechanism both for classifying our findings and for adding variations brought into focus by new groups. In light of our new analyses, the consolidation of csa1 (described herein as cas4′) with cas4 may not be justified, as this would suggest many Pyrobaculum submodule examples with two copies of cas4 (cas4′-cas1-cas2-cas4). Alternatively, we suggest that the functions of cas4 and cas4′ (csa1) are distinct in Pyrobaculum and should be uniquely classified.
Furthermore, we find csm6 (previously named APE2256) deeply associated with a CRISPR-associated nurA-herA pair, and not apparently part of the Type III-A module where it is currently classified. Finally, we observe that the csx1 classification (part of Type III-U) given to the numerous Pyrobaculum genes encoding a DXTHG domain (or MJ1666-like protein) may not be optimal; in Pyrobaculum, these genes appear to be found among Type I and III systems. Clearly, the unique comparative perspective afforded by Pyrobaculum provides numerous opportunities for future discovery.

Author contributions

David L. Bernick designed and performed the experimental and computational analyses, and wrote the manuscript. Courtney L. Cox provided the analysis of the CRISPR promoter sequence conservation. Patrick P. Dennis provided assistance with the manuscript and collaborative review. Todd M. Lowe provided scientific advising, suggested analyses, and edited the manuscript.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

38 in total

1. Gene conversion drives within genic sequences: concerted evolution of ribosomal RNA genes in bacteria and archaea.

Authors: D Liao
Journal: J Mol Evol Date: 2000-10 Impact factor: 2.395

2. Identification of genes that are associated with DNA repeats in prokaryotes.

Authors: Ruud Jansen; Jan D A van Embden; Wim Gaastra; Leo M Schouls
Journal: Mol Microbiol Date: 2002-03 Impact factor: 3.501

3. BLAT--the BLAST-like alignment tool.

Authors: W James Kent
Journal: Genome Res Date: 2002-04 Impact factor: 9.043

4. A bipolar DNA helicase gene, herA, clusters with rad50, mre11 and nurA genes in thermophilic archaea.

Authors: F Constantinesco; P Forterre; E V Koonin; L Aravind; C Elie
Journal: Nucleic Acids Res Date: 2004-02-27 Impact factor: 16.971

5. Basic local alignment search tool.

Authors: S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal: J Mol Biol Date: 1990-10-05 Impact factor: 5.469

6. A putative viral defence mechanism in archaeal cells.

Authors: Reidun K Lillestøl; Peter Redder; Roger A Garrett; Kim Brügger
Journal: Archaea Date: 2006-08 Impact factor: 3.273

7. CRISPR provides acquired resistance against viruses in prokaryotes.

Authors: Rodolphe Barrangou; Christophe Fremaux; Hélène Deveau; Melissa Richards; Patrick Boyaval; Sylvain Moineau; Dennis A Romero; Philippe Horvath
Journal: Science Date: 2007-03-23 Impact factor: 47.728

8. Diversity of Antisense and Other Non-Coding RNAs in Archaea Revealed by Comparative Small RNA Sequencing in Four Pyrobaculum Species.

Authors: David L Bernick; Patrick P Dennis; Lauren M Lui; Todd M Lowe
Journal: Front Microbiol Date: 2012-07-02 Impact factor: 5.640

9. A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes.

Authors: Daniel H Haft; Jeremy Selengut; Emmanuel F Mongodin; Karen E Nelson
Journal: PLoS Comput Biol Date: 2005-11-11 Impact factor: 4.475

10. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats.

Authors: Ibtissem Grissa; Gilles Vergnaud; Christine Pourcel
Journal: BMC Bioinformatics Date: 2007-05-23 Impact factor: 3.169
