
Ares-GT: Design of guide RNAs targeting multiple genes for CRISPR-Cas experiments.

Eugenio Gómez Minguet

Abstract

Guide RNA design for CRISPR genome editing of gene families is a challenging task: good candidate sgRNAs are usually tagged with low scores precisely because they match several locations in the genome, so time-consuming manual evaluation of targets is required. To address this issue, I have developed ARES-GT, a local command-line Python tool compatible with any operating system. ARES-GT allows the selection of candidate sgRNAs that match multiple input query sequences, in addition to candidate sgRNAs that specifically match each query sequence. It also supports unmapped contigs as well as complete genomes, thus allowing the use of any genome provided by the user and handling intraspecies allelic variability and individual polymorphisms. ARES-GT is available at GitHub (https://github.com/eugomin/ARES-GT.git).

Year:  2020        PMID: 33085710      PMCID: PMC7577430          DOI: 10.1371/journal.pone.0241001

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The design of optimal single guide RNAs (sgRNAs) is a critical step in CRISPR/Cas genome editing: it must ensure specificity and minimize the possibility of offtarget mutations. Although good online tools for the identification of CRISPR DNA targets are available, and have helped popularize genome editing, their use is limited to a restricted list of genomes [1-6], sometimes covering fewer than ten species [7, 8]. Even Breaking-Cas [9], a free online tool that currently offers more than 1600 genomes, lacks the flexibility to easily incorporate unpublished genomes or to consider genomes of populations with allelic variants, an issue partially addressed by AlleleAnalyzer for the human genome [10]. Several command-line tools, such as sgRNAcas9 [11] or CRISPRseek [12], offer more flexibility because they can incorporate any genome provided by the user. However, designing sgRNAs that target gene families poses an additional problem: good candidate sgRNAs can be tagged with low scores precisely because they match several locations in the genome, so time-consuming manual evaluation of targets is required. To address this issue, I have developed ARES-GT, a local command-line tool written in Python.

Methods

ARES-GT

ARES-GT is written in Python (https://www.python.org/), so it can be run on any operating system. The software is available at GitHub (https://github.com/eugomin/ARES-GT.git): version 2.0 runs on Python 2.7, while version 2.0.1 is updated to Python 3.8. In addition to the standard sys and re modules, ARES-GT requires the third-party regex module (https://pypi.org/project/regex/). The complete analyses presented in this work took between 3 and 12 hours each, depending on the analysis, on a Linux server running Ubuntu 18.04 LTS with an Intel Xeon 2.0 GHz processor and 32 GB RAM. When the “OR” option is selected (restricting the analysis to candidates that match several query sequences), the same analyses took 15 min or less. Running time depends directly on the number of query sequences, the genome size and the selected parameters.

Genome sequences

The Arabidopsis reference genome (Col-0) was obtained from TAIR (www.arabidopsis.org). Good-quality genome assemblies of seven A. thaliana accessions (An-1, C24, Cvi, Eri, Kyo, Ler and Sha) [13] were downloaded from the Arabidopsis 1001 Genomes project (https://1001genomes.org/), and the Cardamine hirsuta genome from its genetic and genomic resource (http://chi.mpipz.mpg.de/index.html). All sequences of CBF genes are available in S1 File.

CBF genes

Genomic sequences of Arabidopsis thaliana CBF genes (AtCBFs) were obtained from TAIR (https://www.arabidopsis.org/), corresponding to Col-0 TAIR v10. Genomic sequences of AtCBF homologs in C. hirsuta were identified by BLAST in the C. hirsuta genetic and genomic resource (http://chi.mpipz.mpg.de/index.html) using the AtCBF protein sequences, and supported by alignment with ClustalX2 [14]. Ecotype-specific genomic sequences of each CBF gene were retrieved using the genomic coordinates reported in the ARES-GT results obtained with the Col-0 AtCBFs as query.
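Retrieving a sequence from reported coordinates can be sketched in a few lines of Python. This is an illustration, not ARES-GT code: the function names are hypothetical, and it assumes 1-based, inclusive coordinates, which agrees with the Cas9 entries in Table 1 (for example, Cas9AtCBF1_014 in AtCBF2 spans 13015820 to 13015842, i.e. 23 positions for a 20-nt target plus a 3-nt PAM).

```python
# Minimal sketch (not ARES-GT code): retrieve a genomic region from coordinates.
# Assumption: coordinates are 1-based and inclusive.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(genome, chrom, start, end, strand="+"):
    """Return genome[chrom] from start to end (1-based, inclusive).

    genome maps sequence names (chromosomes or contigs) to sequences;
    for strand "-" the reverse complement of the region is returned.
    """
    if start < 1 or end < start or end > len(genome[chrom]):
        raise ValueError("expected 1 <= start <= end <= sequence length")
    region = genome[chrom][start - 1:end]
    return reverse_complement(region) if strand == "-" else region


if __name__ == "__main__":
    toy_genome = {"4": "AAACCCGGGTTT"}  # hypothetical toy sequence
    print(extract_region(toy_genome, "4", 1, 4))       # AAAC
    print(extract_region(toy_genome, "4", 1, 4, "-"))  # GTTT
```

In practice the accession genome would be loaded into such a dictionary and queried with the coordinates reported for that accession.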

Results

Identification of CRISPR target candidates

The high sequence similarity within gene families increases the likelihood that members also share CRISPR targets, either as perfect matches or with few mismatches. Although such shared targets are especially interesting for targeting multiple members of the same family, they are usually discarded or given low scores. Like other available software, ARES-GT starts by identifying all candidate guide RNAs in the query sequences and then uses the reference genome to find possible offtargets, but it adds a step that evaluates which guide sequences match several query sequences. Offtarget evaluation is based on a mismatch criterion. The specificity of both Cas9 and Cas12a has been reported to be particularly sensitive to mismatches in the PAM-proximal sequence (an 11- and 8-nucleotide stretch for Cas9 and Cas12a, respectively), named the “seed” [15-18]. Mismatches in the seed sequence have a critical impact on cleavage efficiency at the DNA target, and seed sequences with 2 or more mismatches are unlikely to cause real offtargets in vivo. Sequence composition and the number and distribution of mismatches also affect cleavage efficiency [15]. Therefore, the ARES-GT algorithm discards possible offtargets that carry 2 or more mismatches in the seed sequence, while the user defines the mismatch threshold outside the seed sequence. In addition, the user must indicate whether a “NAG” PAM, which Cas9 can recognize albeit with lower efficiency [15], should be taken into account when evaluating possible Cas9 offtargets. ARES-GT can identify targets of the two most widely used CRISPR enzymes (Cas9 and Cas12a/Cpf1) and evaluates possible offtargets in a user-provided reference genome, including non-assembled contigs and unpublished genomes from any species.
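The seed-based offtarget criterion can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the ARES-GT source: protospacers are compared 5'→3' without the PAM, the seed is taken as the 11 PAM-proximal nucleotides for Cas9 (3' end) and the 8 PAM-proximal nucleotides for Cas12a (5' end), and the NAG option is not modeled. The parameter names `l0` and `l1` are hypothetical; reading L0 as the distal-mismatch limit when the seed matches perfectly and L1 as the limit when the seed carries one mismatch is an assumption, chosen because it reproduces the C24 example in Table 5.

```python
# Sketch of a seed-aware offtarget filter (illustrative; not the ARES-GT source).
# Assumption: seed = 11 PAM-proximal nt for Cas9 (3' end of the protospacer)
# and 8 PAM-proximal nt for Cas12a (5' end of the protospacer).
SEED_LENGTH = {"Cas9": 11, "Cas12a": 8}


def split_mismatches(guide, site, enzyme):
    """Return (seed_mismatches, distal_mismatches) between two protospacers.

    Both sequences are given 5'->3' on the same strand, without the PAM.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    n = SEED_LENGTH[enzyme]
    length = len(guide)
    # Cas9 PAM lies 3' of the protospacer; Cas12a PAM lies 5' of it.
    seed = range(length - n, length) if enzyme == "Cas9" else range(n)
    seed_mm = distal_mm = 0
    for i, (a, b) in enumerate(zip(guide.upper(), site.upper())):
        if a != b:
            if i in seed:
                seed_mm += 1
            else:
                distal_mm += 1
    return seed_mm, distal_mm


def is_possible_offtarget(guide, site, enzyme, l0=4, l1=3):
    """Report a site as a possible offtarget unless the mismatch rules discard it.

    Hypothetical reading of the thresholds: l0 = maximum distal mismatches when
    the seed matches perfectly; l1 = maximum when the seed has one mismatch.
    """
    seed_mm, distal_mm = split_mismatches(guide, site, enzyme)
    if seed_mm >= 2:
        return False  # 2 or more seed mismatches: discarded
    limit = l0 if seed_mm == 0 else l1
    return distal_mm <= limit


if __name__ == "__main__":
    # C24 example (Table 5): Cas12a protospacer = reverse complement of the
    # minus-strand entry TCGGAGCCAAACATTTCAGA.
    guide = "TCTGAAATGTTTGGCTCCGA"
    print(is_possible_offtarget(guide, "TCTGAAATGTGCAGTTCCGA", "Cas12a"))  # True (C24 chr 3)
    print(is_possible_offtarget(guide, "TCTGAAAGGTGCAGTTCCGA", "Cas12a"))  # False (Col-0 chr 3)
```

Under this reading, the C24 site (0 seed and 4 distal mismatches) is kept, whereas the Col-0 site (1 seed and 4 distal mismatches) exceeds L1, matching the behavior described for Table 5.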
A list is generated with the best candidates (those with no offtargets according to the parameters selected by the user) and, if multiple query genes from the same family are targeted, the list includes sgRNAs that match more than one of them. Detailed information for each possible target is also provided, including an alignment with the possible offtargets. ARES-GT has already been used successfully in Arabidopsis, tomato and rice during its development [19, 20].
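The multi-query step can be illustrated with a minimal Cas9-only sketch that assumes exact matching: it collects every 20-nt protospacer followed by an NGG PAM on either strand of each query and reports those found in more than one query. ARES-GT itself also handles Cas12a, NAG PAMs and mismatches, which are omitted here; the function names and the toy query fragments (built around the Cas9AtCBF1_015 target from Table 1, with invented flanks) are hypothetical.

```python
import re
from collections import defaultdict

# Lookahead so that overlapping sites are all returned: 20 nt followed by NGG.
CAS9_SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def cas9_protospacers(seq):
    """Return the set of 20-nt protospacers with an NGG PAM on either strand."""
    seq = seq.upper()
    found = set()
    for strand_seq in (seq, reverse_complement(seq)):
        found.update(m.group(1) for m in CAS9_SITE.finditer(strand_seq))
    return found


def shared_candidates(queries):
    """Map each protospacer present in two or more queries to the query names."""
    hits = defaultdict(set)
    for name, seq in queries.items():
        for protospacer in cas9_protospacers(seq):
            hits[protospacer].add(name)
    return {p: sorted(names) for p, names in hits.items() if len(names) > 1}


if __name__ == "__main__":
    queries = {  # hypothetical toy fragments
        "AtCBF1": "AAGAGCTGCCATCTCAGCGGTTTGGAA",
        "AtCBF2": "CCGAGCTGCCATCTCAGCGGTTTGGTT",
        "AtCBF4": "AAAAAAAAAAAAAAAAAAAAAAAAAA",
    }
    print(shared_candidates(queries))
    # {'GAGCTGCCATCTCAGCGGTT': ['AtCBF1', 'AtCBF2']}
```

Each shared protospacer would then go through the offtarget evaluation against the reference genome before being reported as a multi-target candidate.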

Design of guide RNA matching multiple CBF genes

As a proof of concept, I chose the C-repeat/DRE-Binding Factor (CBF) gene family of plant transcription factors to test the various novelties implemented in ARES-GT. Among the four members identified in Arabidopsis thaliana, three (AtCBF1, AtCBF2 and AtCBF3) have been implicated in the response to cold temperatures, while AtCBF4 has been implicated in the response to drought [21, 22]. The first three members of this family are clustered within less than 8 kb on chromosome 4 (Fig 1A), which makes it extremely difficult to obtain a triple mutant by a classical crossing strategy. Such a triple mutant has recently been obtained by CRISPR/Cas9-induced mutagenesis [23] using two sgRNAs that the authors selected by manual evaluation of sequence alignments, manual selection of candidates, and specificity verification with CRISPR-P [1]. I used the A. thaliana genomic coding sequences (TAIR v10) of the four CBF genes as a multiple query in ARES-GT to search for candidate sgRNAs for both Cas9 and Cas12a. In total, 96 Cas9 and 34 Cas12a unique specific targets, matching only one location in the genome and with no predicted offtargets, were found across the four genes. More interestingly, the program also listed 13 Cas9 and 10 Cas12a candidates that match multiple CBF genes (Tables 1 and 2). Of these, 10 Cas9 and 5 Cas12a candidates match more than one CBF gene and do not present any offtarget outside the CBF genes (Fig 1B and 1C). They include the two sequences previously reported [23], which correspond to Cas9AtCBF1_015 and Cas9AtCBF2_124 in this work.
Fig 1

sgRNA targets in CBF genes.

A) Genomic distribution of CBF genes on Arabidopsis thaliana chromosomes 4 and 5. Location of Cas9 (B) and Cas12a (C) candidates with multiple CBF gene targets. (*) Asterisks mark candidates corresponding to previously reported sgRNAs (Cho et al., 2017).

Table 1

Multiple targets Cas9 candidates for AtCBF genes.

All possible genome targets and offtargets (with ARES-GT thresholds: L0 = 4 and L1 = 3) of each candidate are listed with indication of genome coordinates (TAIR v10) and whether it corresponds to a CBF gene. In alignments, black boxes mark mismatches and a space separates PAM (NGG or NAG) from sequence. Differences in the “N” position in the PAM are not marked.

Targets + Offtargets (L0 = 4, L1 = 3)

| Candidate ID (A. thaliana) | Gene | Chrom | Start | End | Sense | Sequence |
|---|---|---|---|---|---|---|
| Cas9AtCBF1_014 | AtCBF2 | 4 | 13015820 | 13015842 | + | AGCACGAGCTGCCATCTCAG CGG |
| | AtCBF1 | 4 | 13022305 | 13022327 | + | AGCACGAGCTGCCATCTCAG CGG |
| | AtCBF3 | 4 | 13018737 | 13018759 | + | AGCTCGAGCTGCCATCTCAG CGG |
| Cas9AtCBF1_015 | AtCBF2 | 4 | 13015825 | 13015847 | + | GAGCTGCCATCTCAGCGGTT TGG |
| | AtCBF1 | 4 | 13022310 | 13022332 | + | GAGCTGCCATCTCAGCGGTT TGG |
| Cas9AtCBF1_018 | AtCBF2 | 4 | 13015920 | 13015942 | + | TGACGAACTCCTCTGTAAAT TGG |
| | AtCBF1 | 4 | 13022405 | 13022427 | + | TGACGAACTCCTCTGTAAAT TGG |
| | AtCBF4 | 5 | 21117612 | 21117634 | + | TGACGAACTCCTCTGTAAAT CGG |
| | AtCBF3 | 4 | 13018837 | 13018859 | + | CGACGAACTCCTCTGTATAT TGG |
| Cas9AtCBF1_019 | AtCBF2 | 4 | 13015921 | 13015943 | + | GACGAACTCCTCTGTAAATT GGG |
| | AtCBF1 | 4 | 13022406 | 13022428 | + | GACGAACTCCTCTGTAAATT GGG |
| | ---- | 1 | 1597274 | 1597296 | + | CACAATCTCCTCTGTAAATT CAG |
| | AtCBF3 | 4 | 13018838 | 13018860 | + | GACGAACTCCTCTGTATATT GGG |
| Cas9AtCBF1_051 | AtCBF2 | 4 | 13015738 | 13015760 | - | CCG GGATTCGTAGCCGCCAAGCC |
| | AtCBF1 | 4 | 13022223 | 13022245 | - | CCG GGATTCGTAGCCGCCAAGCC |
| Cas9AtCBF1_056 | AtCBF2 | 4 | 13015831 | 13015853 | - | CCA TCTCAGCGGTTTGGAAAGTC |
| | AtCBF1 | 4 | 13022316 | 13022338 | - | CCA TCTCAGCGGTTTGGAAAGTC |
| | AtCBF3 | 4 | 13018748 | 13018770 | - | CCA TCTCAGCGGTTTGAAATGTT |
| Cas9AtCBF1_061 | AtCBF2 | 4 | 13015900 | 13015922 | - | CCC ACTTACCGGAGTTTCTTTGA |
| | AtCBF1 | 4 | 13022385 | 13022407 | - | CCC ACTTACCGGAGTTTCTTTGA |
| | AtCBF3 | 4 | 13018817 | 13018839 | - | CCC ACTTACCGGAGTTTCTCCGA |
| Cas9AtCBF1_062 | AtCBF2 | 4 | 13015901 | 13015923 | - | CCA CTTACCGGAGTTTCTTTGAC |
| | AtCBF1 | 4 | 13022386 | 13022408 | - | CCA CTTACCGGAGTTTCTTTGAC |
| | AtCBF3 | 4 | 13018818 | 13018840 | - | CCA CTTACCGGAGTTTCTCCGAC |
| Cas9AtCBF1_063 | AtCBF2 | 4 | 13015908 | 13015930 | - | CCG GAGTTTCTTTGACGAACTCC |
| | AtCBF1 | 4 | 13022393 | 13022415 | - | CCG GAGTTTCTTTGACGAACTCC |
| | ---- | 2 | 6123419 | 6123441 | - | CCC GACTTTCTTTGAAGAACTCC |
| Cas9AtCBF1_064 | AtCBF2 | 4 | 13015929 | 13015951 | - | CCT CTGTAAATTGGGTGACGAGT |
| | AtCBF1 | 4 | 13022414 | 13022436 | - | CCT CTGTAAATTGGGTGACGAGT |
| | AtCBF3 | 4 | 13018846 | 13018868 | - | CCT CTGTATATTGGGTGACGAGT |
| | ---- | 1 | 4290740 | 4290762 | - | CCT CTGTAAACTGGGTGACGTGT |
| | ---- | 1 | 23368054 | 23368076 | - | CCT CTGTAGATTGGGTGACGTGT |
| | AtCBF4 | 5 | 21117621 | 21117643 | - | CCT CTGTAAATCGGATGACGTGT |
| Cas9AtCBF2_081 | AtCBF2 | 4 | 13015760 | 13015782 | + | CGAGTCAGCGAAATTGAGAC AGG |
| | AtCBF3 | 4 | 13018677 | 13018699 | + | CGAGTCAGCGAAATTGAGAC AGG |
| | AtCBF4 | 5 | 21117452 | 21117474 | + | AGAATCAGCGAAATTGAGAC AAG |
| Cas9AtCBF2_123 | AtCBF2 | 4 | 13015754 | 13015776 | - | CCA AGCCGAGTCAGCGAAATTGA |
| | AtCBF3 | 4 | 13018671 | 13018693 | - | CCA AGCCGAGTCAGCGAAATTGA |
| | AtCBF1 | 4 | 13022239 | 13022261 | - | CCA AGCCGAGTCAGCGAAGTTGA |
| Cas9AtCBF2_124 | AtCBF2 | 4 | 13015759 | 13015781 | - | CCG AGTCAGCGAAATTGAGACAG |
| | AtCBF3 | 4 | 13018676 | 13018698 | - | CCG AGTCAGCGAAATTGAGACAG |
| | AtCBF1 | 4 | 13022244 | 13022266 | - | CCG AGTCAGCGAAGTTGAGACAT |
Table 2

Multiple targets Cas12a candidates for AtCBF genes.

All possible genome targets and offtargets (with ARES-GT thresholds: L0 = 4 and L1 = 3) of each candidate are listed with indication of genome coordinates (TAIR v10) and whether it corresponds to a CBF gene. In alignments, black boxes mark mismatches and a space separates PAM (TTTN) from sequence. Differences in the “N” position in the PAM are not marked.

Targets + Offtargets (L0 = 4, L1 = 3)

| Candidate ID (A. thaliana) | Gene | Chrom | Start | End | Sense | Sequence |
|---|---|---|---|---|---|---|
| Cas12aAtCBF1_011 | AtCBF2 | 4 | 13015814 | 13015837 | - | GCTGCCATCTCAGCGGTTTG GAAA |
| | AtCBF1 | 4 | 13022299 | 13022322 | - | GCTGCCATCTCAGCGGTTTG GAAA |
| Cas12aAtCBF1_012 | AtCBF2 | 4 | 13015827 | 13015850 | - | CGGTTTGGAAAGTCCCGAGC CAAA |
| | AtCBF1 | 4 | 13022312 | 13022335 | - | CGGTTTGGAAAGTCCCGAGC CAAA |
| | ---- | 1 | 27242286 | 27242310 | + | TTTG GCTCGGGACTTTCAACACAG |
| | ---- | 3 | 8296023 | 8296047 | + | TTTG GCTCGGGACGTTCGAAAGCG |
| | ---- | 5 | 17806910 | 17806934 | + | TTTG GCTCGGGACATTCGACACGG |
| | ---- | 5 | 21618544 | 21618567 | - | CCGTCTCAAAAGTCCCGAGC CAAA |
| | ---- | 4 | 7932903 | 7932927 | + | TTTG GCTCGGCACTTTTGAAACCG |
| | ---- | 4 | 10190722 | 10190745 | - | CAGTTTGGAACGTTCCGAGC CAAA |
| | AtCBF3 | 4 | 13018744 | 13018767 | - | CGGTTTGAAATGTTCCGAGC CAAA |
| Cas12aAtCBF1_014 | AtCBF2 | 4 | 13015902 | 13015925 | - | TTCTTTGACGAACTCCTCTG TAAA |
| | AtCBF1 | 4 | 13022387 | 13022410 | - | TTCTTTGACGAACTCCTCTG TAAA |
| | AtCBF4 | 5 | 21117594 | 21117617 | - | TCCTCTGACGAACTCCTCTG TAAA |
| Cas12aAtCBF1_015 | AtCBF2 | 4 | 13015924 | 13015947 | - | AATTGGGTGACGAGTCTCAC GAAA |
| | AtCBF1 | 4 | 13022409 | 13022432 | - | AATTGGGTGACGAGTCTCAC GAAA |
| | AtCBF3 | 4 | 13018841 | 13018864 | - | TATTGGGTGACGAGTCTCAC GAAA |
| | AtCBF4 | 5 | 21117616 | 21117639 | - | AATCGGATGACGTGTCTCAC GAAA |
| Cas12aAtCBF1_017 | AtCBF2 | 4 | 13016031 | 13016054 | - | AATCGGAGCCAAACATTTCA GAAA |
| | AtCBF3 | 4 | 13018948 | 13018971 | - | AATCGGAGCCAAACATTTCA GAAA |
| | AtCBF1 | 4 | 13022507 | 13022530 | - | AATCGGAGCCAAACATTTCA GAAA |
| | ---- | 1 | 8279033 | 8279056 | - | AATCAGAGCCTAACACTTCA AAAA |
| | ---- | 3 | 9399469 | 9399493 | + | TTTA TGAAGTGTTTGGTTCCTATT |
| Cas12aAtCBF1_018 | AtCBF2 | 4 | 13016032 | 13016055 | - | ATCGGAGCCAAACATTTCAG AAAA |
| | AtCBF3 | 4 | 13018949 | 13018972 | - | ATCGGAGCCAAACATTTCAG AAAA |
| | AtCBF1 | 4 | 13022508 | 13022531 | - | ATCGGAGCCAAACATTTCAG AAAA |
| | ---- | 1 | 9505057 | 9505081 | + | TTTG CTGAAATGGTTGCCTCTAAT |
| Cas12aAtCBF1_019 | AtCBF3 | 4 | 13018950 | 13018973 | - | TCGGAGCCAAACATTTCAGA AAAA |
| | AtCBF1 | 4 | 13022509 | 13022532 | - | TCGGAGCCAAACATTTCAGA AAAA |
| Cas12aAtCBF1_024 | AtCBF2 | 4 | 13015842 | 13015865 | + | TTTG GAAAGTCCCGAGCCAAATCC |
| | AtCBF1 | 4 | 13022327 | 13022350 | + | TTTG GAAAGTCCCGAGCCAAATCC |
| | ---- | 3 | 8296020 | 8296043 | - | GGGTTTGGCTCGGGACGTTC GAAA |
| Cas12aAtCBF1_028 | AtCBF2 | 4 | 13015913 | 13015936 | + | TTTC TTTGACGAACTCCTCTGTAA |
| | AtCBF1 | 4 | 13022398 | 13022421 | + | TTTC TTTGACGAACTCCTCTGTAA |
| | ---- | 5 | 16311156 | 16311179 | + | TTTT TTTGACGAATTTCTCTGTGG |
| Cas12aAtCBF1_029 | AtCBF2 | 4 | 13015917 | 13015940 | + | TTTG ACGAACTCCTCTGTAAATTG |
| | AtCBF1 | 4 | 13022402 | 13022425 | + | TTTG ACGAACTCCTCTGTAAATTG |

To test that ARES-GT can work with any user-provided genome, including unmapped contigs, I selected the first version of the Cardamine hirsuta genome [24]. The available genome sequence spans its 8 chromosomes but also contains 622 unmapped contigs, in addition to the chloroplast and mitochondrial genomes. The sequence information was downloaded and used locally with ARES-GT to search for CRISPR targets in the four C. hirsuta CBF homologs. In addition to unique specific targets (86 for Cas9 and 28 for Cas12a), 10 candidate sgRNAs for Cas9 and 3 for Cas12a were identified that perfectly match ChCBF1 and ChCBF2 (Table 3). Taking possible offtargets into account, only 5 Cas9 and 3 Cas12a sequences are reliable candidate sgRNAs targeting only ChCBF family genes. For instance, Cas9ChCBF1_044 perfectly matches ChCBF1 and ChCBF2, and also matches ChCBF3 with one mismatch.
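Because a reference genome can be read simply as a set of named sequences, unmapped contigs (such as the NSCAFA contig that appears in Table 3) can be searched exactly like chromosomes. A minimal sketch of that idea follows; it is not ARES-GT code, the function names and toy records are hypothetical, and it uses exact, plus-strand-only matching for brevity (a real search would also scan the reverse complement and tolerate mismatches).

```python
def read_fasta(lines):
    """Parse FASTA lines into {record name: upper-case sequence}.

    Every record is kept, whether it is a chromosome, an organelle genome
    or an unmapped contig; the name is the first word of the header.
    """
    records = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def find_exact(records, site):
    """Yield (record, start, end) with 1-based inclusive coordinates of exact hits."""
    site = site.upper()
    for name, seq in records.items():
        pos = seq.find(site)
        while pos != -1:
            yield name, pos + 1, pos + len(site)
            pos = seq.find(site, pos + 1)


if __name__ == "__main__":
    fasta = [">Chr4 chromosome", "ACGTACGT", "TTGCA",
             ">NSCAFA.444 unmapped contig", "ggGTTTCC"]  # hypothetical toy records
    print(list(find_exact(read_fasta(fasta), "GTTT")))
    # [('Chr4', 7, 10), ('NSCAFA.444', 3, 6)]
```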
Table 3

Multiple targets Cas9 and Cas12a candidates for ChCBF genes.

All possible genome targets and offtargets (with ARES-GT thresholds: L0 = 4 and L1 = 3) of each candidate are listed with indication of genome coordinates (Cardamine hirsuta v1.0) and whether it corresponds to a CBF gene. In alignments, black boxes mark mismatches and a space separates PAM (NGG/NAG or TTTN) from sequence. Differences in the “N” position in the PAM are not marked.

Targets + Offtargets (L0 = 4, L1 = 3)

| Candidate ID (C. hirsuta) | Gene | Chrom | Start | End | Sense | Sequence |
|---|---|---|---|---|---|---|
| Cas9ChCBF1_004 | ChCBF2 | 4 | 6514798 | 6514820 | + | AGCTGTCCCAAGAAACCAGC TGG |
| | ChCBF1 | 7 | 17908883 | 17908905 | - | CCG GCTGGTTTCTTGGGACAGCT |
| Cas9ChCBF1_010 | ChCBF2 | 4 | 6514878 | 6514900 | + | CTCCGGTAAGTGGGTGTGTG AGG |
| | ChCBF1 | 7 | 17908803 | 17908825 | - | CCT CACACACCCACTTACCGGAG |
| Cas9ChCBF1_018 | ChCBF2 | 4 | 6514910 | 6514932 | + | CAAACAAGAAATCTAGGATT TGG |
| | ChCBF1 | 7 | 17908771 | 17908793 | - | CCA AATCCTAGATTTCTTGTTTG |
| | ChCBF3 | 8 | 13812274 | 13812296 | - | CCA AATCCTCGATTTCTTGTTAG |
| | ---- | 5 | 18638271 | 18638293 | - | CTT AATCCTACATTTGTAGTTTG |
| | ---- | 5 | 21152837 | 21152859 | - | CTT AATCCTACATTTCTGGTTTT |
| Cas9ChCBF1_013 | ChCBF2 | 4 | 6514915 | 6514937 | + | AAGAAATCTAGGATTTGGCT TGG |
| | ChCBF1 | 7 | 17908766 | 17908788 | - | CCG AGCCAAATCCTAGATTTCTT |
| | ---- | 8 | 18333140 | 18333162 | - | CCA AGCCAAATCCTAGAACCCTT |
| | ---- | 1 | 5556241 | 5556263 | + | AGGAAACGGAGGATTTGGCT TGG |
| | ---- | 1 | 370416 | 370438 | + | AAAAAATCTCGGATTTGGCT CGG |
| | ChCBF3 | 8 | 13812269 | 13812291 | - | CCT AACCAAATCCTCGATTTCTT |
| Cas9ChCBF1_033 | ChCBF2 | 4 | 6515264 | 6515286 | + | TGCCGCCTCCGTCCGTACAA TGG |
| | ChCBF1 | 7 | 17908390 | 17908412 | - | CCA TTGTACGGACGGAGGCGGCA |
| | ---- | NSCAFA.444 | 2316 | 2338 | + | CGCCGCCACCGTCCGTACAC CGG |
| Cas9ChCBF1_036 | ChCBF2 | 4 | 6514793 | 6514815 | - | CCG TGAGCTGTCCCAAGAAACCA |
| | ChCBF1 | 7 | 17908888 | 17908910 | + | TGGTTTCTTGGGACAGCTCA CGG |
| Cas9ChCBF1_043 | ChCBF2 | 4 | 6514880 | 6514902 | - | CCG GTAAGTGGGTGTGTGAGGTA |
| | ChCBF1 | 7 | 17908801 | 17908823 | + | TACCTCACACACCCACTTAC CGG |
| Cas9ChCBF1_044 | ChCBF2 | 4 | 6514909 | 6514931 | - | CCA AACAAGAAATCTAGGATTTG |
| | ChCBF1 | 7 | 17908772 | 17908794 | + | CAAATCCTAGATTTCTTGTT TGG |
| | ChCBF3 | 8 | 13812275 | 13812297 | + | CAAATCCTCGATTTCTTGTT AGG |
| Cas9ChCBF1_056 | ChCBF2 | 4 | 6515266 | 6515288 | - | CCG CCTCCGTCCGTACAATGGAA |
| | ChCBF1 | 7 | 17908388 | 17908410 | + | TTCCATTGTACGGACGGAGG CGG |
| | ---- | 2 | 8347578 | 8347600 | + | GGCCAGAGTACGGACGGAGG AGG |
| Cas9ChCBF1_057 | ChCBF2 | 4 | 6515269 | 6515291 | - | CCT CCGTCCGTACAATGGAATCA |
| | ChCBF1 | 7 | 17908385 | 17908407 | + | TGATTCCATTGTACGGACGG AGG |
| | ---- | 1 | 17089187 | 17089209 | + | TGGTCCGGTTGTACGGACGG CGG |
| | ---- | 5 | 5225681 | 5225703 | - | CCA CCGTCCGTACACTGGATTAT |
| Cas12aChCBF1_018 | ChCBF2 | 4 | 6514830 | 6514853 | + | TTTC GTGAGACTCGTCACCCAATT |
| | ChCBF1 | 7 | 17908848 | 17908871 | - | AATTGGGTGACGAGTCTCAC GAAA |
| | ChCBF3 | 8 | 13812351 | 13812374 | - | AATCGGATGACGTGTCTCAC GAAA |
| Cas12aChCBF1_029 | ChCBF2 | 4 | 6515260 | 6515283 | + | TTTT GCCGCCTCCGTCCGTACAAT |
| | ChCBF1 | 7 | 17908391 | 17908414 | - | ATTGTACGGACGGAGGCGGC AAAA |
| Cas12aChCBF1_030 | ChCBF2 | 4 | 6515261 | 6515284 | + | TTTG CCGCCTCCGTCCGTACAATG |
| | ChCBF1 | 7 | 17908390 | 17908413 | - | CATTGTACGGACGGAGGCGG CAAA |

Finally, to take intraspecific allelic variability into account in the design of sgRNAs for genome editing, I used ARES-GT in combination with the genome sequences available through the Arabidopsis 1001 Genomes project (https://1001genomes.org/). ARES-GT can be used to design ecotype-specific targets by taking advantage of polymorphic sequences in the different accessions. Good-quality genome assemblies of seven A. thaliana accessions (An-1, C24, Cvi, Eri, Kyo, Ler and Sha) [13] were downloaded, and ARES-GT was used to design sgRNAs targeting CBF genes in each accession. As shown in Table 4, SNPs in CBF genes between the different accessions change the number of candidate sgRNAs that match several genes of the family, from 18 Cas9 candidates with the CBF genes of the Kyo genome to 11 Cas9 candidates with the CBF genes of the Cvi genome. The selection of CRISPR candidates with a specific unique target (without offtargets) also varied between accessions (Table 4). I used the CBF genes of each accession as query for ARES-GT, with either the standard Col-0 genome or the corresponding accession genome as reference. Candidates listed only when Col-0 is used as reference (Col-0 exclusive) are false positives, as they have offtargets in the corresponding accession genome. The accession-exclusive candidates are false negatives, as they are discarded when Col-0 is used but do not have offtargets in the corresponding accession genome (Table 4). Differences in the identification of offtargets also affect the selection of efficient candidates matching several CBF genes.
For instance, candidate C24_Cas12aCBF1_019 perfectly matches C24_CBF1, C24_CBF2 and C24_CBF3 but has a possible offtarget (4 mismatches in the distal sequence) on chromosome 3 of the C24 genome; the corresponding Col-0 site exceeds the offtarget thresholds because of an extra mismatch in the proximal sequence, so this offtarget would be missed if Col-0 were used as reference (Table 5). Conversely, Eri_Cas12aCBF1_017 perfectly matches Eri_CBF1, Eri_CBF2 and Eri_CBF3 without offtargets in the Eri genome, but it would be discarded because two offtargets are detected if the Col-0 genome is used (Table 5).
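The Col-0-exclusive and accession-exclusive counts in Table 4 can be read as set differences between the unique candidates obtained with each reference genome. A minimal sketch, assuming candidates are compared by sequence or ID; the function name and the toy IDs are hypothetical.

```python
def compare_references(unique_with_col0, unique_with_accession):
    """Split unique candidates by the reference genome under which they pass.

    Returns (col0_exclusive, accession_exclusive, shared) as sets.
    """
    col0 = set(unique_with_col0)
    accession = set(unique_with_accession)
    col0_exclusive = col0 - accession       # false positives: offtargets in the accession genome
    accession_exclusive = accession - col0  # false negatives: rejected only against Col-0
    shared = col0 & accession
    return col0_exclusive, accession_exclusive, shared


if __name__ == "__main__":
    fp, fn, shared = compare_references({"g1", "g2", "g3", "g4"},
                                        {"g2", "g3", "g4", "g5", "g6"})
    print(sorted(fp), sorted(fn), len(shared))  # ['g1'] ['g5', 'g6'] 3
```

If “Exclusive” is defined this way, Total minus Exclusive should be equal for both references of an accession, and the Table 4 values agree: for Kyo, 99 - 8 = 103 - 12 = 91 unique Cas9 candidates are shared.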
Table 4

Intraspecies variability effect in the number of Cas9 and Cas12a candidates targeting multiple or unique AtCBF genes.

Sequence variability in the CBF genes of different Arabidopsis thaliana accessions changes the number of candidates that can match multiple targets, owing to SNPs in the 20 nucleotides of the guide but also to SNPs affecting the PAM sequence. The use of the standard Col-0 reference genome (TAIR v10) or the corresponding accession genome affects the identification of offtargets, and thus the correct identification of specific (unique) candidates matching only one CBF gene. The column “Exclusive” indicates the number of specific candidates that are listed only when the corresponding reference genome is used.

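| CBF genes (accession) | Multiple-target candidates: Cas9 | Multiple-target candidates: Cas12a | Reference genome | Unique Cas9: Total | Unique Cas9: Exclusive | Unique Cas12a: Total | Unique Cas12a: Exclusive |
|---|---|---|---|---|---|---|---|
| Col | 13 | 10 | Col | 96 | - | 34 | - |
| An-1 | 13 | 9 | Col | 100 | 3 | 37 | 2 |
| | | | An-1 | 105 | 8 | 41 | 6 |
| C24 | 13 | 10 | Col | 100 | 4 | 33 | 2 |
| | | | C24 | 101 | 5 | 31 | 0 |
| Cvi | 11 | 9 | Col | 102 | 6 | 34 | 3 |
| | | | Cvi | 107 | 11 | 37 | 6 |
| Eri | 13 | 10 | Col | 101 | 2 | 32 | 1 |
| | | | Eri | 101 | 2 | 31 | 0 |
| Kyo | 18 | 6 | Col | 99 | 8 | 32 | 2 |
| | | | Kyo | 103 | 12 | 33 | 3 |
| Ler | 13 | 10 | Col | 102 | 3 | 32 | 0 |
| | | | Ler | 105 | 6 | 34 | 2 |
| Sha | 13 | 10 | Col | 101 | 6 | 31 | 2 |
| | | | Sha | 102 | 7 | 31 | 2 |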
Table 5

Intraspecies variability effect in the identification of targets and possible offtargets.

For each example, the upper rows show the targets and offtargets listed by ARES-GT (with thresholds L0 = 4 and L1 = 3) for each reference genome. SNP differences between genomes that explain why some targets or offtargets are not detected are shown in the lower rows (below the dashed line) as red boxes. Black boxes mark mismatches with the candidate sequence.

| Candidate ID (A. thaliana) | Gene | Chrom | Start | End | Sense | Sequence |
|---|---|---|---|---|---|---|
| C24_Cas12aCBF1_019 | C24CBF2 | C24_4 | 13745457 | 13745480 | - | TCGGAGCCAAACATTTCAGA AAAA |
| | C24CBF3 | C24_4 | 13748381 | 13748404 | - | TCGGAGCCAAACATTTCAGA AAAA |
| | C24CBF1 | C24_4 | 13751940 | 13751963 | - | TCGGAGCCAAACATTTCAGA AAAA |
| | ---- | C24_3 | 4670219 | 4670243 | + | TTTG TCTGAAATGTGCAGTTCCGA |
| | ColCBF3 | Col_4 | 13018950 | 13018973 | - | TCGGAGCCAAACATTTCAGA AAAA |
| | ColCBF1 | Col_4 | 13022509 | 13022532 | - | TCGGAGCCAAACATTTCAGA AAAA |
| - - - - - - | | | | | | |
| | ColCBF2 | Col_4 | 13016046 | 13016068 | - | TCGGAGCCAAACATTTCAGA AAAG |
| | ---- | Col_3 | 4673610 | 4673633 | + | TTTG TCTGAAAGGTGCAGTTCCGA |
| Eri_Cas12aCBF1_017 | EriCBF2 | Eri_4 | 12981374 | 12981397 | - | AATCGGAGCCAAACATTTCA GAAA |
| | EriCBF3 | Eri_4 | 12984307 | 12984330 | - | AATCGGAGCCAAACATTTCA GAAA |
| | EriCBF1 | Eri_4 | 12987866 | 12987889 | - | AATCGGAGCCAAACATTTCA GAAA |
| | ColCBF2 | Col_4 | 13016031 | 13016054 | - | AATCGGAGCCAAACATTTCA GAAA |
| | ColCBF3 | Col_4 | 13018948 | 13018971 | - | AATCGGAGCCAAACATTTCA GAAA |
| | ColCBF1 | Col_4 | 13022507 | 13022530 | - | AATCGGAGCCAAACATTTCA GAAA |
| | ---- | Col_1 | 8279033 | 8279056 | - | AATCAGAGCCTAACACTTCA AAAA |
| | ---- | Col_3 | 9399469 | 9399493 | + | TTTA TGAAGTGTTTGGTTCCTATT |
| - - - - - - | | | | | | |
| | ---- | Eri_1 | 8194484 | 8194507 | - | AATTAGGGCCTAACACTTCA AAAA |
| | ---- | Eri_3 | 9400735 | 9400758 | + | TTTA TGAAGTGTTTGGTTCCTTTT |

Discussion

Sequence similarity within gene families usually hinders the identification of CRISPR target candidates matching several members of the family, which then requires time-consuming manual work. In addition to gene-specific guide RNAs, ARES-GT also evaluates which candidates match several query sequences. By choosing which sequences are included in the query file, the user has maximal flexibility to work with complete families, subfamilies or a particular set of genes, and to find candidates that specifically match those genes. I have also shown how using ecotype-specific genomes can prevent the identification of false positive and false negative candidates; the same applies to individual genomes carrying their own polymorphisms. ARES-GT is written in Python, so it can be used on any operating system, and its computational complexity is not high, so it is expected to run without problems on any processor. ARES-GT also has an option for working only with candidates that match several query sequences (option “–OR”), which reduced computing time to 15 min or less in the analyses presented here.

Conclusion

In summary, I have shown how the architecture of the ARES-GT tool (i) allows the selection of candidate sgRNAs that match multiple input query sequences for simultaneous editing of several members of gene families; (ii) contemplates the use of unmapped contigs apart from complete genomes; and (iii) can be used for the design of ecotype-specific CRISPR targets. ARES-GT is available at GitHub (https://github.com/eugomin/ARES-GT.git).

S1 File. CBF genes.

DNA sequences of all CBF genes used in this work (ZIP).
References

Bernd Zetsche; Jonathan S Gootenberg; Omar O Abudayyeh; Ian M Slaymaker; Kira S Makarova; Patrick Essletzbichler; Sara E Volz; Julia Joung; John van der Oost; Aviv Regev; Eugene V Koonin; Feng Zhang. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell. 2015-09-25.

Kazuko Yamaguchi-Shinozaki; Kazuo Shinozaki. Transcriptional regulatory networks in cellular responses and tolerance to dehydration and cold stresses (review). Annu Rev Plant Biol. 2006.

Florian Heigwer; Grainne Kerr; Michael Boutros. E-CRISP: fast CRISPR target site identification. Nat Methods. 2014-02.

Yang Lei; Li Lu; Hai-Yang Liu; Sen Li; Feng Xing; Ling-Ling Chen. CRISPR-P: a web tool for synthetic single-guide RNA design of CRISPR-system in plants. Mol Plant. 2014-04-09.

Shengsong Xie; Bin Shen; Chaobao Zhang; Xingxu Huang; Yonglian Zhang. sgRNAcas9: a software package for designing CRISPR sgRNA and evaluating potential off-target cleavage sites. PLoS One. 2014-06-23.

John G Doench; Nicolo Fusi; Meagan Sullender; Mudra Hegde; Emma W Vaimberg; Jennifer Listgarten; Katherine F Donovan; Ian Smith; Zuzana Tothova; Craig Wilen; Robert Orchard; Herbert W Virgin; David E Root. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol. 2016-01-18.

Lihua J Zhu; Benjamin R Holmes; Neil Aronin; Michael H Brodsky. CRISPRseek: a bioconductor package to identify target-specific guide RNAs for CRISPR-Cas9 genome-editing systems. PLoS One. 2014-09-23.

Norma Aliaga-Franco; Cunjin Zhang; Silvia Presa; Anjil K Srivastava; Antonio Granell; David Alabadí; Ari Sadanandom; Miguel A Blázquez; Eugenio G Minguet. Identification of Transgene-Free CRISPR-Edited Plants of Rice, Tomato, and Arabidopsis by Monitoring DsRED Fluorescence in Dry Seeds. Front Plant Sci. 2019-09-18.

Wen-Biao Jiao; Korbinian Schneeberger. Chromosome-level assemblies of multiple Arabidopsis genomes reveal hotspots of rearrangements with altered evolutionary dynamics. Nat Commun. 2020-02-20.

Xiangchao Gan; Angela Hay; Michiel Kwantes; Georg Haberer; Asis Hallab; Raffaele Dello Ioio; Hugo Hofhuis; Bjorn Pieper; Maria Cartolano; Ulla Neumann; Lachezar A Nikolov; Baoxing Song; Mohsen Hajheidari; Roman Briskine; Evangelia Kougioumoutzi; Daniela Vlad; Suvi Broholm; Jotun Hein; Khalid Meksem; David Lightfoot; Kentaro K Shimizu; Rie Shimizu-Inatsugi; Martha Imprialou; David Kudrna; Rod Wing; Shusei Sato; Peter Huijser; Dmitry Filatov; Klaus F X Mayer; Richard Mott; Miltos Tsiantis. The Cardamine hirsuta genome offers insight into the evolution of morphological diversity. Nat Plants. 2016-10-31.