Alexander Mitrofanov¹, Marcus Ziemann², Omer S Alkhnbashi³, Wolfgang R Hess², Rolf Backofen¹,⁴
Abstract
MOTIVATION: The CRISPR-Cas9 system is a Type II CRISPR system that has rapidly become the most versatile and widespread tool for genome engineering. It consists of two components: the Cas9 effector protein and a single guide RNA that combines the spacer (which identifies the target) with the tracrRNA, a trans-activating small RNA required for both CRISPR RNA (crRNA) maturation and interference. While there are well-established methods for screening Cas effector proteins and CRISPR arrays, the detection of tracrRNAs remains the bottleneck in detecting Class 2 CRISPR systems.
Year: 2022 PMID: 36124799 PMCID: PMC9486595 DOI: 10.1093/bioinformatics/btac466
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Components of the CRISPRtracrRNA tool. Components 1 and 2 make up the structural model of tracrRNAs and are designed to robustly locate the tracrRNA tail by comparing the candidate sequence with the existing model and searching for a terminator sequence. Component 3 uses RNA–RNA interaction prediction to determine the set of anti-repeat candidates. It requires the set of repeat sequences, which are collected by Component 4, the step that identifies CRISPR arrays and their associated repeats. Component 5 predicts the whole CRISPR cassette (using CRISPRCasIdentifier) to reliably determine Cas9 or Cas12 effector proteins.
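To make the interplay of Components 2 and 3 concrete, the sketch below screens a candidate region in two steps: it finds the window that pairs best with a known repeat (an anti-repeat candidate), then checks the downstream tail for a run of uridines. This is a hypothetical, much cruder stand-in, not the CRISPRtracrRNA method: gapless antiparallel pair counting (Watson–Crick plus G–U wobble) replaces real RNA–RNA interaction prediction, and a U-run replaces the terminator search. The function names, the `min_score` threshold, the 80-nt `tail_len` and the 4-nt U-run length are all illustrative assumptions.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for RNA-RNA interaction prediction; NOT the
# CRISPRtracrRNA scoring. Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_score(repeat: str, window: str) -> int:
    """Count gapless antiparallel base pairs between repeat and window.

    The anti-repeat pairs antiparallel with the repeat, so repeat[i] is
    compared with window[-1 - i]. Bulges and loops are ignored (assumption).
    """
    return sum((r, w) in PAIRS for r, w in zip(repeat, reversed(window)))


def best_anti_repeat(repeat: str, region: str) -> Tuple[int, int]:
    """Return (start, score) of the best-pairing window of len(repeat)."""
    k = len(repeat)
    best = (-1, -1)
    for start in range(len(region) - k + 1):
        score = pairing_score(repeat, region[start:start + k])
        if score > best[1]:  # keep the first window with the highest score
            best = (start, score)
    return best


def has_u_run(seq: str, min_run: int = 4) -> bool:
    """Crude terminator proxy (hypothetical): a run of >= min_run uridines."""
    return "U" * min_run in seq


def screen_candidate(repeat: str, region: str, min_score: int,
                     tail_len: int = 80) -> Optional[Tuple[int, int]]:
    """Accept a region if it has an anti-repeat followed by a U-run tail."""
    start, score = best_anti_repeat(repeat, region)
    if score < min_score:
        return None
    tail_start = start + len(repeat)
    if not has_u_run(region[tail_start:tail_start + tail_len]):
        return None
    return start, score


repeat = "GUUUUAGAGC"                             # toy repeat fragment
region = "AAAA" + "GCUCUAAAAC" + "GGGCCCUUUUUU"   # toy anti-repeat + U-run tail
print(screen_candidate(repeat, region, min_score=8))  # (4, 10)
```

The toy region contains the exact reverse complement of the repeat at position 4, so all 10 positions pair; removing the trailing uridines makes the same region fail the tail check and return `None`.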
Fig. 2. (A) Phylogenetic tree of the 3′ tails of experimentally validated tracrRNAs used in the GraphClust2 analysis. The drawings of the three consensus structures were taken from the publication (Briner et al.), license number 5287080399685, because they show additional information; they agree with the consensus models predicted by the GraphClust2 pipeline (see Supplementary Fig. S1). The phylogenetic tree is based on sequence distance, as GraphClust2 produces only clusters, not a tree. The sequences finally used for our three covariance models (CMs) are marked with blue, yellow and green dots. (B) MAFFT-generated phylogenetic tree of the anti-repeat part of the experimentally validated tracrRNAs. Independent clustering of the anti-repeats gives consistent results, producing nearly the same three main clusters.
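Panel (A) is described as a tree built from sequence distance rather than from GraphClust2 output. The caption does not say which distance or tree-building method was used, so the following is only an illustrative sketch of the general idea: pairwise Levenshtein (edit) distances are computed between unaligned sequences and joined with UPGMA (average linkage) into a Newick string. Both choices are assumptions made for this example; it is not the MAFFT procedure used for panel (B).

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def upgma(names, seqs):
    """Build a Newick tree with UPGMA on pairwise edit distances."""
    # cluster id -> (newick_label, number_of_leaves, node_height)
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}
    dist = {frozenset((i, j)): float(edit_distance(seqs[i], seqs[j]))
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)   # closest pair of clusters
        i, j = sorted(pair)
        h = dist[pair] / 2               # height of the new internal node
        (li, si, hi), (lj, sj, hj) = clusters.pop(i), clusters.pop(j)
        label = f"({li}:{h - hi:g},{lj}:{h - hj:g})"
        # Average-linkage distance from the merged cluster to every other one
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                si * dist[frozenset((i, k))] + sj * dist[frozenset((j, k))]
            ) / (si + sj)
        for key in [key for key in dist if i in key or j in key]:
            del dist[key]
        clusters[next_id] = (label, si + sj, h)
        next_id += 1
    (label, _, _), = clusters.values()
    return label + ";"


print(upgma(["A", "B", "C"], ["ACGU", "ACGA", "UUUU"]))
# (C:1.75,(A:0.5,B:0.5):1.25);
```

UPGMA assumes a constant rate of change and therefore yields an ultrametric tree; it is used here only because it is short to implement, and a neighbor-joining or alignment-based tree would be an equally valid choice.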
Fig. 3. Distribution of Bacteroidetes organisms by class (inner ring of the pie chart) and by model coverage (outer ring). Both the sequence/structure models and the sequence-only models found unique candidates for the vast majority of the represented classes.
Fig. 4. Comparison of E-value exponents between the CMs of Dooley et al. and CRISPRtracrRNA on the Type II tracrRNA sequences. CRISPRtracrRNA shows much higher specificity. Moreover, the model of Dooley et al. relies heavily on the anti-repeat part of the tracrRNA candidate, whereas CRISPRtracrRNA uses its CM to search for the tail part.
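Fig. 4 compares E-value exponents rather than raw E-values because hit significances span many orders of magnitude; a more negative exponent means a more significant hit. The minimal sketch below shows only that transformation and one simple way to summarise it. The helper names, the threshold of -10 and the E-values in the usage line are hypothetical and do not come from the figure.

```python
import math


def evalue_exponent(evalue: float) -> int:
    """Base-10 exponent of an E-value, e.g. 3.2e-15 -> -15."""
    if evalue <= 0:
        raise ValueError("E-value must be positive")
    return math.floor(math.log10(evalue))


def fraction_at_or_below(evalues, max_exponent: int) -> float:
    """Share of hits whose exponent is <= max_exponent (hypothetical summary)."""
    exps = [evalue_exponent(e) for e in evalues]
    return sum(x <= max_exponent for x in exps) / len(exps)


print([evalue_exponent(e) for e in [3.2e-15, 0.5, 7.0]])   # [-15, -1, 0]
print(fraction_at_or_below([3.2e-15, 0.5, 7.0, 2e-20], -10))  # 0.5
```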