Le Huang, Bowen Yang, Haidong Yi, Amina Asif, Jiawei Wang, Trevor Lithgow, Han Zhang, Fayyaz Ul Amir Afsar Minhas, Yanbin Yin.
Abstract
CRISPR-Cas is an anti-viral mechanism of prokaryotes that has been widely adopted for genome editing. To make CRISPR-Cas genome editing more controllable and safer to use, anti-CRISPR proteins have recently been exploited to prevent excessive or prolonged Cas nuclease cleavage. Anti-CRISPR (Acr) proteins are encoded by (pro)phages/(pro)viruses and can inhibit their host's CRISPR-Cas systems. We have built an online database, AcrDB (http://bcb.unl.edu/AcrDB), by scanning ∼19 000 genomes of prokaryotes and viruses with AcrFinder, a recently developed Acr-Aca (Acr-associated regulator) operon prediction program. Proteins in Acr-Aca operons were further processed by two machine learning-based programs (AcRanker and PaCRISPR) to obtain numerical scores/ranks. Compared to other anti-CRISPR databases, AcrDB has the following unique features: (i) it is a genome-scale database with the largest collection of data (39 799 Acr-Aca operons containing Aca or Acr homologs); (ii) it offers a user-friendly web interface with various functions for browsing, graphically viewing, searching, and batch downloading Acr-Aca operons; (iii) it focuses on the genomic context of Acr and Aca candidates rather than on individual Acr protein families; and (iv) it collects data with three independent programs, each with a unique data mining algorithm, for cross validation. AcrDB will be a valuable resource to the anti-CRISPR research community.
Year: 2021 PMID: 33068435 PMCID: PMC7778997 DOI: 10.1093/nar/gkaa857
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Online bioinformatics tools for Acr research
| Name | Year | Resource provided | Features |
|---|---|---|---|
| Anti-CRISPRDB | 2018 | Database | Experimentally characterized Acrs and their homologs and BLAST search |
| CRISPRminer | 2018 | Database | Experimentally characterized Acrs and their homologs and genomic context |
| Acr nomenclature | 2018 | Google spreadsheets | Experimentally characterized Acrs and Acas nomenclature |
| Self-Targeting Spacer Searcher | 2018 | Standalone package | Workflow for self-targeting spacer identification |
| AcrCatalog | 2020 | Database and model code | Predicted Acrs from decision tree ML classifier + heuristic filtering |
| AcRanker | 2020 | Web server and standalone package | XGBoost classifier using AA biases |
| AcrFinder | 2020 | Web server and standalone package | Workflow combining Homology + GBA + self-targeting and user-friendly website |
| PaCRISPR | 2020 | Web server | SVM classifier using PSSMs to capture evolutionary features |
| AcrDetector | 2020 | Model code | Decision tree classifier using six sequence features |
Figure 1. Overview of data collection for AcrDB. Genomes of A (Archaea), B (Bacteria), and V (Viruses) are used as input for AcrFinder. Within AcrFinder, Acr homologs, Aca homologs, CRISPR–Cas loci, self-targeting spacers (STSs) and their targets (red diamonds) are identified. Acr and Aca homologs are analyzed to see whether they form operons and whether they are adjacent to or within a prophage region and/or adjacent to an STS target. Genomes with AcrFinder results are further analyzed by AcRanker to receive a rank and score and by PaCRISPR to receive a score; finally, the top-ranked proteins are located in Acr-Aca operons.
Statistics of data in AcrDB
| # of genomes, genera and Acr-Aca operons | Archaea | Bacteria | Viruses |
|---|---|---|---|
| Total genomes searched | 961 | 15 203 | 2659 |
| Genomes (genera) with | 27 (2) | 1127 (97) | 91 (20) |
| Genomes (genera) with | 418 (94) | 5014 (603) | 2043 (424) |
| Acr-Aca operons of high, medium, and low confidence levels | 1481 | 31 683 | 4125 |
| Genomes (genera) with | 85 (36) | 1101 (244) | NA^a |
| Acr-Aca operons of high and medium confidence levels | 361 | 7889 | NA^a |
| Genomes (genera) with | 293 (79) | 4753 (609) | 1330 (314) |
| Acr-Aca operons of high, medium, and low confidence levels | 634 | 17 208 | 1857 |
| Genomes (genera) with | 67 (29) | 1045 (236) | NA^a |
| Acr-Aca operons of high and medium confidence levels | 75 | 2565 | NA^a |
| Genomes (genera) with | 247 (59) | 2799 (446) | 1542 (330) |
| Acr-Aca operons of high, medium, and low confidence levels | 359 | 5706 | 2455 |
| Genomes (genera) with | 53 (25) | 796 (185) | NA^a |
| Acr-Aca operons of high and medium confidence levels | 64 | 1869 | NA^a |

^a NA because viruses were not analyzed for the presence of CRISPR–Cas and STSs.

^b These include single-gene operons (i.e. only the Acr homolog).
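
As a quick consistency check on the table, the per-domain counts can be summed and compared with the totals quoted elsewhere in this record (the abstract's "∼19 000 genomes" and the 37 289 operons containing Aca homologs mentioned in the Figure 2 caption). The following is a minimal sketch; the dictionary names are illustrative, and the values are copied from the "Total genomes searched" row and from the first "high, medium, and low confidence levels" row.

```python
# Counts copied from the statistics table, in the order Archaea, Bacteria, Viruses.
# The variable names are illustrative and do not come from AcrDB itself.
genomes_searched = {"Archaea": 961, "Bacteria": 15203, "Viruses": 2659}

# First "Acr-Aca operons of high, medium, and low confidence levels" row.
operons_first_row = {"Archaea": 1481, "Bacteria": 31683, "Viruses": 4125}

total_genomes = sum(genomes_searched.values())
total_operons = sum(operons_first_row.values())

print(total_genomes)  # 18823, consistent with "~19 000 genomes"
print(total_operons)  # 37289, the operon count quoted in the Figure 2 caption
```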
Figure 2. Overview of Acr-Aca operons in AcrDB. (A) Venn diagram of the three circles (STS, AcRanker, PaCRISPR) within the AcrFinder circle (37 289 operons containing Aca homologs). (B) Heatmap of the taxonomy distribution of Acr-Aca operons meeting different criteria. Each row is a taxonomic phylum (V: Virus, B: Bacteria, A: Archaea). There are eight columns: 1. Acr-Aca operons containing Acr homologs; 2. Acr-Aca operons containing Aca homologs; 3. Acr-Aca operons containing Aca homologs and in genomes with STSs; 4. Acr-Aca operons containing Aca homologs and candidate Acrs ranked in the top 10% by AcRanker; 5. Acr-Aca operons containing Aca homologs and candidate Acrs scored >0.5 by PaCRISPR; 6. Acr-Aca operons containing Aca homologs and candidate Acrs scored >0.5 by PaCRISPR and in genomes with STSs; 7. Acr-Aca operons containing Aca homologs and candidate Acrs ranked in the top 10% by AcRanker and in genomes with STSs; 8. Acr-Aca operons containing Aca homologs and candidate Acrs scored >0.5 by PaCRISPR and ranked in the top 10% by AcRanker and in genomes with STSs.
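
The eight column criteria of panel B are combinations of four conditions (Aca homolog present, genome has STSs, AcRanker top 10%, PaCRISPR score >0.5), plus the separate Acr-homolog condition of column 1. The sketch below shows one way to assign an operon record to those columns. The `Operon` class and its field names are hypothetical and are not AcrDB's schema; in particular, expressing "top 10%" as a rank fraction of at most 0.10 is an assumption about how the AcRanker rank would be stored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operon:
    """Hypothetical operon record; field names are illustrative only."""
    has_acr_homolog: bool
    has_aca_homolog: bool
    genome_has_sts: bool
    acranker_rank_fraction: float  # assumed: rank divided by number of ranked proteins
    pacrispr_score: float


def figure2b_columns(op: Operon) -> List[int]:
    """Return the Figure 2B column numbers whose criteria this operon meets."""
    aca = op.has_aca_homolog
    sts = op.genome_has_sts
    top10 = op.acranker_rank_fraction <= 0.10  # "ranked in the top 10% by AcRanker"
    pa = op.pacrispr_score > 0.5               # "scored >0.5 by PaCRISPR"
    criteria = {
        1: op.has_acr_homolog,
        2: aca,
        3: aca and sts,
        4: aca and top10,
        5: aca and pa,
        6: aca and pa and sts,
        7: aca and top10 and sts,
        8: aca and pa and top10 and sts,
    }
    return [col for col, met in criteria.items() if met]


example = Operon(
    has_acr_homolog=False,
    has_aca_homolog=True,
    genome_has_sts=True,
    acranker_rank_fraction=0.05,
    pacrispr_score=0.42,
)
print(figure2b_columns(example))  # [2, 3, 4, 7]
```

Because columns 3 to 8 all require an Aca homolog, any operon counted there is also counted in column 2, so the columns overlap rather than partition the data.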
Figure 3. Screenshots of the AcrDB website to demonstrate its utilities. (A) The Browse page offers three ways to enter the database. The default is entry by CRISPR Cas Type. One can also click on Taxonomy to browse by the NCBI Taxonomy tree, or click on Search to type in keywords and search different data fields. (B) The Result page has different components to better organize the genome-specific results. The example screenshots shown here are taken from this URL: http://bcb.unl.edu/AcrDB/anti_crispr_results.php?type=ncbi&organism=GCF_000569075.1. (B1) shows the default tabular view, where a large table is displayed. The table has 23 columns, and one can use the mouse to move to other parts of the table. The bottom of the result page also shows a CRISPRCasFinder summary table. B2.1 and B2.2 are on the same page and can be seen by clicking on Graphic view in B1. (B2.1) shows the graphic global view of the genome in a circular representation. Different features are shown as different shapes, and the sizes of the shapes do not indicate their real sizes in the genome: the Acr-Aca operon positions (arrows), the STSs (purple ovals) and their target positions (yellow-green ovals) connected with blue arcs, and the Cas loci positions (pink squares). The green dotted lines point from these Acr-Aca operons (arrows) to their corresponding local view (B2.2) or JBrowse view (B3). In B2.2, each Acr-Aca operon is shown with its component genes (directional arrows) together with a ruler to indicate its position in the genome. CRISPR–Cas arrays are shown if they contain an STS and have an adjacent Cas locus (<10 kb). The sizes of the CRISPR array and Cas loci are proportional to their real lengths.
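
The example result-page URL in the caption suggests that genome-specific pages are addressed by two query parameters, `type` and `organism`. The sketch below rebuilds that URL from an assembly accession. It is inferred from this single example only: the function name is hypothetical, and whether other `type` values or other accessions are accepted by the site is an assumption that is not confirmed here.

```python
from urllib.parse import urlencode

# Base path taken from the example URL in the Figure 3 caption.
RESULTS_BASE = "http://bcb.unl.edu/AcrDB/anti_crispr_results.php"


def result_page_url(assembly_accession: str, id_type: str = "ncbi") -> str:
    """Build a result-page URL in the same shape as the caption's example.

    Hypothetical helper: the parameter names come from the one example URL;
    valid values beyond that example are not documented in this record.
    """
    query = urlencode({"type": id_type, "organism": assembly_accession})
    return f"{RESULTS_BASE}?{query}"


url = result_page_url("GCF_000569075.1")
print(url)
```

Using `urlencode` rather than string concatenation keeps the query string correctly escaped if an identifier ever contains reserved characters.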