Haidong Yi, Le Huang, Bowen Yang, Javi Gomez, Han Zhang, Yanbin Yin.
Abstract
Anti-CRISPR (Acr) proteins encoded by (pro)phages/(pro)viruses have great potential to enable more controllable genome editing. However, mining genomes for new Acr proteins is challenging because experimentally characterized Acr proteins lack a conserved functional domain and share low sequence similarity. We introduce here AcrFinder, a web server (http://bcb.unl.edu/AcrFinder) that combines three well-accepted ideas used by previous experimental studies to pre-screen genomic data for Acr candidates: homology search, guilt-by-association (GBA), and CRISPR-Cas self-targeting spacers. Compared to existing bioinformatics tools, AcrFinder has the following unique functions: (i) it is the first online server that specifically mines genomes for Acr-Aca operons; (ii) it provides a highly comprehensive Acr and Aca (Acr-associated regulator) database (populated by GBA-based Acr and Aca datasets); (iii) it combines homology-based, GBA-based, and self-targeting approaches in one software package; and (iv) it provides a user-friendly web interface that accepts both nucleotide and protein sequence files as input and outputs a result page with a graphical representation of the genomic contexts of Acr-Aca operons. Leave-one-out cross-validation on experimentally characterized Acr-Aca operons showed that AcrFinder achieved 100% recall. AcrFinder will be a valuable web resource to help experimental microbiologists discover new anti-CRISPRs.
Year: 2020 PMID: 32402073 PMCID: PMC7319584 DOI: 10.1093/nar/gkaa351
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Overview of current bioinformatics tools for Acr research
| Name | Resource provided | Features | Input | Output |
|---|---|---|---|---|
| anti-CRISPRDB | Database | Experimentally characterized Acrs and their homologs and BLAST search | NA | NA |
| CRISPRminer | Database | Experimentally characterized Acrs and their homologs and genomic context | NA | NA |
| Acr nomenclature | Google spreadsheets | Experimentally characterized Acrs and Acas nomenclature | NA | NA |
| Self-Targeting Spacer Searcher | Standalone package | Workflow for self-targeting spacer identification | List of genomes | Self-targeting spacers |
| AcrCatalog | Database | Predicted Acrs from decision tree ML classifier + heuristic filtering | NA | NA |
| AcRanker | Web server and standalone package | XGBoost ML classifier using AA biases | Protein sequences | Ranked protein list (no Acr subtype) |
| AcrFinder | Web server and standalone package | Workflow combining Homology + GBA + Self-targeting and user-friendly website | Protein or DNA sequences | Acr-Aca operons (with Acr subtype) |
Figure 1. Sequence properties of 56 experimentally characterized Acr proteins and their genomic context. Numbers in the pie charts give the number of proteins or loci: (1) 52 of the 56 Acr proteins are shorter than 200 aa; (2) all 32 Aca proteins are shorter than 150 aa; (3) all 56 Acr proteins are located in genomic operons in which every gene runs in the same direction (on the same strand); (4) 32 of the 56 Acr genes have neighboring Aca genes; (5) 30 Acr-Aca operons have all intergenic distances < 150 bp; (6) 43 of the 56 Acr proteins have an isoelectric point < 7. Detailed information can be found in Supplementary Table S1.
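The genomic-context properties listed in Figure 1 (short proteins, same-strand genes, small intergenic distances) suggest a simple pre-screen. The Python sketch below is an illustration only, not AcrFinder's actual implementation: the `Gene` record, the function name `short_gene_operons`, and the use of 200 aa and 150 bp as hard cutoffs are assumptions made for this example, and the coordinates in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # 1-based start coordinate on the contig
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"
    aa_len: int  # protein length in amino acids


def short_gene_operons(genes: List[Gene], max_aa: int = 200,
                       max_gap: int = 150) -> List[List[Gene]]:
    """Group adjacent genes into candidate short-gene operons.

    A run is extended while the next gene is shorter than max_aa,
    lies on the same strand as the previous gene, and is separated
    from it by fewer than max_gap bp. Runs of one gene are dropped.
    """
    operons: List[List[Gene]] = []
    current: List[Gene] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if gene.aa_len >= max_aa:
            # A long gene ends the current run and is itself excluded.
            if len(current) > 1:
                operons.append(current)
            current = []
            continue
        same_strand = bool(current) and gene.strand == current[-1].strand
        close = bool(current) and gene.start - current[-1].end - 1 < max_gap
        if same_strand and close:
            current.append(gene)
        else:
            if len(current) > 1:
                operons.append(current)
            current = [gene]
    if len(current) > 1:
        operons.append(current)
    return operons


# Hypothetical coordinates, for illustration only.
example = [
    Gene("g1", 100, 400, "+", 100),
    Gene("g2", 450, 800, "+", 116),
    Gene("g3", 1200, 1500, "-", 100),
    Gene("g4", 1550, 3000, "-", 483),
]
print([[g.name for g in op] for op in short_gene_operons(example)])
```

In this toy input only `g1` and `g2` form a candidate operon: `g3` is on the other strand, and `g4` exceeds the length cutoff.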
Figure 2. AcrFinder workflow. Two major routes are included: (i) the Acr homology search route, in which, once an Acr homolog is found, its gene neighborhood is examined so that only homologs located in short-gene operons are kept; and (ii) the Aca GBA route, which contains three major steps (described in the main text). The resulting Acr-Aca operons are classified into three groups with different confidence levels.
Leave-one-out evaluation of AcrFinder on genomes containing 8 Acr proteins*
| Prophage hits within | Min. # of prophage hits | Found positives | Total positives | Recall |
|---|---|---|---|---|
| | 1 | 5 | 8 | 62.5% |
| | 0 | 8 | 8 | 100.0% |
| | 1 | 7 | 8 | 87.5% |
| | 0 | 8 | 8 | 100.0% |
* Detailed experiment results can be found in Supplementary Table S2.
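The recall values in the table follow from dividing found positives by total positives. A minimal Python check of that arithmetic is sketched here; the function name `recall` and the tuple layout are assumptions for this example, while the counts are taken from the table rows.

```python
def recall(found: int, total: int) -> float:
    """Recall = found positives / total positives."""
    if total <= 0:
        raise ValueError("total positives must be greater than zero")
    return found / total


# (min. # of prophage hits, found positives, total positives), per table row
rows = [(1, 5, 8), (0, 8, 8), (1, 7, 8), (0, 8, 8)]
for min_hits, found, total in rows:
    print(f"min. prophage hits = {min_hits}: recall = {recall(found, total):.1%}")
```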
Independent evaluation of AcrFinder and AcRanker on genomes containing 4 Acr proteins (not in the training set)
| Acr family | AcrIE4-F7 | AcrIIA3 | AcrIIA12 | AcrIIA21 |
|---|---|---|---|---|
| | WP_064584002.1 | WP_014930691.1 | WP_003731276.1 | WP_000384271.1 |
| | WP_064584003.1 | WP_014930689.1 | WP_003722518.1 | WP_000134666.1 |
| | GCF_001654435.1 | GCF_000210795.2 | GCF_009807465.1 | GCF_002197205.1 |
| | 6716 | 2822 | 2938 | 2153 |
| | TypeIF | TypeIIA + TypeIB | TypeIB | TypeIIA |
| | 78th | 10th | 5th | 159th |
| | AcrIE4-IF7 | AcrIIA3 | - | - |
| | 5 | 10 | 10 | 6 |
* AcrFinder condition: up- or downstream prophage hits n = 10, Min. # of prophage hits = 1, DIAMOND search mode = --more-sensitive, E-value < 0.01, and query coverage > 0.8.
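The E-value and query-coverage cutoffs in the condition above can be applied to tabular search output with a few lines of Python. This is a sketch under stated assumptions, not AcrFinder's own code: it assumes DIAMOND was run with a four-column tabular output in the order `qseqid sseqid evalue qcovhsp`, that coverage is reported as a percentage (so 0.8 corresponds to 80), and the function name `filter_hits` and the sample hit lines are hypothetical.

```python
import csv
from typing import Iterable, List, Tuple

Hit = Tuple[str, str, float, float]


def filter_hits(lines: Iterable[str], max_evalue: float = 0.01,
                min_qcov: float = 0.8) -> List[Hit]:
    """Keep hits with E-value < max_evalue and query coverage > min_qcov.

    Assumed column order (tab-separated): qseqid, sseqid, evalue, qcovhsp,
    with qcovhsp given as a percentage.
    """
    kept: List[Hit] = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        qseqid, sseqid = row[0], row[1]
        evalue, qcov_pct = float(row[2]), float(row[3])
        if evalue < max_evalue and qcov_pct / 100.0 > min_qcov:
            kept.append((qseqid, sseqid, evalue, qcov_pct))
    return kept


# Made-up hit lines, for illustration only.
sample = [
    "query1\tknownAcrA\t1e-20\t95.0",  # passes both cutoffs
    "query2\tknownAcrB\t0.5\t99.0",    # fails the E-value cutoff
    "query3\tknownAcrC\t1e-5\t60.0",   # fails the coverage cutoff
]
for hit in filter_hits(sample):
    print(hit)
```

Dividing the coverage by 100 keeps the threshold in the same fractional form (0.8) used in the footnote.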
Figure 3. AcrFinder web server case study. The URL of this case study is http://bcb.unl.edu/AcrFinder/result.php?jobid=1583809584. In this case study, we submitted the fna, faa, and gff files of the RefSeq bacterial genome assembly GCF_000210795.2, which is known to encode AcrIIA3. (A) is the job submission page, where users can choose different parameters (default values are shown in the text fields). Clicking on ‘Run An EXAMPLE’ initiates this case study job, which takes ∼2 minutes to finish. The result page contains five major sections: (B) is the Guilt-by-Association result table, which has 17 columns with a variety of information including the inferred Acr subtype (the screenshot shows only the left nine columns); (C) is the JBrowse view of the GBA loci (genes in the loci are highlighted with a yellow background); (D) is the homology-based Acr search result table, which has 12 columns with a variety of information including the best known Acr homolog (the screenshot shows only the left six columns); (E) is the JBrowse view of the homology-based loci (genes in the loci are highlighted with a yellow background); (F) is the CRISPRCasFinder result table (parsed to keep only high-confidence CRISPR-Cas loci).