Rui Gan1, FengXia Zhou1, Yu Si1, Han Yang1, Chuangeng Chen1, Chunyan Ren2, Jiqiu Wu3, Fan Zhang1.
Abstract
As an intracellular form of a bacteriophage in the bacterial host genome, a prophage usually integrates into bacterial DNA with high specificity and contributes to horizontal gene transfer (HGT). With the exponentially increasing number of microbial sequences uncovered in genomic and metagenomic studies, there is strong demand for a tool capable of fast and accurate prophage identification. Here, we introduce DBSCAN-SWA, a command-line software tool developed to predict prophage regions in bacterial genomes. DBSCAN-SWA runs faster than any previous tool. Importantly, in an analysis of 184 manually curated prophages, it achieves a recall of 85% on (Multi-)FASTA sequences, compared with 63% for Phage_Finder, 74% for VirSorter, and 82% for PHASTER. Moreover, DBSCAN-SWA outperforms the existing standalone prophage prediction tools on high-throughput sequencing data, based on the analysis of 19,989 contigs from 400 bacterial genomes collected from the Human Microbiome Project (HMP). DBSCAN-SWA also provides user-friendly result visualizations, including a circular prophage viewer and interactive DataTables. DBSCAN-SWA is implemented in Python3 and is available under an open source GPLv2 license from https://github.com/HIT-ImmunologyLab/DBSCAN-SWA/.
Keywords: density-based spatial clustering; phage; phage-host interaction; prophage; sliding window
Year: 2022 PMID: 35518360 PMCID: PMC9061938 DOI: 10.3389/fgene.2022.885048
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Performance comparison of DBSCAN-SWA with other prophage detection tools on the Xylella fastidiosa Temecula1 genome sequence (NC_004556).
| | DBSCAN-SWA | Prophage Hunter | PHASTER | Phage_Finder | VirSorter |
|---|---|---|---|---|---|
| Last updated | 2020 | 2019 | 2016 | 2006 | 2015 |
| Input type | FASTA/GBK | FASTA | FASTA/GBK | Special format | FASTA |
| Timing | ∼1.5 min | ∼9 min | Slow (queuing) | ∼2 min | ∼15 min |
| Standalone | YES | NO | NO | YES | YES |
| Interactive | YES | YES | YES | NO | YES |
| Att site prediction | YES | YES | YES | NO | NO |
| Gene annotation | YES | YES | NO | YES | NO |
| Recall | 100% | N/A | ∼71% | ∼57% | ∼57% |
N/A means more tests are needed. Timing was tested on a Linux platform for Xylella fastidiosa Temecula1, which has a genome of approximately 2.5 Mbp. “Slow” means the run time depends on the queuing time. “NO” under “Standalone” means only a web server is provided. Recall was calculated for Xylella fastidiosa Temecula1 using (Multi-)FASTA sequences. Phage_Finder requires special input files, including .pep/.ffa, .ptt, and .con/.fna files.
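The recall figures in the table can be read as the fraction of reference prophages that a tool recovers. The sketch below shows one way such a value could be computed from genomic intervals. The overlap criterion (`min_fraction`), the function names, and the coordinates are all hypothetical. The matching rule used in the study is not stated here.

```python
def overlap(a, b):
    """Number of bases shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def recall(reference, predicted, min_fraction=0.5):
    """Fraction of reference prophages covered by at least one prediction.

    A reference prophage counts as detected when some predicted region
    covers at least `min_fraction` of its length (an assumed criterion).
    """
    if not reference:
        raise ValueError("reference set is empty")
    detected = 0
    for ref in reference:
        ref_len = ref[1] - ref[0] + 1
        if any(overlap(ref, pred) / ref_len >= min_fraction for pred in predicted):
            detected += 1
    return detected / len(reference)


# Hypothetical coordinates, for illustration only.
reference_regions = [(100, 200), (500, 700), (900, 1000)]
predicted_regions = [(90, 210), (650, 800)]
strict_recall = recall(reference_regions, predicted_regions)
lenient_recall = recall(reference_regions, predicted_regions, min_fraction=0.25)
```

With the default threshold only the first reference region is recovered. Lowering `min_fraction` to 0.25 also accepts the partial hit on the second region, which shows how sensitive a reported recall can be to the matching rule.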
FIGURE 1 The pipeline for detection and annotation of prophages for bacterial genomes. (A) Identification of phage or phage-like proteins. (B) Detection of prophage clusters by the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. (C) Detection of prophage clusters by the Sliding Window Algorithm. (D) Identification of attachment sites in prophage clusters. (E) Annotation of infecting phages for the predicted prophage regions.
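Steps (B) and (C) of the pipeline in Figure 1 can be illustrated with a minimal Python sketch. It is not the DBSCAN-SWA implementation. The function names, the choice of gene start coordinates as the clustering feature, and every parameter value (`eps`, `min_pts`, `window`, `min_hits`) are assumptions made for illustration. The first function is a plain one-dimensional DBSCAN. The second scans a per-protein flag list with a fixed-size window.

```python
from bisect import bisect_left, bisect_right


def dbscan_1d(positions, eps, min_pts):
    """Cluster 1-D coordinates with DBSCAN and return (start, end) per cluster."""
    pts = sorted(positions)

    def neighbors(i):
        # Indices of all points within eps of pts[i] (the point itself included).
        lo = bisect_left(pts, pts[i] - eps)
        hi = bisect_right(pts, pts[i] + eps)
        return range(lo, hi)

    NOISE = -1
    labels = [None] * len(pts)  # None = unvisited
    cluster_id = -1
    for i in range(len(pts)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        stack = list(seeds)
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reach = neighbors(j)
            if len(reach) >= min_pts:  # core point: keep expanding
                stack.extend(reach)

    regions = {}
    for pos, label in zip(pts, labels):
        if label == NOISE:
            continue
        start, end = regions.get(label, (pos, pos))
        regions[label] = (min(start, pos), max(end, pos))
    return [regions[k] for k in sorted(regions)]


def sliding_window_regions(is_phage_like, window, min_hits):
    """Merge all windows of `window` consecutive proteins with >= min_hits hits.

    Returns inclusive (first_index, last_index) ranges over the protein list.
    """
    n = len(is_phage_like)
    if n == 0:
        return []
    w = min(window, n)
    hits = sum(is_phage_like[:w])
    regions = []
    for start in range(n - w + 1):
        if start > 0:
            # Slide by one protein: add the entering flag, drop the leaving one.
            hits += is_phage_like[start + w - 1] - is_phage_like[start - 1]
        if hits >= min_hits:
            end = start + w - 1
            if regions and start <= regions[-1][1] + 1:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Hypothetical start coordinates (bp) of phage-like genes on one contig.
gene_starts = [1000, 1500, 2200, 2900, 3100, 50000, 90000, 90500, 91200, 91800]
clusters = dbscan_1d(gene_starts, eps=1000, min_pts=3)

# Hypothetical per-protein flags (1 = phage-like) in genome order.
flags = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
windows = sliding_window_regions(flags, window=4, min_hits=3)
```

In this toy input the isolated gene at 50,000 bp is treated as noise by the density-based pass, and the lone flag at index 11 never fills a window, so neither produces a candidate region.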
FIGURE 2 Visualizations of DBSCAN-SWA for prophage detection. (A) Interactive XHTML visualization of predicted prophages for Xylella fastidiosa Temecula1 (NC_004556) including a circular prophage viewer to display colored prophage regions with att sequences and interactive tables to display the detailed information of each prophage. (B) Interactive tables to display the predicted infecting phages and hit information for Xylella fastidiosa Temecula1 by using the parameter: “--add annotation PGPD”.
FIGURE 3 Recall of prophage detection tools for 184 manually curated prophages and 400 HMP bacterial genomes. (A) Recall of detection results for 184 manually curated prophages using DBSCAN-SWA, PHASTER, VirSorter, and Phage_Finder. (B) Detection rate and time of predicting prophages for 400 HMP bacterial genomes. (C) Shared prophages of DBSCAN-SWA, Phage_Finder, and VirSorter for 400 HMP bacterial genomes.