Ailton Lopes de Sousa1, Dener Maués2, Amália Lobato1, Edian F Franco1, Kenny Pinheiro1, Fabrício Araújo1, Yan Pantoja1, Artur Luiz da Costa da Silva1, Jefferson Morais2, Rommel T J Ramos1.
Abstract
This study developed PhageWeb, a computational tool with a graphical interface and a web service that identifies phage regions through homology search and gene clustering. It uses G+C content variation and predicted tRNA sites as supporting evidence for the presence of prophages in indeterminate regions. It also performs functional characterization of the prophage regions by integrating data from biological databases. The performance of PhageWeb was compared with that of other available tools (PHASTER, Prophinder, and PhiSpy) using Sensitivity (Sn) and Positive Predictive Value (PPV). More than 80 manually annotated genomes served as the reference for the tests. PhageWeb reached an Sn of 86.1% and a PPV of approximately 87%, while the second-best tool reached Sn and PPV values of 83.3% and 86.5%, respectively. These numbers indicate greater precision in the regions identified by PhageWeb when compared with the other prediction tools submitted to the same tests. Additionally, PhageWeb was much faster than the other computational alternatives, decreasing the processing time to approximately one-ninth of the time required by the second-best software. PhageWeb is freely available at http://computationalbiology.ufpa.br/phageweb.
Keywords: bacterial genome; characterization; clustering; phage; prophage; web interface; web service
Year: 2018 PMID: 30619469 PMCID: PMC6305541 DOI: 10.3389/fgene.2018.00644
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Pipeline for identification and characterization of prophages by the PhageWeb computational tool. (A) The pipeline receives as input the parameters (Alignment Identity and MinPts, the minimum number of phage proteins in a region) and the annotation file (GenBank, EMBL, or the NCBI accession number) of the target genome to be evaluated; (B) homology search of the coding sequences against a local database built from publicly available sequences annotated as phage in NCBI, which is automatically updated once per week; (C) after the homologous sequences are identified, a clustering analysis based on the distance between the elements is performed, and each candidate phage region is evaluated by the number of phage genes it contains; (D) this optional step checks whether the identified region has additional features that can be evidence of a prophage: G+C content (because G+C content may vary on the flanks of the prophage) and the presence of tRNAs on the flanks; (E) use of web services to connect to biological databases and perform the functional characterization of the identified sequences; (F) verification of the probable integrity of the prophage based on the gene composition of each identified phage.
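Step (D) relies on G+C content differing between a candidate region and the rest of the genome. The sketch below shows one simple way to quantify that difference. It is only an illustration: the source does not describe how PhageWeb computes G+C variation, so the function names `gc_content` and `gc_deviation` and the toy sequence are assumptions.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_deviation(genome: str, start: int, end: int) -> float:
    """G+C of genome[start:end] minus the genome-wide G+C (0-based, end-exclusive)."""
    return gc_content(genome[start:end]) - gc_content(genome)


# Toy genome (made-up data): an AT-rich insert between GC-rich flanks.
genome = "GC" * 50 + "AT" * 20 + "GC" * 50
deviation = gc_deviation(genome, 100, 140)
print(f"G+C deviation of the candidate region: {deviation:+.3f}")
```

A strongly negative or positive deviation would count as supporting evidence under this assumption; any threshold for "strong" would have to be chosen separately.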
Performance evaluation of clustering algorithms in the identification of prophage regions, based on the Silhouette, Dunn, Davies-Bouldin (DB), and Density-Based Clustering Validation (DBCV) indices.
| Algorithms | Cluster | Silhouette | DBCV | Dunn | DB |
|---|---|---|---|---|---|
| DBSCAN | 151 | 0.47 | -0.73323973 | 0.0006 | 0.553 |
| OPTICS | 168 | 0.54 | -0.677653797 | 0.003 | |
| HDBSCAN | 1.2 |
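To make the role of MinPts in density-based clustering concrete, here is a minimal DBSCAN sketch over one-dimensional gene positions. It is not PhageWeb's implementation: the radius `eps`, the gene positions, and the function name `dbscan_1d` are hypothetical and chosen only for illustration.

```python
def dbscan_1d(positions, eps, min_pts):
    """Label each position with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(positions)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Indices within eps of positions[i]; the point itself is included.
        return [j for j in range(n) if abs(positions[j] - positions[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_nbrs)
    return labels


# Hypothetical start coordinates (bp) of genes with phage homologs.
hits = [1000, 2000, 3000, 4000, 50000, 90000, 91000, 92000]
labels = dbscan_1d(hits, eps=1500, min_pts=3)
print(labels)
```

With these toy values the isolated hit at 50000 is labeled as noise, while the two dense groups become separate candidate regions.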
Comparison of functionalities and features of phage prediction tools.
| Resource | PHASTER | Prophinder | PhiSpy | PhageWeb |
|---|---|---|---|---|
| Using graphical interface | Yes | Yes | No | Yes |
| Homology analyses | Yes | Yes | Yes | Yes |
| Analyses of tRNA sites | Yes | No | No | Yes |
| G+C content analysis | No | No | No | Yes |
| Results exportation | Yes | Yes | No | Yes |
| Circular genome view | Yes | No | No | Yes |
| Characterization of sequences | Yes | No | No | Yes |
| Alignment details | Yes | No | No | Yes |
| Support for biological databases integration | No | No | No | Yes |
| Output types | Text, graphics | Text, graphics | Text only | Text, graphics |
| Run time (seconds) | ∼365 | ∼1890 | ∼5547 | ∼22 |
Comparative analysis of values obtained for Sn (Sensitivity) and PPV (Positive Predictive Value) between computational tools.
| Metric | PHASTER | Prophinder | PhiSpy | PhageWeb |
|---|---|---|---|---|
| Sn | 83.33% | 81.02% | 52.78% | 86.11% |
| PPV | 86.54% | 77.43% | 88.37% | 87.32% |
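For reference, Sn and PPV follow the standard definitions Sn = TP / (TP + FN) and PPV = TP / (TP + FP), where TP, FN, and FP are true positives, false negatives, and false positives. The short sketch below computes both. The counts are hypothetical, since the source reports only percentages, and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sn = TP / (TP + FN): share of reference prophages that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def positive_predictive_value(tp: int, fp: int) -> float:
    """PPV = TP / (TP + FP): share of predictions that match a reference prophage."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts, not taken from the study.
tp, fn, fp = 90, 10, 15
sn = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
print(f"Sn = {sn:.2%}, PPV = {ppv:.2%}")
```

A tool can score high on one metric and low on the other: predicting few regions tends to raise PPV while lowering Sn, which is why both are reported.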
Prophage regions identified by the computational tools in the genome of Lactococcus lactis subsp. lactis Il1403 (NC_002662), compared with the manually curated annotation of the same strain.
| Prophage | Reference coordinates | PHASTER | Prophinder | PhiSpy | PhageWeb |
|---|---|---|---|---|---|
| Region 1 | 35516-49727 | 28461-56371 | 35516-49727 | 28818-56368 | 35516-72698 |
| Region 2 | 447236-483244 | 443651-484066 | 451007-483244 | 447083-484064 | 447236-483552 |
| Region 3 | 502723-513742 | 502338-520485 | 502723-511542 | - | 502723-517314 |
| Region 4 | 1036642-1071558 | 1033815-1079175 | 1036642-1071558 | 1036482-1113152 | 1036642-1159446 |
| Region 5 | 1414112-1456949 | 1414112-1457046 | 1439215-1446438 | 1415361-1457456 | 1415811-1456949 |
| Region 6 | 2013685-2025635 | 1997701-2028023 | 2011426-2025635 | - | 2013685-2024681 |
| False positives | - | - | - | 633126-658623 | - |
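One way to read the coordinates in this table is to measure how much of a reference region a prediction covers, and how much of the prediction lies inside the reference. The sketch below does this for Region 1, using the reference and PhageWeb coordinates from the table. The scoring itself is an assumption: the source does not state how predicted and reference regions were matched, and the function names are illustrative.

```python
def overlap_bp(a, b):
    """Number of shared positions between two inclusive (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def interval_len(a):
    """Length in bp of an inclusive (start, end) interval."""
    return a[1] - a[0] + 1


reference = (35516, 49727)  # Region 1, reference coordinates
predicted = (35516, 72698)  # Region 1, PhageWeb prediction

shared = overlap_bp(reference, predicted)
ref_covered = shared / interval_len(reference)   # fraction of reference recovered
pred_inside = shared / interval_len(predicted)   # fraction of prediction inside reference
print(shared, ref_covered, round(pred_inside, 3))
```

Here the prediction covers the whole reference region but extends well beyond it, so the first fraction is high and the second is low.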
FIGURE 2. Graphical representation of the results from analyzing the genome of Lactococcus lactis subsp. lactis Il1403 (NC_002662) with the BRIG software.