Kumarasan Yukgehnaish1, Heera Rajandas1,2, Sivachandran Parimannan1,2, Ravichandran Manickam1,3,4, Kasi Marimuthu1,3, Bent Petersen1,2,5, Martha R J Clokie6, Andrew Millard6, Thomas Sicheritz-Pontén1,2,5.
Abstract
The characterization of therapeutic phage genomes plays a crucial role in the success rate of phage therapies. Three checkpoints need to be examined when selecting phage candidates: the presence of temperate markers, antimicrobial resistance (AMR) genes, and virulence genes. However, no single-step tool is currently available for this purpose. Hence, we have developed a tool that checks all three conditions required for the selection of suitable therapeutic phage candidates. The tool consists of an ensemble of machine-learning-based predictors that determine the presence of temperate markers (integrase, Cro/CI repressor, immunity repressor, DNA partitioning protein A, and antirepressor), together with the integrated ABRicate tool, which determines the presence of antibiotic resistance genes and virulence genes. Using the biological features of the temperate markers, we were able to predict their presence with high Matthews correlation coefficient (MCC) scores (>0.70), corresponding to the lifestyle of the phages with an accuracy of 96.5%. Additionally, the screening of 183 lytic phage genomes revealed six phages that contain AMR or virulence genes, showing that not all lytic phages are suitable for therapy. The suite of predictors, PhageLeads, along with the integrated ABRicate tool, can be accessed online for in silico selection of suitable therapeutic phage candidates from single genomes or metagenomic contigs.
Keywords: AMR; genomics; lysogeny; machine learning; phage therapy
Year: 2022 PMID: 35215934 PMCID: PMC8879740 DOI: 10.3390/v14020342
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Keywords used for labeling positive and negative data from annotated protein names and the resulting number of positive and negative labels in training and validation datasets.
| Dataset | Keywords Used for Positive Labels | Number of Positive Labels | Number of Negative Labels |
|---|---|---|---|
| Integrase | Integrase, site-specific recombinase, tyrosine recombinase, serine recombinase, int, tyr recombinase, ser recombinase | Training: 2224 | Training: 156,132 |
| Cro/CI | Cro, CI, C1, CL | Training: 657 | Training: 156,170 |
| Immunity repressor | Immunity related repressor, immunity repressor, ImmR | Training: 754 | Training: 157,602 |
| ParA | DNA partitioning protein A, Chromosome partitioning protein A, ParA, partitioning protein A | Training: 78 | Training: 158,663 |
| Antirepressor | Antirepressor, anti-repressor, antirepressor Rha, Rha | Training: 945 | Training: 157,864 |
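The keyword-based labeling summarized in the table above can be illustrated with a minimal sketch. The keyword lists are copied from the table; the matching rule (case-insensitive, whole-word matching so that short keywords such as "int" do not fire inside longer words) and the function names are assumptions made for illustration, not necessarily the procedure used by the authors.

```python
import re

# Keywords copied from the table above.
POSITIVE_KEYWORDS = {
    "Integrase": ["Integrase", "site-specific recombinase", "tyrosine recombinase",
                  "serine recombinase", "int", "tyr recombinase", "ser recombinase"],
    "Cro/CI": ["Cro", "CI", "C1", "CL"],
    "Immunity repressor": ["Immunity related repressor", "immunity repressor", "ImmR"],
    "ParA": ["DNA partitioning protein A", "Chromosome partitioning protein A",
             "ParA", "partitioning protein A"],
    "Antirepressor": ["Antirepressor", "anti-repressor", "antirepressor Rha", "Rha"],
}


def compile_pattern(keywords):
    """Build one case-insensitive, whole-word pattern (assumed matching rule)."""
    # Longer keywords first so that multi-word phrases win over their substrings.
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(
        rf"(?<![A-Za-z0-9])(?:{alternation})(?![A-Za-z0-9])", re.IGNORECASE
    )


PATTERNS = {name: compile_pattern(kws) for name, kws in POSITIVE_KEYWORDS.items()}


def label_protein(annotation, dataset):
    """Return 1 if the annotated protein name matches a positive keyword, else 0."""
    return 1 if PATTERNS[dataset].search(annotation) else 0
```

Under this rule, an annotation such as "helix-turn-helix transcriptional regulator" receives a negative Cro/CI label because it contains none of the keywords, which is the kind of ambiguous annotation that keyword matching alone cannot resolve.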
Figure 1. Network graph of the integrase dataset with green dots indicating positive labels and red dots indicating negative labels.
Figure 2. Test MCC of integrase predictor using original dataset and filtered datasets using different numbers of features. Graph on the right shows the magnified section of the original graph with MCC ranging from 0.8 to 1.
Figure 3. Test MCC of Cro/CI predictor using original dataset and filtered datasets using different numbers of features. Graph on the right shows the magnified section of the original graph with MCC ranging from 0.5 to 1.
Figure 4. Test MCC of immunity repressor predictor using original dataset and filtered datasets using different numbers of features. Graph on the right shows the magnified section of the original graph with MCC ranging from 0.5 to 1.
Figure 5. Test MCC of ParA predictor using original dataset and filtered datasets using different numbers of features.
Figure 6. Test MCC of antirepressor predictor using original dataset and filtered datasets using different numbers of features. Graph on the right shows the magnified section of the original graph with MCC ranging from 0.5 to 1.
Optimized prediction threshold and validation score metrics of each predictor.
| Predictors | Threshold | Validation MCC | Validation F1 | Validation Accuracy |
|---|---|---|---|---|
| Integrase | 0.252 | 0.92 | 0.92 | 0.99 |
| Cro/CI | 0.363 | 0.71 | 0.71 | 0.99 |
| Immunity repressor | 0.472 | 0.87 | 0.86 | 0.99 |
| ParA | 0.03 | 0.76 | 0.75 | 0.99 |
| Antirepressor | 0.389 | 0.92 | 0.92 | 0.99 |
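As a rough illustration of how the optimized thresholds in the table above might be applied, the sketch below converts per-predictor probability scores into marker calls. The threshold values come from the table; the score dictionary format, the use of `>=` at the boundary, and the function name `call_markers` are assumptions for illustration only.

```python
# Thresholds copied from the table above.
THRESHOLDS = {
    "Integrase": 0.252,
    "Cro/CI": 0.363,
    "Immunity repressor": 0.472,
    "ParA": 0.03,
    "Antirepressor": 0.389,
}


def call_markers(scores):
    """Return the markers whose predicted probability reaches its threshold.

    `scores` maps predictor name to a probability in [0, 1]; predictors that
    are missing from `scores` are treated as scoring 0.0 (an assumption).
    """
    return [
        name
        for name, threshold in THRESHOLDS.items()
        if scores.get(name, 0.0) >= threshold
    ]


# Hypothetical scores for a single protein.
called = call_markers({"Integrase": 0.30, "Cro/CI": 0.10, "ParA": 0.05})
```

The per-predictor thresholds matter here: the same score of 0.30 is a positive call for the integrase predictor but would fall below the thresholds for Cro/CI, immunity repressor, and antirepressor.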
Figure 7. Mean MCC of random 10-fold cross-validation and clustered 10-fold cross-validation.
Confusion matrix of phage lifestyle prediction using the integrase, Cro/CI, immunity repressor, ParA, and antirepressor predictors.
| | Lytic | Temperate |
|---|---|---|
| Lytic | 177 | 7 |
| Temperate | 9 | 231 |
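For readers who want to recompute summary metrics from the counts in the confusion matrix above, the following sketch evaluates accuracy and MCC. It treats temperate as the positive class, which is an assumption; the table does not state which axis holds the predicted labels, but both metrics are unchanged if the two off-diagonal counts are swapped.

```python
from math import sqrt


def binary_metrics(tp, fn, fp, tn):
    """Return (accuracy, MCC) for a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column sums to zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Counts taken from the confusion matrix above (temperate assumed positive).
accuracy, mcc = binary_metrics(tp=231, fn=9, fp=7, tn=177)
```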
Lytic phages predicted as temperate phages (false positives) that contained actual or putative temperate markers.
| Phage Genome | Protein Name | Protein ID | Predicted Marker | Remark |
|---|---|---|---|---|
| Salmonella phage SPN19 (NC_019417) | hypothetical protein | YP_006990261.1 | Integrase | Blastp result shows high similarity (99.1%) to viral integrase family 4 [Salmonella Phage 37;YP_009221453.1] |
| Salmonella phage FSL SP-088 (NC_021780) | hypothetical protein | YP_008239914.1 | Integrase | Blastp result shows high similarity (97.3%) to viral integrase family 4 [Salmonella Phage 37;YP_009221453.1] |
| Lactococcus phage 4268 (NC_004746) | DUF739 family protein | NP_839893.1 | Cro/CI | DUF739/pfam05339 contains putative Cro/CI repressors |
| | helix-turn-helix transcriptional regulator | NP_839899.1 | Cro/CI | HTH-transcriptional regulator/repressor/HTH-containing proteins are common ambiguous annotations of Cro/CI |
| | putative antirepressor | NP_839894 | Antirepressor | Lytic phage with lysogeny marker |
| Lactobacillus phage Lc-Nu (NC_007501) | CI-like repressor | YP_358780.1 | Cro/CI | Lytic phage with lysogeny marker |
| | Cro-like repressor | YP_358781.1 | Cro/CI | Lytic phage with lysogeny marker |
| Mycobacterium phage LRRHood (GQ303262) | immunity repressor | ACU41572.1 | Immunity repressor | Lytic phage with lysogeny marker |
| Mycobacterium phage Alice (JF704092) | hypothetical protein | AEJ94305 | Immunity repressor | Blastp result shows high similarity (98.82%) to immunity repressor [Mycobacterium phage Phox; ATN91327.1] |
Antimicrobial resistance and virulence genes obtained by screening the genome of lytic phages using ABRicate.
| Phage Genome | Gene | Product | Resistance/Virulence | Database Source |
|---|---|---|---|---|
| Stx2 converting phage vB_EcoP_24B (NC_027984) | CatA1 | type A-1 chloramphenicol O-acetyltransferase | Chloramphenicol resistance | NCBI, CARD, ARGANNOT, resfinder, megares |
| | iss2 | Increase serum survival protein | Virulence | ecoli_vf |
| Enterobacteria phage lambda (NC_001416) | iss2 | Increase serum survival protein | Virulence | ecoli_vf |
| Phage cdtI | nleH1 | Type III secretion system effector NleH1 | Virulence | ecoli_vf |
| | cif | Type III secretion system effector Cif cyclomodulin | Virulence | ecoli_vf |
| | cdtA | Cytolethal distending toxin subunit A | Virulence | ecoli_vf |
| | cdtB | Cytolethal distending toxin subunit B | Virulence | ecoli_vf |
| | cdtC | Cytolethal distending toxin subunit C | Virulence | ecoli_vf |
| Enterobacteria phage YYZ-2008 (NC_011356) | stx1A | Shiga toxin 1 subunit A | Virulence | ecoli_vf |
| | stx1B | Shiga-like toxin 1 subunit B | Virulence | ecoli_vf |
| | nleG6-3 | NleG Type 3 Effectors | Virulence | ecoli_vf |
| | nleG5-1 | NleG Type 1 Effectors | Virulence | ecoli_vf |
| Escherichia phage TL-2011c (NC_019442) | stx2A | Shiga toxin 2 subunit A | Virulence | ecoli_vf |
| | stx2B | Shiga-like toxin II subunit B | Virulence | ecoli_vf |
| | iss2 | Increase serum survival protein | Virulence | ecoli_vf |
| Enterobacteria phage HK629 (NC_019711) | iss2 | Increase serum survival protein | Virulence | ecoli_vf |
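A screening table like the one above can be summarized per genome with a short script. The sketch below assumes a simplified tab-separated layout with `SEQUENCE`, `GENE`, `DATABASE`, and `PRODUCT` columns, modeled on the kind of report ABRicate produces; the reduced column set, the sample rows (taken from the table), and the rule of classifying a hit by its source database are assumptions for illustration, not the PhageLeads implementation.

```python
import csv
import io
from collections import defaultdict

# Database names as they appear in the table above (lower-cased).
VIRULENCE_DBS = {"ecoli_vf"}
RESISTANCE_DBS = {"ncbi", "card", "argannot", "resfinder", "megares"}

# Hypothetical, simplified report; rows mirror entries in the table above.
SAMPLE_TSV = (
    "SEQUENCE\tGENE\tDATABASE\tPRODUCT\n"
    "NC_027984\tCatA1\tcard\ttype A-1 chloramphenicol O-acetyltransferase\n"
    "NC_027984\tiss2\tecoli_vf\tIncrease serum survival protein\n"
    "NC_001416\tiss2\tecoli_vf\tIncrease serum survival protein\n"
)


def flag_genomes(tsv_text):
    """Group hits per genome into resistance and virulence gene sets."""
    flags = defaultdict(lambda: {"resistance": set(), "virulence": set()})
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        database = row["DATABASE"].lower()
        if database in VIRULENCE_DBS:
            flags[row["SEQUENCE"]]["virulence"].add(row["GENE"])
        elif database in RESISTANCE_DBS:
            flags[row["SEQUENCE"]]["resistance"].add(row["GENE"])
    return dict(flags)


flags = flag_genomes(SAMPLE_TSV)
```

A genome with a non-empty set in either category would fail the corresponding checkpoint for therapeutic use, regardless of its predicted lifestyle.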