Casandra W Philipson, Logan J Voegtly, Matthew R Lueder, Kyle A Long, Gregory K Rice, Kenneth G Frey, Biswajit Biswas, Regina Z Cer, Theron Hamilton, Kimberly A Bishop-Lilly.
Abstract
Multi-drug resistance is increasing at alarming rates. The efficacy of phage therapy, treating bacterial infections with bacteriophages alone or in combination with traditional antibiotics, has been demonstrated in emergency cases in the United States and in other countries; however, it remains to be approved for widespread use in the US. One limiting factor is a lack of guidelines for assessing the genomic safety of phage candidates. We present the phage characterization workflow used by our team to generate data for submitting phages to the Food and Drug Administration (FDA) for authorized use. Essential analysis checkpoints and warnings are detailed for obtaining high-quality genomes, excluding undesirable candidates, rigorously assessing a phage genome for safety and evaluating sequencing contamination. This workflow has been developed in accordance with community standards for high-throughput sequencing of viral genomes as well as principles for ideal phages used for therapy. The feasibility and utility of the pipeline are demonstrated on two new phage genomes that meet all safety criteria. We propose these guidelines as a minimum standard for phages being submitted to the FDA for review as investigational new drug candidates.
Keywords: IND; best practices; high-throughput sequencing; phage therapy; viral genomes
Year: 2018 PMID: 29642590 PMCID: PMC5923482 DOI: 10.3390/v10040188
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. Phage characterization workflow. This pipeline is a simplified representation of tools and methods used to obtain high-quality phage genomes that are deemed viable phage therapy candidates. The pipeline begins with raw reads sequenced on an Illumina machine. To reduce potential bias introduced by bioinformatics tools, quality control and genome assembly are performed using two pipelines in parallel. The final genome sequence is obtained after resolving genome ends. Key viability checkpoints are outlined with dashed borders. In the initial viability check, phages are assessed for problematic genes (antimicrobial resistance (AMR), virulence factors (VF), toxins) and lifestyle. If a candidate passes the initial viability check, a combinatorial approach is applied to identify open reading frames, followed by rigorous manual annotation. A final check is performed after completing annotation. Phage candidates that pass the final checkpoint are considered safe for potential use in humans.
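The initial viability check described in the caption can be sketched as a simple gate. This is only an illustration: the `CandidateScreen` record, its field names, and the rule that a candidate must be predicted lytic are assumptions made for the example, not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CandidateScreen:
    """Hypothetical summary of screening results for one phage genome."""
    name: str
    amr_hits: list[str] = field(default_factory=list)        # antimicrobial resistance genes
    virulence_hits: list[str] = field(default_factory=list)  # virulence factor genes
    toxin_hits: list[str] = field(default_factory=list)      # toxin genes
    predicted_lytic: bool = False                             # result of the lifestyle assessment


def initial_viability_check(screen: CandidateScreen) -> tuple[bool, list[str]]:
    """Return (passes, reasons); any problematic gene or a non-lytic call fails."""
    reasons = []
    if screen.amr_hits:
        reasons.append("AMR genes found: " + ", ".join(screen.amr_hits))
    if screen.virulence_hits:
        reasons.append("virulence factors found: " + ", ".join(screen.virulence_hits))
    if screen.toxin_hits:
        reasons.append("toxin genes found: " + ", ".join(screen.toxin_hits))
    if not screen.predicted_lytic:
        # Assumption for this sketch: only lytic candidates proceed to annotation.
        reasons.append("lifestyle not predicted to be lytic")
    return (not reasons, reasons)


clean = CandidateScreen(name="example_phage", predicted_lytic=True)
passes, reasons = initial_viability_check(clean)
print(passes, reasons)
```

Returning the list of reasons alongside the pass/fail flag keeps a record of why a candidate was excluded, which is useful when results are compiled for review.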
Databases curated with virulence factor and antimicrobial resistance genes.
| Database | Genes in Database (#) | Last Updated ¹ | Database Source |
|---|---|---|---|
| ShortBRED VF ² | 26,187 | July 2017 | |
| ShortBRED AR ³ | 932 | July 2017 | |
| Virulence Factor DataBase (VFDB) | 30,246 | February 2018 | |
| Comprehensive Antibiotic Resistance Database (CARD) | 2,514 | February 2018 | |
¹ Last update available for public download. Database download dates for analyses in this manuscript are described in Materials and Methods. ² Database built using Victors, VFDB and MvirDB. ³ Database built using CARD.
Sequencing and Assembly Statistics.
| Pipeline Output | Pseudomonas Phage vB_PaeP_130_113 | Staphylococcus Phage vB_SauM_0414_108 |
|---|---|---|
| Total Reads | 206,222 | 347,594 |
| Reads Pass FaQCs (%) | 98.82 | 98.38 |
| Reads Pass CLC (%) | 96.97 | 94.03 |
| Reads sub-sampled (#) | 50,000 | 50,000 |
| Contigs, SPAdes all reads ¹ | 1 | 2 |
| Contigs, CLC all reads ¹ | 1 | 1 |
| Contigs, SPAdes subsampled ¹ | 1 | 1 |
| Contigs, CLC subsampled ¹ | 1 | 1 |
| Largest contig, SPAdes all reads ² | 43,742 | 141,507 |
| Largest contig, CLC all reads ² | 43,742 | 141,334 |
| Largest contig, SPAdes subsampled ² | 43,742 | 141,331 |
| Largest contig, CLC subsampled ² | 43,742 | 141,330 |
¹ Number of contigs >700 base pairs long. ² Length of largest contig (bp), SPAdes assembly artifacts removed.
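One way to read the table is to check how closely the four assemblies of each phage agree. The following minimal sketch uses the largest-contig lengths from the table; the dictionary layout and the `length_spread` helper are illustrative assumptions, not part of the published pipeline.

```python
# Largest-contig lengths (bp) copied from the assembly statistics table.
LARGEST_CONTIG_BP = {
    "vB_PaeP_130_113": {
        "SPAdes all reads": 43742,
        "CLC all reads": 43742,
        "SPAdes subsampled": 43742,
        "CLC subsampled": 43742,
    },
    "vB_SauM_0414_108": {
        "SPAdes all reads": 141507,
        "CLC all reads": 141334,
        "SPAdes subsampled": 141331,
        "CLC subsampled": 141330,
    },
}


def length_spread(lengths: dict) -> int:
    """Difference (bp) between the longest and shortest assembly of one phage."""
    values = list(lengths.values())
    return max(values) - min(values)


for phage, lengths in LARGEST_CONTIG_BP.items():
    print(f"{phage}: spread = {length_spread(lengths)} bp")
```

For these values the spread is 0 bp for the Pseudomonas phage and 177 bp for the Staphylococcus phage. A non-zero spread does not by itself indicate an error; it marks where the assemblies should be compared before the genome ends are resolved.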
Genomic termini statistics.
| Phage | Class ¹ | DTR Region Length | Start, End τ Metric ² | Coverage in DTR Region | Coverage Outside of DTR Region |
|---|---|---|---|---|---|
| Pseudomonas phage vB_PaeP_130_113 | Short DTR | 463 bp | 0.63, 0.64 | 1018.0× | 634.9× |
| Staphylococcus phage vB_SauM_0414_108 | Long DTR | 10,296 bp | 0.75, 0.55 | 753.1× | 343.9× |
Above metrics are determined by PhageTerm. ¹ One of the following: 5′ cos, 3′ cos, Short DTR, Long DTR, headful (with or without pac site detected), Mu-like, or unknown. ² τ in forward direction for first nucleotide of DTR region, τ in reverse direction for last nucleotide of DTR region.
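The coverage columns can be summarized as a ratio of coverage inside the direct terminal repeat (DTR) to coverage outside it. The sketch below does only that arithmetic on the table values; the `dtr_coverage_ratio` helper is a hypothetical name, and no threshold is implied, since the source does not state one.

```python
# Coverage values (fold) copied from the genomic termini table.
TERMINI = {
    "vB_PaeP_130_113": {"dtr_cov": 1018.0, "outside_cov": 634.9},
    "vB_SauM_0414_108": {"dtr_cov": 753.1, "outside_cov": 343.9},
}


def dtr_coverage_ratio(dtr_cov: float, outside_cov: float) -> float:
    """Fold enrichment of read coverage inside the DTR relative to the rest of the genome."""
    if outside_cov <= 0:
        raise ValueError("coverage outside the DTR must be positive")
    return dtr_cov / outside_cov


for phage, cov in TERMINI.items():
    ratio = dtr_coverage_ratio(cov["dtr_cov"], cov["outside_cov"])
    print(f"{phage}: DTR coverage is {ratio:.2f}x the coverage outside the DTR")
```

With the values above, the ratios come to roughly 1.6 for the short DTR and 2.2 for the long DTR, so both repeats are covered more deeply than the rest of the genome.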
Phage lifestyle assessment.
| Phage | PHACTS Lytic Score | PHACTS Temperate Score | PHACTS Standard Deviation | PHASTER Integrase | RAST Integrase | NCBI Annotated Integrase |
|---|---|---|---|---|---|---|
| Pseudomonas phage vB_PaeP_130_113 | 0.66 | 0.34 | 0.073 | No | No | N/A |
| Pseudomonas phage DL62 (GI:KR054031) | 0.73 | 0.26 | 0.117 | No | No | No |
| Pseudomonas phage vB_PaeS_PMG1 (GI:NC_016765) | 0.42 | 0.58 | 0.042 | Yes | Yes | Yes |
| Staphylococcus phage vB_SauM_0414_108 | 0.60 | 0.40 | 0.082 | No | No | N/A |
| Staphylococcus phage K (GI:KF76114) | 0.59 | 0.41 | 0.107 | No | No | No |
| Staphylococcus phage phiSaus-IPLA88 (GI:NC_011614) | 0.28 | 0.72 | 0.048 | Yes | Yes | Yes |
N/A = Not applicable due to in-house annotation.
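The columns of the lifestyle table can be combined into a single call per phage. The rule in this sketch (any integrase detection, or a temperate score at least as high as the lytic score, leads to a temperate call) is an assumption chosen to match the rows of the table; the source does not state the authors' exact decision rule.

```python
from typing import Optional, Sequence


def call_lifestyle(
    lytic_score: float,
    temperate_score: float,
    integrase_calls: Sequence[Optional[bool]],
) -> str:
    """Combine PHACTS scores with integrase detections (None means not applicable)."""
    if any(call is True for call in integrase_calls):
        return "temperate"
    return "lytic" if lytic_score > temperate_score else "temperate"


# (phage, lytic score, temperate score, [PHASTER, RAST, NCBI] integrase calls)
ROWS = [
    ("vB_PaeP_130_113", 0.66, 0.34, [False, False, None]),
    ("DL62", 0.73, 0.26, [False, False, False]),
    ("vB_PaeS_PMG1", 0.42, 0.58, [True, True, True]),
    ("vB_SauM_0414_108", 0.60, 0.40, [False, False, None]),
    ("K", 0.59, 0.41, [False, False, False]),
    ("phiSaus-IPLA88", 0.28, 0.72, [True, True, True]),
]

for name, lytic, temperate, integrases in ROWS:
    print(name, call_lifestyle(lytic, temperate, integrases))
```

Under this rule both new phages and their lytic reference phages are called lytic, while the two integrase-positive reference phages are called temperate.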
Figure 2. Contaminant analysis using read-based taxonomy classification. Read-based taxonomy results are presented for Pseudomonas phage vB_PaeP_130_113 (A,B); and Staphylococcus phage vB_SauM_0414_108 (C,D). Taxonomy results for all classification tools (relative abundance) using all reads that pass QC are presented as heatmaps (A,C). Reads were classified by GOTTCHA using databases composed of bacteria (species-level: gottcha-speDB-b; strain-level: gottcha-strDB-b) or viruses (species-level: gottcha-speDB-v; strain-level: gottcha-strDB-v), Kraken (kraken_mini), MetaPhlAn and BWA against RefSeq (BWA-mem). All reads that were classified by BWA are presented as Krona plots, where percentages are the number of reads that map to each organism divided by the total number of classified reads (B,D).
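The percentage definition given for the Krona plots (reads mapped to an organism divided by the total number of classified reads) can be written out as follows. The read counts and organism labels in this sketch are invented for illustration and are not results from the study.

```python
def relative_abundance(read_counts: dict) -> dict:
    """Percentage of classified reads assigned to each organism."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no classified reads")
    return {organism: 100.0 * count / total for organism, count in read_counts.items()}


# Hypothetical counts of classified reads per organism (illustrative only).
example_counts = {"target phage": 9800, "host bacterium": 150, "other": 50}

for organism, percent in relative_abundance(example_counts).items():
    print(f"{organism}: {percent:.1f}%")
```

Because unclassified reads are excluded from the denominator, these percentages describe the composition of the classified fraction only, not of the whole sequencing run.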
Finished genome details.
| Phage | Size (bp) | %GC | CDS (#) | Genes with Functional Annotation (#) | Hypothetical Genes (#) | tRNA (#) | Assigned Family |
|---|---|---|---|---|---|---|---|
| Pseudomonas phage vB_PaeP_130_113 | 44,205 | 62.4 | 57 | 35 | 22 | 0 | Podoviridae |
| Staphylococcus phage vB_SauM_0414_108 | 151,627 | 30.4 | 241 | 154 | 87 | 4 | Myoviridae |
Figure 3. Whole genome maps for finished annotated phage genomes. Annotations for selected predicted open reading frames (ORFs) are presented for Pseudomonas phage vB_PaeP_130_113 (A); and Staphylococcus phage vB_SauM_0414_108 (B). Mauve colored arrows indicate the ORF has been annotated; grey colored arrows indicate ORFs annotated “hypothetical”; yellow arrows indicate tRNA.