multiPhATE: bioinformatics pipeline for functional annotation of phage isolates.

Carol L Ecale Zhou¹, Stephanie Malfatti¹, Jeffrey Kimbrel¹, Casandra Philipson²,³, Katelyn McNair⁴, Theron Hamilton², Robert Edwards⁴, Brian Souza¹.

Abstract

SUMMARY: To address the need for improved phage annotation tools that scale, we created an automated throughput annotation pipeline: multiple-genome Phage Annotation Toolkit and Evaluator (multiPhATE). multiPhATE is a throughput pipeline driver that invokes an annotation pipeline (PhATE) across a user-specified set of phage genomes. This tool incorporates a de novo phage gene calling algorithm and assigns putative functions to gene calls using protein-, virus- and phage-centric databases. multiPhATE's modular construction allows the user to implement all or any portion of the analyses by acquiring local instances of the desired databases and specifying the desired analyses in a configuration file. We demonstrate multiPhATE by annotating two newly sequenced Yersinia pestis phage genomes. Within multiPhATE, the PhATE processing pipeline can be readily implemented across multiple processors, making it adaptable for throughput sequencing projects. Software documentation assists the user in configuring the system.
AVAILABILITY AND IMPLEMENTATION: multiPhATE was implemented in Python 3.7, and runs as a command-line code under Linux or Unix. multiPhATE is freely available under an open-source BSD3 license from https://github.com/carolzhou/multiPhATE. Instructions for acquiring the databases and third-party codes used by multiPhATE are included in the distribution README file. Users may report bugs by submitting to the GitHub issues page associated with the multiPhATE distribution.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2019 PMID: 31086982 PMCID: PMC6821344 DOI: 10.1093/bioinformatics/btz258

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

A bacteriophage (also known as ‘phage’) is a virus that parasitizes a bacterium by infecting it and reproducing within it. This work was motivated by a need to increase the throughput potential for describing newly sequenced phage genomes. Global pathogen discovery efforts, such as The Global Virome Project (Carroll et al., 2018), are projected to invest billions of dollars to support surveillance projects that characterize the earth’s virosphere over the next 10 years. Already, the PhagesDB contains >13 000 phage genomes (Russell and Hatfull, 2017). Phage therapy has resurfaced as a method to combat antimicrobial resistance, and upcoming clinical trials necessitate complete sequencing and characterization of therapeutic candidates. High-quality gene calling and functional annotation are also vital for successful genomic comparison studies and for discovery of new phage-based therapeutic leads (Kutter et al.). Because annotation of phage genomes is a relatively new science, there exist few bioinformatics pipelines for phage analysis that can be readily adapted for use in phage research efforts. Currently, researchers typically apply bacterial gene callers for annotation of phage DNA, followed by largely manual analyses using web forms, and integration of summary results can be time-consuming. Although there exist several codes for identifying prophage sequences in bacterial genomes (Arndt et al., 2016; Kang et al.; Roux et al., 2015; and others), once these sequences have been identified, they are typically annotated using methods developed for sequences from other taxa (Perkel, 2017; Seemann, 2014). Currently there exists only one automated annotation pipeline specifically for phage: Philipson et al. (2018) describe a pipeline that identifies features in phage that determine their potential suitability as therapeutic reagents. However, there remains a need for an automated phage annotation pipeline that can be readily implemented on multiple nodes of a local server and that requires minimal software development expertise.
To address this need, we present multiPhATE (multiple-genome Phage Annotation Toolkit and Evaluator), an automated high-throughput phage annotation pipeline.

2 Description

The PhATE annotation pipeline incorporates four gene callers (each run only if selected): GeneMarkS (Lomsadze et al.), Glimmer (Delcher et al., 2007), Prodigal (Hyatt et al.) and a novel phage-centric gene caller, PHANOTATE (McNair et al.). Functional annotation is achieved by Basic Local Alignment Search Tool (BLAST) and Hidden Markov Model (HMM) searches for homologous sequences in protein- and phage-centric databases. The PhATE workflow is depicted in Supplementary File, ‘phate_Fig_1_PhATE_Workflow.pdf’.

2.1 Input

Input to multiPhATE consists of a configuration file that specifies a list of genomes to be processed by PhATE and a set of parameters controlling software execution. The user specifies the names of phage genome fasta files, the names of output subdirectories and other metadata pertaining to the genomes being analyzed. The user also specifies the following optional analyses: (i) gene caller(s) to be run; (ii) gene caller to use for subsequent annotation (default: PHANOTATE); (iii) BLAST parameters; (iv) BLAST databases to be searched; (v) whether the HMM search is turned on or off. It is possible to run PhATE using any or all of the specified gene callers, databases and searches. In this way, installation can be achieved one gene caller or database at a time, with stepwise testing. The user can also switch individual searches (e.g. NR) on or off in order to control execution time, which may be useful in performing preliminary annotation of large numbers of sequences. Although multiPhATE is intended for phage sequence annotation, it would be reasonable to run multiPhATE with bacterial genomes to assist identification of embedded phage sequence.
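To make the relationship between these options concrete, the following Python sketch models them as plain data structures with a small consistency check. It is illustrative only: the class names, field names and the validate method are hypothetical and do not reproduce the actual multiPhATE configuration file format, which is documented in the distribution README.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: this is NOT the multiPhATE configuration format.
KNOWN_CALLERS = {"genemarks", "glimmer", "prodigal", "phanotate"}


@dataclass
class GenomeJob:
    """One genome to annotate: its fasta file and its output subdirectory."""
    fasta_file: str
    output_subdir: str


@dataclass
class RunOptions:
    """Optional analyses (i)-(v) described in the text."""
    gene_callers: set = field(default_factory=lambda: {"phanotate"})
    primary_caller: str = "phanotate"  # default stated in the text
    blast_databases: set = field(default_factory=set)  # empty set: no BLAST searches
    run_hmm_search: bool = False

    def validate(self):
        """Return a list of human-readable problems (empty if consistent)."""
        problems = []
        unknown = self.gene_callers - KNOWN_CALLERS
        if unknown:
            problems.append(f"unknown gene caller(s): {sorted(unknown)}")
        if self.primary_caller not in self.gene_callers:
            problems.append(
                f"primary caller '{self.primary_caller}' is not among the callers to be run"
            )
        return problems
```

A check of this kind reflects the stepwise installation described above: a run that enables only one gene caller and no databases is still a valid configuration.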

2.2 Annotation

PhATE begins by performing gene calling using the selected gene caller(s). When two or more are invoked, PhATE outputs a summary table showing a side-by-side comparison of the gene calls, plus summary statistics on the numbers and lengths of gene calls for each algorithm and the numbers of calls held in common and unique to each. Next, PhATE uses the BLAST+ programs (Camacho et al.) blastn and blastp, and the HMM search program jackhmmer (Johnson et al.), to identify homologs of the input genome and its predicted gene and peptide sequences in several databases: National Center for Biotechnology Information (NCBI) virus genomes, NCBI RefSeq proteins, NCBI RefSeq genes, NCBI virus proteins and the Non-Redundant protein sequence database (NR) (NCBI Resource Coordinators, 2016), as well as Swissprot (Bairoch and Apweiler, 2000), Phage Annotation Tools and Methods (PhAnToMe) (www.phantome.org), a virus subset of the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al.) and a fasta sequence dataset derived using the database of Prokaryotic Virus Orthologous Groups (pVOG) identifiers (Grazziotin et al.). The latter database is modified to contain the pVOG identifiers in the fasta headers, by means of scripts included in the multiPhATE distribution.
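The gene-call comparison can be pictured with a short Python sketch. It assumes that each gene call is reduced to a (strand, start, end) tuple and that two calls agree only when all three values match exactly; PhATE's own comparison criteria may differ, and the function name compare_gene_calls is hypothetical.

```python
from collections import Counter


def compare_gene_calls(calls_by_caller):
    """Summarize gene calls per caller and find calls shared by all callers.

    calls_by_caller maps a caller name to an iterable of
    (strand, start, end) tuples, with 1-based inclusive coordinates.
    """
    call_sets = {caller: set(calls) for caller, calls in calls_by_caller.items()}
    # Count how many callers reported each distinct call.
    support = Counter(call for calls in call_sets.values() for call in calls)
    n_callers = len(call_sets)
    common = {call for call, k in support.items() if k == n_callers}

    summary = {}
    for caller, calls in call_sets.items():
        lengths = [end - start + 1 for _, start, end in calls]
        summary[caller] = {
            "n_calls": len(calls),
            "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
            # Calls reported by this caller and by no other.
            "unique": {call for call in calls if support[call] == 1},
        }
    return common, summary
```

Exact coordinate matching is the strictest possible criterion. A more permissive variant could, for example, treat calls that share a strand and stop position as equivalent, since gene callers often disagree only on the start codon.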

2.3 Output

PhATE generates the following files and directories: (i) output from the gene-call algorithms and the gene-call comparison (Supplementary Material ‘phate_P2_CGC.pdf’); (ii) gene and translated peptide fasta files; (iii) combined-annotation summary files; (iv) directories containing raw BLAST outputs for genome and peptide BLAST runs; (v) directories with raw HMM search outputs for peptide searches; (vi) alignment-ready fasta files containing each predicted peptide plus the members of each identified pVOG family to which a peptide may be assigned and (vii) log files. BLAST and HMM raw data outputs can be saved or cleaned from the output directories (see README). We demonstrate application of multiPhATE to the annotation of two newly sequenced Yersinia pestis phage genomes (see Supplementary Material ‘phate_results.pdf’).
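Output (vi) can be sketched as follows: given a predicted peptide and a pVOG-tagged sequence set, collect the members of one pVOG family and write them alongside the peptide in a single fasta text. The header layout assumed here (pVOG identifier, then ‘|’, then the rest of the header) and both function names are hypothetical; the headers produced by the scripts in the multiPhATE distribution may be laid out differently.

```python
def parse_fasta(text):
    """Parse fasta-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def family_fasta(peptide_id, peptide_seq, pvog_id, pvog_records):
    """Build fasta text holding the peptide followed by its pVOG family members.

    Assumes each database header begins with the pVOG identifier
    followed by '|' (a hypothetical layout).
    """
    lines = [f">{peptide_id}", peptide_seq]
    for header, seq in pvog_records:
        if header.split("|", 1)[0] == pvog_id:
            lines += [f">{header}", seq]
    return "\n".join(lines) + "\n"
```

The resulting text can be passed directly to a multiple sequence alignment program, which is the sense in which such files are alignment-ready.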

17 in total

1. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.

Authors: A Bairoch; R Apweiler
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Identifying bacterial genes and endosymbiont DNA with Glimmer.

Authors: Arthur L Delcher; Kirsten A Bratke; Edwin C Powers; Steven L Salzberg
Journal: Bioinformatics Date: 2007-01-19 Impact factor: 6.937

3. How bioinformatics tools are bringing genetic analysis to the masses.

Authors: Jeffrey M Perkel
Journal: Nature Date: 2017-02-28 Impact factor: 49.962

4. The Global Virome Project.

Authors: Dennis Carroll; Peter Daszak; Nathan D Wolfe; George F Gao; Carlos M Morel; Subhash Morzaria; Ariel Pablos-Méndez; Oyewale Tomori; Jonna A K Mazet
Journal: Science Date: 2018-02-23 Impact factor: 47.728

5. VirSorter: mining viral signal from microbial genomic data.

Authors: Simon Roux; Francois Enault; Bonnie L Hurwitz; Matthew B Sullivan
Journal: PeerJ Date: 2015-05-28 Impact factor: 2.984

6. KEGG: new perspectives on genomes, pathways, diseases and drugs.

Authors: Minoru Kanehisa; Miho Furumichi; Mao Tanabe; Yoko Sato; Kanae Morishima
Journal: Nucleic Acids Res Date: 2016-11-28 Impact factor: 16.971

7. Prokaryotic Virus Orthologous Groups (pVOGs): a resource for comparative genomics and protein family annotation.

Authors: Ana Laura Grazziotin; Eugene V Koonin; David M Kristensen
Journal: Nucleic Acids Res Date: 2016-10-26 Impact factor: 16.971

8. PHASTER: a better, faster version of the PHAST phage search tool.

Authors: David Arndt; Jason R Grant; Ana Marcu; Tanvir Sajed; Allison Pon; Yongjie Liang; David S Wishart
Journal: Nucleic Acids Res Date: 2016-05-03 Impact factor: 16.971

9. Database resources of the National Center for Biotechnology Information.

Authors: NCBI Resource Coordinators
Journal: Nucleic Acids Res Date: 2015-11-28 Impact factor: 16.971

10. Characterizing Phage Genomes for Therapeutic Applications.

Authors: Casandra W Philipson; Logan J Voegtly; Matthew R Lueder; Kyle A Long; Gregory K Rice; Kenneth G Frey; Biswajit Biswas; Regina Z Cer; Theron Hamilton; Kimberly A Bishop-Lilly
Journal: Viruses Date: 2018-04-10 Impact factor: 5.048
