Carol L Ecale Zhou, Jeffrey Kimbrel, Robert Edwards, Katelyn McNair, Brian A Souza, Stephanie Malfatti.
Abstract
To address a need for improved tools for annotation and comparative genomics of bacteriophage genomes, we developed multiPhATE2. As an extension of multiPhATE, a functional annotation code released previously, multiPhATE2 performs gene finding using multiple algorithms, compares the results of the algorithms, performs functional annotation of coding sequences, and incorporates additional search algorithms and databases to extend the search space of the original code. MultiPhATE2 performs gene matching among sets of closely related bacteriophage genomes and uses multiprocessing to speed computations. MultiPhATE2 can be restarted at multiple points within the workflow, allowing the user to examine intermediate results and adjust the subsequent computations accordingly. In addition, multiPhATE2 accommodates custom gene calls and sequence databases, which adds further flexibility. MultiPhATE2 was implemented in Python 3.7 and runs as a command-line code under Linux or Mac operating systems. Full documentation is provided as a README file and a Wiki website.
Keywords: bacteriophage; bioinformatics tool; comparative genomics; gene prediction; genome annotation; phage
Year: 2021; PMID: 33734357; PMCID: PMC8104953; DOI: 10.1093/g3journal/jkab074
Source DB: PubMed; Journal: G3 (Bethesda); ISSN: 2160-1836; Impact factor: 3.154
Figure 1. Overview of the multiPhATE2 system and workflow. User-specified configurations (configuration file) are input to the multiPhATE2 system, which invokes four subsystems: Gene Calling; the PhATE annotation pipeline; Compare Gene Profiles, which performs binary genome-to-genome comparisons of genes and proteins; and Genomics, which consolidates binary comparisons into gene–gene and protein–protein correspondences among all input genomes.
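One general way to consolidate pairwise (binary) matches into correspondences across all genomes is to treat each match as an edge and group the genes that are connected. The minimal Python sketch below does this with a small union-find structure. It is an assumed illustration of the idea only, not the Genomics subsystem's actual algorithm; the function name `consolidate_matches` and the `genome:gene` identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def consolidate_matches(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group gene identifiers that are linked by any chain of pairwise matches."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Each binary match merges the two groups its genes belong to.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each group under its root.
    groups: Dict[str, Set[str]] = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    # Sort for a deterministic order of groups.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical binary matches between genes of three genomes.
pairs = [
    ("phageA:g1", "phageB:g1"),
    ("phageB:g1", "phageC:g2"),
    ("phageA:g2", "phageC:g5"),
]
groups = consolidate_matches(pairs)
```

In this example the first two matches share `phageB:g1`, so their three genes end up in one group, while the third match forms a separate group of two.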
Figure 2. System overview and configurable process control features of multiPhATE2. Large blue arrows: multiPhATE2 subsystems; CGP = Compare Gene Profiles; curved grey arrows: process controls (stop = stopping point; checkpoint = point at which processing may be restarted); “parallel processing” indicates multiprocessing applied to functional annotation of input genomes and binary genome-to-genome comparisons; “parallel blast” indicates multithreading option provided by BLAST+.
Figure 3. Gene callers and gene-call comparison in the Gene Calling subsystem of multiPhATE2. The user may select any or all of the supported gene callers and/or provide their own gene calls (custom). The Compare Gene Calls module computes a set of calls that are common among all selected callers (common calls), a consensus set comprising gene calls produced by at least two callers (consensus calls), and a nonredundant superset of gene calls (superset). The user may select the results of one gene caller or a super/subset for input to the PhATE subsystem.
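The three sets named in the Figure 3 caption can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a gene call is a `(start, end, strand)` tuple and that two calls match only when all three fields are identical; the actual matching criteria used by multiPhATE2 are not stated here. The function name `compare_gene_calls` and the caller names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical representation of a gene call: (start, end, strand).
GeneCall = Tuple[int, int, str]


def compare_gene_calls(
    calls_by_caller: Dict[str, Set[GeneCall]]
) -> Dict[str, Set[GeneCall]]:
    """Return the superset, consensus, and common gene-call sets."""
    # Count how many callers produced each distinct call
    # (each caller's calls are a set, so a call counts once per caller).
    support = Counter(
        call for calls in calls_by_caller.values() for call in calls
    )
    n_callers = len(calls_by_caller)
    return {
        # Nonredundant union of all calls.
        "superset": set(support),
        # Calls produced by at least two callers.
        "consensus": {c for c, n in support.items() if n >= 2},
        # Calls produced by every selected caller.
        "common": (
            {c for c, n in support.items() if n == n_callers}
            if n_callers
            else set()
        ),
    }


# Made-up calls from three hypothetical callers.
calls = {
    "caller_a": {(100, 400, "+"), (500, 900, "-"), (1000, 1300, "+")},
    "caller_b": {(100, 400, "+"), (500, 900, "-")},
    "caller_c": {(100, 400, "+"), (1500, 1800, "+")},
}
result = compare_gene_calls(calls)
```

Exact coordinate matching is the simplest possible criterion. A real comparison might, for example, tolerate differing start positions, which would change which calls count as shared.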
Figure 4. Functional annotation options supported within the PhATE subsystem of multiPhATE2. The user may select any or all of the algorithms to search any or all of the databases over genome, gene, and/or protein sequences. dbxrefs = database external references, which comprise additional information about a given database entry.