| Literature DB >> 34189396 |
John Huddleston1,2, James Hadfield2, Thomas R Sibley2, Jover Lee2, Kairsten Fay2, Misja Ilcisin2, Elias Harkins2, Trevor Bedford1,2, Richard A Neher3,4, Emma B Hodcroft3,4,5.
Abstract
The analysis of human pathogens requires a diverse collection of bioinformatics tools. These tools include standard genomic and phylogenetic software and custom software developed to handle the relatively numerous and short genomes of viruses and bacteria. Researchers increasingly depend on the outputs of these tools to infer transmission dynamics of human diseases and make actionable recommendations to public health officials (Black et al., 2020; Gardy et al., 2015). In order to enable real-time analyses of pathogen evolution, bioinformatics tools must scale rapidly with the number of samples and be flexible enough to adapt to a variety of questions and organisms. To meet these needs, we developed Augur, a bioinformatics toolkit designed for phylogenetic analyses of human pathogens.Entities:
Year: 2021 PMID: 34189396 PMCID: PMC8237802 DOI: 10.21105/joss.02906
Source DB: PubMed Journal: J Open Source Softw ISSN: 2475-9066
Figure 1:Example workflows composed with Snakemake from Augur commands for A) Zika virus, B) tuberculosis, C) a BEAST analysis, and D) the Nextstrain SARS-CoV-2 pipeline as of 2020-11-27. Each node in the workflow graph represents a command that performs a specific part of the analysis (e.g., aligning sequences, building a tree, etc.) with Augur commands in black, external software in red, and custom scripts in blue. A typical workflow starts by filtering sequences and metadata to a desired subset for analysis followed by inference of a phylogeny, annotation of that phylogeny, and export of the annotated phylogeny to a JSON that can be viewed on Nextstrain. Workflows for viral (A) and bacterial (B) pathogens follow a similar structure but also support custom pathogen-specific steps. Augur’s modularity enables workflows that build on outputs from other tools in the field like BEAST (C) as well as more complicated analyses such as that behind Nextstrain’s daily SARS-CoV-2 builds (D) which often require custom scripts to perform analysis-specific steps. Multiple outgoing edges from a single node represent opportunities to run the workflow in parallel. See the full workflows behind A, B, and D at https://github.com/nextstrain/zika-tutorial, https://github.com/nextstrain/tb, and https://github.com/nextstrain/ncov.