Niema Moshiri, Kathleen M Fisch, Amanda Birmingham, Peter DeHoff, Gene W Yeo, Kristen Jepsen, Louise C Laurent, Rob Knight.
Abstract
Throughout the COVID-19 pandemic, massive sequencing and data sharing efforts enabled the real-time surveillance of novel SARS-CoV-2 strains throughout the world, the results of which provided public health officials with actionable information to prevent the spread of the virus. However, with great sequencing comes great computation, and while cloud computing platforms bring high-performance computing directly into the hands of all who seek it, optimal design and configuration of a cloud compute cluster requires significant system administration expertise. We developed ViReflow, a user-friendly viral consensus sequence reconstruction pipeline enabling rapid analysis of viral sequence datasets leveraging Amazon Web Services (AWS) cloud compute resources and the Reflow system. ViReflow was developed specifically in response to the COVID-19 pandemic, but it is general to any viral pathogen. Importantly, when utilized with sufficient compute resources, ViReflow can trim, map, call variants, and call consensus sequences from amplicon sequence data from 1000 SARS-CoV-2 samples at 1000X depth in < 10 min, with no user intervention. ViReflow's simplicity, flexibility, and scalability make it an ideal tool for viral molecular epidemiological efforts.
Year: 2022 PMID: 35332213 PMCID: PMC8943356 DOI: 10.1038/s41598-022-09035-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Pipeline comparison.
| | V-pipe | nf-core/viralrecon | HAVoC | ViralFlow | ViReflow |
|---|---|---|---|---|---|
| Graphical user interface (GUI) | No | No | No | No | Yes |
| Amplicon sequencing support | No | Yes | Yes | Yes | Yes |
| Workflow tool | Snakemake | Nextflow | Bash script | Python script | Reflow |
| Native cloud compute support | None | AWS, GCP, Azure | None | None | AWS |
| Automatic compute resource scaling | No | No | No | No | Yes |
| Supported read trimmers | PRINSEQ | Cutadapt | fastp, Trimmomatic | fastp | fastp, iVar, PRINSEQ, pTrimmer |
| Supported read mappers | BWA-MEM | Bowtie2 | Bowtie2, BWA-MEM | BWA-MEM | Bowtie2, BWA-MEM, HISAT2 |
| Supported variant callers | LoFreq | iVar, bcftools | LoFreq | iVar | FreeBayes |
Bold denotes analyses that are optional in ViReflow.
Figure 1. ViReflow pipeline. ViReflow implements a standard viral consensus sequence reconstruction pipeline, with multiple tool choices for each step of the pipeline. The output consensus sequence is produced by incorporating high-depth variant calls into the reference genome sequence.
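The last step of the caption, incorporating high-depth variant calls into the reference, can be illustrated with a minimal Python sketch. This is not ViReflow's code, and it does not reproduce the behavior of any of the listed tools. The `Variant` record, the `build_consensus` function, and the `min_depth` and `min_freq` thresholds are all hypothetical names and values chosen for illustration. The sketch assumes the variants do not overlap, and it does not mask low-coverage positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record (not a real tool's output format)."""
    pos: int          # 1-based position in the reference
    ref: str          # reference allele at that position
    alt: str          # alternate allele
    depth: int        # read depth at the position
    alt_freq: float   # fraction of reads supporting the alternate allele


def build_consensus(reference: str, variants: list[Variant],
                    min_depth: int = 10, min_freq: float = 0.5) -> str:
    """Apply variants that pass both thresholds to the reference sequence.

    The thresholds are illustrative defaults, not values from the paper.
    """
    passing = [v for v in variants
               if v.depth >= min_depth and v.alt_freq >= min_freq]
    consensus = reference
    # Work from right to left so that insertions and deletions do not
    # shift the coordinates of variants that have yet to be applied.
    for v in sorted(passing, key=lambda v: v.pos, reverse=True):
        start = v.pos - 1
        end = start + len(v.ref)
        if consensus[start:end] != v.ref:
            raise ValueError(f"reference allele mismatch at position {v.pos}")
        consensus = consensus[:start] + v.alt + consensus[end:]
    return consensus


if __name__ == "__main__":
    calls = [
        Variant(pos=2, ref="C", alt="T", depth=100, alt_freq=0.9),    # applied
        Variant(pos=5, ref="A", alt="AGG", depth=50, alt_freq=0.8),   # applied
        Variant(pos=7, ref="G", alt="C", depth=3, alt_freq=0.9),      # too shallow
    ]
    print(build_consensus("ACGTACGT", calls))
```

Applying variants in descending position order keeps every coordinate valid without tracking an offset, which is why the sketch sorts before editing the sequence.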
Figure 2. ViReflow Graphical User Interface (GUI).
Benchmark of ViReflow.
| # FASTQ pairs | Runtime (s) | Cost (USD) | Cost/Sample (USD) |
|---|---|---|---|
| 1 (S) | 284 (4) | 0.01 (2 × 10⁻¹⁸) | 0.0100 |
| 10 (S) | 255 (12) | 0.04 (0.003) | 0.0041 |
| 100 (S) | 416 (21) | 0.49 (0.024) | 0.0049 |
| 1000 (S) | 491 (9) | 5.65 (0.119) | 0.0057 |
| 10,000 (S) | 12,075 (N/A) | 1197.53 (N/A) | 0.1198 |
| 684 (R) | 8,267 (N/A) | 59.04 (N/A) | 0.0863 |
| 2607 (R, C) | 4,144 (N/A) | 117.48 (N/A) | 0.0451 |
ViReflow was executed on 1, 10, 100, 1,000, and 10,000 random 1000X depth sub-samplings of the single highest-depth sample from a NovaSeq SARS-CoV-2 amplicon sequencing run (denoted with S). ViReflow was also executed on two real NovaSeq runs (denoted with R), one of which was capped at 2 million successfully mapped reads for each sample (denoted with C). All executions were run single-threaded. Total runtime (seconds) and total cost (US Dollars) across 10 technical replicates are shown as Mean (SD) pairs. “N/A” denotes single replicate execution due to high per-replicate compute costs. Specific details of tool choices (with versions) for each step of the pipeline can be found in the “Methods” section.
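The Cost/Sample column appears to be the total cost divided by the number of FASTQ pairs. The short Python sketch below recomputes it from the mean values in the table and adds wall-clock seconds per sample as a rough throughput measure. The function names and the `BENCHMARK` list are hypothetical helpers, not part of ViReflow. Because the table reports rounded totals, recomputed values for the smallest runs may differ slightly from the published column (for example, 0.04 / 10 gives 0.0040 rather than 0.0041).

```python
# Rows copied from the benchmark table: (label, number of FASTQ pairs,
# mean runtime in seconds, mean total cost in USD).
BENCHMARK = [
    ("1 (S)", 1, 284, 0.01),
    ("10 (S)", 10, 255, 0.04),
    ("100 (S)", 100, 416, 0.49),
    ("1000 (S)", 1000, 491, 5.65),
    ("10,000 (S)", 10_000, 12_075, 1197.53),
    ("684 (R)", 684, 8_267, 59.04),
    ("2607 (R, C)", 2607, 4_144, 117.48),
]


def cost_per_sample(total_cost_usd: float, n_samples: int) -> float:
    """Total cost divided by the number of samples."""
    return total_cost_usd / n_samples


def seconds_per_sample(runtime_s: float, n_samples: int) -> float:
    """Wall-clock runtime divided by the number of samples."""
    return runtime_s / n_samples


if __name__ == "__main__":
    print(f"{'dataset':<12} {'USD/sample':>11} {'s/sample':>10}")
    for label, n, runtime_s, cost_usd in BENCHMARK:
        print(f"{label:<12} "
              f"{cost_per_sample(cost_usd, n):>11.4f} "
              f"{seconds_per_sample(runtime_s, n):>10.3f}")
```

Read this way, the 1000-sample simulated run works out to roughly half a second of wall-clock time per sample, which matches the abstract's claim of processing 1000 samples in under 10 minutes.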