Chien-Chi Lo¹, Migun Shakya¹, Ryan Connor², Karen Davenport¹, Mark Flynn¹, Adán Myers Y Gutiérrez¹, Bin Hu¹, Po-E Li¹, Elais Player Jackson¹, Yan Xu¹, Patrick S G Chain¹.
Abstract
SUMMARY: Genomics has become an essential technology for surveilling emerging infectious disease outbreaks. Laboratories worldwide use a range of technologies and strategies for pathogen genome enrichment and sequencing, together with different, and sometimes ad hoc, analytical procedures for generating genome sequences. A fully integrated analytical process from raw sequence to consensus genome, suited to outbreaks such as the ongoing COVID-19 pandemic, is critical to provide a solid genomic basis for epidemiological analyses and well-informed decision making. We have developed a web-based platform and integrated bioinformatic workflows, EDGE COVID-19 (EC-19), that help provide consistent, high-quality analysis of SARS-CoV-2 sequencing data generated with either Illumina or Oxford Nanopore Technologies (ONT) platforms. Through an intuitive web-based interface, the workflow automates data quality control, SARS-CoV-2 reference-based variant and consensus calling, and lineage determination, and it supports submission of the consensus sequence and necessary metadata to GenBank, GISAID and INSDC raw data repositories. We tested workflow usability using real-world data, validated the accuracy of variant and lineage analysis using several test datasets, and performed detailed comparisons with results from the COVID-19 Galaxy Project workflow. Our analyses indicate that EC-19 workflows generate high-quality SARS-CoV-2 genomes. Finally, we share a perspective on the patterns observed with Illumina versus ONT technologies and their impact on workflow congruence and differences.
Year: 2022 PMID: 35561186 PMCID: PMC9113274 DOI: 10.1093/bioinformatics/btac176
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of the EDGE COVID-19 (EC-19) workflow. EC-19 includes quality control, mapping of reads to a SARS-CoV-2 reference genome sequence, primer removal when needed, variant allele analysis and consensus genome generation, lineage determination, and phylogenetic placement. Dotted lines indicate optional steps. Greater detail, including underlying tools and versions, can be found in Supplementary Table S2 and in the Supplementary Material.
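The consensus-generation step in Fig. 1 can be illustrated with a minimal sketch. It is not EC-19's implementation (its actual tools are listed in Supplementary Table S2); it assumes a simple rule in which SNVs with allele frequency at or above 0.5 (the AF cutoff mentioned for Fig. 2) replace the reference base, and positions below a hypothetical minimum depth `MIN_DEPTH` are masked as `N`. The names `SNV`, `build_consensus` and `MIN_DEPTH` are illustrative, and insertions and deletions are deliberately not handled.

```python
from dataclasses import dataclass

AF_CUTOFF = 0.5  # AF cutoff referred to in the Fig. 2 caption; ">=" is an assumption
MIN_DEPTH = 10   # hypothetical minimum depth for an unmasked base (not from EC-19)


@dataclass
class SNV:
    pos: int   # 1-based position on the reference
    ref: str   # reference base
    alt: str   # alternate base
    af: float  # allele frequency of the alternate base


def build_consensus(reference: str, depth: list[int], snvs: list[SNV]) -> str:
    """Mask low-depth positions as 'N', then apply SNVs that pass the AF cutoff."""
    if len(depth) != len(reference):
        raise ValueError("depth must have one value per reference base")
    seq = [base if d >= MIN_DEPTH else "N" for base, d in zip(reference, depth)]
    for v in snvs:
        i = v.pos - 1
        if reference[i] != v.ref:
            raise ValueError(f"REF mismatch at position {v.pos}")
        # Only well-supported alternate alleles at covered positions enter the consensus
        if v.af >= AF_CUTOFF and depth[i] >= MIN_DEPTH:
            seq[i] = v.alt
    return "".join(seq)
```

For example, with reference `ACGTACGTAC`, depth 30 at the first eight positions and 2 at the last two, an SNV G3T at AF 0.9 is applied, an SNV A5G at AF 0.3 is ignored, and the result is `ACTTACGTNN`. Masking before applying variants keeps a low-coverage site from being reported as a confident call.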
SNVs, insertions and deletions shared by, or specific to, the EC-19 and Galaxy workflows
| Platform | SNVs: EC-19 only | SNVs: shared | SNVs: Galaxy only | Insertions: EC-19 only | Insertions: shared | Insertions: Galaxy only | Deletions: EC-19 only | Deletions: shared | Deletions: Galaxy only |
|---|---|---|---|---|---|---|---|---|---|
| Illumina | 16 | 1149 | 49 | 0 | 4 | 0 | 3 | 50 | 0 |
| ONT | 34 | 1919 | 14 | 0 | 3 | 44 | 10 | 69 | 4 |
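One way to summarize the table is the fraction of each variant class found by both workflows, i.e. shared / (EC-19 only + shared + Galaxy only), which is the Jaccard index of the two call sets. This summary metric is our own illustration, not one reported by the authors; the counts are copied from the table, and `shared_fraction` is a hypothetical helper name.

```python
# Counts from the table: (EC-19 only, shared, Galaxy only)
counts = {
    ("Illumina", "SNVs"): (16, 1149, 49),
    ("Illumina", "Insertions"): (0, 4, 0),
    ("Illumina", "Deletions"): (3, 50, 0),
    ("ONT", "SNVs"): (34, 1919, 14),
    ("ONT", "Insertions"): (0, 3, 44),
    ("ONT", "Deletions"): (10, 69, 4),
}


def shared_fraction(ec19_only: int, shared: int, galaxy_only: int) -> float:
    """Intersection over union of the two workflows' call sets."""
    total = ec19_only + shared + galaxy_only
    return shared / total


for (platform, kind), c in counts.items():
    print(f"{platform:8} {kind:10} shared {shared_fraction(*c):.1%} of {sum(c)}")
```

For instance, Illumina SNVs give 1149/1214 ≈ 94.6% and ONT SNVs give 1919/1967 ≈ 97.6%, whereas ONT insertions give only 3/47 ≈ 6.4%, driven by the 44 Galaxy-only insertions.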
Fig. 2. Distribution of all SNVs detected by the EC-19 and Galaxy workflows, plotting depth of coverage (DP; x-axis) against allele frequency (AF; y-axis) for Illumina (left panel) and ONT (right panel) data. SNVs detected by both workflows are shown in gray when both workflows called them, and in black when both detected them but only one workflow reported them above the 0.5 AF cutoff. SNVs detected by only one workflow are shown in red. A dotted vertical line at 100× depth of coverage separates SNVs with high and low coverage. All AFs and SNVs are based on EC-19 mapping, except for Galaxy-specific SNVs.
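The color rules in the Fig. 2 caption can be written as a small classifier. This is a sketch under stated assumptions: `None` stands for "not detected by that workflow", an AF exactly at 0.5 counts as called, a site at exactly 100× falls on the high-coverage side, and the label for an SNV detected by both but called by neither is our own placeholder, since the caption does not describe that case. `classify_snv` and `depth_side` are hypothetical names.

```python
from typing import Optional

AF_CUTOFF = 0.5    # AF cutoff named in the Fig. 2 caption
DEPTH_SPLIT = 100  # dotted vertical line at 100x depth of coverage


def classify_snv(af_ec19: Optional[float], af_galaxy: Optional[float]) -> str:
    """Return the Fig. 2 color class; None means the workflow did not detect the SNV."""
    if af_ec19 is None and af_galaxy is None:
        raise ValueError("SNV detected by neither workflow")
    if af_ec19 is None or af_galaxy is None:
        return "red"  # detected by only one workflow
    called = (af_ec19 >= AF_CUTOFF, af_galaxy >= AF_CUTOFF)
    if all(called):
        return "gray"  # detected and called by both workflows
    if any(called):
        return "black"  # detected by both, above the cutoff in only one
    return "below cutoff in both"  # placeholder: case not described in the caption


def depth_side(dp: int) -> str:
    """Which side of the 100x dotted line a site falls on (boundary assumed high)."""
    return "high" if dp >= DEPTH_SPLIT else "low"
```

For example, `classify_snv(0.98, 0.97)` gives `"gray"`, `classify_snv(0.62, 0.41)` gives `"black"`, `classify_snv(None, 0.8)` gives `"red"`, and `depth_side(250)` gives `"high"`.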