Renato R M Oliveira, Tatianne Costa Negri, Gisele Nunes, Inácio Medeiros, Guilherme Araújo, Fabricio de Oliveira Silva, Jorge Estefano Santana de Souza, Ronnie Alves, Guilherme Oliveira.
Abstract
Motivation: Since the identification of the novel coronavirus (SARS-CoV-2), the scientific community has made a huge effort to understand the virus biology and to develop vaccines. Next-generation sequencing strategies have been successful in understanding the evolution of infectious diseases as well as facilitating the development of molecular diagnostics and treatments. Thousands of genomes are being generated weekly to understand the genetic characteristics of this virus. Efficient pipelines are needed to analyze the vast amount of data generated. Here we present a new pipeline designed for genomic analysis and variant identification of the SARS-CoV-2 virus.
Keywords: Annotation; Covid19; Genomics; Pipeline; Sarscov2; Variant identification; Virus
Year: 2022 PMID: 35437474 PMCID: PMC9013232 DOI: 10.7717/peerj.13300
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. The PipeCoV workflow.
The figure shows all steps and software used by PipeCoV. (A) Preprocessing, trimming, and removal of low-quality reads. (B) Mapping reads to a reference genome. (C) De novo genome assembly. (D) Reference-based assembly. (E) Gap closing. (F) Genome annotation using the PROKKA database. (G) Variant identification with the Pangolin software.
Figure 2. (A) Consensus sequence length generated by the pipelines. (B) The average genome coverage of the consensus sequence generated by the pipelines.
Figure 3. (A) The number of consensus sequences generated by the pipelines that are considered high-quality consensus, according to Briones et al. (2020). (B) The X-axis represents all 120 samples, ordered by descending number of N’s obtained by PipeCoV. The Y-axis represents the number of N’s. Colors represent different pipelines.
Figure 4. The number of N’s in the consensus sequences generated by each pipeline.
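The N counts compared in Figures 3 and 4 are simple to compute from a consensus FASTA file. The following is a minimal sketch of one way to do so, not code taken from PipeCoV; the function name `count_ns` and the sample records are hypothetical, and no quality threshold is applied because the criterion from Briones et al. (2020) is not stated here.

```python
def count_ns(fasta_text):
    """Return {record_id: (sequence_length, n_count)} for a FASTA string.

    Hypothetical helper for illustration only; it is not part of PipeCoV.
    """
    stats = {}
    name = None
    length = 0
    n_count = 0
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                stats[name] = (length, n_count)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
            n_count = 0
        elif name is not None:
            seq = line.upper()  # treat lowercase 'n' as an ambiguous base too
            length += len(seq)
            n_count += seq.count("N")
    # Store the final record.
    if name is not None:
        stats[name] = (length, n_count)
    return stats


# Made-up example records, not data from the study.
example = ">sample_a\nACGTNNAC\nGTNA\n>sample_b\nACGT\n"
for record_id, (length, n_count) in count_ns(example).items():
    print(record_id, length, n_count)
```

Sorting the resulting dictionary by `n_count` in descending order would give the sample ordering used on the X-axis of Figure 3B.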
Figure 5. Variants identified by the pipelines.
The figure shows that all pipelines identified almost the same variants.