
Comment on "Genetic variants and source of introduction of SARS-CoV-2 in South America".

Pedro E Romero

Year:  2020        PMID: 32497295      PMCID: PMC7300916          DOI: 10.1002/jmv.26122

Source DB:  PubMed          Journal:  J Med Virol        ISSN: 0146-6615            Impact factor:   20.693


To the Editor, I read with great interest a recent study by Poterico and Mestanza, who described mutations in 30 SARS‐CoV‐2 genomes from South American countries (Argentina, Brazil, Chile, Colombia, Ecuador, and Peru). Next‐generation sequencing (NGS) technologies have accelerated genomic and metagenomic studies, providing affordable tools to obtain pathogen genomes and improving diagnosis and surveillance efforts. However, many downstream analyses after assembling the genomes are affected by low‐quality sequences and sequence contamination, which could lead to wrong conclusions.

The authors mentioned the poor quality of some viral genomes as a limitation of their study, along with other issues such as the lack of epidemiological metadata, possible primer design variations, and the limited number of South American samples. To overcome this problem, the authors used high‐coverage complete genomes (>29 000 bp) from GISAID (https://www.gisaid.org/). In addition, they mentioned that they did not use genomes from Colombia and Ecuador in their phylogenetic analyses because of the poor quality of those sequences.

Although GISAID provides information on genome length and coverage, it does not provide raw sequence reads, which are important to validate the observed mutations. As an alternative, it is possible to use another genomic database, for instance, the Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/sra), which is the largest publicly available repository of NGS data. Nevertheless, most public genome sequences are stored in GISAID (25 369), compared with SRA (4904) or GenBank (3812) (data retrieved on 15 May 2020). This could be problematic if independent research groups try to find similar mutations using public data. I looked for South American SARS‐CoV‐2 genome sequences in the SRA database and found only three records from two countries, Brazil (SRR11365239 and SRR11365239) and Peru (SRR11508492).
Raw reads from Brazil were obtained using Ion Torrent sequencing technology and did not correspond to any of the 92 Brazilian records on GISAID. In contrast, the Peruvian sequence on GISAID (EPI_ISL_415787) is identical to the GenBank record (MT263074.1). The raw reads of this assembly were obtained using Illumina technology and are stored in the SRA database (SRR11508492).

I downloaded the raw reads from the Peruvian genome (2 359 909 paired‐end reads) and trimmed them with Trimmomatic 0.39 using the following parameters: ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:2:keepBothReads LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:30. After that, I conducted a de novo assembly using SPAdes 3.13. In addition, I mapped the trimmed reads to the reference genome (NC_045512.2) using Bowtie2. Then, I aligned the reference genome against the Peruvian genome and the newly obtained reassembly using MAFFT. The alignment and the mapping were visualized in Geneious R7. Both the alignment and the mapping results can be obtained from FigShare.

The de novo reassembly and mapped reads provided independent evidence to validate the mutations reported by Poterico and Mestanza based on the Peruvian SARS‐CoV‐2 genome. First, mutation N2894D in nsp4 (table 1 in ref. 1), corresponding to a change from A to G at nucleotide position 8945, occurs in only a few reads (4 out of 33 mapped reads) and is not included in the consensus sequence of the de novo reassembly (Figure 1A). Thus, we should be very careful in considering this mutation as a real variant. Second, the authors reported a non‐synonymous mutation E1207E in the S gene (table 1 in ref. 1). This corresponds to a change from T to C at nucleotide position 24022. Again, this mutation occurred in only 4 of 29 mapped reads and is not present in the consensus sequence (Figure 1B).
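The processing steps described above (trimming, de novo assembly, read mapping, and alignment) can be sketched as a list of command lines. This is a minimal sketch, not the exact commands used in the letter: only the tool names and the Trimmomatic parameter string come from the text. All file names, the `trimmomatic` wrapper name (as installed by some package managers; a `java -jar` call is the alternative), default Bowtie2 settings, and the MAFFT `--auto` option are assumptions.

```python
import shlex

# Trimming parameters as given in the text (ASCII hyphen in the adapter file name).
TRIM_STEPS = [
    "ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:2:keepBothReads",
    "LEADING:20",
    "TRAILING:20",
    "SLIDINGWINDOW:4:20",
    "MINLEN:30",
]


def build_commands(r1, r2, reference, combined_fasta, workdir="work"):
    """Return one argument list per step. All paths are hypothetical.

    combined_fasta is assumed to hold the reference, the published genome,
    and the new reassembly, ready to be aligned with MAFFT.
    """
    p1, u1 = f"{workdir}/trim_1P.fastq", f"{workdir}/trim_1U.fastq"
    p2, u2 = f"{workdir}/trim_2P.fastq", f"{workdir}/trim_2U.fastq"
    index = f"{workdir}/reference_index"
    return {
        # Paired-end trimming: two inputs, then paired/unpaired outputs per mate.
        "trim": ["trimmomatic", "PE", r1, r2, p1, u1, p2, u2, *TRIM_STEPS],
        # De novo assembly of the surviving read pairs.
        "assemble": ["spades.py", "-1", p1, "-2", p2, "-o", f"{workdir}/spades"],
        # Index the reference, then map the trimmed pairs to it.
        "index": ["bowtie2-build", reference, index],
        "map": ["bowtie2", "-x", index, "-1", p1, "-2", p2,
                "-S", f"{workdir}/mapped.sam"],
        # MAFFT writes the alignment to standard output.
        "align": ["mafft", "--auto", combined_fasta],
    }


if __name__ == "__main__":
    commands = build_commands(
        "SRR11508492_1.fastq", "SRR11508492_2.fastq",
        "NC_045512.2.fasta", "genomes_to_align.fasta",
    )
    for step, argv in commands.items():
        print(f"{step}: {shlex.join(argv)}")
```

The sketch only prints the commands; running them would require the tools to be installed and the input files to exist.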
Figure 1

A portion of the trimmed reads from the Peruvian SARS‐CoV‐2 sequence (SRR11508492) compared against the reference genome (NC_045512.2). A, Position 8945, change from A to G. B, Position 24022, change from T to G. Only a few reads showed the mutations described in Ref. 1.
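The read counts reported in the text for these two positions (4 of 33 and 4 of 29 mapped reads) can be turned into per-position allele fractions. The sketch below is illustrative only: it assumes that every read not carrying the change shows the reference base, it takes the alternative base at position 24022 as C (as in the main text), and it uses a simple most-frequent-base consensus, which is not necessarily how SPAdes or Geneious derive a consensus.

```python
from collections import Counter


def summarize_position(bases, ref_base):
    """Summarize the bases observed in reads covering one position."""
    counts = Counter(base.upper() for base in bases)
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    # Most frequent base; a simplification of real consensus calling.
    consensus_base, _ = counts.most_common(1)[0]
    # Fraction of reads supporting each non-reference base.
    alt_fraction = {
        base: n / depth for base, n in counts.items() if base != ref_base.upper()
    }
    return {"depth": depth, "consensus": consensus_base, "alt_fraction": alt_fraction}


# Counts from the text; the reference-base reads are an assumption (see above).
pos_8945 = summarize_position("A" * 29 + "G" * 4, ref_base="A")
pos_24022 = summarize_position("T" * 25 + "C" * 4, ref_base="T")

for label, summary in (("8945", pos_8945), ("24022", pos_24022)):
    print(label, summary["depth"], summary["consensus"], summary["alt_fraction"])
```

Under these assumptions the variant is supported by roughly 12% of reads at position 8945 and roughly 14% at position 24022, so the most frequent base at both positions remains the reference base.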

This evidence supports the need to use the original sequence reads to verify whether previously described mutations in SARS‐CoV‐2 genomes are accurate or are assembly artifacts or sequencing errors. Erroneous conclusions, such as the presence of high mutation rates, spurious evolutionary relationships among the lineages, and flawed target sites for vaccines and antiviral drugs, could be drawn from problematic data and would impede the urgent development of further initiatives to respond to SARS‐CoV‐2. Additionally, the authors performed phylogenetic analyses but did not mention whether they assessed statistical branch support, for example with bootstrap replicates. Such results could also provide a better assessment of the significance of the groups or clades described in their work.
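As an illustration of what bootstrap replication involves, the sketch below shows only the resampling step: alignment columns are drawn with replacement to build pseudo-replicate alignments of the same length. The toy alignment and all names are hypothetical. Tree inference is not included; in practice, phylogenetic software usually performs the resampling and tree building together, and the support for a clade is the proportion of replicate trees in which it appears.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one pseudo-replicate by resampling columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same column indices are applied to every sequence, so each
    # resampled column keeps the original site pattern across taxa.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


# Hypothetical toy alignment, not real SARS-CoV-2 data.
toy_alignment = {
    "reference": "ATGCATGCAT",
    "sample_1": "ATGCGTGCAT",
    "sample_2": "ATGTATGCAT",
}

rng = random.Random(42)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_columns(toy_alignment, rng) for _ in range(100)]
```

Each replicate would then be passed to the same tree-building method used for the original alignment.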
References

1.  SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.

Authors:  Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey A Gurevich; Mikhail Dvorkin; Alexander S Kulikov; Valery M Lesin; Sergey I Nikolenko; Son Pham; Andrey D Prjibelski; Alexey V Pyshkin; Alexander V Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A Alekseyev; Pavel A Pevzner
Journal:  J Comput Biol       Date:  2012-04-16       Impact factor: 1.479

2.  Bootstrap confidence levels for phylogenetic trees.

Authors:  B Efron; E Halloran; S Holmes
Journal:  Proc Natl Acad Sci U S A       Date:  1996-11-12       Impact factor: 11.205

3.  Fast gapped-read alignment with Bowtie 2.

Authors:  Ben Langmead; Steven L Salzberg
Journal:  Nat Methods       Date:  2012-03-04       Impact factor: 28.547

4.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

5.  Near-Complete Genome Sequence of a 2019 Novel Coronavirus (SARS-CoV-2) Strain Causing a COVID-19 Case in Peru.

Authors:  Carlos Padilla-Rojas; Priscila Lope-Pari; Karolyn Vega-Chozo; Johanna Balbuena-Torres; Omar Caceres-Rey; Henri Bailon-Calderon; Maribel Huaringa-Nuñez; Nancy Rojas-Serrano
Journal:  Microbiol Resour Announc       Date:  2020-05-07

Review 6.  Towards a genomics-informed, real-time, global pathogen surveillance system.

Authors:  Jennifer L Gardy; Nicholas J Loman
Journal:  Nat Rev Genet       Date:  2017-11-13       Impact factor: 53.242

7.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01       Impact factor: 6.937

8.  Genetic variants and source of introduction of SARS-CoV-2 in South America.

Authors:  Julio A Poterico; Orson Mestanza
Journal:  J Med Virol       Date:  2020-07-19       Impact factor: 20.693
