Whole-Genome Sequences of SARS-CoV-2 Isolates from Ethiopian Patients.

Dawit Hailu Alemayehu, Bethlehem Adnew, Fekadu Alemu, Dessalegn Abeje Tefera, Tamrayehu Seyoum, Getachew Tesfaye Beyene, Tesfaye Gelanew, Abel Abera Negash, Markos Abebe, Adane Mihret, Abebe Genetu Bayih, Alemseged Abdissa, Andargachew Mulu.

Abstract

Three complete severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes from Ethiopian patients were compared with deposited global genomes. Two genomes belonged to genetic group 20A/B.1/GH, and the other belonged to genetic group 20A/B.1.480/GH. Enhancing genomic capacity is important to investigate the transmission and to monitor the evolution and mutational patterns of SARS-CoV-2 in this country.

Year:  2021        PMID: 34554000      PMCID: PMC8459669          DOI: 10.1128/MRA.00721-21

Source DB:  PubMed          Journal:  Microbiol Resour Announc        ISSN: 2576-098X


ANNOUNCEMENT

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that emerged in Wuhan, China, is an RNA virus that belongs to the genus Betacoronavirus, in the family Coronaviridae (1). Like most RNA viruses, SARS-CoV-2 is expected to display a relatively high mutation rate, which may influence viral transmission and pathogenesis, enable escape from host defenses, and reduce the efficacy of vaccines and molecular diagnostic tools (2). Thus, enhancing genomic capacity is important to investigate the transmission and to monitor the evolution and mutational patterns of SARS-CoV-2 in Ethiopia. Here, we report three SARS-CoV-2 genome sequences obtained using Illumina NextSeq sequencing technology.
The protocol was approved by the ALERT/AHRI Research Ethics Committee. Nasopharyngeal swab samples were collected from subjects with suspected SARS-CoV-2 infection following routine surveillance and diagnostic procedures. The first two samples (GenBank accession numbers MZ172407 and MZ172408) were collected in a hospital setting, and the last one (GenBank accession number MZ172409) was collected at a health center. Nucleic acid was extracted using a Da An Gene extraction kit (catalog number DA0591) following the manufacturer’s protocol. The extracted RNA was reverse transcribed, and SARS-CoV-2 was detected using the BGI real-time fluorescent reverse transcription (RT)-PCR kit (catalog number MFG030010). Positive RNA samples were selected for sequencing based on their threshold cycle (CT) values (CT values of <24).
The RNA was concentrated using SPRI magnetic beads, and the reverse-transcribed RNA was sequenced using the shotgun metagenomic workflow outlined by Illumina (3). In short, 200 to 450 ng of input RNA was subjected to ribodepletion, fragmentation, first- and second-strand cDNA synthesis, adenylation, adapter ligation, and amplification, according to the TruSeq stranded total RNA protocol.
The prepared libraries were loaded on the NextSeq 500 system for a paired-end 2 × 76-bp sequencing run. The base call (BCL) files from the NextSeq 500 system were demultiplexed and converted to FASTQ files using Illumina bcl2fastq2 software v2.20. Quality-checked paired-end FASTQ files (4) were trimmed using Trimmomatic v0.36 (5). Taxonomic classification was performed using Kraken2 (6), and the host reads were removed using Bowtie2 (7) and SAMtools (8) with the human reference genome (GRCh38) (ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes) to yield unmapped reads. The host-depleted reads were aligned to the complete genome of SARS-CoV-2 Wuhan-Hu-1 (GenBank accession number NC_045512.2) using BWA (9), and SAMtools was used for intermediate file conversion and summary. Consensus sequences generated with iVar were used as the genome sequences.
Variants were called using Snippy (https://github.com/tseemann/snippy) and Nextclade. A local installation of Nextstrain/Nextclade v0.13.0 was also used for clade assignment and variant annotation. The phylogenetic tree was generated with Nextstrain/Augur using its default subsampling scheme with a focus on country Ethiopia, region Africa, in which 1,960 samples collected between December 2019 and February 2021 were subsampled; the tree was visualized using the Nextstrain/Auspice tool. Lineage assignments were made using the Phylogenetic Assignment of Named Global Outbreak Lineages (Pangolin) v1.07 tool (https://github.com/hCoV-2019/pangolin) and clades from GISAID (https://www.gisaid.org). All tools were run with default parameters unless otherwise specified.
BLAST comparison of the full genome sequences of the isolates with the reference strain showed 99.68 to 99.92% identity at the nucleotide level and 99.94% identity at the amino acid level. All three isolates have 99.97 to 100% coverage, with 100% coverage of the coding region.
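The host-depletion, alignment, and consensus steps described above can be sketched as an ordered list of shell commands. This is a minimal illustration, not the authors' actual command lines: the file names, the Bowtie2 index prefix `GRCh38`, and the local reference path `NC_045512.2.fasta` are assumptions, and Bowtie2's `--un-conc` option is used here as one simple way to keep read pairs that do not align concordantly to the human genome (the announcement itself reports using Bowtie2 together with SAMtools). The function only builds the command strings; it executes nothing.

```python
from typing import List, Tuple


def build_commands(
    sample: str,
    r1: str,
    r2: str,
    host_index: str = "GRCh38",            # assumed Bowtie2 index prefix
    viral_ref: str = "NC_045512.2.fasta",  # assumed local FASTA of Wuhan-Hu-1
) -> List[Tuple[str, str]]:
    """Return (step name, shell command) pairs; nothing is executed here."""
    # Bowtie2 replaces '%' in the --un-conc path with 1 or 2 (one file per mate).
    nonhost_pattern = f"{sample}.nonhost.%.fq"
    nonhost_1 = f"{sample}.nonhost.1.fq"
    nonhost_2 = f"{sample}.nonhost.2.fq"
    bam = f"{sample}.sorted.bam"
    return [
        # Keep pairs that fail to align concordantly to the human reference.
        ("host_removal",
         f"bowtie2 -x {host_index} -1 {r1} -2 {r2} "
         f"--un-conc {nonhost_pattern} -S {sample}.host.sam"),
        # BWA needs an index of the viral reference before alignment.
        ("index_reference", f"bwa index {viral_ref}"),
        # Align host-depleted reads and write a coordinate-sorted BAM.
        ("align",
         f"bwa mem {viral_ref} {nonhost_1} {nonhost_2} | samtools sort -o {bam} -"),
        ("index_bam", f"samtools index {bam}"),
        # Pileup piped into iVar to call a consensus sequence.
        ("consensus",
         f"samtools mpileup -aa -A -d 0 -Q 0 {bam} | ivar consensus -p {sample}.consensus"),
    ]


if __name__ == "__main__":
    for name, command in build_commands("isolate1", "isolate1_R1.fq", "isolate1_R2.fq"):
        print(f"[{name}] {command}")
```

Keeping the steps as data rather than running them directly makes it easy to log or review the exact commands before handing them to a workflow runner.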
The genome sizes were 29,860, 29,856, and 29,871 bp, with GC contents of 53%, 51%, and 49%, for isolates MZ172407, MZ172408, and MZ172409, respectively. Similarly, the average coverage depths were 2,56.7× (range, 1× to 3,183×), 23.8× (range, 1× to 1,110×), and 1,288.3× (range, 4× to 8,002×) for isolates MZ172407, MZ172408, and MZ172409, respectively. Phylogenomic analysis showed that two of the SARS-CoV-2 isolates (isolates MZ172408 and MZ172409) belonged to Pangolin lineage B.1, sharing their most recent common ancestor with viruses detected in Germany (Fig. 1). One of the isolates (isolate MZ172407) belonged to lineage B.1.480. According to Nextstrain (10), the phylogenetic tree revealed that all of the isolates belonged to Nextstrain clade 20A and GISAID clade GH.
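Summary statistics of the kind reported above (genome size, GC content, and mean coverage depth with its range) can be computed as in the following sketch. It assumes a consensus sequence held as a string and per-position depths in the three-column, tab-separated layout written by `samtools depth` (reference name, position, depth); the function names and the toy inputs are hypothetical and are not the authors' data.

```python
from typing import Iterable, List, Tuple


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); N and gaps are ignored."""
    s = seq.upper()
    called = sum(s.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (s.count("G") + s.count("C")) / called


def parse_depth(lines: Iterable[str]) -> List[int]:
    """Extract the depth column from `samtools depth`-style lines."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _ref, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    return depths


def depth_summary(depths: List[int]) -> Tuple[float, int, int]:
    """Return (mean depth, minimum depth, maximum depth)."""
    if not depths:
        raise ValueError("no depth values supplied")
    return sum(depths) / len(depths), min(depths), max(depths)


if __name__ == "__main__":
    toy_sequence = "ATGGCNNCTA"                       # toy example only
    toy_depth = ["ref\t1\t1", "ref\t2\t3", "ref\t3\t8"]
    print(len(toy_sequence), gc_content(toy_sequence))
    print(depth_summary(parse_depth(toy_depth)))
```

Whether zero-depth positions are included in the mean depends on how the depth file was produced (for example, with or without the `-a` option of `samtools depth`), so that choice should be reported alongside the numbers.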
FIG 1

Phylogenetic analysis of representative SARS-CoV-2 genome sequences, including the three current isolates. Available genomes were retrieved from GISAID (https://www.gisaid.org) in January 2021. Sequences with low quality (i.e., ambiguous bases) were discarded. The figure was created using Nextstrain.

Mutations in the three SARS-CoV-2 genomes were identified across the whole genome relative to the SARS-CoV-2 Wuhan strain (GenBank accession number NC_045512.2), and marked nucleotide differences were found at some positions, as shown in Table 1. In general, several synonymous and nonsynonymous mutations, predominantly pyrimidine exchanges (C to T or T to C) (55%), were observed in all three genomes (Table 1). Currently, we are sequencing more genomes to further investigate the transmission and to monitor the evolution and mutational patterns of SARS-CoV-2 in this country.
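The notion of a pyrimidine exchange used above can be made concrete with a small classifier: a substitution between C and T is a pyrimidine transition, one between A and G is a purine transition, and anything else is a transversion. The sketch below is illustrative only; the function names are hypothetical, and the five example substitutions are a subset of the Table 1 entries rather than the full table, so the fraction it prints is not the 55% reported in the text.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution by base chemistry."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    if {ref, alt} <= PYRIMIDINES:
        return "pyrimidine transition"   # C to T or T to C
    if {ref, alt} <= PURINES:
        return "purine transition"       # A to G or G to A
    return "transversion"


def pyrimidine_fraction(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of substitutions that are C<->T exchanges."""
    labels = [classify(ref, alt) for ref, alt in pairs]
    if not labels:
        raise ValueError("no substitutions supplied")
    return labels.count("pyrimidine transition") / len(labels)


if __name__ == "__main__":
    # Subset of Table 1 (positions 241, 3037, 14408, 23403, 25563).
    subset = [("C", "T"), ("C", "T"), ("C", "T"), ("A", "G"), ("G", "T")]
    print(pyrimidine_fraction(subset))
```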
TABLE 1

Alterations of the SARS-CoV-2 genome

| Nucleotide position(a) | Reference base | Alternative base | Isolate MZ172407 | Isolate MZ172408 | Isolate MZ172409 | Gene(b) | Protein(c) | Amino acid substitution | Mutation type |
|---|---|---|---|---|---|---|---|---|---|
| 140 | C | T | C | C | T | 5′-UTR | NA | NA | Noncoding |
| 241 | C | T | T | T | T | 5′-UTR | NA | NA | Noncoding |
| 875 | C | T | T | C | C | ORF1ab | ORF1ab polyprotein/NSP2 | L204F | Missense |
| 936 | C | T | C | C | T | ORF1ab | | T224I | Missense |
| 2300 | T | C | T | C | C | ORF1a | ORF1ab polyprotein/NSP2 | F679L | Missense |
| 2416 | C | T | T | T | T | ORF1ab | NSP2 | | Missense |
| 2445 | C | T | T | C | T | ORF1ab | NSP2 | T727I | Missense |
| 3037 | C | T | T | T | T | ORF1ab | NSP3 | | Missense |
| 3643 | A | G | T | T | G | ORF1ab | | | Synonymous |
| 4071 | C | T | T | C | C | ORF1ab | NSP3 | T1269I | Synonymous |
| 4280 | G | A | A | G | G | ORF1ab | ORF1ab polyprotein/NSP2 | V1339I | Missense |
| 7534 | T | C | C | T | T | ORF1ab | ORF1ab polyprotein/NSP2 | | Synonymous |
| 9724 | C | T | T | C | C | ORF1ab | ORF1ab polyprotein/NSP2 | | Synonymous |
| 10904 | A | G | G | A | A | ORF1ab | ORF1ab polyprotein/NSP2 | S3547G | Missense |
| 11758 | C | T | C | T | C | ORF1ab | ORF1ab polyprotein/NSP2 | | Synonymous |
| 12076 | C | T | T | C | C | ORF1ab | ORF1ab polyprotein/NSP2 | | Synonymous |
| 14022 | C | T | T | C | C | ORF1ab | ORF1ab polyprotein/NSP2 | | Synonymous |
| 14407(d) | C | T | C | C | T | ORF1ab | | | Synonymous |
| 14408 | C | T | T | T | T | ORF1ab | ORF1ab polyprotein/NSP12 | P314L | Missense |
| 14925 | C | T | C | C | T | ORF1ab | | | Synonymous |
| 15384 | G | T | T | G | G | ORF1ab | ORF1ab polyprotein/NSP12 | L639F | Missense |
| 16269 | G | A | A | G | G | ORF1ab | ORF1ab polyprotein | | Missense |
| 16647 | G | T | G | T | T | ORF1ab | | | Synonymous |
| 21619 | A | T | T | A | A | S | Spike protein | | Missense |
| 21721 | C | T | C | T | C | S | Spike protein | | Missense |
| 21796 | G | T | G | G | T | S | Spike protein | | Missense |
| 21800 | G | T | G | G | T | S | Spike protein | D80Y | |
| 23063 | A | T | T | A | A | S | Spike protein | N501Y | Synonymous |
| 23403 | A | G | G | G | G | S | Spike protein | D614G | Synonymous |
| 24070 | A | C | A | A | C | S | Spike protein | Q836H | Missense |
| 25249 | G | T | T | G | G | S | Spike protein | M1229I | Missense |
| 25563 | G | T | T | T | T | ORF3a | | Q57H | Missense |
| 25844 | G | T | T | G | G | ORF3a | | T151I | Synonymous |
| 25904 | C | T | C | T | C | ORF3a | ORF3a protein | S171L | Missense |
| 26416 | G | C | C | G | G | E | E protein | V58L | Synonymous |
| 27484 | T | C | C | T | T | ORF7a | ORF7a protein | | Synonymous |
| 27546 | T | C | T | C | T | ORF6 | ORF6 protein | | |
| 27667 | G | A | G | G | A | ORF7a | | E92K | Upstream |
| 28854 | C | T | T | C | C | N | N protein | S194L | Synonymous |
| 28869 | C | T | C | T | T | N | N protein | P199L | Missense |
| 29550 | C | T | C | T | T | N | N protein | NA | |
| 29702 | G | A | A | G | G | 3′-UTR | NA | | |

(a) Variants were called using Snippy (https://github.com/tseemann/snippy) and Nextclade.
(b) UTR, untranslated region; ORF, open reading frame.
(c) NA, not applicable; NSP, nonstructural protein.
(d) Multiple-nucleotide polymorphism (CC to TT).
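Each row of Table 1 lists the reference base, the alternative base, and the base observed in each of the three isolates, so the set of isolates carrying a given variant can be read off by comparing the observed base with the alternative base. The following sketch shows that lookup for four rows taken from Table 1 (positions 241, 875, 936, and 23403); the function names and the tuple layout are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

ISOLATES = ("MZ172407", "MZ172408", "MZ172409")

# (position, reference base, alternative base, bases observed in ISOLATES order)
Row = Tuple[int, str, str, str]


def carriers(alt: str, observed: Sequence[str]) -> List[str]:
    """Isolates whose observed base equals the alternative base."""
    return [iso for iso, base in zip(ISOLATES, observed) if base == alt]


def variants_by_isolate(rows: Sequence[Row]) -> Dict[str, List[int]]:
    """Map each isolate to the positions at which it carries the alternative base."""
    result: Dict[str, List[int]] = {iso: [] for iso in ISOLATES}
    for position, _ref, alt, observed in rows:
        for iso in carriers(alt, observed):
            result[iso].append(position)
    return result


if __name__ == "__main__":
    table_subset: List[Row] = [
        (241, "C", "T", "TTT"),
        (875, "C", "T", "TCC"),
        (936, "C", "T", "CCT"),
        (23403, "A", "G", "GGG"),
    ]
    for isolate, positions in variants_by_isolate(table_subset).items():
        print(isolate, positions)
```

Rows in which an isolate shows a base that is neither the reference nor the alternative (for example, position 3643) are simply not counted as carriers by this comparison.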

Data availability.

The coding-complete sequences were deposited in GenBank with accession numbers MZ172407, MZ172408, and MZ172409 and SRA accession numbers SAMN20692030, SAMN20692031, and SAMN20692032 and in GISAID (https://www.gisaid.org) with accession numbers EPI_ISL_2970353, EPI_ISL_2970354, and EPI_ISL_2970355 for Ethiopia/AHRI-01/2020, Ethiopia/AHRI-02/2020, and Ethiopia/AHRI-03/2020, respectively.
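For readers who want to retrieve the deposited genomes programmatically, the GenBank records can be requested through the NCBI E-utilities `efetch` endpoint. The sketch below only constructs the request URLs for the three GenBank accession numbers listed above and performs no network access; the function name is hypothetical, and NCBI's usage guidelines (rate limits, optional API key) apply to any real request.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

GENBANK_ACCESSIONS = ("MZ172407", "MZ172408", "MZ172409")


def genbank_fasta_url(accession: str) -> str:
    """Build an efetch URL that returns the nucleotide record as FASTA text."""
    query = urlencode({
        "db": "nuccore",      # NCBI nucleotide database
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


if __name__ == "__main__":
    for acc in GENBANK_ACCESSIONS:
        print(genbank_fasta_url(acc))
```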

1.  Fast gapped-read alignment with Bowtie 2.

Authors:  Ben Langmead; Steven L Salzberg
Journal:  Nat Methods       Date:  2012-03-04

2.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08

3.  Nextstrain: real-time tracking of pathogen evolution.

Authors:  James Hadfield; Colin Megill; Sidney M Bell; John Huddleston; Barney Potter; Charlton Callender; Pavel Sagulenko; Trevor Bedford; Richard A Neher
Journal:  Bioinformatics       Date:  2018-12-01

4.  A new coronavirus associated with human respiratory disease in China.

Authors:  Fan Wu; Su Zhao; Bin Yu; Yan-Mei Chen; Wen Wang; Zhi-Gang Song; Yi Hu; Zhao-Wu Tao; Jun-Hua Tian; Yuan-Yuan Pei; Ming-Li Yuan; Yu-Ling Zhang; Fa-Hui Dai; Yi Liu; Qi-Min Wang; Jiao-Jiao Zheng; Lin Xu; Edward C Holmes; Yong-Zhen Zhang
Journal:  Nature       Date:  2020-02-03

5.  Improved metagenomic analysis with Kraken 2.

Authors:  Derrick E Wood; Jennifer Lu; Ben Langmead
Journal:  Genome Biol       Date:  2019-11-28

6.  Genetic Diversity of SARS-CoV-2 over a One-Year Period of the COVID-19 Pandemic: A Global Perspective.

Authors:  Miao Miao; Erik De Clercq; Guangdi Li
Journal:  Biomedicines       Date:  2021-04-11

7.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18

8.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01
