
Different mutations in SARS-CoV-2 associate with severe and mild outcome.

Ádám Nagy, Sándor Pongor, Balázs Győrffy.

Abstract

INTRODUCTION: Genomic alterations in a viral genome can lead to either better or worse outcome and identifying these mutations is of utmost importance. Here, we correlated protein-level mutations in the SARS-CoV-2 virus to clinical outcome.
METHODS: Mutations in viral sequences from the GISAID virus repository were evaluated by using "hCoV-19/Wuhan/WIV04/2019" as the reference. Patient outcomes were classified as mild disease, hospitalization and severe disease (death or documented treatment in an intensive-care unit). Chi-square test was applied to examine the association between each mutation and patient outcome. False discovery rate was computed to correct for multiple hypothesis testing and results passing FDR cutoff of 5% were accepted as significant.
RESULTS: Mutations were mapped to amino acid changes for 3,733 non-silent mutations. Mutations correlated to mild outcome were located in ORF8, NSP6, ORF3a, NSP4, and the nucleocapsid phosphoprotein N. Mutations associated with inferior outcome were located in the surface (S) glycoprotein, the RNA dependent RNA polymerase, ORF3a, NSP3, ORF6 and N. Mutations leading to severe outcome with low prevalence were found in the ORF3a and NSP7 proteins. Four of the 22 most significant mutations mapped onto a 10 amino acid long phosphorylated stretch of N, indicating that, in spite of obvious sampling restrictions, the approach can find functionally relevant sites in the viral genome.
CONCLUSIONS: We demonstrate that mutations in the viral genes may have a direct correlation to clinical outcome. Our results help to quickly identify SARS-CoV-2 infections harboring mutations related to severe outcome.
Copyright © 2020. Published by Elsevier Ltd.

Keywords:  SARS-CoV-2; death; genome; high-risk; mutation; next generation sequencing

Year:  2020        PMID: 33347989      PMCID: PMC7755579          DOI: 10.1016/j.ijantimicag.2020.106272

Source DB:  PubMed          Journal:  Int J Antimicrob Agents        ISSN: 0924-8579            Impact factor:   5.283


Introduction

There are seven human coronaviruses: MERS, Human-HKU-1, Human NL63, Human 229E, Human OC43, SARS-CoV, and SARS-CoV-2. The natural host of this latest RNA virus is the Chinese rufous horseshoe bat (Rhinolophus sinicus), and the transfer to humans initiated the ongoing COVID-19 outbreak at the end of 2019 [1]. Some studies estimated a low mortality rate of SARS-CoV-2 in the overall population [2,3], while other investigators reported mortality of up to 26% when the virus strikes a critically ill patient [4]. Overall, based on current WHO data (October 2020), the mortality rate is around 2.7%. The linear genome of the SARS-CoV-2 virus has 29,903 bases and harbors 25 genes [5]; the reference sequence is accessible in GenBank under the accession number MN908947. Phylogenetic analysis of SARS-CoV-2 genomes shows three variants, termed A, B and C, which are distributed differently among sequences from Asia, Europe and the Americas [6]. The viral genes encode, among others, an envelope protein, an RNA dependent RNA polymerase, a surface glycoprotein, an exonuclease, a methyltransferase, and 11 nonstructural proteins. Some of these are within the virus, but others, including the spike glycoprotein, the membrane glycoprotein, and the envelope protein, are on the viral surface. In theory, any functional or structural viral gene can affect the efficiency of a virus, and either mutations [7] or alterations in expression [8] can increase pathogenicity. It is important to emphasize that even the untranslated regions of a coronavirus can have an important role in viral replication, as has been previously demonstrated for the 3’ untranslated region [9]. SARS-CoV-2 is no different from other viruses, and new mutations continually emerge as it spreads [10].
Some mutations uncovered in the SARS-CoV-2 virus lead to a novel RNA-dependent-RNA polymerase variant [11], while other genomic changes drive the evolution and the spread of the virus by resulting in a more transmissible form of the virus [12]. Mutations potentially making the virus more transmissible have a significant evolutionary advantage as has been demonstrated for the SARS-CoV-2 variant with spike G614 which mainly replaced D614 between February and July 2020 [13]. In this context, the most important question is to identify viral mutations leading to different patient outcomes. Mutations resulting in a mild disease could facilitate the spread of the virus and thereby maintain the outbreak. Other mutations leading to a more severe disease need immediate attention to prevent detrimental outcomes. Here, our goal was to identify and rank mutations associated with altered patient outcome by simultaneously correlating outcomes to all mutations across a large cohort of patients.

Materials and Methods

Data source

All available SARS-CoV-2 (taxid: 2697049) viral nucleic acid sequences were downloaded from the GISAID virus repository (https://www.gisaid.org/). The sequences were acquired in FASTA format. Only sequences for which the entire viral nucleic acid sequence was published were selected. A second filtering step retained only virus genomes with available patient follow-up status.
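The two filtering steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how "entire sequence" was decided, so the length threshold `MIN_COMPLETE_LENGTH` is a hypothetical stand-in, and the `outcomes` mapping (sequence header to follow-up annotation) is a hypothetical data structure rather than a GISAID export format.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical threshold: the source does not state how completeness was judged.
MIN_COMPLETE_LENGTH = 29_000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_genomes(
    records: Iterable[Tuple[str, str]],
    outcomes: Dict[str, str],
    min_length: int = MIN_COMPLETE_LENGTH,
) -> Dict[str, str]:
    """Keep genomes that are long enough and have a follow-up annotation."""
    kept = {}
    for header, sequence in records:
        if len(sequence) >= min_length and header in outcomes:
            kept[header] = sequence
    return kept
```

In practice the first argument of `read_fasta` would be an open file handle; any iterable of text lines works.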

Mapping of mutations to viral genes

The mutations were evaluated using the CoVsurver (https://corona.bii.a-star.edu.sg). To achieve this, the viral sequences in FASTA format were used as the query and “hCoV-19/Wuhan/WIV04/2019” was used as the reference. The analysis was run in batches of 1,000 samples. Protein mutations do not overlap, and the genomic boundaries of the various proteins in the WIV04 reference genome are displayed in Table 1.
Table 1. Genomic boundaries of the SARS-CoV-2 proteins in the WIV04 reference genome.

| Protein name | Code | Genomic positions |
|---|---|---|
| Envelope (E) protein | E | 26245-26472 |
| Membrane glycoprotein | M | 26523-27191 |
| Nucleocapsid phosphoprotein | N | 28274-29533 |
| ORF3a protein | NS3 | 25393-26220 |
| ORF6 protein | NS6 | 27202-27387 |
| ORF7a protein | NS7a | 27394-27759 |
| ORF7b | NS7b | 27756-27887 |
| ORF8 protein | NS8 | 27894-28259 |
| NSP1 | NSP1 | 266-805 |
| NSP10 | NSP10 | 13025-13441 |
| NSP11 | NSP11 | 13442-13480 |
| RNA dependent RNA polymerase | NSP12 | 13442-13468 and 13468-16236 |
| Helicase | NSP13 | 16237-18039 |
| 3′-to-5′ exonuclease | NSP14 | 18040-19620 |
| endoRNAse | NSP15 | 19621-20658 |
| 2′-O-ribose methyltransferase | NSP16 | 20659-21552 |
| NSP2 | NSP2 | 806-2719 |
| NSP3 | NSP3 | 2720-8554 |
| NSP4 | NSP4 | 8555-10054 |
| NSP5 | NSP5 | 10055-10972 |
| NSP6 | NSP6 | 10973-11842 |
| NSP7 | NSP7 | 11843-12091 |
| NSP8 | NSP8 | 12092-12685 |
| NSP9 | NSP9 | 12686-13024 |
| Surface (S) glycoprotein | Spike | 21563-25384 |
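To show how the boundaries in Table 1 relate a genomic position to a protein-level mutation name, here is a minimal Python sketch. It is not CoVsurver's implementation. The function name `locate` is made up for this example, and the nucleotide positions used to exercise it (23403 for spike D614G, 14408 for polymerase P323L, 28881 for nucleocapsid R203K) are widely reported coordinates that do not appear in this paper.

```python
from typing import Dict, List, Tuple

# Boundaries copied from Table 1 (1-based, inclusive, WIV04 reference).
# NSP12 is listed with two segments that share position 13468.
PROTEIN_SEGMENTS: Dict[str, List[Tuple[int, int]]] = {
    "NSP1": [(266, 805)],
    "NSP2": [(806, 2719)],
    "NSP3": [(2720, 8554)],
    "NSP4": [(8555, 10054)],
    "NSP5": [(10055, 10972)],
    "NSP6": [(10973, 11842)],
    "NSP7": [(11843, 12091)],
    "NSP8": [(12092, 12685)],
    "NSP9": [(12686, 13024)],
    "NSP10": [(13025, 13441)],
    "NSP11": [(13442, 13480)],
    "NSP12": [(13442, 13468), (13468, 16236)],
    "NSP13": [(16237, 18039)],
    "NSP14": [(18040, 19620)],
    "NSP15": [(19621, 20658)],
    "NSP16": [(20659, 21552)],
    "Spike": [(21563, 25384)],
    "NS3": [(25393, 26220)],
    "E": [(26245, 26472)],
    "M": [(26523, 27191)],
    "NS6": [(27202, 27387)],
    "NS7a": [(27394, 27759)],
    "NS7b": [(27756, 27887)],
    "NS8": [(27894, 28259)],
    "N": [(28274, 29533)],
}


def locate(position: int) -> List[Tuple[str, int]]:
    """Return (protein code, amino-acid index) for each protein covering a position."""
    hits = []
    for code, segments in PROTEIN_SEGMENTS.items():
        offset = 0  # nucleotides of this protein that precede the current segment
        for start, end in segments:
            if start <= position <= end:
                hits.append((code, (offset + position - start) // 3 + 1))
                break
            offset += end - start + 1
    return hits
```

A position can fall inside more than one listed range (for example where NSP11 and NSP12 share a start), so the function returns a list rather than a single hit.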

Clinical classification

As the patient samples were annotated with altogether more than sixty different outcome classifications, we had to coerce these into three major categories. Patients who were “asymptomatic”, were “not hospitalized”, had a “mild” disease, or were at “home” were all assigned to the “mild” group. Patients who were treated at outpatient departments, were quarantined, or were treated by the physician network were also classified as “mild”. Patients who definitely needed medical care were assigned to the “hospitalized” group. These include those annotated as “hospitalized”, “inpatient”, “discharged”, “released”, and “recovered”. In addition, combinations of annotations which included any of these were also assigned to this cohort (e.g. “initially hospitalized” or “to be hospitalized”). Finally, patients with detrimental outcome were allocated to the “severe” cohort. These include those “deceased”, those with a “severe” disease, and those who entered “intensive care units”. Any combination of these with other annotations (e.g. “hospitalized / ICU”) was also added to this category.
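The coercion rules can be expressed as a small keyword classifier. The sketch below is an assumption-laden illustration, not the authors' code. The keyword lists contain only the example annotations quoted above, not the full set of more than sixty labels, and the order in which rules are checked (severe first, then the explicit “not hospitalized” case, then hospitalized, then mild) is one interpretation of how combined annotations were resolved.

```python
import re
from typing import Optional

# Keyword patterns built from the examples quoted in the text (incomplete by design).
SEVERE = re.compile(r"\b(deceased|severe|intensive care|icu)\b")
NOT_HOSPITALIZED = re.compile(r"\bnot hospitalized\b")
HOSPITALIZED = re.compile(r"\b(hospitalized|inpatient|discharged|released|recovered)\b")
MILD = re.compile(r"\b(asymptomatic|mild|home|outpatient|quarantined?)\b")


def classify_outcome(annotation: str) -> Optional[str]:
    """Map a free-text outcome annotation to 'mild', 'hospitalized' or 'severe'."""
    text = annotation.strip().lower()
    if SEVERE.search(text):
        return "severe"  # e.g. "hospitalized / ICU"
    if NOT_HOSPITALIZED.search(text):
        return "mild"  # must be tested before the plain "hospitalized" keyword
    if HOSPITALIZED.search(text):
        return "hospitalized"  # e.g. "initially hospitalized"
    if MILD.search(text):
        return "mild"
    return None  # annotation not covered by this sketch
```

Checking the severe keywords first means that a mixed label such as “hospitalized / ICU” lands in the severe cohort, which matches the example given in the text.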

Statistical computation

All data processing and statistical analysis steps were performed in the R statistical environment v3.6.3. Data processing was performed on 18 October 2020. A chi-square test was applied to examine the association between each mutation and patient status data. The false discovery rate was computed using the Benjamini-Hochberg method to correct for multiple hypothesis testing, and only results passing an FDR cutoff of 5% were accepted as significant.
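The analysis was run in R and the exact function calls are not given. As a language-neutral illustration, the sketch below reimplements both steps in plain Python: Pearson's chi-square test for a 2 x 3 table (wild type or mutant against mild, hospitalized or severe) and the Benjamini-Hochberg adjustment. The function names are invented for this example. With two rows and three columns the test has 2 degrees of freedom, for which the chi-square survival function reduces to exp(-x/2), so no statistics library is needed.

```python
import math
from typing import List, Sequence


def chi_square_2x3(wild_type: Sequence[int], mutant: Sequence[int]) -> float:
    """P-value of Pearson's chi-square test for a 2 x 3 contingency table.

    Assumes every row total and column total is positive.
    """
    rows = [list(wild_type), list(mutant)]
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / total
            statistic += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 2 degrees of freedom.
    return math.exp(-statistic / 2.0)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# ORF8 L84S counts (mild, hospitalized, severe) as listed in Table 2.
p_l84s = chi_square_2x3(wild_type=(477, 3085, 536), mutant=(231, 221, 16))
```

A mutation would then count as significant when its adjusted p-value is below 0.05. Applied to the L84S counts, the function gives a p-value on the order of 1e-101.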

Results

Dataset

Altogether 149,061 SARS-CoV-2 viral nucleic acid sequences were available, and 147,960 of these included the entire viral nucleic acid sequence. Clinical data were available for 7,702 patients, and 4,566 of these also had follow-up data. This is a small fraction of the total data, which implies that our findings could contain a sampling bias. Regarding the clinical parameters of these patients, 58.6% were male and 36.5% were female (the remaining samples did not have this information). The geographical origin of the samples covers the entire globe: 4.2% were from Africa, 46% from Asia, 26.8% from Europe, 12.4% from North America and 10.2% from South America. The samples were collected between 30 December 2019 and 14 September 2020. Of all patients with follow-up, 708 had a mild disease, 3,306 had to be hospitalized and 552 had a severe disease.

Mutation rate

Altogether 3,733 different mutations affecting the protein amino acid sequence were identified, and 937 of these mutations were not present in samples with clinical follow-up. Across all mutations, we identified on average 4.7 mutations in each sample. As an internal control for any potential bias in mutation prevalence related to patient proportions, we computed the average number of mutations in each clinical outcome cohort and found similar values (the means in those with mild, hospitalized, and severe outcome were 4.8, 4.6, and 5.1, respectively). When analyzing the correlation to clinical outcome across all mutations, 79 mutations reached statistical significance at FDR<5%. The complete list of these mutations with sample numbers in each cohort is displayed in Supplemental Table 1, and mutation data for each investigated patient are provided in Supplemental Table 2.

Mutations related to mild disease

In order to concentrate only on mutations with clinical relevance, we selected only those mutations which were present in at least 2% of the samples (this corresponds to a cutoff of at least 91 patient samples with a mutation). Among the mutations related to mild outcome, only five passed all filtering criteria: L84S in the ORF8 protein, L37F in the NSP6 protein, G196V in the ORF3a protein, F308Y in the NSP4 protein, and S197L in the nucleocapsid phosphoprotein. The complete list, as well as the distribution among patient samples, is provided in Table 2.
Table 2. SARS-CoV-2 mutations correlated to mild outcome in 4,566 patients with available genomic and follow-up information were found in five distinct genes.

| Protein name | Protein mutation | N wild type + mild outcome | N wild type + hospitalized | N wild type + severe outcome | N mutant + mild outcome | N mutant + hospitalized | N mutant + severe outcome | Chi squared test p value |
|---|---|---|---|---|---|---|---|---|
| ORF8 protein | L84S | 477 | 3085 | 536 | 231 | 221 | 16 | 2.3E-101 |
| NSP6 | L37F | 564 | 2816 | 537 | 144 | 490 | 15 | 1.1E-18 |
| ORF3a protein | G196V | 534 | 3247 | 545 | 174 | 59 | 7 | 3.7E-137 |
| NSP4 | F308Y | 533 | 3241 | 545 | 175 | 65 | 7 | 2.2E-133 |
| Nucleocapsid phosphoprotein | S197L | 533 | 3241 | 545 | 175 | 65 | 7 | 2.2E-133 |
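The 2% prevalence filter and the carrier counts behind Table 2 can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes the cutoff of 91 samples arises from rounding 2% of 4,566 (91.32) down, which is consistent with the number quoted in the text but is not stated explicitly, and the names `MILD_TABLE` and `summarize` are invented for this example.

```python
import math

N_TOTAL = 4566       # patients with genomic and follow-up information
MIN_FRACTION = 0.02  # prevalence filter from the text

# Mutant carriers with (mild, hospitalized, severe) outcome, copied from Table 2.
MILD_TABLE = {
    ("ORF8", "L84S"): (231, 221, 16),
    ("NSP6", "L37F"): (144, 490, 15),
    ("ORF3a", "G196V"): (174, 59, 7),
    ("NSP4", "F308Y"): (175, 65, 7),
    ("N", "S197L"): (175, 65, 7),
}

# Assumption: the cutoff is 2% of the cohort rounded down to whole samples.
min_carriers = math.floor(MIN_FRACTION * N_TOTAL)


def summarize(counts):
    """Return (carriers, passes the prevalence filter, share with mild outcome)."""
    carriers = sum(counts)
    return carriers, carriers >= min_carriers, counts[0] / carriers


summary = {key: summarize(counts) for key, counts in MILD_TABLE.items()}
total_mild_carriers = sum(carriers for carriers, _, _ in summary.values())
```

Summing the carriers over the five rows gives 1,851, the figure the text later quotes as the overall prevalence of mutations linked to mild outcome.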

Mutations associated with severe disease

When searching for mutations related to hospitalization or to severe outcome, we applied the same filter and included only mutations present in at least 2% of the samples. Altogether 15 mutations passed these criteria. These originated in six genes: L54F, D614G and V1176F in the surface (S) glycoprotein; A97V and P323L in the RNA dependent RNA polymerase; Q57H and G251V in the ORF3a protein; P13L, S194L, R203K, G204R and I292T in the nucleocapsid phosphoprotein; I33T in the ORF6 protein; and S1197R and T1198K in the NSP3 protein. In order not to miss mutations leading to deadly outcome, we also included all mutations which were present in at least 10 patients with severe outcome. This additional analysis delivered two further mutations, L71F in the NSP7 protein and S253P in the ORF3a protein. These were linked to 53 and 11 severe outcomes after being spotted in 60 (L71F) and 11 (S253P) patients, respectively. Interestingly, the overall prevalence of mutations leading to mild outcome (n=1,851) was smaller than the prevalence of those leading to worse outcome (n=11,725), but at the same time the proportion of patients with mild outcome in the entire cohort was also smaller (18.3%). Nevertheless, a significant proportion of the mutations (n=7,875) was not significantly correlated to any clinical outcome. The complete list of all mutations correlated to severe disease is presented in Table 3.
Table 3. SARS-CoV-2 mutations correlated to hospitalization and severe outcome in 4,566 patients with available genomic and follow-up information were found in seven distinct genes.

| Protein name | Protein mutation | N wild type + mild outcome | N wild type + hospitalized | N wild type + severe outcome | N mutant + mild outcome | N mutant + hospitalized | N mutant + severe outcome | Chi squared test p value |
|---|---|---|---|---|---|---|---|---|
| Surface (S) glycoprotein | D614G | 382 | 980 | 83 | 326 | 2326 | 469 | 1.02E-52 |
| RNA dependent RNA polymerase | P323L | 394 | 1028 | 43 | 314 | 2278 | 509 | 1.07E-72 |
| ORF3a protein | Q57H | 602 | 2448 | 358 | 106 | 858 | 194 | 1.09E-15 |
| Nucleocapsid phosphoprotein | R203K | 590 | 2580 | 309 | 118 | 726 | 243 | 2.12E-33 |
| Nucleocapsid phosphoprotein | G204R | 594 | 2586 | 311 | 114 | 720 | 241 | 1.21E-33 |
| RNA dependent RNA polymerase | A97V | 650 | 2996 | 546 | 58 | 310 | 6 | 4.1E-10 |
| Nucleocapsid phosphoprotein | P13L | 648 | 3009 | 547 | 60 | 297 | 5 | 5.54E-10 |
| NSP3 | T1198K | 654 | 3010 | 547 | 54 | 296 | 5 | 5.21E-10 |
| Nucleocapsid phosphoprotein | S194L | 705 | 3067 | 496 | 3 | 239 | 56 | 2.89E-13 |
| NSP3 | S1197R | 701 | 3151 | 552 | 7 | 155 | 0 | 8.31E-11 |
| ORF3a protein | G251V | 700 | 3182 | 546 | 8 | 124 | 6 | 1.95E-05 |
| Surface (S) glycoprotein | V1176F | 708 | 3298 | 443 | 0 | 8 | 109 | 5.2E-162 |
| Nucleocapsid phosphoprotein | I292T | 703 | 3245 | 515 | 5 | 61 | 37 | 1.06E-13 |
| Surface (S) glycoprotein | L54F | 706 | 3214 | 543 | 2 | 92 | 9 | 0.000147 |
| ORF6 protein | I33T | 703 | 3247 | 516 | 5 | 59 | 36 | 2.34E-13 |
| NSP7 | L71F | 708 | 3299 | 499 | 0 | 7 | 53 | 5.53E-73 |
| ORF3a protein | S253P | 708 | 3306 | 541 | 0 | 0 | 11 | 3.88E-18 |
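The two selection rules used for Table 3 (prevalence of at least 2%, or at least 10 carriers with severe outcome) can be replayed on the mutant-carrier counts. This is a sketch for checking the arithmetic, not the authors' pipeline. The names `selection_rule`, `SEVERE_TABLE` and the rule labels are invented here, and the cutoff of 91 carriers is taken from the text.

```python
MIN_CARRIERS = 91  # 2% of 4,566 samples, as quoted in the text
MIN_SEVERE = 10    # rescue rule for rare mutations with deadly outcome

# Mutant carriers with (mild, hospitalized, severe) outcome, copied from Table 3.
SEVERE_TABLE = {
    "D614G": (326, 2326, 469),
    "P323L": (314, 2278, 509),
    "Q57H": (106, 858, 194),
    "R203K": (118, 726, 243),
    "G204R": (114, 720, 241),
    "A97V": (58, 310, 6),
    "P13L": (60, 297, 5),
    "T1198K": (54, 296, 5),
    "S194L": (3, 239, 56),
    "S1197R": (7, 155, 0),
    "G251V": (8, 124, 6),
    "V1176F": (0, 8, 109),
    "I292T": (5, 61, 37),
    "L54F": (2, 92, 9),
    "I33T": (5, 59, 36),
    "L71F": (0, 7, 53),
    "S253P": (0, 0, 11),
}


def selection_rule(counts):
    """Name the rule under which a mutation enters the table, or None."""
    if sum(counts) >= MIN_CARRIERS:
        return "prevalence"
    if counts[2] >= MIN_SEVERE:
        return "severe-count"
    return None


rules = {mutation: selection_rule(counts) for mutation, counts in SEVERE_TABLE.items()}
rescued = sorted(m for m, rule in rules.items() if rule == "severe-count")
total_carriers = sum(sum(counts) for counts in SEVERE_TABLE.values())
```

On these counts, 15 mutations pass the prevalence rule, L71F and S253P enter only through the severe-count rule, and the carriers sum to the 11,725 quoted in the text.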

Discussion

We have simultaneously analyzed the correlation between patient outcome and all identified mutations resulting in amino acid sequence changes in the viral proteins. Strikingly, we not only found a significant number of mutations, but some of these were correlated to mild disease while others had a significant correlation to severe outcome. The nucleocapsid phosphoprotein was the protein with the most significant mutations linked to both mild and severe patient outcome. These changes lie at close genomic positions, with S197L resulting in mild outcome and S194L, R203K, and G204R resulting in inferior outcome. When comparing the S197L variant (71% of carriers had a mild outcome) to the S194L variant (1% of carriers had a mild outcome), the relative risk was extremely high. The majority of the nucleocapsid phosphoprotein mutations mapped to a small stretch of amino acids from position 194 to 204. This region coincides with the phosphorylated “RS-motif” [14], which maps onto the intrinsically unstructured serine rich region 181-213 of the protein [15]. Phosphorylation of this site is known to play important roles, such as recruitment of the host RNA helicase DDX1, which facilitates template readthrough and enables longer subgenomic mRNA synthesis (https://www.uniprot.org/uniprot/P59595). This observation needs further follow-up, especially because the nucleocapsid phosphoprotein is one of the potential drug targets against SARS-CoV-2 [16]. Overall, we have observed more mutations in the structural proteins (spike and nucleocapsid phosphoprotein) than in non-structural proteins. Of note, destabilizing mutations in nonstructural proteins were suggested to represent a potential mechanism differentiating SARS-CoV-2 from SARS-CoV [17]. It remains an open question how our results will relate to the recent observation of different molecular architectures of the first and second waves of infection [18].
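The relative risk mentioned for S197L versus S194L is not given as a number in the text. It can be estimated from the mutant-carrier counts in Tables 2 and 3, as in the minimal sketch below. This assumes that "relative risk" means the ratio of the two mild-outcome proportions, which is an interpretation rather than something the text defines.

```python
def mild_share(mild: int, hospitalized: int, severe: int) -> float:
    """Fraction of mutation carriers whose outcome was classified as mild."""
    return mild / (mild + hospitalized + severe)


# Mutant-carrier counts (mild, hospitalized, severe) from Tables 2 and 3.
s197l_share = mild_share(175, 65, 7)  # about 71% mild
s194l_share = mild_share(3, 239, 56)  # about 1% mild

# Assumed definition: ratio of the two mild-outcome proportions.
relative_risk = s197l_share / s194l_share
```

Under this definition the ratio comes out at roughly 70, although it rests on only three mild cases among the S194L carriers.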
Researchers from the University of Washington compared two dominant clades of virus in circulation and observed no difference in outcome when comparing these in patients sufficiently ill to warrant testing for the virus [19]. Previously, a 382-nucleotide deletion (∆382) in open reading frame 8 was associated with a milder infection [20]. In another recent study, a set of common deletions was identified in the spike protein of SARS-CoV-2 [21]. Other deletions were also validated by RT-PCR [22]. However, due to missing data about insertions and deletions in GISAID, we could not evaluate a potential link between deletions and patient outcome. Importantly, our findings might contain a sampling bias, since only a fraction of the available genomes had patient outcome data. On the other hand, four out of 22 potentially significant mutations (listed in Tables 2 and 3) map to an approximately 10 amino acid long, functionally important region of the nucleocapsid phosphoprotein, which leads us to believe that the current statistical approach can reveal functionally important sites within the SARS-CoV-2 genome. The main limitation of our study results from the database used. Information was retrieved from GISAID, a repository that contains only general information about patient outcome. The patient treatment protocols resulting in designation into the “mild”, “hospitalized” and “severe” cohorts may depend significantly on the country and even on the region where patients were managed. We also could not include potential confounding factors, including age, comorbidities and treatment against COVID-19, in our analysis. Coronaviruses generally have a stable genome which changes very little over time [23]. A fundamental question of SARS-CoV-2 research is whether or not the virus can get weaker or stronger with time. Our findings suggest that there are mutations that can support either of these changes, so there is a theoretical possibility that in the future the viral effect will shift towards milder or more severe patient outcomes.

Funding

The research was financed by the 2018-2.1.17-TET-KR-00001 and KH-129581 grants and by the Higher Education Institutional Excellence Programme of the Ministry for Innovation and Technology (MIT) in Hungary, within the framework of the Bionic thematic programme of the Semmelweis University as well as by OTKA grant 12065 provided by MIT, Hungary to Pázmány University.

Ethical Approval

Not required

Declaration of Competing Interests

None