Genome-wide association analysis of COVID-19 mortality risk in SARS-CoV-2 genomes identifies mutation in the SARS-CoV-2 spike protein that colocalizes with P.1 of the Brazilian strain.

Georg Hahn¹, Chloe M Wu², Sanghun Lee¹,³, Sharon M Lutz¹,⁴, Surender Khurana⁵, Lindsey R Baden⁶, Sebastien Haneuse¹, Dandi Qiao⁷,⁸, Julian Hecker⁴,⁷, Dawn L DeMeo⁷,⁸, Rudolph E Tanzi⁹, Manish C Choudhary⁷, Behzad Etemad⁷, Abbas Mohammadi⁷, Elmira Esmaeilzadeh⁷, Michael H Cho⁷,⁸, Jonathan Z Li⁷, Adrienne G Randolph⁷,¹⁰, Nan M Laird¹, Scott T Weiss⁷,⁸, Edwin K Silverman⁷,⁸, Katharina Ribbeck², Christoph Lange¹,⁷,⁸.

Abstract

SARS-CoV-2 mortality has been extensively studied in relation to host susceptibility. How sequence variations in the SARS-CoV-2 genome affect pathogenicity is poorly understood. Starting in October 2020, using the methodology of genome-wide association studies (GWAS), we examined the association between whole-genome sequencing (WGS) data of the virus and COVID-19 mortality as a potential method for the early identification of highly pathogenic strains to target for containment. While we updated our analysis continuously, in December 2020 we analyzed 7548 single-stranded SARS-CoV-2 genomes of COVID-19 patients in the GISAID database and tested variants for association with mortality using logistic regression. In total, evaluating 29,891 sequenced loci of the viral genome for association with patient/host mortality, two loci, at 12,053 and 25,088 bp, achieved genome-wide significance (p values of 4.09e-09 and 4.41e-23, respectively), though only 25,088 bp remained significant in follow-up analyses. Our association findings were exclusively driven by the samples that were submitted from Brazil (p value of 4.90e-13 for 25,088 bp). The mutation frequency of 25,088 bp in the Brazilian samples on GISAID rapidly increased from about 0.4 in October/December 2020 to 0.77 in March 2021. Although GWAS methodology is suitable for samples in which mutation frequencies vary between geographical regions, it cannot account for mutation frequencies that change rapidly over time, which renders a GWAS follow-up analysis of the GISAID samples submitted after December 2020 invalid. The locus at 25,088 bp is located in the P.1 strain and later (April 2021) became one of the distinguishing loci (precisely, substitution V1176F) of the Brazilian strain as defined by the Centers for Disease Control. Specifically, the mutations at 25,088 bp occur in the S2 subunit of the SARS-CoV-2 spike protein, which plays a key role in viral entry into target host cells. Since the mutations alter amino acid coding sequences, they potentially impose structural changes that could enhance viral infectivity and symptom severity. Our analysis suggests that GWAS methodology can provide suitable analysis tools for the real-time detection of new, more transmissible and pathogenic viral strains in databases such as GISAID, though new approaches are needed to accommodate mutation frequencies that change rapidly over time in the presence of simultaneously changing case/control ratios. Improvements in the quality and availability of the associated metadata/patient information will also be important to fully utilize the potential of GWAS methodology in this field.

Keywords: GISAID database; SARS-CoV-2; logistic regression; mortality; spike protein; whole-genome sequencing

Year: 2021; PMID: 34159627; PMCID: PMC8426743; DOI: 10.1002/gepi.22421

Source DB: PubMed; Journal: Genet Epidemiol; ISSN: 0741-0395; Impact factor: 2.344

INTRODUCTION

Viral mutations can cause increased virulence/transmissibility/immune evasion/pathogenicity (Long et al., 2020), both in animals (Brault et al., 2007; Geoghegan & Holmes, 2018) and in humans (Bae et al., 2018; Nogales et al., 2017). Especially for the SARS‐CoV‐2 virus, the discovery of potential links between viral mutations and disease outcome would have important implications for COVID‐19 surveillance and containment (Lo & Jamrozy, 2020), diagnosis, prognosis, and treatment development. In this contribution, we probed each locus of the single‐stranded RNA of the SARS‐CoV‐2 virus for direct association with host/patient mortality. In our initial analysis (October 2020), we aimed to identify potential links between viral mutations and mortality by utilizing the GISAID database (Elbe & Buckland‐Merrett, 2017; Shu & McCauley, 2017). We updated the analysis continuously; as of December 2020, GISAID contained data on 7548 COVID‐19 patients from 86 countries for whom metadata (age, sex, location, and patient status) were available and whose viral genomes had been sequenced (see Table 1). The variable “patient status” indicates whether the patient was alive or deceased at the time the virus sample was submitted to GISAID; we used it as a surrogate for mortality in our analysis. Because patients who were non‐deceased at enrollment could have died of COVID‐19 later, some patients may be misclassified; such misclassifications can reduce statistical power but do not inflate the type‐1 error. For the analysis, we repurposed the methodology of genome‐wide association studies (GWAS) (Manolio, 2010). This approach is widely used in human genetics and can test thousands of genetic loci for association in data sets such as the GISAID data.

Table 1

Characteristics of all patients in the GISAID data set for whom complete meta‐information and sequenced viral genomes were available

							Mutation frequency in % at the following loci
Region	#total	#females	#males	Deceased/non‐deceased	%deceased	Mean age	12,053 bp	25,088 bp
Entire data set	7548	3313	4235	722/6826	9.6	47.6	1.2	2.2
Africa	1517	954	563	2/1515	0.1	38.8	0.0	0.2
Eastern Mediterranean	730	180	550	131/599	17.9	45.4	0.0	0.1
Europe	1872	896	976	70/1802	3.7	56.0	0.1	0.0
Pan American Health Organization	1505	637	868	435/1070	28.9	51.9	5.7	10.6
Brazil	430	223	207	192/238	44.7	55.1	20.0	37.0
South‐East Asia	1116	367	749	83/1033	7.4	45.1	0.0	0.1
Western Pacific	808	279	529	1/807	0.1	41.6	0.0	0.2

Note: Total number of samples (as well as males/females), numbers of deceased/non‐deceased, rate of deceased samples at enrollment, mean age, and mutation frequencies for 12,053 and 25,088 bp.

To identify potential confounding geographic factors in the sequencing data, we first conducted a principal component analysis of the Jaccard similarity matrix (Figure 1) that was computed for the 7548 viral genomes available for our analysis. We utilized the Jaccard similarity matrix because its computation does not require estimates of the mutation frequency for each locus in the SARS‐CoV‐2 genome, in contrast to other similarity matrices such as the variance/covariance matrix (Prokopenko et al., 2016). We found that the virus genomes clustered in distinctive branches that correspond to the geographic regions from which their data were submitted to GISAID (Forster et al., 2020; Hahn, Lee, Weiss, et al., 2020) (see Figure 1). Both the geographical clustering of the viral genomes and their similarity within regions can bias the association analysis if unaccounted for. Hence, we generated additional eigenvector plots to investigate the number of eigenvectors needed to eliminate bias caused by such clustering. Based on a visual inspection of these plots, we selected the first 10 eigenvectors of the Jaccard matrix as covariates for the following logistic regression analyses.
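As an illustration of the similarity measure described above, the following minimal Python sketch computes the Jaccard index between binary mutation vectors and assembles the pairwise similarity matrix. It is not the authors' code (the Methods state that the R package “locStra” was used); the function names and the toy vectors are hypothetical, and returning 0.0 when neither genome carries any mutation is an assumed convention.

```python
from typing import List, Sequence


def jaccard_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity of two equal-length 0/1 mutation vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Loci mutated in both genomes (intersection) and in at least one (union).
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: two genomes without any mutation get similarity 0.0.
    return both / either if either else 0.0


def jaccard_matrix(rows: List[Sequence[int]]) -> List[List[float]]:
    """Symmetric n x n matrix of pairwise Jaccard indices."""
    n = len(rows)
    return [[jaccard_index(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Hypothetical toy data: 3 genomes, 5 loci (1 = differs from the reference).
genomes = [
    [1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
]
J = jaccard_matrix(genomes)
```

Only the presence or absence of a mutation enters the computation, so no per-locus mutation frequency has to be estimated, which is the property the text gives as the reason for choosing this matrix.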

Figure 1

Geographic distribution of 7548 SARS‐CoV‐2 genomes. Genomes are depicted according to their first two eigenvectors of the Jaccard matrix and colored by geographic region. The eigenvector plot shows distinct grouping of SARS‐CoV‐2 genomes according to their geographic origin. Furthermore, genomes that carry a mutation at 12,053 or 25,088 bp are depicted by triangles. The majority of those are located in a subbranch whose samples come predominantly from Pan America.

METHODS

Data acquisition

The analysis presented in this article is based on nucleotide sequences with accession numbers EPI_ISL_403962 to EPI_ISL_636981, downloaded from the GISAID database (Elbe & Buckland‐Merrett, 2017; Shu & McCauley, 2017) as a file in “fasta” format on 06 December 2020. Only patients with additional metadata (age, sex, and hospitalization status as plain text comments) were selected on GISAID, resulting in 8647 samples.

Data cleaning

We filtered the 8647 samples for complete nucleotide sequences and aligned them to the SARS‐CoV‐2 reference sequence (published on GISAID under the accession number EPI_ISL_402124) using MAFFT (Katoh et al., 2002). Using the location tag in the fasta file, we grouped all samples according to the WHO regional offices for Africa (AFRO, N = 1517), for the Eastern Mediterranean (EMRO, N = 730), for Europe (EURO, N = 1872), for South‐East Asia (SEARO, N = 1116), for the Western Pacific (WPRO, N = 808), as well as the Pan American Health Organization (PAHO, N = 1505). In particular, the countries included in each group are as follows: (1) AFRO (Algeria, South Africa, Gambia, Nigeria, Senegal, as well as Congo, Madagascar, Mozambique, Tunisia, Ghana, Rwanda, Cameroon); (2) EMRO (Egypt, Morocco, Kuwait, Lebanon, Oman, Saudi Arabia, United Arab Emirates, as well as Iran, Iraq, Bahrain); (3) EURO (Austria, Belgium, Bosnia and Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Faroe Islands, France, Germany, Hungary, Italy, Israel, Poland, Portugal, Romania, Russia, Slovakia, Spain, Sweden, Turkey, Kazakhstan, as well as Andorra, Georgia, Norway, Ukraine, Switzerland, Saint Barthelemy, Guadeloupe, Saint Martin, Mongolia, Greece, Finland, Moldova, Reunion); (4) PAHO (Canada, USA, Costa Rica, Mexico, Argentina, Brazil, Chile, Colombia, Ecuador, Peru, Venezuela, as well as Puerto Rico, Uruguay, Panama, Dominican Republic); (5) SEARO (Bangladesh, India, Indonesia, Myanmar, Nepal, Sri Lanka, Thailand); (6) WPRO (Cambodia, Japan, Malaysia, Vietnam, Australia, Guam, Hong Kong, China, Singapore, as well as South Korea, Taiwan, New Zealand, Philippines). Finally, we matched the samples to the metadata information (age, sex, clinical outcome) available on GISAID. Filtering for those samples having complete metadata information resulted in n = 7548 samples.

Data analysis

After alignment with MAFFT (Katoh et al., 2002), we compared all aligned sequences of length p = 29,891 entrywise to the SARS‐CoV‐2 reference sequence and recorded the result in a matrix X, where an entry X_ij = 1 indicates that sequence i deviates from the reference sequence at position j. All other entries of X are zero. We used the R‐package “locStra” (Hahn, Lutz, Hecker, et al., 2020; Hahn, Lutz, & Lange, 2020) to calculate the Jaccard similarity matrix (Jaccard, 1901; Prokopenko et al., 2016; Schlauch et al., 2017; Tan et al., 2005) for the n viral genomes based on the matrix X. The Jaccard matrix J(X) has n rows and n columns, and each entry (i,j) is the Jaccard similarity index between the binary vectors of mismatches/mutations (with respect to the reference sequence) of the ith and jth SARS‐CoV‐2 genomes in our data set. Computing the first 10 eigenvectors of the Jaccard similarity matrix J(X) allows us to visualize the geographic clustering of the viral genomes. We guard the logistic regression analysis against confounding by including the first 10 eigenvectors as covariates.

For the association analysis of the entire viral genome, we defined the response to be a binary indicator for the clinical outcome, distinguishing only between those patients/hosts whose hospitalization status tag at enrollment into the GISAID database was listed as “deceased” (outcome of 1) and the remaining samples, treated as non‐deceased (outcome of 0). At this point, no other information regarding clinical outcome is available in GISAID. For each of the p = 29,891 loci, we performed a logistic regression of the binary outcome variable on the following covariates: the column vector X_·i encoding the mismatches/mutations of each sample at the ith location of the SARS‐CoV‐2 nucleotide sequence, the patient's age, sex, and location (WHO region), and the first 10 eigenvectors of the Jaccard matrix. The WHO region was included because we observed in Figure 1 that the viral genomes cluster into distinct branches that correspond to the geographic regions. The logistic regression was carried out in R using the default “glm” command, where the parameter “family” was set to “family=binomial(link = “logit”).” We tested the ith locus/location of the viral genome for association with mortality by testing whether the regression coefficient for column X_·i is equal to zero. We controlled for multiple testing using the Bonferroni correction at an uncorrected threshold of 0.05, resulting in the corrected threshold of 0.05/29,891 = 1.67e−06.

To quantify unmeasured confounding, we computed E‐values (VanderWeele & Ding, 2017) with the help of the function “evalues.OLS” of the R‐package “EValue” (Mathur et al., 2021) on CRAN. Conditional on the measured covariates, the E‐value is the minimum strength of association (with both treatment and outcome) required for an unmeasured confounder to fully explain a specific treatment–outcome association. The E‐value is measured on the risk ratio scale. A small (large) E‐value indicates that a small (considerable) amount of unmeasured confounding is needed to explain an effect estimate.

Finally, we also performed an analysis with a matched data set. For this, we matched each sample in GISAID that was deceased at submission to the closest non‐deceased one, measured in Euclidean distance in the eigenvector space of the Jaccard matrix (Figure 1). When running the logistic regression on the matched data set, we tested each of the p = 29,891 loci using only the column vector X_·i (encoding the mismatches to the reference genome) and the patient's age and sex as covariates.
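To make the E‐value definition above concrete, the following Python sketch implements the generic point‐estimate formula on the risk ratio scale given by VanderWeele & Ding (2017), E = RR + sqrt(RR × (RR − 1)). It is only an illustration of that definition: the authors report using the function “evalues.OLS” of the R‐package “EValue”, so this sketch is not expected to reproduce the E‐values reported in this article, and the function name `e_value` is hypothetical.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a point estimate on the risk ratio scale.

    Generic formula from VanderWeele & Ding (2017):
    E = RR + sqrt(RR * (RR - 1)) for RR >= 1.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective estimates (RR < 1) are inverted before applying the formula.
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


# Hypothetical example: a risk ratio of 2.0 gives an E-value of about 3.41.
example = e_value(2.0)
```

A risk ratio of exactly 1 yields an E‐value of 1, meaning that no unmeasured confounding is needed to explain a null estimate.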

RESULTS

After testing each locus (presence/absence of mutation) of the viral genome individually for association with the status indicator variable (deceased/non‐deceased) of the host/patient at submission to GISAID, two loci of the SARS‐CoV‐2 genome achieved genome‐wide significance: one at position 12,053 bp with p value 4.09e−09, and one at 25,088 bp with p value 4.41e−23 (Table 2). The E‐values for the two loci are 715 and 6696, respectively, indicating that considerable unmeasured confounding would be needed to explain such effect estimates.

Table 2

Analysis	Sample size	Deceased	Locus	p Value	Odds ratio
Overall	7548	722	12,053	4.09e−09	6.4
			25,088	4.41e−23	12.9
Matched analysis	1452	722	12,053	5.53e−05	3.5
			25,088	4.91e−11	4.8
PAHO	1505	435	12,053	1.22e−09	7.3
			25,088	3.10e−24	15.9
Brazil	430	192	12,053	2.27e−04	3.5
			25,088	4.90e−13	9.2

Sample size, number of deceased samples, as well as p values and odds ratios from the logistic regression on the two mutations: for the entire data set, for each WHO region, and for samples from Brazil only.

To investigate the robustness of the highly significant association signals, we examined the data set at the individual patient and locus level. Our findings were enabled by two features specific to the data: (1) The Brazilian centers reported much larger numbers of deceased patients than the other centers worldwide. At enrollment, 44.7% of the Brazilian patients were deceased, in contrast to 9.6% in the entire data set (including Brazil). (2) The genomes that carry at least one of the mutations at 12,053 or 25,088 bp are located predominantly in the branch of the eigenvector plot (see Figure 1) that corresponds to the PAHO/South America region.

We conducted two different types of sensitivity analyses to minimize the chances that the observed associations are caused by confounding/GISAID data set composition (Table 2): (1) Our data set was restricted to genomes that were matched based on proximity in the eigenvector plots (see Section 2 for details), labeled “Matched analysis” in Table 2. (2) As further examination of the deceased indicator variable revealed that all “deceased” carrier genomes came from Brazil, our second sensitivity analysis was restricted to genomes that were submitted from the PAHO region and Brazil, respectively. In both analyses, 25,088 bp maintained significance at 0.05/29,891 = 1.67e−06, but 12,053 bp ceased to be significant. The effect size estimates showed risk increases for mortality of a factor of 5–16 for carriers of a mutation at 25,088 bp (Table 2). The E‐value for 25,088 bp in the Brazil analysis is 3.0; that is, an unmeasured confounder associated with both COVID‐19 mortality and the presence of the mutation at 25,088 bp by a risk ratio of 3.0 each could move the confidence interval to include the null, but weaker confounding could not.

To summarize, all the results of the secondary analyses (Table 2) support the genome‐wide significant association between the mutation at 25,088 bp and mortality. The large effect estimate and E‐value for the mutation at 25,088 bp (Table 2) provide substantial support for the association, as it is difficult to imagine an unaccounted confounding mechanism that would affect this one mutation among roughly 30k loci and that would be strong enough to cause such profound association signals in our analysis. Since the criteria for selection into the study likely vary by country and may be related to the deceased indicator, the odds ratio estimate from the Brazil sample alone may be the most interpretable. Among the samples from Brazil, 18.2% of the patients whose viral genome did not carry a mutation at either locus were deceased at enrollment, compared with 82.4% of the patients whose viral genomes carried the mutation at 25,088 bp only. Table 1 also provides a regional breakdown of the “deceased‐at‐enrollment” rates and the mutation frequencies for both loci as of December 2020. The rarity of the mutations outside of Brazil in December 2020 means that there was virtually no power to detect any association there (if it existed). It is important to note that the locus at 25,088 bp colocalizes with the P.1 variant and became part of the CDC definition (precisely, substitution V1176F) of the Brazilian strain in April 2021 (UCSC Genome Browser on SARS‐CoV‐2, 2021).
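The Bonferroni threshold used throughout this section can be checked with a short Python sketch that applies 0.05/29,891 to the p values listed in Table 2. The dictionary layout and variable names are hypothetical; the numbers are copied from Table 2 and from the threshold stated in the text.

```python
N_LOCI = 29_891          # number of tested loci, as stated in the text
ALPHA = 0.05             # uncorrected significance level
threshold = ALPHA / N_LOCI

# p values copied from Table 2, keyed by (analysis, locus in bp).
table2 = {
    ("Overall", 12053): 4.09e-09,
    ("Overall", 25088): 4.41e-23,
    ("Matched analysis", 12053): 5.53e-05,
    ("Matched analysis", 25088): 4.91e-11,
    ("PAHO", 12053): 1.22e-09,
    ("PAHO", 25088): 3.10e-24,
    ("Brazil", 12053): 2.27e-04,
    ("Brazil", 25088): 4.90e-13,
}

# True where the p value falls below the Bonferroni-corrected threshold.
significant = {key: p < threshold for key, p in table2.items()}

for (analysis, locus), passed in significant.items():
    status = "below" if passed else "above"
    print(f"{analysis:17s} {locus:6d} bp  p={table2[(analysis, locus)]:.2e}  "
          f"{status} threshold {threshold:.2e}")
```

With the tabulated values, 25,088 bp falls below the threshold in all four analyses. For 12,053 bp, the tabulated p values are above the threshold in the matched and Brazil‐only analyses and below it in the overall and PAHO analyses.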

DISCUSSION

Single mutations in viruses can confer enhanced transmission and/or virulence associated with patient mortality (Bae et al., 2018; Brault et al., 2007). In our analysis of SARS‐CoV‐2, the mutation at 25,088 bp occurs in the spike glycoprotein, which mediates viral attachment and cellular entry. The spike protein consists of two functional subunits: S1, which contains the receptor‐binding domain, and S2, which contains the machinery needed to fuse the viral membrane to the host cellular membrane. The mutation at 25,088 bp is in the S2 subunit, and specifically occurs within the S2′ site, which is cleaved by host proteases to activate membrane fusion (Figure 2). The V1176F mutation in S2 is located in the Heptad repeat 2 domain, which is involved in the viral fusion machinery. In many viruses, membrane fusion is activated by proteolytic cleavage, an event which has been closely linked to infectivity—for instance, a multibasic cleavage site is a signature of highly pathogenic viruses including avian influenza (Walls et al., 2020). In coronaviruses, membrane fusion is known to depend on proteolytic cleavage at multiple sites, including the S1/S2 site, located at the interface between the S1 and S2 domains, and the S2′ site located within the S2 domain. These cleavage events can impact infection—in fact, a distinct furin cleavage site present in the SARS‐CoV‐2 S1/S2 site is not found in SARS‐CoV (Vankadari, 2020), and it is thought to increase infectivity through enhanced membrane fusion activity (Vankadari, 2020; Walls et al., 2020; Xia et al., 2020). Consequently, mutations at these sites can alter virulence—for instance, a recent study reported that mutations disrupting the multibasic nature of the S1/S2 site affect SARS‐CoV‐2 membrane fusion and entry into human lung cells (Hoffmann et al., 2020). 
Several studies have also found that SARS‐CoV mutants with an added furin recognition site at S2′ had increased membrane fusion activity (Belouzard et al., 2009; Watanabe et al., 2008). Although enhanced infectivity does not always cause a higher fatality rate, more infectious viruses can lead to a higher viral load, which can impact symptom severity and mortality (Pujadas et al., 2020).

Figure 2

Proposed model showing how the S2 mutation may enhance proteolytic activation. The SARS‐CoV‐2 spike protein is colored by region (blue: S1, green: S2, magenta: S2′). The S2′ site is cleaved by host proteases, facilitating membrane fusion and viral entry into host cells. A mutation in this region, depicted in yellow, could theoretically increase proteolytic activity and membrane fusion, thereby causing greater infectivity.

All carriers of a mutation at 25,088 bp exhibit a G to T missense mutation (Table 3), which changes the encoded amino acid from valine to phenylalanine. Compared to the branched chain structure of valine, phenylalanine has a bulkier aromatic structure. Such a substitution may impose local structural constraints, stabilize particular secondary structures (Makwana & Mahalakshmi, 2015), or introduce specific interactions which lead to preferential binding. Therefore, a mutation in the S2′ domain which promotes proteolytic cleavage could theoretically enhance viral infectivity (Figure 2) and, consequently, patient mortality. Although many current therapies primarily target the receptor binding domain within the S1 subunit of the SARS‐CoV‐2 spike protein, our findings suggest that the S2 domain may be an important additional target for therapeutic development. The emergence of a more aggressive P.1 lineage carrying this mutation was associated with a second wave of infection across Brazil (Faria et al., 2021). Several modeling approaches have estimated P.1 to have higher transmission and reinfection rates (Faria et al., 2021; Coutinho et al., 2021, preprint), and there is evidence suggesting that P.1 is less susceptible to therapeutic or vaccine‐induced neutralizing antibodies (Hoffmann et al., 2021). Further experimental characterization of the biological effects of this mutation can have important implications for SARS‐CoV‐2 treatment and containment.

Table 3

Number of genomic variants at each locus, affected protein position, and corresponding amino acid change

Locus	A	C	G	T	Protein	Position	Primary substitution
12,053	0	7453	0	87	nsp7	71	Leu ⟶ Phe
25,088	0	0	7331	166	Spike	1176	Val ⟶ Phe

Note: Amino acid in the reference sequence in bold.
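The nucleotide counts in Table 3 can be related to the overall mutation frequencies in Table 1 with a small Python sketch. The reference alleles (C at 12,053 bp and G at 25,088 bp) are taken from the substitutions described in the text; the choice of denominator (all genomes with a called A/C/G/T base at the locus) and the function name are assumptions made for this illustration.

```python
# Nucleotide counts copied from Table 3.
table3_counts = {
    12053: {"A": 0, "C": 7453, "G": 0, "T": 87},
    25088: {"A": 0, "C": 0, "G": 7331, "T": 166},
}
# Reference alleles implied by the C-to-T and G-to-T substitutions in the text.
reference_allele = {12053: "C", 25088: "G"}


def mutation_frequency_percent(counts: dict, ref: str) -> float:
    """Share (in %) of called bases that differ from the reference allele."""
    called = sum(counts.values())      # assumption: denominator = called bases
    mutated = called - counts[ref]
    return 100.0 * mutated / called


freqs = {
    locus: round(mutation_frequency_percent(counts, reference_allele[locus]), 1)
    for locus, counts in table3_counts.items()
}
```

Under these assumptions the rounded frequencies are 1.2% and 2.2%, matching the “Entire data set” row of Table 1; using all 7548 genomes as the denominator gives the same rounded values.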

The mutation at 12,053 bp occurs within the ORF1ab gene, which expresses a polyprotein comprising 16 nonstructural proteins (Yoshimoto, 2020). Specifically, 12,053 bp occurs in NSP7, which forms a heterodimer with NSP8 that complexes with NSP12, ultimately forming the RNA polymerase complex essential for genome replication and transcription. Mutations causing enhanced viral polymerase activity have been linked to increased pathogenicity of influenza viruses. All carriers of a mutation at 12,053 bp exhibit a C to T missense mutation, which causes leucine to be replaced by phenylalanine (Table 3). Such a mutation may confer structural rigidity, which could potentially alter interactions with other components of the replication and transcription machinery, though experimental analysis is needed to test these hypotheses.

From a methodological perspective, there are potential strengths to our GWAS analysis approach to sequenced SARS‐CoV‐2 genomes. As the independent support of our association findings for the locus at 25,088 bp illustrates, GWAS methodology might be a well‐suited tool for the early detection of new viral strains in global database systems such as GISAID, to which scientists submit their viral genomes during pandemics with minimal requirements regarding the meta/clinical information about the host/patient. In general, GWAS methodology would be suitable to analyze the highly correlated viral genomes in such data sets, as the GWAS approach can simultaneously handle different subpopulations with different proportions of cases/controls. However, there are important limitations to applying GWAS methodology in a pandemic. More transmissible variants will alter mutation frequencies and increase the case/control ratio, as occurred for these two variants. Deployment of vaccines or targeted monoclonal antibodies may exert immunologic pressure on the virus, leading to selective viral evolution. Standard GWAS methodology assumes a stable mutation frequency and is no longer valid under these conditions. Additional analytic methods are required to adjust for a time‐changing variant frequency. Moreover, to fully utilize all viral genome sequences, the availability and the quality of meta information/patient information must be robust, with consistent outcome definitions and accurate data capture.

CONFLICT OF INTERESTS

The authors declare that there are no conflicts of interest.

REFERENCES

Review 1. Implications of aromatic-aromatic interactions: From protein structures to peptide models.

Authors: Kamlesh Madhusudan Makwana; Radhakrishnan Mahalakshmi
Journal: Protein Sci Date: 2015-10-07 Impact factor: 6.725

2. Entry from the cell surface of severe acute respiratory syndrome coronavirus with cleaved S protein as revealed by pseudotype virus bearing cleaved S protein.

Authors: Rie Watanabe; Shutoku Matsuyama; Kazuya Shirato; Masami Maejima; Shuetsu Fukushi; Shigeru Morikawa; Fumihiro Taguchi
Journal: J Virol Date: 2008-09-10 Impact factor: 5.103

3. Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites.

Authors: Sandrine Belouzard; Victor C Chu; Gary R Whittaker
Journal: Proc Natl Acad Sci U S A Date: 2009-03-24 Impact factor: 11.205

4. Unsupervised cluster analysis of SARS-CoV-2 genomes reflects its geographic progression and identifies distinct genetic subgroups of SARS-CoV-2 virus.

Authors: Georg Hahn; Sanghun Lee; Scott T Weiss; Christoph Lange
Journal: Genet Epidemiol Date: 2021-01-08 Impact factor: 2.135

Review 5. The Proteins of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS CoV-2 or n-COV19), the Cause of COVID-19.

Authors: Francis K Yoshimoto
Journal: Protein J Date: 2020-06 Impact factor: 2.371

6. SARS-CoV-2 variants B.1.351 and P.1 escape from neutralizing antibodies.

Authors: Markus Hoffmann; Prerna Arora; Rüdiger Groß; Alina Seidel; Bojan F Hörnich; Alexander S Hahn; Nadine Krüger; Luise Graichen; Heike Hofmann-Winkler; Amy Kempf; Martin S Winkler; Sebastian Schulz; Hans-Martin Jäck; Bernd Jahrsdörfer; Hubert Schrezenmeier; Martin Müller; Alexander Kleger; Jan Münch; Stefan Pöhlmann
Journal: Cell Date: 2021-03-20 Impact factor: 41.582

7. A Single Amino Acid in the Polymerase Acidic Protein Determines the Pathogenicity of Influenza B Viruses.

Authors: Joon-Yong Bae; Ilseob Lee; Jin Il Kim; Sehee Park; Kirim Yoo; Miso Park; Gayeong Kim; Mee Sook Park; Joo-Yeon Lee; Chun Kang; Kisoon Kim; Man-Seong Park
Journal: J Virol Date: 2018-06-13 Impact factor: 5.103

Review 8. The phylogenomics of evolving virus virulence.

Authors: Jemma L Geoghegan; Edward C Holmes
Journal: Nat Rev Genet Date: 2018-12 Impact factor: 53.242

9. Phylogenetic network analysis of SARS-CoV-2 genomes.

Authors: Peter Forster; Lucy Forster; Colin Renfrew; Michael Forster
Journal: Proc Natl Acad Sci U S A Date: 2020-04-08 Impact factor: 11.205

10. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein.

Authors: Alexandra C Walls; Young-Jun Park; M Alejandra Tortorici; Abigail Wall; Andrew T McGuire; David Veesler
Journal: Cell Date: 2020-03-09 Impact factor: 41.582
