Fritz Obermeyer, Martin Jankowiak, Nikolaos Barkas, Stephen F Schaffner, Jesse D Pyle, Lonya Yurkovetskiy, Matteo Bosso, Daniel J Park, Mehrtash Babadi, Bronwyn L MacInnis, Jeremy Luban, Pardis C Sabeti, Jacob E Lemieux.
Abstract
Repeated emergence of SARS-CoV-2 variants with increased fitness necessitates rapid detection and characterization of new lineages. To address this need, we developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many non-spike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, identifies viral lineages of concern as they emerge, and prioritizes mutations of biological and public health concern for functional characterization.
Year: 2022 PMID: 35194619 PMCID: PMC8863165 DOI: 10.1101/2021.09.07.21263228
Source DB: PubMed Journal: medRxiv
Figure 1. A. Overview of the PyR0 analysis pipeline. After clustering UShER's mutation-annotated tree, sequence data are used to construct spatio-temporal lineage prevalence counts y_tpc and amino acid substitution covariates X_cf. Pyro is used to fit a Bayesian multivariate logistic multinomial regression model to y_tpc and X_cf.
B. Relative fitness versus date of lineage emergence. Circle size is proportional to cumulative case count inferred from lineage proportion estimates and confirmed case counts. Inset table lists the 10 fittest lineages inferred by the model. R/R_A is the fold increase in relative fitness over the Wuhan (A) lineage, assuming a fixed generation time of 5.5 days.
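The regression in panel A and the relative fitness in panel B can be illustrated with a minimal plain-Python sketch. It assumes a simplified model form that the caption does not spell out: each cluster's growth rate is a linear combination of its substitution covariates, lineage proportions are a softmax of intercept plus growth rate times elapsed days, and relative fitness is the exponential of the growth-rate difference multiplied by the 5.5-day generation time. The names `alpha`, `beta`, `relative_fitness`, and all numeric values are hypothetical; the hierarchical priors across regions and the Pyro inference step are omitted.

```python
import math

GENERATION_TIME_DAYS = 5.5  # fixed generation time stated in the caption


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_growth_rates(X, beta):
    """Per-cluster growth rate: sum over substitutions f of X[c][f] * beta[f]."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def predicted_proportions(alpha, growth, t_days):
    """Multinomial-logistic lineage proportions at time t (assumed model form)."""
    return softmax([a + g * t_days for a, g in zip(alpha, growth)])


def multinomial_log_likelihood(counts, probs):
    """Log-likelihood of observed counts, dropping the constant multinomial term."""
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


def relative_fitness(growth, reference=0, generation_time_days=GENERATION_TIME_DAYS):
    """Fold increase in fitness of each cluster over the reference cluster."""
    return [math.exp((g - growth[reference]) * generation_time_days) for g in growth]


# Toy inputs (illustrative only, not taken from the paper).
X = [[0, 0], [1, 0], [1, 1]]   # rows: clusters, columns: substitutions
beta = [0.010, 0.008]          # hypothetical per-day effect of each substitution
alpha = [0.0, -3.0, -5.0]      # hypothetical initial log-prevalence offsets

growth = cluster_growth_rates(X, beta)
early = predicted_proportions(alpha, growth, t_days=0)
late = predicted_proportions(alpha, growth, t_days=500)
fitness = relative_fitness(growth)
```

In this toy setup the cluster carrying both substitutions starts rare and ends up dominant, which is the qualitative behavior the model is meant to detect.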
Figure 2. A. Infectivity relative to WT of lentiviral vectors pseudotyped with the indicated Spike mutants. Target cells were HEK293T cells expressing ACE2 and TMPRSS2 transgenes. The genetic background of the Spike was Wuhan-Hu-1 bearing D614G. Red bars were significantly different from WT (adjusted p values shown). Black bars were not significantly different from WT.
B. For the 1701 SARS-CoV-2 clusters with at least one amino acid substitution in the RBD, we compare: i) the PyR0 prediction for the contribution to Δ log R from RBD substitutions only; to ii) antibody binding computed using the antibody-escape calculator in (17). The escape calculator is based on an intuitive non-linear model parameterized using deep mutational scanning data for 33 neutralizing antibodies elicited by SARS-CoV-2. PyR0 predictions exhibit high (Spearman) correlation with predictions from Greaney et al.
C-E. We dissect PyR0 Δ log R estimates into S-gene (C), RBD (D), and non-S-gene (E) contributions for 3000 SARS-CoV-2 clusters (blue dots). The horizontal axis corresponds to the date at which each cluster first emerged. Red squares denote the median Δ log R within each monthly bin. The increased importance of S-gene mutations (notably in the RBD) over non-S-gene mutations starting around November 2021 is apparent.
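The Spearman correlation mentioned for panel B is the Pearson correlation of the ranks of the two quantities. The sketch below shows one way to compute it in plain Python, with tied values given their average rank. The function names and the toy inputs are hypothetical and are not values from the study.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy inputs (illustrative only): per-cluster RBD contributions and a second score.
rbd_delta_log_r = [0.01, 0.03, 0.02, 0.08]
second_score = [0.10, 0.35, 0.20, 0.90]
rho = spearman(rbd_delta_log_r, second_score)
```

Because only ranks enter the calculation, the statistic is insensitive to any monotone rescaling of either quantity, which suits a comparison between a fitness effect and a score from a non-linear calculator.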
Amino acid substitutions most significantly associated with increased fitness. Significance is defined as posterior mean / posterior standard deviation. Fitness is per 5.5 days (estimated generation time of the Wuhan (A) lineage (1, 19)). Final column: number of PANGO lineages in which each substitution emerged independently.
| Rank | Gene | Substitution | Fold Increase in Fitness | Number of Lineages |
|---|---|---|---|---|
| 1 | S | H655Y | 1.051 | 33 |
| 2 | S | T95I | 1.046 | 30 |
| 3 | ORF1a | P3395H | 1.039 | 5 |
| 4 | S | N764K | 1.040 | 6 |
| 5 | ORF1a | K856R | 1.039 | 2 |
| 6 | S | S371L | 1.041 | 3 |
| 7 | E | T9I | 1.040 | 5 |
| 8 | S | Q954H | 1.040 | 5 |
| 9 | ORF9b | P10S | 1.039 | 25 |
| 10 | S | L981F | 1.040 | 2 |
| 11 | N | P13L | 1.040 | 25 |
| 12 | S | G339D | 1.039 | 4 |
| 13 | S | S375F | 1.040 | 5 |
| 14 | S | S477N | 1.039 | 47 |
| 15 | S | N679K | 1.040 | 11 |
| 16 | S | S373P | 1.040 | 5 |
| 17 | M | Q19E | 1.039 | 5 |
| 18 | S | D796Y | 1.038 | 11 |
| 19 | S | N969K | 1.040 | 5 |
| 20 | S | T547K | 1.038 | 3 |
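The ranking in the table follows significance (posterior mean divided by posterior standard deviation), not the size of the fold increase, which is why the fold-increase column is not monotone in rank. A minimal sketch of that ordering follows. It assumes the fold increase is the exponential of the posterior mean Δ log R, and every substitution name and posterior value in it is a hypothetical placeholder, not a number from the study.

```python
import math


def significance(post_mean, post_std):
    """Significance as defined in the caption: posterior mean / posterior std."""
    return post_mean / post_std


def fold_increase(delta_log_r):
    """Fold increase in fitness per generation (assumed to be exp of Δ log R)."""
    return math.exp(delta_log_r)


# Hypothetical posterior summaries (mean, std) of Δ log R for three substitutions.
posterior = {
    "sub_a": (0.050, 0.010),
    "sub_b": (0.045, 0.005),
    "sub_c": (0.060, 0.030),
}

# Rank by significance, most significant first.
ranked = sorted(
    posterior,
    key=lambda name: significance(*posterior[name]),
    reverse=True,
)

# Fold increase for each substitution, listed in rank order.
table = [(name, fold_increase(posterior[name][0])) for name in ranked]
```

Here `sub_c` has the largest effect size but the widest posterior, so it ranks last, while the tightly estimated `sub_b` ranks first.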
Figure 3. Manhattan plot of amino acid changes assessed in this study. A. Changes across the entire genome.
B. Changes in the first 850 amino acids of S.
C. Changes in the first 250 amino acids of N.
In each of A-C, the y-axis shows effect size Δ log R, the estimated change in log relative fitness due to each amino acid change. The bottom three axes show the background density of all observed amino acid changes, the density of those associated with growth (weighted by |Δ log R|), and the ratio of the two. The top 55 amino acid changes are labeled. See Figure S13 for detailed views of S, N, ORF1a, and ORF1b.
D. Structure of the spike-ACE2 complex (PDB: 7KNB). Spike subunits colored light blue, light orange, and gray. Top-ranked mutations are shown as red spheres. ACE2 is shown in magenta.
E. Close-up view of the RBD interface.
F. Top-ranked mutations in the N-terminal RNA-binding domain of N. Residues 44–180 of N (PDB: 7ACT) are shown in light blue. Amino acid positions corresponding to top mutations in this region are shown as red spheres. A 10-nt bound RNA is shown in gray.
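The three density tracks described for panels A-C can be approximated with a simple histogram, sketched below. This is an assumption for illustration: the caption does not say how the densities were estimated, and the function name, bin width, and toy inputs are hypothetical. Each amino acid change contributes a count of 1 to the background track and a weight of |Δ log R| to the growth-associated track; both tracks are normalized to sum to 1 before taking their ratio.

```python
def binned_densities(positions, effects, bin_width, n_bins):
    """Histogram approximation of background density, |effect|-weighted density, and their ratio.

    positions: 1-based amino acid positions (must be non-empty).
    effects:   Δ log R estimate for each change, in the same order.
    """
    counts = [0.0] * n_bins
    weights = [0.0] * n_bins
    for pos, eff in zip(positions, effects):
        b = min((pos - 1) // bin_width, n_bins - 1)  # clamp to the last bin
        counts[b] += 1.0
        weights[b] += abs(eff)

    total_count = sum(counts)
    total_weight = sum(weights)
    background = [c / total_count for c in counts]
    growth = [w / total_weight if total_weight > 0 else 0.0 for w in weights]
    # A ratio above 1 marks bins enriched for growth-associated changes.
    ratio = [g / b if b > 0 else 0.0 for g, b in zip(growth, background)]
    return background, growth, ratio


# Toy inputs (illustrative only): three neutral changes and one with an effect.
positions = [1, 2, 3, 101]
effects = [0.0, 0.0, 0.0, 0.05]
background, growth, ratio = binned_densities(positions, effects, bin_width=100, n_bins=2)
```

In the toy example the second bin holds a quarter of the observed changes but all of the effect weight, so its ratio is well above 1, which is the kind of enrichment the bottom track is meant to highlight.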