Pardis C Sabeti1,2,3,4,5, Jacob E Lemieux1,4,6, Fritz Obermeyer1,7, Martin Jankowiak1,7, Nikolaos Barkas1, Stephen F Schaffner1,2,3, Jesse D Pyle1,8, Leonid Yurkovetskiy9, Matteo Bosso9, Daniel J Park1, Mehrtash Babadi1, Bronwyn L MacInnis1,3,4, Jeremy Luban1,4,10,9.
Abstract
Repeated emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants with increased fitness underscores the value of rapid detection and characterization of new lineages. We have developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many nonspike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, ranks the fitness of lineages as new sequences become available, and prioritizes mutations of biological and public health concern for functional characterization.
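A minimal sketch of the forward structure of a multinomial logistic regression over lineages may help make the abstract concrete. It assumes, as a simplification, that each lineage's daily log growth rate is the sum of per-mutation coefficients (in the spirit of forecasting growth "from their mutational profile") and that each region has its own lineage-specific intercepts. The names `features`, `coef`, and `intercepts` are hypothetical, and the hierarchical priors and Bayesian inference of the actual model are omitted.

```python
import numpy as np

def lineage_proportions(features, coef, intercepts, t):
    """Predicted lineage proportions in each region at time t (days).

    Simplified sketch, not the published model:
    features:   (L, M) 0/1 matrix; entry [l, m] = 1 if lineage l carries mutation m
    coef:       (M,) hypothetical per-mutation effect on daily log growth rate
    intercepts: (R, L) hypothetical region-by-lineage log-abundance offsets at t = 0
    """
    features = np.asarray(features, dtype=float)
    coef = np.asarray(coef, dtype=float)
    intercepts = np.asarray(intercepts, dtype=float)
    # Each lineage's growth rate is the sum of its mutations' effects.
    rate = features @ coef                       # shape (L,)
    logits = intercepts + rate[None, :] * t      # shape (R, L)
    # Softmax over lineages (the multinomial-logistic link); subtracting the
    # row maximum keeps exp() numerically stable without changing the result.
    logits = logits - logits.max(axis=1, keepdims=True)
    expl = np.exp(logits)
    return expl / expl.sum(axis=1, keepdims=True)
```

With one region and two lineages differing by a single mutation of effect 0.05 per day, the two proportions start equal at t = 0 and the mutant's share then grows logistically toward 1.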
Year: 2022 PMID: 35608456 PMCID: PMC9161372 DOI: 10.1126/science.abm1208
Source DB: PubMed Journal: Science ISSN: 0036-8075 Impact factor: 63.714
Fig. 1.
Relative fitness versus date of lineage emergence.
Circle size is proportional to cumulative case count inferred from lineage proportion estimates and confirmed case counts. Inset table lists the 10 fittest lineages inferred by the model. R/R_A is the fold increase in relative fitness over the Wuhan (A) lineage, assuming a fixed generation time of 5.5 days.
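One way to read the fixed-generation-time assumption is sketched below. It assumes, and this is not confirmed by the caption, that a lineage's fitness is expressed as a daily log-growth-rate advantage Δr over lineage A. Under that assumption the fold increase over one generation of τ = 5.5 days is R/R_A = exp(Δr · τ). The function names are hypothetical.

```python
import math

# Fixed generation time stated in the caption (days).
GENERATION_TIME_DAYS = 5.5

def fold_increase(delta_rate_per_day, generation_time=GENERATION_TIME_DAYS):
    """R/R_A implied by a daily log-growth-rate advantage over lineage A (assumed form)."""
    return math.exp(delta_rate_per_day * generation_time)

def rate_from_fold(fold, generation_time=GENERATION_TIME_DAYS):
    """Inverse: daily log-growth-rate advantage implied by a given R/R_A."""
    return math.log(fold) / generation_time
```

For example, a lineage that doubles its odds relative to A once per generation has Δr = ln 2 / 5.5 ≈ 0.126 per day, and `fold_increase` returns 2.0 for that rate.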
Amino acid substitutions most significantly associated with increased fitness.
Significance is defined as the posterior mean divided by the posterior standard deviation. Fitness is expressed per 5.5 days (the estimated generation time of the Wuhan (A) lineage). Final column: number of PANGO lineages in which each substitution emerged independently.
| Rank | Gene | Amino acid substitution | Fitness (per 5.5 days) | PANGO lineages |
|---|---|---|---|---|
| 1 | S | H655Y | 1.051 | 33 |
| 2 | S | T95I | 1.046 | 30 |
| 3 | ORF1a | P3395H | 1.039 | 5 |
| 4 | S | N764K | 1.04 | 6 |
| 5 | ORF1a | K856R | 1.039 | 2 |
| 6 | S | S371L | 1.041 | 3 |
| 7 | E | T9I | 1.04 | 5 |
| 8 | S | Q954H | 1.04 | 5 |
| 9 | ORF9b | P10S | 1.039 | 25 |
| 10 | S | L981F | 1.04 | 2 |
| 11 | N | P13L | 1.04 | 25 |
| 12 | S | G339D | 1.039 | 4 |
| 13 | S | S375F | 1.04 | 5 |
| 14 | S | S477N | 1.039 | 47 |
| 15 | S | N679K | 1.04 | 11 |
| 16 | S | S373P | 1.04 | 5 |
| 17 | M | Q19E | 1.039 | 5 |
| 18 | S | D796Y | 1.038 | 11 |
| 19 | S | N969K | 1.04 | 5 |
| 20 | S | T547K | 1.038 | 3 |
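Because the table is ordered by significance rather than by effect size, the Fitness column is not monotone (for example, rank 3 at 1.039 sits above rank 4 at 1.04). The sketch below shows that ranking rule applied to posterior draws. The input format (an array of posterior samples per mutation) and the function name are assumptions for illustration.

```python
import numpy as np

def significance_ranking(names, samples):
    """Rank mutations by posterior mean / posterior standard deviation.

    names:   list of M mutation labels (hypothetical input format)
    samples: (S, M) array of posterior draws of each mutation's effect
    Returns (name, posterior_mean, significance) tuples, most significant first.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=1)   # sample standard deviation across draws
    z = mean / std
    order = np.argsort(-z)              # descending significance
    return [(names[i], float(mean[i]), float(z[i])) for i in order]
```

A mutation with a small but tightly estimated effect can therefore outrank one with a larger but uncertain effect, which is the pattern visible in the table.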
Fig. 2.
Manhattan plot of amino acid changes assessed in this study.
(A) Changes across the entire genome. (B) Changes in the first 850 amino acids of S. (C) Changes in the first 250 amino acids of N. In each of (A) to (C), the y axis shows the effect size Δ log R, the estimated change in log relative fitness due to each amino acid change. The bottom three axes show the background density of all observed amino acid changes, the density of those associated with growth (weighted by |Δ log R|), and the ratio of the two. The top 55 amino acid changes are labeled. See fig. S13 for detailed views of S, N, ORF1a, and ORF1b. (D) Structure of the spike-ACE2 complex (PDB: 7KNB). Spike subunits are colored light blue, light orange, and gray; top-ranked mutations are shown as red spheres, and ACE2 is shown in magenta. (E) Close-up view of the RBD interface. (F) Top-ranked mutations in the N-terminal RNA-binding domain of N. Residues 44-180 of N (PDB: 7ACT) are shown in light blue, amino acid positions corresponding to top mutations in this region are shown as red spheres, and a 10-nt bound RNA is shown in gray.
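The three density tracks under the Manhattan plot can be approximated as below. The caption does not state the smoothing or bin width, so this sketch assumes plain fixed-width histograms; the function name and arguments are hypothetical.

```python
import numpy as np

def growth_enrichment(positions, delta_log_r, genome_length, n_bins=100):
    """Background density, |Δ log R|-weighted density, and their ratio.

    positions:   coordinate of each observed amino acid change
    delta_log_r: estimated Δ log R for each change
    Plain histograms are an assumption; the published tracks may be smoothed.
    """
    edges = np.linspace(0, genome_length, n_bins + 1)
    # Density of all observed changes.
    background, _ = np.histogram(positions, bins=edges, density=True)
    # Density of changes weighted by the magnitude of their fitness effect.
    weighted, _ = np.histogram(positions, bins=edges,
                               weights=np.abs(delta_log_r), density=True)
    # Ratio of the two; bins with no observed changes are left undefined (NaN).
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(background > 0, weighted / background, np.nan)
    return edges, background, weighted, ratio
```

A ratio above 1 marks a region where growth-associated changes are over-represented relative to where changes occur at all.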
Fig. 3.
(A) Infectivity relative to WT of lentiviral vectors pseudotyped with the indicated Spike mutants.
Target cells were HEK293T cells expressing ACE2 and TMPRSS2 transgenes. The genetic background of the Spike was Wuhan-Hu-1 bearing D614G. Red bars were significantly different from WT (adjusted p values shown); black bars were not significantly different from WT. (B) For the 1701 SARS-CoV-2 clusters with at least one amino acid substitution in the RBD, we compare (i) the PyR0 prediction for the contribution to Δ log R from RBD substitutions only with (ii) antibody binding computed using the antibody-escape calculator of Greaney et al. The escape calculator is based on an intuitive nonlinear model parameterized using deep mutational scanning data for 33 neutralizing antibodies elicited by SARS-CoV-2. PyR0 predictions exhibit a high Spearman correlation with the predictions of Greaney et al. (C to E) We dissect PyR0 Δ log R estimates into S-gene (C), RBD (D), and non-S-gene (E) contributions for 3000 SARS-CoV-2 clusters (blue dots). The horizontal axis corresponds to the date on which each cluster first emerged. Red squares denote the median Δ log R within each monthly bin. The increased importance of S-gene mutations (notably in the RBD) relative to non-S-gene mutations, starting around November 2021, is apparent.
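The two summary statistics named in this caption, a Spearman correlation for panel (B) and monthly median Δ log R for panels (C to E), can be sketched as follows. Spearman's coefficient is the Pearson correlation of ranks, with tied values receiving their average rank. Binning by calendar month is an assumption about how the monthly bins were formed, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

import numpy as np

def average_ranks(x):
    """1-based ranks of x, with ties assigned their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def monthly_medians(dates, values):
    """Median of values within each calendar month (assumed binning)."""
    bins = defaultdict(list)
    for d, v in zip(dates, values):
        bins[(d.year, d.month)].append(v)
    return {k: float(np.median(v)) for k, v in sorted(bins.items())}
```

Because Spearman's coefficient depends only on ranks, it rewards PyR0 and the escape calculator for ordering clusters the same way even though their scores are on different scales.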