
Molecular Architecture of Early Dissemination and Massive Second Wave of the SARS-CoV-2 Virus in a Major Metropolitan Area.

S Wesley Long1,2, Randall J Olsen1,2, Paul A Christensen1, David W Bernard1,2, James J Davis3,4, Maulik Shukla3,4, Marcus Nguyen3,4, Matthew Ojeda Saavedra1, Prasanti Yerramilli1, Layne Pruitt1, Sishir Subedi1, Hung-Che Kuo5, Heather Hendrickson1, Ghazaleh Eskandari1, Hoang A T Nguyen1, J Hunter Long1, Muthiah Kumaraswami1, Jule Goike5, Daniel Boutz6, Jimmy Gollihar1,6, Jason S McLellan5, Chia-Wei Chou5, Kamyab Javanmardi5, Ilya J Finkelstein5,7, James M Musser1,2.   

Abstract

We sequenced the genomes of 5,085 SARS-CoV-2 strains causing two COVID-19 disease waves in metropolitan Houston, Texas, an ethnically diverse region with seven million residents. The genomes were from viruses recovered in the earliest recognized phase of the pandemic in Houston, and an ongoing massive second wave of infections. The virus was originally introduced into Houston many times independently. Virtually all strains in the second wave have a Gly614 amino acid replacement in the spike protein, a polymorphism that has been linked to increased transmission and infectivity. Patients infected with the Gly614 variant strains had significantly higher virus loads in the nasopharynx on initial diagnosis. We found little evidence of a significant relationship between virus genotypes and altered virulence, stressing the linkage between disease severity, underlying medical conditions, and host genetics. Some regions of the spike protein - the primary target of global vaccine efforts - are replete with amino acid replacements, perhaps indicating the action of selection. We exploited the genomic data to generate defined single amino acid replacements in the receptor binding domain of spike protein that, importantly, produced decreased recognition by the neutralizing monoclonal antibody CR3022. Our study is the first analysis of the molecular architecture of SARS-CoV-2 in two infection waves in a major metropolitan region. The findings will help us to understand the origin, composition, and trajectory of future infection waves, and the potential effect of the host immune response and therapeutic maneuvers on SARS-CoV-2 evolution.

Year:  2020        PMID: 33024977      PMCID: PMC7536878          DOI: 10.1101/2020.09.22.20199125

Source DB:  PubMed          Journal:  medRxiv


[Introduction]

Pandemic disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus is now responsible for massive human morbidity and mortality worldwide (1–5). The virus was first documented to cause severe respiratory infections in Wuhan, China, beginning in late December 2019 (6–9). Global dissemination occurred extremely rapidly and has affected major population centers on most continents (10, 11). In the United States, the Seattle and the New York City (NYC) regions have been especially important centers of COVID-19 disease caused by SARS-CoV-2. For example, as of August 19, 2020, there were 227,419 confirmed SARS-CoV-2 cases in NYC, causing 56,831 hospitalizations and 19,005 confirmed fatalities and 4,638 probable fatalities (12). Similarly, in Seattle and King County, 17,989 positive patients and 696 deaths have been reported as of August 18, 2020 (13). The Houston metropolitan area is the fourth largest and most ethnically diverse city in the United States, with a population of approximately 7 million (14, 15). The 2,400-bed Houston Methodist health system has seven hospitals and serves a large, multiethnic, and socioeconomically diverse patient population throughout greater Houston (13, 14). The first COVID-19 case in metropolitan Houston was reported on March 5, 2020 with community spread occurring one week later (16). Many of the first cases in our region were associated with national or international travel in areas known to have SARS-CoV-2 virus outbreaks (16). A central molecular diagnostic laboratory serving all Houston Methodist hospitals and our very early adoption of a molecular test for the SARS-CoV-2 virus permitted us to rapidly identify positive patients and interrogate genomic variation among strains causing early infections in the greater Houston area. Our analysis of SARS-CoV-2 genomes causing disease in Houston has continued unabated since early March and is ongoing. 
Genome sequencing and related efforts were expanded extensively in late May as we recognized that a prominent second wave was underway (Figure 1).
FIG 1

(A) Confirmed COVID-19 cases in the Greater Houston Metropolitan region. Cumulative number of COVID-19 patients over time through July 7, 2020. Counties include Austin, Brazoria, Chambers, Fort Bend, Galveston, Harris, Liberty, Montgomery, and Waller. The shaded area represents the time period during which virus genomes characterized in this study were recovered from COVID-19 patients. The red line represents the number of COVID-19 patients diagnosed in the Houston Methodist Hospital Molecular Diagnostic Laboratory. (B) Distribution of strains with either the Asp614 or Gly614 amino acid variant in spike protein among the two waves of COVID-19 patients diagnosed in the Houston Methodist Hospital Molecular Diagnostic Laboratory. The large inset shows major clade frequency for the time frame studied.

Here, we report that SARS-CoV-2 was introduced to the Houston area many times, independently, from diverse geographic regions, with virus genotypes representing genetic clades causing disease in Europe, Asia, South America and elsewhere in the United States. There was widespread community dissemination soon after COVID-19 cases were reported in Houston. Strains with a Gly614 amino acid replacement in the spike protein, a polymorphism that has been linked to increased transmission and in vitro cell infectivity, increased significantly over time and caused virtually all COVID-19 cases in the massive second disease wave. Patients infected with strains with the Gly614 variant had significantly higher virus loads in the nasopharynx on initial diagnosis. Some naturally occurring single amino acid replacements in the receptor binding domain (RBD) of spike protein resulted in decreased reactivity with a neutralizing monoclonal antibody, consistent with the idea that some virus variants arise due to host immune pressure.

RESULTS

Description of metropolitan Houston.

Houston, Texas, is located in the southwestern United States, 50 miles inland from the Gulf of Mexico. It is the most ethnically diverse city in the United States (14). Metropolitan Houston consists predominantly of Harris County plus parts of eight contiguous surrounding counties. In the aggregate, the metropolitan area includes 9,444 square miles. The estimated population size of metropolitan Houston is 7 million (https://www.houston.org/houston-data).

Epidemic curve characteristics over two disease waves.

The first confirmed case of COVID-19 in the Houston metropolitan region was reported on March 5, 2020 (16), and the first confirmed case diagnosed in Houston Methodist hospitals was reported on March 6, 2020. The epidemic curve indicated a first wave of COVID-19 cases that peaked around April 11–15, followed by a decline in cases until May 11. Soon thereafter, the slope of the case curve increased with a very sharp uptick in confirmed cases beginning on June 12 (Figure 1B). We consider May 11 as the transition between waves, as this date is the inflection point of the cumulative new cases curve and had the absolute lowest number of new cases in the mid-May time period. Thus, for the data presented herein, wave 1 is defined as March 5 through May 11, 2020, and wave 2 is defined as May 12 through July 7, 2020. Epidemiologic trends within the Houston Methodist Hospital population were mirrored by data from Harris County and the greater metropolitan Houston region (Figure 1A). Through the 7th of July, 25,366 COVID-19 cases were reported in Houston, 37,776 cases in Harris County, and 53,330 in metropolitan Houston, including 9,823 cases in Houston Methodist facilities (inpatients and outpatients) (https://www.tmc.edu/coronavirus-updates/infection-rate-in-the-greater-houston-area/ and https://harriscounty.maps.arcgis.com/apps/opsdashboard/index.html#/c0de71f8ea484b85bb5efcb7c07c6914). During the first wave (early March through May 11), 11,476 COVID-19 cases were reported in Houston, including 1,729 cases in the Houston Methodist Hospital system. Early in the first wave (from March 5 through March 30, 2020), we tested 3,080 patient specimens. Of these, 406 (13.2%) samples were positive for SARS-CoV-2, representing 40% (358/898) of all confirmed cases in metropolitan Houston during that time period. 
As our laboratory was the first hospital-based facility to have molecular testing capacity for SARS-CoV-2 available on site, our strain samples are likely representative of COVID-19 infections during the first wave. For the entire study period (March 5 through July 7, 2020), we tested 68,418 specimens from 55,800 patients. Of these, 9,121 patients (16.4%) had a positive test result, representing 17.1% (9,121/53,300) of all confirmed cases in metropolitan Houston. Thus, our strain samples are also representative of those responsible for COVID-19 infections in the massive second wave. To test the hypothesis that, on average, the two waves affected different groups of patients, we analyzed individual patient characteristics (hospitalized and non-hospitalized) in each wave. Consistent with this hypothesis, we found significant differences in the COVID-19 patients in each wave (Table S1). For example, patients in the second wave were significantly younger, had fewer comorbidities, were more likely to be Hispanic/Latino (by self-report), and lived in zip codes with lower median incomes (Table S1). A detailed analysis of the characteristics of patients hospitalized in Houston Methodist facilities in the two waves has recently been published (17).
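The wave definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `assign_wave` and `percent` are hypothetical and not taken from the study, and the boundary dates are the ones stated in the text (wave 1: March 5 through May 11, 2020; wave 2: May 12 through July 7, 2020).

```python
from datetime import date

# Wave boundaries as stated in the text.
WAVE1_START = date(2020, 3, 5)
WAVE1_END = date(2020, 5, 11)
WAVE2_END = date(2020, 7, 7)


def assign_wave(collection_date):
    """Return 1 or 2 for a date inside the study period, otherwise None."""
    if WAVE1_START <= collection_date <= WAVE1_END:
        return 1
    if WAVE1_END < collection_date <= WAVE2_END:
        return 2
    return None


def percent(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


print(assign_wave(date(2020, 4, 13)))  # a date near the first-wave peak
print(assign_wave(date(2020, 6, 12)))  # start of the sharp uptick
print(percent(406, 3080))              # positive specimens early in wave 1
```

Keeping the boundaries as named constants makes it easy to test how sensitive a downstream comparison is to the choice of May 11 as the transition date.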

SARS-CoV-2 genome sequencing and phylogenetic analysis.

To investigate the genomic architecture of the virus across the two waves, we sequenced the genomes of 5,085 SARS-CoV-2 strains dating to the earliest time of confirmed COVID-19 cases in Houston. Analysis of SARS-CoV-2 strains causing disease in the first wave (March 5 through May 11) identified the presence of many diverse virus genomes that, in the aggregate, represent the major clades identified globally to date (Figure 1B). Clades G, GH, GR, and S were the four most abundantly represented phylogenetic groups (Figure 1B). Strains with the Gly614 amino acid variant in spike protein represented 82% of the SARS-CoV-2 strains in wave 1, and 99.9% in wave 2 (p<0.0001; Fisher’s exact test) (Figure 1B). This spike protein variant is characteristic of clades G, GH, and GR. Importantly, strains with the Gly614 variant represented only 71% of the specimens sequenced in March, the early part of wave 1 (Figure 1B). We attribute the decrease in strains with this variant observed in the first two weeks of March (Figure 1B) to fluctuation caused by the relatively few COVID-19 cases occurring during this period.
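As a rough check of the wave comparison, a two-sided Fisher's exact test can be written with the Python standard library alone. The counts below are taken from the D614G row of Table 2 (841 of 1,026 genomes in wave 1 and 4,054 of 4,059 in wave 2); treating every remaining genome as non-Gly614 is an assumption made for this sketch, and the function names are hypothetical. It is not the authors' analysis code.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    log_denom = log_comb(row1 + row2, col1)

    def log_prob(x):
        # Hypergeometric log-probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    log_observed = log_prob(a)
    # Sum the probabilities of every table no more likely than the observed one.
    p_value = sum(
        exp(log_prob(x))
        for x in range(lowest, highest + 1)
        if log_prob(x) <= log_observed + 1e-9
    )
    return min(1.0, p_value)


# Rows: wave 1, wave 2. Columns: Gly614, not Gly614 (assumed complement).
p = fisher_exact_two_sided(841, 1026 - 841, 4054, 4059 - 4054)
print(p < 0.0001)  # the exact value is extremely small and may underflow
```

Working in log space avoids overflow in the binomial coefficients for samples of this size.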

Relating spatiotemporal genome analysis with virus genotypes over two disease waves.

We examined the spatial and temporal mapping of genomic data to investigate community spread during wave 1 (Figure 2). Rapid and widespread community dissemination occurred soon after the initial COVID-19 cases were reported in Houston. The heterogeneous virus genotypes present very early in wave 1 indicate that multiple strains independently entered metropolitan Houston, rather than introduction and spread of a single strain. An important observation was that strains of most of the individual subclades were distributed over broad geographic areas (Figure S1). These findings are consistent with the known ability of SARS-CoV-2 to spread very rapidly from person to person.
FIG 2

Sequential time-series heatmaps for all COVID-19 Houston Methodist patients during the study period. Geospatial distribution of COVID-19 patients is based on zip code. Panel A (left) shows geospatial distribution of sequenced SARS-CoV-2 strains in wave 1 and panel B (right) shows wave 2 distribution. The collection dates are shown at the bottom of each panel. The insets refer to numbers of strains in the color spectrum used. Note difference in numbers of strains used in panel A and panel B insets.

Relationship between virus clades, clinical characteristics of infected patients, and additional metadata.

It is possible that SARS-CoV-2 genome subtypes have different clinical characteristics, analogous to what is believed to have occurred with Ebola virus (18–20) and known to occur for other pathogenic microbes (21). As an initial examination of this issue in SARS-CoV-2, we tested the hypothesis that patients with disease severe enough to warrant hospitalization were infected with a non-random subset of virus genotypes. We also examined the association between virus clades and disease severity based on overall mortality, highest level of required care (intensive care unit, intermediate care unit, inpatient or outpatient), need for mechanical ventilation, and length of stay. There was no simple relationship between virus clades and disease severity using these four indicators. Similarly, there was no simple relationship between virus clades and other metadata, such as sex, age, or ethnicity (Figure S2).

Machine learning analysis.

Machine learning models can be used to identify complex relationships not revealed by statistical analyses. We built machine learning models to test the hypothesis that virus genome sequence can predict patient outcomes including mortality, length of stay, level of care, ICU admission, supplemental oxygen use, and mechanical ventilation. Models to predict outcomes based on virus genome sequence alone resulted in F1 scores below 50% (0.41 – 0.49), and regression models showed similarly low R² values (−0.01 – −0.20) (Table S2). F1 scores near 50% are indicative of classifiers that are performing similarly to random chance. The use of patient metadata alone to predict patient outcome improved the model’s F1 scores by 5–10% (0.51 – 0.56) overall. The inclusion of patient metadata with virus genome sequence data improved most predictions of outcomes, compared to genome sequence alone, to 50% to 55% F1 overall (0.42 – 0.55) in the models (Table S2). The findings are indicative of two possibilities that are not mutually exclusive. First, patient metadata, such as age and sex, may provide more signal for the model to use and thus result in better accuracies. Second, the model’s use of single nucleotide polymorphisms (SNPs) may have resulted in overfitting. Most importantly, no SNP predicted a significant difference in outcome. A table of classifier accuracy scores and performance information is provided in Table S2.
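For readers unfamiliar with the metric, the following minimal sketch shows how an F1 score is computed for a binary outcome. The averaging scheme used for the multi-class outcomes in Table S2 is not stated in this section, so the single-positive-class form below is an assumption, and the labels are made-up toy data rather than study data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): a classifier that is right half the time
# on a balanced outcome has precision 0.5 and recall 0.5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
print(f1_score(y_true, y_pred))  # 0.5
```

On a balanced binary outcome, a score near 0.5 is what a coin-flip classifier would achieve, which is why the genome-only models are described as performing similarly to random chance.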

Patient outcome and metadata correlations.

Overall, very few metadata categories correlated with patient outcomes (Table S3). Mortality was independently correlated with increasing age, with a Pearson correlation coefficient (PCC) equal to 0.27. Because the proportion of variance explained is the square of the PCC, this means that only about 7% of the variation in mortality can be predicted from patient age. Length of stay correlated independently with increasing age (PCC=0.20). All other patient metadata correlations to outcomes had PCC less than 0.20 (Table S3). We further analyzed outcomes correlated to isolates from wave 1 and 2, and the presence of the Gly614 variant in spike protein. Being in wave 1 was independently correlated with mechanical ventilation days, overall length of stay, and ICU length of stay, with PCC equal to 0.20, 0.18, and 0.14, respectively. Importantly, the presence of the Gly614 variant did not correlate with patient outcomes (Table S3).
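The relationship between a Pearson correlation coefficient and the share of variance it accounts for can be illustrated as follows. The function name and the toy age and outcome values are hypothetical; only the coefficient 0.27 comes from the text. In a simple linear relationship the proportion of variance explained is the square of the coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Toy data (hypothetical): age in years and a 0/1 mortality indicator.
ages = [30, 45, 60, 75, 90]
died = [0, 0, 1, 0, 1]
r = pearson(ages, died)
print(r, r ** 2)  # coefficient and proportion of variance explained

# Applying the same rule to the coefficient reported for age and mortality.
print(round(0.27 ** 2, 3))  # 0.073
```

A coefficient of 0.27 therefore corresponds to roughly 7% of the variance, which fits the description of the correlation as modest.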

Analysis of the nsp12 polymerase gene.

The SARS-CoV-2 genome encodes an RNA-dependent RNA polymerase (RdRp, also referred to as Nsp12) used in virus replication (22–25). Two amino acid substitutions (Phe479Leu and Val556Leu) in RdRp each confer significant resistance in vitro to remdesivir, an adenosine analog (26). Remdesivir is inserted into RNA chains by RdRp during replication, resulting in premature termination of RNA synthesis and inhibition of virus replication. This compound has shown prophylactic and therapeutic benefit against MERS-CoV and SARS-CoV-2 experimental infection in rhesus macaques (27, 28). Recent reports indicate that remdesivir has therapeutic benefit in some COVID-19 hospitalized patients (29–33), leading it to be now widely used in patients worldwide. Thus, it may be important to understand variation in RdRp in large strain samples. To acquire data about allelic variation in the nsp12 gene, we analyzed our 5,085 virus genomes. The analysis identified 265 SNPs, including 140 nonsynonymous (amino acid-altering) SNPs, resulting in amino acid replacements throughout the protein (Table 1, Figure 3, Figure 4, Figure S3, and Figure S4). The most common amino acid change was Pro322Leu, identified in 4,893 of the 5,085 (96%) patient isolates. This amino acid replacement is common in genomes from clades G, GH, and GR, which are distinguished from other SARS-CoV-2 clades by the presence of the Gly614 amino acid change in the spike protein. Most of the other amino acid changes in RdRp were present in relatively small numbers of strains, and some have been identified in other isolates in a publicly available database (34). Five prominent exceptions included amino acid replacements: Ala15Val in 138 strains, Met462Ile in 59 strains, Met600Ile in 75 strains, Thr907Ile in 45 strains, and Pro917Ser in 80 strains. All 75 Met600Ile strains were phylogenetically closely related members of clade G, and also had the Pro322Leu amino acid replacement characteristic of this clade (Figure S3). 
These data indicate that the Met600Ile change is likely the evolved state, derived from a precursor strain with the Pro322Leu replacement. Similarly, we investigated phylogenetic relationships among strains with the other four amino acid changes noted above. In all cases, the vast majority of strains with each amino acid replacement were found among individual subclades of strains (Figure S3).
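The coordinate columns of Table 1 can be cross-checked against one another. In the sketch below, the offset of 13,443 between genomic locus and gene locus is inferred from the table rows themselves and is not stated by the authors; the function names are hypothetical. The rows used are the Pro322Leu replacement and the five prominent exceptions named above.

```python
import re

# Assumption: genomic locus minus gene locus, inferred from Table 1 rows.
NSP12_OFFSET = 13443


def parse_change(label):
    """Split a label such as 'C965T' or 'P322L' into (ref, position, alt)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized label: {label}")
    ref, position, alt = match.groups()
    return ref, int(position), alt


def residue_from_gene_locus(nt_position):
    """1-based residue number whose codon contains a 1-based nucleotide."""
    return (nt_position - 1) // 3 + 1


def row_is_consistent(genomic_locus, gene_locus, aa_change, offset=NSP12_OFFSET):
    """Check that the three coordinate columns of a table row agree."""
    _, nt_position, _ = parse_change(gene_locus)
    _, aa_position, _ = parse_change(aa_change)
    same_offset = genomic_locus - nt_position == offset
    same_residue = residue_from_gene_locus(nt_position) == aa_position
    return same_offset and same_residue


# (genomic locus, gene locus, amino acid change) copied from Table 1.
rows = [
    (14408, "C965T", "P322L"),
    (13487, "C44T", "A15V"),
    (14829, "G1386T", "M462I"),
    (15243, "G1800T", "M600I"),
    (16163, "C2720T", "T907I"),
    (16192, "C2749T", "P917S"),
]
for row in rows:
    print(row[2], row_is_consistent(*row))
```

A check of this kind is a cheap way to catch transcription errors when a variant table is copied between formats.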
Table 1.

Nonsynonymous SNPs of SARS-CoV-2 nsp12.

Genomic Locus    Gene Locus    Amino Acid Change    Domain    Wave 1 (n=1026)    Wave 2 (n=4059)    Total (n=5085)
13446C3TA1VN-terminus22
13448G5AD2NN-terminus11
13487C44TA15VN-terminus138138
13501C58TP20SN-terminus11
13514G71AG24DN-terminus33
13517C74TT25IN-terminus44
13520G77AS26NN-terminus11
13523C80TT27IN-terminus11
13526A83CD28AN-terminus11
13564G121AV41IB hairpin11
13568C125TA42VB hairpin11
13571G128TG43VB hairpin11
13576G133TA45SB hairpin1212
13617G174TK58NNiRAN11
13618G175TD59YNiRAN2424
13620C177GD59ENiRAN11
13627G184TD62YNiRAN11
13661G218AR73KNiRAN11
13667C224TT75INiRAN22
13694C251TT84INiRAN11
13712A269GK90RNiRAN11
13726G283AV95INiRAN11
13730C287TA96VNiRAN224
13762G319CG107RNiRAN11
13774C331AP111TNiRAN11
13774C331TP111SNiRAN1515
13777C334TH112YNiRAN11
13790A347GQ116RNiRAN22
13835G392TR131MNiRAN11
13858G415TD139YNiRAN33
13862C419TT140INiRAN156
13868A425GK142RNiRAN11
13897G454TD152YNiRAN44
13901A458GD153GNiRAN22
13957C514TR172CNiRAN22
13963T520CY174HNiRAN11
13966G523AA175TNiRAN11
13975G532TG178CNiRAN44
13984G541AV181INiRAN11
13994C551TA184VNiRAN88
14104T661CF221LNiRAN22
14109A666GI222MNiRAN11
14120C677TP226LNiRAN22
14185A742GR248GNiRAN11
14187G744TR248SNiRAN11
14188G745AA249TNiRAN11
14225C782AT261KInterface44
14230C787TP263SInterface11
14233T790CY264HInterface11
14241G798TK266NInterface11
14290G847TD283YInterface11
14335G892TV298FInterface88
14362C919AL307MInterface22
14371G928CA310PInterface11
14396C953TT318IInterface11
14398G955TV319LInterface11
14407C964TP322SInterface22
14408C965TP322LInterface84340504893
14500G1057TV353LInterface55
14536C1093TL365FInterface11
14557G1114TV372LFingers44
14584G1141TA381S11
14585C1142TA381VFingers1010
14593G1150AG384SFingers11
14657C1214TA405VFingers11
14708C1265TA422V11
14747A1304GE435GFingers22
14768C1325TA442VFingers2121
14786C1343TA448VFingers369
14821C1378TP460S11
14829G1386TM462IFingers5959
14831G1388TC463FFingers33
14857G1414TV472F11
14870A1427GD476GFingers55
14874G1431TK477NFingers11
14912A1469GN490SFingers112
14923G1480AV494IFingers22
14980C1537TL513FFingers112
14990A1547GD516G11
15006G1563CE521DFingers235
15016G1573TA525SFingers33
15026C1583TA528VFingers516
15037C1594TR532CFingers11
15100G1657CA553PFingers11
15101C1658TA553VFingers11
15124A1681GI561VFingers22
15202G1759CV587LPalm77
15211A1768GT590A11
15226G1783AG595SPalm11
15243G1800TM600IPalm71475
15251C1808GT603SPalm11
15257A1814GY605C11
15260G1817AS606NPalm11
15327G1884TM628IFingers314
15328C1885TL629FFingers11
15334A1891GI631VFingers11
15341C1898TA633VFingers11
15352C1909TL637FFingers11
15358C1915TR639CFingers11
15362A1919GK640RFingers11
15364C1921GH641DFingers11
15368C1925TT642IFingers11
15380G1937TS646IFingers11
15386C1943TS648LFingers22
15391C1948TR650CFingers11
15406G1963TA655SFingers33
15407C1964TA655VFingers11
15436A1993GM665VFingers22
15438G1995TM665IFingers2424
15452G2009TG670VFingers2828
15487G2044CG682RPalm11
15497C2054AT685KPalm11
15572A2129GD710GPalm11
15596A2153GY718SPalm22
15619C2176TL726FPalm11
15638G2195AR732KPalm11
15640A2197GN733DPalm11
15640A2197TN733YPalm11
15655A2212GT738APalm22
15656C2213TT738IPalm22
15658G2215AD739NPalm22
15664G2221AV741MPalm11
15715T2272CS758PPalm11
15760G2317AG773SPalm11
15761G2318AG773DPalm11
15827A2384GE795GPalm11
15848C2405TT802IPalm11
15850G2407TD803YPalm11
15853C2410TL804FPalm22
15878G2435TC812FPalm11
15886C2443TH815YPalm11
15906G2463TQ821HThumb112
15908G2465TG822VThumb11
15979A2536GI846VThumb44
16045C2602TL868FThumb11
16084C2641TH881YThumb11
16148A2705GY902CThumb11
16163C2720TT907IThumb4545
16178C2735TS912LThumb22
16192C2749TP917SThumb8080
FIG 3

Location of amino acid replacements in RNA-dependent RNA polymerase (RdRp/Nsp12) among the 5,085 genomes of SARS-CoV-2 sequenced. The various RdRp domains are color-coded. The numbers refer to amino acid site. Note that several amino acid sites have multiple variants identified.

FIG 4

Amino acid changes identified in Nsp12 (RdRp) in this study that may influence interaction with remdesivir. The schematic at the top shows the domain architecture of Nsp12. (Left) Ribbon representation of the crystal structure of Nsp12-remdesivir monophosphate-RNA complex (PDB code: 7BV2). The structure in the right panel shows a magnified view of the boxed area in the left panel. The Nsp12 domains are colored as in the schematic at the top. The catalytic site in Nsp12 is marked by a black circle in the right panel. The side chains of amino acids comprising the catalytic site of RdRp (Ser758, Asp759, and Asp760) are shown as balls and sticks and colored yellow. The nucleotide binding site is boxed in the right panel. The side chains of amino acids participating in nucleotide binding (Lys544, Arg552, and Arg554) are shown as balls and sticks and colored light blue. The remdesivir molecule incorporated into the nascent RNA is shown as balls and sticks and colored light pink. The RNA is shown as a blue cartoon and bases are shown as sticks. The positions of Cα atoms of amino acids identified in this study are shown as red and green spheres and labeled. The amino acids that are shown as red spheres are located above the nucleotide binding site, whereas Cys812, located at the catalytic site, is shown as a green sphere. The side chain of active site residue Ser758 is shown as balls and sticks and colored yellow. The location of the Cα atom of the remdesivir resistance-conferring amino acid Val556 is shown as a blue sphere and labeled.

Importantly, none of the observed amino acid polymorphisms in RdRp were located precisely at the two sites known to cause in vitro resistance to remdesivir (26). Most of the amino acid changes are located distantly from the RNA-binding and catalytic sites (Figure S4 and Table 1). However, replacements at six amino acid residues (Ala442Val, Ala448Val, Ala553Pro/Val, Gly682Arg, Ser758Pro, and Cys812Phe) may potentially interfere with either remdesivir binding or RNA synthesis. Four (Ala442Val, Ala448Val, Ala553Pro/Val, and Gly682Arg) of the six substitution sites are located immediately above the nucleotide-binding site, which comprises residues Lys544, Arg552, and Arg554, as shown by structural studies (Figure 4). The positions of these four variant amino acid sites are comparable to Val556 (Figure 4), for which a Val556Leu mutation in SARS-CoV was identified to confer resistance to remdesivir in vitro (26). The other two substitutions (Ser758Pro and Cys812Phe) are inferred to be located either at, or in the immediate proximity of, the catalytic active site, which comprises three contiguous residues (Ser758, Asp759, and Asp760). A proline substitution we identified at Ser758 (Ser758Pro) is likely to negatively impact RNA synthesis. Although Cys812 is not directly involved in the catalysis of RNA synthesis, it is only 3.5 Å away from Asp760. The introduction of the bulkier phenylalanine substitution at Cys812 (Cys812Phe) may impair RNA synthesis. Consequently, these two substitutions are expected to detrimentally affect virus replication or fitness.

Analysis of the gene encoding the spike protein.

The densely glycosylated spike protein of SARS-CoV-2 and its close coronavirus relatives binds directly to host-cell angiotensin-converting enzyme 2 (ACE2) receptors to enter host cells (35–37). Thus, the spike protein is a major translational research target, including intensive vaccine and therapeutic antibody efforts (35–64). Analysis of the gene encoding the spike protein identified 470 SNPs, including 285 that produce amino acid changes (Table 2, Figure 5). Forty-nine of these replacements (V11A, T51A, W64C, I119T, E156Q, S205A, D228G, L229W, P230T, N234D, I235T, T274A, A288V, E324Q, E324V, S325P, S349F, S371P, S373P, T385I, A419V, C480F, Y495S, L517F, K528R, Q628E, T632I, S708P, T719I, P728L, S746P, E748K, G757V, V772A, K814R, D843N, S884A, M902I, I909V, E918Q, S982L, M1029I, Q1142K, K1157M, Q1180R, D1199A, C1241F, C1247G, and V1268A) are not represented in a publicly available database (34) as of August 19, 2020. Interestingly, 25 amino acid sites have three distinct variants (that is, the reference amino acid plus two additional variant amino acids), and five amino acid sites (amino acid positions 21, 27, 228, 936, and 1050) have four distinct variants represented in our sample of 5,085 genomes (Table 2, Figure 5).
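The tally of sites with three or four distinct variants can be reproduced by grouping amino acid changes by position. The sketch below uses only a small subset of the Table 2 entries, chosen for illustration; the function names are hypothetical, and following the definition in the text, the reference amino acid is counted as one of the variants.

```python
import re
from collections import defaultdict


def variants_per_site(changes):
    """Map each residue position to the set of residues seen there."""
    sites = defaultdict(set)
    for change in changes:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
        if match is None:
            raise ValueError(f"unrecognized change: {change}")
        ref, position, alt = match.group(1), int(match.group(2)), match.group(3)
        sites[position].update({ref, alt})  # reference counts as a variant
    return sites


def sites_with_n_variants(changes, n):
    """Sorted positions that carry exactly n distinct residues."""
    sites = variants_per_site(changes)
    return sorted(pos for pos, residues in sites.items() if len(residues) == n)


# Subset of amino acid changes copied from Table 2 (not the full table).
subset = [
    "P9S", "P9L",
    "R21I", "R21K", "R21T",
    "A27S", "A27T", "A27V",
    "D228H", "D228G", "D228E",
    "D614G",
    "D936N", "D936H", "D936Y",
    "M1050L", "M1050V", "M1050I",
]
print(sites_with_n_variants(subset, 4))  # [21, 27, 228, 936, 1050]
print(sites_with_n_variants(subset, 3))  # [9]
```

Using a set per position means that two different nucleotide changes producing the same amino acid (for example, the two Met153Ile rows in Table 2) are counted once.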
Table 2.

Nonsynonymous SNPs in SARS-CoV-2 spike protein.

Genomic Locus    Gene Locus    Amino Acid Change    Domain    Wave 1 (n=1026)    Wave 2 (n=4059)    Total (n=5085)
21575C13TL5FS1112536
21578G16TV6FS111
21587C25TP9SS122
21588C26TP9LS1112
21594T32CV11AS111
21597C35TS12FS166
21604G42TQ14HS111
21614C52TL18FS1 - NTD11112
21618C56TT19IS1 - NTD112
21621C59TT20IS1 - NTD11
21624G62TR21IS1 - NTD66
21624G62AR21KS1 - NTD11
21624G62CR21TS1 - NTD33
21627C65TT22IS1 - NTD246
21638C76TP26SS1 - NTD1717
21641G79TA27SS1 - NTD112
21641G79AA27TS1 - NTD11
21642C80TA27VS1 - NTD11
21648C86TT29IS1 - NTD145
21707C145TH49YS1 - NTD142142
21713A151GT51AS1 - NTD11
21724G162TL54FS1 - NTD1111
21754G192TW64CS1 - NTD11
21767C205TH69YS1 - NTD178
21770G208AV70IS1 - NTD11
21770G208TV70FS1 - NTD11
21774C212TS71FS1 - NTD11
21784T222AN74KS1 - NTD11
21785G223CG75RS1 - NTD11
21793G231TK77NS1 - NTD11
21824G262AD88NS1 - NTD11
21834A272TY91FS1 - NTD11
21846C284TT95IS1 - NTD11011
21852A290GK97RS1 - NTD11
21855C293TS98FS1 - NTD123
21861T299CI100TS1 - NTD22
21918T356CI119TS1 - NTD11
21930C368TA123VS1 - NTD11
21941G379TV127FS1 - NTD11
21942T380CV127AS1 - NTD44
21974G412TD138YS1 - NTD22
21985G423TL141FS1 - NTD11
21986G424AG142SS1 - NTD22
21993A431GY144CS1 - NTD11
21995T433CY145HS1 - NTD22
21998C436TH146YS1 - NTD123
22014G452AS151NS1 - NTD11
22014G452TS151IS1 - NTD22
22017G455TW152LS1 - NTD112
22021G459TM153IS1 - NTD11
22021G459AM153IS1 - NTD11
22022G460AE154KS1 - NTD11
22028G466CE156QS1 - NTD22
22037G475AV159IS1 - NTD11
22097C535TL179FS1 - NTD11
22104G542TG181VS1 - NTD11
22107A545GK182RS1 - NTD11
22135A573TE191DS1 - NTD11
22139G577TV193LS1 - NTD11
22150T588GN196KS1 - NTD11
22175T613GS205AS1 - NTD11
22205G643TD215YS1 - NTD11
22206A644GD215GS1 - NTD22
22214C652GQ218ES1 - NTD11
22227C665TA222VS1 - NTD11
22241G679AV227IS1 - NTD22
22242T680CV227AS1 - NTD11
22244G682CD228HS1 - NTD22
22245A683GD228GS1 - NTD11
22246T684GD228ES1 - NTD22
22248T686GL229WS1 - NTD11
22250C688AP230TS1 - NTD11
22253A691GI231VS1 - NTD11
22254T692CI231TS1 - NTD11
22259A697GI233VS1 - NTD11
22260T698CI233TS1 - NTD11
22262A700GN234DS1 - NTD11
22266T704CI235TS1 - NTD11
22281C719TT240IS1 - NTD55
22286C724TL242FS1 - NTD11
22295C733TH245YS1 - NTD22
22304T742CY248HS1 - NTD33
22311C749TT250IS1 - NTD145
22313C751TP251SS1 - NTD22
22320A758GD253GS1 - NTD22
22320A758CD253AS1 - NTD11
22323C761TS254FS1 - NTD33
22329C767TS256LS1 - NTD11
22335G773TW258LS1 - NTD11
22344G782TG261VS1 - NTD33
22346G784TA262SS1 - NTD44
22350C788TA263VS1 - NTD11
22382A820GT274AS1 - NTD11
22398A836TY279FS1 - NTD11
22408T846GN282KS1 - NTD11
22425C863TA288VS1 - NTD11
22430G868TD290YS1 - NTD11
22484G922TV308LS133
22487G925CE309QS111
22532G970CE324QS111
22533A971TE324VS111
22535T973CS325PS111
22536C974TS325FS111
22550C988TP330SS1 - RBD22
22574T1012CF338LS1 - RBD11
22608C1046TS349FS1 - RBD11
22616G1054TA352SS1 - RBD77
22661G1099TV367FS1 - RBD11
22673T1111CS371PS1 - RBD33
22679T1117CS373PS1 - RBD11
22712C1150TP384SS1 - RBD11
22716C1154TT385IS1 - RBD33
22785G1223CR408TS1 - RBD11
22793G1231TA411SS1 - RBD11
22818C1256TA419VS1 - RBD11
22895G1333TV445FS1 - RBD11
22899G1337TG446VS1 - RBD22
22928T1366CF456LS1 - RBD11
23001G1439TC480FS1 - RBD11
23012G1450CE484QS1 - RBD11
23046A1484CY495SS1 - RBD11
23111C1549TL517FS1 - RBD11
23120G1558TA520SS1 - RBD167
23121C1559TA520VS1 - RBD11
23127C1565TA522VS1 - RBD112
23145A1583GK528RS1 - RBD22
23149G1587TK529NS111
23170C1608AN536KS111
23202C1640AT547KS122
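23202 | C1640T | T547I | S1 | 1 | 1
23223 | A1661T | E554V | S1 | 2 | 2
23224 | G1662T | E554D | S1 | 4 | 31 | 35
23270 | G1708T | A570S | S1 | 3 | 3
23277 | C1715T | T572I | S1 | 5 | 5 | 10
23282 | G1720T | D574Y | S1 | 1 | 1
23292 | G1730T | R577L | S1 | 1 | 1
23311 | G1749T | E583D | S1 | 6 | 6
23312 | A1750G | I584V | S1 | 1 | 1
23315 | C1753T | L585F | S1 | 1 | 7 | 8
23349 | G1787A | S596N | S1 | 1 | 1
23373 | C1811T | T604I | S1 | 2 | 2
23380 | C1818A | N606K | S1 | 2 | 2
23403 | A1841G | D614G | S1 | 841 | 4054 | 4895
23426 | G1864T | V622F | S1 | 2 | 2
23426 | G1864C | V622L | S1 | 2 | 2
23435 | C1873T | H625Y | S1 | 1 | 1
23439 | C1877T | A626V | S1 | 1 | 1
23444 | C1882G | Q628E | S1 | 7 | 7
23453 | C1891T | P631S | S1 | 1 | 1
23457 | C1895T | T632I | S1 | 1 | 1
23481 | C1919T | S640F | S1 | 1 | 42 | 43
23486 | G1924T | V642F | S1 | 1 | 1
23502 | C1940T | A647V | S1 | 1 | 1
23536 | C1974A | N658K | S1 | 4 | 4
23564 | G2002T | A668S | S1 | 1 | 1
23586 | A2024G | Q675R | S1 | 14 | 14
23587 | G2025C | Q675H | S1 | 1 | 1
23587 | G2025T | Q675H | S1 | 4 | 4
23589 | C2027T | T676I | S1 | 1 | 2 | 3
23593 | G2031T | Q677H | S1 | 1 | 1 | 2
23595 | C2033T | T678I | S1 | 1 | 1
23624 | G2062T | A688S | S2 | 4 | 4
23625 | C2063T | A688V | S2 | 16 | 16
23655 | C2093T | S698L | S2 | 1 | 1
23664 | C2102T | A701V | S2 | 21 | 21
23670 | A2108G | N703S | S2 | 1 | 1
23679 | C2117T | A706V | S2 | 1 | 1
23684 | T2122C | S708P | S2 | 1 | 1
23709 | C2147T | T716I | S2 | 1 | 1
23718 | C2156T | T719I | S2 | 1 | 1
23745 | C2183T | P728L | S2 | 1 | 1
23755 | G2193T | M731I | S2 | 3 | 1 | 4
23798 | T2236C | S746P | S2 | 1 | 1
23802 | C2240T | T747I | S2 | 1 | 1
23804 | G2242A | E748K | S2 | 1 | 1
23832 | G2270T | G757V | S2 | 1 | 1
23856 | G2294T | R765L | S2 | 1 | 1
23868 | G2306T | G769V | S2 | 3 | 3
23873 | G2311T | A771S | S2 | 8 | 8
23877 | T2315C | V772A | S2 | 1 | 1
23895 | C2333T | T778I | S2 | 1 | 1
23900 | G2338C | E780Q | S2 | 1 | 1
23936 | C2374T | P792S | S2 | 1 | 1
23948 | G2386T | D796Y | S2 | 2 | 2
23955 | G2393T | G798V | S2 | 1 | 1
23987 | C2425T | P809S | S2 | 2 | 2
23988 | C2426T | P809L | S2 | 1 | 1
23997 | C2435T | P812L | S2 | 1 | 1
24003 | A2441G | K814R | S2 | 1 | 1
24014 | A2452G | I818V | S2 - FP | 5 | 5
24026 | C2464T | L822F | S2 - FP | 97 | 97
24041 | A2479T | T827S | S2 - FP | 4 | 4
24077 | G2515T | D839Y | S2 | 2 | 2
24089 | G2527A | D843N | S2 | 1 | 1 | 2
24095 | G2533T | A845S | S2 | 5 | 5
24099 | C2537T | A846V | S2 | 1 | 1
24129 | A2567G | N856S | S2 | 7 | 7
24138 | C2576T | T859I | S2 | 5 | 5
24141 | T2579C | V860A | S2 | 1 | 1
24170 | A2608G | I870V | S2 | 3 | 3
24188 | G2626T | A876S | S2 | 1 | 1
24197 | G2635T | A879S | S2 | 31 | 31
24198 | C2636T | A879V | S2 | 1 | 1
24212 | T2650G | S884A | S2 | 11 | 11
24237 | C2675T | A892V | S2 | 1 | 1
24240 | C2678T | A893V | S2 | 1 | 1
24268 | G2706T | M902I | S2 | 1 | 1
24287 | A2725G | I909V | S2 - HR1 | 2 | 2
24314 | G2752C | E918Q | S2 - HR1 | 1 | 1
24328 | G2766C | L922F | S2 - HR1 | 2 | 2
24348 | G2786T | S929I | S2 - HR1 | 1 | 1
24356 | G2794T | G932C | S2 - HR1 | 1 | 1
24357 | G2795T | G932V | S2 - HR1 | 1 | 1
24368 | G2806A | D936N | S2 - HR1 | 3 | 3
24368 | G2806C | D936H | S2 - HR1 | 1 | 1
24368 | G2806T | D936Y | S2 - HR1 | 3 | 4 | 7
24374 | C2812T | L938F | S2 - HR1 | 3 | 3
24378 | C2816T | S939F | S2 - HR1 | 4 | 4
24380 | T2818G | S940A | S2 - HR1 | 5 | 5
24389 | A2827G | S943G | S2 - HR1 | 6 | 6
24463 | C2901A | S967R | S2 - HR1 | 2 | 2
24507 | C2945T | S982L | S2 - HR1 | 1 | 1
24579 | C3017T | T1006I | S2 - CH | 1 | 1
24588 | C3026G | T1009S | S2 - CH | 1 | 1
24621 | C3059T | A1020V | S2 - CH | 1 | 1
24638 | G3076T | A1026S | S2 - CH | 2 | 2
24642 | C3080T | T1027I | S2 - CH | 5 | 5
24649 | G3087T | M1029I | S2 - CH | 1 | 1
24710 | A3148T | M1050L | S2 | 1 | 1
24710 | A3148G | M1050V | S2 | 1 | 1 | 2
24712 | G3150T | M1050I | S2 | 2 | 2
24718 | C3156A | F1052L | S2 | 1 | 166 | 167
24770 | G3208T | A1070S | S2 | 2 | 2
24794 | G3232T | A1078S | S2 - CD | 3 | 2 | 5
24812 | G3250T | D1084Y | S2 - CD | 1 | 29 | 30
24834 | G3272T | R1091L | S2 - CD | 1 | 1
24867 | G3305T | W1102L | S2 - CD | 1 | 1
24872 | G3310T | V1104L | S2 - CD | 1 | 1
24893 | G3331C | E1111Q | S2 - CD | 2 | 2
24897 | C3335T | P1112L | S2 - CD | 2 | 2 | 4
24912 | C3350T | T1117I | S2 - CD | 1 | 1
24923 | T3361C | F1121L | S2 - CD | 2 | 2
24933 | G3371T | G1124V | S2 - CD | 1 | 2 | 3
24959 | G3397T | V1133F | S2 - CD | 1 | 1
24977 | G3415T | D1139Y | S2 - CD | 1 | 1
24986 | C3424A | Q1142K | S2 | 1 | 1
24998 | G3436T | D1146Y | S2 | 4 | 4
24998 | G3436C | D1146H | S2 | 13 | 13
25019 | G3457T | D1153Y | S2 | 11 | 11
25032 | A3470T | K1157M | S2 | 1 | 1
25046 | C3484T | P1162S | S2 | 5 | 5
25047 | C3485T | P1162L | S2 | 3 | 3
25050 | A3488T | D1163V | S2 | 2 | 2
25088 | G3526T | V1176F | S2 | 18 | 18
25101 | A3539G | Q1180R | S2 | 1 | 1
25104 | A3542G | K1181R | S2 | 4 | 4
25116 | G3554A | R1185H | S2 | 2 | 2
25121 | A3559T | N1187Y | S2 | 1 | 1
25135 | G3573T | K1191N | S2 | 1 | 1
25137 | A3575C | N1192T | S2 | 1 | 1
25158 | A3596C | D1199A | S2 | 1 | 1
25160 | C3598T | L1200F | S2 | 1 | 1
25163 | C3601A | Q1201K | S2 | 1 | 1
25169 | C3607T | L1203F | S2 | 1 | 1
25183 | G3621T | E1207D | S2 | 1 | 1
25186 | G3624T | Q1208H | S2 | 1 | 1
25217 | G3655T | G1219C | S2 | 1 | 3 | 4
25234 | G3672T | L1224F | S2 | 1 | 1
25241 | A3679G | I1227V | S2 | 1 | 1
25244 | G3682T | V1228L | S2 | 2 | 2
25249 | G3687T | M1229I | S2 | 1 | 1
25249 | G3687C | M1229I | S2 | 2 | 2
25250 | G3688A | V1230M | S2 | 1 | 1
25266 | G3704T | C1235F | S2 | 4 | 4
25273 | G3711T | M1237I | S2 | 2 | 2
25284 | G3722T | C1241F | S2 | 1 | 1
25287 | G3725T | S1242I | S2 | 4 | 4
25297 | G3735T | K1245N | S2 | 1 | 1
25301 | T3739G | C1247G | S2 | 1 | 1
25302 | G3740T | C1247F | S2 | 4 | 4
25305 | G3743T | C1248F | S2 | 2 | 2
25317 | C3755T | S1252F | S2 | 1 | 1
25340 | G3778T | D1260Y | S2 | 2 | 2
25350 | C3788T | P1263L | S2 | 1 | 2 | 3
25352 | G3790T | V1264L | S2 | 1 | 1
25365 | T3803C | V1268A | S2 | 1 | 1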

The domain region of RBD is based on structural information found in Cai et al. 2020 (98).

Forty-nine of these amino acid replacements (V11A, T51A, W64C, I119T, E156Q, S205A, D228G, L229W, P230T, N234D, I235T, T274A, A288V, E324Q, E324V, S325P, S349F, S371P, S373P, T385I, A419V, C480F, Y495S, L517F, K528R, Q628E, T632I, S708P, T719I, P728L, S746P, E748K, G757V, V772A, K814R, D843N, S884A, M902I, I909V, E918Q, S982L, M1029I, Q1142K, K1157M, Q1180R, D1199A, C1241F, C1247G, and V1268A) were not represented in a publicly available database (34) as of August 19, 2020.
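
The genomic locus, gene locus, and amino acid site reported for each spike replacement in Table 2 are related by simple arithmetic. The following Python sketch illustrates the conversion. The offset of 21,562 is inferred from the table rows themselves (for example, genomic locus 23403 and gene position 1841), and the function names are hypothetical and not part of the study pipeline.

```python
# Offset between genome coordinates and S gene coordinates, inferred from
# Table 2 (genomic locus 23403 corresponds to gene position 1841).
S_GENE_OFFSET = 21562


def gene_position(genomic_locus: int) -> int:
    """Convert a genomic locus to a 1-based position within the S gene."""
    return genomic_locus - S_GENE_OFFSET


def codon_number(gene_pos: int) -> int:
    """Return the 1-based amino acid site whose codon contains gene_pos."""
    return (gene_pos - 1) // 3 + 1


pos = gene_position(23403)   # A1841G
site = codon_number(pos)     # D614G
print(pos, site)
```

Applied to the row for genomic locus 23403, the sketch returns gene position 1841 and amino acid site 614, in agreement with the D614G entry.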

FIG 5

Location of amino acid replacements in spike protein among the 5,085 genomes of SARS-CoV-2 sequenced. The various spike protein domains are color-coded. The numbers refer to amino acid sites. Note that many amino acid sites have multiple variants identified.

We mapped the location of amino acid replacements onto a model of the full-length spike protein (35, 65) and observed that the substitutions are found in each subunit and domain of the spike (Figure 6). However, the distribution of amino acid changes is not uniform throughout the protein regions. For example, compared to some other regions of the spike protein, the RBD has relatively few amino acid changes, and the frequency of strains with these substitutions is low, each occurring in fewer than 10 isolates. This finding is consistent with the functional constraints on RBD to mediate interaction with ACE2. In contrast, the periphery of the S1 subunit NTD contains a dense cluster of substituted residues, with some single amino acid replacements found in 10–20 isolates (Table 2, Figure 5, Figure 6). Clustering of amino acid changes in a distinct region of the spike protein may be a signal of positive selection. Inasmuch as infected patients make antibodies against the NTD, we favor the idea that host immune selection is one force contributing to some of the amino acid variation in this region. One NTD substitution, H49Y, was found in 142 isolates. This position is not well exposed on the surface of the NTD and is likely not a result of immune pressure. The same is true for another highly represented substitution, F1052L. This substitution was observed in 167 isolates, and F1052 is buried within the core of the S2 subunit. The substitution observed most frequently in the spike protein in our sample is D614G, a change observed in 4,895 of the isolates. As noted above, strains with the Gly614 variant significantly increased in wave 2 compared to wave 1.
FIG 6

Location of amino acid substitutions mapped on the SARS-CoV-2 spike protein. Model of the SARS-CoV-2 spike protein with one protomer shown as ribbons and the other two protomers shown as a molecular surface. The Cα atom of residues found to be substituted in one or more virus isolates identified in this study is shown as a sphere on the ribbon representation. Residues found to be substituted in 1–9 isolates are colored tan, 10–99 isolates yellow, 100–999 isolates red (H49Y and F1052L), and >1000 isolates purple (D614G). The surface of the amino-terminal domain (NTD) that is distal to the trimeric axis has a high density of substituted residues. RBD, receptor binding domain.

As observed with RdRp, the majority of strains with each single amino acid change in the spike protein were found on a distinct phylogenetic lineage (Figure S5), indicating identity by descent. A prominent exception is the Leu5Phe replacement that is present in all major clades, suggesting that this amino acid change arose multiple times independently or very early in the course of SARS-CoV-2 evolution. Finally, we note that examination of the phylogenetic distribution of strains with multiple distinct amino acid replacements at the same site (e.g., Arg21Ile/Lys/Thr, Ala27Ser/Thr/Val, etc.) revealed that they were commonly found in different genetic branches, consistent with independent origin (Figure S5).
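
Sites with more than one distinct replacement can be tallied directly from the amino acid change labels. The sketch below is illustrative only: the function name is hypothetical, and the input is a small subset of the Table 2 labels. It identifies candidate sites but does not address their phylogenetic placement, which requires the tree.

```python
import re
from collections import defaultdict


def group_by_site(changes):
    """Map each amino acid site to the set of replacement residues seen there."""
    sites = defaultdict(set)
    for change in changes:
        # Labels have the form <reference residue><site><new residue>, e.g. D936N.
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
        if match is None:
            raise ValueError(f"unrecognized amino acid change: {change}")
        _, site, new = match.groups()
        sites[int(site)].add(new)
    return sites


# A few labels taken from Table 2.
changes = ["D936N", "D936H", "D936Y", "M1050L", "M1050V", "M1050I", "D614G"]
multi = {site: sorted(alts)
         for site, alts in group_by_site(changes).items()
         if len(alts) > 1}
print(multi)
```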

Cycle threshold (Ct) comparison of SARS-CoV-2 strains with either the Asp614 or Gly614 amino acid replacements in spike protein.

It has been reported that patients infected with strains having spike protein Gly614 variant have, on average, higher virus loads on initial diagnosis (66–70). To determine if this is the case in Houston strains, we examined the cycle threshold (Ct) for every sequenced strain that was detected from a patient specimen using the SARS-CoV-2 Assay done by the Hologic Panther instrument. We identified a significant difference (p<0.0001) between the mean Ct value for strains with an Asp614 (n=102) or Gly614 (n=812) variant of the spike protein (Figure 7). Strains with Gly614 had a Ct value significantly lower than strains with the Asp614 variant, indicating that patients infected with the Gly614 strains had, on average, higher virus loads on initial diagnosis than patients infected by strains with the Asp614 variant (Figure 7). This observation is consistent with the conjecture that, on average, strains with the Gly614 variant are better able to disseminate.
FIG 7

Cycle threshold (Ct) for every SARS-CoV-2 patient sample tested using the Hologic Panther assay. Data are presented as mean ± standard error of the mean for strains with an aspartate (D614, n=102 strains, blue) or glycine (G614, n=812 strains, red) at amino acid 614 of the spike protein. Mann-Whitney test, *P<0.0001.

Characterization of recombinant proteins with single amino acid replacements in the receptor binding domain region of spike protein.

The RBD of spike protein binds the ACE2 surface receptor and is also targeted by neutralizing antibodies (36, 37, 41, 43–46, 48–62, 71). Thus, single amino acid replacements in this domain may have functional consequences that enhance virus fitness. To begin to test this idea, we expressed spike variants with the Asp614Gly replacement and 13 clinical RBD variants identified in our genome sequencing studies (Figure 8, Table S4A, B). All RBD variants were cloned into an engineered spike protein construct that stabilizes the prefusion state and increases overall expression yield (spike-6P, here referred to as spike) (64).
FIG 8

Biochemical characterization of spike RBD variants. (A) Size-exclusion chromatography (SEC) traces of the indicated spike-RBD variants. Dashed line indicates the elution peak of spike-6P. (B) The relative expression of all RBD variants as determined by the area under the SEC traces. All expression levels are normalized relative to spike-6P. (C) Thermostability analysis of RBD variants by differential scanning fluorimetry. Each sample had three replicates and only mean values were plotted. Black vertical dashed line indicates the first melting temperature of 6P-D614G and orange vertical dashed line indicates the first melting temperature of the least stable variant (spike-G446V). (D) First apparent melting temperature of all RBD variants. (E) ELISA-based binding affinities for ACE2 and (F) the neutralizing antibody CR3022 to the indicated RBD variants. (G) Summary of EC50s for all measured RBD variants.

We first assessed the biophysical properties of spike-Asp614Gly, an amino acid polymorphism that is common globally and increased significantly in our wave 2 strain isolates. Pseudotyped viruses expressing spike-Gly614 have higher infectivity for host cells in vitro than spike-Asp614 (66, 67, 69, 72, 73). The higher infectivity of spike-Gly614 is correlated with increased stability and incorporation of the spike protein into the pseudovirion (73). We observed a higher expression level (Figure 8A, B) and increased thermostability for the spike protein construct containing this variant (Figure 8C, D). The size exclusion chromatography (SEC) elution profile of spike-Asp614 was indistinguishable from spike-Gly614, consistent with a trimeric conformation (Figure 8A). These results are broadly consistent with higher-resolution structural analyses of both spike variants. Next, we purified and biophysically characterized 13 RBD mutants that each contain Gly614 and one additional single amino acid replacement we identified by genome sequencing our clinical samples (Table S4C). All variants eluted as trimers, indicating that the global structure remained intact (Figure 8 and Figure S6). However, several variants had reduced expression levels and virtually all had decreased thermostability relative to the variant that had only a D614G single amino acid replacement (Figure 8D). The A419V and A522V mutations were especially deleterious, reducing yield and precluding further downstream analysis (Figure 8B). We next assayed the affinity of the 11 highest-expressing spike variants for ACE2 and the neutralizing monoclonal antibody CR3022 via enzyme-linked immunosorbent assays (ELISAs) (Figure 8E–G and Table S4C). Most variants retained high affinity for the ACE2 surface receptor. However, importantly, three RBD variants (F338L, S373P, and R408T) had substantially reduced affinity for CR3022, a monoclonal antibody that disrupts the spike protein homotrimerization interface (63, 74). Notably, the S373P mutation is one amino acid away from the epitope recognized by CR3022. These results are consistent with the interpretation that some RBD mutants arising in COVID-19 patients may have increased ability to escape humoral immune pressure, but otherwise retain strong ACE2 binding affinity.

DISCUSSION

In this work we analyzed the molecular population genomics, sociodemographic, and medical features of two waves of COVID-19 disease occurring in metropolitan Houston, Texas, between early March and early July 2020. We also studied the biophysical and immunologic properties of some naturally occurring single amino acid changes in the spike protein RBD identified by sequencing the 5,085 genomes. We discovered that the first COVID-19 wave was caused by a heterogenous array of virus genotypes assigned to several different clades. The majority of cases in the first wave are related to strains that caused widespread disease in European and Asian countries, as well as other localities. We conclude that the SARS-CoV-2 virus was introduced into Houston many times independently, likely by individuals who had traveled to or from different parts of the world, including other communities in the United States. In support of this conclusion, the first cases in metropolitan Houston were associated with a travel history to a known COVID-19 region (16). The data are consistent with the fact that Houston is a large international city characterized by a multi-ethnic population and is a prominent transport hub with direct flights to major cities globally. The second wave of COVID-19 cases also is characterized by SARS-CoV-2 strains with diverse genotypes. Virtually all cases in the second and ongoing disease wave were caused by strains with the Gly614 variant of spike protein (Figure 1B). Our data unambiguously demonstrate that strains with the Gly614 variant increased significantly in frequency in wave 2 relative to wave 1 in the Houston metropolitan region. This shift occurred very rapidly in a matter of just a few months. Amino acid residue Asp614 is located in subdomain 2 (SD-2) of the spike protein and forms a hydrogen bond and electrostatic interaction with two residues in the S2 subunit of a neighboring protomer. 
Replacement of aspartate with glycine would eliminate both interactions, thereby substantively weakening the contact between the S1 and S2 subunits. We previously speculated (75) that this weakening produces a more fusogenic spike protein, as S1 must first dissociate from S2 before S2 can refold and mediate fusion of virus and cell membranes. Stated another way, virus strains with the Gly614 variant may be better able to enter host cells, potentially resulting in enhanced spread. Consistent with this idea, Korber et al. (66) showed that the Gly614 variant grows to higher titer as pseudotyped virions. On initial diagnosis, infected individuals had lower RT-PCR cycle thresholds, suggesting higher upper respiratory tract viral loads. Our data (Figure 7) are fully consistent with that finding. Zhang et al. (73) reported that pseudovirus with the 614Gly variant infected ACE2-expressing cells more efficiently than pseudovirus with the 614Asp variant. Similar results have been described by Hu et al. (67) and Lorenzo-Redondo et al. (68). Plante et al. (76) recently studied isogenic mutant SARS-CoV-2 strains with either the 614Asp or 614Gly variant and found that the 614Gly variant virus had significantly increased replication in human lung epithelial cells in vitro and increased infectious titers in nasal and trachea washes obtained from experimentally infected hamsters. These results are consistent with the idea that the 614Gly variant bestows increased virus fitness in the upper respiratory tract (76). Additional work is needed to investigate the potential biomedical relevance and public health importance of the Asp614Gly polymorphism, including but not limited to virus dissemination, overall fitness, impact on clinical course and virulence, and development of vaccines and therapeutics. 
Although it is possible that stochastic processes alone may account for the rapid increase in COVID-19 disease frequency caused by viruses containing the Gly614 variant, we do not favor that interpretation in part because of the cumulative weight of the epidemiologic, human RT-PCR diagnostics data, in vitro experimental findings, and animal infection studies using isogenic mutant virus strains. In addition, if stochastic processes solely are responsible, we believe it is difficult to explain the essentially simultaneous increase in frequency of the Gly614 variant in genetically diverse viruses in three distinct clades (G, GH, and GR) in a geographically large metropolitan area with 7 million ethnically diverse people. Regardless, more research on this important topic is warranted.

The diversity present in our 1,026 virus genomes from the first disease wave contrasts somewhat with data reported by Gonzalez-Reiche et al., who studied 84 SARS-CoV-2 isolates causing disease in patients in the New York City region (11). Those investigators concluded that the vast majority of disease was caused by progeny of strains imported from Europe. Similarly, Bedford et al. (10) reported that much of the COVID-19 disease in the Seattle, Washington area was caused by strains that are progeny of a virus strain recently introduced from China. Some aspects of our findings are similar to those reported recently by Lemieux et al. based on analysis of strains causing disease in the Boston area (81). Our findings, like theirs, highlight the importance of multiple importation events of genetically diverse strains in the epidemiology of COVID-19 disease in this pandemic. Similarly, Icelandic and Brazilian investigators documented that SARS-CoV-2 was imported by individuals traveling to or from many European and other countries (82, 83). 
The virus genome diversity and large sample size in our study permitted us to test the hypothesis that distinct virus clades were nonrandomly associated with hospitalized COVID-19 patients or disease severity. We did not find evidence to support this hypothesis, but our continuing study of COVID-19 cases accruing in the second wave will further improve statistical stratification. We used machine learning classifiers to identify if any SNPs contribute to increased infection severity or otherwise affect virus-host outcome. The models could not be trained to accurately predict these outcomes from the available virus genome sequence data. This may be due to sample size or class imbalance. However, we do not favor this interpretation. Rather, we think that the inability to identify particular virus SNPs predictive of disease severity or infection outcome likely reflects the substantial heterogeneity in underlying medical conditions and treatment regimens among COVID-19 patients studied herein. An alternative but not mutually exclusive hypothesis is that patient genotypes play an important role in determining virus-human interactions and resulting pathology. Although some evidence has been presented in support of this idea (84, 85), available data suggest that in the aggregate, host genetics does not play an overwhelming role in determining outcome in the great majority of adult patients, once virus infection is established. Remdesivir is a nucleoside analog reported to have activity against MERS-CoV, a coronavirus related to SARS-CoV-2. Recently, several studies have reported that remdesivir shows promise in treating COVID-19 patients (29–33), leading the FDA to issue an emergency use authorization. Because in vitro resistance of SARS-CoV to remdesivir has been reported to be caused by either of two amino acid replacements in RdRp (Phe479Leu or Val556Leu), we interrogated our data for polymorphisms in the nsp12 gene. 
Although we identified 140 different inferred amino acid replacements in RdRp in the 5,085 genomes analyzed, none of these were located precisely at the two positions associated with in vitro resistance to remdesivir. Inasmuch as remdesivir is now being deployed widely to treat COVID-19 patients in Houston and elsewhere, our findings suggest that the majority of SARS-CoV-2 strains currently circulating in our region should be susceptible to this drug. The amino acid replacements Ala442Val, Ala448Val, Ala553Pro/Val, and Gly682Arg that we identified occur at sites that, intriguingly, are located directly above the nucleotide substrate entry channel and nucleotide binding residues Lys544, Arg552, and Arg554 (22, 23) (Figure 4). One possibility is that substitution of the smaller alanine or glycine residues with the bulkier side chains of Val/Pro/Arg may impose structural constraints for the modified nucleotide analog to bind, and thereby disfavor remdesivir binding. This, in turn, may lead to reduced incorporation of remdesivir into the nascent RNA, increased fidelity of RNA synthesis, and ultimately drug resistance. A similar mechanism has been proposed for a Val556Leu change (23). We also identified one strain with a Lys477Asn replacement in RdRp. This substitution is located close to a Phe479Leu replacement reported to produce partial resistance to remdesivir in vitro in SARS-CoV patients from 2004, although the amino acid positions are numbered differently in SARS-CoV and SARS-CoV-2. Structural studies have suggested that this amino acid is surface-exposed, and distant from known key functional elements. Our observed Lys477Asn change is also located in a conserved motif described as a finger domain of RdRp (Figure 3 and 4). One speculative possibility is that Lys477 is involved in binding a yet unidentified cofactor such as Nsp7 or Nsp8, an interaction that could modify nucleotide binding and/or fidelity at a distance. 
These data warrant additional study in larger patient cohorts, especially in individuals treated with remdesivir.

Analysis of the gene encoding the spike protein identified 285 polymorphic amino acid sites relative to the reference genome, including 49 inferred amino acid replacements not present in available databases as of August 19, 2020. Importantly, 30 amino acid sites in the spike protein had two or three distinct replacements relative to the reference strain. The occurrence of multiple variants at the same amino acid site is one characteristic that may suggest functional consequences. These data, coupled with structural information available for spike protein, raise the possibility that some of the amino acid variants have functional consequences, including, for example, altered serologic reactivity, as shown here. These data permit generation of many biomedically relevant hypotheses now under study.

A recent study reported that RBD amino acid changes could be selected in vitro using a pseudovirus neutralization assay and sera obtained from convalescent plasma or monoclonal antibodies (86). The amino acid sites included positions V445 and E484 in the RBD. Importantly, variants G446V and E484Q were present in our patient samples. However, these mutations retain high affinity to CR3022 (Figure 8F, G). The high-resolution structure of the RBD/CR3022 complex shows that CR3022 makes contacts to residues 369–386, 380–392, and 427–430 of RBD (74). Although there is no overlap between CR3022 and ACE2 epitopes, CR3022 is able to neutralize the virus through an allosteric effect. We found that the Ser373Pro change, which is located within the CR3022 epitope, has reduced affinity to CR3022 (Figure 8F, G). The F338L and R408T mutations, although not found directly within the interacting epitope, also display reduced binding to CR3022. 
Other investigators (86) using in vitro antibody selection identified a change at amino acid site S151 in the N-terminal domain, and we found mutations S151N and S151I in our patient samples. We also note that two variant amino acids (Gly446Val and Phe456Leu) we identified are located in a linear epitope found to be critical for a neutralizing monoclonal antibody described recently by Li et al. (87). In the aggregate, these findings suggest that mutations emerging within the spike protein at positions within and proximal to known neutralization epitopes may result in escape from antibodies and other therapeutics currently under development. Importantly, our study did not reveal that these mutant strains had disproportionately increased over time. The findings may also bear on the occurrence of multiple amino acid substitutions at the same amino acid site that we identified in this study, commonly a signal of selection. In the aggregate, the data support a multifaceted approach to serological monitoring and biologics development, including the use of monoclonal antibody cocktails (46, 47, 88).

CONCLUDING STATEMENT

Our work represents analysis of the largest sample to date of SARS-CoV-2 genome sequences from patients in one metropolitan region in the United States. The investigation was facilitated by the fact that we had rapidly assessed a SARS-CoV-2 molecular diagnostic test in January 2020, more than a month before the first COVID-19 patient was diagnosed in Houston. In addition, our large healthcare system has seven hospitals and many facilities (e.g., outpatient care centers, emergency departments) located in geographically diverse areas of the city. We also provide reference laboratory services for other healthcare entities in the Houston area. Together, our facilities serve patients of diverse ethnicities and socioeconomic status. Thus, the data presented here likely reflect a broad overview of virus diversity causing COVID-19 infections throughout metropolitan Houston. We previously exploited these features to study influenza and Klebsiella pneumoniae dissemination in metropolitan Houston (89, 90). We acknowledge that every “twig” of the SARS-CoV-2 evolutionary tree in Houston is not represented in these data. The samples studied are not comprehensive for the entire metropolitan region. For example, it is possible that our strain samples are not fully representative of individuals who are indigent, homeless, or of very low socioeconomic groups. In addition, although the strain sample size is relatively large compared to other studies, the sample represents only about 10% of all COVID-19 cases in metropolitan Houston documented in the study period. In addition, some patient samples contain relatively small amounts of virus nucleic acid and do not yield adequate sequence data for high-quality genome analysis. Thus, our data likely underestimate the extent of genome diversity present among SARS-CoV-2 causing COVID-19 and will not identify all amino acid replacements in the virus in this geographic region. 
It will be important to sequence and analyze the genomes of additional SARS-CoV-2 strains causing COVID-19 cases in the ongoing second massive disease wave in metropolitan Houston, and these studies are underway. Data of this type will be especially important if a third wave or subsequent waves were to occur in metropolitan Houston, because the data could provide insight into the molecular and epidemiologic events contributing to them. The genomes reported here are an important data resource that will underpin our ongoing study of SARS-CoV-2 molecular evolution, dissemination, and medical features of COVID-19 in Houston. As of August 19, 2020, there were 135,866 reported cases of COVID-19 in metropolitan Houston, and the number of cases is increasing daily. Although the full array of factors contributing to the massive second wave in Houston is not known, it is possible that the potential for increased transmissibility of SARS-CoV-2 with the Gly614 variant may have played a role, as well as changes in behavior associated with the Memorial Day and July 4th holidays, and relaxation of some of the social constraints imposed during the first wave. The availability of extensive virus genome data dating from the earliest reported cases of COVID-19 in metropolitan Houston, coupled with the database we have now constructed, may provide critical insights into the origin of new infection spikes and waves occurring as public health constraints are further relaxed, schools and colleges re-open, holidays occur, commercial air travel increases, and individuals change their behavior because of “COVID-19 fatigue.” The genome data will also be useful in assessing ongoing molecular evolution in spike and other proteins as baseline herd immunity is generated, either by natural exposure to SARS-CoV-2 or by vaccination. The signal of potential selection contributing to some spike protein diversity and identification of naturally occurring mutant RBD variants with altered serologic recognition warrant close attention and expanded study.

MATERIALS AND METHODS

Patient specimens.

All specimens were obtained from individuals who were registered patients at Houston Methodist hospitals, associated facilities (e.g., urgent care centers), or institutions in the greater Houston metropolitan region that use our laboratory services. Virtually all individuals met the criteria specified by the Centers for Disease Control and Prevention to be classified as a person under investigation.

SARS-CoV-2 molecular diagnostic testing.

Specimens obtained from symptomatic patients with a high degree of suspicion for COVID-19 disease were tested in the Molecular Diagnostics Laboratory at Houston Methodist Hospital using an assay granted Emergency Use Authorization (EUA) from the FDA (https://www.fda.gov/medical-devices/emergency-situations-medical-devices/faqs-diagnostic-testing-sars-cov-2#offeringtests). Multiple testing platforms were used, including an assay that follows the protocol published by the WHO (https://www.who.int/docs/default-source/coronaviruse/protocol-v2-1.pdf) using the EZ1 virus extraction kit and EZ1 Advanced XL instrument or QIASymphony DSP Virus kit and QIASymphony instrument for nucleic acid extraction and ABI 7500 Fast Dx instrument with 7500 SDS software for reverse transcription RT-PCR, the COVID-19 test using BioFire Film Array 2.0 instruments, the Xpert Xpress SARS-CoV-2 test using Cepheid GeneXpert Infinity or Cepheid GeneXpert Xpress IV instruments, the SARS-CoV-2 Assay using the Hologic Panther instrument, and the Aptima SARS-CoV-2 Assay using the Hologic Panther Fusion system. All assays were performed according to the manufacturer’s instructions. Testing was performed on material obtained from nasopharyngeal or oropharyngeal swabs immersed in universal transport media (UTM), bronchoalveolar lavage fluid, or sputum treated with dithiothreitol (DTT). To standardize specimen collection, an instructional video was created for Houston Methodist healthcare workers (https://vimeo.com/396996468/2228335d56).

Epidemiologic curve.

The number of confirmed COVID-19 positive cases was obtained from USAFacts.org (https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/) for Austin, Brazoria, Chambers, Fort Bend, Galveston, Harris, Liberty, Montgomery, and Waller counties. Positive cases for Houston Methodist Hospital patients were obtained from our Laboratory Information System and plotted using the documented collection time.

SARS-CoV-2 genome sequencing.

Libraries for whole virus genome sequencing were prepared according to version 1 or version 3 of the ARTIC nCoV-2019 sequencing protocol (https://artic.network/ncov-2019). Long reads were generated with the LSK-109 sequencing kit, 24 native barcodes (NBD104 and NBD114 kits), and a GridION instrument (Oxford Nanopore). Short reads were generated with the NexteraXT kit and NextSeq 550 instrument (Illumina).

SARS-CoV-2 genome sequence analysis.

Consensus virus genome sequences from the Houston area isolates were generated using the ARTIC nCoV-2019 bioinformatics pipeline. Publicly available genomes and metadata were acquired through GISAID on August 19, 2020. GISAID sequences containing greater than 1% N characters, and Houston sequences with greater than 5% N characters were removed from consideration. Identical GISAID sequences originating from the same geographic location with the same collection date were also removed from consideration to reduce redundancy. Nucleotide sequence alignments for the combined Houston and GISAID strains were generated using MAFFT version 7.130b with default parameters (91). Sequences were manually curated in JalView (92) to trim the ends and to remove sequences containing spurious inserts. Phylogenetic trees were generated using FastTree with the generalized time-reversible model for nucleotide sequences (93). CLC Genomics Workbench (QIAGEN) was used to generate the phylogenetic tree figures.
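
The quality and redundancy filters described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the record keys (`sequence`, `location`, `date`), the source labels, and the function names are hypothetical, while the 1% and 5% thresholds come from the text.

```python
def fraction_n(sequence: str) -> float:
    """Fraction of characters that are N (case-insensitive)."""
    if not sequence:
        return 1.0  # treat an empty sequence as uninformative so it fails the filter
    return sequence.upper().count("N") / len(sequence)


def passes_n_filter(sequence: str, source: str) -> bool:
    """Sequences with >1% N (GISAID) or >5% N (any other source, i.e. Houston) fail."""
    limit = 0.01 if source == "GISAID" else 0.05
    return fraction_n(sequence) <= limit


def drop_redundant(records):
    """Keep the first record for each (sequence, location, collection date)."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec["sequence"], rec["location"], rec["date"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept
```

In this sketch a sequence with 3% N characters would be removed if it came from GISAID but retained if it came from the Houston set.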

Geospatial mapping.

The home address zip code for all SARS-CoV-2 positive patients was used to generate the geospatial maps. To examine geographic relatedness among genetically similar isolates, geospatial maps were filtered to isolates containing specific amino acid changes.

Time series.

Geospatial data were filtered into wave 1 (3/5/2020–5/11/2020) and wave 2 (5/12/2020–7/7/2020) time intervals to illustrate the spread of confirmed SARS-CoV-2 positive patients identified over time.
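
The wave assignment amounts to a date-range lookup. A minimal Python sketch follows, using the interval boundaries given above; the function and constant names are hypothetical.

```python
from datetime import date

# Interval boundaries as stated in the text (inclusive on both ends).
WAVE_1 = (date(2020, 3, 5), date(2020, 5, 11))
WAVE_2 = (date(2020, 5, 12), date(2020, 7, 7))


def assign_wave(collected: date):
    """Return 1 or 2 for the wave containing the date, or None if outside both."""
    if WAVE_1[0] <= collected <= WAVE_1[1]:
        return 1
    if WAVE_2[0] <= collected <= WAVE_2[1]:
        return 2
    return None
```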

Machine learning.

Virus genome alignments and patient metadata were used to build models to predict patient metadata and outcomes using both classification models and regression. Metadata considered for prediction in the classification models included age, ABO and Rh blood type, ethnic group, ethnicity, sex, ICU admission, IMU admission, supplemental oxygen use, and ventilator use. Metadata considered for prediction in regression analysis included ICU length of stay, IMU length of stay, total length of stay, supplemental oxygen use, and ventilator use. Because sex, blood type, Rh factor, age, age decade, ethnicity, and ethnic group are features in the patient features and combined feature sets, models were not trained for these labels using patient and combined feature sets. Additionally, age, length of stay, IMU length of stay, ICU length of stay, mechanical ventilation days, and supplemental oxygen days were treated as regression problems and XGBoost regressors were built while the rest were treated as classification problems and XGBoost classifiers were built. Three types of features were considered for training the XGBoost classifiers: alignment features, patient features, and the combination of alignment and patient features. Alignment features were generated from the consensus genome alignment such that columns containing ambiguous nucleotide bases were removed to ensure the models did not learn patterns from areas of low coverage. These alignments were then one-hot encoded to form the alignment features. Patient metadata values were one-hot encoded with the exception of age, which remained as a raw integer value, to create the patient features. These metadata values consisted of age, ABO, Rh blood type, ethnic group, ethnicity, and sex. 
All three types of feature sets were used to train models that predict ICU length of stay, IMU length of stay, overall length of stay, days of supplemental oxygen therapy, and days of ventilator usage, while only alignment features were used to train models that predict age, ABO, Rh blood type, ethnic group, ethnicity, and sex. Ten-fold cross-validation was used to train XGBoost models (94) as described previously (95, 96). Depths of 4, 8, 16, 32, and 64 were used to tune the models, but accuracies plateaued after a depth of 16. scikit-learn's (97) classification report and R2 score were then used to assess overall accuracy of the classification and regression models, respectively.
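
The construction of alignment features can be sketched as follows. This is a minimal illustration in plain Python, not the study's code: the function names, the choice to treat the gap character as its own symbol, and the toy alignment are all assumptions. Any column in which at least one sequence carries a symbol outside the accepted set is dropped before encoding.

```python
UNAMBIGUOUS = "ACGT-"  # assumption: gaps are kept as their own symbol


def drop_ambiguous_columns(alignment):
    """Keep only columns where every sequence has a symbol in UNAMBIGUOUS.

    Returns the filtered sequences and the indices of the retained columns.
    """
    length = len(alignment[0])
    keep = [
        i for i in range(length)
        if all(seq[i] in UNAMBIGUOUS for seq in alignment)
    ]
    filtered = ["".join(seq[i] for i in keep) for seq in alignment]
    return filtered, keep


def one_hot_encode(alignment):
    """Encode each sequence as a flat 0/1 vector, one slot per symbol per column."""
    encoded = []
    for seq in alignment:
        row = []
        for base in seq:
            row.extend(1 if base == symbol else 0 for symbol in UNAMBIGUOUS)
        encoded.append(row)
    return encoded


# Toy alignment for illustration only; the second column contains an
# ambiguous base (N) and is therefore removed.
toy = ["ACGT", "ANGT", "ACGA"]
filtered, kept = drop_ambiguous_columns(toy)
features = one_hot_encode(filtered)
print(kept, len(features[0]))
```

Rows produced this way are the kind of numeric matrix that a gradient-boosted tree model accepts as input; the model training itself is not shown here.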

Patient metadata correlations.

We encoded values into multiple columns for each metadata field for patients if metadata was available. For example, the ABO column was divided into four columns for A, B, AB, and O blood type. Those columns were encoded with a 1 for the patient's ABO type, with all other columns encoded with 0. This was repeated for all non-outcome metadata fields. Age, however, was not re-encoded, as the raw integer values were used. Each column was then correlated to the various outcome values for each patient (deceased, ICU length, IMU length, length of stay, supplemental oxygen length, and ventilator length) to obtain a Pearson correlation coefficient for each metadata label and outcome.
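
The encoding and correlation steps can be illustrated with a short sketch. The helper names and the example values below are hypothetical and serve only to show the calculation; they are not study data.

```python
import math


def one_hot_column(values, category):
    """Return a 0/1 column marking which entries equal the given category."""
    return [1 if v == category else 0 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Returns NaN when either list has zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical blood types and lengths of stay (days), for illustration only.
abo = ["A", "O", "A", "B", "AB", "O"]
length_of_stay = [3, 10, 4, 7, 2, 12]

for blood_type in ("A", "B", "AB", "O"):
    column = one_hot_column(abo, blood_type)
    print(blood_type, round(pearson(column, length_of_stay), 3))
```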

Analysis of the nsp12 polymerase and S protein genes.

The nsp12 virus polymerase and S protein genes were analyzed by plotting SNP density in the consensus alignment using Python (Python v3.4.3, Biopython Package v1.72). The frequency of SNPs in the Houston isolates was assessed, along with amino acid changes for nonsynonymous SNPs.
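
A windowed SNP count of the kind that could underlie such a density plot is sketched below. This is an assumption-laden illustration in plain Python rather than the Biopython-based analysis used in the study: the function names, the non-overlapping window scheme, and the toy sequences are all hypothetical, and positions involving ambiguous bases are simply ignored.

```python
def snp_positions(reference, sequences):
    """Return sorted 0-based positions where any sequence differs from the reference.

    Only unambiguous A/C/G/T differences are counted.
    """
    positions = set()
    for seq in sequences:
        for i, (ref_base, base) in enumerate(zip(reference, seq)):
            if base != ref_base and base in "ACGT" and ref_base in "ACGT":
                positions.add(i)
    return sorted(positions)


def snp_density(positions, length, window):
    """Count SNP positions in consecutive non-overlapping windows."""
    n_windows = (length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return counts


# Toy aligned sequences for illustration only.
reference = "ACGTACGTAC"
isolates = ["ACGTACGTAC", "ATGTACGTAC", "ACGTACGTAT", "ANGTACGTAC"]
positions = snp_positions(reference, isolates)
print(positions, snp_density(positions, len(reference), window=5))
```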

Cycle threshold (Ct) comparison of SARS-CoV-2 strains with either Asp614 or Gly614 amino acid replacements in the spike protein.

The cycle threshold (Ct) for every sequenced strain that was detected from a patient specimen using the SARS-CoV-2 Assay on the Hologic Panther instrument was retrieved from the Houston Methodist Hospital Laboratory Information System. Statistical significance between the mean Ct value for strains with an aspartate (n=102) or glycine (n=812) amino acid at position 614 of the spike protein was determined with the Mann-Whitney test (GraphPad PRISM 8).
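
For readers unfamiliar with the test, the Mann-Whitney U statistic can be computed as in the sketch below. This is not the GraphPad PRISM implementation: it assumes a two-sided p-value from the normal approximation with a tie correction and no continuity correction, so results may differ slightly from software that uses an exact method. The Ct values shown are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values starting at 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, two-sided p-value from the normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    # Tie correction: each group of t tied values contributes t^3 - t.
    tie_counts = {}
    for r in ranks:
        tie_counts[r] = tie_counts.get(r, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value


# Hypothetical Ct values for illustration only; not study data.
ct_asp614 = [27.1, 25.4, 29.0, 26.3]
ct_gly614 = [22.8, 24.0, 21.5, 23.9, 25.0]
print(mann_whitney_u(ct_asp614, ct_gly614))
```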

Creation and characterization of spike protein RBD variants.

Spike RBD variants were cloned into the spike-6P (HexaPro; F817P, A892P, A899P, A942P, K986P, V987P) base construct that also includes the D614G substitution (pIF638). Briefly, a segment of the gene encoding the RBD was excised with EcoRI and NheI, mutagenized by PCR, and assembled with a HiFi DNA Assembly Cloning Kit (NEB). FreeStyle 293-F cells (Thermo Fisher Scientific) were cultured and maintained in a humidified atmosphere of 37°C and 8% CO2 while shaking at 110–125rpm. Cells were transfected with plasmids encoding spike protein variants using polyethylenimine. Three hours post-transfection, 5μM kifunensine was added to each culture. Cells were harvested four days after transfection and the protein-containing supernatant was separated from the cells by two centrifugation steps: 10 min at 500rcf and 20 min at 10,000rcf. Supernatants were kept at 4°C throughout. Clarified supernatant was loaded on a Poly-Prep chromatography column (Bio-Rad) containing Strep-Tactin Superflow resin (IBA), washed with five column volumes (CV) of wash buffer (100mM Tris-HCl pH 8.0, 150mM NaCl, 1mM EDTA), and eluted with four CV of elution buffer (100mM Tris-HCl pH 8.0, 150mM NaCl, 1mM EDTA, 2.5mM d-Desthiobiotin). The eluate was spin-concentrated (Amicon Ultra-15) to 600μL and further purified via size-exclusion chromatography (SEC) using a Superose 6 Increase 10/300 column (G.E.) in SEC buffer (2mM Tris pH 8.0, 200mM NaCl and 0.02% NaN3). Proteins were concentrated to 300μL and stored in SEC buffer. The RBD spike mutants chosen for analysis were all RBD amino acid mutants identified by our genome sequencing study as of June 15, 2020. We note that the exact boundaries of the RBD vary depending on the paper used as reference. We used the boundaries demarcated in Figure 1A of Cai et al. (Science paper, 21 July) (98) that have K528R located at the RBD-CTD1 interface.

Differential scanning fluorimetry.

Recombinant spike proteins were diluted to a final concentration of 0.05mg/mL with 5X SYPRO orange (Sigma) in a 96-well qPCR plate. Continuous fluorescence measurements (λex=465nm, λem=580nm) were collected with a Roche LightCycler 480 II. The temperature was increased from 22°C to 95°C at a rate of 4.4°C/min. We report the first melting transition.
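
One common way to locate a melting transition in such a curve is to find a peak in the first derivative of fluorescence with respect to temperature. The sketch below assumes that approach; the text does not state how the first transition was determined, so the method, the function names, the `min_slope` parameter, and the synthetic curve are all assumptions made for illustration.

```python
import math


def first_derivative(temps, fluorescence):
    """Central finite-difference dF/dT at interior points; returns (temps, slopes)."""
    mids, slopes = [], []
    for i in range(1, len(temps) - 1):
        rise = fluorescence[i + 1] - fluorescence[i - 1]
        run = temps[i + 1] - temps[i - 1]
        slopes.append(rise / run)
        mids.append(temps[i])
    return mids, slopes


def first_melting_transition(temps, fluorescence, min_slope=0.0):
    """Return the temperature of the first local maximum of dF/dT above min_slope.

    Returns None if no such peak exists.
    """
    mids, slopes = first_derivative(temps, fluorescence)
    for i in range(1, len(slopes) - 1):
        is_peak = slopes[i] > slopes[i - 1] and slopes[i] >= slopes[i + 1]
        if slopes[i] > min_slope and is_peak:
            return mids[i]
    return None


# Synthetic single-transition curve centered at 50 degrees C, for illustration only.
temps = list(range(22, 96))
curve = [1.0 / (1.0 + math.exp(-(t - 50) / 2.0)) for t in temps]
print(first_melting_transition(temps, curve))
```

With real, noisy data the curve would normally be smoothed first, and `min_slope` would be set high enough to ignore baseline fluctuations.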

Enzyme-linked immunosorbent assays.

ELISAs were performed to characterize binding of S6P, S6P D614G, and S6P D614G-RBD variants to human ACE2 and the RBD-binding monoclonal antibody CR3022. The ACE2-hFc chimera was obtained from GenScript (Z03484), and the CR3022 antibody was purchased from Abcam (Ab273073). Corning 96-well high-binding plates (CLS9018BC) were coated with spike variants at 2μg/mL overnight at 4°C. After washing four times with phosphate buffered saline + 0.1% Tween20 (PBST; 300μL/well), plates were blocked with PBS+2% milk (PBSM) for 2 h at room temperature and again washed four times with PBST. ACE2-hFc and CR3022 were then serially diluted in PBSM 1:3 seven times in triplicate. After 1 h incubation at room temperature, plates were washed four times in PBST, labeled with 50μL mouse anti-human IgG1 Fc-HRP (SouthernBiotech, 9054–05) for 45 min in PBSM, and washed again in PBST before addition of 50μL 1-step Ultra TMB-ELISA substrate (Thermo Scientific, 34028). Reactions were developed for 15 min and stopped by addition of 50μL 4M H2SO4. Absorbance intensity (450nm) was normalized within a plate and EC50 values were calculated through 4-parameter logistic curve (4PL) analysis using GraphPad PRISM 8.4.3.

Supplemental Table 1 Patient demographics in wave 1 and wave 2.

Supplemental Table 2 Classifier accuracy scores and performance of machine learning models.

Supplemental Table 3 Pearson correlation coefficient data for correlation analysis.

Supplemental Table 4 Primers and plasmids used for the in vitro characterization of recombinant proteins with single amino acid replacements in the receptor binding domain (RBD) region of spike protein, and their biophysical properties. To test the hypothesis that RBD amino acid changes enhance viral fitness, we expressed spike variants with the Asp614Gly replacement and 13 clinical RBD variants identified in our genome sequencing studies. Table S4A contains the primers used, Table S4B contains the plasmid construct information, and Table S4C contains the biophysical properties of the resultant spike protein variants.

Supplemental FIG 1 Geographic distribution of representative SARS-CoV-2 subclades in the Houston metropolitan region. Blue shaded areas denote zip codes containing COVID-19 cases with the designated subclade.

Supplemental FIG 2 Cladograms showing distribution of patient metadata, including (A) age (in decade), (B) sex, (C) ethnicity/ethnic group, (D) wave, (E) level of care, (F) mechanical ventilation, (G) length of stay, and (H) mortality.

Supplemental FIG 3 Distribution of subclades characterized by particular amino acid replacements in Nsp12 (RdRp).

Supplemental FIG 4 Mapping the location of amino acid replacements on Nsp12 (RdRp) from COVID-19 virus. The schematic on the top shows the domain architecture of Nsp12. The individual domains of Nsp12 are color-coded and labeled. Ribbon representation of the crystal structure of Nsp12-remdesivir monophosphate-RNA complex is shown (PDB code: 7BV2). The structure in the right panel is obtained by rotating the left panel 180° along the y-axis. The Nsp12 domains are colored as in the schematic at the top. The positions of Cα atoms of the surface-exposed amino acids identified in this study are shown as yellow spheres, whereas the positions of Cα atoms of the buried amino acids are depicted as cyan spheres. The catalytic site in RdRp is marked by a black circle in the right panel. The side chains of amino acids comprising the catalytic site of RdRp are shown as balls and sticks and colored yellow. The nucleotide binding site is boxed and labeled in the right panel. The side chains of amino acids participating in nucleotide binding (Lys545, Arg553, and Arg555) are shown as balls and sticks. The remdesivir molecule incorporated into the nascent RNA is shown as balls and sticks and colored light pink. The RNA is shown as a blue cartoon and bases are shown as sticks. The positions of Cα atoms of amino acids that are predicted to influence remdesivir binding are shown as red spheres. The amino acid Cys812 located at the catalytic site is shown as a green sphere. The location of the Cα atom of the remdesivir resistance-conferring amino acid Val556 is shown as a blue sphere and labeled.

Supplemental FIG 5 Distribution of subclades characterized by particular amino acid replacements in spike protein.

Supplemental FIG 6 Biochemical characterization of single amino acid variants of spike protein RBD. (A, B) Size-exclusion chromatography (SEC) traces of the indicated spike-RBD variants. Dashed line indicates the elution peak of spike-6P. (C) Thermostability analysis of RBD variants. Each sample had three replicates and only mean values were plotted. Black vertical dashed line indicates the first melting temperature of 6P-D614G. (D) ELISA-based binding affinities for ACE2 and (E) neutralizing monoclonal antibody CR3022 to the indicated RBD variants.