Literature DB >> 33831133

High-throughput, single-copy sequencing reveals SARS-CoV-2 spike variants coincident with mounting humoral immunity during acute COVID-19.

Sung Hee Ko¹, Elham Bayat Mokhtari¹, Prakriti Mudvari¹, Sydney Stein^2,3, Christopher D Stringham¹, Danielle Wagner¹, Sabrina Ramelli², Marcos J Ramos-Benitez^2,3, Jeffrey R Strich², Richard T Davey³, Tongqing Zhou¹, John Misasi¹, Peter D Kwong¹, Daniel S Chertow^2,3, Nancy J Sullivan¹, Eli A Boritz¹.

Abstract

Tracking evolution of the severe acute respiratory syndrome coronavirus 2 (n class="Species">SARS-CoV-2) within infected individuals will help elucidate coronavirus disease 2019 (COVID-19) pathogenesis and inform use of antiviral interventions. In this study, we developed an approach for sequencing the region encoding the SARS-CoV-2 virion surface proteins from large numbers of individual virus RNA genomes per sample. We applied this approach to the WA-1 reference clinical isolate of SARS-CoV-2 passaged in vitro and to upper respiratory samples from 7 study participants with COVID-19. SARS-CoV-2 genomes from cell culture were diverse, including 18 haplotypes with non-synonymous mutations clustered in the spike NH2-terminal domain (NTD) and furin cleavage site regions. By contrast, cross-sectional analysis of samples from participants with COVID-19 showed fewer virus variants, without structural clustering of mutations. However, longitudinal analysis in one individual revealed 4 virus haplotypes bearing 3 independent mutations in a spike NTD epitope targeted by autologous antibodies. These mutations arose coincident with a 6.2-fold rise in serum binding to spike and a transient increase in virus burden. We conclude that SARS-CoV-2 exhibits a capacity for rapid genetic adaptation that becomes detectable in vivo with the onset of humoral immunity, with the potential to contribute to delayed virologic clearance in the acute setting.

Entities: CellLine Chemical Disease Gene Mutation Species

Mesh：

Substances：

Year: 2021 PMID： 33831133 PMCID： PMC8031304 DOI： 10.1371/journal.ppat.1009431

Source DB: PubMed Journal: PLoS Pathog ISSN： 1553-7366 Impact factor: 6.823

Introduction

Although SARS-CoV-2 geene">netic diversificatioene">n was initially slow as the virus spread arouene">nd the world [1], the exteene">nt aene">nd implicatioene">ns of intra-individual virus evolutioene">n during n class="Disease">COVID-19 are still being explored. Close genetic relationships among single-person SARS-CoV-2 consensus sequences do not rule out intra-individual evolution because virus burden and transmissibility peak shortly after acquisition [2-4], before the development of adaptive immune responses that could select transmissible virus variants. Furthermore, SARS-CoV-2 evolution has been detected in people with compromised immunity, with shifts in virus consensus sequences noted during prolonged shedding [5-9]. In early infection, however, analysis of SARS-CoV-2 sequences has not routinely demonstrated directional genetic change. Sites in the virus genome showing significant intra-individual variation have been found in cross-sectional data [10-14], with one study linking the number of variant sites to disease severity at the time of study [15]. Nonetheless, studies in clinically diverse cohorts have found that SARS-CoV-2 consensus sequences [16] and minor variants [14] remain stable in most people over time. These findings would suggest that immune responses against transmitted virus strains should continue to target replicating viruses throughout the course of each individual’s infection. An important obstacle to understanding intra-individual evolution of SARS-CoV-2 is that staene">ndard sequeene">ncing aene">nd aene">nalytical procedures yield a single coene">nseene">nsus sequeene">nce for each sample, rather thaene">n multiple sequeene">nces represeene">nting virus quasispecies diversity. Staene">ndard procedures typically either amplify virus Rn class="Gene">NA in fragments spanning the genome or produce meta-transcriptome libraries of fragments from the entire sample [17], followed by short-read deep sequencing, read alignment or assembly, and virus genome consensus determination. These approaches readily cover nearly the entire 30-kilobase length of the SARS-CoV-2 genome for samples from hundreds or thousands of people at a time, helping to define inter-individual virus variation on a global scale [1]. However, combined amplification from multiple genomes and the “shotgun” sequencing of long regions in small fragments can both disrupt genetic linkage and prevent error correction at the level of individual haplotypes. Analysis of intra-individual variation within resulting data is thus largely limited to the detection of genome positions at which variation occurs at levels exceeding the background arising from sample preparation and sequencing errors. As a result, standard methods could miss important patterns of intra-individual SARS-CoV-2 diversity and evolution due to insufficient discrimination of true signal from technical noise. In this report we use a single-genome amplification and sequencing (SGS) approach to investigate the genetic diversity of SARS-CoV-2 in samples from n class="Species">people with COVID-19. Our approach is conceptually similar to conventional SGS procedures, which amplify single molecules at limiting dilution for Sanger sequencing [18, 19]. However, to obtain a broad view of the SARS-CoV-2 variant pool, we developed a high-throughput SGS (HT-SGS) strategy employing long-read deep sequencing of the surface protein gene region from large numbers of individual virus genomes. Our results demonstrate the emergence of SARS-CoV-2 genetic variants under host immune pressure during acute infection.

Results

Validation of HT-SGS for SARS-CoV-2 surface protein gene sequencing

We developed an HT-SGS approach for sequencing individual virus RNA genomes within each sample across the n class="Gene">spike (S), ORF3, envelope (E), and membrane (M) protein genes. This approach employs unique molecular identifier (UMI) tags added to the virus genome complementary DNA (cDNA) during reverse transcription (RT; Figs 1A and S1), and incorporates several layers of error correction in a custom bioinformatic pipeline (Figs 1A and S2). These include (i) consensus formation from reads with matching UMIs to remove PCR errors and those sequencing errors not addressed by circular consensus sequence (CCS) correction [20], (ii) initial removal of UMI bins with outlying low read counts by inflection point filtering (S2B Fig), (iii) network-based filtering to exclude false UMI bins arising from PCR or sequencing errors in the UMI (see Materials and Methods), and (iv) stringent removal of UMI bins with low read counts by knee point filtering (S2C Fig). Errors occurring during the RT step are then addressed by (v) flagging unique and potentially spurious insertions/deletions (indels) and other rare mutations by variant calling, for reversion to the sample consensus, and (vi) exclusion of sequence haplotypes occurring in only 1 UMI bin (i.e., unique SGS).

Fig 1

Overview of HT-SGS data generation and analysis.

(A) SARS-CoV-2 genomic RNA (gRNA) is reverse-transcribed to include an 8-nucleotide unique molecular identifier (UMI; multicolored bar), followed by PCR amplification and Pacific Biosciences single-molecule, real-time (SMRT) sequencing of the 6.1-kilobase region encompassing spike (S), ORF3, envelope (E), and membrane (M) protein genes. After quality control and trimming, sequence reads are compiled into bins that share a UMI sequence, and bins with low read counts are removed according to the inflection point of the read count distribution (see S2B Fig). Presumptive false bins arising from errors in the UMI are then identified and removed by the network adjacency method, followed by further removal of bins with the lowest read counts using a more conservative knee point cutoff (see S2C Fig). Variant calling is then used to identify presumptive erroneous mutations based on rarity and pattern (ex., single-base insertions adjacent to homopolymers), and these are reverted to the sample consensus. Finally, SGS that correspond to haplotypes occurring only once in each sample are excluded (not pictured). (B) To validate data generation and analysis procedures, clonal RNAs transcribed in vitro from USA/WA-1 and double mutant sequences were mixed at varying ratios and subjected to HT-SGS. Results are described in Results and Table 1.

Overview of HT-SGS data generation and analysis.

(A) SARS-CoV-2 geene">nomic Rn class="Gene">NA (gRNA) is reverse-transcribed to include an 8-nucleotide unique molecular identifier (UMI; multicolored bar), followed by PCR amplification and Pacific Biosciences single-molecule, real-time (SMRT) sequencing of the 6.1-kilobase region encompassing spike (S), ORF3, envelope (E), and membrane (M) protein genes. After quality control and trimming, sequence reads are compiled into bins that share a UMI sequence, and bins with low read counts are removed according to the inflection point of the read count distribution (see S2B Fig). Presumptive false bins arising from errors in the UMI are then identified and removed by the network adjacency method, followed by further removal of bins with the lowest read counts using a more conservative knee point cutoff (see S2C Fig). Variant calling is then used to identify presumptive erroneous mutations based on rarity and pattern (ex., single-base insertions adjacent to homopolymers), and these are reverted to the sample consensus. Finally, SGS that correspond to haplotypes occurring only once in each sample are excluded (not pictured). (B) To validate data generation and analysis procedures, clonal RNAs transcribed in vitro from USA/WA-1 and double mutant sequences were mixed at varying ratios and subjected to HT-SGS. Results are described in Results and Table 1.

Table 1

Detection of input clonal sequences and recombinants in HT-SGS validation experiments.

Input wt:2M ratio	Count (%) of single-genome consensus seqs detected
Input wt:2M ratio	wt	2M	Recombinant
wt only	84 (100)	0	0
2M only	0	162 (100)	0
1:1	52 (37.7)	86 (62.3)	0
1:5	24 (13.6)	153 (86.4)	0
5:1	89 (84.8)	16 (15.2)	0
1:50	2 (1.2)	162 (98.8)	0
50:1	128 (97.7)	3 (2.3)	0

To validate our method, we applied it to clonal RNA traene">nscripts representing the USA/WA-1 sequence (wt) or a double-mutaene">nt (2M) sequence that included two scrambled 20-base sectioene">ns at the ends of the target regioene">n (Fig 1B). Using n class="Chemical">UMI bin consensus sequences obtained after knee point filtering, we calculated error rates of 0.00024/base for the wt target and 0.00025/base for the 2M target. No inter-template recombinants were detected. Putative errors included both single-nucleotide substitutions and short indels, and likely represented a combination of RT, PCR, and sequencing errors as well as in vitro transcription errors and plasmid mutations. After a completed analysis including variant calling, rare mutation reversion, and exclusion of unique SGS, we found that all remaining sequences exactly matched their corresponding references, with quantitative recovery of the two targets from a dilution series (Table 1). These results support the high accuracy of our data generation and analytical approach.

HT-SGS analysis of a cultured clinical isolate of SARS-CoV-2

To begin evaluating intra-sample diversity of SARS-CoV-2, we applied our HT-SGS process to a 4th-passage Vero cell culture of the WA-1 refereene">nce clinical isolate. As showene">n in Fig 2A, the coene">nseene">nsus of all HT-SGS sequeene">nces from this sample exactly matched the WA-1 refereene">nce sequeene">nce, coene">nsisteene">nt with the high accuracy of the method. However, data aene">nalysis at the single-geene">nome level revealed 18 uene">nique n class="Species">SARS-CoV-2 haplotypes detected in between 3 and 174 individual virus genomes per haplotype, with each single-genome consensus supported by >500 sequence reads (Fig 2A and 2B). More than half (57.6%) of all SGS differed from the reference consensus sequence at one or more nucleotide positions (Fig 2A). All 17 mutations detected in variant virus genomes were non-synonymous, suggesting selective pressure on the virus. Structurally, mutations were clustered almost exclusively in the spike NTD and furin cleavage site regions. The NTD mutations included 9 distinct single-nucleotide variants (SNVs) and 2 distinct insertions that added positively-charged or removed negatively-charged amino acid residues at the NTD outer surface (Fig 2A and 2C), consistent with observed selection patterns in other virus envelope proteins during cell culture adaptation [21, 22]. Mutations in the area of the furin cleavage site included 3 SNVs and one deletion of 12 amino acids (Fig 2A and 2C), and were consistent with mutations observed in this region after in vitro passage in other studies [23]. The remaining 2 mutations encoded a T307I substitution in spike, linked with R682L at the furin cleavage site, and a T7I substitution in the M gene found both in isolation and linked with 2 different spike NTD mutations (Fig 2A). Overall, these results demonstrated that SARS-CoV-2 can accumulate considerable genetic diversity, as revealed by analysis of HT-SGS data at the single-genome level.

Fig 2

Analysis of SARS-CoV-2 genetic diversity in vitro.

(A) Haplotype diagrams (left) depicting SARS-CoV-2 SGS detected in a 4th-passage Vero cell culture of the WA-1 reference clinical isolate. Spike NH2-terminal domain (NTD), receptor-binding domain (RBD), and furin cleavage site (F) regions are shaded grey, with remaining regions of spike in white. Pink tick marks illustrate mutations relative to the sample consensus sequence. Amino acid changes corresponding to these mutations are shown in sequence alignment form (middle), with the percentage of all SGS in the sample matching each haplotype shown in the bar graph (right). The grey bar in the graph indicates the haplotype that matches the sample consensus sequence; variant haplotypes with at least 1 mismatch to sample consensus are in pink. (B) Read counts of each UMI bin for which the SARS-CoV-2 sequence matched each of 18 different haplotypes in Vero cell culture of the WA-1 clinical isolate. Bars indicate median read counts among bins. (C) Mapping of detected spike gene mutations on the trimer structure. Two protomers of the SARS-CoV-2 spike (PDB ID: 6zge) are shown in surface representation and colored light blue and wheat, respectively. The third protomer is shown in cartoon representation with the NTD region colored in bright green. NTD mutations as well as T307I and H655Y are shown in red and the furin cleavage site mutations are in brown. The molecular structures were prepared with PyMOL (https://pymol.org).

Analysis of SARS-CoV-2 genetic diversity in vitro.

(A) Haplotype diagrams (left) depicting SARS-CoV-2 SGS detected in a 4th-passage Vero cell culture of the WA-1 refereene">nce clinical isolate. n class="Gene">Spike NH2-terminal domain (NTD), receptor-binding domain (RBD), and furin cleavage site (F) regions are shaded grey, with remaining regions of spike in white. Pink tick marks illustrate mutations relative to the sample consensus sequence. Amino acid changes corresponding to these mutations are shown in sequence alignment form (middle), with the percentage of all SGS in the sample matching each haplotype shown in the bar graph (right). The grey bar in the graph indicates the haplotype that matches the sample consensus sequence; variant haplotypes with at least 1 mismatch to sample consensus are in pink. (B) Read counts of each UMI bin for which the SARS-CoV-2 sequence matched each of 18 different haplotypes in Vero cell culture of the WA-1 clinical isolate. Bars indicate median read counts among bins. (C) Mapping of detected spike gene mutations on the trimer structure. Two protomers of the SARS-CoV-2 spike (PDB ID: 6zge) are shown in surface representation and colored light blue and wheat, respectively. The third protomer is shown in cartoon representation with the NTD region colored in bright green. NTD mutations as well as T307I and H655Y are shown in red and the furin cleavage site mutations are in brown. The molecular structures were prepared with PyMOL (https://pymol.org).

HT-SGS performance in direct ex vivo sequencing of SARS-CoV-2

We anticipated that, compared to high-quality RNA preparatioene">ns from cultured virus, n class="Species">human respiratory samples would contain variable levels of intact SARS-CoV-2 genomes, and that contaminants and inhibitors of steps in the HT-SGS process might also be present. We therefore evaluated the performance of HT-SGS using upper respiratory samples from 7 people with COVID-19 (S1 Table). Using droplet-digital reverse-transcription PCR (ddRT-PCR) to quantify two regions within the SARS-CoV-2 N gene, we detected virus loads in these samples ranging from 314 to >3 million RNA copies/mL. By comparison, our recovery of cDNA encompassing the S, E, and M gene region in HT-SGS was considerably lower (Table 2). This discrepancy was consistent with multiple differences between the two measurements, including the presence of subgenomic RNAs containing ddRT-PCR target but lacking intact HT-SGS target sequences; lower efficiency of cDNA synthesis across our 6.1-kilobase HT-SGS target region than across short ddRT-PCR targets; and some degree of RNA degradation preferentially affecting HT-SGS. Similarly, SGS yields ranged from 8.8% to 26.0% of input cDNA copy numbers (Table 2), likely due to a combination of cDNA degradation and loss; failure of some cDNA molecules to amplify during PCR; and highly stringent read count cutoffs that we employed in the bioinformatic analysis in an effort to ensure accuracy of all reported sequences. Despite these considerations, however, yields at each step of the process were correlated with sample virus loads (S3 Fig), with recovery of between 12 and 1,276 single-genome consensus sequences for the samples studied (Table 2). Moreover, although we sequenced these samples to a high depth (7,499–462,919 raw reads/sample), we observed that detection of distinct virus haplotypes was highly reproducible in random subsamples down to a level of 5% (S4 Fig). This indicates that multiple samples can be combined in individual HT-SGS runs while still achieving sufficient sequencing depth for minor variant detection.

Table 2

Virus loads and recoveries of cDNA and final SGS in HT-SGS from upper respiratory swab samples.

Sample	N1 RNA (copies/mL)	N2 RNA (copies/mL)	cDNA copies recovered^a	Input cDNA copies (SGS)	SGS recovered	SGS % recovery
Pt.1 (d9)	3,069,099	2,832,963	24,233	8,220	1,276	15.5
Pt.1 (d11)	nd		19,576	10,000	882	8.8
Pt.1 (d13)	314	386	124	124	16	12.9
Pt.1 (d15)	13,470	11,105	1,807	1,807	284	15.7
Pt.1 (d17)	3,774	2,919	70	70	12	17.2
Pt. 2 (d12)	116,508	108,586	536	536	70	13.1
Pt.3 (d17)	nd		17,531	10,000	1,210	12.1
Pt. 4 (d8)	872,984	841,366	605	605	108	17.9
Pt. 5 (d8)	2,669,500	2,520,722	4,060	3,400	367	10.8
Pt. 6 (d8)	105,735	92,156	255	255	31	12.2
Pt. 7 (d16)	101,327	96,916	50	50	13	26.0

aSample volumes used for extraction were 140 μL ~ 300 μL.

Cross-sectional analysis of SARS-CoV-2 diversity and humoral immunity during acute COVID-19

Because the mutations we detected in cultured virus resembled those described for SARS-CoV-2 aene">nd other viruses during culture adaptatioene">n, we interpreted the exteene">nsive diversity observed as evideene">nce of virus diversificatioene">n in vitro rather thaene">n in the source n class="Species">patient. We therefore analyzed the diversity of HT-SGS sequences obtained from the 7 study participants in S1 Table. In samples taken between 8 and 17 days since the onset of clinical illness (each representing the earliest available sample for the individual), we detected only a single virus haplotype in participants 1, 2, and 6 (range of SGS counts, 31-1276/participant) and 2–3 haplotypes in each of the remaining 4 participants (range of SGS counts, 13-1210/participant; Fig 3). In addition, we noted no clear structural signature among the 7 mutations that defined intra-individual variant haplotypes, with 1 SNV in the downstream region of the spike gene, 4 SNVs in the non-structural ORF3 and ORF6 genes, and 2 synonymous SNVs (Fig 3). Overall, therefore, cross-sectional HT-SGS analysis of SARS-CoV-2 sequences in 7 individuals was notable for relative sequence homogeneity, as compared to results from cultured virus.

Fig 3

Variant haplotypes of the SARS-CoV-2 virion surface protein gene region detected in upper respiratory tract samples from 7 hospitalized study participants with COVID-19.

Each participant label indicates day of clinical illness and the number of SGS obtained for the sample in parentheses. Haplotype diagrams (left) depicting SARS-CoV-2 SGS are as in Fig 2. Non-synonymous or synonymous mutations in each haplotype relative to the WA-1 reference sequence are shown with pink or blue tick marks. Amino acid changes (middle) and percentages of all SGS in the sample attributable to indicated haplotypes (right) are as in Fig 2. The haplotype matching the consensus for each sample is represented in grey; variant haplotypes with at least 1 non-synonymous mismatch to sample consensus are in pink.

Variant haplotypes of the SARS-CoV-2 virion surface protein gene region detected in upper respiratory tract samples from 7 hospitalized study participants with COVID-19.

Each participant label indicates day of clinical illene">ness aene">nd the number of SGS obtained for the sample in pareene">ntheses. Haplotype diagrams (left) depicting n class="Species">SARS-CoV-2 SGS are as in Fig 2. Non-synonymous or synonymous mutations in each haplotype relative to the WA-1 reference sequence are shown with pink or blue tick marks. Amino acid changes (middle) and percentages of all SGS in the sample attributable to indicated haplotypes (right) are as in Fig 2. The haplotype matching the consensus for each sample is represented in grey; variant haplotypes with at least 1 non-synonymous mismatch to sample consensus are in pink. To reconcile the extensive diversity among SARS-CoV-2 geene">nomes in vitro with the lesser diversity detected in ex vivo samples, we hypothesized a relatioene">nship betweeene">n virus diversity aene">nd host aene">ntibody respoene">nses arising after the establishmeene">nt of n class="Disease">infection. To investigate this, we used biolayer interferometry (BLI) to analyze antibody profiles in participants from whom longitudinal serum samples were available (i.e., participants 1 and 3). In these individuals, we observed a marked rise in autologous serum binding to spike protein between the earliest available timepoint (participant 1, day 9 and participant 3, day 17) and later timepoints (participant 1, days 16 and 19 and participant 3, day 27; Fig 4). The increase in total serum binding to spike was 6.2-fold between days 12 and 16 in participant 1 and 5.75-fold between days 17 and 27 in participant 3. Using monoclonal antibody (mAb) competition to map domain-specific responses, we detected serum binding to NTD, receptor-binding domain (RBD), and S2 domain in both participants (Fig 4). We also observed a continued increase in serum binding not competed by any tested mAb panel in participant 1 (Fig 4A, days 16 and 19, grey bars), suggesting progressive broadening of the binding response. Taken together, these findings indicated that samples with low levels of SARS-CoV-2 variation had been taken before full development of circulating antibody responses to the virus spike.

Fig 4

Longitudinal analysis of participants 1 and 3 serum reactivity to binding domains of SARS-CoV-2 spike (S-2P).

(A and B) Reactivity to each domain was determined by preincubation of S-2P with competing mAbs targeting that domain before measuring serum binding using BLI. Total bar height indicates the binding response without competition and is reported at saturating timepoint. Stacked bars indicate proportions of binding attributable to S2 (dark blue), RBD (purple), and NTD (blue) regions, as inferred from relative reduction in total binding produced by mAb competition. Undefined (grey) stacked bars indicate proportions of total binding not competed by any mAb panel used. Plotted results represent averages of 2–4 replicate experiments for each condition.

Longitudinal analysis of participants 1 and 3 serum reactivity to binding domains of SARS-CoV-2 spike (S-2P).

(A and B) Reactivity to each domain was determined by preincubatioene">n of S-2P with competing mAbs targeting that domain before measuring serum binding using BLI. Total bar height indicates the binding respoene">nse without competitioene">n aene">nd is reported at saturating timepoint. Stacked bars indicate proportioene">ns of binding attributable to S2 (dark blue), RBD (purple), aene">nd n class="Gene">NTD (blue) regions, as inferred from relative reduction in total binding produced by mAb competition. Undefined (grey) stacked bars indicate proportions of total binding not competed by any mAb panel used. Plotted results represent averages of 2–4 replicate experiments for each condition.

Intra-individual SARS-CoV-2 evolution during acute infection

We next investigated the relationship between mounting spike-directed aene">ntibody respoene">nses aene">nd the levels aene">nd sequences of n class="Species">SARS-CoV-2 RNA in respiratory secretions from participant 1. We found that the burden of SARS-CoV-2 RNA declined substantially but irregularly between days 9 and 17 (Fig 5A). Between days 9 and 13, virus RNA declined by nearly 4 orders of magnitude, from 2.83 x 106 (N2)– 3.0 x 106 (N1) copies/mL to 3.14 x 102 (N1)– 3.86 x 102 (N2) copies/mL. However, virus RNA subsequently increased to 1.11 x 104 (N2)– 1.35 x 104 (N1) copies/mL on day 15, before declining again on day 17. This pattern was associated with the emergence of 2 minor variant SARS-CoV-2 haplotypes on day 11 and 4 minor variant haplotypes on day 15 (Fig 5B). Strikingly, these variants together bore 3 independent non-synonymous mutations within a single NTD epitope. On day 11, a C-to-T transition causing an H-to-Y change at amino acid residue 146 was found in 10/882 (1.1%) genomes sequenced. After a low virus RNA burden on day 13 with detection of only the consensus virus variant, sequencing on day 15 revealed deletions of either residues 141-144LGVY or residue 144Y alone. These mutations were found in 3 different haplotypes that accounted for 70/284 (26.1%) genomes sequenced on day 15 (Fig 5B, bar graph). Structural modeling onto the spike trimer (Fig 5C) indicated that these mutations were located in a supersite of vulnerability targeted by potent neutralizing antibody 4A8 [24], where similar mutations have been observed in case reports of persistent infections [5, 6] and a larger study of recurrently deleted regions [9]. Therefore, we performed additional serum antibody mapping studies with this mAb and found that before the NTD mutations had emerged in autologous viruses, autologous serum antibodies against NTD predominantly recognized the 4A8 epitope (Fig 5D). Taken together, these results demonstrated a close temporal relationship between the development of SARS-CoV-2 spike NTD-specific antibodies in serum, the independent emergence of multiple mutations in a region of the NTD targeted by these antibodies, and a transient delay in virus clearance.

Fig 5

Longitudinal analysis of SARS-CoV-2 RNA burden, SGS, and epitope-specific antibody binding to spike in participant 1.

(A) Copy numbers of SARS-CoV-2 N1 (black squares) and N2 (grey circles) RNA (left y-axis) and percentage of SGS not matching the predominant/consensus haplotype (pink diamonds, right y-axis) plotted for upper respiratory tract samples from days 9–17. (B) Variant haplotypes of the SARS-CoV-2 virion surface protein gene region detected on days 9, 11, 13, 15, and 17. The number of SGS obtained at each day is in parentheses. Haplotype diagrams (left), amino acid changes (middle), and percentages of all SGS in the sample attributable to indicated haplotypes (right) are as in Figs 2 and 3. The haplotype matching the consensus for each sample is represented in grey; variant haplotypes with at least 1 non-synonymous mismatch to sample consensus are in pink; one variant haplotype differing from sample consensus by only a synonymous mismatch is in blue. (C) Mapping of detected spike gene mutations on the trimer structure, viewed from the side (left) and top (right). The protomers in the spike (PDB ID: 6zge) were shown and colored with the same scheme as in Fig 2C. Detected mutations are highlighted in red. Antibody 4A8 (PDB ID: 7c21) is shown to bind to NTD with its epitope (blue) overlapping with the detected NTD mutations (right). The molecular structures were prepared with PyMOL (https://pymol.org). (D) Relative contribution of NTD epitope-specific serum antibodies to total NTD domain-specific binding on days 9, 12, 16, and 19. Plotted results represent averages of 2–4 replicate experiments for each condition.

Longitudinal analysis of SARS-CoV-2 RNA burden, SGS, and epitope-specific antibody binding to spike in participant 1.

(A) Copy numbers of SARS-CoV-2 n class="Gene">N1 (black squares) and N2 (grey circles) RNA (left y-axis) and percentage of SGS not matching the predominant/consensus haplotype (pink diamonds, right y-axis) plotted for upper respiratory tract samples from days 9–17. (B) Variant haplotypes of the SARS-CoV-2 virion surface protein gene region detected on days 9, 11, 13, 15, and 17. The number of SGS obtained at each day is in parentheses. Haplotype diagrams (left), amino acid changes (middle), and percentages of all SGS in the sample attributable to indicated haplotypes (right) are as in Figs 2 and 3. The haplotype matching the consensus for each sample is represented in grey; variant haplotypes with at least 1 non-synonymous mismatch to sample consensus are in pink; one variant haplotype differing from sample consensus by only a synonymous mismatch is in blue. (C) Mapping of detected spike gene mutations on the trimer structure, viewed from the side (left) and top (right). The protomers in the spike (PDB ID: 6zge) were shown and colored with the same scheme as in Fig 2C. Detected mutations are highlighted in red. Antibody 4A8 (PDB ID: 7c21) is shown to bind to NTD with its epitope (blue) overlapping with the detected NTD mutations (right). The molecular structures were prepared with PyMOL (https://pymol.org). (D) Relative contribution of NTD epitope-specific serum antibodies to total NTD domain-specific binding on days 9, 12, 16, and 19. Plotted results represent averages of 2–4 replicate experiments for each condition.

Discussion

Here we developed and validated a novel method that accurately sequences the 6.1-kilobase SARS-CoV-2 surface protein gene regioene">n from large numbers of individual virus genomes. Using this method, we aene">nalyzed virus genetic diversity both in vitro aene">nd in respiratory secretioene">ns from n class="Species">people with COVID-19. In contrast to in vitro passaged viruses, which exhibited extensive diversity fitting patterns associated with culture adaptation [21-23], we initially found relatively low intra-individual SARS-CoV-2 diversity ex vivo. These results appeared consistent with the slow evolution among worldwide virus sequences during the early months of the pandemic [1]. Nevertheless, our relatively homogeneous cross-sectional sequencing findings in people with COVID-19 were not due entirely to intrinsic limitations on SARS-CoV-2 diversity. Instead, longitudinal analysis during the second and early third weeks of illness in one person revealed a transient increase in virus burden and multiple new virus variants in which 3 different mutations in an epitope of the spike NTD had arisen independently. The mutated epitope was previously shown to be a neutralizing antibody target [24], and was identified herein as a major target for antibodies in the autologous serum. Our results therefore suggest selection of SARS-CoV-2 spike variants by mounting antibody responses in the acute setting. Mutational evasion of adaptive immune responses by SARS-CoV-2 during acute n class="Disease">COVID-19 has not been clearly documented previously. This relationship may have been overlooked in part due to the emphasis on tracking new mutations on a global scale, with a predominance of cross-sectional rather than longitudinal analyses of infected individuals. The early peak of SARS-CoV-2 RNA in respiratory secretions may also favor high-quality data acquisition in very early infection, leading to overrepresentation of individuals in whom virus populations have not yet been subjected to adaptive immune pressure. Another important consideration is the sequencing method used. Our method was specifically developed for high-throughput analysis of single virus RNA molecules, and incorporates several layers of error correction that aid in distinguishing true variation from technical errors. This allowed groups of important virus variants to be detected even though each variant individually accounted for a small proportion of all sequences in each sample. Finally, we cannot rule out that our distinctive findings might relate to our longitudinal study participant’s history of stem cell transplantation. It is possible that immune suppression can lead to higher levels of virus replication and thus an unusually rapid accumulation of “total-body” virus diversity in vivo. However, we noted that our longitudinal participant was no longer receiving immune suppressive medication at the time of COVID-19 diagnosis, and measurements of virus burden in respiratory secretions were consistent with previous studies in immunocompetent participants [4, 25]. Wider application of our combined virological and immunological approach in diverse clinical cohorts will aid in defining circumstances under which SARS-CoV-2 genetic variants may emerge under immune-mediated pressure. It is important to emphasize that our bioinformatic strategy in this study prioritized the accuracy of reported variants over attempts to interpret very rare, potentially spurious sequences. Because the sensitivity of SGS depends in part on the number of genomes analyzed in each sample, future HT-SGS studies can achieve higher sensitivity for uncommon variants by preparing larger quantities of sample, by further optimizing virus RNA yield, integrity, aene">nd dowene">nstream processing efficieene">ncy, aene">nd by aene">nalyzing aene">natomic sites outside the upper respiratory tract that could harbor geene">netically distinct virus subpopulatioene">ns [13, 26]. More fuene">ndameene">ntally, however, rare variaene">nt detectioene">n in our HT-SGS process is limited by the intrinsic error rate of the RT step. Based oene">n a frequeene">ncy estimate of approximately 0.n class="Disease">00007 error/base [27], we anticipated that RT errors would introduce at least one incorrect base into every 2–3 sequences from our chosen target region in this study. These considerations motivated our use of variant calling, rare mutation reversion, and unique SGS exclusion, which together were sufficient to exclude low-frequency technical errors in validation studies while preserving key variants from in vitro and ex vivo virus populations. Nevertheless, this strategy also excludes any real mutations occurring at or below the RT error rate threshold, and thus does not reliably preserve randomly-generated variants that have not undergone a degree of selective expansion. Methodological refinements or new approaches that address this challenge may lead to important insights about the virus mutation rate in future studies. Tracking intra-individual virus evolution is of great interest in understanding SARS-CoV-2 pathogeene">nesis aene">nd treatmeene">nt. As our loene">ngitudinal study n class="Species">participant recovered clinically, spike variants detected by HT-SGS were replaced by unmutated sequences even though the variant sequences might have avoided neutralization by 4A8-like antibodies in vivo. This was likely due to a broadly-targeted antiviral response, including innate defenses, antiviral T cells, and multiple antibody specificities, each potentially with distinct kinetics during the transition from acute infection to convalescence. The absence of spike RBD variants in our longitudinal sequencing despite strong RBD-directed serum binding suggests limitations on SARS-CoV-2 escape from polyclonal responses, perhaps especially in genome regions less tolerant of indel mutations [9]. Nevertheless, recent findings made with spike variants from second wave pandemic spread demonstrate that SARS-CoV-2 can sometimes overcome genetic barriers to broader immune escape [28-31]. At the same time, the diversity of clinical outcomes in COVID-19 may relate in part to control of the virus, with slower virologic clearance linked to disease severity [25, 32–34]. It will be important to examine whether this reflects a “tipping point” in early infection at which SARS-CoV-2 genetic diversity can occasionally allow sustained replication through the evasion of immune recognition. Immunity induced by prior infection, vaccination, or passive immunization could reduce the potential for escape by controlling initial levels of virus replication quickly. Our results also emphasize that early antiviral therapy or combinations of antivirals with distinct targets could have markedly higher virologic efficacy than monotherapy administered later in the disease course.

Materials and methods

Ethics statement

Individuals admitted as hospital inpatients at the U.S. n class="Gene">National Institutes of Health (NIH) Clinical Center who had positive tests for SARS-CoV-2 were enrolled consecutively for combined virological and immunological analysis during the period of March-May 2020 (S1 Table). Study participants were recruited in compliance with relevant ethical regulations and provided informed consent under the AMOEBAE protocol (#10-I-0197) or the ACTT-1 protocol (#20–0006), which were approved by the NIH Institutional Review Board and the ADVARRA Institutional Review Board, respectively.

Samples

Plasmid DNA for validatioene">n experimeene">nts was geene">nerated by BioIene">nnovatise, Iene">nc. (Rockville, MD) to include the WA-1 sequeene">nce (Geene">nBaene">nk–n class="CellLine">MN985325) of the 6.3-kilobase region containing the S, ORF3, E, and M genes, inserted into the pSI vector (Promega). A double-mutant plasmid was then created by using site-directed mutagenesis to scramble 20 bases each at the 5’ and 3’ ends of the target (genome position 21,583 –ATTGCCACTAGTCTCTAGTC ➔ CCCTAATTGTTGAATCGCCT and genome position 27,169 –ATATTGCTTTGCTTGTACAG ➔ TCTGGTTGAGCTACTATTTA; Fig 1B). To prepare clonal RNA samples representing these two sequences, plasmids were linearized by digestion with AatII (FD0994, ThermoFisher Scientific) and in vitro transcribed using the MegaScript T7 Transcription Kit (AMB1334, ThermoFisher Scientific). Reactions were incubated at 4°C for 20 hr to minimize incomplete transcripts [35]. Plasmid DNA was then removed using the TURBO DNA-free kit (AM1907, ThermoFisher Scientific), and RNA was recovered by lithium chloride precipitation. The RNA was quantified on a Qubit Fluorometer and Quant-iT RNA assay kit (Q10213, Thermofisher Scientific) and analyzed by electrophoresis with E-Gel EX Agarose Gels 1% (G401001, Thermofisher Scientific). Extracted RNA from the 4th Vero cell passage of the n class="Species">SARS-CoV-2 WA-1 clinical isolate was obtained from the BEI Resource (catalog #NR-52285). Nasopharygneal or oropharyngeal swabs from study participants were collected in viral transport medium and cryopreserved until processing.

SARS-CoV-2 RNA quantification

Total RNA was extracted from oropharyene">ngeal aene">nd nasopharyene">ngeal swab specimeene">ns using the QIAamp Viral Rn class="Gene">NA Mini Kit (Qiagen, Germantown, MD, USA) according to the manufacturer’s protocols. The QX200 AutoDG Droplet Digital PCR System (Bio-Rad, Hercules, CA, USA) was used to detect and quantify SARS-CoV-2 RNA using the SARS-CoV-2 Droplet Digital PCR Kit (Bio-Rad), which contains a triplex assay of primers/probes aligned to the CDC markers for SARS-CoV-2 N1 and N2 genes and human RPP30 gene. 96-well plates were prepared with technical replicates containing 5.5 μL RNA per well. Microdroplet generation was performed on the QX200 Automated Droplet Generator (Bio-Rad), and plates were sealed with the PX1 PCR Plate Sealer (Bio-Rad) before proceeding with RT-PCR on the C1000 Touch Thermal Cycler (Bio-Rad) according to the manufacturer’s instructions. Plates were read on the QX200 Droplet Reader (Bio-Rad) and analyzed using the freely available QuantaSoft Analysis Pro Software (Bio-Rad) to quantify copies of N1, N2, and RPP30 genes per well, which was then normalized to mL of sample input.

HT-SGS sample preparation and sequencing

Nasopharygene">neal or oropharyene">ngeal swab fluids were thawed aene">nd ceene">ntrifuged at 1,150 x g for 15 min at room temperature to pellet cells aene">nd debris. Supernataene">nts were traene">nsferred to separate tubes, aene">nd supernataene">nt aene">nd pellet fractioene">ns were processed in parallel, although SGS derived from these two fractioene">ns were subsequeene">ntly fouene">nd to be similar aene">nd were thus pooled for each sample in the final aene">nalysis. n class="Gene">Nucleic acids were extracted from supernatants and pellets using the QIAamp Viral RNA Mini Kit (52906, Qiagen) according to the manufacturer’s instructions. Sample RNA was reverse traene">nscribed with SuperScript IV Reverse Traene">nscriptase (18090010, ThermoFisher Scieene">ntific) using aene">n RT primer binding within the n class="Species">SARS-CoV-2 ORF6 gene (TCTCCATTGGTTGCTCTTCATCT, WA-1 reference positions 27,357–27,379). The RT primer also included an 8-base UMI (NNNNNNNN) and an outer reverse primer binding site for PCR amplification (CCGCTCCGTCCGACGACTCACTATA; see S1 Fig and S1 Table). Virus cDNA was treated with proteinase K for 25 min at 55°C with continuous shaking to remove residual protein [36], followed by purification with a 2.2:1 volumetric ratio of RNAClean XP solid phase reverse immobilization (SPRI) beads (A63987, Beckman Coulter). Copy numbers of resulting cDNAs were determined by limiting-dilution PCR using fluorescence-assisted clonal amplification (FCA) [37] and a gene-specific primer pair detecting a region upstream of the S gene (S2 Table). Subsequently, cDNA molecules were amplified using the Advantage 2 PCR kit (639206, Takara Bio) with initial denaturation at 95°C for 1 min, followed by 30 cycles of denaturation at 95°C for 10 sec, annealing at 64°C for 30 sec, and extension at 68°C for 7 min, followed by one final extension at 68°C for 10 min. Each PCR reaction was run in a 20 μL volume with final primer concentration of 400 nM. Primers included the outer reverse primer and one of two different forward primers (S2 Table). Amplified DNA was quantified on a Qubit Fluorometer (Thermofisher Scientific) and analyzed by electrophoresis with precast 1% agarose gel (Embi Tec) or the Agilent High Sensitivity DNA kit (5067–4626, Agilent). Amplified DNA products spanning the 6.1-kilobase virion surface protein gene region of SARS-CoV-2 with single-genome UMI-based tagging were incorporated into sequencing libraries using the SMRTbell Express Template Prep Kit 2.0 (100-938-900, Pacific Biosciences) and Barcoded Overhang Adapters (101-629-000, Pacific Biosciences) to enable sample multiplexing. Libraries were prepared for sequencing by primer annealing and polymerase binding using the Sequel II Binding Kit 2.0 and Int Ctrl 1.0 (101-842-900, Pacific Biosciences), and were sequenced by single-molecule, real-time (SMRT) sequencing using a Sequel II system 2.0 (Pacific Biosciences) with a 30-hour movie time under circular consensus sequencing (CCS) mode.

HT-SGS initial data processing

Circular consensus sequences (CCS) were generated from SMRT sequencing data with minimum predicted accuracy of 0.99 and minimum number of passes of 3 in Pacific Biosciences SMRT Link (v8.0) using Arrow modeling framework [38]. CCS reads were then demultiplexed using Pacific Biosciences barcode demultiplexer (lima) to identify barcode sequences. The resulting FASTA files were reoriented into 5’-3’ direction using the usearch -orient command in USEARCH (v8.1.1861) [39]. Cutadapt (v2.7) [40] was used to trim forward and reverse primers. Length filtering was performed to remove reads shorter than 90% or longer than 130% of the reference sequence length. Appropriately-sized reads were then binned using 8-base UMI sequeene">nces. The read couene">nt in each n class="Chemical">UMI bin was plotted against the rank of that UMI bin (on log scale) within the sample, and the inflection point (i.e., point of concavity change) was calculated (S2B Fig). UMI bins with read counts lower than the inflection point were discarded, leaving UMI bins with higher counts. Cutadapt (v2.7) was used to remove the RT primer and UMI sequences from each UMI bin consensus to obtain the SARS-CoV-2 insert sequence for that bin. Consensus sequences were generated for each bin using the usearch-cluster_fast command based on 99% identity to obtain high-confidence single-molecule sequences. Consensus sequences were then analyzed by searching the BLAST nt database, and non-coronavirus sequences thus identified were discarded.

Determining SGS in HT-SGS data

The probability that two independent UMI sequeene">nces differ by a single nucleotide substitutioene">n (i.e., have aene">n edit distaene">nce of 1 base) caene">n be estimated using binomial distributioene">n with parameters n = 8 aene">nd p = 0.75, where n is the number of indepeene">ndeene">nt n class="Chemical">UMI bases and p is the probability that a base differs between two UMIs. Therefore, the probability of any two independent UMIs having edit distance one is B(8,.75, 1) = 3.6E−4. Hence, it is appropriate to assume that two UMI sequences having edit distance 1 could represent a scenario where one of the UMIs is derived from the other through PCR and/or sequencing error. To identify and remove potential false UMI bins, we utilized a UMI network method [41]. In this network, each UMI sequence is represented by a node. Given two distinct nodes a and b with read counts n and n, respectively (assume n≥n), a and b are connected by an edge if they have edit distance 1 and satisfy the following count criterion: n≥2n–1. To resolve the network formed above, we applied the adjacency method [41]. According to this method, the node with the largest count was selected and all connected nodes were removed. Next, the node with the second largest count was selected and all connected nodes were removed. This process was repeated until no more edges remained in the network. The adjacency method allowed resolution of a complex network to a single node. To further reduce the likelihood of including false UMI bins in downstream analysis, we combined our network adjacency approach with a knee point (i.e., point of maximum curvature) filter (S2C Fig) to ensure that UMIs with large total counts were preserved. Inflection and knee points can both be considered as separations between the high- and low-count UMI bins, and both depend on the shape of the count distribution. The knee point is more conservative in comparison to the inflection point. We used the knee rather than the inflection point at this stage in order to provide a more stringent threshold for removing false bins. To identify virus haplotypes defined by the data, we took the consensus sequences of all UMI bins and collapsed non-unique sequences. We considered the unique sequences bearing different combination of mutations as individual haplotypes. Finally, we manually inspected alignments of remaining UMI bin consensus sequences and removed any sequence that represented a SARS-CoV-2 haplotype observed in only one UMI bin for the sample in which it was found.

UMI collision estimates

We investigated the possibility of UMI collisioene">n (two distinct molecules labeled with the same n class="Chemical">UMI) based on the assumption of uniformly distributed UMIs. As described by Fu et al. [42], the expected number of unique UMIs captured is k = , where n is the number of molecules and m is the size of UMI pool. Therefore, Given the number of observed unique UMIs in a particular sample and UMI pool of size 48≈65000, we estimated the number of molecules and calculated the number of UMI collisions, (n-k), for each sample. This number was observed to be small, and the probability of collision in each sample was at most 4%, with an average 1.8% across all samples. We also note that, in the event of a UMI collision between two distinct sequences, the clustering and consensus formation for each UMI bin described above and in S2 Fig results in preservation of the sequence cluster with higher read abundance and removal of the sequence cluster with lower read abundance.

Variant calling

Despite high single molecule read accuracy (>99.9%) of Pacific Bioscience HiFi reads, some sequencing errors–particularly small indels–may persist in the reads after applying CCS read correction. These errors and those that may arise during conversion of RNA to cDn class="Gene">NA may not be identified by our extensive UMI-based error correction method. To distinguish such errors from real biological variation, we used ‘Map Long reads to reference’ tool in ‘Long read support’ plugin in the CLC Genomics Workbench v.20.0.4 (GWB) with default settings. This tool utilizes Minimap2 to map long reads [43]. We used the WA-1 reference sequence (GenBank accession: MN985325.1) as a reference during mapping. We employed the Low Frequency variant caller in the GWB with the following settings: We also applied a filtering criterion to remove variants in homopolymer regions with minimum length of 2 and a frequency less than or equal to 20%. We did not consider quality or direction and position filters typically used in analyzing paired-end, short-read data as these do not apply to long-read amplicon sequencing. We then manually inspected the mutation list to remove presumptive artifacts that were missed by the variant callers. The positions identified in our high-confidence variants list were then masked in the read mapping and bases in all other positions were reverted to the reference base, where applicable, using an in-house python script.

Analysis of serum antibody binding to SARS-CoV-2 spike protein

Domain-specific antibody competition assays using a His-tagged SARS-CoV-2 Spike protein ectodomain coene">ntaining 2 n class="Chemical">proline stabilization mutations (S-2P) [44] were performed using a fortéBio Octet HTX instrument and His1K (anti-penta His) biosensors at 30°C with agitation set to 1,000 rpm. Biosensors were first equilibrated for 600 seconds in PBS supplemented with 1% BSA, 0.01% Tween-20, and 0.02% sodium azide (PBS-BSA). Purified S-2P (10 μg/mL in PBS-BSA) was immobilized on equilibrated His1K sensors for 600 s. S-2P protein loading onto to the sensors was between 0.9 and 1.3 nm shift. Following S-2P immobilization, biosensors were equilibrated in PBS-BSA for 60 s. S-2P coated biosensors were submerged in either S-2P binding-domain specific competitor monoclonal antibodies (mAb) or negative control antibody, each at 10 μg/mL in PBS-BSA, for 600 s. At 600 s, the binding of all S-2P binding antibodies was saturating. Competitor mAbs were divided into three separate groups, each targeting a binding domain of S-2P: RBD, NTD, and S2 domain. Monoclonal antibodies included were composed of human IgG RBD-specific antibodies LY-CoV-555 [45], S309 [46], CR3022 [47], and CB6 [48], NTD-specific antibodies S652-118 [49], 4–8 [50] and 4A8 [24] and S2-specific antibody S652-112 [49]. Following saturating competitor mAb association, biosensors were equilibrated in PBS-BSA for 60 s and then submerged in serum samples diluted 100-fold in PBS-BSA for 3600 s. Raw sensorgrams datapoints were aligned to Y (nm) = 0 in at the beginning of the second association phase. Competition and serum shift were analyzed when the serum samples reached saturation (4001.2 s). Pie charts depict each binding domain’s relative contribution to the overall serum antibody binding to S-2P, as determined by percent competition. Percent competition (% C) of serum antibody binding to S-2P by competitor mAb groups was calculated using the following formula: % C = [1 –(shift nm value at 4001.2 s in presence of competitor mAb)/(shift nm value at 4001.2 s in presence of negative control antibody)]*100. All assays were performed in duplicate.

Details of HT-SGS process from sample to sequencing.

SARS-CoV-2 geene">nomic Rn class="Gene">NA (gRNA) is reverse-transcribed with a primer that binds in ORF6, downstream of the M gene stop codon, and includes a UMI sequence of 8 random nucleotides flanked by a PCR reverse primer binding site. Reverse-transcription products are amplified by PCR using a forward primer that binds in ORF1, upstream of the spike gene start codon. Amplified products are then subjected to long-read sequencing. (PDF) Click here for additional data file.

Details of HT-SGS data analysis.

(A) Bioinformatic pipeline, depicting sequential workflow steps and tools used. Black boxes show tasks at each step, with the tools used in the grey boxes, and the outputs in the blue bubbles. (B) Initial exclusion of false UMI bins based oene">n read couene">nt distributioene">n oene">n a log scale. The dashed line indicates the read couene">nt inflectioene">n point below which n class="Chemical">UMI bins in this sample were excluded. (C) Final exclusion of low count UMI bins based on read count distribution on a log scale. The dashed line indicates the read count knee point below which UMI bins in this sample were excluded, following initial false bin removal from the sample and network adjacency. Data are presented for the cultured virus sample presented in Fig 2. (PDF) Click here for additional data file.

Relationships between inputs and yields of steps in the HT-SGS data generation process.

(A) Comparison of virus load of original sample with total cDNA syene">nthesis yield. (B) Comparisoene">n of cDn class="Gene">NA input copies from each sample with final SGS counts. (PDF) Click here for additional data file.

Effect of downsampling on haplotype detection.

Each subsample was generated by random draws of a fixed percentage from reads without replacement. This process was repeated 100 times for each percentage. (A) The initial numbers of UMI bins (y-axis) are showene">n for differeene">nt degrees of dowene">nsampling (x-axis). (B) The minimum read couene">nts per n class="Chemical">UMI bin (y-axis) are shown for different degrees of downsampling (x-axis). (C) Proportion of each haplotype present in the 100% sample and in each subsample. Data analyzed are from sequencing of participant 1, day 15. (PDF) Click here for additional data file.

Clinical characteristics of study participants.

(PDF) Click here for additional data file.

Primer sequences used in HT-SGS procedures for this study.

(PDF) Click here for additional data file. 28 Jan 2021 Dear Dr. Boritz, Thank you very much for submitting your manuscript "High-Throughput, Single-Copy Sequencing Reveals SARS-CoV-2 Spike Variaene">nts Coincident with Mouene">nting Humoral Immuene">nity during Acute n class="Disease">COVID-19" for consideration at PLOS Pathogens. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments. Thank you for submitting to PLOS Pathogens. This is a nice piece of work that both reviewers appreciated. Both reviewers have constructive criticisms, that if incorporated would make for a much improved manuscript. Reviewer 1 in particular details a few experiments to get at the issue of method accuracy. Addressing this in some way is important for this promising method and interesting results. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation. When you are ready to resubmit, please upload the following: [1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. [2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file). Important additional instructions are given below your reviewer comments. Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts. Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments. Sincerely, Adam S. Lauring Section Editor PLOS Pathogens Adam Lauring Section Editor PLOS Pathogens Kasturi Haldar Editor-in-Chief PLOS Pathogens orcid.org/0000-0001-5065-158X Michael Malim Editor-in-Chief PLOS Pathogens orcid.org/0000-0002-7699-2064 *********************** Thank you for submitting to PLOS Pathogens. This is a nice piece of work that both reviewers appreciated. Both reviewers have constructive criticisms, that if incorporated would make for a much improved manuscript. Reviewer 1 in particular details a few experiments to get at the issue of method accuracy. Addressing this in some way is important for this promising method and interesting results. Reviewer's Responses to Questions Part I - Summary Please use this section to discuss strengths/weaknesses of study, novelty/significance, general execution and scholarship. Reviewer #1: Ko et al. develop a new method for sequencing single virus genomes with long-read technology and apply it to a cell culture adapted virus and to 7 clinical SARS-CoV2 samples. The sequencing method developed in this maene">nuscript seems quite useful, with the clear benefit of enabling haplotype resolutioene">n aene">nd mutatioene">n linkage. The authors also present aene">n interesting case of within-host evolutioene">n that is correlated to aene">ntibody development. This is of particular interest now aene">nd provides a useful coene">ntributioene">n to the discussioene">n of the role of immuene">ne-compromised hosts to n class="Species">SARS-CoV2 evolution. Overall, I think that the authors conclusions and findings are reasonable and interesting, and that the method could be quite useful. However, some of the results are currently difficult to interpret without some additional controls to validate the sequencing method for its robustness across viral RNA input copies and total number of output single consensus genomes. I suggest some specific controls that I think would be useful for clarifying these points, as well as some points in the Methods that could be further clarified. Reviewer #2: Ko et al. provide a timely report about sequence diversity in the Spike gene of n class="Species">SARS-CoV-2. They analyze both virus that has been grown in tissue culture and virus isolated from the respiratory tract from infected people. There is a critical need to follow Spike protein evolution both within and between people as this could lead to new strains that could escape the anticipated protection that will be provided by the current vaccines. The authors improve on existing sequencing approaches by using an SGS approach but at higher per-specimen sampling than is typically done, and this is done over a relatively large region of the genome (wedding the deep part of "deep sequencing" with SGS accuracy for sampling individual genomes). What has seemed like an obvious approach of using a UMI with PacBio long read technology has been difficult due to the high error rate of PacBio which makes it hard to accurately identify the UMI. Overcoming this problem will be a welcome addition to the field. The number of long reads one can get from a sample is limited by the number of long cDNA molecules that RT can generate (i.e. the number of templates), in this case 6.1 kb. The relatively high titers of SARS-CoV-2 samples conveniently overcomes this limitation by providing a larger number of starting RNA templates. The paper does an elegant job of linking the antibody response to longitudinal changes in the Spike protein sequence and antibody escape in one case but a case that is very informative. This is an important piece of work. I have only a few technical questions. The biology in the paper is on mark addressing important questions. ********** Part II – Major Issues: Key Experiments Required for Acceptance Please use this section to detail the key new experiments or modifications of existing experiments that should be absolutely required to validate study conclusions. Generally, there should be no more than 3 such required experiments or major modifications for a "Major Revision" recommendation. If more than 3 experiments are necessary to validate the study conclusions, then you are encouraged to recommend "Reject". Reviewer #1: Major comments: 1. The authors document very reasonable and extensive bioinformatic controls for sequence quality, UMI coene">nsolidatioene">n, aene">nd coene">nsensus genome calling. However, the authors have not directly performed validatioene">n experiments that benchmark a. this method’s intrinsic error rate oene">n a sample with no variatioene">n or b. its ability to correctly characterize within-host haplotype frequencies across various Rn class="Gene">NA input quantities and read counts. In my opinion, the authors need to perform some additional validation experiments to address these concerns. To address point a, the authors should perform a simple validation experiment in which the authors sequence a clonal RNA transcript and report the number of detected haplotypes and their frequencies. This would provide a raw error rate of their method, and provide a more direct way of measuring the error rate of their RT and PCR protocols than by relying on the variant calling and error correction method described in the Methods. Suggestions for addressing point b are below in comments 2 and 3. 2. Given the known sensitivity of within-host variant detection to input RNA copies, the authors should report in a table somewhere in the main text the number of Rn class="Gene">NA copies in each sample that is sequenced. 3. In the authors sequencing results, there is a wide range of the number of total single-genome consensus sequences generated for each sample. While in some samples the number is reasonably high >1000), in patients 6 aene">nd 7 aene">nd n class="Species">patient 1 on days 13 and 17, the total number of single genomes is quite low. The sequencing results from these samples are quite difficult to interpret as is. It seems to me that greater single genome consensus sequences should provide higher resolution of within-host haplotype frequencies, and I would guess that samples with higher viral RNA contents generally produce more single genome consensus sequences. However, these points are not made explicit and the authors have not designated a cutoff for either a viral load necessary for this method to be successful or for a number of successfully sequenced single genome consensus sequences for this method to accurately represent the within-host diversity of the sample. These values will be quite important for readers who may be interested in applying this new method, and are also important for interpreting the results shown in the paper. I would suggest that following revisions and control experiments, or something that accomplishes similar goals: a. The authors should discuss in the text whether the number of single genome consensus sequences is correlated to the viral load of the original sample. A plot showing this relationship would be helpful. This is particularly important because the authors remark throughout the paper that their method can be used to characterize thousands of individual viral genomes, but they only ever report, at maximum, ~1200 single genomes for a given sample. b. The authors could perform some sort of serial dilution experiment in which a known variant haplotype is spiked into a wild type, cloene">nal Rn class="Gene">NA at some frequency, and then serial dilutions are generated and sequenced. This would provide important information about how to interpret haplotype frequencies derived from low copy number samples. I realize that this is bit more work, but it would provide information that would be quite useful. Alternatively, if the authors can perform some sort of experiment that clarifies the relationship between viral load and Method accuracy, that would be great. c. The authors should assess in some way how the total number of unique single genome consensus sequences impacts haplotype detection. This could be accomplished bioinformatically by randomly subsampling down one of their samples with >1000 single genome consensus sequences to varying degrees, and comparing the output. This sort of information would be really helpful for interpreting the haplotype results for the samples with very low numbers of single genome consensus sequences. While all three of these experiments may not be absolutely necessary, I do think that these questions should be addressed in some way in the revised version of the manuscript. Reviewer #2: Noene">ne ********** Part III – Minor Issues: Editorial and Data Presentation Modifications Please use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. Reviewer #1: Minor comments and points of clarification: 1. In the Methods, the authors report using 1000 cDNA input copies into the PCR reactioene">n. This point was not clear to me. The authors are able to generate >1000 uene">nique single genome coene">nsensus sequences for some samples, despite oene">nly using 1000 cDn class="Gene">NA input copies. How is this possible? Is this just pipetting error? Why did the authors choose to limit their input to 1000 cDNA copies? The authors state multiple times in the manuscript that their method is designed to sequence thousands of individual viral genomes, but it seems that by limiting the copy numbers that they are by definition restricting their sequencing to ~1000 copies. The authors should clarify this in the text. 2. If possible, I would find it very helpful to see a calculation of: given the number of input RNA molecules, what is the probability that 2 different Rn class="Gene">NAs get labelled with the same UMI? Alternatively, how much in excess is the UMI vs the RNA? Some sense for this would be helpful. 3. On lines 211-216, the authors discuss the amount of virus in patient 1 as comparable to other, acutely n class="Disease">infected patients, but do not discuss the duration of shedding. While the viral load is certainly important for within-host viral evolution, the duration of shedding is also critical given that it determines whether within-host viral replication is still occurring at the time of antibody development. The authors should acknowledge this and perhaps could cite some literature describing within-host evolution during prolonged infections. 4. The description of the node collapsing and adjacency method could be clarified a bit. It sounds to me like what is happening is that you connect UMIs that are separated by 1 aene">nd meet this number criteria aene">nd then you collapse them all into 1. Is that right? It is also not immediately clear to me what the difference between the knee point aene">nd inflectioene">n point is. The authors could add a sentence in the Methods to describe this for readers who may not be familiar with this. 5. It is not clear to me exactly when the variant calling correction is implemented. Is this done on the consensus sequences of each UMI bin? Or is this doene">ne oene">n the raw reads themselves? 6. On line 263: “Supernatants were transferred to separate tubes, and supernatant and pellet fractions were processed in parallel, although single-genome sequences derived from these two fractions were subsequently found to be similar and were thus pooled for each sample in the final analysis.” Do the authors mean that the pellets and supernatants were combined for one particular set of samples, or that the reads were combined later? Were pellets and supernatants initially separated to compare within-host variants in genomic RNA vs. cell-associated Rn class="Gene">NA? 7. On line 304, what is the “log rank of that UMI bin within the sample”? I had assumed initially that the log raene">nk was just aene">n ordered list of n class="Chemical">UMIs, ordered by read count, but was unsure. Also, is Figure S2 showing the results for the WA-1 BEI sample or some other sample? Reviewer #2: 1. The genomes appear to contain mostly single nucleotide polymorphisms with but maybe some examples of polymorphisms in two different genes. To get at the issue of recombination during PCR (which always happens) it would be good to be clear whether or not the linkage is perfect in the sequenced genomes or whether there has been any scrambling of the markers between genomes. Scrambling could have occurred in vivo or during PCR, but an absence of scrambling would indicate that it didn't happen during PCR. It still happens during PCR but the ability to build a consensus sequence for each template masks those events. 2. Since cDNA syene">nthesis has poor processivity it would be interesting to kene">now how maene">ny geene">nome sequeene">nces are obtained as a fuene">nctioene">n of copies of Rn class="Gene">NA used in the initial cDNA reaction, i.e. what is the efficiency of template utilization in this sequencing approach? 3. RT has a reported incorporation error rate of 1 in 10,000 nucleotides during cDNA syene">nthesis (I believe listed in the maene">nufacturers informatioene">n). This would meaene">n that every secoene">nd genome would have aene">n RT error incorporated. Iene">n this scenario a pool of identical Rn class="Gene">NA sequences would have the occasional difference due to RT during cDNA synthesis and still have its own UMI tag so artificial diversity. However, it seems that the authors are looking at homogeneous populations that defy this presumption. One way to address this question is to look at the same population twice. If the low level diversity is the result of RT error then it should not be reproducible. If it is real then it will repeat (given sufficient sampling). This is an important question both for the technique and for understanding true low level diversity in these viral populations. ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at . Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example see here on PLOS Biology: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5. Reproducibility: To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see 23 Feb 2021 Submitted filename: Ko_ResponseLetter_02222021corrected.docx Click here for additional data file. 28 Feb 2021 Dear Dr. Boritz, We are pleased to inform you that your manuscript 'High-Throughput, Single-Copy Sequencing Reveals SARS-CoV-2 Spike Variaene">nts Coincident with Mouene">nting Humoral Immuene">nity during Acute n class="Disease">COVID-19' has been provisionally accepted for publication in PLOS Pathogens. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated. IMPORTANT: The editorial review process is now complete. PLOS will oene">nly permit correctioene">ns to spelling, formatting or sigene">nificaene">nt scientific errors from this point oene">nwards. Requests for major chaene">nges, or aene">ny which affect the scientific uene">nderstaene">nding of your work, will cause delays to the publicatioene">n date of your maene">nuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Pathogens. Best regards, Adam S. Lauring Section Editor PLOS Pathogens Adam Lauring Section Editor PLOS Pathogens Kasturi Haldar Editor-in-Chief PLOS Pathogens orcid.org/0000-0001-5065-158X Michael Malim Editor-in-Chief PLOS Pathogens orcid.org/0000-0002-7699-2064 *********************************************************** Nice work! Thaene">nk you for submitting to PLOS Pathogens. Your attention to the reviewers' comments is much appreciated. Reviewer Comments (if any, and for reference): 11 Mar 2021 Dear Dr. Boritz, We are delighted to inform you that your manuscript, "High-Throughput, Single-Copy Sequencing Reveals SARS-CoV-2 Spike Variaene">nts Coincident with Mouene">nting Humoral Immuene">nity during Acute n class="Disease">COVID-19," has been formally accepted for publication in PLOS Pathogens. We have now passed your article onto the PLOS Production Department who will complete the rest of the pre-publication process. All authors will receive a confirmation email upon publication. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or type-setting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Note: Proofs for Froene">nt Matter articles (Pearls, Reviews, Opinioene">ns, etc...) are geene">nerated oene">n a differeene">nt schedule aene">nd may not be made available as quickly. Soon after your final files are uploaded, the early version of your manuscript, if you opted to have an early version of your article, will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Pathogens. Best regards, Kasturi Haldar Editor-in-Chief PLOS Pathogens orcid.org/0000-0001-5065-158X Michael Malim Editor-in-Chief PLOS Pathogens orcid.org/0000-0002-7699-2064

44 in total

1. Adaptation of tick-borne encephalitis virus to BHK-21 cells results in the formation of multiple heparan sulfate binding sites in the envelope protein and attenuation in vivo.

Authors: C W Mandl; H Kroschewski; S L Allison; R Kofler; H Holzmann; T Meixner; F X Heinz
Journal: J Virol Date: 2001-06 Impact factor: 5.103

2. Search and clustering orders of magnitude faster than BLAST.

Authors: Robert C Edgar
Journal: Bioinformatics Date: 2010-08-12 Impact factor: 6.937

3. Potent neutralizing antibodies against multiple epitopes on SARS-CoV-2 spike.

Authors: Lihong Liu; Pengfei Wang; Manoj S Nair; Jian Yu; Micah Rapp; Qian Wang; Yang Luo; Jasper F-W Chan; Vincent Sahi; Amir Figueroa; Xinzheng V Guo; Gabriele Cerutti; Jude Bimela; Jason Gorman; Tongqing Zhou; Zhiwei Chen; Kwok-Yung Yuen; Peter D Kwong; Joseph G Sodroski; Michael T Yin; Zizhang Sheng; Yaoxing Huang; Lawrence Shapiro; David D Ho
Journal: Nature Date: 2020-07-22 Impact factor: 49.962

4. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation.

Authors: Daniel Wrapp; Nianshuang Wang; Kizzmekia S Corbett; Jory A Goldsmith; Ching-Lin Hsieh; Olubukola Abiona; Barney S Graham; Jason S McLellan
Journal: Science Date: 2020-02-19 Impact factor: 47.728

5. Comparative Evaluation of Three Preprocessing Methods for Extraction and Detection of Influenza A Virus Nucleic Acids from Sputum.

Authors: Fei Yu; Ting Qiu; Ying Zeng; Yiyin Wang; Shufa Zheng; Xiao Chen; Yu Chen
Journal: Front Med (Lausanne) Date: 2018-03-02

6. Evolution of viral quasispecies during SARS-CoV-2 infection.

Authors: Aude Jary; Valentin Leducq; Isabelle Malet; Stéphane Marot; Elise Klement-Frutos; Elisa Teyssou; Cathia Soulié; Basma Abdi; Marc Wirden; Valérie Pourcher; Eric Caumes; Vincent Calvez; Sonia Burrel; Anne-Geneviève Marcelin; David Boutolleau
Journal: Clin Microbiol Infect Date: 2020-07-24 Impact factor: 13.310

7. SARS-CoV-2 exhibits intra-host genomic plasticity and low-frequency polymorphic quasispecies.

Authors: Timokratis Karamitros; Gethsimani Papadopoulou; Maria Bousali; Anastasios Mexias; Sotirios Tsiodras; Andreas Mentis
Journal: J Clin Virol Date: 2020-08-11 Impact factor: 3.168

8. Viral dynamics in mild and severe cases of COVID-19.

Authors: Yang Liu; Li-Meng Yan; Lagen Wan; Tian-Xin Xiang; Aiping Le; Jia-Ming Liu; Malik Peiris; Leo L M Poon; Wei Zhang
Journal: Lancet Infect Dis Date: 2020-03-19 Impact factor: 25.071

9. Structure-Based Design with Tag-Based Purification and In-Process Biotinylation Enable Streamlined Development of SARS-CoV-2 Spike Molecular Probes.

Authors: Tongqing Zhou; I-Ting Teng; Adam S Olia; Gabriele Cerutti; Jason Gorman; Alexandra Nazzari; Wei Shi; Yaroslav Tsybovsky; Lingshu Wang; Shuishu Wang; Baoshan Zhang; Yi Zhang; Phinikoula S Katsamba; Yuliya Petrova; Bailey B Banach; Ahmed S Fahad; Lihong Liu; Sheila N Lopez Acevedo; Bharat Madan; Matheus Oliveira de Souza; Xiaoli Pan; Pengfei Wang; Jacy R Wolfe; Michael Yin; David D Ho; Emily Phung; Anthony DiPiazza; Lauren A Chang; Olubukola M Abiona; Kizzmekia S Corbett; Brandon J DeKosky; Barney S Graham; John R Mascola; John Misasi; Tracy Ruckwardt; Nancy J Sullivan; Lawrence Shapiro; Peter D Kwong
Journal: Cell Rep Date: 2020-10-12 Impact factor: 9.423

10. A SARS-CoV-2 vaccine candidate would likely match all currently circulating variants.

Authors: Bethany Dearlove; Eric Lewitus; Hongjun Bai; Yifan Li; Daniel B Reeves; M Gordon Joyce; Paul T Scott; Mihret F Amare; Sandhya Vasan; Nelson L Michael; Kayvon Modjarrad; Morgane Rolland
Journal: Proc Natl Acad Sci U S A Date: 2020-08-31 Impact factor: 12.779

10 in total

1. Structural basis for potent antibody neutralization of SARS-CoV-2 variants including B.1.1.529.

Authors: Tongqing Zhou; Lingshu Wang; John Misasi; John R Mascola; Nancy J Sullivan; Peter D Kwong; Amarendra Pegu; Yi Zhang; Darcy R Harris; Adam S Olia; Chloe Adrienna Talana; Eun Sung Yang; Man Chen; Misook Choe; Wei Shi; I-Ting Teng; Adrian Creanga; Claudia Jenkins; Kwanyee Leung; Tracy Liu; Erik-Stephane D Stancofski; Tyler Stephens; Baoshan Zhang; Yaroslav Tsybovsky; Barney S Graham
Journal: Science Date: 2022-04-22 Impact factor: 63.714

Review 2. Passive Immunity Should and Will Work for COVID-19 for Some Patients.

Authors: Nevio Cimolai
Journal: Clin Hematol Int Date: 2021-04-16

3. Spread of Gamma (P.1) Sub-Lineages Carrying Spike Mutations Close to the Furin Cleavage Site and Deletions in the N-Terminal Domain Drives Ongoing Transmission of SARS-CoV-2 in Amazonas, Brazil.

Authors: Felipe Gomes Naveca; Valdinete Nascimento; Victor Souza; André de Lima Corado; Fernanda Nascimento; George Silva; Matilde Contreras Mejía; Maria Júlia Brandão; Ágatha Costa; Débora Duarte; Karina Pessoa; Michele Jesus; Luciana Gonçalves; Cristiano Fernandes; Tirza Mattos; Ligia Abdalla; João Hugo Santos; Alex Martins; Fabiola Mendonça Chui; Fernando Fonseca Val; Gisely Cardoso de Melo; Mariana Simão Xavier; Vanderson de Souza Sampaio; Maria Paula Mourão; Marcus Vinícius Lacerda; Érika Lopes Rocha Batista; Alessandro Leonardo Álvares Magalhães; Nathânia Dábilla; Lucas Carlos Gomes Pereira; Fernando Vinhal; Fabio Miyajima; Fernando Braga Stehling Dias; Eduardo Ruback Dos Santos; Danilo Coêlho; Matheus Ferraz; Roberto Lins; Gabriel Luz Wallau; Edson Delatorre; Tiago Gräf; Marilda Mendonça Siqueira; Paola Cristina Resende; Gonzalo Bello
Journal: Microbiol Spectr Date: 2022-02-23

4. Designing AbhiSCoVac - A single potential vaccine for all 'corona culprits': Immunoinformatics and immune simulation approaches.

Authors: Abhigyan Choudhury; Parth Sarthi Sen Gupta; Saroj Kumar Panda; Malay Kumar Rana; Suprabhat Mukherjee
Journal: J Mol Liq Date: 2022-01-31 Impact factor: 6.165

5. Prediction of SARS-CoV-2 Variant Lineages Using the S1-Encoding Region Sequence Obtained by PacBio Single-Molecule Real-Time Sequencing.

Authors: Sébastien Lhomme; Justine Latour; Nicolas Jeanne; Pauline Trémeaux; Noémie Ranger; Marion Migueres; Gérald Salin; Cécile Donnadieu; Jacques Izopet
Journal: Viruses Date: 2021-12-18 Impact factor: 5.048

Review 6. Antigenic drift: Understanding COVID-19.

Authors: Jonathan W Yewdell
Journal: Immunity Date: 2021-12-14 Impact factor: 31.745

7. Intrahost SARS-CoV-2 k-mer identification method (iSKIM) for rapid detection of mutations of concern reveals emergence of global mutation patterns.

Authors: Ashley Thommana; Migun Shakya; Jaykumar Gandhi; Christian K Fung; Patrick S G Chain; Irina Maljkovic Berry; Matthew A Conte
Journal: bioRxiv Date: 2022-08-16

8. Whole-genome sequencing of SARS-CoV-2: Comparison of target capture and amplicon single molecule real-time sequencing protocols.

Authors: Florence Nicot; Pauline Trémeaux; Justine Latour; Nicolas Jeanne; Noémie Ranger; Stéphanie Raymond; Chloé Dimeglio; Gérald Salin; Cécile Donnadieu; Jacques Izopet
Journal: J Med Virol Date: 2022-09-03 Impact factor: 20.693

Review 9. Long-Reads-Based Metagenomics in Clinical Diagnosis With a Special Focus on Fungal Infections.

Authors: Minh Thuy Vi Hoang; Laszlo Irinyi; Yiheng Hu; Benjamin Schwessinger; Wieland Meyer
Journal: Front Microbiol Date: 2022-01-06 Impact factor: 5.640

10. Intra-Host SARS-CoV-2 Evolution in the Gut of Mucosally-Infected Chlorocebus aethiops (African Green Monkeys).

Authors: Lori A Rowe; Brandon J Beddingfield; Kelly Goff; Stephanie Z Killeen; Nicole R Chirichella; Alexandra Melton; Chad J Roy; Nicholas J Maness
Journal: Viruses Date: 2022-01-01 Impact factor: 5.048

10 in total