Literature DB >> 36194597

Phylodynamics of a regional SARS-CoV-2 rapid spreading event in Colorado in late 2020.

Kristen J Wade1, Samantha Tisa1, Chloe Barrington1, Jesslyn C Henriksen1, Kristy R Crooks2, Christopher R Gignoux2, Austin T Almand3, J Jordan Steel3, John C Sitko3, Joseph W Rohrer2, Douglas P Wickert3, Erin A Almand3, David D Pollock1, Olivia S Rissland1.   

Abstract

Since the initial reported discovery of SARS-CoV-2 in late 2019, genomic surveillance has been an important tool to understand its transmission and evolution. Here, we sought to describe the underlying regional phylodynamics before and during a rapid spreading event that was documented by surveillance protocols of the United States Air Force Academy (USAFA) in late October-November of 2020. We used replicate long-read sequencing on Colorado SARS-CoV-2 genomes collected July through November 2020 at the University of Colorado Anschutz Medical campus in Aurora and the United States Air Force Academy in Colorado Springs. Replicate sequencing allowed rigorous validation of variation and placement in a phylogenetic relatedness network. We focus on describing the phylodynamics of a lineage that likely originated in the local Colorado Springs community and expanded rapidly over the course of two months in an outbreak within the well-controlled environment of the United States Air Force Academy. Divergence estimates from sampling dates indicate that the SARS-CoV-2 lineage associated with this rapid expansion event originated in late October 2020. These results are in agreement with transmission pathways inferred by the United States Air Force Academy, and provide a window into the evolutionary process and transmission dynamics of a potentially dangerous but ultimately contained variant.

Entities:  

Mesh:

Year:  2022        PMID: 36194597      PMCID: PMC9531818          DOI: 10.1371/journal.pone.0274050

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.752


Introduction

The COVID-19 pandemic caused by SARS-CoV-2 has resulted in worldwide disruption and more than 6.4 million recorded deaths [Worldometers, July 28, 2022]. Sequencing the SARS-CoV-2 genome from infected individuals is an effective means to track the dispersion and prevalence of the virus. Genomic tracking allows researchers to model viral evolution and identify possible variants of concern. For example, sequencing provided strong evidence that the virus was transmitted locally within Washington state as early as January 2020 [1]. Similarly, sequencing allowed public health officials to track the rise and spread of the highly infectious Delta variant, enabling more responsive policies [2]. These sequencing efforts provide even greater power when coupled with viral evolutionary modelling (phylodynamics) in an epidemiological context [3, 4]. This type of combined approach for tracking and predicting viral transmission is known as genomic surveillance and is a critical component of the modern public health response to viral epidemics [2, 5, 6]. Due to the nature of the pandemic, many grass-roots sequencing efforts sprang up de novo around the world, leading to heterogeneity in sequencing quality and inconsistency in geographic sampling. For example, long-read Oxford Nanopore technology (ONT) with overlapping polymerase chain reaction (PCR) amplicons is commonly used to obtain sequences [7]. Long-read ONT has many advantages over short-read sequencing, including that it avoids having to connect short reads that fall away before making it through amplicons and avoids some deterministic errors of other technologies [8], but its high error rate poses other challenges including high false positive and false negative variant calls when comparing to a single reference sequence [8-11]. Because accumulated mutations are key data for inference of phylogenetics, convergence, and selected variants, it is important to be as confident as possible in mutational signatures. There has also been heterogeneity in sequencing between different areas. For instance, analyzing the GISAID dataset [12, 13], there was little sequencing of Colorado genomes for much of 2020, making it challenging to understand the landscape of SARS-CoV-2 variant origins and evolution of local transmission during this phase of the pandemic. Finally, there also are conceptual challenges for this type of genomic surveillance due to incomplete knowledge of the etiology of epidemics, including stochastic environmental effects, sociological response, and phenotypic variance [14]. In the case study presented here, we focus on providing a snapshot of infection dynamics in Colorado in August to November 2020. Sequence data was collected as RNA isolated from infected individuals sampled from two Colorado populations: samples collected for clinical testing at the University of Colorado (CU) Anschutz Medical Campus; and samples gathered predominantly from asymptomatic, randomly sampled individuals at the United States Air Force Academy (USAFA) who tested positive by PCR testing. Together, these samples represent a baseline for the two largest Colorado cities (Denver/Aurora and Colorado Springs) in late 2020 and document the rapid initial spread and subsequent containment of a highly-evolved variant. We used replicate sequencing to validate variants across multiple related genomes rather than relying on only a single distant reference sequence to validate variant calls (Fig 1). Having established a credible set of high confidence SARS-CoV-2 genomes, we then describe the phylodynamics of a rapid viral spread event within a relatively controlled environment, estimate its divergence from related samples and compare to its associated epidemiological data [15].
Fig 1

Flow chart of SARS-CoV-2 sequencing, variant prediction, variant quality analysis and relatedness network generation.

Blue boxes describe the purpose of each stage of analysis, free text designates the specific method used at each stage and orange arrows indicate points where steps were repeated iteratively until sufficient quality metrics were fulfilled.

Flow chart of SARS-CoV-2 sequencing, variant prediction, variant quality analysis and relatedness network generation.

Blue boxes describe the purpose of each stage of analysis, free text designates the specific method used at each stage and orange arrows indicate points where steps were repeated iteratively until sufficient quality metrics were fulfilled.

Materials and methods

Ethics statement

The United States Air Force Academy Institutional Review Board (IRB) determined the surveillance testing (FAC20200035E) was approved as Not Human Subjects Research in accordance with Title 32, Subtitle A, Chapter I, Subchapter M, Part 219: Protection of Human Subjects, Department of Defense Instruction 3216.02: Protection of Human Subjects and Adherence to Ethical Standards in Department of Defense-Conducted and–Supported Research and Air Force Instruction 40–402: Protection of Human Subjects in Biomedical and Behavioral Research.

Public release

This manuscript has been approved for public release: PA number: USAFA-DF-2022-648

Disclaimers

The views expressed in this paper are those of the authors and do not necessarily represent the official position or policy of the U.S. Government, the Department of Defense, or the Department of the Air Force.Use of the University of Colorado Biobank samples for this study was reviewed by the COVID-19 Biobanking Committee on May 20, 2020 and approved May 22, 2020 in letter communicated by Matthew J. Steinbeiss, Special Projects Manager, Office of Regulatory Compliance, University of Colorado Denver, Anschutz Medical Campus, and signed by Thomas Flaig, Vice Chancellor for Research, University of Colorado Denver, Anschutz Medical Campus. All samples were processed in the Rissland laboratory under IBC#1366. We were advised by Taylor Brumbelow (Human Research Protections, COMIRB Investigator Support, University of Colorado Denver, Anschutz Medical Campus, comirb@ucdenver.edu) on May 19, 2020, that our research qualifies as “non-human subjects research” because we did not have access to or use identifiers such as exact dates of service or admission, county, or zip code.

Nanopore long-read sequencing

Extracted RNA was obtained from either the University of Colorado BioBank (CU) or USAFA for samples collected between August and November 2020. Sequencing was performed according to the Nanopore Protocol for PCR tiling of SARS-CoV-2 (revision E, released Feb 6 2020) using the V3 primers (https://github.com/artic-network/artic-ncov2019/tree/master/primer_schemes/nCoV-2019/V3). 11 μL RNA was used for reverse transcription and initial amplicon PCR. Samples were processed in random order, and each was sequenced in at least two replicates. After the PCR and bead clean-up, samples were run on a 1.5% agarose gel, and those with visible bands at 400 bp were quantified using a Qubit fluorometer. Samples were end-prepped, barcoded and pooled together for sequencing. Samples were quantified using a Qubit fluorometer. The samples were then processed for downstream sequencing and analysis according to the Nanopore protocol. Sequencing was performed using R9.4.1 (FLO-MIN106D) Nanopore flow cell. Half of the prepared DNA library (7.5 ul) was diluted to a total volume of 12 uL prior to loading. A minimum of 40,000 reads was collected per barcoded sample. Samples with fewer than 40,000 reads were re-sequenced in later runs. Adapter sequences and primers were removed from reads with the Nanopore sequencing software MinKNOW. Genomes were sequenced in batches. A negative control (water) and a positive control (SARS-CoV-2 RNA from ATCC) were included in each batch of samples. At least two sequencing replicates were performed for each sample to confirm that variants were reproducible; in some cases, especially for low concentration samples, samples were sequenced three or four times to improve call certainty. In later runs, to add further robustness against possible experimental artifacts, the two sequencing replicates were separated, and the order of samples was randomized within a batch.

Read alignment to the Wuhan reference genome

Following sequencing, reads for each barcode [each barcode corresponding to an individual sample] in fastq format were aligned to the reference Wuhan SARS-CoV-2 genome (NC_045512.2) using mimimap2 [16]. Minimap2 parameters were run as follows for each barcode: -a–x splice–uf–k14 –secondary = no NC_045512.2.fasta. The resulting aligned reads were output into sam format and samtools [17, 18] was used to generate a binary alignment map (bam). BLAST was performed to verify that primers had been trimmed from read ends. Summary statistics describing read length, error rate, total number of reads, number of mapped reads, and other quality metrics were generated using samtools stats. Further sequencing quality assessment was performed with bedtools genomecov [19]. For each barcode, a coverage histogram (-ibam), coverage map across consecutive intervals (-ibam–bga) and coverage map at each single nucleotide (-ibam–d) were generated. Finally, an R package (minionCovidCoverage.R) was used to visualize the genome-wide coverage distribution for each barcode individually. A custom python script (findLowQualBases.py) was used to obtain coordinates of low-quality genome sequence and stored in a bed file to mask the low-quality regions from the consensus sequence at a later step. Steps described here were called from a pipeline wrapper script, runCovidSeqs.sh (Code available at: ).

Sequencing quality assessment

Using the aforementioned coverage and read information, each individual genome included in further analysis was required to meet a series of sequencing quality thresholds. Each successfully sequenced genome had: A total read count greater than or equal to 40,000 reads. An error rate less than or equal to 11% Greater than 95% of reads mapped to the reference genome Low quality sequence content less than 5% of the genome length Low-coverage (< = 30X) sequence at the 5’ and 3’ end of the genome that did not exceed two hundred nucleotides in length on each end. Any samples that failed to meet these thresholds were re-sequenced. If genome quality did not meet the above standards upon resequencing, the genome was excluded from analysis.

Prediction of mutations from the reference genome

Variant likelihoods at each position were generated from the bam files of high-quality samples using the mpileup package of bcftools-1.11 [18], with the following parameters: -oU–d 200000. We then used bcftools call to make the variant calls, with respect to the reference genome (NC_045512.2), under the following setting:—ploidy 1 –vm–Oz. Variants assigned a ‘QUAL’ quality score <50 (ranging from 0–225) were excluded, but stored separately should revision be required. Variants were stored in variant call format (vcf) and variant calls were mapped to the reference genome sequence (NC_045512.2) with bcftools consensus. Following previous publications [20, 21], positions with coverage less than or equal to 30X were masked from each genome, or marked as ‘N’ in the consensus genome sequence using bedtools maskfasta. Therefore, no variants were called at these regions. This process generated a putative consensus sequence for each SARS-CoV-2 sample replicate individually, which represented its predicted mutations from the Wuhan reference sequence.

Additional quality control and generation of concordant genome set

Following the variant prediction process, we further evaluated each replicate’s variant predictions for the following quality criteria: If a predicted variant position fell within 200 bases from either the 5’ or the 3’ end of the genome, it was excluded due to consistently poor coverage in those regions. Any variants that occurred in stretches of low-quality score, extending across more than one adjacent or nearby position, were excluded as a likely artifact. Putative mutation events at 28881–28883 were excluded, as this region is known to frequently recur as post in-situ (PCR or sequencing) [22-26] Following these additional quality control steps, we evaluated each genome’s set of replicates for consistency in variant calls. Genomes in which all replicates were predicted with the same set of variants were considered concordant. However, there were instances of inconsistent variant calls between replicates of the same SARS-CoV-2 genome. This subset was described as discordant and set aside for later evaluation. This was done as suggested by Robasky et al., 2014.

Assessment of possible primer match-derived artifacts

To evaluate if primer error had contributed to recurring variant artifacts, primer sequences were mapped to all reads of a given barcode. The position of the primer match, relative to each read, was tracked and a distribution was created of all primer match positions across all reads (S1 Fig). The average matching read position of each primer was evaluated to assess whether any primers were enriched for mapping to ends of reads, or were distributed randomly as expected.

Inference of SARS-CoV-2 ancestral relatedness network

Plausible common ancestors for all 28 concordant genomes in the CU and USAFA sequences were independently inferred based on mutations in our dataset that deviate from the reference genome NC_045512.2. To better understand the larger, continental context of our sampling of genomes, plausible ancestors of the CU and USAFA genomes were qualitatively compared to the NextStrain “ncov” database sample of 3983 genomes accessed on May 11, 2021 [22]. Of these, 3923 were submitted in North America between March 5, 2020 and May 10, 2021, and will be referred to as North American NextStrain (NANS). For each mutation event in our newly inferred genomes, the estimated NANS frequency was considered at four different time points: August 14, 2020; November 14, 2020; February 13, 2021; and May 5, 2021. Each mutation event was also associated with one of the eleven major NextStrain clades based on the ancestral context in which it first appeared. This was done to provide a reference point for understanding where the Colorado genomes fit in context, relative to the viral dynamics across the continent at the time. Although NextStrain calls its groupings “clades”, meaning they are collections of all viruses inferred to be descended from a single common ancestor, their tracking and labelling system confusingly goes against common usage of the word ‘clade’ by removing named clades from within larger clades, leading to a paraphyletic naming convention. Their naming conventions also do not track sequential origins of clades within clades, and they do not include many of the CO sequences. We chose to explicitly track and label inferred ancestors and their relationships to CO descendants. Having obtained a continental, phylogenetic context for where our Colorado samples likely originate, we then inferred ancestral lineages and relatedness within our SARS-CoV-2 dataset. Nodes connecting each CU and USAFA genome sequence to their most plausible direct ancestor were assigned by grouping shared sets of mutations between our genomes. Plausible ancestral genomes were reverse-inferred in a parsimonious approach by working backwards from our genome dataset, and assigning mutations to ancestral nodes based on shared sets of mutations. Due to the rarity of variants compared to the length of the genome and the short time range of the closely related sequences in this study, back mutations and multiple mutation events at a single site are not likely and were not considered. Branching events separating these inferred, plausible ancestors were provisionally assigned based on differences in mutation content. These plausible common ancestors represent plausible branching points at which novel mutations appeared in the genomes included in our dataset. Non-ancestral mutation events were inferred if they occurred in only one genome across our sampled dataset and did not correspond to any known mutation already in the NANS database. These were assigned to the branch leading to the individual sequences in which they occurred. This process was used to infer a plausible ancestral relatedness network among SARS-CoV-2 samples in our dataset, and are used to organize discussion of these sequences. These plausible ancestral nodes were labelled A1-A13.

Use of relatedness network to resolve incongruent relatedness

The approximate relatedness among genomes was used to interpret variant calls in the discordant variant set from above, noting that all such variants were generally rare or absent from the worldwide sequences available at that time. Each genome was fitted to the existing network, based on its variant set with and without the discordant variants. We first considered that a variant that was discordant among replicates of a sample but had no effect on the structure of the network was considered a valid variant call. Most cases were of this nature because most discordant variants were supported by multiple calls in related genomes, and were presumed to be false negatives in some replicates. We disallowed any discordant variant that altered the relatedness network, conservatively presuming that it was a possible false positive variant and not well supported. We discuss removal of variants in further detail in the results. We ended with a complete set of confident variants for each genome that resolved discordancies among replicates so that they did not impact inferred relatedness networks among the SARS-CoV-2 genomes in our dataset and had no impact on our results. Finalized genomes were aligned with Mugsy v1r2.2 [27]. IQTree web server [28] was then used to generate a phylogenetic tree, using the settings: -s covidGenomes22Ref.fasta -st DNA -m JC+F -bb 1000 -alrt 1000 -o NC_045512.2. This tree was then visualized using FigTree v1.4.4.

Estimation of CO Springs lineage divergence time

The Bayesian Evolutionary Analysis Sampling Trees (BEAST) suite, version 1.10.4, was used to generate evolutionary divergence estimates from the USAFA SARS-CoV-2 genomes and their respective sampling dates between August 9, 2020 and November 19, 2020 [29-32]. Sampling dates were represented as the number of days since December 26, 2020, or the date associated with the Wuhan reference genome sequence. The evolutionary model was designed in BEAUti, using the following parameters: Exponential growth model, strict clock, HKY substitution model-estimated base frequencies with a 4-category gamma site heterogeneity model. Default priors were used, with the exception of the coalescent population size parameter, which was set to a lognormal distribution, mu = 1, sigma = 10. Operators were set to auto-optimized parameters. The control file containing these model parameters (covidGenomes22Ref_2_allAFonly_2m_exp_model.XML, SDataset 1) was analyzed with BEAST. The MCMC analysis was run for 20 million generations, with a burn-in of 2 million. The resulting posterior distributions were visualized and assessed in Tracer, version 1.7.2. TreeAnnotator was used to summarize the set of BEAST-predicted trees onto a single, target tree. A highest posterior density (HPD) of 95% was chosen for the credible interval of divergence estimates at each node across all trees generated in the analysis. The resulting tree annotation was visualized using FigTree software.

Comparison to USAFA contact tracing and high-level contact tracing

A large number of the USAFA samples are descended from a series of closely related plausible common ancestors beginning with A12 that are mostly closely related to ancestor A3 (corresponding to NextStrain cluster 20G) but separated by nine mutation events mostly not seen elsewhere in the NextStrain database. This branch of the network appears to have been introduced into USAFA from contact with the local Colorado Springs community [15], and we label it the Colorado Springs variant. The network relationships and identities among sequences were compared to USAFA contact tracing information (under USAFA IRB FAC20200035E) to examine consistency between our inferred genomic ancestral relationships and what was known about the contact path through the USAFA. The network was also used to examine whether the CO Springs variant arose from multiple community transfers or a single community transfer followed by spreading within the USAFA.

Results

Overview of SARS-CoV-2 sequencing and variant calling

We obtained 44 nearly-complete SARS-CoV-2 genomic sequences using RNA collected from anonymized individuals in two Colorado populations. The first was collected in August 2020 by the Colorado Center for Personalized Medicine Biobank at the University of Colorado (CU). The second was collected by the USAFA in September to November 2020. The CU samples were predominantly derived from clinical samples from Denver and Aurora obtained at the University of Colorado hospital, while the USAFA samples were primarily obtained from asymptomatic, randomly sampled cadets. In the case of the USAFA samples, because the testing and quarantine protocols established at the beginning of the school year resulted in a population free from COVID-19 when classes started in August [15], subsequent infections almost certainly originated from contact with the local Colorado Springs community. We followed the ARTIC protocol to produce overlapping short ~450 bp PCR-amplified segments (amplicons), using Minion sequencing and the Oxford Nanopore Technology (ONT) SARS-CoV-2 pipeline. Differences from the Wuhan reference genome sequence (NC_045512.2) were called as described in the methods to predict mutation events in the CU and USAFA genomes. We attempted to sequence 33 CU samples and 68 USAFA samples. Ten CU and 41 USAFA samples did not amplify well enough to sequence, and five USAFA samples did not sequence well despite amplification. Ultimately, each of the 44 SARS-CoV-2 genomes were fully and reliably sequenced at least twice (S1 and S2 Tables).

Quality control process filters out likely sequencing artifacts

Prior to evaluating genomes for discordancy, we initially filtered out predicted variants that appeared to be sequencing artifacts, or occurred in regions of low sequence quality, as described in the methods (S3 Table). First, we excluded variants at positions 28882–28883 based on previous knowledge that these were problematic mutations [25], although they would not have had a substantial impact on the final phylogenetic network. Next, a series of polymorphisms from positions 19299 to 19550 were observed in several genomes but were consistently of low quality and not well replicated. Therefore these were excluded, even when the variant met the quality threshold (S3 Table). Similarly, a pair of adjacent mutations at positions 24389–24390 were called in consensus sequences from ten CU and USAFA samples but were consistently replicated in only two samples. A further eight mutations were called in only one replicate per sample, and at generally lower-quality scores. Based on these incompatibilities, we excluded these mutation events from all genomes (S3 Table).

Discordant variation identified via replicate sequencing

Following quality filtering of variant predictions (S3 Table), the majority of variant calls generated by this analysis were found to be in agreement between replicates (concordant). However, we discovered several instances of incongruent variant calls between replicates of the same genome sample. These were described as discordant, according to Robasky et al., 2014. At least one discordant variant call was observed in 14 of 44 SARS-CoV-2 genomes (S4 Table). In these instances, certain variants were assigned a high enough likelihood score to pass quality filtering (QUAL>50) in at least one replicate, but were not scored above this threshold consistently across replicates, making them possible false positives. These calls were the results of using standard, widely accepted analysis pipelines and quality filters. This finding further underscores the importance of replicate sequencing protocols, even in the context of long-read sequencing. If a replicate sequencing approach had not been followed, these variants would have been summarily excluded as false positives [11]. Therefore, we investigated further to assess the validity of these variants.

Relatedness networks can resolve discordant variant calls and address putative false positives

We identified all plausible ancestral sequences in the Colorado phylogenetic network based on all observed different combinations of shared differences from the Wuhan reference sequence. This resulted in 13 inferred ancestors, which we label A1 to A13 for purposes of discussion (Figs 2 and 3, S2 Fig). Using the inferred ancestral relationships between concordant genomes (relatedness network), we then critically evaluated the impact of incorporating each of the discordant genomes individually, based on how it’s variant set impacted the structure of phylogenetic relationships between genomes. If the inclusion of the putative false positive variants was in agreement with the shared sets of ancestral mutations in related genomes, the variant was retained. If, however, inclusion of a discordant variant created a set of mutations that were incongruent with the shared ancestral variation of related genomes, the variant was excluded as a false positive. Through this approach, we were able to confidently resolve 16 discordant, putative false positives as confident true positives to include in our genome annotations (S4 Table). In a less rigorous process, these mutations would have been rejected from their respective genomes. As a result of this analysis, only one discordant variant, at position 13094, was excluded as a likely false positive because it occurred sporadically on the relatedness network, suggesting it was not a result of a shared ancestral mutation. Together, these results underscore the importance of replicating sample sequencing to improve confidence in results from this amplicon approach [11].
Fig 2

Major phylogroup clustering of the CUAF SARS-CoV-2 genomes.

Tip labels indicate how many CU Anschutz (CU) and USAFA (AF) genomes are associated with ancestor. Color shading indicates which main phylogroup genomes belong to, A1-A3, corresponding to NextStrain clades 20A, 20C and 20G, respectively.

Fig 3

Closely related subset of Air Force samples suggests a novel, rapidly transmitting strain with low average mutation rate, the “Colorado Springs Variant” (A12-A13).

Numbers at tips indicate average number of mutations per genome in each ancestral grouping.

Major phylogroup clustering of the CUAF SARS-CoV-2 genomes.

Tip labels indicate how many CU Anschutz (CU) and USAFA (AF) genomes are associated with ancestor. Color shading indicates which main phylogroup genomes belong to, A1-A3, corresponding to NextStrain clades 20A, 20C and 20G, respectively.

Closely related subset of Air Force samples suggests a novel, rapidly transmitting strain with low average mutation rate, the “Colorado Springs Variant” (A12-A13).

Numbers at tips indicate average number of mutations per genome in each ancestral grouping.

Relatedness networks can also reveal false negatives

Comparing shared variant sets between ancestrally related SARS-CoV-2 genomes also uncovered instances where variants that probably should have been present were excluded due to low quality scores across all replicates. Under the original codified criteria for rejecting variants due to low quality scores, genomes Z, R and I had calls at some sites that were in disagreement with the relatedness network, in the sense that there would necessarily had to have been reversions at these sites to make the calls compatible with the network (S5 Table). However, these variants were observed in the original analysis but filtered due to the quality score cutoff. In another instance, genomes AR, AQ and AJ appeared to be missing calls for a mutation that probably should have been present, based on mutation content of closely related genomes of ancestor A3. However, this mutation, at position 27964, was called with very low-quality genotype scores in many other genomes (S5 Table), and we inferred that the site in question was in a genomic region that regularly exhibited low-coverage and low-quality/inconsistent variant calls across most genomes in our sampling. This is consistent with poor amplicon amplification in some of these genomes, and we avoid making claims based on the presence or absence of this variant. Finally, this approach allowed us to identify two instances of chance congruent mutation events (S6 Table). The mutation at position 14187 in genome J is shared with the descendants of ancestral node A12, although J is a descendant of node A4, which is highly divergent from A12. Additionally, we identified that a mutation at position 24904 occurs in both genomes J and G. However, these genomes are separated by two ancestral nodes, and this mutation is not found in any other related genomes, so we conclude that these two mutations are chance convergent events. The remaining variants are well replicated congruent with the parsimonious phylogenetic network, and represent high-confidence sequences suitable for in-depth evolutionary analysis of SARS-CoV-2 infections in Colorado (S7 Table, S1 Dataset).

Broad phylogenetic structure of the CU and USAFA genomes

We characterized the ancestral variant composition of each genome, relative to our dataset. We jointly analyzed the entire set of Colorado sequences and placed their plausible common ancestors on a phylogenetic network. All newly sequenced Colorado genomes appear to be descended from what we label the A1 ancestor, which is the likely ancestor of NextStrain’s clade 20A. In other words, they all share four mutation events in common that separate them from the Wuhan reference sequence, at sites 3037, 14408, 241, and 23403. This result is not surprising, as this ancestor contains two amino acid-altering mutations in the RdRp and Spike proteins, which confer a competitive advantage over previous variants since its origin in January 2020 [33]. Based on analysis using NextStrain [22], this variant’s early origin and competitive advantage over the original virus caused its descendants to represent 99% of genome sequences throughout the world as of August 2020, when the first of our samples were collected (S3 Fig). 15 of the 21 CU sequences and 75% of the USAFA sequences are descended from A2, which is itself a descendant of A1 and is the likely ancestor of NextStrain’s clade 20C, estimated to have originated in April 2020. A2 differs from A1 by two mutations, at sites 25563 and 1059, and rose to a highest frequency of 43% of sequenced genomes in North America by February, 2021. Thus, for some time 20C appeared well on its way to becoming the dominant variant in North America, and may have been at a competitive advantage compared to other early descendants of A1, but has since lost ground to the well-documented Greek-letter variants (Alpha, Beta, Delta, etc.) since then [34-36] (S4 Fig). Because the Colorado A1 and A2 descendants (other than the Colorado Springs variant) are not highly clustered, the higher frequency of A2 in both CU and USAFA samples may indicate that the A2 variant was even more successful in Colorado than in the rest of North America. In our network, we define all differences that can be parsimoniously mapped to branches of the network to be inferred lineage-defining mutations. There were 41 lineage-defining mutations and 135 novel mutations that were confidently identified from the network of 44 genomes that were sequenced (Table 1, S7 Table). Of the 41 lineage-defining mutations, 24 of these occurred in the ORF1AB gene, five in the S gene, five in the ORF3A gene, one in the ORF8 gene, and six in the N gene. 18 have documented amino acid substitutions already in the NextStrain database.
Table 1

Lineage-defining mutations in a set of 44 Colorado SARS-CoV-2 genomes.

Mutation frequency in US [NextStrain]
Position1ORFREF2ALT2Amino acid change3August 2020 [8–14]November 2020 [11–14]February 2021 [2–13]May 2021 [5–5]Ancestor introduced4NextStrain clade
241ORF1ACT <<1%<<1%<<1%<<1%A120A
829ORF1ACT<<1%<<1%<<1%<<1%A13NA
1059*ORF1ACTT265I25%36%27%12%A220C
1927ORF1ATC<<1%<<1%<<1%<<1%A11NA
2668ORF1ACT<<1%<<1%<<1%<<1%A7NA
3037ORF1ACT<<1%<<1%<<1%<<1%A120A
4021ORF1ACT<<1%<<1%<<1%<<1%A12NA
7006ORF1ACT<<1%<<1%<<1%<<1%A8NA
7086ORF1ACT<<1%<<1%<<1%<<1%A12sporadic5
10319*ORF1ACTL3352F5%26%11%<1%A320G
11824ORF1ACT<<1%<<1%<<1%<<1%A9NA
12295ORF1ACT<<1%<<1%<<1%<<1%A12NA
13216ORF1ATC<<1%<<1%<<1%<<1%A12NA
14187ORF1BGA<<1%<<1%<<1%<<1%A12NA
14408*ORF1BCTP314L90%94%91%97%A120A
15766ORF1BGTV767L3%9%2%<<1%A1120G
18424*ORF1BAGN1653D3%24%11%<1%A920G
18486ORF1BCT<<1%<<1%<<1%<<1%A6NA
18538ORF1BGTV1691L<<1%1%1%<<1%A11sporadic5
19180ORF1BGT<<1%<<1%<<1%<<1%A8sporadic5
19891ORF1BGT<<1%<<1%<<1%<<1%A9NA
20268ORF1BAG<<1%<<1%<<1%<<1%A5NA
21304*ORF1BCTR2613C3%24%10%<1%A1020G
21390ORF1BAG<<1%<<1%<<1%<<1%A12NA
21830SGT<<1%<<1%<<1%<<1%A13NA
22162STC<<1%<<1%<<1%<<1%A6NA
22255SAT<<1%<<1%<<1%<<1%A11NA
22687SCT<<1%<<1%<<1%<<1%A11NA
23403*SAGD614G98%98%98%100%A120A
25563*ORF3AGTQ57H31%49%31%13%A220C
25593ORF3AGCK67N<<1%1%<<1%<<1%A12sporadic5
25907*ORF3AGTG172V4%25%11%<1%A1020G
25930ORF3ATCS180P<<1%1%1%<<1%A11sporadic5
26040ORF3AAT<<1%<<1%<<1%<<1%A9sporadic5
27964*ORF8CTS24L6%27%11%<1%A320G
28472*NCTP67S3%24%11%<1%A1020G
28655NGT<<1%<<1%<<1%<<1%A7NA
28854NCTS194L16%15%4%2%A420A
28869*NCTP199L5%28%12%3%A1020G
28887NCTT205I1%10%17%9%A921C
29439NATQ389H<1%<1%<<1%<<1%A11sporadic5

* Mutation is found at frequencies > 1% in the ncov North America dataset

1positions refer to aligned location in the Wuhan reference genome

2REF and ALT refer to the nucleotide state at that position in the reference genome and the corresponding alternative allele at that position in comparison genomes, respectively.

3Amino acid replacement standard notation shows the reference amino acid, the amino acid position in the corresponding gene, and then the inferred amino acid in the alternative genomes, all as determined from ncov.

4The clade corresponding to an ancestral sequence was determined as the named clade in the NextStrain database at the time of evaluation with the most mutations shared with the “ancestor introduced” prior to that clade on the NextStrain phylogenetic tree.

5Mutation is not associated with a phylogenetic grouping, but occurs at seemingly random branches throughout the entire tree date accessed: 7/12/21

Number of genomes: Showing 4000 of 4043 genomes sampled between Mar 2020 and June 2021

* Mutation is found at frequencies > 1% in the ncov North America dataset 1positions refer to aligned location in the Wuhan reference genome 2REF and ALT refer to the nucleotide state at that position in the reference genome and the corresponding alternative allele at that position in comparison genomes, respectively. 3Amino acid replacement standard notation shows the reference amino acid, the amino acid position in the corresponding gene, and then the inferred amino acid in the alternative genomes, all as determined from ncov. 4The clade corresponding to an ancestral sequence was determined as the named clade in the NextStrain database at the time of evaluation with the most mutations shared with the “ancestor introduced” prior to that clade on the NextStrain phylogenetic tree. 5Mutation is not associated with a phylogenetic grouping, but occurs at seemingly random branches throughout the entire tree date accessed: 7/12/21 Number of genomes: Showing 4000 of 4043 genomes sampled between Mar 2020 and June 2021 These mutations were qualitatively cross referenced against the NextStrain ncov database North American frequencies (henceforth, ncov) to evaluate their mutation frequencies and determine which of the eleven major clades, as defined by NextStrain at the time of evaluation, were represented in our sampling (S4 Fig, Table 1). 18 mutations had no detectable frequency in ncov (<<1% in Table 1), and seven mutations were found sporadically across the ncov tree, with a prevalence of ~1%. Of the 18 Colorado lineage-defining mutations that were more common in ncov, all were on the lineages leading to the series of ancestors A1, A2, or A3, corresponding to NextStrain clades 20A, 20C, and 20G. Based on shared mutations, 22 of the 44 Colorado genomes are descendants of A3 (and thus also A1 and A2), 13 are descendants of A2 (and thus also A1) but not descended from A3, while nine are descended from A1 but not from A2 or A3 (Fig 2). The inferred ancestors other than A1, A2 and A3 are organized such that there are three ancestors (A4-A6) descended from A1 but not A2 or A3, two ancestors (A7-A8) descended from A2 but not A3, and five ancestors (A9-A13) descended from A3. We further verified our relatedness networks by using IQTree to generate a phylogenetic tree of all 44 final, concordant genomes, including the Wuhan-Hu-1 genome, which served as the reference for variant calling (S5 Fig). This confirms our grouping of genomes into the aforementioned relatedness network substructure. In this way, we were able to describe the patterns of relatedness and evolutionary dynamics between 44 Colorado SARS-CoV-2 genomes.

Origins of a novel lineage associated with a rapid transmission event

Although many of the USAFA samples were likely picked up from the local Colorado Springs community based on detailed contact-tracing information and limits on off-base travel implemented by USAFA, a group of samples were also associated with a rapid-spreading event within the USAFA campus starting in late October 2020 [15]. In looking at the distribution of inferred mutations on the Colorado phylogenetic network, we found that most common ancestors [A4, A5, A6, A8, A9] share a shallow network of divergence from the ancestors A1, A2, and A3, with one or two mutations separating each from an earlier ancestor (Fig 2, S2 Fig). Divergence from ancestors A1 and A2 involved an average of 4.55 and 4.23 mutations/genome (s.d. 2.69 and 3.36, range 1–9 and 0–11), respectively (Fig 3, S8 Table). Among the sampled sequences in this part of the tree, only two are identical. These results are in rough agreement with the idea that the mutations and most of the network diversification (other than A1, A2, and A3) were unselected and that mutations accumulated randomly with a rate for beta coronaviruses between 1.3 × 10−4 – 6.1 × 10−4 mutations per site per year [37-40]. A3, which corresponds to the common ancestor of NextStrain’s 20G clade, increased rapidly from August to November 2020 to a peak of 25% in North America (Table 1, Fig 4), and its dominant presence in the later USAFA samples may be due to stochastic events in the local community.
Fig 4

Frequency of a CUAF clade 20G mutation in North America between March 2020 and June 2021.

The mutation pictured here is from ORF1A, genomic position 10319, amino acid position 3352.

Frequency of a CUAF clade 20G mutation in North America between March 2020 and June 2021.

The mutation pictured here is from ORF1A, genomic position 10319, amino acid position 3352. The pattern in ancestors A10, A11, A12 and A13 is strikingly different. For example, divergence from ancestor A3 was much lower, with an average of only 1.77 mutations/genome (s.d. 2.9, range 0–12). This frequency drops even further to < = 1 mutation/genome away from ancestors A11, A12 and A13 in the Colorado Springs variant lineage (Fig 3, S8 Table). Five mutations on the branch led to A10, and seven mutations on each of the branches led to two of its descendants, A11 and A12 (Fig 3). Furthermore, the highly derived A12 gave rise to 14 descendant sequences in the sample, including another shared ancestor differing by two mutations (A13). All of the descendant sequences from A12 and A13 differ from their ancestors by zero to three mutations, and there are respectively three and four sequences corresponding exactly to A13 and A14. The ancestors A10 and A11 have a few descendant sequences from CU individuals, indicating that they and A12 may have arisen from community spread. In contrast, the high prevalence of closely related USAFA samples in A12 and its descendants indicates sequence documentation of a rapid-spreading event, possibly involving one or more individuals. We further validated this inference by estimating the divergence times between the SARS-CoV-2 samples collected by the USAFA (Fig 5) [30-32]. The node where the CO Springs variant branches off from the ancestral lineage was found to have a posterior distribution with a mean of twenty-four days before the most recent sample, November 19, 2020 (Fig 5). This puts the mean, estimated date of divergence at October 26th, 2020, with a credible range (95% HPD) between October 27, 2020 and October 17th, 2020. This range of credible divergence estimates is in accordance with USAFA records that trace the origin of the rapid spreading event to late October 2020 (SDataset 1).
Fig 5

Divergence estimates indicate that the CO Springs lineage originated around October 26th, 2020.

The red arrow indicates the node where the CO Springs lineage branches off from ancestral nodes. 95% HPD and tree distances are in units of “days before November 19, 2020”.

Divergence estimates indicate that the CO Springs lineage originated around October 26th, 2020.

The red arrow indicates the node where the CO Springs lineage branches off from ancestral nodes. 95% HPD and tree distances are in units of “days before November 19, 2020”. Given the possibility that the Colorado Springs variant clade had some transmission advantage, we next considered the potential for these mutations to affect protein function. We evaluated the set of nine mutations that defined the CO Springs Variant ancestral clades A12–A13 (Table 2). These mutations have population frequencies < = 1% in NextStrain (Table 1), and three result in non-synonymous amino acid substitutions. Two of these mutations (ORF1A/T2274I and ORF3A/K67N) arose on the branch leading to A12, the ancestor of the Colorado Springs variant. Based on literature review (see discussion), it appears likely that all three mutations impact the biochemical properties of their associated protein and may have implications for viral fitness.
Table 2

Lineage-defining mutations contributing to the Colorado Springs variant [A12-A13].

AncestorMutation positionORFProteinREF alleleALT alleleAmino acid positionCodon changeAmino acid changeBiochem property change?
A13829ORF1ANsp2CT188AAC->AATN->NN
A124021ORF1AMacro DomainCT1252AAC->AATN->NN
A127086ORF1ANsp3_CCT2274ACT->ATTT-> IY
A1212295ORF1ANsp8CT4010ACT->ACCT->TN
A1213216ORF1ANsp10TC4317GAT->GACD->DN
A1214187ORF1BRdRpGA250AGG->AGAR->RN
A1221390ORF1BMethyltransferaseAG244TTA->TTGL->LN
A1321830SSpikeGT268GTT->TTTV->FY
A1225593ORF3aOrf3aGC67AAG->AACK->NY

Genomic surveillance methods complement high-level contact tracing

From August 2020 through December 2020, USAFA utilized a random surveillance testing program, where a percentage of cadets [4–15%] were tested daily to identify asymptomatic or mildly symptomatic SARS-CoV-2 cases. All SARS-CoV-2 positive patients were interviewed, close-contacts identified, and class schedules reviewed to assess for additional contacts. These individuals were then placed into quarantine with testing and monitoring before release. If the individuals identified during contact tracing were not cadets, these close contacts were contacted following local public health guidance. This method allowed for the identification of infection from a student, sports team, or community exposure. We were able to link multiple infections back to a rapid spreading event in late October 2021, which resulted in the strain clusters A12 and A13. Following the dramatic increase in infections, further lockdown measures were implemented at USAFA limiting new community introductions. At the end of the semester, a similarly rigorous testing and quarantine process occurred prior to release for the winter break, likely eliminating the strain from circulation within the population. The ability to contextualize genomic data with contact tracing information helps see a clearer picture for strain introduction, mutation, and propagation, while making assessment of subsequent viral fitness as SARS-CoV-2 continues to change.

Discussion

Our robust sequencing provides a snapshot of infections in Colorado in late summer and early fall 2020. There were many circulating variants in Colorado at this time, and their dynamics broadly reflect strain variation and divergence across the rest of the US. In addition, our analysis suggests that there were likely multiple introductions of SARS-CoV-2 into the USAFA cadet population, despite their restricted interaction with Colorado Springs. Most did not lead to subsequent outbreaks, and only one sustained a rapid evolutionary expansion, which we name the Colorado Springs variant. Prior sequences from the nearby Denver/Aurora area and pre-outbreak USAFA samples (likely reflecting the Colorado Spring community) indicate slow and potentially neutral evolution of variant twigs, which come from common ancestors of known expanding variants previously identified by NextStrain. The lineages immediately prior and adjacent to the Colorado Springs variant, in contrast, indicate bursts of evolution including amino acid altering mutations that may have affected its transmission properties. This variant may have been highly contagious, but its spread also appears to have been promoted by one or more rapid spreading events. Luckily, it appears to have been contained by the rigorous epidemiological control procedures (e.g. social distancing, mask wear, limited gathering, virtual learning) employed at the USAFA. The type of focused community-level data collected here may be key to understanding how SARS-CoV-2 spreads in local settings, which may often have highly idiosyncratic dynamics compared to the country as a whole. We employed rigorous sequencing quality control and validation steps, including standard PCR and sequencing replicates of all samples, further replicates of any moderately ambiguous results, and comparison to evolving ancestral sequences as well as the standard Wuhan reference. This ultimately resulted in an inferred ancestral sequence network that contained only two convergent mutations and was parsimonious. We verified our final ancestral network via maximum likelihood phylogenetic tree construction. Our results concerning the USAFA diversification are not dependent on the fine details of the network, but it seems clear that more quantitative studies might be misled if they did not take a similarly cautious and rigorous replication approach with hands-on evaluation of the data, as we did here. Because we identified discrepancies that would have been accepted in less thorough [e.g., single-sequence replicate and single reference variant identification] protocols, we suspect, as do others, that an unknown number of variants in the GISAID database contain flawed sequences that may mislead phylogenetic and convergence analyses [23, 26, 41, 42]. The sequencing initiative presented here highlights the power of robust genomic surveillance to describe local viral dynamics, particularly when paired with epidemiological data collected with the patient samples. Sitko et al. collected these samples through a highly effective COVID-19 monitoring system at USAFA. Their approach relied on random sampling of individuals regardless of symptoms. In addition to their robust contact tracing steps, the USAFA was able to pinpoint their surge to a rapid spreading event on their campus in late October. This epidemiological conclusion agrees with the results described here, where we were able to phylogenetically reconstruct the divergence of these samples over the period of time from August to late November. Adopting this approach enabled the identification of the Colorado Springs variant in the context of enough branches to pinpoint a burst of change leading to the variant, and documented the spread of the variant to a large number of people over a short period of time. Through the use of evolutionary modelling, we traced this burst back to a date between October 17–27, 2020, with the highest probability of having occurred on October 26th, 2020. It is known that immunocompromised individuals can serve as accelerated cauldrons of intra-host viral evolution with selected and rapid accumulation of epistatically interacting mutations [25], which might be an explanation for the burst of evolution we see here leading to the CO Springs variant. However, we do not know of such a case in the community, and the USAFA population is mostly young and extremely healthy, and the cadets likely interact with similarly young and healthy individuals in the local community. The possibility that rapid intra-host evolution could occur in such individuals, perhaps during long-term but largely asymptomatic infections, warrants consideration for further study. It is important to track and model the evolution of highly adaptive strains that tend to rapidly rise in frequency in the population once they gain a sufficient foothold, but it is also important to describe patterns of viral evolution that may lead to attenuation. Such strains are likely to be found in sampling from asymptomatic patients because they tend to be less phenotypically severe cases. Attenuated strains have the potential to out-compete more severe strains due to the trade-off between virus transmissibility and severity [43]. Studies like this are well suited to capture a snapshot of this kind of variance, as samples were collected from both symptomatic and asymptomatic individuals, per USAFA’s randomized testing surveillance protocol [15]. Further, mutations in these strains can create reservoirs of mostly neutral mutations, possibly leading to gradual genetic drift over time [44]. In the event of a transmission bottleneck, variants could then rise to sustained, high frequency [44, 45]. Such a scenario could contribute to antigenic shifts and viruses with a capacity to reduce vaccine efficacy [43]. These scenarios, in which neutral variants propagate by chance, seem plausible as the default mode of spreading for SARS-CoV-2 [46], punctuated by the rise of more transmissible variants of concern. While the Colorado Springs variant appears to have been confined to the isolated context in which it was found, similar variants may not be contained, and thus it is important to characterize them whenever possible. The ancestor of the Colorado Springs variant (A12) contained two intriguing non-synonymous amino acid substitutions: ORF1A/T2274I and ORF3A/K67N. ORF1A/T2274I results in a shift from a polar uncharged to a non-polar residue at the third position of a three residue N-linked glycosylation site in the Nsp3 peptide. This type of post-translational modification to Nsp3 is thought to be important for insertion into the endoplasmic reticulum of host cells, though it is not known how a mutation at a glycosylation site would impact its ability to do so [47]. However, disruption of N-linked glycosylation on other viral peptides has been shown to be destabilizing and negatively impacts virus viability [47, 48]. The ORF3A/K67N mutation results in a change from a positively charged residue to an uncharged, polar residue. This particular residue occurs in an LKK peptide motif that is predicted to be a likely B-cell epitope by Azad and Khan, 2021. Because the mutation seems to increase the free energy of folding, it has the potential to alter a putative B-cell epitope, allowing the virus to better evade host immune responses [49]. Interestingly, the only recorded North American SARS-CoV-2 genomes containing this mutation are found in Mexico (S6 Fig). The NextStrain database (derived from the GISAID database) is a highly incomplete collection of existing viral strains, making it difficult to determine whether this mutation was imported to the Colorado area or arose de novo locally. The third amino acid altering mutation, along the lineage leading to A13, is located in the Spike protein at position 268. The change converts a small, hydrophobic valine residue to phenylalanine, which is also hydrophobic but contains a large six-carbon ring side chain. A shift in steric properties is likely to impact local structure, and thereby potentially modify protein function. Without further experimental studies, it is difficult to know how these mutations affect viral dynamics and the extent to which they enabled the rapid spreading event. This case study, while limited in size and scope, is an exemplar to describe the viral phylodynamics of a locally confined rapidly-spreading transmission event, in combination with paired epidemiological data. Due to their rapid expansion, coupled with minimal mutation accumulation, rapid spread scenarios have little phylogenetic structure to describe [50], and the contact structures involved may strongly deviate from the average assumptions used in most epidemiological models [14, 51]. We conclude that case studies similar to that presented here could assist in outbreak control, provide variant-origin replicates to obtain a broader view of the process and refine epidemiological models, and help in early detection and action against novel variants of concern when they occur in the future.

Average matching amplicon position of sequencing primers.

1A shows the average position along the y axis, primer number along the x axis. 1B shows the variance calculated for the distribution of matching positions for each primer. (TIF) Click here for additional data file.

Next Strain estimates of North American continental frequency of the D614G mutation, from March 2020-May 2021.

Gold indicates frequency of ‘G’ substitution over time, green indicates frequency of ‘D’ substitution over time. (TIF) Click here for additional data file.

Eleven major SARS-CoV-2 clades in North America between March 2020 and June 2021, across 3923 genomes.

A) Phylogenetic relationships of strains across the continent. B) Strain relative prevalence between August 2020 and July 2021. Date accessed July 12, 2021. (TIF) Click here for additional data file.

Depiction of the ancestral network of 44 Colorado SARS-CoV-2 genomes.

Each tip indicates the genome letter identifier and the number of mutations away from its most recent ancestral node. Ancestral nodes contain the name of the node (A1-A13) and the position of lineage-defining mutations which are inherited by all downstream lineages. Node and tip coloring described in legend. (TIF) Click here for additional data file.

Phylogenetic tree of 44 novel SARS-CoV-2 genomes from Colorado and Wuhan-Hu-1.

Genomes were aligned with Mugsy and the consensus tree was generated under the Jukes-Cantor model using IQTree. Tip labels indicate the sample names as identifiable in GISAID. Red branch labels indicate the major phylogroups described in this paper. Blue asterisk indicates the Wuhan-Hu-1 genome outgroup, NC_045512.2. (TIF) Click here for additional data file.

Geographic location of the Orf3a K67N mutation in NextStrain.

The only other documented instance of this variant in the NextStrain repository occurs in Northern Mexico. Map image downloaded from NextStrain, which uses OpenStreetMap®. OpenStreetMap® is open data, licensed under the Open Data Commons Open Database License (ODbL) by the OpenStreetMap Foundation (OSMF). (TIF) Click here for additional data file.

GISAID accession IDs for the 44 SARS-CoV-2 genomes sequenced in this work.

(DOCX) Click here for additional data file.

List of successfully sequenced genomes.

Sample ID#s represent deidentified IDs. (XLSX) Click here for additional data file.

Sequencing meta data and de-identified sampleIDs for successfully sequenced replicates.

(XLSX) Click here for additional data file.

Variant calls that were excluded (yellow) during QC processing.

(XLSX) Click here for additional data file.

Discordant genomes containing putative false positive variant calls that were either verified as true positive (green) or excluded as false positives (yellow).

(XLSX) Click here for additional data file.

Genomes with low quality variants that were initially excluded (false positives) due to low quality, but were re-included upon phylogenetic re-assessment (green).

(XLSX) Click here for additional data file.

Genomes containing convergent variants (blue).

(XLSX) Click here for additional data file.

Non-lineage defining genome mutations.

(XLSX) Click here for additional data file.

Average number of observed variants, by phylogroupings.

(XLSX) Click here for additional data file.

Evolutionary model parameters, divergence estimates and MCMC data.

(ZIP) Click here for additional data file. 16 Jun 2022
PONE-D-22-08994
Phylodynamics of a regional SARS-CoV-2 rapid spreading event in Colorado in late 2020
PLOS ONE Dear Dr. Wade, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please perform a molecular dating and the estimation of the evolutionary rate. Please submit your revised manuscript by Jul 25 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Etsuro Ito Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Thank you for stating the following financial disclosure: "KW is supported by NIH R01 GM083127. CB is supported by the T32 grant T32GM136444 awarded to the Molecular Biology graduate program. This work was supported by NIH grants R35GM128680 (OSR) and the RNA Bioscience Initiative." Please state what role the funders took in the study.  If the funders had no role, please state: ""The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."" If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf. 3. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section. 4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The manuscript by Wade and colleagues is a very interesting piece of work. Although I am not English native speaker, to my opinion the manuscript is well written in a clear language. It describes the phylodynamics of a given lineage that to the Authors likely originated in the local Colorado Springs community and expanded rapidly over the course of two months in an outbreak within the well-controlled environment of the United States Air Force Academy. Anyhow, the work-flow of analyses contains a big lacking: the molecular dating. I suggest the Author to perform a molecular dating and the estimation of the evolutionary rate. This kind of result will be very useful in order to have a much more complete point of view on the temporally origin and diffusion of the lineage. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No ********** While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 18 Aug 2022 We thank the reviewer for their thoughtful comments about our manuscript, and as we detail below, we have made changes to the manuscript to address the molecular dating concerns raised by the reviewer. Editor comments: 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf The manuscript style formatting has been revised according to the guidelines. 2. Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." Thank you for bringing this matter to our attention; this statement has now been included. 3. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section. The ethics statement has been moved to the Methods section. 4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. The captions for Supporting Information are present in the text and at the end of the manuscript (after References), according to the SI guidelines. Reviewer comments: Reviewer #1: The manuscript by Wade and colleagues is a very interesting piece of work. Although I am not English native speaker, to my opinion the manuscript is well written in a clear language. We thank the reviewer for their positive comments. It describes the phylodynamics of a given lineage that to the Authors likely originated in the local Colorado Springs community and expanded rapidly over the course of two months in an outbreak within the well-controlled environment of the United States Air Force Academy. Anyhow, the work-flow of analyses contains a big lacking: the molecular dating. I suggest the Author to perform a molecular dating and the estimation of the evolutionary rate. This kind of result will be very useful in order to have a much more complete point of view on the temporally origin and diffusion of the lineage. We have now performed Bayesian evolutionary modelling of the USAFA SARS-CoV-2 genomes, using their sampling dates from August-November 2020 to inform construction of the phylogenetic tree estimates and inference of divergence rates. This was performed using the Bayesian Evolutionary Analysis Sampling Trees (BEAST) software analysis suite. In doing so, we estimated the divergence of the novel CO Springs variant to a likely origin around October 26, 2020. This agrees with the predictions of the USAFA epidemiological modelling (Sitko et al., 2021), which traced the rapid spreading outbreak back to an event in late October 2020. Our additional modelling and divergence estimate provides further resolution of the related epidemiological data and more robustly supports the conclusions of this manuscript. This new analysis has been included in the revised manuscript. Submitted filename: ResponseToReviewers_11august22.docx Click here for additional data file. 22 Aug 2022 Phylodynamics of a regional SARS-CoV-2 rapid spreading event in Colorado in late 2020 PONE-D-22-08994R1 Dear Dr. Wade, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Etsuro Ito Academic Editor PLOS ONE Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: I am fully satisfied by the changes made by the authors. To my opinion the manuscript can be published in this version as it is. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No ********** 26 Sep 2022 PONE-D-22-08994R1 Phylodynamics of a regional SARS-CoV-2 rapid spreading event in Colorado in late 2020 Dear Dr. Wade: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Prof. Etsuro Ito Academic Editor PLOS ONE
  50 in total

1.  Mutation rates among RNA viruses.

Authors:  J W Drake; J J Holland
Journal:  Proc Natl Acad Sci U S A       Date:  1999-11-23       Impact factor: 11.205

2.  Discovery of seven novel Mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus.

Authors:  Patrick C Y Woo; Susanna K P Lau; Carol S F Lam; Candy C Y Lau; Alan K L Tsang; John H N Lau; Ru Bai; Jade L L Teng; Chris C C Tsang; Ming Wang; Bo-Jian Zheng; Kwok-Hung Chan; Kwok-Yung Yuen
Journal:  J Virol       Date:  2012-01-25       Impact factor: 5.103

3.  A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.

Authors:  Heng Li
Journal:  Bioinformatics       Date:  2011-09-08       Impact factor: 6.937

4.  Evolutionary history of the closely related group 2 coronaviruses: porcine hemagglutinating encephalomyelitis virus, bovine coronavirus, and human coronavirus OC43.

Authors:  Leen Vijgen; Els Keyaerts; Philippe Lemey; Piet Maes; Kristien Van Reeth; Hans Nauwynck; Maurice Pensaert; Marc Van Ranst
Journal:  J Virol       Date:  2006-07       Impact factor: 5.103

5.  Coming to America: Genomic surveillance and how B.1.1.7 arrived in the US.

Authors:  Xueting Qiu; William P Hanage; Bradford P Taylor
Journal:  Cell       Date:  2021-05-13       Impact factor: 41.582

6.  Bayesian phylogenetics with BEAUti and the BEAST 1.7.

Authors:  Alexei J Drummond; Marc A Suchard; Dong Xie; Andrew Rambaut
Journal:  Mol Biol Evol       Date:  2012-02-25       Impact factor: 16.240

7.  Modelling the evolutionary dynamics of viruses within their hosts: a case study using high-throughput sequencing.

Authors:  Frédéric Fabre; Josselin Montarry; Jérôme Coville; Rachid Senoussi; Vincent Simon; Benoît Moury
Journal:  PLoS Pathog       Date:  2012-04-19       Impact factor: 6.823

8.  Rapid and inexpensive whole-genome sequencing of SARS-CoV-2 using 1200 bp tiled amplicons and Oxford Nanopore Rapid Barcoding.

Authors:  Nikki E Freed; Markéta Vlková; Muhammad B Faisal; Olin K Silander
Journal:  Biol Methods Protoc       Date:  2020-07-18

Review 9.  Inhibition of N-linked Glycosylation by Tunicamycin May Contribute to The Treatment of SARS-CoV-2.

Authors:  Ali Adel Dawood; Mahmood Abduljabar Altobje
Journal:  Microb Pathog       Date:  2020-10-20       Impact factor: 3.738

10.  Efficiency of pooled surveillance testing in academic labs to detect and inhibit COVID-19 outbreaks.

Authors:  John C Sitko; James Jordan Steel; Erin A Almand; Christopher A Cullenbine; Joseph W Rohrer; Douglas P Wickert; Steven Cm Hasstedt
Journal:  Bioanalysis       Date:  2021-08-06       Impact factor: 2.681

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.