Jumping a Moving Train: SARS-CoV-2 Evolution in Real Time.

Ahmed M Moustafa^1,2, Paul J Planet^1,3,4.

Abstract

The field of molecular epidemiology responded to the SARS-CoV-2 pandemic with an unrivaled amount of whole viral genome sequencing. By the time this sentence is published we will have well surpassed 1.5 million whole genomes, more than 4 times the number of all microbial whole genomes deposited in GenBank and 35 times the total number of viral genomes. This extraordinary dataset that accrued in near real time has also given us an opportunity to chart the global and local evolution of a virus as it moves through the world population. The data itself presents challenges that have never been dealt with in molecular epidemiology, and tracking a virus that is changing so rapidly means that we are often running to catch up. Here we review what is known about the evolution of the virus, and the critical impact that whole genomes have had on our ability to trace back and track forward the spread of lineages of SARS-CoV-2. We then review what whole genomes have told us about basic biological properties of the virus such as transmissibility, virulence, and immune escape with a special emphasis on pediatric disease. We couch this discussion within the framework of systematic biology and phylogenetics, disciplines that have proven their worth again and again for identifying and deciphering the spread of epidemics, though they were largely developed in areas far removed from infectious disease and medicine.

© The Author(s) 2021. Published by Oxford University Press on behalf of The Journal of the Pediatric Infectious Diseases Society. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Keywords: COVID-19; VOC; lineages; nomenclature

Year: 2021 PMID: 34370041 PMCID: PMC8385893 DOI: 10.1093/jpids/piab051

Source DB: PubMed Journal: J Pediatric Infect Dis Soc ISSN: 2048-7193 Impact factor: 3.164

The SARS-CoV-2 virus was first reported as a cluster of pneumonia cases in Wuhan, China on December 31, 2019 with the first symptomatic case presenting on December 1 [1], although there have been unconfirmed reports of cases from November [2]. The first case outside China was reported in a traveler from Wuhan to Thailand [3], and by the end of January, cases had been reported in 19 countries outside of China [4]. While the initial cases outside of China were mostly linked to travel, sustained community spread outside of China is thought to have begun globally in February [5, 6]. The virus itself was identified in early January as a novel betacoronavirus (sarbecovirus) that was sequenced directly from respiratory specimens [7, 8], and the first SARS-CoV-2 whole genome was submitted to GenBank on January 5 (released January 12). Subsequent sequencing of the virus using targeted PCR amplification confirmed the association of this virus with the initial cases [8]. The whole genome was critical in allowing the early development of PCR-based testing [9] and genomic amplification (https://artic.network/), as well as beginning to understand the biology of the virus even before the world scientific community had physical access to the virus for study [10]. For instance, the similarity of the receptor-binding domain (RBD) to the SARS-CoV-1 virus suggested that it bound to the ACE2 receptor on human cells [7, 11], which was soon after confirmed [8, 12]. It also suggested a close phylogenetic relationship with circulating bat coronaviruses [7, 8]. While early phylogenetic analysis tied the viral sequence strongly to other known viruses from horseshoe bats (Genus: Rhinolophus) [7, 8], the position in the phylogenetic tree differed based on which genes were used to build the phylogeny [7]. 
In particular, the spike protein gene (S) was highly similar to and grouped with the S-gene from SARS-CoV-1, while other genes suggested other closest relatives such as the bat virus RaTG13 [7]. Other sequence traits, especially in the variable loop region of the RBD of the spike protein, are similar to those of coronaviruses isolated from pangolins, leading to the hypothesis that there had been recombination between viruses from different hosts [7, 13] and that pangolins might be an intermediate or proximate host to the pandemic virus [11, 14–18]. Recent structure and binding studies of bat, pangolin, and SARS-CoV-2 spike proteins showed high affinity for pangolin and human ACE2 proteins and low affinity for bat ACE2 [19-21], but it may be that the protein differences expand binding to multiple different hosts [19]. Another study challenged the recombination and pangolin host hypothesis, arguing that RaTG13 may have acquired its variable loop region from another sarbecovirus, making the RBD possessed by SARS-CoV-2 the ancestral form [22]. At this time, the host source of SARS-CoV-2 is still unclear; however, if the virus has been circulating in bat populations for decades, as has been predicted by some analyses [22], comprehensive sampling of bat populations may reveal important observations. Another novel characteristic noted from the whole-genome sequences [11, 23] was a polybasic (furin) cleavage site at the junction of the 2 major spike protein domains. This site was shown to have a role in SARS-CoV-2 replication, transmission, and pathogenesis [24, 25], although some evidence suggests that increased spike proteolysis is associated with reduced viral entry and infectivity [26-28]. However, cleavage of the spike protein seems to be an adaptation that enhances viral entry, cell-cell fusion, and host range in many viruses such as Influenza, MERS, and SARS-CoV-1 [29-33].
Another amino acid feature flanking the furin cleavage site is the prediction of sites for O-linked glycan modification that has recently been shown to impact viral fusion and entry [28] and has been proposed to provide an immunological shield at this site [11, 34, 35]. There has been speculation that the suite of traits that likely enhance the virulence, transmissibility, and host range of this virus somehow demonstrates that the virus was genetically engineered in a laboratory. There are many arguments against this including convincing and careful analyses about the relationship with viruses from bats [8, 13, 22, 36] and, that despite these close relationships, the virus sequence is still substantially different, with variable positions distributed throughout its genome, from any virus known in the literature or used in a laboratory. Overall, it seems extremely unlikely that the virus is derived from a laboratory [11, 22, 35, 37, 38].

MODES OF EVOLUTION AND SELECTION

SARS-CoV-2 is an RNA virus and therefore has a relatively fast mutation rate. Most estimates of global mutation rates range from 5.44E-4 to 1.22E-3 substitutions per site per year, which is approximately 1-2 nucleotide changes per month in any given lineage [22, 39–42]. The genome also undergoes insertions and deletions (indels) at a less well-determined rate [43]. This rate, however, most likely reflects only the changes observed in viruses that go on to transmit. In actuality, mutational change is likely more frequent, and in any given infection not all viruses will be exactly the same. This variation within each infection has been studied more deeply in other viruses, such as hepatitis C virus, hepatitis B virus, and HIV, to name a few, where the diversity of mutations in the infecting virus has been shown to have a profound impact on viral fitness and evolution [44]. Here, the concept of “quasispecies” is important. A quasispecies is a population of organisms that are all very closely related to one another, usually differing by a small number of mutations, that have a collective fitness and can be acted on by selection as a unit [44]. That is, it does not matter how fit the most fit virus is in the population, but it does matter how fit the population is overall. RNA viruses are notorious for behaving like quasispecies because they have high mutation rates and short genomes [44], and both SARS-CoV-1 [45] and MERS [46] have been documented to exist as quasispecies. Therefore, it seems likely that quasispecies behavior may be important for SARS-CoV-2. The number of studies documenting quasispecies behavior has been increasing [47-56], but the impact of quasispecies on virulence, immune evasion, and spread is not well understood. Current sequencing methods cannot easily detect quasispecies since they rely on direct amplification of the viral genome from clinical specimens followed by consensus sequence building.
The genome is, therefore, representative of the dominant forms of the virus in any given infection, and only very deep sequencing can reveal nucleotides that may be present as minor variants. Future studies will need to sequence very deeply and/or use single genome capture [57, 58] or long read strategies [53] to understand quasispecies for SARS-CoV-2. The role of recombination (where distantly related viruses swap portions of their genomes) is also not clear, though it has been proposed as important for the origins of the virus [36, 59–62] and has been shown in some circulating lineages [63, 64]. Indeed, a naming convention has been proposed for recombinant lineages within the Pango system, where the highest level lineage is preceded by an X [65]. Recombination is worrisome because it provides another way for the virus to gain beneficial mutations without having to rely on the random process of mutation. Recombination, of course, relies on 2 distinct viruses infecting the same cell at the same time. Evidence for co-infections with distinct viruses has been limited so far [63, 66, 67], but it is not clear (as mentioned above) that the most common current sequencing techniques could detect infections with different viruses. One important aspect of SARS-CoV-2 evolution that we have witnessed in real time is the recurrent generation of the same mutation in different genetic backgrounds [68-71]. This could arise from recombination or from the same random event happening more than once in different genomes. When this type of event happens, and we observe different viruses with the same mutation spreading well in the community, it is one of the best kinds of evidence that a particular mutation has a positive impact on viral fitness. True evolutionary convergences or parallelisms (or “homoplasies” in systematic biology terminology) are therefore goldmines for understanding viral biology and identifying variants of concern. However, detecting truly homoplastic changes can be difficult.
Some mutations may look like they have occurred in different backgrounds just because they have stayed the same as the other parts of the viral genome changed. Indeed, this has been proposed as the evolutionary pattern that resulted in the diversity in the variable loop region [22]. In systematic biology, this would be called a “symplesiomorphy,” a mutation that viruses share because their ancestors had it as well. Symplesiomorphies provide much weaker evidence for variant fitness, although their conservation over time may signal some benefit. The best way to distinguish between convergence and symplesiomorphy is to understand how viruses are related to each other on a phylogenetic tree; phylogeny is therefore not only important for taxonomy but can also be critical for identifying functional mutations.
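
The difference between a mutation that has arisen repeatedly and one that was inherited once can be illustrated with a small parsimony count. The sketch below is only an illustration and is not a method taken from the studies cited here. It applies Fitch's small-parsimony algorithm to a toy rooted binary tree, written as nested tuples, with a hypothetical presence (Y) or absence (N) coding of a single mutation at the tips. The tree, the tip names, and the function name `fitch_changes` are assumptions made for the example.

```python
def fitch_changes(tree, states):
    """Return the minimum number of state changes on a rooted binary tree.

    tree   -- a tip name (str) or a 2-tuple of subtrees
    states -- dict mapping each tip name to its observed state
    """
    def visit(node):
        if isinstance(node, str):          # tip: its state set is what was observed
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:                         # children agree: no change needed here
            return shared, left_cost + right_cost
        # children disagree: one change is required on a branch below this node
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Hypothetical four-tip tree: (A,B) and (C,D) are sister pairs.
toy_tree = (("A", "B"), ("C", "D"))

# Mutation present in one sister pair only: one change explains it.
single_origin = {"A": "Y", "B": "Y", "C": "N", "D": "N"}

# Mutation present in one tip of each pair: at least two changes are needed.
recurrent = {"A": "Y", "B": "N", "C": "Y", "D": "N"}

print(fitch_changes(toy_tree, single_origin))  # 1
print(fitch_changes(toy_tree, recurrent))      # 2
```

A score of 1 is compatible with a single change on the tree, whereas a score above 1 means the state must have changed more than once, which is the signature of homoplasy. The count alone does not say whether a shared state is ancestral or derived; that depends on where the tree is rooted.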

LINEAGES, VARIANTS, AND NOMENCLATURE

As viral lineages have spread around the globe they have changed enough that they can be distinguished from one another and traced geographically. In this publication, we use the term “variant” to refer to viral genomes that can be distinguished based on one or more specific genomic changes. It is worthwhile noting that variants belonging to the same group may not be closest relatives. We use the term “lineage” to refer to a group of viruses thought to be related to each other by descent from a common ancestor. A “clade” is the group of organisms that includes all of the descendants of a common ancestor, whereas a lineage may only include some descendants. It is also important to note that variants, lineages, and clades are designated here only by their genotype without implying any functional or phenotypic changes.

Nomenclature Systems and Classification Tools

The rapidly evolving genomic forms of the virus required quick elaboration of nomenclature systems, and a strong case was made from the beginning that these systems should be phylogenetically based [72]. Borrowing from previous propositions to tie nomenclature closely to phylogeny (eg, http://phylonames.org/code/), Rambaut and colleagues [72] put forward the use of a string of numbers and letters separated by periods, in which each lineage contains a nested list of its ancestral lineages in its name. Thus, B.1.1.7 suggests that this lineage is the seventh lineage derived from a lineage B.1.1, which itself is the first lineage derived from a lineage called B.1. This Pango nomenclature system, which is implemented in the program Pangolin (https://github.com/cov-lineages/pangolin), is intuitive and informative, such that we know that B.1.1.7 is more closely related to B.1.1.6 than it is to B.2.1.7 without having to look it up on a phylogenetic tree. One downside to this kind of nomenclature is that each new lineage must be defined by manual curation of the sequences, requiring a central authority that controls and certifies new lineages. While the Pango system has been widely adopted, some other nomenclature systems have also been used, such as Nextstrain and GISAID [73, 74], all of which require manual curation to define which clades on the phylogenetic tree should be named. We recently proposed an automated system of nomenclature based on whole-genome multilocus sequence typing (wgMLST) that defines groups based on clear-cut properties of genomes on a minimum spanning phylogenetic tree. This technique, called GNUVID [75], provides an adjunctive technique that can help add granularity to the Pango system. Because it is costly to re-calculate the entire phylogenetic tree for every new sequence that is obtained, several techniques have been developed to assign new sequences to lineages based on how closely they match other sequences in the database.
Early techniques used alignment and measurements of similarity to viruses already included in a specific group or lineage [73, 76, 77]. More recent techniques have used machine learning, which can also provide estimates of confidence in the taxonomic assignment [72, 75].
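
The nesting that makes Pango names informative can be shown with simple string handling. The following is a minimal sketch and is not part of Pangolin: the helper names `ancestors` and `shared_depth` are hypothetical, and the code assumes fully written-out names. Aliased names such as P.1, which abbreviates a longer lineage path, would first have to be expanded, and the sketch does not attempt this.

```python
def ancestors(lineage):
    """List the ancestral lineages encoded in a dotted Pango-style name."""
    parts = lineage.split(".")
    # Drop one trailing field at a time: B.1.1.7 -> B.1.1 -> B.1 -> B
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def shared_depth(a, b):
    """Count the leading fields two lineage names have in common."""
    depth = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        depth += 1
    return depth


print(ancestors("B.1.1.7"))                 # ['B.1.1', 'B.1', 'B']
print(shared_depth("B.1.1.7", "B.1.1.6"))   # 3
print(shared_depth("B.1.1.7", "B.2.1.7"))   # 1
```

A deeper shared prefix points to a more recent named common ancestor, which is why B.1.1.7 can be placed nearer to B.1.1.6 than to B.2.1.7 from the names alone.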

Phylogeny and Phylodynamics

As a natural way to explain variation, heredity, relatedness, and change over time, phylogenies form the backbone of evolutionary biology. From a practical standpoint, phylogenies not only offer the ability to classify genomes into lineages but can also identify transmission and importation events and reconstruct general patterns of spread. Phylogenetic network and tree-based approaches have been used extensively in tracing the emergence of SARS-CoV-2. Phylogenies can also be used to infer rates of evolution and identify specific sequence changes associated with biological properties of the virus (eg, increased transmissibility or virulence). Phylodynamic approaches [78] add population genetic models and epidemiologic data to the phylogenetic framework and can be used to estimate parameters such as rates of spread, transmissibility, and effective population size. Geospatial data can also be incorporated to estimate the spread of lineages in space. Without active surveillance and phylodynamic modeling of viral lineages, it would have been very difficult, if not impossible, to identify certain viral lineages as more transmissible. These types of analyses strongly suggested that the B.1.1.7 lineage was 50% more transmissible than other circulating clones [79], and helped inform policy decisions around containment of that lineage, although unfortunately too late to stop its global spread. As practiced by evolutionary biologists, phylogenetics has mostly been built on events that happened in the distant past. A problematic by-product of building phylogenies and taxonomical schemes in real time, as a virus evolves, is that it is almost inescapable that some groupings will not represent all the descendants of a single ancestral virus.
In other words, as new groups arise from older groups, the older groups will not necessarily remain “monophyletic.” Taxonomic monophyly is strongly preferred because it is not arbitrary, whereas a paraphyletic group (one that includes a common ancestor and only some of its descendants) depends upon drawing an arbitrary line in the sand, including only some descendants and excluding others. For instance, the moment that B.1.1 was named and recognized as its own clade, the B.1 clade became paraphyletic. This means that there are likely some members of B.1 that are more closely related to B.1.1 than others. Because of this, B.1.1 and B.1 are not really comparable units, a taxonomic issue that needs to be kept in mind when comparing named lineages.
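
Whether a named group is monophyletic can be checked mechanically once a tree is available. The sketch below uses a hypothetical five-tip tree written as nested tuples, with tip labels invented so that each carries a lineage name; `tips` and `is_monophyletic` are illustrative helpers, not functions from any phylogenetics package. A group is treated as monophyletic when some clade in the tree contains exactly its members.

```python
def tips(node):
    """Return the set of tip labels under a node (tip = str, clade = tuple)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def is_monophyletic(tree, group):
    """True if some clade of the tree contains exactly the tips in group."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical tree: two B.1.1 genomes nested among three genomes still called B.1.
toy_tree = ("B.1_a", ("B.1_b", ("B.1_c", ("B.1.1_a", "B.1.1_b"))))

b11 = {"B.1.1_a", "B.1.1_b"}
b1_without_b11 = {"B.1_a", "B.1_b", "B.1_c"}

print(is_monophyletic(toy_tree, b11))             # True
print(is_monophyletic(toy_tree, b1_without_b11))  # False: paraphyletic remainder
```

In this toy tree the genome labeled B.1_c shares a more recent ancestor with the B.1.1 genomes than it does with B.1_a, which is the situation described above for members of B.1 once B.1.1 has been split off.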

TRACING THE ORIGINS OF LINEAGES AND TRACKING THE SPREAD OF THE VIRUS

Global Patterns

The pandemic might be divided into a few critical mutational events that have defined the spread of the virus around the world. After its initial emergence in Wuhan, the first event was the D614G mutation in the spike protein, which led to a lineage of viruses with higher transmission rates [27]. The biological basis of increased transmission seems to be an increased affinity and cell entry [80-83] and potentially increased surface spike protein density such that the virus has more possible ligands for targeting the ACE2 receptor expressed on host cells [84]. Given the available data, this mutation likely arose in Asia (it was first sequenced in China) and then spread to Europe and around the world. While the very earliest global cases of COVID-19 did not have the D614G variant, viruses with this mutation very soon came to dominate the early phase of the pandemic in many countries. The earliest outbreaks in Italy, Spain, and the United Kingdom were all dominated by this mutation [27]. In addition, the lineage that dominated the early New York spike in March and April of 2020 also had D614G [85]. The next major phase started with the introduction of B.1.1.7, which was recognized by an unusual suite of 17 changes (Table 1). This lineage was remarkable for 2 major reasons. First, it seemed to be spreading much faster than other viruses, a conclusion that could only be reached because of a concerted effort to sequence viruses in the United Kingdom [79, 86] and phylodynamic integration with community testing data that showed PCR S-gene target failures, or “S-drop,” even when other parts of the viral genome were detected. S-drop is due to a deletion mutation in the S-gene (del69-70) [86]. Regions of the United Kingdom with faster increases in cases were found to have higher rates of B.1.1.7, and S-drop cases seemed to increase faster than non-S-drop cases in the same locations in England [86].
In addition, measurements of infection of contacts by index cases were higher for B.1.1.7 compared to other circulating strains [87].

Table 1.

Characteristics of the SARS-CoV-2 variants of interest and concern.

PANGO Lineage	B.1.525	B.1.526	B.1.526.1	B.1.617	B.1.617.1	B.1.617.3	P.2	B.1.1.7	B.1.351	B.1.427	B.1.429	B.1.617.2	P.1
Classification	VOI	VOI	VOI	VOI	VOI	VOI	VOI	VOC	VOC	VOC	VOC	VOC	VOC
WHO label	Eta	Iota	-	-	Kappa	-	Zeta	Alpha	Beta	Epsilon	Epsilon	Delta	Gamma
Nextstrain name	20A/S:484K	20C/S:484K	20C	20A	20A/S:154K	20A	20J	20I/501Y.V1	20H/501Y.V2	20C/S:452R	20C/S:452R	20A/S:478K	20J/501Y.V3
First detected	UK/Nigeria	NY	NY	India	India	India	Brazil	UK	South Africa	CA	CA	India	Japan / Brazil
No. of spike mutations	8	3–7	6–8	3	7–8	7	3–4	10–13	10	4	4	9–10	11
RBD mutations	E484K	(S477N) (E484K)	L452R	L452R E484Q	L452R E484Q	L452R E484Q	E484K	N501Y	K417N E484K N501Y	L452R	L452R	L452R T478K	K417T E484K N501Y
Other spike mutations	A67V, 69del, 70del, 144del, D614G, Q677H, F888L	(L5F), T95I, D253G, D614G, (A701V)	D80G, 144del, F157S, D614G, (T791I), (T859N), D950H	D614G	(T95I*), G142D, E154K, D614G, P681R, Q1071H	T19R, G142D, D614G, P681R, D950N	(F565L*), D614G, V1176F	69del, 70del, 144del, (E484K), (S494P), A570D, D614G, P681H, T716I, S982A, D1118H (K1191N*)	D80A, D215G, 241del, 242del, 243del, D614G, A701V	S13I, W152C, D614G	S13I, W152C, D614G	T19R, (G142D*), 156del, 157del, R158G, D614G, P681R, D950N	L18F, T20N, P26S, D138Y, R190S, D614G, H655Y, T1027I
Transmission	-	-	-	-	-	-	-	50% increased	50% increased	20% increased	20% increased	Increased	-
Neutralization by convalescent and vaccine sera	Potential reduced	Reduced	Potential reduced	Slightly reduced^a	Potential reduced^a	Potential reduced^a	Reduced^a	Minimal impact	Reduced	Reduced	Reduced	Potential reduced^a	Reduced
Neutralization by some EUA monoclonal antibody treatments	Potential reduced	Reduced^b	Potential reduced	Potential reduced	Potential reduced	Potential reduced	Potential reduced	No Impact	Significant decrease^b	Modest decrease^b	Modest decrease^b	Potential reduced	Significant decrease^b

Abbreviations: WHO, World Health Organization; EUA, Emergency Use Authorization; CA, California; NY, New York; RBD, receptor-binding domain; UK, United Kingdom; VOC, variant of concern; VOI, variant of interest.

This table is reproduced from data available on the CDC website [88].

(*) means found in some sequences but not all.

^a The reported reductions in B.1.617, B.1.617.1, B.1.617.2, B.1.617.3, and P.2 are for neutralization by vaccine sera. No data for convalescent sera.

^b This reduction is in susceptibility to the bamlanivimab and etesevimab monoclonal antibody combination treatment; however, other monoclonal antibody treatments are available.

The second unusual feature of B.1.1.7 was a high number of putatively adaptive mutations, leading to the hypothesis that the origin of B.1.1.7 may have been in a prolonged infection in a single host [89-93]. The del69-70 mutation in the spike protein has been implicated in increased spread [94, 95]. Likewise, an N501Y mutation in the spike protein has also been shown by modeling and in vitro to more avidly bind the ACE2 receptor [70, 96–98]. The P681H mutation, which is immediately adjacent to the furin cleavage site, has also been suggested to increase transmission rates [25, 26]. The origin of the B.1.1.7 lineage is unclear, but the earliest B.1.1.7 genome was reported from the United Kingdom in September 2020, and the lineage was the dominant strain in the United Kingdom by December 2020 [86]. It very quickly went on to become dominant in several other European countries and in Israel [99]. More recently, starting in March 2021, B.1.1.7 has become the predominant lineage in the United States, with an extremely rapid increase across the country, coinciding with a major vaccination effort.
The N501Y mutation is likely convergent across several lineages, and 2 other important lineages with this mutation, B.1.351 and P.1, also emerged around the same time as B.1.1.7 in South Africa and Brazil, respectively. These 2 lineages also had a worrisome mutation, E484K, that has been shown to enhance escape from neutralizing antibodies in vitro [100-102] and may be linked to lower efficacy for vaccines [103-106]. Interestingly, the B.1.351 lineage appears to show greater reductions in antibody neutralization than the P.1 lineage [107]. This same mutation has also surfaced in several B.1.1.7 isolates from around the world [108], though only a small number of studies have evaluated the impact on neutralizing antibody escape for this variant. Currently, the B.1.617 lineage and its sublineages (B.1.617.1, B.1.617.2, and B.1.617.3) are circulating in a massive spike of cases in India, and this lineage is already spreading globally. The critical mutations in this lineage appear to be L452R and E484Q, although B.1.617.2 lacks the latter and instead has T478K. E484Q, like E484K, is thought to reduce the efficacy of neutralizing antibodies. L452R variants in the B.1.427/429 lineage have recently been reported to be more transmissible and infective as well as less susceptible to neutralizing antibodies [109]. Less is known about T478K, but it has been noted to be increasing dramatically since January 2021 in Mexico and North America [110]. Interestingly, B.1.617.2 appeared to be expanding rapidly in the United Kingdom in April and May [111].

Local Patterns

One important aspect of whole-genome sequencing and classification has been the ability to detect new variants as they arise or are imported to a new place. Early in the pandemic, whole genomes allowed researchers to estimate the numbers and sources of introductions of the virus [112-114]. Since these initial observations, a striking realization has been the huge number of predicted importations and exportations between countries and across continents, even with significant restrictions on travel. A close look at any part of the SARS-CoV-2 phylogeny shows multiple exportations and importations. Reliable quantification of the flux across borders is highly dependent on the density and breadth of sampling. A detailed analysis was possible only in the UK, where there has been a strong commitment to sampling and sequencing genomes [113]. In this study, more than 1000 viral introductions were detected prior to the lockdown in March 2020; introductions then dropped significantly but were not completely controlled by the lockdown [113]. The number of introductions in the early pandemic into countries with much less genomic sampling, like the United States, will likely never be known, but new sampling efforts may increase our knowledge of importations and allow for rational approaches to curbing spread. The diversity of viruses in any given location is determined by the rate at which new viruses arise by mutation (or possibly recombination), the number of introductions, and the extinction of lineages if they fail to transmit. Each of these factors depends on the ability of the virus to transmit and persist in the community, and therefore viral diversity is reflective of overall viral fitness. Measurements of circulating diversity might be useful tools for monitoring spread and gauging the impact of interventions.
Measuring diversity is also highly dependent on sampling, and in areas with high levels of genomic sequencing groups have used measurements such as the Shannon Index [113]. Higher-order measurements that weigh the major circulating lineages (eg, Simpson Index) may be less prone to bias from under-sampling. We have recently argued that measurements of effective circulating diversity (Hill numbers [115-117]) may be more reliable, and less biased, especially in under-sampled locations [75]. One other way that whole-genome sequencing has been helpful at the very local level is in contact tracing and confirming outbreaks. In this regard, whole-genome sequences can either rule in or rule out transmission events. In some reported instances, transmitted viruses are either completely identical at the nucleotide level or 1-3 SNP differences apart [118, 119]. To conclusively demonstrate transmission, it is necessary to compare the genomes in a putative cluster to other circulating genomes in the community and also establish epidemiological links. Studies that have looked at clusters in this way have generally been able to separate true transmission events from instances that may have an epidemiological link but different genomes.
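
The diversity measures discussed earlier in this section are closely related, which a short calculation makes concrete. The sketch below is a generic illustration that assumes hypothetical lineage counts and a helper named `hill_number`; it is not the GNUVID implementation. Hill numbers of order q express diversity as an effective number of lineages: q = 0 gives the number of lineages observed, q = 1 gives the exponential of the Shannon index, and q = 2 gives the inverse of the Simpson concentration.

```python
import math


def hill_number(counts, q):
    """Effective number of lineages of order q from a list of lineage counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit as q -> 1: exponential of the Shannon index
        shannon = -sum(p * math.log(p) for p in props)
        return math.exp(shannon)
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


# Hypothetical sample: one dominant lineage and three rare ones.
counts = [90, 5, 3, 2]

for q in (0, 1, 2):
    print(q, round(hill_number(counts, q), 2))
```

Because higher orders down-weight rare lineages, the values for q = 1 and q = 2 depend less on whether a few rare lineages happened to be sampled than the simple lineage count at q = 0 does.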

BIOLOGICAL DIFFERENCES BETWEEN GENOMIC VARIANTS

Differences in Transmission

While the clearest biological difference between genomic variants is in transmission, even this property is uncertain and riddled with problems of bias. Of the variants that have been proposed to have an increased rate of transmission, the D614G mutation in the spike protein has the most evidence behind it from epidemiological studies, phylodynamic modeling, structural analyses, and in vivo and in vitro comparative analyses [27, 120–122]. Other variants have been less well studied, with the N501Y mutation garnering, perhaps, the second-best support [79, 96, 123–125]. Both the B.1.1.7 and the B.1.351 variants, which have the N501Y mutation in their RBDs, have recently been shown to have enhanced affinity for the ACE2 protein [98]. It is also important to distinguish here between a holistic view of the viral lineage (genomic background) in general and the specific mutational variants at specific positions. For instance, it could be that N501Y in one genomic context significantly increases transmission whereas in another context it has little impact. In addition, an important quality of SARS-CoV-2 transmission appears to be its variable (over-dispersed) infectivity rate and its tendency to occur in super spreader events [126]. It is unclear whether the propensity to sometimes have a much higher R0 is primarily based on the virus, the host, behavior, or environment; it is likely a complex combination of these factors.

Disease Severity

Hypotheses that some variants cause worse disease have been proposed at various times during the pandemic [127], but attempts to understand outcomes have been limited by available sequencing data, biased sampling, and changing epidemiology. Most notably, after initial reports suggested that the B.1.1.7 lineage was no more virulent than other strains [100, 128–130], subsequent reports suggested that it may, indeed, be associated with worse outcomes [131-135], although increased severity has not been shown in some recent studies [79, 125]. One study linked higher viral load to increased morbidity [136], and B.1.1.7 has been consistently noted to have higher titers in clinical samples [125, 137]. Recently, there have been reports that B.1.351 and P.2 may also be associated with poorer outcomes [138, 139]. All of these studies suffer from the enormous problems of biased sampling, and well-designed and controlled clinical studies combined with whole-genome sequencing will be needed to make more definitive statements. In general, pediatric cases of COVID-19 are less severe [140, 141], and as such there have been only a few reported genome studies of sequences from patients under 21 [142-145]. One study found an association between genotype and severity [142]. Another presentation of SARS-CoV-2–related disease in the pediatric population has been the multisystem inflammatory syndrome in children (MIS-C), which is sometimes referred to as PIMS (pediatric inflammatory multisystem syndrome) [146-148]. This syndrome usually presents 4-6 weeks after infection with body-wide inflammatory signs and symptoms and, often, a cardiovascular component requiring hemodynamic support and treatment with steroids and intravenous immunoglobulin. Because MIS-C is a late presentation, the virus is often not present or found at very low levels in respiratory secretions.
Therefore, whole-genome analysis has been difficult in this condition, and only a few reported sequences are available [149, 150]. Although the pathogenesis of MIS-C is not well understood, it has been suggested that components of the virus may act as superantigens to provoke a polyclonal T-cell expansion [151, 152], making it conceivable that viral variants may be more or less likely to cause disease. The available sequences suggest that MIS-C associated viruses might be drawn from several distinct lineages that are representative of circulating virus without any genetic similarities that tie them together [150]. However, it is notable that the original experience with the virus in China did not detect any MIS-C, raising the possibility that this presentation is linked to later forms of the virus. Much more viral sequencing will be required to test the hypothesis that some viral variants may be more likely to cause disease.

Vaccine Escape

In vitro studies of neutralizing antibodies from patients naturally infected with SARS-CoV-2 or immunized with different vaccines have clearly shown reductions in neutralizing antibody efficacy associated with specific variants. Most notable of these are mutations at the E484 position [130, 153, 154]. Mutations in naturally occurring variants at this position (E484K, E484Q) have been found in the B.1.351 (South African), P.1 and P.2 (Brazilian), and B.1.617 (Indian) lineages, with a small number of reported B.1.1.7 isolates bearing a mutation at this site [108]. Several studies have examined neutralizing antibodies against each of these variants, and several have shown a decreased neutralizing response compared to wild type [106, 155–157]; however, B.1.351 appears to show the largest decrease [102, 103]. Data like these suggest that specific variant positions may have distinct impacts in distinct lineages. Despite significant reductions in neutralizing antibodies, many of the vaccines have been found to be effective, most encouragingly against severe disease, in places dominated by worrisome “vaccine escape” mutants [158-169]. The biggest drops in efficacy have been associated with the B.1.351 lineage, but only in mild to moderate disease [167]. This protective effect is likely due to very good initial vaccine efficacy as well as retention of multiple other epitope targets, and potentially other immunological factors such as the T-cell response. Going forward, we will need to develop quick assays for assessing correlates of protection that can be applied to emerging variants [170].

WHAT DO WE NEED TO DO GOING FORWARD?

Most evolutionary biology to date has been done in retrospect, and therefore the techniques are focused on deriving the maximum amount of data from things that have already happened. Likewise, epidemiological techniques are often centered on known diversity and on a pathogen that may not be changing its ability to spread or cause disease. In this instance, we need to reboot our toolbox and scale up our understanding of pathogen diversity. Moving forward, it will be absolutely critical to have in place systematic strategies for sampling and quickly analyzing new data. It is instructive that patterns can change very rapidly. For instance, in Philadelphia between February and March, our surveillance data showed a more than 400% increase in B.1.1.7. Our sequencing and analytical strategies need to be able to react in real time before the speeding train passes us by.
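To make the scale of such a shift concrete, the following minimal Python sketch shows how a relative increase in a lineage's share of sequenced samples can be computed between two surveillance periods. The function names and all counts are hypothetical placeholders chosen for illustration; they are not the Philadelphia surveillance data, which are not reported here. Note that an increase of more than 400% means the later share is more than five times the earlier one.

```python
def lineage_share(lineage_count: int, total_sequenced: int) -> float:
    """Fraction of sequenced samples assigned to a lineage in one period."""
    if total_sequenced <= 0:
        raise ValueError("total_sequenced must be positive")
    return lineage_count / total_sequenced


def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before * 100.0


# Hypothetical counts, for illustration only (not real surveillance data).
share_early = lineage_share(4, 200)    # lineage is 2% of sequenced samples
share_late = lineage_share(33, 300)    # lineage is 11% of sequenced samples

change = percent_increase(share_early, share_late)
print(f"Relative increase in lineage share: {change:.0f}%")
```

Working with shares rather than raw counts is a deliberate choice in this sketch: sequencing effort usually differs between periods, so a rise in raw counts alone could reflect more sampling rather than a real change in lineage frequency.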
