Denis Jacob Machado1, Richard Allen White1,2, Janice Kofsky1, Daniel A Janies1. 1. University of North Carolina at Charlotte, College of Computing and Informatics, Department of Bioinformatics and Genomics, Charlotte, North Carolina. 2. University of North Carolina at Charlotte, North Carolina Research Campus (NCRC), Kannapolis, North Carolina.
Abstract
The coronavirus disease 2019 (COVID-19) pandemic was a significant cause of death worldwide in 2020. The disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), an RNA virus of the subfamily Orthocoronavirinae related to 2 other clinically relevant coronaviruses, SARS-CoV and MERS-CoV. Like other coronaviruses and several other viruses, SARS-CoV-2 originated in bats. However, unlike other coronaviruses, SARS-CoV-2 resulted in a devastating pandemic. The SARS-CoV-2 pandemic rages on due to viral evolution that leads to more transmissible and immune-evasive variants. Technology such as genomic sequencing has driven the shift from syndromic to molecular epidemiology and promises better understanding of variants. The COVID-19 pandemic has exposed critical impediments that must be addressed to develop the science of pandemics. Much of the progress is being applied in the developed world. However, barriers to the use of molecular epidemiology in low- and middle-income countries (LMICs) remain, including lack of logistics for equipment and reagents and lack of training in analysis. We review the molecular epidemiology literature to understand its origins from the SARS epidemic (2002-2003) through influenza events and the current COVID-19 pandemic. We advocate for improved genomic surveillance of SARS-CoV and understanding the pathogen diversity in potential zoonotic hosts. This work will require training in phylogenetic and high-performance computing to improve analyses of the origin and spread of pathogens. The overarching goals are to understand and abate zoonosis risk through interdisciplinary collaboration and lowering logistical barriers.
Genomic epidemiology stems from molecular epidemiology, which uses evidence ranging from gel electrophoresis to multilocus sequence typing to study the origins and spread of pathogenic microorganisms. Janies et al
reviewed the history of molecular epidemiology and compared it with syndromic epidemiology. Here, we focus on recent advances toward genomic epidemiology (Fig. 1), which includes genomic sequencing combined with rapid data sharing as enabled by the Internet. In 2002–2003, the severe acute respiratory syndrome coronavirus (SARS-CoV) was the first infectious disease for which scientists shared software and pathogen genetic data over the Internet to rapidly respond to the disease. Thereafter, genomic epidemiology was solidified by responses to H5N1, H1N1-2009, and other strains of influenza such as H7N9
and expanded to respond to foodborne and sexually transmitted diseases.
Fig. 1.
Timeline of major events in sequencing technology (green) and genomic epidemiology (purple) alongside the first recorded occurrence of SARS-CoV, H1N1-2009, MERS-CoV, and SARS-CoV-2 in humans. Associated references can be found in Supplementary Table 1.
The first SARS-CoV genome was shared after publication
on the National Center for Biotechnology Information (NCBI) GenBank website, as was customary. Meanwhile, dashboards, graphs, and maps emerged to track cases over time and space.
Janies et al
combined genomic and geographic data for SARS-CoV and H5N1 influenza, respectively, being the first to project phylogenies onto a virtual globe. Janies et al
used Keyhole Markup Language (KML) to develop Supramap, which facilitates geographic mapping of phylogenies. Supramap allowed hypothesis testing ranging from the host and geographic origins of pathogens
to tracing mutations that conferred drug resistance or host switching.
Limitations of computing large data sets, coupled with a preference for sharing data after publication, resulted in a longer turnaround between data acquisition and results than occurs today. However, these conditions did not impede a hypothesis-driven field with value to decision makers, as demonstrated in a 2007 congressional hearing.
In the 2000s, some genomes were sequenced for respiratory pathogens such as H1N1-2009. However, even SARS-CoV genomes were not always sequenced completely, and sequences were released gradually.
This changed due to factors such as new DNA sequencing technologies.
How did advances in sequencing technology reshape genomic epidemiology?
Current genomic epidemiology of infectious diseases originated in response to the SARS-CoV epidemic.
Sequencing the SARS-CoV genome was instrumental in recognizing it as a novel coronavirus associated with HCoV-OC43 and HCoV-229E.
Researchers combined genomic and epidemiological data to trace the genotypic variation of the viral transmission paths between 2002 and 2003.
However, today’s genomic surveillance evolved with the advance of high-throughput sequencing (HTS) (Fig. 1).
Reuter et al
summarized HTS history until 2015 and Pérez-Losada
reviewed recent HTS advances. We focus on the variation in sequencing cost per raw megabase between 2001 and 2020
(Fig. 2a) to illustrate the increasing feasibility of sequencing coronavirus genomes (Fig. 2b). Considering raw nucleotide sequencing cost alone, US$100 was not sufficient to sequence one coronavirus genome in 2001, but it would cover >400,000 genomes in 2020.
Fig. 2.
The increasing feasibility of sequencing complete coronavirus genomes. (a) Sequencing cost per raw megabase of DNA sequence from September 2001 until August 2020 (data source: genome.gov/sequencingcosts, access date: September 2021). (b) Number of complete coronavirus genomes that can be sequenced with USD 100, assuming a genome size of 32 Kbp. These cost estimates do not consider sampling, storage, consumables, equipment, and staff costs. These plots use a logarithmic scale.
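The arithmetic behind Fig. 2b can be sketched as follows, assuming (as the caption does) a 32 Kbp genome and counting only raw sequencing cost. The function name and the example prices are hypothetical placeholders, not values taken from the genome.gov table. The sketch also counts a single genome length of raw sequence per genome, which appears to be the convention behind the figures quoted in the text, whereas real projects require many-fold coverage.

```python
GENOME_SIZE_KBP = 32  # genome size assumed in the Fig. 2 caption


def genomes_per_budget(cost_per_raw_mb_usd, budget_usd=100.0,
                       genome_kbp=GENOME_SIZE_KBP):
    """Genome-equivalents of raw sequence that a budget buys.

    Counts raw sequencing cost only (no sampling, storage, consumables,
    equipment, or staff) and one genome length of sequence per genome.
    """
    genome_mb = genome_kbp / 1000.0           # 32 Kbp = 0.032 Mb
    cost_per_genome = cost_per_raw_mb_usd * genome_mb
    return budget_usd / cost_per_genome


# Placeholder prices per raw megabase, chosen only to illustrate the scale.
early_price = 5000.0   # hypothetical early-2000s order of magnitude
recent_price = 0.0078  # hypothetical price consistent with ">400,000 genomes"

early_count = genomes_per_budget(early_price)    # less than one genome
recent_count = genomes_per_budget(recent_price)  # hundreds of thousands
```

Because the result scales inversely with the price per megabase, each order-of-magnitude drop in sequencing cost yields an order-of-magnitude gain in genomes per dollar, which is why both panels of Fig. 2 are plotted on a logarithmic scale.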
What are coronaviruses?
Coronaviruses correspond to the four genera of the subfamily Orthocoronavirinae. Gammacoronavirus (GammaCoVs) and Deltacoronavirus (DeltaCoVs) mainly infect birds and rarely infect mammals.
Alphacoronavirus (AlphaCoVs) and Betacoronavirus (BetaCoVs) originated from Chiroptera (bats) and are often found in other mammals, including humans.
The coronavirus virion encapsulates one of the longest RNA virus genomes (27–32 kb),
which has complex gene expression
and variable gene content among genera (Fig. 3a).
Fig. 3.
Fundamental evolution of coronaviruses based on Machado et al.
(a) Virion and genome structure. The genomic regions indicated in the figure do not represent all the genes in the coronavirus genome, but the genes that are shared among the different genera of Orthocoronavirinae and that were analyzed by Machado et al.
Note. E, envelope small membrane protein; M, membrane protein; N, nucleoprotein; S, spike glycoprotein. (b) Summarized cladogram from Machado et al.
The original cladogram contained 2,006 terminals corresponding to unique coronavirus genomes. Terminals indicating the eight species of human coronaviruses (HCoVs) are in bold. (c) Hosts involved in the emergence of all human coronaviruses, including SARS-CoV-2. The HCoVs of special concern to human health (SARS-CoV, MERS-CoV, and SARS-CoV-2) are shown in red. The flow chart indicates that HCoV-NL63, SARS-CoV, and SARS-CoV-2 originated from bat-hosted coronaviruses. Bats were also key to the emergence of MERS-CoV in camels and humans. HCoV-229E, HCoV-HKU1, and HCoV-OC43 originated from viruses hosted in artiodactyls, rodents, and bovids, respectively. All silhouettes were downloaded from PhyloPic (http://phylopic.org). The coronavirus virion structure was modified from https://commons.wikimedia.org/wiki/File:Coronavirus_virion_structure.svg. See Supplementary File 1 for detailed copyright and license information.
Coronavirus infections in domestic animals are economically significant.
However, the episodic emergence of human coronaviruses (HCoVs) is a pressing concern because they cause infections in all age groups, often leading to respiratory or enteric diseases.
Neurological illness or hepatitis is less frequent.
The US Centers for Disease Control and Prevention (CDC) website
lists 7 HCoVs: 2 AlphaCoVs (HCoV-229E and HCoV-NL63) and 5 BetaCoVs (HCoV-OC43, HCoV-HKU1, SARS-CoV, MERS-CoV, and SARS-CoV-2). We added the human enteric coronavirus 4408 (HECV-4408) to the list because it was isolated from a child with acute gastroenteritis.
How did SARS-CoV-2 accelerate the growth of genomic epidemiology?
Coronaviruses were not deemed highly pathogenic to humans until the 2002 SARS-CoV outbreak.
The dangers of HCoVs were made more evident by the 2012 outbreak of Middle East respiratory syndrome (MERS) coronavirus (MERS-CoV).
Nevertheless, coronaviruses did not receive the current level of attention until coronavirus disease 2019 (COVID-19), the pandemic disease caused by SARS-CoV-2, was first reported in humans in Wuhan, China, in December 2019.
However, Pekar et al
inferred that the virus was present in Hubei approximately a month before. On March 11, 2020, the World Health Organization (WHO) declared a pandemic due to the spread of SARS-CoV-2.
By October 14, 2021, COVID-19 had caused 4,863,818 deaths worldwide.
Understanding the emergence and evolution of SARS-CoV-2 is vital to preventing future pandemics.
The question can be divided into 3 components. First, was the virus purposefully manipulated? Several peer-reviewed publications have concluded that SARS-CoV-2 emerged naturally via zoonosis (see eg, Anderson et al,
Liu et al
, and Holmes et al
). Moreover, previous serology data indicate natural human infections by bat-hosted, SARS-like viruses.
Second, was SARS-CoV-2 an accidental release? If a naturally occurring virus was transported to a laboratory and humans were infected shortly thereafter, the virus may not have accumulated sufficient mutations to record its passage through controlled environments.
However, no evidence indicates that SARS-CoV-2 was known to scientists before December 2019.
Third, what is the natural source of SARS-CoV-2? The most comprehensive phylogenomic analysis of coronaviruses
(Fig. 3b) addressed the fundamental evolution of HCoVs (Fig. 3c) and showed that SARS-CoV-2 results from bat-hosted viruses infecting humans.
SARS-CoV-2 finds its closest related bat-hosted coronaviruses in the subgenus Sarbecovirus, a subgroup of SARS-related coronaviruses (SARSr-CoV) first identified in horseshoe bats (Rhinolophus spp).
Bat-hosted viruses similar to SARS-CoV-2 were collected in the Yunnan province, >1,500 km away from Wuhan, but the hosts have a wide geographic range.
Despite a confusing array of reports confirming
and denying
the origin of SARS-CoV-2 from pangolin (Manis javanica) hosts, pangolins are not involved in the lineage of SARS-CoV-2 that infected humans.
This finding is similar to the emergence of SARS-CoV,
which also infected humans from bat-hosted viruses without any need for intermediate hosts such as Himalayan palm civets (Paguma larvata) and raccoon dogs (Nyctereutes procyonoides).
Are we sequencing SARS-CoV-2 genomes fast enough?
SARS-CoV-2 was identified on January 7, 2020. Three days later, its genome and metadata were shared via the Global Initiative on Sharing Avian Influenza Data (GISAID)
EpiCoV database,
before the first peer-reviewed article was published in February 2020.
To put the SARS-CoV-2 genome sequencing speed into context, consider that SARS-CoV was first reported in November 2002, but its genome was not publicly released until April 2003.
The speed at which such data are released was changed by several forces, illustrated by Janies et al.
In brief, the reasons include the increased feasibility of genome sequencing, the willingness to share data before publication, and the rise of the popular GISAID database, which credits submitting laboratories.
Figure 4 shows the accumulation of 4,224,785 complete SARS-CoV-2 genomes in EpiCoV between January 10, 2020, and October 13, 2021. The curve is far from reaching a plateau, indicating that we are not producing coronavirus genomes at full capacity. Efforts to sequence SARS-CoV-2 following international guidelines
are welcome because these data inform epidemiological forecasts (eg, increased transmission efficiency of SARS-CoV-2 variants has led to projections of higher case numbers
).
Fig. 4.
Progressive accumulation of 4,224,785 complete SARS-CoV-2 genome sequences (>26 Kbp) submitted to the GISAID EpiCoV database (https://www.epicov.org/) between January 10, 2020, and October 13, 2021. These cost estimates do not consider sampling, storage, consumables, equipment, and staff costs (see eg, Schwarze et al
). Nevertheless, the price of raw nucleotide sequencing is a significant component of the cost of genome projects.
Genomic sequencing generates a snapshot of a viral lineage in a place and time. When sequences are collected longitudinally, applications in genomic epidemiology and pandemic responses emerge, which we illustrate with 4 examples. First, profiling mutation fingerprints from the viral pangenome to individual infection quasi-species enables molecular contact tracing.
Second, genomic sequencing informs the peptide mass fingerprinting (PMF) used to predict novel structures and find inhibitors for viral peptides,
although results must be tested in randomized controlled trials
to identify effective antivirals.
Third, the data are used to model epidemic or pandemic size and severity.
Fourth, viral sequences are fundamental for developing mRNA vaccines.
For a review on current pitfalls and opportunities in applying HTS to SARS-CoV-2 genomes, see Chiara et al.
As SARS-CoV-2 becomes endemic,
sequencing demand will remain high. SARS-CoV-2 infections are decreasing as more people develop immunity through natural infection or vaccination.
However, variants may evade infection- and vaccine-induced antibodies,
especially with infections occurring months after vaccination (ie, breakthrough infections).
Given breakthrough infections, increased transmission of some variants, and the lack of full vaccination among eligible people, we can predict that SARS-CoV-2 will continue to evolve. Whether SARS-CoV-2 is evolving toward more severe or more benign COVID-19 phenotypes is a pressing research question for genomic epidemiology.
Effective countermeasures depend on understanding SARS-CoV-2 lineages, which requires sampling variants whose phenotypes are not fully understood
and addressing sampling bias.
For example, if we restrict sequencing to viral isolates from hospitalized patients, the relationships between any variables associated with hospitalization will be distorted when compared to the general population. Thus, we would miss mutations associated with asymptomatic and symptomatic cases that did not require hospitalization, which could produce spurious phenotype-genotype associations or lead us to misinterpret the evidence for them.
Brito et al
analyzed the spatiotemporal heterogeneity in each country’s SARS-CoV-2 genomic surveillance efforts based on metadata submitted to GISAID until May 30, 2021. These researchers estimated that when the prevalence of a rare lineage is 2%, 300 cases would need to be sequenced to detect at least 1 genome of that lineage with 95% probability. Therefore, sequencing capacity should be at least 0.5% of cases per week when incidence is >100 positive cases per 100,000 people.
Brito et al
observed that countries like Denmark, which have a quick turnaround for sequencing, processing, and sharing SARS-CoV-2 genomic data (<18 days) and a high sequencing rate (>32%), observe greater lineage diversity. Many variants may be missed when sampling rates are low. However, disparities in wealth, investment in research and training, coordination, and supply chain logistics affect the ability of countries to perform genomic surveillance, especially LMICs. Therefore, efforts must be made to provide funds, training, and logistic support for researchers based in LMICs to improve their genomic surveillance capacity and public-health decision making.
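The relationship between lineage prevalence, number of genomes sequenced, and detection probability discussed above can be explored with a simple model. The sketch below assumes independent random sampling of cases, so that the chance of seeing at least one genome of a lineage with prevalence p among n sequenced cases is 1 - (1 - p)^n. This is our simplification, not the method of Brito et al, and the function names are hypothetical. Under this idealized model the minimum n for 2% prevalence and 95% probability is smaller than the 300 cases quoted above, so the sketch should be read as a lower bound under ideal sampling rather than a reproduction of their estimate.

```python
import math


def detection_probability(prevalence, n_sequenced):
    """P(at least one genome of the lineage) under independent random sampling."""
    return 1.0 - (1.0 - prevalence) ** n_sequenced


def min_samples_for_detection(prevalence, confidence=0.95):
    """Smallest n with detection_probability(prevalence, n) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))


# Idealized lower bound for a lineage at 2% prevalence.
n_ideal = min_samples_for_detection(0.02, 0.95)

# Detection probability at the sample size quoted in the text.
p_at_300 = detection_probability(0.02, 300)
```

Non-random sampling, delays between collection and sequencing, and failed sequencing runs all reduce the effective sample, which is one reason practical recommendations exceed the idealized bound.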
How do we classify the variants of SARS-CoV-2?
Any genome sequence that is genetically distinct from the reference can be called a variant. In practice, the SARS-CoV-2 variants represent clades that share a set of key mutations while still permitting a small amount of other sequence variation.
Moreover, convergent evolution among geographically distant variants has been observed (Table 1).
Although variants and strains are different, some researchers use these terms interchangeably (eg, Awadasseid et al,
Hossein et al,
and Ul-Rahman et al
). The term “strain” is typically associated with lineages that became sufficiently divergent to exhibit a changed phenotype.
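The working notion of a variant as a group of genomes sharing a set of key mutations, while tolerating other sequence variation, can be sketched as a set-containment check. The variant names and mutation labels below are hypothetical placeholders, not real SARS-CoV-2 definitions, and dedicated lineage-assignment tools are considerably more involved than this illustration.

```python
def matches_variant(genome_mutations, key_mutations):
    """True if the genome carries every key mutation; extra mutations are allowed."""
    return set(key_mutations) <= set(genome_mutations)


def assign_variants(genome_mutations, definitions):
    """Return the sorted names of all variant definitions the genome satisfies."""
    return sorted(
        name
        for name, key_mutations in definitions.items()
        if matches_variant(genome_mutations, key_mutations)
    )


# Hypothetical definitions; "m2" is shared to mimic convergent evolution.
definitions = {
    "variant_X": {"m1", "m2", "m3"},
    "variant_Y": {"m2", "m4"},
}

# A genome with all key mutations of variant_X plus one private mutation.
genome = {"m1", "m2", "m3", "m9"}
assigned = assign_variants(genome, definitions)
```

Because "m2" appears in both hypothetical definitions, a single shared mutation is not enough to assign a genome. This mirrors the point that convergent evolution can place the same mutation in geographically distant variants.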
Table 1.
Notable Variants of SARS-CoV-2 and Their Main Attributes
Note. SIG, US government SARS-CoV-2 Interagency Group; VBM, variant being monitored; VOC, variant of concern; VOI, variant of interest; VUM, variants under monitoring; EUA, emergency use authorization.
This table was modified and updated from the WHO website,
the CDC website,
Rambaut et al,
and Soh et al.
SIG and WHO classifications are detailed in Table 2.
Table 2.
Comparing the Different Categories in the WHO Variant Classification System With the System Used by the US government SARS-CoV-2 Interagency Group (SIG)[a]

| SIG Category | SIG Potential Attributes | WHO Category | WHO Working Definition |
| --- | --- | --- | --- |
| VBM | Variants for which data indicate a potential or clear impact on approved or authorized medical countermeasures or that have been associated with more severe disease or increased transmission but are no longer detected or are circulating at very low levels in the United States, and as such, do not pose a significant and imminent risk to public health in the United States | VUM | A SARS-CoV-2 variant with genetic changes that are suspected to affect virus characteristics with some indication that it may pose a future risk, but evidence of phenotypic or epidemiological impact is currently unclear, requiring enhanced monitoring and repeat assessment pending new evidence |
| VOI | Specific genetic markers that are predicted to affect transmission, diagnostics, therapeutics, or immune escape; evidence that it is the cause of an increased proportion of cases or unique outbreak clusters; limited prevalence or expansion in the United States or in other countries | VOI | With genetic changes that are predicted or known to affect virus characteristics such as transmissibility, disease severity, immune escape, diagnostic or therapeutic escape AND identified to cause significant community transmission or multiple COVID-19 clusters, in multiple countries with increasing relative prevalence alongside increasing number of cases over time, or other apparent epidemiological impacts to suggest an emerging risk to global public health |
| VOC | Evidence of impact on diagnostics, treatments, or vaccines; evidence of increased transmissibility; evidence of increased disease severity | VOC | Increase in transmissibility or detrimental change in COVID-19 epidemiology OR increase in virulence or change in clinical disease presentation OR decrease in effectiveness of public health and social measures or available diagnostics, vaccines, therapeutics |
| VOHC | Impact on medical countermeasures (MCM); demonstrated failure of diagnostic test targets; evidence to suggest a significant reduction in vaccine effectiveness, a disproportionately high number of infections in vaccinated persons, or very low vaccine-induced protection against severe disease; significantly reduced susceptibility to multiple EUA or approved therapeutics; more severe clinical disease and increased hospitalizations | (No equivalent WHO category) | |

Note. VBM, variant being monitored; VOC, variant of concern; VOI, variant of interest; VUM, variants under monitoring; VOHC, variant of high consequence; EUA, emergency use authorization.
[a] Currently, no variants are being classified as VOI or VOHC by the CDC and SIG.
In late 2020 and throughout 2021, as vaccine availability increased, information on variants began to dominate the COVID-19 response.
The emergence of variants that might pose an increased risk to global public health prompted the WHO to characterize specific variants of interest (VOIs) and variants of concern (VOCs) to prioritize global monitoring and research.
The US government SARS-CoV-2 interagency group (SIG) developed a separate variant classification scheme,
which we compare to the WHO system in Table 2.
In March 2021, the WHO assigned letters of the Greek alphabet to categorize VOIs and VOCs,
for simplicity and to avoid association with particular localities. These labels do not replace existing classifications by GISAID (https://gisaid.org/),
Nextstrain (https://nexstrain.org/),
and Pango lineages (https://cov-lineages.org/).
SARS-CoV-2 variants were reviewed by Harvey et al.
Why are vaccines still not enough against COVID-19?
The speed of development and testing of COVID-19 vaccines is one of history’s most outstanding public health achievements. Widespread vaccination of eligible individuals is the best and safest way to control the pandemic.
Although some SARS-CoV-2 variants show a degree of escape from protective antibodies induced by natural infection (and, to a lesser degree, after immunization), T-cell responses are retained.
Furthermore, first-generation SARS-CoV-2 mRNA-based vaccines induce public antibodies (ie, antibodies with similar genetic elements and modes of recognition against a different antigen observed in multiple individuals) with robust neutralizing and potentially durable protective activity against variants such as alpha (α), beta (β), and gamma (γ).
SARS-CoV-2 variants will continue to emerge,
requiring close international monitoring to determine the need for vaccine boosters and/or redesign.
As variants emerge in areas of low vaccination, a global COVID-19 vaccination rollout is imperative. Since the vaccine rollout, new questions have arisen regarding vaccine efficacy against the transmission of different variants,
the duration of protection,
and the efficacy of prime-boost schedules.
A demand has also arisen for studies to determine the immunological correlates of protection against COVID-19 as cases decline and prevention of severe disease gains more importance in vaccine efficacy.
Meanwhile, nonpharmaceutical interventions to reduce the spread of SARS-CoV-2 and other pathogens are still warranted.
How can we bridge the knowledge gap between disease origin and transmission?
Genomic epidemiology can be a tool to study emerging infectious diseases (EIDs) in humans, but its effectiveness is maximized when it accounts for animal and environmental components. In the case of zoonosis, there is a knowledge gap between the animal and human components of EID research, and One Health can bridge this gap.
Although most human health researchers started focusing on coronaviruses only after the emergence of SARS-CoV-2, veterinarians, virologists, and zoologists had been researching animal coronaviruses long before the COVID-19 pandemic.
One Health proposes placing these realms of research (on humans and animals) in the same environmental context. The next steps in pandemic prevention science are to understand factors that create opportunities for zoonosis,
including entry into habitats that harbor infectious agents, such as bat caves, and the use of wildlife as food and medicine.
Deep sequencing the microbiomes and viromes of taxonomically, geographically, and temporally deep biorepository archives of putative host animals will serve as the basis of new approaches to zoonosis, risk assessment, and threat mitigation.
Therefore, another step toward furthering the One Health approach is leveraging biorepositories in biomedical research. Although the Global Museum initiative already offers a route of international integration among museum biorepositories in a decentralized and geographically dispersed network,
the link to EID research is still not fully realized.
The recent creation of the Museums and Emerging Pathogens in the Americas network (MEPA) is vital for linking biorepositories and EID research.
The overarching goal of the MEPA is to leverage museum biorepositories in a global, decentralized pathogen surveillance system by expanding biodiversity infrastructure and opening communication channels that foster collaboration among biorepositories and biomedical communities.
The need for this host-based approach to genomic epidemiology is made evident by the transmissible nature of SARS-CoV-2,
which has the potential to infect a range of hosts, including tigers,
minks,
domestic cats,
ferrets,
raccoon dogs,
cynomolgus and rhesus macaques,
rabbits,
Egyptian fruit bats,
Syrian hamsters,
and white-tailed deer.
How can we track SARS-CoV-2 variants faster?
Vaccines are still effective in preventing severe outcomes against all SARS-CoV-2 variants,
which are ravaging unvaccinated people.
However, the likelihood of new mutations increases as cases rise, possibly leading to enhanced transmission, immune escape, or increased pathogenicity. This process has resulted in more transmissible variants.
Researchers face 2 main challenges in keeping pace with SARS-CoV-2 variants: using resources at optimal capacity and lowering barriers to technology and training in genomic epidemiology across the world. On the one hand, countries with a high positivity rate, like India, are not sequencing isolates at full capacity.
The United States is an even more extreme example because it has ranked low in SARS-CoV-2 sequencing despite its capacity and expertise.
On the other hand, countries like South Africa have sequencing laboratories struggling with reagent shortages and the scarcity of trained scientists.
Global efforts to strengthen pathogen sequencing capacity are still required to respond to technical, logistical, and financial challenges in resource-limited settings despite increased sequencing feasibility. Moreover, good SARS-CoV-2 sequencing performance for some LMICs (eg, Democratic Republic of the Congo, Brazil, Senegal, and Thailand) further encourages international and domestic collaboration among public health authorities, healthcare facilities, academia, and industries.
Additional challenges include consistent handling of isolates as well as metadata and sequence data curation and deposition in a way that facilitates combining data sets from different laboratories. These challenges require coordinated efforts
and data standards
to guarantee rapid access to large volumes of raw and processed molecular data at unprecedented scales.
We also need to address bioinformatics bottlenecks to respond faster to the threat of emergent diseases and to manage the fast-paced production of genomic information. Most tools are co-opted from evolutionary biology’s arsenal to study the lineages of higher taxa with exemplar approaches.
Although these tools were not designed to manage big data from rapidly evolving pathogens, some have already started to respond to these demands. For example, Ultrafast Sample placement on Existing tRees (UShER) enables the rapid placement of novel genomes into a reference tree using the parsimony optimality criterion. Thus, as phylogenetic principles underpin how we view genetic changes over time, One Health will also include the exchange of knowledge among evolutionary biologists and epidemiologists.
Phylogenetic trees are hard to compute and interpret. The need to consult professional phylogeneticists is made plain by the number of prominent papers that did not adhere to the standards of phylogenetics and failed to identify the fundamental hosts of coronaviruses. Moreover, a good phylogenetic analysis requires many elements: careful choice of the collected taxa, sequence, and/or phenotypic data; method and quality control of sequence data and alignment; evaluation of substitution and indel models; treatment of partitions; tree-search protocol; measures of fit or confidence; and strategies for character coding and optimization. Results may also vary with parameterization. These are only a few of the difficult decisions that go well beyond the level of sophistication of any software manual or automated system.
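To make the parsimony optimality criterion mentioned above concrete, the following Python sketch scores a small rooted binary tree with Fitch's algorithm and then tries every branch as an attachment point for a new genome, keeping the tree with the fewest inferred changes. It is a brute-force illustration only, not UShER's algorithm or interface. The tree, the four-base sequences, and all function names are hypothetical, and gaps, ambiguity codes, and ties are ignored.

```python
# Illustrative only: hypothetical names and toy data, not UShER's implementation.
# A tree is either a leaf name (str) or a (left, right) tuple of subtrees.

def fitch_site(tree, states):
    """Return (candidate state set, change count) for one alignment column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum the Fitch change counts over all columns of aligned sequences."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(length)
    )


def placements(tree, new_leaf):
    """Yield every tree made by attaching new_leaf to one branch of tree."""
    yield (tree, new_leaf)  # attach on the branch above this subtree
    if not isinstance(tree, str):
        left, right = tree
        for candidate in placements(left, new_leaf):
            yield (candidate, right)
        for candidate in placements(right, new_leaf):
            yield (left, candidate)


# Toy reference tree and aligned sequences (hypothetical data).
reference_tree = (("s1", "s2"), ("s3", "s4"))
seqs = {
    "s1": "AAGT",
    "s2": "AAGC",
    "s3": "CTGC",
    "s4": "CTAC",
    "query": "CTAC",  # the new genome to place
}

# Exhaustively score each candidate placement and keep the most parsimonious.
best_tree = min(
    placements(reference_tree, "query"),
    key=lambda candidate: parsimony_score(candidate, seqs),
)
best_score = parsimony_score(best_tree, seqs)
print(best_tree, best_score)
```

In this toy case the query is grouped with s4, the sequence it matches. Rescoring every candidate tree from scratch becomes costly as trees grow, which is one reason pandemic-scale data call for more efficient placement strategies.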
Are trees mapped to globes always needed?
In many cases, such as the initial spread of H5N1 influenza, trees and Supramaps were very useful to understand the geographic spread of the pathogen, its multiple geographically and mutationally distinct patterns of zoonosis, and drug resistance. However, due to occlusion, Supramaps were not suitable for the visualization of cosmopolitan diseases, such as strains of Salmonella (eg, Hoffman et al), seasonal influenza (eg, H3N2), pandemic influenza (H1N1-2009), and SARS-CoV-2. In response, researchers have worked on alternative visualization tools, including pointmaps and route maps, and eventually moved beyond the need for mapping trees to globes with Strainhub.
Compared with Supramap, Strainhub is less computationally demanding. It can be executed from a web browser, it does not depend on closed-source software (Google Earth), and geographical data are optional (Fig. 5). Moreover, Strainhub can be used to test hypotheses on the relative importance of hosts or places in disease spread. Future efforts for Strainhub will focus on usability, interoperability, visual clarity, and quantification of the relative importance of hosts or places in the spread of disease to better understand zoonosis.
Fig. 5.
Comparison between Supramap and Strainhub visualizations. (a) Supramap phylogenetic visualization of bat-hosted and pangolin-hosted coronaviruses that share recent ancestry (2005–2019) with human-hosted SARS-CoV-2. The underlying data are genomic sequences, temporal and geographic metadata. (b) Strainhub visualization of the same data plus host metadata in a network using arbitrary space. Arrow colors correspond to different types of transmission (red = bat to human, green = bat to bat, yellow = bat to pangolin). The size of the circle represents the source hub ratio (SHR). SHR is the number of transitions originating from a node as a fraction of the total number of transitions related to that node. A node scoring SHR close to 1 indicates a source (eg, Hubei, Yunnan, and Zhejiang), SHR close to 0.5 a hub and SHR close to 0 a sink for the pathogen. The thickness of the line represents a higher frequency of viral transmission (eg, Hubei to Zhejiang).
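The SHR defined in the caption of Fig. 5 can be computed directly from a list of inferred transitions. The sketch below is a minimal Python illustration under stated assumptions: the function name, the input format of (source, destination) pairs, and the toy transitions among nodes A to D are hypothetical and are not taken from Strainhub or from the data in the figure.

```python
from collections import Counter

# Illustrative only: hypothetical function name and toy data.


def source_hub_ratio(transitions):
    """Return {node: SHR} from an iterable of (source, destination) pairs.

    SHR = transitions originating from the node divided by all transitions
    that involve the node (outgoing plus incoming).
    """
    outgoing = Counter()
    total = Counter()
    for source, destination in transitions:
        if source == destination:
            # Assumption: a transition is a change of state, so skip self-loops.
            continue
        outgoing[source] += 1
        total[source] += 1
        total[destination] += 1
    return {node: outgoing[node] / total[node] for node in total}


# Toy transitions among four hypothetical nodes.
transitions = [
    ("A", "B"), ("A", "B"), ("A", "C"),
    ("B", "C"), ("B", "D"),
    ("C", "D"),
]

shr = source_hub_ratio(transitions)
for node in sorted(shr):
    print(node, round(shr[node], 2))
```

Here node A only sends transitions (SHR of 1, a source), node B sends and receives equally (SHR of 0.5, a hub), and node D only receives (SHR of 0, a sink).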
How do we prepare for the next pandemic?
The COVID-19 pandemic has illustrated how unprepared our interconnected global society is for zoonotic disease. Looking to the next pandemic, 2 frontiers of investigation stand out for genomic epidemiology as a tool to survey microbes of pandemic potential and to predict, prevent, or respond faster to the emergence of new diseases.
First, we must survey the natural diversity of coronaviruses and other microbes of pandemic potential present within animals. Second, we must develop the science of pandemic prevention by moving from tracking pandemics that are occurring to predicting outbreaks. For example, combining artificial intelligence with genomic epidemiology can lead to constructing a “viral forecast” to inform decisions about viruses with pandemic potential. Moreover, we have proposed a novel mathematical modeling framework based on agent-based modeling to predict pathogen patch dynamics underlying zoonosis.
Final remarks
The COVID-19 pandemic, while ongoing, has caused 4,863,818 deaths worldwide as of October 14, 2021, and in the United States it has surpassed the death toll from the 1918–1919 H1N1 pandemic, which was ∼675,000. As SARS-CoV-2 becomes endemic, we must remember that it is not as lethal as other pathogens such as H5N1 influenza or Nipah virus. In its last 100 years of existence, smallpox killed 300 million people, and Variola major (the major variant of smallpox) killed 30% of these patients. A novel pathogen at 30% mortality infecting 50% of the US population (166.7 million people) would result in 50 million deaths. MERS-CoV, henipaviruses, and hantavirus all have high mortality (>30%) and virulence with no approved vaccines or antivirals available. The 2018 Nipah outbreak had a 91% case-fatality rate, claiming 21 lives. We must heed the warning that pathogens with more severe disease phenotypes than SARS-CoV-2 could result in a far more devastating pandemic.
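The hypothetical death toll above follows from simple arithmetic, written out here as a check. The symbols are introduced only for this illustration: m is the mortality among infected people and I is the number of people infected, both taken from the figures in the text. The second line is an inference from the reported case-fatality rate and deaths, not a case count stated in the text.

```latex
\begin{align*}
\text{deaths} &= m \times I = 0.30 \times 166.7\ \text{million} \approx 50.0\ \text{million} \\
\text{Nipah cases (2018)} &\approx \frac{\text{deaths}}{\text{case-fatality rate}} = \frac{21}{0.91} \approx 23
\end{align*}
```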