Laura Hughes, Karthik Gangavarapu, Alaa Abdel Latif, Julia Mullen, Manar Alkuzweny, Emory Hufbauer, Ginger Tsueng, Emily Haag, Mark Zeller, Christine Aceves, Karina Zaiets, Marco Cano, Jerry Zhou, Zhongchao Qian, Rachel Sattler, Nathaniel Matteson, Joshua Levy, Raphael Lee, Lucas Freitas, Sebastian Maurer-Stroh, Marc Suchard, Chunlei Wu, Andrew Su, Kristian Andersen.
Abstract
The emergence of SARS-CoV-2 variants of concern has prompted the need for near real-time genomic surveillance to inform public health interventions. In response to this need, the global scientific community, through unprecedented effort, has sequenced and shared over 11 million genomes through GISAID, as of May 2022. This extraordinarily high sampling rate provides a unique opportunity to track the evolution of the virus in near real-time. Here, we present outbreak.info, a platform that currently tracks over 40 million combinations of PANGO lineages and individual mutations, across over 7,000 locations, to provide insights for researchers, public health officials, and the general public. We describe the interpretable and opinionated visualizations in the variant- and location-focused reports available in our web application, the pipelines that enable the scalable ingestion of heterogeneous sources of SARS-CoV-2 variant data, and the server infrastructure that enables widespread data dissemination via a high-performance API that can be accessed using an R package. We present a case study that illustrates how outbreak.info can be used for genomic surveillance and as a hypothesis generation tool to understand the ongoing pandemic at varying geographic and temporal scales. With an emphasis on scalability, interactivity, interpretability, and reusability, outbreak.info provides a template to enable genomic surveillance at a global and localized scale.
Year: 2022 PMID: 35794893 PMCID: PMC9258294 DOI: 10.21203/rs.3.rs-1723829/v1
Source DB: PubMed Journal: Res Sq
Figure 1. outbreak.info enables the exploration of genomic data across three dimensions. a, The growth rate of a lineage is a function of epidemiology and the intrinsic biological properties of the lineage. Epidemiology varies over time and by geography, while intrinsic biological properties are determined by the mutations present in a given lineage. b, Genomic data is ingested from GISAID, processed using the custom-built data pipeline, Bjorn, and stored on a server that can be accessed via an Application Programming Interface (API). The API is consumed by two clients: a JavaScript-based web client and an R package that provides programmatic access by authenticating against GISAID credentials. c, The web interface contains three tools that allow exploration of genomic data across three different dimensions: lineage/mutation, time, and geography.
Questions addressed by the Lineage and/or Mutation Tracker
| Question | Relevant visual elements |
|---|---|
| What is the prevalence of a set of mutations within different lineages? | Mutations such as S:N501Y, S:DEL69/70, and S:E484K have been shown to have functional impact on the phenotype exhibited by a lineage such as increased pathogenicity or immune evasion[ |
| What is the trend shown by the prevalence of a lineage and/or a set of mutations over time? | Tracking the growth rate of a lineage or a set of mutations over time is important for informing public health interventions. We estimate the prevalence of a given query as a proportion of the total number of sequences collected on a given day at a given location. To convey the uncertainty in estimating the prevalence, we calculate binomial proportion confidence intervals using the Jeffreys interval ( |
| What are the “characteristic mutations” of a lineage? | The mutations that are characteristic of a lineage can be used to generate hypotheses about the phenotype exhibited by a lineage based on prior studies on the functional impact of mutations. This is especially important to assess any potential impact a lineage might have on therapeutics such as monoclonal antibody drugs. We define the “characteristic mutations” of a lineage as those mutations found in at least 75% of the genomes classified as the lineage ( |
| What is the total number of sequences that belong to a lineage and/or a set of mutations? | To assess how quickly a variant spread and the extent of its geographic spread, we show a summary of relevant statistics: the total number of sequences that match the query, the cumulative prevalence of these mutations, and the first and last dates on which a sequence matching the query was detected, worldwide and for a customizable set of locations ( |
| What is the geographic prevalence of a lineage and/or a set of mutations? | Many lineages including VOCs Beta and Gamma show variation in growth rates across different locations. Hence, it is essential to be able to access the geographic distribution of a given lineage. To facilitate this, we show the cumulative prevalence of lineages since they were first detected across the sub-admin levels of a given location for a lineage/mutation query ( |
| What is the latest research available on this lineage and/or set of mutations? | With the growth of new variants over the pandemic, we have seen many studies that focus on important aspects of a lineage such as the ability to evade immune response and the impact on vaccine efficacy. In order to aid in the discoverability of preprints, publications, datasets and other resources, we show the entries that match a given lineage or mutation query from our up-to-date Research Library [ |
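
The prevalence estimate and its Jeffreys interval described in the table above can be illustrated with a short sketch. This is not the outbreak.info implementation: the function names, the default 95% level, and the boundary convention (lower bound 0 when no sequences match, upper bound 1 when all match) are assumptions made for illustration. The Jeffreys interval takes the equal-tailed quantiles of a Beta(x + 1/2, n - x + 1/2) distribution, where x is the number of matching sequences and n the total sequenced that day; the regularized incomplete beta function is evaluated here with a standard continued-fraction expansion so that only the Python standard library is needed.

```python
import math


def _guard(v, tiny=1e-300):
    """Keep a value away from zero so the Lentz recurrence never divides by 0."""
    return tiny if abs(v) < tiny else v


def _beta_cf(a, b, x, max_iter=500, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _guard(1.0 + aa * d)
        c = _guard(1.0 + aa / c)
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _guard(1.0 + aa * d)
        c = _guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_quantile(p, a, b, iterations=100):
    """Invert the Beta(a, b) CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if reg_inc_beta(a, b, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def jeffreys_interval(x, n, conf=0.95):
    """Jeffreys interval for x matching sequences out of n (hypothetical helper)."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    alpha = 1.0 - conf
    a, b = x + 0.5, n - x + 0.5
    lower = 0.0 if x == 0 else beta_quantile(alpha / 2.0, a, b)
    upper = 1.0 if x == n else beta_quantile(1.0 - alpha / 2.0, a, b)
    return lower, upper


# Illustrative numbers only: 12 of 80 sequences on one day match the query.
x, n = 12, 80
low, high = jeffreys_interval(x, n)
print(f"prevalence = {x / n:.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

On days with few sequences the interval is wide, which is the uncertainty the prevalence plots are meant to convey.
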
Questions addressed by the Location Tracker
| Question | Relevant visual elements |
|---|---|
| What are the most prevalent lineages over the last 60 days? | In order to quickly provide a snapshot of the lineages currently circulating in a given location, we show a stream graph of the prevalence of lineages over the last 60 days ( |
| What is the distribution of mutations across these lineages? | The Location Tracker shows a snapshot of currently circulating lineages which will help identify a newly emerging lineage that exhibits a high relative growth rate. Often in such cases, the mutations found in the lineage might provide preliminary evidence on phenotypes exhibited by the virus such as increased transmissibility or immune evasion. To facilitate this process, we show the prevalence of mutations that are present in the spike gene of at least 75% of the sequences of currently circulating lineages ( |
| How does the prevalence of different lineages or mutations within this location change over time? | In addition to showing a snapshot of the lineages circulating over the last 60 days, we developed a component to show the temporal variation in the prevalence of a customizable set of lineages/mutations for a given location. This offers additional flexibility to dynamically select lineages or mutations of interest and compare their prevalence over time with a customizable time window ( |
| How does the lineage prevalence over time correspond to the number of daily reported cases in this region? | The impact of lineage dynamics on the reported cases over time is of primary concern to public health. To show this relationship, we cross-linked the reported cases for each location using a standardized location identifier, and this is shown in a line graph below the prevalence of a lineage ( |
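
The 75% threshold used in both tables (for the "characteristic mutations" of a lineage, and for the mutation prevalence shown across circulating lineages) amounts to a simple frequency filter. The sketch below is a minimal illustration under stated assumptions: the input format (one set of mutation names per genome), the function name, and the example genomes are hypothetical and do not reflect the outbreak.info data model or API.

```python
from collections import Counter


def characteristic_mutations(genomes, threshold=0.75):
    """Return {mutation: prevalence} for mutations found in at least
    `threshold` of the genomes assigned to one lineage.

    `genomes` is a list of sets of mutation names (a hypothetical input format).
    """
    n = len(genomes)
    if n == 0:
        return {}
    # Count each mutation at most once per genome.
    counts = Counter(m for genome in genomes for m in set(genome))
    return {m: c / n for m, c in counts.items() if c / n >= threshold}


# Made-up genomes for illustration; mutation names are taken from the text above.
lineage_genomes = [
    {"S:N501Y", "S:E484K"},
    {"S:N501Y"},
    {"S:N501Y", "S:E484K"},
    {"S:N501Y", "S:E484K", "S:DEL69/70"},
]
for mutation, prevalence in sorted(characteristic_mutations(lineage_genomes).items()):
    print(mutation, prevalence)
```

Restricting the input to spike-gene mutations, or to sequences from the last 60 days in one location, would be a filter applied before calling the function.
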
Figure 4. Prevalence of Variants of Concern (Alpha, Beta, Gamma, Delta, and Omicron lineages) over time (a) worldwide and in (b) South Africa, (c) Brazil, and (d) the United States. Lineages with a prevalence over 3% over the last 60 days in (e) Denmark, (f) the United Kingdom, (g) the United States, and (h) South Africa.
Figure 5. Software infrastructure of outbreak.info. The infrastructure can be broadly divided into (1) Data ingestion pipelines, (2) Server-side hosting the database and API server, and (3) Client-side applications that use the API from the server.
Figure 3. Location report.
a, Relative prevalence of all lineages over time in South Africa. The total number of sequenced samples collected per day is shown in the bar chart below. b, Relative cumulative prevalence of all lineages over the last 60 days in South Africa. c, Mutation prevalence across the most prevalent lineages in South Africa over the last 60 days. d, Comparison of the prevalence of VOCs grouped by WHO classification (Alpha, Beta, Delta, and Omicron) over time in South Africa. e, Daily reported cases in South Africa are shown in the line chart below.
Figure 6. Flowchart describing the steps in Bjorn.