Anna Bernasconi, Lorenzo Mari, Renato Casagrandi, Stefano Ceri.
Abstract
Since its emergence in late 2019, the spread of SARS-CoV-2 has been accompanied by the evolution of its viral genome. The co-occurrence of specific amino acid changes, collectively named a 'virus variant', requires scrutiny, as variants may strongly affect the agent's transmission, pathogenesis, or antigenicity; variant evolution is studied using phylogenetics. Yet this problem has never been tackled by digging into data with ad hoc analysis techniques. Here we show that the emergence of variants can in fact be traced through data-driven methods, further capitalizing on the value of large collections of SARS-CoV-2 sequences. For all countries with sufficient data, we compute weekly counts of amino acid changes, unveil time-varying clusters of changes with similar (rapidly growing) dynamics, and then follow their evolution. Our method succeeds in associating clusters with variants of interest or concern in a timely manner, provided their change composition is well characterized. This allows us to detect a variant's emergence, rise, peak, and eventual decline under the competitive pressure of another variant. Our early warning system, relying exclusively on deposited sequences, shows the power of big data in this context and adds to the calls for widespread public SARS-CoV-2 genome sequencing to improve surveillance and control of the COVID-19 pandemic.
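The first step named in the abstract, weekly counts of amino acid changes, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the input layout (a collection date paired with a set of change labels), the use of ISO weeks, and the function names `weekly_change_counts` and `prevalence` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_change_counts(sequences):
    """Count, per ISO week, how many sequences carry each amino acid change.

    `sequences` is an iterable of (collection_date, changes) pairs; this
    input layout is a hypothetical one chosen for the sketch.
    """
    counts = defaultdict(Counter)  # week -> change -> number of sequences
    totals = Counter()             # week -> number of sequences
    for collected, changes in sequences:
        iso = collected.isocalendar()
        week = (iso[0], iso[1])    # (ISO year, ISO week number)
        totals[week] += 1
        counts[week].update(set(changes))
    return counts, totals


def prevalence(counts, totals, change):
    """Fraction of sequences carrying `change` in each week, in time order."""
    return {week: counts[week][change] / totals[week] for week in sorted(totals)}


# Toy input: two sequences in each of two consecutive weeks (labels are placeholders).
sample = [
    (date(2020, 12, 1), {"change_A", "change_B"}),
    (date(2020, 12, 2), {"change_B"}),
    (date(2020, 12, 8), {"change_A", "change_B"}),
    (date(2020, 12, 9), {"change_A"}),
]
counts, totals = weekly_change_counts(sample)
prev_a = prevalence(counts, totals, "change_A")
```

In this toy input, `change_A` rises from half of the sequences in the first week to all of them in the second, which is the kind of rapidly growing dynamics the clustering step would then group.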
Year: 2021 PMID: 34702903 PMCID: PMC8548498 DOI: 10.1038/s41598-021-00496-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Data-driven identification of variants in Japan. (a–c), left panels: Four hits in Japan (two of which occurred in the same week), corresponding to clusters JP_008 (Jun-wk4-20), JP_025 (Aug-wk3-20), JP_093, and JP_098 (both observed in Mar-wk1-21). Central panels: Cluster-dictionary similarity () for the four early hits showing, at time of detection, to strongly match ( in all cases) the four lineages B.1.1.284 (JP2), B.1.1.214 (JP1), R.1 (JP3), and B.1.1.7 (Alpha); values are not shown. Right panels: Community detection applied to the within-cluster change co-occurrence matrix evaluated over the period of time between the emergence of two subsequent variants. The resulting interaction network is plotted using a force-directed layout[65], with edge lengths inversely proportional to link weights. Colors indicate nodes belonging to communities that strongly match known lineages; other communities are in gray-scale shades. (d–g): Temporal dynamics of the four identified variants. In each scatter plot, the left y-axis value represents the average prevalence of the changes in the cluster-dictionary intersection, circle size represents the cluster-dictionary similarity ( not shown), while circle color is proportional to the log-ratio between the number of changes in the cluster and in the dictionary. Warnings are marked with a labeled arrow. In the background, the number of reported COVID-19 infections (thousand cases, right axis) in Japan[66]. Plots were created using MATLAB R2021a (http://www.mathworks.com). All graphics were further processed using Adobe Illustrator 2021 (http://www.adobe.com).
Figure 2. Emergence of variants in their country of origin. Details of the variant dynamics plots shown within insets as in panels (d–g) of Fig. 1. The vertical dashed lines indicate the dates of hits using our method. Estimates of reported cases are taken from Johns Hopkins/Our World in Data[66], except for the US, for which data comes from the Centers for Disease Control and Prevention[67]. The map was created using https://mapchart.net/ on 4 July 2021 and plots were created using MATLAB R2021a (http://www.mathworks.com). All graphics were further processed using Adobe Illustrator 2021 (http://www.adobe.com).
Figure 3. Temporal dynamics of notable variants in European countries (a) and in the US (b). Color shading for European countries indicates the timing of emergence of the Alpha variant (the lighter, the later). In the US, different colors instead encode the pair or triple of variants (Epsilon, Iota, Alpha, and US1/US2 as aliases of B.1.2 and B.1.596; only US2 is shown here) that were detected and tracked using our method. Inset details as in panels (d–g) of Fig. 1. The blank maps of Europe and US were retrieved from https://commons.wikimedia.org/wiki/File:Europe_political_chart_complete_blank.svg and https://commons.wikimedia.org/wiki/File:USA_blank.svg on 4 July 2021 and plots were created using MATLAB R2021a (http://www.mathworks.com). All graphics were further processed using Adobe Illustrator 2021 (https://www.adobe.com).
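The three quantities plotted in panels (d–g) of Fig. 1 (average prevalence over the cluster-dictionary intersection, cluster-dictionary similarity, and the log-ratio of set sizes) can be sketched as follows. The similarity symbol and formula are not recoverable from the caption, so the Jaccard index used here is an assumption, as are the natural logarithm, the function name `cluster_dictionary_metrics`, and the placeholder change labels.

```python
import math


def cluster_dictionary_metrics(cluster, dictionary, prevalence):
    """Compare a detected cluster of changes with a lineage's dictionary of changes.

    Returns (similarity, log_ratio, mean_prevalence).
    ASSUMPTION: similarity is taken to be the Jaccard index; the caption
    does not state which measure is actually used.
    """
    if not cluster or not dictionary:
        raise ValueError("cluster and dictionary must both be non-empty")
    shared = cluster & dictionary
    union = cluster | dictionary
    similarity = len(shared) / len(union)
    # Positive when the cluster holds more changes than the dictionary.
    log_ratio = math.log(len(cluster) / len(dictionary))
    # Average prevalence over the changes present in both sets.
    mean_prev = sum(prevalence[c] for c in shared) / len(shared) if shared else 0.0
    return similarity, log_ratio, mean_prev


# Placeholder labels and prevalence values, for illustration only.
cluster = {"change_A", "change_B", "change_C"}
dictionary = {"change_B", "change_C", "change_D"}
week_prevalence = {"change_A": 0.1, "change_B": 0.2, "change_C": 0.4}
similarity, log_ratio, mean_prev = cluster_dictionary_metrics(
    cluster, dictionary, week_prevalence
)
```

Here two of the four distinct changes are shared, so the assumed Jaccard similarity is 0.5, and the equal set sizes give a log-ratio of zero.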
Capture of the emergence of the major WHO-named SARS-CoV-2 variants.
| WHO name | Country of origin | Hit date | Early hit | 1st communication | Comparison (weeks) |
|---|---|---|---|---|---|
| Alpha | United Kingdom | Dec-wk1-20 | Yes | Dec-wk3-20 | 2 weeks earlier |
| Beta | South Africa | Nov-wk1-20 | No (late hit) | Dec-wk3-20 | 6 weeks earlier |
| Gamma | Brazil | Jan-wk3-21 | Yes | Jan-wk1-21 | 2 weeks later |
| Delta | India | Apr-wk3-21 | Yes | Apr-wk3-21 | Same time |
| Epsilon | California (USA) | Nov-wk4-20 | Yes | Jan-wk3-21 | 7 weeks earlier |
| Zeta | Brazil | Dec-wk2-20 | Yes | Dec-wk4-20 | 2 weeks earlier |
| Iota | New York (USA) | Feb-wk2-21 | No (late hit) | Feb-wk2-21 | Same time |
| Kappa | India | Feb-wk4-21 | Yes | Mar-wk4-21 | 4 weeks earlier |
WHO variant name; its country of origin; hit date, corresponding to the warning date for early hits, extended by a short delay of two weeks for Beta and one week for Iota (late hits, see “Methods” section); first communication date retrieved from institutional or research outlets (see “Methods” section); temporal comparison between hit date and 1st communication (in weeks).
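The comparison column can be reproduced from the week labels with a short Python sketch. It assumes that every month is split into four week slots (wk1 to wk4), a convention that matches all eight rows of the table but is not stated in this excerpt; the helper names `week_index` and `compare` are hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def week_index(label):
    """Map a label such as 'Dec-wk1-20' to a running week number.

    ASSUMPTION: each month contributes exactly four week slots.
    """
    month, week, year = label.split("-")
    months_elapsed = (2000 + int(year)) * 12 + MONTHS.index(month)
    return months_elapsed * 4 + int(week[2:]) - 1


def compare(hit, first_communication):
    """Describe the hit date relative to the first communication date."""
    diff = week_index(first_communication) - week_index(hit)
    if diff == 0:
        return "Same time"
    unit = "week" if abs(diff) == 1 else "weeks"
    direction = "earlier" if diff > 0 else "later"
    return f"{abs(diff)} {unit} {direction}"


alpha = compare("Dec-wk1-20", "Dec-wk3-20")    # Alpha row
epsilon = compare("Nov-wk4-20", "Jan-wk3-21")  # Epsilon row, spans a year boundary
gamma = compare("Jan-wk3-21", "Jan-wk1-21")    # Gamma row, hit after communication
```

Counting months rather than calendar weeks keeps the arithmetic consistent across the year boundary, as in the Epsilon row.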