Carla Mavian, Simone Marini, Mattia Prosperi, Marco Salemi.
Abstract
BACKGROUND: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non-peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution.
Keywords: covid-19; evolution; genetics; genome; infectious disease; pandemic; phylogenetics; sars-cov-2; sequence; tracing; tracking; transmission; virus
Year: 2020 PMID: 32412415 PMCID: PMC7265655 DOI: 10.2196/19170
Source DB: PubMed Journal: JMIR Public Health Surveill ISSN: 2369-2960
Figure 1. Snapshots of genomes and confirmed cases on March 30, 2020 (panel a) and April 24, 2020 (panel b). On a logarithmic scale, the x-axis reports the confirmed cases, while the y-axis reports the number of genomes + 1. Each dot represents a country; the dot color indicates the number of genomes, and the dot size is proportional to the country population.
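The axis transformation described in the Figure 1 caption can be illustrated with a minimal Python sketch. Adding 1 to the genome count keeps countries with zero sequenced genomes plottable, since the logarithm of 0 is undefined. The function name `log_scale_point` and the example counts are hypothetical and are not taken from the study's data.

```python
import math
from typing import Tuple

def log_scale_point(confirmed_cases: int, genomes: int) -> Tuple[float, float]:
    """Return (x, y) coordinates on log10 axes, following the caption.

    x is log10(confirmed cases); y is log10(genomes + 1), so a country
    with zero genomes maps to y = 0 instead of being undefined.
    """
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive for a log axis")
    if genomes < 0:
        raise ValueError("genomes cannot be negative")
    return math.log10(confirmed_cases), math.log10(genomes + 1)

# Hypothetical counts, for illustration only (not values from the paper).
examples = {"country_a": (1000, 0), "country_b": (100000, 99)}
points = {name: log_scale_point(c, g) for name, (c, g) in examples.items()}
```

In this sketch, a country with many confirmed cases but no genomes lands on the bottom edge of the plot (y = 0), which makes under-sampled countries visible rather than silently dropped.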
Figure 2. Cladograms of SARS-CoV-2 subclades. Cladograms were extracted from maximum likelihood phylogenies rooted by enforcing a molecular clock. The colored branches represent the country of origin of the sampled sequences (tip branches) and the ancestral lineages (internal branches). The numbers at the nodes indicate ultrafast bootstrap support (only >90% values are shown). (a) Cladogram of a monophyletic clade within the SARS-CoV-2 maximum likelihood tree inferred from sequences available on March 3, 2020 (Figure S1 in Multimedia Appendix 3). The subclade including sequences from Italy and Germany, named Subclade A, is highlighted. (b) Cladogram of Subclade A of the SARS-CoV-2 maximum likelihood tree including additional sequences that became available on March 10, 2020 (Figure S2 in Multimedia Appendix 3). Each bidirectional arrow and corresponding number connects two tip branches that were switched to generate an alternative tree topology to be tested (Table 1).
Table 1. Testing of alternative topologies.
| Alternative topology (a) | Switched branches | LogL (b) | ∆L (c) | P value (d) |
| 1 | Italy with Wales | –45443.2 | 0.0000 | .24 |
| 2 | Germany with Brazil | –45451.5 | 8.3554 | .16 |
| 3 | Portugal with Brazil | –45443.2 | 0.0002 | .75 |
| 4 | Germany with Portugal | –45451.5 | 8.3197 | .16 |
(a) Alternative topologies were obtained by switching branches in the maximum likelihood tree inferred from SARS-CoV-2 full genome sequences: 1) Italy (EPI_ISL_412973) with Wales (EPI_ISL_413555); 2) Germany (EPI_ISL_406862) with Brazil (EPI_ISL_412964); 3) Portugal (EPI_ISL_413648) with Brazil (EPI_ISL_412964); 4) Germany (EPI_ISL_406862) with Portugal (EPI_ISL_413648).
(b) LogL: log likelihood estimated for each alternative topology.
(c) ∆L: difference between LogL and the log likelihood of the original tree.
(d) Calculated with the Shimodaira-Hasegawa test [32].
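The ∆L column follows from footnotes b and c, and a minimal Python sketch can make the arithmetic concrete. Two assumptions are made here that the source does not state: that ∆L is the original tree's log likelihood minus that of the alternative topology (the sign that yields the non-negative values in Table 1), and that the original tree's log likelihood is about -45443.2 (inferred from the ∆L of 0.0000 for topology 1). Because the table reports LogL to one decimal place, the recomputed differences only approximate the published four-decimal values. The function name `delta_log_likelihood` is hypothetical, and the sketch does not implement the Shimodaira-Hasegawa test itself.

```python
# LogL values as reported in Table 1 (rounded to one decimal in the source).
ALT_LOGL = {1: -45443.2, 2: -45451.5, 3: -45443.2, 4: -45451.5}

# ASSUMPTION: not stated in the source; inferred from the ∆L of 0.0000
# reported for alternative topology 1.
ORIGINAL_LOGL = -45443.2

def delta_log_likelihood(original_logl: float, alternative_logl: float) -> float:
    """Return how much lower the alternative tree's log likelihood is.

    ASSUMPTION: sign convention is original minus alternative, so a larger
    value means the alternative topology fits the alignment worse.
    """
    return original_logl - alternative_logl

# Round to one decimal, matching the precision of the LogL inputs.
deltas = {
    topology: round(delta_log_likelihood(ORIGINAL_LOGL, logl), 1)
    for topology, logl in ALT_LOGL.items()
}
```

Under these assumptions, topologies 1 and 3 are essentially as likely as the original tree, while topologies 2 and 4 are roughly 8.3 log-likelihood units worse; the P values in the table indicate whether those differences are statistically significant.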