Denis C Bauer, Aidan P Tay, Laurence O W Wilson, Daniel Reti, Cameron Hosking, Alexander J McAuley, Elizabeth Pharo, Shawn Todd, Vicky Stevens, Matthew J Neave, Mary Tachedjian, Trevor W Drew, Seshadri S Vasan.
Abstract
Pre-clinical responses to fast-moving infectious disease outbreaks heavily depend on choosing the best isolates for animal models that inform diagnostics, vaccines and treatments. Current approaches are driven by practical considerations (e.g. first available virus isolate) rather than a detailed analysis of the characteristics of the virus strain chosen, which can lead to animal models that are not representative of the circulating or emerging clusters. Here, we suggest a combination of epidemiological, experimental and bioinformatic considerations when choosing virus strains for animal model generation. We discuss the currently chosen SARS-CoV-2 strains for international coronavirus disease (COVID-19) models in the context of their phylogeny as well as in a novel alignment-free bioinformatic approach. Unlike phylogenetic trees, which focus on individual shared mutations, this new approach assesses genome-wide co-developing functionalities and hence offers a more fluid view of the 'cloud of variances' that RNA viruses are prone to accumulate. This joint approach concludes that while the current animal models cover the existing viral strains adequately, there is substantial evolutionary activity that is likely not considered by the current models. Based on insights from the non-discrete alignment-free approach and experimental observations, we suggest isolates for future animal models.
Keywords: COVID-19; PHEIC; alignment-free phylogeny; bioinformatics; genomics; viral evolution
Year: 2020 PMID: 32306500 PMCID: PMC7264654 DOI: 10.1111/tbed.13588
Source DB: PubMed Journal: Transbound Emerg Dis ISSN: 1865-1674 Impact factor: 4.521
FIGURE 1 Illustration of coronavirus spread while it accumulates mutations. The dark blue arrows represent the main volume of transmissions, while the nucleic acid symbol illustrates mutations acquired by the different viral strains as they enter humans from a primary/reservoir host (represented by the bat symbol) through an intermediate host (which is yet to be identified for SARS‐CoV‐2). The first human SARS‐CoV‐2 isolate sequenced (with orange and pink mutation) may not have been the original strain that first infected humans (grey). It is possible that a strain sequenced later (green) may be genetically closer to the original strain. In this scenario, the original strain has not been captured through sequencing at all. It also shows that there may be two currently circulating strains (orange‐pink‐purple and orange‐pink‐brown), which in turn might be different from the most virulent one (orange‐pink‐blue). In the absence of clinical data correlated with SARS‐CoV‐2 genome isolates, bioinformatic analysis (represented by the computer symbol) can identify clusters and consensus sequences to investigate the genetic diversity of the emerging SARS‐CoV‐2 strains.
FIGURE 2 Phylogenetic tree highlighting isolates of interest, with branch points of the six clusters labelled to indicate mature (orange) and emerging (yellow) disease clusters (the full list of identical sequences for these branch points is in Table S1 and the complete image is in Figure S5).
Mutations characterizing phylogenetic clusters
| Cluster | Mutations in common | Diversity within cluster |
|---|---|---|
| C1 (Wuhan‐hu‐1) | Reference strain | 107 unique mutations |
| C2 (Vic01, France/IDF0372, Sydney/3) | G26144T | 31 unique mutations |
| C3 (Australia/NSW01, USA/WA1) | C8782T, T28144C | 68 unique mutations |
| C4 | C241T, C3037T, A23403G, C14408T, GGG28881AAC | 9 unique mutations |
| C5 | C8782T, T28144C, C24034T, T26729C, G28077C | 7 unique mutations |
| C6 | G11083T, G1397A, T28688C, G29742T | 10 unique mutations |
Positions are relative to trimmed alignments (see Methods for more details). The following sequences were excluded from the analysis because their high number of mutations is likely due to sequencing errors: Beijing/IVDC‐BJ‐005, Shenzhen/SZTH‐001, Shenzhen/SZTH‐004 and Wuhan/HBCDC‐HB‐04. The cluster diversity of C1 includes the diversity of clusters C4 and C6, and the cluster diversity of C3 includes the diversity of cluster C5.
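The mutation labels in this table follow the pattern reference base(s), position, variant base(s), and the nesting of clusters can be read off as a subset relation between their shared mutations. The following is a minimal sketch of that reading, not the authors' pipeline: the mutation lists are copied from the table, while the names `CLUSTER_MUTATIONS`, `parse_mutation` and `is_subcluster` are hypothetical helpers introduced only for illustration.

```python
import re

# Shared mutations per cluster, copied from the table above.
# C1 is the reference cluster and has no defining mutations, so it is omitted.
CLUSTER_MUTATIONS = {
    "C2": ["G26144T"],
    "C3": ["C8782T", "T28144C"],
    "C4": ["C241T", "C3037T", "A23403G", "C14408T", "GGG28881AAC"],
    "C5": ["C8782T", "T28144C", "C24034T", "T26729C", "G28077C"],
    "C6": ["G11083T", "G1397A", "T28688C", "G29742T"],
}

# Label format assumed here: <reference bases><position><variant bases>.
MUTATION_RE = re.compile(r"^([ACGT]+)(\d+)([ACGT]+)$")


def parse_mutation(label):
    """Split a label such as 'C8782T' into (reference, position, variant)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def is_subcluster(child, parent, table=CLUSTER_MUTATIONS):
    """True if `child` carries all shared mutations of `parent` plus more."""
    return set(table[parent]) < set(table[child])  # proper-subset test


if __name__ == "__main__":
    print(parse_mutation("GGG28881AAC"))
    print(is_subcluster("C5", "C3"))
```

Under this reading, C5 carries both C3 mutations (C8782T, T28144C) plus three further ones, which matches the note that the diversity of C3 includes that of C5.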
Hamming Distance Mutation analysis (trimmed) relative to Wuhan‐HU‐1
| Strain | Shorthand | Condensed/Uncondensed (in core) | Core identical sequences | Interest |
|---|---|---|---|---|
| BetaCoV/USA/WA1/2020\|EPI_ISL_404895 | USA/WA1 | 5/24 (3/3) | 4 | Animal model |
| BetaCoV/Germany/BavPat1/2020\|EPI_ISL_406862 | BavPat1 | 7/124 (3/3) | 2 | Animal model |
| Human 2019‐nCoV Human 2019‐nCoV 026V‐03883 | | 7/71 (3/3) | 2 | Animal model |
| BetaCoV/Australia/VIC01/2020\|EPI_ISL_406844 | Vic01 | 4/13 (4/13) | | Animal model |
| BetaCoV/Canada/ON‐VIDO‐01/2020\|EPI_ISL_413015 | Canada/ON‐VIDO‐01 | 8/27 (5/13) | | Animal model, deletion |
| BetaCoV_France_IDF0372_2020_C2 | France/IDF0372 | 4/31 (2/2) | 4 | Animal model |
| BetaCoV/Sydney/2/2020\|EPI_ISL_408976 | Syd02 | 7/164 (3/43) | | Deletion |
| BetaCoV/Sydney/3/2020\|EPI_ISL_408977 | Syd03 | 5/122 (1/1) | | |
| BetaCoV/USA/CA6/2020\|EPI_ISL_410044 | USA/CA6 | 4/45 (24/2) | | Deletion |
| BetaCoV/Japan/AI/I‐004/2020\|EPI_ISL_407084 | Japan/AI/I‐004 | 6/57 (26/3) | | Deletion |
| BetaCoV/Australia/NSW01/2020\|EPI_ISL_407893 | NSW01 | 6/123 (2/2) | 5 | |
| BetaCoV/Chongqing/IVDC‐CQ‐001/2020\|EPI_ISL_408481 | Chongqing/IVDC‐CQ‐001 | 3/22 (1/1) | 4 | Potential recombination with Sydney/3 |
| BetaCoV/Italy/INMI1‐cs/2020\|EPI_ISL_410546 | Italy/INMI1‐cs | 5/39 (3/3) | 2 | Potential recombination result between Chongqing/IVDC‐CQ‐001 and Sydney/3 |
This table lists the isolates of note for this paper and collects the information from Tables S1 and S2 for easy access. The third column gives the number of differences from Wuhan‐HU‐1 for the full sequences, with the count for the core sequences in parentheses. Each count is reported as condensed (a deletion counts as 1)/uncondensed (every differing position is counted).
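The difference between the condensed and uncondensed counts can be illustrated on a pair of aligned sequences. The sketch below is one plausible reading of the caption, not the authors' implementation: it assumes that deletions appear as `-` in the query, that a contiguous run of gap columns forms one deletion, and that substitutions always count individually. The function name `hamming_counts` and the toy sequences are hypothetical.

```python
def hamming_counts(reference, query, gap="-"):
    """Return (condensed, uncondensed) difference counts for aligned sequences.

    Uncondensed: every differing alignment column counts as 1.
    Condensed: a contiguous run of gap columns in `query` counts as 1 in total.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")

    condensed = 0
    uncondensed = 0
    in_deletion = False  # True while we are inside a run of gap columns

    for ref_base, query_base in zip(reference, query):
        if ref_base == query_base:
            in_deletion = False
            continue
        uncondensed += 1
        if query_base == gap:
            if not in_deletion:
                condensed += 1  # count the deletion once, at its first column
            in_deletion = True
        else:
            condensed += 1  # substitutions are never merged
            in_deletion = False

    return condensed, uncondensed


if __name__ == "__main__":
    # Toy example: one 4-base deletion and one substitution.
    print(hamming_counts("ACGTACGTAC", "ACG----TAT"))
```

For the toy pair, the four-base deletion and the single substitution give a condensed count of 2 and an uncondensed count of 5, which is the same kind of gap seen in table entries such as 7/124.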
FIGURE 3 PCA plots showing the genomic signatures of different coronavirus sequences. Each point represents the genomic signature for an isolate. Inset: comparison of genomic signatures across different strains of coronavirus. Numbers correspond to the number of isolates at each location. Overall, the genomic signatures for isolates of different coronavirus strains were relatively far apart. Main image: zoomed‐in PCA plot of the cluster of SARS‐CoV‐2 isolates, showing the overall genomic signatures of the different strains.
FIGURE 4 Identification of potential viral strains for animal models. Phylogenetic methods (a) show that current animal models (highlighted in green) cover the major clusters (C1‐3) but may not capture the emerging clusters. A K‐mer based analysis (b) is able to suggest alternative strains that cover all emerging clusters (C4‐6). The inset shows the wider region with the main image extent marked by a rectangle.
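A K‐mer based genomic signature of the kind plotted in Figures 3 and 4 can be sketched as a vector of normalised k‐mer frequencies, which needs no alignment and is comparable across genomes of different lengths. This is an illustrative sketch only: the record does not state the k‐mer size or the exact signature definition the authors used, so `k=3`, the Euclidean distance, and the names `kmer_signature` and `signature_distance` are assumptions. A PCA over the rows of such signatures would then give a two‐dimensional view.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_signature(sequence, k=3, alphabet="ACGT"):
    """Return normalised k-mer frequencies in a fixed lexicographic order."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Only k-mers made of alphabet letters are counted; windows containing
    # ambiguous bases such as N are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[kmer] / total for kmer in kmers]


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures of equal length."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


if __name__ == "__main__":
    # Toy sequences of different lengths; no alignment is required.
    sig_a = kmer_signature("ACGTACGTACGT")
    sig_b = kmer_signature("ACGTTTGTACGTAA")
    print(len(sig_a), signature_distance(sig_a, sig_b))
```

Because every signature has the same length (4^k entries), isolates can be compared pairwise or projected together regardless of deletions, which is what makes the approach alignment-free.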