Daniel Andrés Dos Santos, María Celina Reynaga, Juan Cruz González, Gabriela Fontanarrosa, María de Lourdes Gultemirian, Agustina Novillo, Virginia Abdala.
Abstract
The structural proteins of coronaviruses carry critical information for addressing questions of classification, assembly constraints, and evolutionary pathways involving host shifts. We compiled 173 complete protein sequences from isolates belonging to the four genera of the subfamily Coronavirinae. We calculated a single matrix of viral distance as a linear combination of protein distances. The minimum spanning tree (MST) connecting the individuals captures the structure of their similarities and recapitulates the known phylogeny of Coronavirinae. Hosts were mapped onto the MST, and we found a non-trivial concordance between host phylogeny and viral proteomic distance. We also studied chimerism in our dataset through computational simulations and found evidence that structural units coming from loosely related hosts hardly give rise to feasible chimeras in nature. This work offers a fresh way to analyze features of SARS-CoV-2 and related viruses.
© 2022 Dos Santos et al.
Keywords: Chimerism; Coronavirus; Evolutionary constraints; Host-virus interaction; Viral assembly; Viral proteomes; Virology; Zoonotic reservoirs
Year: 2022 PMID: 35910777 PMCID: PMC9332319 DOI: 10.7717/peerj.13700
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 3.061
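The abstract describes two steps: combining per-protein distance matrices linearly, then connecting the isolates with a minimum spanning tree. The following is a minimal sketch of that idea, not the authors' implementation. The equal weights, the two-protein toy matrices, and the function names `combine_distances` and `minimum_spanning_tree` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def combine_distances(matrices: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Weighted sum of same-shaped distance matrices, one matrix per protein."""
    names = list(matrices)
    n = len(matrices[names[0]])
    return [
        [sum(weights[p] * matrices[p][i][j] for p in names) for j in range(n)]
        for i in range(n)
    ]


def minimum_spanning_tree(dist: Matrix) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on a dense symmetric matrix; returns (u, v, weight) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


# Hypothetical distances among four isolates for two proteins (toy values).
toy = {
    "E": [[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]],
    "S": [[0, 2, 6, 7], [2, 0, 5, 8], [6, 5, 0, 1], [7, 8, 1, 0]],
}
weights = {"E": 0.5, "S": 0.5}  # ASSUMPTION: equal weights, chosen for the example

combined = combine_distances(toy, weights)
edges = minimum_spanning_tree(combined)
print(edges)
```

With real data, `toy` would hold one 173 x 173 matrix per structural protein (E, M, N, S), and the weights would follow whatever scheme the study actually used.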
Virus attributes.
List of attributes for the 173 viruses sampled from the NCBI Virus database. The amino acid sequences of all four structural proteins (E, M, N and S) are available for these isolates. The ID number identifies the isolate in the network of Figure 1. Information includes accession numbers, genome size (number of nucleotides), host taxa, and the title reported in GenBank for each sequence (very long titles were shortened for clarity).
| ID | Accession | Genome size (nt) | Host | Genus | Title |
|---|---|---|---|---|---|
| 1 | | 29757 | NA | β-coronavirus | SARS coronavirus GD01 |
| 2 | | 29760 | NA | β-coronavirus | SARS coronavirus GZ02 |
| 3 | | 29518 | | β-coronavirus | SARS coronavirus civet010 |
| 4 | | 29525 | | β-coronavirus | SARS coronavirus B039 |
| 5 | | 29728 | NA | β-coronavirus | Bat SARS coronavirus HKU3-1 |
| 6 | | 29736 | | β-coronavirus | Bat SARS coronavirus Rp3 |
| 7 | | 29711 | NA | β-coronavirus | bat SARS coronavirus HKU3-3 |
| 8 | | 29709 | | β-coronavirus | Bat SARS coronavirus Rf1 |
| 9 | | 29749 | | β-coronavirus | Bat SARS coronavirus Rm1 |
| 10 | | 29704 | NA | β-coronavirus | Bat coronavirus (BtCoV/273/2005) |
| 11 | | 29741 | NA | β-coronavirus | Bat coronavirus (BtCoV/279/2005) |
| 12 | | 27550 | | α-coronavirus | PRCV ISU-1 |
| 13 | | 30286 | | β-coronavirus | Bat coronavirus HKU4-1 |
| 14 | | 30286 | | β-coronavirus | Bat coronavirus HKU4-2 |
| 15 | | 30286 | | β-coronavirus | Bat coronavirus HKU4-3 |
| 16 | | 30316 | | β-coronavirus | Bat coronavirus HKU4-4 |
| 17 | | 30482 | | β-coronavirus | Bat coronavirus HKU5-1 |
| 18 | | 30488 | | β-coronavirus | Bat coronavirus HKU5-2 |
| 19 | | 30488 | | β-coronavirus | Bat coronavirus HKU5-3 |
| 20 | | 30487 | | β-coronavirus | Bat coronavirus HKU5-5 |
| 21 | | 29114 | | β-coronavirus | Bat coronavirus HKU9-1 |
| 22 | | 29107 | | β-coronavirus | Bat coronavirus HKU9-2 |
| 23 | | 29136 | | β-coronavirus | Bat coronavirus HKU9-3 |
| 24 | | 29155 | | β-coronavirus | Bat coronavirus HKU9-4 |
| 25 | | 31017 | | β-coronavirus | Bovine coronavirus E-AH65 |
| 26 | | 30970 | | β-coronavirus | Bovine coronavirus E-AH65-TC |
| 27 | | 31016 | | β-coronavirus | Bovine coronavirus R-AH65 |
| 28 | | 30995 | | β-coronavirus | Bovine coronavirus R-AH65-TC |
| 29 | | 30995 | | β-coronavirus | Bovine coronavirus E-AH187 |
| 30 | | 30964 | | β-coronavirus | Bovine coronavirus R-AH187 |
| 31 | | 30995 | | β-coronavirus | Sable antelope coronavirus US/OH1/2003 |
| 32 | | 30979 | | β-coronavirus | Giraffe coronavirus US/OH3-TC/2006 |
| 33 | | 31002 | | β-coronavirus | Giraffe coronavirus US/OH3/2003 |
| 34 | | 30979 | | β-coronavirus | Calf-giraffe coronavirus US/OH3/2006 |
| 35 | | 26476 | | δ-coronavirus | Bulbul coronavirus HKU11-796 |
| 36 | | 31029 | | β-coronavirus | Human enteric coronavirus 4408 |
| 37 | | 30995 | | β-coronavirus | Waterbuck coronavirus US/OH-WD358-TC/1994 |
| 38 | | 30940 | | β-coronavirus | Waterbuck coronavirus US/OH-WD358-GnC/1994 |
| 39 | | 30962 | | β-coronavirus | Waterbuck coronavirus US/OH-WD358/1994 |
| 40 | | 31020 | | β-coronavirus | White-tailed deer coronavirus US/OH-WD470/1994 |
| 41 | | 30995 | | β-coronavirus | Sambar deer coronavirus US/OH-WD388-TC/1994 |
| 42 | | 30997 | | β-coronavirus | Sambar deer coronavirus US/OH-WD388/1994 |
| 43 | | 31285 | | β-coronavirus | Murine coronavirus RA59/R13 |
| 44 | | 31427 | | β-coronavirus | Murine coronavirus RJHM/A |
| 45 | | 31429 | | β-coronavirus | Murine coronavirus RA59/SJHM |
| 46 | | 31456 | | β-coronavirus | Murine coronavirus repA59/RJHM |
| 47 | | 31283 | | β-coronavirus | Murine coronavirus SA59/RJHM |
| 48 | | 31386 | | β-coronavirus | Murine coronavirus MHV-1 |
| 49 | | 31448 | | β-coronavirus | Murine coronavirus MHV-3 |
| 50 | | 31293 | | β-coronavirus | Murine coronavirus inf-MHV-A59 |
| 51 | | 31473 | | β-coronavirus | Murine coronavirus MHV-JHM.IA |
| 52 | | 31250 | | β-coronavirus | Murine coronavirus repJHM/RA59 |
| 53 | | 29682 | | β-coronavirus | SARS coronavirus P2 |
| 54 | | 31275 | | β-coronavirus | Murine hepatitis virus strain A59 B11 variant |
| 55 | | 29232 | | α-coronavirus | Feline coronavirus RM |
| 56 | | 29306 | | α-coronavirus | Feline coronavirus UU11 |
| 57 | | 29277 | | α-coronavirus | Feline coronavirus UU7 |
| 58 | | 29269 | | α-coronavirus | Feline coronavirus UU4 |
| 59 | | 29285 | | α-coronavirus | Feline coronavirus UU8 |
| 60 | | 29253 | | α-coronavirus | Feline coronavirus UU5 |
| 61 | | 29275 | | α-coronavirus | Feline coronavirus UU15 |
| 62 | | 28479 | | α-coronavirus | Feline coronavirus UU16 |
| 63 | | 29295 | | α-coronavirus | Feline coronavirus UU10 |
| 64 | | 29256 | | α-coronavirus | Feline coronavirus UU2 |
| 65 | | 29130 | | α-coronavirus | Feline coronavirus UU3 |
| 66 | | 29266 | | α-coronavirus | Feline coronavirus UU9 |
| 67 | | 31024 | | β-coronavirus | Bovine coronavirus E-DB2-TC |
| 68 | | 30995 | | β-coronavirus | Bovine coronavirus E-AH187-TC |
| 69 | | 30969 | | β-coronavirus | Bovine respiratory coronavirus AH187 |
| 70 | | 30953 | | β-coronavirus | Bovine respiratory coronavirus bovine/US/OH-440-TC/1996 |
| 71 | | 30953 | | β-coronavirus | Human enteric coronavirus strain 4408 |
| 72 | | 29695 | NA | β-coronavirus | Bat SARS coronavirus HKU3-9 |
| 73 | | 29264 | | α-coronavirus | Feline coronavirus UU22 isolate TCVSP-ROTTIER-00022 |
| 74 | | 29264 | | α-coronavirus | Feline coronavirus UU23 isolate TCVSP-ROTTIER-00023 |
| 75 | | 29136 | | β-coronavirus | Bat coronavirus HKU9-5-1 |
| 76 | | 29112 | | β-coronavirus | Bat coronavirus HKU9-5-2 |
| 77 | | 29136 | | β-coronavirus | Bat coronavirus HKU9-10-1 |
| 78 | | 29122 | | β-coronavirus | Bat coronavirus HKU9-10-2 |
| 79 | | 28915 | | α-coronavirus | Mink coronavirus strain WD1133 |
| 80 | | 29233 | | α-coronavirus | Feline coronavirus UU40 |
| 81 | | 29255 | | α-coronavirus | Feline coronavirus UU19 |
| 82 | | 29252 | | α-coronavirus | Feline coronavirus UU20 |
| 83 | | 29233 | | α-coronavirus | Feline coronavirus UU30 |
| 84 | | 27673 | | γ-coronavirus | Duck coronavirus isolate DK/CH/HN/ZZ2004 |
| 85 | | 31286 | | β-coronavirus | Rat coronavirus isolate 681 |
| 86 | | 29243 | | α-coronavirus | Feline coronavirus UU47 |
| 87 | | 29222 | | α-coronavirus | Feline coronavirus UU54 |
| 88 | | 27374 | | α-coronavirus | Alpaca respiratory coronavirus isolate CA08-1/2008 |
| 89 | | 28483 | | α-coronavirus | Hipposideros bat coronavirus HKU10 isolate TLC1343A |
| 90 | | 31028 | | β-coronavirus | Canine respiratory coronavirus strain K37 |
| 91 | | 30119 | | β-coronavirus | Human β-coronavirus 2c EMC/2012 |
| 92 | | 29484 | | β-coronavirus | Bat coronavirus Rp/Shaanxi2011 |
| 93 | | 29452 | | β-coronavirus | Bat coronavirus Cp/Yunnan2011 |
| 94 | | 30112 | | β-coronavirus | Human β-coronavirus 2c England-Qatar/2012 |
| 95 | | 30030 | | β-coronavirus | Human β-coronavirus 2c Jordan-N3/2012 |
| 96 | | 29787 | | β-coronavirus | Bat SARS-like coronavirus RsSHC014 |
| 97 | | 29792 | | β-coronavirus | Bat SARS-like coronavirus Rs3367 |
| 98 | | 29676 | | β-coronavirus | SARS-related bat coronavirus |
| 99 | | 30309 | | β-coronavirus | Bat SARS-like coronavirus WIV1 |
| 100 | | 29805 | | β-coronavirus | Rhinolophus affinis coronavirus isolate LYRa11 |
| 101 | | 31052 | | β-coronavirus | Dromedary camel coronavirus HKU23 strain HKU23-265F |
| 102 | | 29037 | | β-coronavirus | BtRf-BetaCoV/JL2012 |
| 103 | | 29443 | | β-coronavirus | BtRf-BetaCoV/HeB2013 |
| 104 | | 29461 | | β-coronavirus | BtRf-BetaCoV/SX2013 |
| 105 | | 29658 | | β-coronavirus | BtRs-BetaCoV/HuB2013 |
| 106 | | 29161 | | β-coronavirus | BtRs-BetaCoV/GX2013 |
| 107 | | 29142 | | β-coronavirus | BtRs-BetaCoV/YN2013 |
| 108 | | 30423 | | β-coronavirus | BtVs-BetaCoV/SC2013 |
| 109 | | 25406 | | δ-coronavirus | δ-coronavirus PDCoV/USA/Illinois121/2014 from USA |
| 110 | | 25422 | | δ-coronavirus | Porcine δ-coronavirus 8734/USA-IA/2014 |
| 111 | | 25408 | | δ-coronavirus | δ-coronavirus PDCoV/USA/Illinois133/2014 from USA |
| 112 | | 25404 | | δ-coronavirus | δ-coronavirus PDCoV/USA/Illinois134/2014 from USA |
| 113 | | 25404 | | δ-coronavirus | δ-coronavirus PDCoV/USA/Illinois136/2014 from USA |
| 114 | | 25404 | | δ-coronavirus | δ-coronavirus PDCoV/USA/Ohio137/2014 from USA |
| 115 | | 25433 | | δ-coronavirus | Swine δ-coronavirus OhioCVM1/2014 |
| 116 | | 25422 | | δ-coronavirus | Porcine δ-coronavirus KNU14-04 |
| 117 | | 29723 | | β-coronavirus | Bat SARS-like coronavirus YNLF_31C |
| 118 | | 30290 | | β-coronavirus | SARS-like coronavirus WIV16 |
| 119 | | 29722 | | β-coronavirus | UNVERIFIED: SARS-related coronavirus isolate F46 |
| 120 | | 29274 | | β-coronavirus | Severe acute respiratory syndrome-related coronavirus strain BtKY72 |
| 121 | | 29725 | | β-coronavirus | Bat SARS-like coronavirus isolate As6526 |
| 122 | | 29741 | | β-coronavirus | Bat SARS-like coronavirus isolate Rs4081 |
| 123 | | 30311 | | β-coronavirus | Bat SARS-like coronavirus isolate Rs4874 |
| 124 | | 30307 | | β-coronavirus | Bat SARS-like coronavirus isolate Rs7327 |
| 125 | | 29802 | | β-coronavirus | Bat SARS-like coronavirus isolate bat-SL-CoVZC45 |
| 126 | | 29732 | | β-coronavirus | Bat SARS-like coronavirus isolate bat-SL-CoVZXC21 |
| 127 | | 29648 | | β-coronavirus | Coronavirus BtRl-BetaCoV/SC2018 |
| 128 | | 29698 | | β-coronavirus | Coronavirus BtRs-BetaCoV/YN2018A |
| 129 | | 30256 | | β-coronavirus | Coronavirus BtRs-BetaCoV/YN2018B |
| 130 | | 29689 | | β-coronavirus | Coronavirus BtRs-BetaCoV/YN2018C |
| 131 | | 30213 | | β-coronavirus | Coronavirus BtRs-BetaCoV/YN2018D |
| 132 | | 29903 | | β-coronavirus | Severe acute respiratory syndrome coronavirus 2 (Wuhan) |
| 133 | | 29855 | | β-coronavirus | Bat coronavirus RaTG13 |
| 134 | | 29521 | | β-coronavirus | Pangolin coronavirus isolate MP789 |
| 135 | | 29926 | | β-coronavirus | Human coronavirus HKU1 |
| 136 | | 30286 | | β-coronavirus | Bat coronavirus HKU4-1 |
| 137 | | 30482 | | β-coronavirus | Bat coronavirus HKU5-1 |
| 138 | | 29114 | | β-coronavirus | Bat coronavirus HKU9-1 |
| 139 | | 31686 | | γ-coronavirus | Beluga Whale coronavirus SW1 |
| 140 | | 27657 | | γ-coronavirus | Turkey coronavirus |
| 141 | | 26487 | | δ-coronavirus | Bulbul coronavirus HKU11-934 |
| 142 | | 26396 | | δ-coronavirus | Thrush coronavirus HKU12-600 |
| 143 | | 26552 | | δ-coronavirus | Munia coronavirus HKU13-3514 |
| 144 | | 31250 | | β-coronavirus | Rat coronavirus Parker |
| 145 | | 29276 | | Unknown | Bat coronavirus BM48-31/BGR/2008 |
| 146 | | 26083 | | δ-coronavirus | Sparrow coronavirus HKU17 |
| 147 | | 26689 | | δ-coronavirus | Magpie-robin coronavirus HKU18 |
| 148 | | 26077 | | δ-coronavirus | Night-heron coronavirus HKU19 |
| 149 | | 26227 | | δ-coronavirus | Wigeon coronavirus HKU20 |
| 150 | | 26223 | | δ-coronavirus | Common-moorhen coronavirus HKU21 |
| 151 | | 31100 | | β-coronavirus | Rabbit coronavirus HKU14 |
| 152 | | 28494 | | α-coronavirus | Rousettus bat coronavirus HKU10 |
| 153 | | 30119 | | β-coronavirus | Middle East respiratory syndrome coronavirus |
| 154 | | 28941 | | α-coronavirus | Mink coronavirus strain WD1127 |
| 155 | | 31491 | | β-coronavirus | Bat Hp-β-coronavirus/Zhejiang2013 |
| 156 | | 31249 | | β-coronavirus | β-coronavirus HKU24 strain HKU24-R05005I |
| 157 | | 27395 | | α-coronavirus | Camel α-coronavirus isolate camel/Riyadh/Ry141/2015 |
| 158 | | 28111 | | α-coronavirus | Swine enteric coronavirus strain Italy/213306/2009 |
| 159 | | 27935 | | α-coronavirus | BtMr-AlphaCoV/SAX2011 |
| 160 | | 27608 | | α-coronavirus | BtRf-AlphaCoV/HuB2013 |
| 161 | | 26975 | | α-coronavirus | BtRf-AlphaCoV/YN2012 |
| 162 | | 27783 | | α-coronavirus | BtNv-AlphaCoV/SC2013 |
| 163 | | 28434 | | α-coronavirus | Ferret coronavirus isolate FRCoV-NL-2010 |
| 164 | | 30161 | | β-coronavirus | Rousettus bat coronavirus isolate GCCDC1 356 |
| 165 | | 28363 | | α-coronavirus | NL63-related bat coronavirus strain BtKYNL63-9a |
| 166 | | 28763 | | α-coronavirus | Lucheng Rn rat coronavirus |
| 167 | | 29642 | | Unknown | Bat coronavirus isolate PREDICT/PDF-2180 |
| 168 | | 27682 | | α-coronavirus | Coronavirus AcCoV-JC34 |
| 169 | | 25995 | | α-coronavirus | Wencheng Sm shrew coronavirus |
| 170 | | 30111 | | β-coronavirus | β-coronavirus England 1 |
| 171 | | 28586 | | α-coronavirus | Transmissible gastroenteritis virus genomic RNA |
| 172 | | 30148 | | β-coronavirus | β-coronavirus Erinaceus/VMC/DEU/2012 |
| 173 | | 25425 | | δ-coronavirus | Porcine coronavirus HKU15 strain HKU15-155 |
Size of coronavirus structural proteins (length of sequence).
Each Tukey five-number summary lists, in order, the minimum, lower quartile, median, upper quartile, and maximum of the sequence lengths (in amino acid residues).
| | E protein | M protein | N protein | S protein |
|---|---|---|---|---|
| Tukey Five-Number Summaries | [65, 76, 82, 83, 109] | [185, 221, 222, 230, 268] | [342, 379, 421, 448, 470] | [1126, 1241, 1324, 1363, 1472] |
| Mean (SD) | 80.7 (4.8) | 229.9 (15.8) | 413.5 (35.5) | 1308.7 (91.2) |
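As an illustration of how such a summary can be computed, the sketch below follows the hinge-based definition used by R's `fivenum`. Whether the authors used that exact routine is an assumption, and the sample lengths are hypothetical values, not data from the study.

```python
import math
from typing import List, Sequence


def fivenum(values: Sequence[float]) -> List[float]:
    """Tukey five-number summary: [min, lower hinge, median, upper hinge, max]."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("fivenum() needs at least one value")
    # 1-based (possibly half-integer) positions of the five statistics.
    n4 = math.floor((n + 3) / 2) / 2
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # A half-integer position averages the two neighbouring order statistics.
    return [0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1]) for p in positions]


# Hypothetical E-protein lengths (amino acid residues), for illustration only.
sample_lengths = [76, 75, 109, 82, 84, 76, 83]
summary = fivenum(sample_lengths)
print(summary)
```

The hinges can differ slightly from quantiles computed by interpolation, so results may not match other quartile conventions exactly.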
Structural protein distances.
Pairwise correlations between the distance matrices of the four structural proteins. All values are statistically significant under Mantel’s test (P < 0.01).
| | E | M | N | S |
|---|---|---|---|---|
| E | – | 0.83 | 0.82 | 0.74 |
| M | – | – | 0.87 | 0.80 |
| N | – | – | – | 0.79 |
| S | – | – | – | – |
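A Mantel test of the kind cited in the caption can be sketched as follows: correlate the upper triangles of two distance matrices, then compare that correlation with the values obtained after jointly permuting the rows and columns of one matrix. The Pearson statistic, the one-sided p-value, the 999 permutations, and the toy matrices are assumptions for illustration; the study's exact settings are not stated here.

```python
import random
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]


def upper_triangle(m: Matrix) -> List[float]:
    """Flatten the strictly upper triangle of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def mantel(m1: Matrix, m2: Matrix, permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(m1)
    flat1 = upper_triangle(m1)
    observed = pearson(flat1, upper_triangle(m2))
    idx = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(idx)
        # Permute rows and columns of m2 together to keep it a valid distance matrix.
        permuted = [[m2[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(flat1, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: two perfectly proportional distance matrices among five items.
positions = [0, 1, 3, 6, 10]
d1 = [[abs(a - b) for b in positions] for a in positions]
d2 = [[2 * abs(a - b) for b in positions] for a in positions]
r, p = mantel(d1, d2)
print(r, p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent, which is why an ordinary correlation test would be inappropriate.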
Figure 1. Proximity network spanning the 173 samples of the subfamily Coronavirinae.
The four distinct CoV genera can be easily segregated after removing the unique between-genera links, and each is highlighted with a gray halo. Nodes are colored by the clade of the host from which the virus was isolated. SARS-CoV-2 and adjacent nodes are tagged with the respective host icon. A human silhouette was also added to every virus that infects humans. Note the overall co-structure between viral proteome distance and phylogenetic distance of the respective hosts, leading to a broad agreement between connected clusters of CoV genera and host clades. Additional information about the nodes of the network is available in Table 1. Silhouette images were freely obtained from http://phylopic.org/.
Figure 2. Graphical representation of hosts associated with the endpoints of links in the proximity network of Fig. 1.
To the left, the phylogenetic tree of the hosts involved. To the right, the links/edges of the proximity network, represented as parabolic arcs bridging the hosts associated with the endpoints of each link. The height of each arc corresponds to the distance between the nodes/viruses connected by the respective link, so that flat arcs represent links between similar viruses whereas bumpy arcs join dissimilar ones. All taxa from the main clades (highlighted with transparent rectangles) always yield a between-clade patristic distance larger than unity (>1.0). Silhouette images were freely obtained from http://phylopic.org/.
Figure 3. Expected phylogenetic distance between hosts under random scenarios of host allocation on the same proximity network represented in Fig. 1.
Quantiles of viral distance are plotted against quantiles of phylogenetic distance between hosts. Dotted polyline, the observed distribution of values. Solid polyline, values obtained after randomization. The 95% confidence interval is drawn around the latter line. The departure of observed values from randomness indicates that the hosts of viruses directly connected in the proximity network tend to be closely related.
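The randomization behind this figure can be approximated with a label-shuffling test: keep the network fixed, reassign hosts to nodes at random, and recompute the host distance across links. The sketch below is a simplification that compares mean host distance rather than the quantile curves shown in the figure. The toy path network, the two host labels, and the distance values are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def mean_host_distance(edges: Sequence[Edge], host_of: Sequence[str],
                       host_dist: Dict[str, Dict[str, float]]) -> float:
    """Average phylogenetic distance between the hosts at the two ends of each link."""
    total = sum(host_dist[host_of[u]][host_of[v]] for u, v in edges)
    return total / len(edges)


def randomization_test(edges: Sequence[Edge], host_of: Sequence[str],
                       host_dist: Dict[str, Dict[str, float]],
                       permutations: int = 999, seed: int = 0):
    """Shuffle host labels over nodes; return (observed, null values, one-sided p)."""
    rng = random.Random(seed)
    observed = mean_host_distance(edges, host_of, host_dist)
    labels = list(host_of)
    null: List[float] = []
    for _ in range(permutations):
        rng.shuffle(labels)
        null.append(mean_host_distance(edges, labels, host_dist))
    # Small p means linked viruses have closer hosts than expected by chance.
    p = (1 + sum(x <= observed for x in null)) / (permutations + 1)
    return observed, null, p


# Hypothetical network: a path of six viruses, three per host group.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
host_of = ["bat", "bat", "bat", "bird", "bird", "bird"]
host_dist = {"bat": {"bat": 0, "bird": 2}, "bird": {"bat": 2, "bird": 0}}

observed, null, p = randomization_test(edges, host_of, host_dist)
print(observed, p)
```

In this toy case the observed arrangement already has the fewest possible cross-host links, so no shuffled arrangement can score lower than it.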
Figure 4. Computational experiments of chimera compositions.
Didactic introduction to the concepts (A–B) together with results from these experiments (C) applied to our real data. (A) Three hypothetical tetrads of structural proteins coming from three different viruses. The distances between them are indicated (values normalized to the maximum in brackets). Here, distance denotes the number of differences in the attributes of the letters used to label the proteins (upper/lowercase; normal/italics). (B) All possible combinations of proteins from the above hypothetical viral sources. Heterotopic disaffinity between a pair of distinct proteins is inferred from the distance between proteins of the same kind in the viral precursors. For any assembly, the degree of chimerality is the average heterotopic disaffinity.
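The chimerality measure described in panel B can be sketched as below. The caption does not spell out how the disaffinity of a protein pair is derived from the same-kind distances, so this example assumes it is the mean of the two same-kind distances between the source viruses. That choice, the toy distance values, and the function names are all assumptions, not the authors' definition.

```python
from itertools import combinations, product
from typing import Dict, List

PROTEINS = ("E", "M", "N", "S")
Matrix = List[List[float]]


def heterotopic_disaffinity(p: str, q: str, src_p: int, src_q: int,
                            dist: Dict[str, Matrix]) -> float:
    """Disaffinity between protein p (from virus src_p) and protein q (from virus src_q).

    ASSUMPTION: mean of the p-to-p and q-to-q distances between the two source viruses.
    """
    return 0.5 * (dist[p][src_p][src_q] + dist[q][src_p][src_q])


def chimerality(assembly: Dict[str, int], dist: Dict[str, Matrix]) -> float:
    """Average heterotopic disaffinity over all pairs of distinct proteins."""
    pairs = list(combinations(PROTEINS, 2))
    total = sum(
        heterotopic_disaffinity(p, q, assembly[p], assembly[q], dist) for p, q in pairs
    )
    return total / len(pairs)


# Hypothetical normalized distances among three source viruses, reused for every protein.
toy_matrix = [[0.0, 0.5, 1.0], [0.5, 0.0, 1.0], [1.0, 1.0, 0.0]]
dist = {p: toy_matrix for p in PROTEINS}

# Every way of drawing the four proteins from the three viruses (3**4 assemblies).
assemblies = [dict(zip(PROTEINS, sources)) for sources in product(range(3), repeat=4)]
scores = [chimerality(a, dist) for a in assemblies]

pure = chimerality({"E": 0, "M": 0, "N": 0, "S": 0}, dist)
mixed = chimerality({"E": 0, "M": 0, "N": 1, "S": 1}, dist)
print(len(assemblies), pure, mixed)
```

Under this assumption an assembly whose four proteins share one source scores zero, and the score grows as proteins are drawn from more distant viruses.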