Maya Wardeh, Marcus S C Blagrove, Kieran J Sharkey, Matthew Baylis.
Abstract
Our knowledge of viral host ranges remains limited. Completing this picture by identifying unknown hosts of known viruses is an important research aim that can help identify and mitigate zoonotic and animal-disease risks, such as spill-over from animal reservoirs into human populations. To address this knowledge gap, we apply a divide-and-conquer approach which separates viral, mammalian and network features into three unique perspectives, each predicting associations independently to enhance predictive power. Our approach predicts over 20,000 unknown associations between known viruses and susceptible mammalian species, suggesting that current knowledge underestimates the number of associations in wild and semi-domesticated mammals by a factor of 4.3, and the average potential mammalian host-range of viruses by a factor of 3.2. In particular, our results highlight a significant knowledge gap in the wild reservoirs of important zoonotic and domesticated mammals' viruses: specifically, lyssaviruses, bornaviruses and rotaviruses.
Year: 2021 PMID: 34172731 PMCID: PMC8233343 DOI: 10.1038/s41467-021-24085-w
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Example showcasing final and intermediate predictions of West Nile Virus (WNV) and Rousettus leschenaultii.
Panel A Top 60 predicted mammalian species susceptible to WNV. Mammals were ordered by mean probability of predictions derived from mammalian (all models), viral (WNV models) and network perspectives, and the top 60 were selected. Circles represent the following information in order: 1) whether the association is known (documented in our sources) or not (potential or undocumented). Hosts are omitted for known associations. 2) Mean probability of the three perspectives (per association). 3) Median mammalian perspective probabilities of predicted associations. These probabilities are obtained from 3000 models (50 replicate models for each mammal), trained with viral features – SMOTE class balancing. 4) Median viral perspective probabilities of predicted associations (50 WNV replicate models trained with mammalian features – SMOTE class balancing). 5) Median network perspective probabilities of predicted associations (100 replicate models, balanced under-sampling). 6) Taxonomic order of predicted susceptible species. Orders are shortened as follows: Artiodactyla (Art), Carnivora (Crn), Chiroptera (Chp), Primates (Prm), Rodentia (Rod), and Others (Oth).
Panel B Top 50 predicted viruses of R. leschenaultii. Viruses were ordered by mean probability of predictions derived from mammalian (R. leschenaultii models), viral (all models) and network perspectives. Circles as per Panel A. Baltimore represents Baltimore classification.
Panel C Median probability of predicted WNV-mammal associations in each of the three perspectives per mammalian order. Points represent susceptible species predicted by voting (at least two of the three perspectives – n = 137). Median ensemble probability is computed in each perspective (50 replicate models for each virus/mammal, 100 replicate network models). Predictions are derived from each perspective at a 0.5 probability cut-off. Supplementary Data 1 presents full WNV results.
Panel D Median probability of virus-R. leschenaultii associations in the three perspectives per Baltimore group. Points represent susceptible species predicted by voting (at least two of the three perspectives – n = 64); predictions are derived as per Panel C. Supplementary Data 2 lists full results for R. leschenaultii. Supplementary Fig. 7 illustrates the results when research effort into viruses and mammals is included in mammalian and viral perspectives, respectively.
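The voting rule in this caption (an association is predicted when at least two of the three perspectives support it at the 0.5 cut-off, and candidates are ranked by mean probability) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `vote`, the species labels and the probabilities are hypothetical, and treating a probability exactly equal to the cut-off as a vote is an assumption.

```python
from statistics import mean

def vote(p_mammal, p_virus, p_network, cutoff=0.5, min_votes=2):
    """Return (predicted, mean_probability) for one virus-mammal pair.

    Assumption: a perspective votes for the association when its median
    ensemble probability is >= cutoff.
    """
    probs = (p_mammal, p_virus, p_network)
    votes = sum(p >= cutoff for p in probs)
    return votes >= min_votes, mean(probs)

# Hypothetical median ensemble probabilities (mammalian, viral, network).
candidates = {
    "species_a": (0.82, 0.64, 0.31),
    "species_b": (0.55, 0.20, 0.45),
    "species_c": (0.71, 0.90, 0.77),
}

results = {name: vote(*probs) for name, probs in candidates.items()}

# Keep pairs supported by at least two perspectives, ranked by mean probability.
predicted = sorted(
    (name for name, (supported, _) in results.items() if supported),
    key=lambda name: results[name][1],
    reverse=True,
)
print(predicted)
```

Here `species_b` is dropped because only one perspective exceeds the cut-off, even though its mammalian-perspective probability is above 0.5.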
Viral traits & features used to build our mammalian models.
| Category | Viral feature | Data type | Reason for inclusion |
|---|---|---|---|
| Host-driven | Mean phylogenetic distance between hosts | Continuous | Capturing phylogenetic and ecological distances between each virus’ known hosts and each mammal in our study. |
| | Mean ecological distance between hosts | Continuous | Capturing phylogenetic and ecological distances between each virus’ known hosts and each mammal in our study. |
| | Maximum phylogenetic breadth | Continuous | Greater phylogenetic breadth indicates more generalist potential of the virus. |
| Virus genome & capsid | RNA | Binary | RNA viruses mutate/adapt faster. |
| | Retro-transcribing | Binary | Retroviruses are generally very conserved. |
| | Negative sense/positive sense | Binary | Sense affects replication cycle and range of host enzymes needed. |
| | Circular/linear | Binary | Circular/linear genome affects enlisting host enzymes for replication and translation. |
| | Monopartite/segmented | Binary | Segmented viruses can undergo recombination if two strains of the same virus infect a cell. |
| | Enveloped | Binary | Envelopes are derived from the host cell membrane, so can affect specific-host immune activation. Enveloped viruses deactivate rapidly in the external environment (often requiring direct transfer). The envelope will change upon infection of a new host. |
| | GC-content | Continuous | High GC content usually leads to higher thermo-stability of the genome. |
| | Genome size | Continuous | Genome size is indicative of many aspects of the virus such as complexity, DNA/RNA, and replication type. |
| Virus replication, release, and cell entry | Cytoplasm | Binary | Replication site is linked to RNA/DNA genome – if a virus has a DNA stage it must replicate in the nucleus and overcome additional cell barriers. |
| | Release | Categorical | Affects rate of virus production, cell life-span and means of presentation to the immune system. |
| | Cell entry | Categorical | Availability of receptors influences potential host range. |
| Transmission routes | 8 main transmission routes | Binary for each route | Route(s) of transmission affected by structure/stability of virus and nature of interaction between potential hosts. |
We trained a suite of models for each mammalian species with two or more known viruses (n = 699). Each model comprised the features described above (response variable = 1 if the virus is known to associate with the focal mammalian species, 0 otherwise; the Methods section provides further details). Full descriptions of these features, their sources and justifications are listed in Supplementary Note 2.
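The response-variable coding in this note can be illustrated with a small sketch: keep each mammal with two or more known viruses as a focal species, then label every virus 1 if it is known to associate with that mammal and 0 otherwise. The association list and the helper names `focal_mammals` and `response_vector` are hypothetical and stand in for whatever data structures the study actually used.

```python
from collections import defaultdict

# Hypothetical known (virus, mammal) associations; not data from the study.
known = [
    ("virus_a", "mammal_x"),
    ("virus_b", "mammal_x"),
    ("virus_a", "mammal_y"),
    ("virus_c", "mammal_z"),
]

def focal_mammals(associations, min_viruses=2):
    """Group viruses by mammal and keep mammals with enough known viruses."""
    viruses_by_mammal = defaultdict(set)
    for virus, mammal in associations:
        viruses_by_mammal[mammal].add(virus)
    return {m: vs for m, vs in viruses_by_mammal.items() if len(vs) >= min_viruses}

def response_vector(all_viruses, known_viruses):
    """1 if the virus is known to associate with the focal mammal, else 0."""
    return {virus: int(virus in known_viruses) for virus in all_viruses}

all_viruses = sorted({virus for virus, _ in known})
eligible = focal_mammals(known)
responses = {m: response_vector(all_viruses, vs) for m, vs in eligible.items()}
print(responses)
```

Only `mammal_x` qualifies in this toy input; the viral features from the table would then serve as predictors for its response vector.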
Mammalian traits & features used to build our viral models.
| Category | Mammalian feature | Reason for inclusion |
|---|---|---|
| Phylogeny | Mean phylogenetic distance to known hosts | Linked to sharing of viruses between mammals. |
| | Evolutionary distinctiveness | Can correlate negatively with pathogen species richness. |
| Taxonomy & domestication | Order & family | Can affect host-pathogen … |
| | Domestication | Might influence sharing of viruses between host groups. Domesticated mammals and humans might share more viruses with each other than related wild species. |
| Ecological traits | Morphological traits (body mass) | A key feature in terms of metabolism and adaptation to environment. |
| | Life-history traits (maximum age, age at sexual maturity, activity cycle, and migration) | Potentially relevant in terms of within-host dynamics of viruses. |
| | Reproductive traits (gestation period length, litters per year, litter size and weaning age) | Potentially relevant in terms of within-host dynamics of viruses. |
| | Habitat utilisation | Similar habitat utilisation might correlate with contact with similar viruses. |
| | Diet (proportional use of 10 categories) | Similar dietary habit might associate with similar viral assemblage. |
| | Mean ecological distance | Indicates if a potential host species is … |
| Geo-spatial features | Geographical range (area size) | Might lead to exposure to larger number or more diverse viruses. |
| | Climate (mean temperature & precipitation) | Climate has been shown to influence a number of human and domestic mammal pathogens. |
| | Natural land cover diversity/Agriculture and farming diversity; Mammalian biodiversity; Urbanisation/human population | These factors have been found to influence certain categories of host-pathogen associations. |
We trained a suite of models for each virus species with two or more known mammalian hosts (n = 556). Each model comprised the features described above (response variable = 1 if the mammal is known to associate with the focal virus species, 0 otherwise; the Methods section provides further details). Full descriptions of these features, their sources and justifications are listed in Supplementary Note 3.
Fig. 2 Results (viruses).
Panel A Variable importance (relative contribution) of viral traits to mammalian perspective models. Variable importance is calculated for each constituent ensemble (n = 699) of our mammalian perspective (median of a suite of 50 replicate models, trained with viral features, with SMOTE sampling), and then aggregated (mean) per each reported group (columns).
Panel B Number of known and new mammalian species associated with each virus. Rabies lyssavirus was excluded from Panel B to allow for better visualisation. Top 40 (by number of new hosts) are labelled. Species in bold have over 150 predicted hosts (Supplementary Data 3 lists details of these viruses including CI).
Panel C Predicted number of viruses per species of wild and semi-domesticated mammals (grouped by mammalian order). The following orders (clockwise) are presented: Artiodactyla, Carnivora, Chiroptera, Perissodactyla, Primates, and Rodentia. Source of the silhouette graphics is PhyloPic.org. (Supplementary Data 4 lists aggregated results per mammalian order). Circles represent each mammalian species (with predicted viruses > 0), coloured by number of known viruses previously not associated with this species. Boxplots indicate median (centre), the 25th and 75th percentiles (bounds of box) and interquartile range (whiskers) and are aggregated at the order level. Large red circles with error bars (90% CI) illustrate the median number of known viruses per species in each order. Number of species presented (n) is as follows: All = 1293 (Artiodactyla = 104, Carnivora = 177, Chiroptera = 548, Perissodactyla = 11, Primates = 171, and Rodentia = 282); Group I = 666 (94, 109, 156, 10, 160, 137); Group II = 371 (32, 120, 111, 1, 54, 53); Group III = 410 (87, 62, 123, 9, 51, 78); Group IV = 739 (98, 102, 221, 9, 148, 161); Group V = 1129 (87, 173, 528, 8, 107, 226); Group VI = 358 (55, 64, 30, 6, 139, 64); and Group VII = 110 (3, 2, 53, 1, 43, 8). Supplementary Fig. 8 presents results derived with research effort into mammalian hosts and viruses included in the constituent models trained in the viral and mammalian perspectives, respectively.
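The two-stage aggregation described for Panel A (median across replicate models within an ensemble, then mean across the ensembles in a reported group) might look like the sketch below. Feature names, importance values and the function names `ensemble_importance` and `group_importance` are hypothetical, and three replicates stand in for the 50 used in the study.

```python
from statistics import mean, median

def ensemble_importance(replicates):
    """Median importance per feature across the replicate models of one ensemble."""
    features = replicates[0].keys()
    return {f: median(r[f] for r in replicates) for f in features}

def group_importance(ensembles):
    """Mean of the per-ensemble importances across all ensembles in a group."""
    features = ensembles[0].keys()
    return {f: mean(e[f] for e in ensembles) for f in features}

# Hypothetical replicate-level importances for two mammal ensembles.
mammal_1 = [
    {"enveloped": 0.50, "genome_size": 0.25},
    {"enveloped": 0.75, "genome_size": 0.25},
    {"enveloped": 0.25, "genome_size": 0.50},
]
mammal_2 = [
    {"enveloped": 0.25, "genome_size": 0.75},
    {"enveloped": 0.25, "genome_size": 0.50},
    {"enveloped": 0.50, "genome_size": 0.75},
]

per_ensemble = [ensemble_importance(m) for m in (mammal_1, mammal_2)]
print(group_importance(per_ensemble))
```

Taking the median within an ensemble first limits the influence of a single unusual replicate before the group-level mean is computed.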
Predicted range of susceptible mammalian species of viruses per Baltimore group, family (top 15 families, ranked by fold increase) and transmission route.
| Baltimore classification | Predicted range (~fold increase) | Family | Baltimore group | Predicted range (~fold increase) |
|---|---|---|---|---|
| Group I (dsDNA) | 8.63 [3, 30.43] (~2.59 [~1.16, ~6.94]) | Bornaviridae | V | 71 [15.5, 293.25] (~9.51 [~2.08, ~42.22]) |
| Group II (ssDNA) | 5.47 [2.19, 24.88] (~2.04 [~1.07, ~6.56]) | Orthomyxoviridae | V | 60.25 [15, 196.5] (~7.76 [~1.62, ~27.19]) |
| Group III (dsRNA) | 27.15 [7.96, 93.11] (~4.04 [~1.41, ~11.94]) | Rhabdoviridae | V | 52.8 [23.68, 149.09] (~7.33 [~1.81, ~24.03]) |
| Group IV ((+)ssRNA) | 17.64 [5.34, 65.29] (~3.49 [~1.26, ~10.9]) | Hepeviridae | IV | 70.67 [25.33, 220] (~6.67 [~2.49, ~15.54]) |
| Group V ((−)ssRNA) | 24.91 [8.36, 100.53] (~4.44 [~1.39, ~18.08]) | Filoviridae | V | 31.75 [7, 155.62] (~5.77 [~1.3, ~25.37]) |
| Group VI (ssRNA-RT) | 26.68 [10.26, 94.58] (~4.99 [~1.54, ~15.36]) | Togaviridae | IV | 48.5 [12.45, 161.65] (~5.71 [~1.52, ~16.95]) |
| Group VII (dsDNA-RT) | 19.29 [7.29, 109.43] (~2.53 [~1.35, ~14.55]) | Flaviviridae | IV | 40.59 [11.26, 131.77] (~5.09 [~1.37, ~16.14]) |
| | | Retroviridae | VI | 26.68 [10.26, 94.58] (~4.99 [~1.54, ~15.36]) |
| Transmission route | | Coronaviridae | IV | 22.86 [6.23, 94.89] (~4.81 [~1.44, ~17.85]) |
| Direct | 14.67 [5.07, 55] (~3.29 [~1.25, ~10.49]) | Poxviridae | I | 23.22 [7.76, 88.76] (~4.77 [~1.39, ~15.74]) |
| Direct sexual | 18.26 [6.05, 60.05] (~3.18 [~1.27, ~9.17]) | Reoviridae | III | 32.5 [9.39, 111.21] (~4.56 [~1.49, ~13.71]) |
| Direct vertical | 20.26 [6.81, 68.79] (~3.44 [~1.31, ~10.48]) | Paramyxoviridae | V | 26.28 [9.39, 98.79] (~4.46 [~1.61, ~14.4]) |
| Indirect | 20.02 [7.35, 71.38] (~3.41 [~1.27, ~11.47]) | Phenuiviridae | V | 26.94 [6.35, 124.18] (~4.25 [~1.23, ~20.34]) |
| Ingestion | 10.7 [4.11, 39.46] (~2.52 [~1.15, ~7.69]) | Peribunyaviridae | V | 20.09 [5.85, 90.45] (~4.15 [~1.39, ~19.03]) |
| Inhalation | 14.53 [4.6, 59.87] (~3.29 [~1.24, ~11.73]) | Hantaviridae | V | 15.61 [4.83, 77.59] (~3.61 [~1.23, ~17.13]) |
| Environmental | 20.32 [6.25, 82.58] (~3.81 [~1.29, ~14.12]) | Picornaviridae | IV | 13.62 [4.45, 52.93] (~3.55 [~1.32, ~10.7]) |
| Vector | 30.1 [8.38, 117.44] (~4.73 [~1.42, ~18.26]) | Pneumoviridae | V | 28.89 [8.67, 107.56] (~3.47 [~1.18, ~12.78]) |
Results with research effort included into our mammalian and viral perspective models are reported in Supplementary Table 6.
Fig. 3 Results (Mammals).
Panel A Variable importance (relative contribution) of mammalian traits to viral perspective models. Variable importance is calculated for each constituent model (n = 556) of our viral perspective (trained with mammalian features), and then aggregated (median) per each reported group (columns).
Panel B Number of known and new viruses associated with each mammal. Labelled mammals are as follows: top 4 (by number of new viruses) for each of Artiodactyla, Carnivora, Chiroptera, Primates, Rodentia, and other orders. Species in bold have 100 or more predicted viruses (Supplementary Data 5).
Panel C Top 18 genera (by number of predicted wild or semi-domesticated mammalian host species) in selected orders (Other indicates results for all orders not included in the first five circles). Each order figure comprises the following circles (from outside to inside): 1) Number of hosts predicted to have an association with viruses within the viral genus. 2) Number of hosts detected to have an association. 3) Number of hosts predicted to harbour viral zoonoses (i.e. known or predicted to share at least one virus species with humans). 4) Number of hosts predicted to share viruses with domesticated mammals of economic significance (domesticated mammals in orders: Artiodactyla, Carnivora, Lagomorpha and Perissodactyla). 5) Baltimore classification of the selected genera (Supplementary Data 6). Supplementary Fig. 9 presents results derived with research effort into mammalian hosts and viruses included in the constituent models trained in the viral and mammalian perspectives, respectively.
Predicted number of viruses per top 15 orders by fold increase in number of viruses predicted in wild or semi-domesticated mammalian hosts (per species).
| Order/sub-order | Included species | Fold increase/species | Virus range/species | New viruses/species |
|---|---|---|---|---|
| | 6 | ~12 [~1.75, ~30.45] | 31.5 [4, 125] | 29 [1.5, 122.5] |
| | 99 | ~9.12 [~1.69, ~23.53] | 41.45 [10.92, 126.55] | 36 [5.61, 121.07] |
| | 172 | ~7.12 [~1.48, ~20.88] | 34.46 [10.24, 114.62] | 27.77 [3.86, 107.89] |
| | 13 | ~7.12 [~0.88, ~28.01] | 21.83 [3.33, 112.83] | 18.92 [1.33, 109.83] |
| | 14 | ~5.74 [~1.67, ~17.78] | 25.18 [8, 97.09] | 20.64 [3.73, 92.55] |
| | 1 | ~5.67 [~1.67, ~27] | 17 [5, 107] | 14 [2, 104] |
| | 13 | ~5.27 [~1.9, ~16.43] | 18 [5.25, 74.83] | 15.08 [2.42, 71.92] |
| | 287 | ~4.79 [~1.18, ~18.72] | 15.22 [3.54, 74.84] | 12.65 [1.26, 72.22] |
| | 180 | ~3.91 [~1.23, ~14.84] | 18.3 [5.99, 78.38] | 14.11 [1.94, 74.15] |
| | 2 | ~3.00 [~1.00, ~10.5] | 3 [1, 20] | 2 [0, 19] |
| | 548 | ~2.79 [~1.08, ~9.5] | 9.51 [3.11, 41.66] | 7.11 [0.84, 39.24] |
| | 3 | ~2.15 [~0.73, ~10.34] | 11 [4.33, 55.33] | 5.67 [1, 50] |
| | 5 | ~2.13 [~0.93, ~8.57] | 4.6 [1.6, 27] | 3 [0, 25.2] |
| | 44 | ~2.04 [~0.86, ~13.78] | 7.14 [2.45, 55] | 4.73 [0.36, 52.45] |
| | 5 | ~1.84 [~0.6, ~17.98] | 7.8 [3.8, 68.4] | 3.4 [0, 63.8] |
Results are ordered by descending fold increase. Values are derived per species and averaged per order. Results with research effort included into our mammalian and viral perspective models are reported in Supplementary Table 7.
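Because values are derived per species and then averaged per order, the order-level fold increase is a mean of per-species ratios rather than a ratio of order-level means. The sketch below illustrates that difference with made-up counts; it assumes a species' fold increase is its predicted virus range (known plus new) divided by its known count, which is an interpretation of the table rather than a formula stated in the source.

```python
from statistics import mean

def fold_increase(known, new):
    """Predicted virus range (known + new) relative to the known count."""
    return (known + new) / known

# Hypothetical species in one order: (known viruses, newly predicted viruses).
species = [(2, 10), (1, 3), (4, 4)]

per_species = [fold_increase(known, new) for known, new in species]
order_fold = mean(per_species)  # mean of per-species ratios

# For contrast: the ratio of order-level means gives a different number.
mean_range = mean(known + new for known, new in species)
mean_known = mean(known for known, _ in species)
ratio_of_means = mean_range / mean_known

print(per_species, order_fold, round(ratio_of_means, 2))
```

This is why the fold-increase column need not equal the virus-range column divided by the implied known count for the same row.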
Fig. 4 The network perspective - potential motifs (subgraphs) in our virus-host bipartite network.
A The concept of a potential motif. The association TBEV-P. leo is a forced insertion into the network prior to calculating motifs for the association.
B Motif space: networks represent the 2-step and 3-step ego networks (union) of host (here P. leo) and virus (TBEV). The 1-, 2- and 3-step ego networks comprise the counting space for TBEV-P. leo potential motifs. Dark grey nodes represent viruses, light grey nodes represent hosts. Size of nodes is adjusted to represent overall number of hosts or viruses with known associations to the node. Red edges represent nodes reachable from the mammal (P. leo) in 1 or 2 steps (links). Blue edges represent nodes reachable from the virus (TBEV) in 1 or 2 steps (links). Humans and rabies virus were excluded from these networks.
C 3-, 4- and 5-node potential motifs in our virus-host bipartite network. Circles represent viruses and squares represent mammals. Red circles represent the focal virus (v), and blue squares represent the focal mammal (m) of the association v-m for which the motifs are being counted (dashed yellow line). This association has two states: either already known (documented in EID2), or unknown. Grey lines illustrate existing associations in our network.
D Motif counts. Heatmap illustrating distribution of motif-features (counts of potential motifs per each focal association) in our bipartite network, grouped by mammalian order and Baltimore classification. The counts are logged to allow for better visualisation.
E Variable importance (relative contribution) of motif-features (variables) to our network perspective models (SVM-RW). Motifs (subgraphs) are coloured by the number of nodes (K = 3, 4, 5). Boxplots indicate median (centre), the 25th and 75th percentiles (bounds of box) and interquartile range (whiskers). Points represent variable importance in individual runs (n = 100). Research effort into both viruses and mammals is included as independent variables in our network models (coloured in yellow).
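The ideas of forced insertion and k-step ego networks can be sketched on a toy bipartite graph. This is a simplified illustration under stated assumptions, not the motif definitions used in the study: it counts only the two kinds of 3-node subgraphs that contain the focal edge (the focal virus with another of its hosts, and the focal mammal with another of its viruses). All node labels and function names are hypothetical; the `v:` and `m:` prefixes keep virus and mammal identifiers distinct.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets for a virus-mammal bipartite network."""
    graph = {}
    for virus, mammal in edges:
        graph.setdefault(virus, set()).add(mammal)
        graph.setdefault(mammal, set()).add(virus)
    return graph

def ego_network(graph, start, steps):
    """Nodes reachable from `start` within `steps` links (breadth-first search)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == steps:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return seen

def three_node_potential_motifs(edges, virus, mammal):
    """Count 3-node subgraphs containing the focal edge after forcing it in."""
    graph = build_graph(list(edges) + [(virus, mammal)])  # forced insertion
    other_hosts = len(graph[virus] - {mammal})    # shape: m - v - m'
    other_viruses = len(graph[mammal] - {virus})  # shape: v - m - v'
    return other_hosts, other_viruses

# Hypothetical known associations.
edges = [("v:A", "m:1"), ("v:A", "m:2"), ("v:B", "m:2"), ("v:B", "m:3")]

print(three_node_potential_motifs(edges, "v:A", "m:3"))
print(sorted(ego_network(build_graph(edges), "m:3", 2)))
```

Forcing the edge in first means the counts are defined identically whether the focal association is already known or not, which matches the two states described in panel C.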