Phuoc Truong Nguyen, Santiago Garcia-Vallvé, Pere Puigbò.
Abstract
Early characterization of emerging viruses is essential to control their spread, as the Zika virus outbreak in 2014 illustrated. Among other non-viral factors, host information is essential for the surveillance and control of virus spread. Flaviviruses (genus Flavivirus), like other viruses, are shaped by high mutation rates and selective forces that adapt their codon usage to that of their hosts. However, identifying potential hosts for novel viruses remains a major challenge. Potential hosts of emerging zoonotic viruses are usually identified only after several confirmed cases, which is too late to deter future outbreaks. In this paper, we introduce an algorithm to identify the host range of a virus from its raw genome sequences. The proposed strategy compares codon usage frequencies across viruses and hosts by means of a normalized Codon Adaptation Index (CAI). We tested the algorithm on 94 flaviviruses and 16 potential hosts. Compared with empirical observations, the method distinguishes between arthropod and vertebrate hosts for several flaviviruses with high accuracy (virus group 91.9% and host type 86.1%) and specificity (virus group 94.9% and host type 79.6%). Overall, this algorithm may be useful as a complementary tool to current phylogenetic methods for monitoring current and future viral outbreaks by clarifying host-virus relationships.
Keywords: algorithm; codon adaptation index; flavivirus; host identification
Year: 2021 PMID: 34069049 PMCID: PMC8157105 DOI: 10.3390/life11050442
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
Figure 1. Scheme of the algorithm used to calculate the normalized codon adaptation index (nCAI). (a) Pipeline to identify putative hosts based on nCAI values. The complete coding sequences of hosts and viruses are used to compute nCAI values, which are put into a table. These values are then subjected to correspondence analysis to identify optimal hosts and, thus, the likelihood of a virus infecting an organism. (b) Algorithm to calculate nCAI. The CAI values for possible hosts and viruses of interest are computed from the complete coding sequences (CDSs) and the codon usage tables, which are calculated from the same sequences. The CAI values of the host (CAIh) are calculated from virus CDSs and host codon usages, and the CAI values of the viruses (CAIs) are computed using virus CDSs and the codon usage values of the viruses themselves. The resulting CAI values are then normalized by dividing each CAIh by its respective CAIs.
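The nCAI calculation described in the caption can be illustrated with a minimal sketch. It assumes the classic CAI definition (geometric mean of each codon's relative adaptiveness, i.e. its count divided by the count of the most frequent synonymous codon in the reference set); the paper's actual implementation may differ in details. The function names, the `pseudocount` of 0.5 for unseen codons, and the toy sequences are all hypothetical choices made for this example.

```python
from collections import Counter
from math import exp, log

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codons(cds):
    """Split a coding sequence into complete codons (RNA is read as DNA)."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def codon_counts(cds_list):
    """Codon usage table (raw counts) for a set of coding sequences."""
    counts = Counter()
    for cds in cds_list:
        counts.update(c for c in codons(cds) if c in CODON_TABLE)
    return counts


def relative_adaptiveness(counts, pseudocount=0.5):
    """w = count / max count among synonymous codons of the reference set."""
    by_aa = {}
    for codon, aa in CODON_TABLE.items():
        by_aa.setdefault(aa, []).append(codon)
    weights = {}
    for aa, synonyms in by_aa.items():
        if aa == "*" or len(synonyms) == 1:
            continue  # stop codons, Met and Trp carry no usage information
        top = max(counts[c] for c in synonyms)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in synonyms:
            # Assumed pseudocount keeps unseen codons from zeroing the CAI.
            weights[c] = max(counts[c], pseudocount) / top
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of the scored codons in cds."""
    logs = [log(weights[c]) for c in codons(cds) if c in weights]
    return exp(sum(logs) / len(logs)) if logs else float("nan")


def ncai(virus_cds, host_cds_list):
    """nCAI = CAIh / CAIs for one virus CDS and one host reference set."""
    cai_h = cai(virus_cds, relative_adaptiveness(codon_counts(host_cds_list)))
    cai_s = cai(virus_cds, relative_adaptiveness(codon_counts([virus_cds])))
    return cai_h / cai_s


# Toy sequences, invented purely for illustration.
virus = "ATGGCTGCTGCCAAAAAGTAA"
host = ["ATGGCTGCTGCTAAAAAATAA"]
print(round(ncai(virus, host), 3))
```

Under these assumptions, a virus scored against its own codon usage gives an nCAI of 1, and values for other references indicate how well the viral codon usage matches each candidate host relative to that baseline.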
Figure 2. Correspondence analysis of the normalized codon adaptation index (nCAI) values of flaviviruses (genus Flavivirus; n = 94). The plot shows that nCAI can differentiate multiple subgroups of flaviviruses based on their degree of codon usage optimization relative to their host organisms. Mosquito-borne flaviviruses are generally optimized for vertebrate hosts, while tick-borne flaviviruses are optimized for ticks, and insect-only flaviviruses are optimized for mosquitoes. Dual-host insect-only flaviviruses show optimization for both mosquitoes and vertebrates, and unknown vector flaviviruses are also optimized for vertebrates. Dimension 1 explains 89.4% of the variation, and Dimension 2 explains 8.5% of the variation. MBFV: mosquito-borne flaviviruses, TBFV: tick-borne flaviviruses, IOFV: insect-only flaviviruses and UVFV: unknown vector flaviviruses.
Values of %GC3 and nCAI (Mean ± Standard deviation) by flavivirus and host groups.
| %GC3 | Tick | Aedes | Anopheles | Culex | Mammals | Other |
|---|---|---|---|---|---|---|
| | - | ±1.8% | - | - | ±2.5% | ±4.5% |
| ±2.3% | ±0.048 | ±0.053 | ±0.051 | ±0.049 | ±0.062 | ±0.049 |
| ±3.8% | ±0.046 | ±0.044 | ±0.042 | ±0.047 | ±0.051 | ±0.037 |
| ±3.7% | ±0.033 | ±0.033 | ±0.029 | ±0.032 | ±0.051 | ±0.035 |
| ±1.8% | ±0.021 | ±0.024 | ±0.020 | ±0.020 | ±0.043 | ±0.028 |
| ±3.3% | ±0.012 | ±0.028 | ±0.010 | ±0.010 | ±0.053 | ±0.038 |
| ±3.8% | ±0.046 | ±0.044 | ±0.042 | ±0.047 | ±0.051 | ±0.037 |
| ±1.8% | ±0.021 | ±0.024 | ±0.020 | ±0.020 | ±0.043 | ±0.028 |
| ±4.3% | ±0.038 | ±0.035 | ±0.033 | ±0.035 | ±0.053 | ±0.037 |
Host types: Tick (Ixodes scapularis); Aedes (Aedes albopictus and Aedes aegypti); Anopheles (Anopheles gambiae); Culex (Culex quinquefasciatus); Mammals (Homo sapiens, Bos taurus, Sus scrofa, Mus musculus, Myotis davidii and Myotis brandtii); and Other Vertebrates (Alligator mississippiensis, Xenopus laevis, Anas platyrhynchos, Gallus gallus and Columba livia). Complete list of flaviviruses is available in Supplementary Table S1. nCAI = CAIh/CAIs; nCAI: normalized Codon Adaptation Index; CAIh: Codon Adaptation Index calculated using host codon usage as a reference; CAIs: Codon Adaptation Index calculated with virus codon usage as a reference. %GC3: Percentage of guanine and cytosine at the third codon position. MBFV: mosquito-borne flaviviruses, TBFV: tick-borne flaviviruses, IOFV: insect-only flaviviruses and UVFV: unknown vector flaviviruses.
Values of specificity and accuracy between nCAI predictions and empirical observations from Supplementary Table S1.
| | Virus group | dhIOFV | IOFV | MBFV | TBFV | UVFV | Host type | Mosquito | Tick | Vertebrate |
|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy¹ | 91.9% | 94.7% | 96.8% | 81.9% | 95.7% | 90.4% | 86.1% | 78.8% | 86.7% | 92.9% |
| Specificity² | 94.9% | 94.4% | 100.0% | 95.6% | 94.6% | 90.9% | 79.6% | 75.3% | 84.0% | 79.5% |
¹ (TP + TN)/(TP + FN + FP + TN); ² TN/(TN + FP).
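
The two footnote formulas can be written out as a short sketch, where TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives obtained by comparing predictions with empirical observations. The function names and the counts in the usage lines are invented for illustration and are not taken from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that agree with observation."""
    return (tp + tn) / (tp + fn + fp + tn)


def specificity(tn, fp):
    """Fraction of observed negatives that are predicted as negative."""
    return tn / (tn + fp)


# Hypothetical confusion counts, chosen only to show the arithmetic.
tp, tn, fp, fn = 40, 45, 5, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```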