Chenglong Yu, Troy Hernandez, Hui Zheng, Shek-Chung Yau, Hsin-Hsiung Huang, Rong Lucy He, Jie Yang, Stephen S-T Yau.
Abstract
The International Committee on Taxonomy of Viruses authorizes and organizes the taxonomic classification of viruses. Thus far, the detailed classifications for all viruses are neither complete nor free from dispute. For example, the current missing label rates in GenBank are 12.1% for family labels and 30.0% for genus labels. Using the proposed Natural Vector representation, all 2,044 single-segment referenced viral genomes in GenBank can be embedded in [Formula: see text]. Unlike other approaches, this allows us to determine phylogenetic relations for all viruses at any level (e.g., Baltimore class, family, subfamily, genus, and species) in real time. Additionally, the proposed graphical representation for virus phylogeny provides a visualization of the distribution of viruses in [Formula: see text]. Unlike the commonly used tree visualization methods, which suffer from uniqueness and existence problems, our representation always exists and is unique. This approach is successfully used to predict and correct viral classification information, as well as to identify viral origins; for example, our visualization shows that a recent public health threat, the West Nile virus, is closer to the Japanese encephalitis antigenic complex. Based on cross-validation results, the accuracy rates of our predictions are as high as 98.2% for Baltimore class labels, 96.6% for family labels, 99.7% for subfamily labels and 97.2% for genus labels.
Year: 2013 PMID: 23717598 PMCID: PMC3661469 DOI: 10.1371/journal.pone.0064328
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. The dataset and statistical results of our study.
| Baltimore class | I | II | III | IV | V | VI | VII | Satellite | <NA> |
| Name | dsDNA | ssDNA | dsRNA | ssRNA(+) | ssRNA(−) | ssRNA(RT) | dsDNA(RT) | | |
| Linear number | 599 | 56 | 45 | 563 | 66 | 58 | 0 | 33 | 19 |
| Circular number | 177 | 272 | 0 | 0 | 1 | 0 | 44 | 103 | 8 |
| Total number | 776 | 328 | 45 | 563 | 67 | 58 | 44 | 136 | 27 |
| Checking Baltimore classification by NV: inconsistencies | 4 | 14 | 5 | 21 | 2 | 7 | 1 | NA | NA |
| Checking Baltimore classification by NV: inconsistency rate | 0.01 | 0.04 | 0.11 | 0.04 | 0.03 | 0.12 | 0.02 | NA | NA |
| Checking family classification by NV: inconsistencies | 58 | 0 | 0 | 11 | 0 | 0 | 0 | NA | NA |
| Checking family classification by NV: inconsistency rate | 0.08 | 0 | 0 | 0.02 | 0 | 0 | 0 | NA | NA |
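The arithmetic behind the table can be checked with a short sketch. It assumes that each Baltimore inconsistency rate is the inconsistency count divided by the class total, rounded to two decimals; the excerpt does not state this explicitly. The variable names are illustrative only. Family-level rates are not checked, since their denominators are not given here.

```python
# Counts copied from the dataset table; the column order follows the table.
classes = ["I", "II", "III", "IV", "V", "VI", "VII", "Satellite", "<NA>"]
linear = [599, 56, 45, 563, 66, 58, 0, 33, 19]
circular = [177, 272, 0, 0, 1, 0, 44, 103, 8]
total = [776, 328, 45, 563, 67, 58, 44, 136, 27]

# Baltimore-level inconsistencies are reported for classes I to VII only.
baltimore_inconsistent = [4, 14, 5, 21, 2, 7, 1]
reported_rate = [0.01, 0.04, 0.11, 0.04, 0.03, 0.12, 0.02]

# Each class total should be the sum of its linear and circular genomes.
totals_ok = all(l + c == t for l, c, t in zip(linear, circular, total))

# The class totals should add up to the number of genomes in the study.
grand_total = sum(total)

# Assumed definition: rate = inconsistencies / class total, to 2 decimals.
# zip stops after the seven classes that have an inconsistency count.
computed_rate = [round(k / t, 2) for k, t in zip(baltimore_inconsistent, total)]

print(totals_ok, grand_total)
print(dict(zip(classes, computed_rate)))
```

Under that assumption the computed Baltimore rates agree with the reported ones, and the class totals sum to the 2,044 genomes mentioned in the abstract.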
(1) The corrected Baltimore classification information of the 2,044 single-segmented referenced viruses. (2) The Baltimore classification prediction information for the 2,044 viruses. (3) The family classification prediction information given Baltimore class information.
Table 2. The distance matrix of 10 elements.
| | A | B | C | D | E | F | G | H | I | J |
| A | 0 | | | | | | | | | |
| B | 9 | 0 | | | | | | | | |
| C | 13 | 4 | 0 | | | | | | | |
| D | 23 | 21 | 23 | 0 | | | | | | |
| E | 27 | 34 | 38 | 30 | 0 | | | | | |
| F | 26 | 36 | 39 | 39 | 12 | 0 | | | | |
| G | 18 | 26 | 30 | 25 | 12 | 16 | 0 | | | |
| H | 19 | 8 | 9 | 18 | 34 | 25 | 25 | 0 | | |
| I | 20 | 14 | 11 | 30 | 43 | 44 | 35 | 12 | 0 | |
| J | 28 | 21 | 20 | 18 | 20 | 47 | 37 | 17 | 20 | 0 |
Figure 1. The graph construction process for the distance matrix shown in Table 2.
(A) From each element, draw a directed line to its closest element, or to each of its closest elements if there is a tie. (B) Combine the elements connected by directed lines in (A), resulting in two connected graphs, graph 1 and graph 2. (C) The final graphical representation is obtained by connecting element A in graph 1 and element G in graph 2, based on the distance matrix in Table 2.
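Steps (A) to (C) can be traced on the Table 2 matrix with a minimal sketch. Two points are assumptions rather than statements from the source: the distance between two graphs is taken to be the smallest element-to-element distance across them, and only the two-graph case that arises here is handled. All function and variable names are hypothetical.

```python
LABELS = "ABCDEFGHIJ"

# Lower triangle of the Table 2 distance matrix, one row per element.
LOWER = [
    [],
    [9],
    [13, 4],
    [23, 21, 23],
    [27, 34, 38, 30],
    [26, 36, 39, 39, 12],
    [18, 26, 30, 25, 12, 16],
    [19, 8, 9, 18, 34, 25, 25],
    [20, 14, 11, 30, 43, 44, 35, 12],
    [28, 21, 20, 18, 20, 47, 37, 17, 20],
]


def dist(i, j):
    """Symmetric lookup into the lower-triangular matrix."""
    if i == j:
        return 0
    if i < j:
        i, j = j, i
    return LOWER[i][j]


def nearest_edges(n):
    """Step (A): a directed edge from each element to every closest element."""
    edges = []
    for i in range(n):
        d_min = min(dist(i, j) for j in range(n) if j != i)
        edges += [(i, j) for j in range(n) if j != i and dist(i, j) == d_min]
    return edges


def components(n, edges):
    """Step (B): group elements joined by edges, ignoring edge direction."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def closest_pair(g1, g2):
    """Step (C), assumed rule: the smallest cross-graph distance and its pair."""
    return min((dist(i, j), i, j) for i in g1 for j in g2)


n = len(LABELS)
edges = nearest_edges(n)
graphs = components(n, edges)
named_edges = [(LABELS[i], LABELS[j]) for i, j in edges]
named_graphs = ["".join(LABELS[i] for i in g) for g in graphs]
d, i, j = closest_pair(graphs[0], graphs[1])
bridge = (LABELS[i], LABELS[j], d)

print(named_edges)
print(named_graphs)
print(bridge)
```

On this matrix the sketch yields two graphs, one containing A, B, C, D, H, I and J and the other containing E, F and G, joined through the pair A and G at distance 18, in line with the caption. Elements D and E each tie between two closest elements, which is why step (A) can produce more than one line per element.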
Table 3. The distance matrix of the 2 graphs obtained from Figure 1.
| | Graph 1 | Graph 2 |
| Graph 1 | 0 | |
| Graph 2 | 18 | 0 |
Figure 2. The natural graphical representation for the 44 single-segment referenced viruses of Baltimore VII.
Figure 3. The natural graphical representation for the 45 single-segment referenced viruses of Baltimore III.
Figure 4. The natural graphical representation for the 67 single-segment referenced viruses of Baltimore V.
Figure 5. The natural graphical representation for 53 viruses of the Flaviviridae family.