| Literature DB >> 22022592 |
Chengguo Wei, Guoqing Wang, Xin Chen, Honglan Huang, Bin Liu, Ying Xu, Fan Li.
Abstract
Identification and typing of human enteroviruses (HEVs) are important for pathogen detection and therapy. Previous phylogeny-based typing methods rely mainly on multiple sequence alignments of specific genes in the HEVs, but their results are not stable across different choices of genes. Here we report a novel method for identifying and typing HEVs based on information derived from their whole genomes. Specifically, we calculate the k-mer-based barcode image of each genome, whether an HEV or another human virus, for a fixed k with 1<k<7, where a genome barcode is defined by the frequency distribution of all possible k-mers across the whole genome. A phylogenetic tree is constructed using a barcode-based distance and the neighbor-joining method on a set of 443 representative non-HEV human viruses and 395 HEV sequences. The tree separates the HEVs from all the non-HEV viruses with 100% accuracy and divides the HEVs into four distinct clades with 93.4% consistency with a multiple sequence alignment-based phylogeny. Our detailed analyses of the HEVs that the two methods type differently indicate that our results agree better with known information about these viruses.
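The abstract's pipeline ends with a neighbor-joining tree built from pairwise barcode distances. A minimal sketch of that last step follows, under stated assumptions: each genome is already summarized as a numeric feature vector, Euclidean distance stands in for the paper's barcode-based distance (the abstract does not define it), and the textbook Saitou-Nei neighbor-joining algorithm is used. The names `euclidean_matrix` and `neighbor_joining` are hypothetical, not taken from the paper.

```python
import math


def euclidean_matrix(vectors):
    """Pairwise Euclidean distances; an assumed stand-in for the barcode-based distance."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def neighbor_joining(names, dist):
    """Textbook (Saitou-Nei) neighbor joining on a symmetric distance matrix.

    Requires at least three taxa; returns an unrooted tree as a Newick string.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    labels = {i: name for i, name in enumerate(names)}  # active node id -> Newick text
    D = {i: {j: float(dist[i][j]) for j in range(len(names))} for i in range(len(names))}
    next_id = len(names)
    while len(labels) > 3:
        ids = list(labels)
        n = len(ids)
        # Net divergence r_i of each active node
        r = {i: sum(D[i][j] for j in ids if j != i) for i in ids}
        # Choose the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j
        best = None
        for a in range(n):
            for b in range(a + 1, n):
                i, j = ids[a], ids[b]
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and j
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = next_id
        next_id += 1
        labels[u] = f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f})"
        D[u] = {u: 0.0}
        # Distance from u to every other remaining active node
        for k in ids:
            if k not in (i, j):
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        del labels[i], labels[j]
    # Join the last three nodes at a single internal node
    i, j, k = labels
    li = 0.5 * (D[i][j] + D[i][k] - D[j][k])
    lj = 0.5 * (D[i][j] + D[j][k] - D[i][k])
    lk = 0.5 * (D[i][k] + D[j][k] - D[i][j])
    return f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f},{labels[k]}:{lk:.4f});"


if __name__ == "__main__":
    # Toy additive distances for four taxa (not data from the paper)
    names = ["A", "B", "C", "D"]
    dist = [[0, 3, 5, 3], [3, 0, 6, 4], [5, 6, 0, 4], [3, 4, 4, 0]]
    print(neighbor_joining(names, dist))
    # prints (C:3.0000,D:1.0000,(A:1.0000,B:2.0000):1.0000);
```

Because the toy distances are additive, neighbor joining recovers the generating tree exactly; on real barcode distances, the resulting Newick string can be loaded into any standard tree viewer.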
Year: 2011 PMID: 22022592 PMCID: PMC3194813 DOI: 10.1371/journal.pone.0026296
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Barcodes of five representative human viruses: (a) HIV, (b) Enterovirus, (c) Rabies virus, (d) SARS Coronavirus, and (e) Hepatitis B virus.
In each barcode, the x-axis lists all unique 4-mers in alphabetical order, the y-axis is the concatenated genomes of viruses of the same kind compressed 2,000-fold, and the gray level shows the frequency of each k-mer within a 2,000 bp window at the corresponding position.
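A barcode of this kind can be approximated as a matrix with one row per 2,000 bp window and one column per 4-mer in alphabetical order. The following is a minimal sketch, assuming non-overlapping windows that do not span genome boundaries, within-window relative frequencies, and a linear gray scale in which black marks the most frequent k-mer in the image. These choices and the names `window_barcode`, `class_barcode` and `to_gray` are hypothetical, not taken from the paper.

```python
from itertools import product


def kmer_index(k=4):
    """Map every k-mer over A, C, G, T to its column, in alphabetical order."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def window_barcode(seq, k=4, window=2000):
    """One row per non-overlapping window; each row holds within-window k-mer frequencies.

    k-mers containing other symbols (e.g. N) are skipped; empty windows are dropped.
    """
    index = kmer_index(k)
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        counts = [0] * len(index)
        for i in range(len(chunk) - k + 1):
            j = index.get(chunk[i:i + k])
            if j is not None:
                counts[j] += 1
        total = sum(counts)
        if total:
            rows.append([c / total for c in counts])
    return rows


def class_barcode(genomes, k=4, window=2000):
    """Stack the window rows of several genomes of one virus class (assumed layout)."""
    rows = []
    for genome in genomes:
        rows.extend(window_barcode(genome, k, window))
    return rows


def to_gray(rows):
    """Linear 8-bit gray scale: 0 (black) = highest frequency in the image, 255 = absent."""
    peak = max((v for row in rows for v in row), default=0) or 1
    return [[round(255 * (1 - v / peak)) for v in row] for row in rows]
```

For a 7,200 bp HEV genome these assumptions give four rows of 256 columns; stacking all genomes of one class reproduces the kind of tall, striped image shown in the figure.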
Figure 2. Identification and typing of HEVs.
In each plot, the x-axis is the distance between the feature vector of each virus's barcode and the average feature vector of all the viruses used (in A, the average over four kinds of virus: HIV, HEV, SARS coronavirus and rabies virus; in B, the average over all HEV subtypes), and the y-axis is the distance between each virus's feature vector and a normalization vector in which every dimension equals 1/136, where 136 is the total number of unique k-mers when each k-mer is paired with its reverse complement [32]. (A): red dots represent HEVs (395 genomes), blue dots HIV (279 genomes), magenta dots SARS coronavirus (101 genomes), and cyan dots rabies virus (63 genomes). (B): blue dots represent poliovirus (78 genomes), green dots echovirus (52 genomes), red dots the new virus strains enterovirus 68-71 (72 genomes), and magenta dots coxsackievirus A and B genomes (85 genomes).
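The value 136 is the number of canonical 4-mers: of the 4^4 = 256 possible 4-mers, 16 are their own reverse complement, so pairing each 4-mer with its reverse complement leaves (256 + 16) / 2 = 136 classes. A minimal sketch of the two plotted coordinates follows, assuming Euclidean distance (the caption does not name the metric) and whole-genome relative frequencies of canonical 4-mers; `canonical_kmers`, `feature_vector` and `plot_coordinates` are hypothetical names.

```python
import math
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = set("ACGT")


def revcomp(kmer):
    """Reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical_kmers(k=4):
    """One representative (the alphabetically smaller) per {k-mer, reverse complement} pair."""
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(km, revcomp(km)) for km in all_kmers})


def feature_vector(seq, k=4):
    """Relative frequencies of canonical k-mers across a whole genome.

    k-mers containing symbols other than A, C, G, T are skipped.
    """
    index = {km: i for i, km in enumerate(canonical_kmers(k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if set(km) <= BASES:
            counts[index[min(km, revcomp(km))]] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def plot_coordinates(vectors):
    """(x, y) per genome: x = distance to the mean vector of the set,
    y = distance to the uniform vector whose entries all equal 1/dim."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    uniform = [1.0 / dim] * dim
    return [(math.dist(v, mean), math.dist(v, uniform)) for v in vectors]
```

Panel A would then correspond to calling `plot_coordinates` on the vectors of all four virus kinds together, and panel B to calling it on the HEV vectors only, so the mean vector differs between the panels.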
Figure 3. Phylogenetic trees for the HEVs based on a specific HEV gene (A) versus the HEV barcodes (B).
Edge lengths in the trees reflect genetic distances calculated under the Kimura-parameter model. The reliability of the VP1-based tree was estimated with 1,000 bootstrap replicates. The serotype names beside the trees denote the serotypes belonging to each HEV species.
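The "Kimura-parameter model" is read here as Kimura's two-parameter (K2P) model, a common choice for nucleotide distances in gene-based trees; that reading is an assumption. Under K2P, with P and Q the proportions of aligned sites that differ by a transition and by a transversion, the distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). A minimal sketch for two pre-aligned sequences, assuming that gap and ambiguous sites are simply ignored; `k2p_distance` is a hypothetical name.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites with a gap or ambiguous base in either sequence are ignored (assumption).
    """
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

A matrix of such distances over aligned VP1 sequences could be passed to any distance-based tree method; the bootstrap step would repeat this on alignments whose columns are resampled with replacement.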
Comparison of single-gene-based and whole-genome barcode-based phylogenetic trees (the numbers in parentheses give the number of viruses of the corresponding serotype).
| No. | Serotypes | Barcode-based typing | One-gene-based typing | Comparison |
| 1 | EV71(82), CV-A2 to A7(1), CV-A10(1), CV-A12(1), CV-A14(1), CV-A16(3) | HEV-1 | HEV-A | Exact match |
| 2 | CV-A9(1), CV-B1(1), CV-B2(3), CV-B3(15), CV-B4 to 6(3), E3(2), E4(3), E5(2), E6(3), E7(2), E9(4), E11(4), E12 to 16(1), E18 to 20(1), E24(1), E25(2), E26(1), E27(1), E29(1), E30(6), E30 to 33(1), EV69(1), EV74(1), EV75(1), EV77(1), EV79-87(1), EV97-98(2), EV100(2), EV101(1), EV107(2), EV109(1) | HEV-2 | HEV-B | Exact match |
| 3 | PV-1(72), PV-2(32), PV-3(13) | HEV-3 | HEV-C | Exact match |
| 4 | CV-A1(1), CV-A11(2), CV-A13(3), CV-A15(2), CV-A18(3), CV-A19(2), CV-A20(4), CV-A21(3), CV-A22(3), CV-A24(3) | HEV-4 | HEV-C | |
| 5 | EV68(2), EV70(2), EV94(2) | HEV-4 | HEV-D | Exact match |
| | Match rate*: 93.41% | | | |
*Match rate = number of matching sequences / total number of sequences.
Complete genome sequences of the five classes of viruses.
| Virus | Genome length (bp) | Number of genomes |
| HIV | ∼9,006 | 279 |
| HEVs | ∼7,200 | 395 |
| Rabies virus | ∼11,900 | 63 |
| SARS Coronavirus | ∼29,700 | 101 |
| Hepatitis B virus | ∼3,200 | 993 |