Yuanlin Ma, Zuguo Yu, Runbin Tang, Xianhua Xie, Guosheng Han, Vo V Anh.
Abstract
HIV-1 viruses, which are predominant in the family of HIV viruses, have strong pathogenicity and infectivity. They can evolve into many different variants in a very short time. In this study, we propose a new and effective alignment-free method for the phylogenetic analysis of HIV-1 viruses using complete genome sequences. Our method combines the position distribution information of the k-mers with their counts. We also propose a metric to determine the optimal k value. We name our method the Position-Weighted k-mers (PWkmer) method. Validation and comparison using the Robinson–Foulds distance method and the modified bootstrap method on a benchmark dataset show that our method is reliable for the phylogenetic analysis of HIV-1 viruses. PWkmer can resolve within-group variations for different known subtypes of Group M of HIV-1 viruses. This method is simple and computationally fast for whole-genome phylogenetic analysis.
Keywords: Alignment-free; HIV-1 virus; Robinson–Foulds distance; phylogenetic analysis; position-weighted k-mers
Year: 2020 PMID: 33286029 PMCID: PMC7516702 DOI: 10.3390/e22020255
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
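
The abstract describes combining k-mer counts with the positions at which the k-mers occur. The record does not give the PWkmer formula, so the following is only a minimal sketch of the general idea under a stated assumption: each occurrence of a k-mer is weighted by its relative position in the sequence. The function names `pw_kmer_features` and `manhattan` and the weighting scheme itself are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def pw_kmer_features(seq, k):
    """Hypothetical position-weighted k-mer vector (NOT the paper's formula).

    Assumption: an occurrence starting at 0-based index i contributes
    (i + 1) / n, where n is the number of k-mer windows in the sequence.
    The per-k-mer totals are then divided by n.
    """
    seq = seq.upper()
    n = len(seq) - k + 1
    if n <= 0:
        raise ValueError("sequence is shorter than k")
    totals = defaultdict(float)
    for i in range(n):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            totals[kmer] += (i + 1) / n
    # Fixed ordering over all 4**k possible k-mers so vectors are comparable.
    return [totals["".join(p)] / n if "".join(p) in totals else 0.0
            for p in product("ACGT", repeat=k)]


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    # Toy sequences for illustration only; not genomes from the dataset.
    a = pw_kmer_features("ACGTACGTTGCA", 3)
    b = pw_kmer_features("ACGTTCGTAGCA", 3)
    print(manhattan(a, b))
```

Pairwise distances of this kind can be collected into a distance matrix and passed to a Neighbor-Joining implementation, which is how the figure captions describe the tree construction.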
Table 1. Labels of complete genome builds used for the 44 HIV-1 genomes of the dataset.
| No. | Accession | Subtype | Length (bp) | Area |
|---|---|---|---|---|
| 1 | U51190 | A1 | 8999 | Uganda |
| 2 | AF004885 | A1 | 9160 | Kenya |
| 3 | AF069670 | A1 | 8813 | Somalia |
| 4 | AF484509 | A1 | 8807 | Uganda |
| 5 | AF286237 | A2 | 9060 | Cyprus |
| 6 | AF286238 | A2 | 8972 | DRC |
| 7 | AY173951 | B | 8996 | Thailand |
| 8 | AY331295 | B | 8834 | USA |
| 9 | AY423387 | B | 9359 | Netherlands |
| 10 | K03455 | B | 9719 | France |
| 11 | AF146728 | B | 8887 | Australia |
| 12 | AF067155 | C | 9002 | India |
| 13 | AY772699 | C | 9011 | South Africa |
| 14 | U46016 | C | 9031 | Ethiopia |
| 15 | U52953 | C | 8959 | Brazil |
| 16 | AY371157 | D | 8379 | Cameroon |
| 17 | K03454 | D | 9176 | DRC |
| 18 | U88824 | D | 8952 | Uganda |
| 19 | AF005494 | F1 | 8968 | Brazil |
| 20 | AF075703 | F1 | 8925 | Finland |
| 21 | AF077336 | F1 | 8903 | Belgium (DRC) |
| 22 | AJ249238 | F1 | 8614 | France |
| 23 | AF377956 | F2 | 8782 | Cameroon |
| 24 | AJ249236 | F2 | 8555 | Cameroon |
| 25 | AJ249237 | F2 | 8589 | Cameroon |
| 26 | AY371158 | F2 | 8349 | Cameroon |
| 27 | AF061641 | G | 9047 | Finland (Kenya) |
| 28 | AF061642 | G | 9074 | Sweden (DRC) |
| 29 | AF084936 | G | 9707 | Belgium (DRC) |
| 30 | AF005496 | H | 8953 | Cent. Afr. Rep. |
| 31 | AF190127 | H | 9056 | Belgium |
| 32 | AF190128 | H | 9707 | Belgium |
| 33 | AF082394 | J | 8943 | Sweden |
| 34 | AF082395 | J | 8953 | Sweden |
| 35 | AJ249235 | K | 8600 | DRC |
| 36 | AJ249239 | K | 8604 | Cameroon |
| 37 | AJ006022 | N | 9182 | Cameroon |
| 38 | AJ271370 | N | 9045 | Cameroon |
| 39 | AY532635 | N | 8938 | Cameroon |
| 40 | AJ302647 | O | 9829 | Senegal |
| 41 | AY169812 | O | 9110 | Cameroon |
| 42 | L20571 | O | 9793 | Cameroon |
| 43 | L20587 | O | 9754 | Cameroon |
| 44 | AF447763 | CPZ | 9326 | Tanzania |
DRC: Democratic Republic of the Congo.
Figure 1. Trend chart of the k value vs. the scoring scheme score(k). The red circles represent the scores of the HIV dataset for different k values, and the blue dots represent the scores of the HEV dataset for different k values.
Figure 2. Subtyping of HIV based on the position-weighted k-mers feature for whole genome sequences. The Neighbor-Joining (NJ) tree of the 44 HIV whole genomes is constructed from the position-weighted k-mers feature distance matrix.
Figure 3. Subtyping of HIV based on alignment of whole genome sequences. The NJ tree of the 44 HIV whole genomes is constructed by ClustalX.
Figure 4. Robinson–Foulds distances between the phylogenetic trees reconstructed by the PWkmer method, the CVTree method [20], and the DLTree method [12], and the tree reconstructed by ClustalX for the 44 HIV genome sequences in Table 1 (for CVTree and DLTree, the trees from their optimal results were selected).
Figure 5. The modified bootstrap consensus tree for Figure 2 based on 100 replicates.
Robinson–Foulds distances between the phylogenetic trees reconstructed by our method with the Manhattan distance and the tree reconstructed by ClustalX on the HIV dataset.
| Species | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| HIV | 74 | 54 | 38 | 26 | 20 | 14 | 12 | 14 |
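
The Robinson–Foulds distance reported above counts the leaf bipartitions (splits) that occur in one tree but not in the other. As a minimal sketch, assuming trees are written as nested Python tuples of leaf names (a representation chosen here for illustration, not one used in the paper), it can be computed as follows. The helper names `leaves`, `splits`, and `robinson_foulds` are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names of a tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the nontrivial bipartitions of the tree, treated as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same leaf set")
    return len(splits(t1) ^ splits(t2))


if __name__ == "__main__":
    # Toy five-leaf trees for illustration only.
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(robinson_foulds(t1, t2))
```

A fully resolved unrooted tree on n leaves has n - 3 nontrivial splits, so for the 44 genomes here the distance between two such trees is at most 2 × 41 = 82, which gives a scale for the values in the table.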