Virus classification in 60-dimensional protein space.

Yongkun Li, Kun Tian, Changchuan Yin, Rong Lucy He, Stephen S-T Yau.

Abstract

Because sequences diverge widely among viral groups, sequence alignment is not directly applicable to genome-wide comparative analysis of viruses. Alignment-free methods for whole-genome comparison and phylogenetic tree reconstruction have therefore received increasing attention. Among them, the recently proposed "Natural Vector (NV) representation" has been used successfully to study the phylogeny of multi-segmented viruses in a 12-dimensional genome space derived from nucleotide sequence structure. However, whether proteomes are preferable to genomes for determining viral phylogeny has not been investigated in depth. As the translated products of genes, proteins directly shape viral structure and are vital for all metabolic pathways. In this study, we combine the NV representation of a protein sequence with the Hausdorff distance, which is suited to comparing point sets, to construct a 60-dimensional protein space and analyze the evolutionary relationships of 4021 viruses from their whole proteomes in the current NCBI Reference Sequence Database (RefSeq). We also use the previously developed natural graphical representation to recover viral phylogeny. Our results demonstrate that the proposed method is efficient and accurate for classifying viruses. For example, the prediction accuracy for Baltimore II viruses reaches 95.9% for family labels, 95.7% for subfamily labels and 96.5% for genus labels. Finally, we find that proteomes lead to better viral classification when reliable protein sequences are abundant; in other cases, the accuracy rates obtained with proteomes remain comparable to those obtained with genomes. Published by Elsevier Inc.
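The abstract names two ingredients, a 60-dimensional vector per protein and a Hausdorff distance between the point sets formed by two proteomes, without giving formulas. The sketch below illustrates one way these could fit together. It assumes the usual natural vector construction from the related literature: for each of the 20 standard amino acids, the count, the mean position, and the second central moment of positions normalized by (count × sequence length). The function names `natural_vector` and `hausdorff`, the zero-fill for absent residues, and the use of Euclidean distance between vectors are illustrative assumptions, not details confirmed by this record.

```python
import math
from collections import defaultdict

# The 20 standard amino acids; their order fixes the vector layout (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def natural_vector(seq):
    """Map a protein sequence to a 60-dimensional vector.

    Layout: 20 counts, then 20 mean positions, then 20 normalized
    second central moments. Positions are 1-based. Non-standard
    residues (e.g. X) contribute to the length but get no coordinates.
    """
    n = len(seq)
    positions = defaultdict(list)
    for i, aa in enumerate(seq, start=1):
        positions[aa].append(i)

    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        pos = positions.get(aa, [])
        n_k = len(pos)
        if n_k == 0:
            # Absent residue: all three coordinates set to zero (assumption).
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(pos) / n_k
        d2 = sum((p - mu) ** 2 for p in pos) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu)
        moments.append(d2)
    return counts + means + moments


def hausdorff(set_a, set_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(xs, ys):
        # Largest distance from a point in xs to its nearest point in ys.
        return max(min(math.dist(x, y) for y in ys) for x in xs)

    return max(directed(set_a, set_b), directed(set_b, set_a))


# Toy usage: each "proteome" is a list of made-up protein sequences.
proteome_1 = [natural_vector(s) for s in ("MKVLA", "MAAGT")]
proteome_2 = [natural_vector(s) for s in ("MKVLG", "MSTNP", "MAAGT")]
distance = hausdorff(proteome_1, proteome_2)
```

Representing each virus as a set of protein vectors, rather than a single vector, is what makes a set-to-set distance such as the Hausdorff distance applicable: viruses with different numbers of proteins can still be compared directly.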

Keywords:  Hausdorff distance; Natural graphical representation; Natural vector; Virus classification

Year:  2016        PMID: 26988414     DOI: 10.1016/j.ympev.2016.03.009

Source DB:  PubMed          Journal:  Mol Phylogenet Evol        ISSN: 1055-7903            Impact factor:   4.286


  9 in total

1.  Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences.

Authors:  Marika Kaden; Katrin Sophie Bohnsack; Mirko Weber; Mateusz Kudła; Kaja Gutowska; Jacek Blazewicz; Thomas Villmann
Journal:  Neural Comput Appl       Date:  2021-04-27       Impact factor: 5.606

2.  Integration of In Silico and In Vitro Analysis of Gliotoxin Production Reveals a Narrow Range of Producing Fungal Species.

Authors:  Sergio Redrado; Patricia Esteban; María Pilar Domingo; Concepción Lopez; Antonio Rezusta; Ariel Ramirez-Labrada; Maykel Arias; Julián Pardo; Eva M Galvez
Journal:  J Fungi (Basel)       Date:  2022-03-31

3.  Virus Database and Online Inquiry System Based on Natural Vectors.

Authors:  Rui Dong; Hui Zheng; Kun Tian; Shek-Chung Yau; Weiguang Mao; Wenping Yu; Changchuan Yin; Chenglong Yu; Rong Lucy He; Jie Yang; Stephen S-T Yau
Journal:  Evol Bioinform Online       Date:  2017-12-17       Impact factor: 1.625

4.  A novel fast vector method for genetic sequence comparison.

Authors:  Yongkun Li; Lily He; Rong Lucy He; Stephen S-T Yau
Journal:  Sci Rep       Date:  2017-09-22       Impact factor: 4.379

5.  An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes.

Authors:  Stephen Solis-Reyes; Mariano Avino; Art Poon; Lila Kari
Journal:  PLoS One       Date:  2018-11-14       Impact factor: 3.240

6.  A new method to analyze protein sequence similarity using Dynamic Time Warping.

Authors:  Wenbing Hou; Qiuhui Pan; Qianying Peng; Mingfeng He
Journal:  Genomics       Date:  2016-12-11       Impact factor: 5.736

7.  iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins.

Authors:  Dan Zhang; Hua-Dong Chen; Hasan Zulfiqar; Shi-Shi Yuan; Qin-Lai Huang; Zhao-Yue Zhang; Ke-Jun Deng
Journal:  Comput Math Methods Med       Date:  2021-01-07       Impact factor: 2.238

8.  Identification of HIV Rapid Mutations Using Differences in Nucleotide Distribution over Time.

Authors:  Nan Sun; Jie Yang; Stephen S-T Yau
Journal:  Genes (Basel)       Date:  2022-01-19       Impact factor: 4.096

9.  Virome assembly and annotation in brain tissue based on next-generation sequencing.

Authors:  Zihao Yuan; Xiaohua Ye; Lisha Zhu; Ningyan Zhang; Zhiqiang An; W Jim Zheng
Journal:  Cancer Med       Date:  2020-08-01       Impact factor: 4.452
