
Classification of COVID-19 and Other Pathogenic Sequences: A Dinucleotide Frequency and Machine Learning Approach.

Gciniwe S Dlamini, Stephanie J Muller, Rebone L Meraba, Richard A Young, James Mashiyane, Tapiwa Chiwewe, Darlington S Mapiye.

Abstract

The world is grappling with the COVID-19 pandemic caused by the 2019 novel SARS-CoV-2. To better understand this novel virus and its relationship with other pathogens, new methods for analyzing the genome are required. In this study, intrinsic dinucleotide genomic signatures were analyzed for whole genome sequence data of eight pathogenic species, including SARS-CoV-2. The genome sequences were transformed into dinucleotide relative frequencies and classified using the extreme gradient boosting (XGBoost) model. The classification models were trained to a) distinguish between the sequences of all eight species and b) distinguish between SARS-CoV-2 sequences originating from different geographic regions. Our method attained 100% in all performance metrics for all tasks in the eight-species classification problem. Moreover, the models achieved 67% balanced accuracy when classifying SARS-CoV-2 sequences into the six continental regions and 86% balanced accuracy when classifying SARS-CoV-2 samples as either originating from Asia or not. Analysis of the dinucleotide genomic profiles of the eight species revealed a similarity between the SARS-CoV-2 and MERS-CoV viral sequences. Further analysis of SARS-CoV-2 viral sequences from the six continents revealed that samples from Oceania had the highest frequency of TT dinucleotides and the lowest CG frequency compared to the other continents. The dinucleotide signatures of AC, AG, CA, CT, GA, GT, TC, and TG were well conserved across most genomes, while the frequencies of the other dinucleotide signatures varied considerably. Altogether, the results from this study demonstrate the utility of dinucleotide relative frequencies for discriminating and identifying similar species.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/.
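As a rough illustration of the feature representation and evaluation metric named in the abstract, the Python sketch below turns a genome into a 16-dimensional dinucleotide relative-frequency vector and computes balanced accuracy. It assumes "relative frequency" means each overlapping dinucleotide's count divided by the total number of valid dinucleotides; the abstract does not state the exact normalization, and some studies instead use Karlin's odds ratio f_XY / (f_X f_Y). Skipping ambiguous bases such as N and treating U as T are also assumptions, and the function names are hypothetical. The XGBoost training step itself is not reproduced.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 dinucleotides in a fixed order (AA, AC, ..., TT) so that every
# genome maps to a feature vector with the same column layout.
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def dinucleotide_relative_frequencies(sequence: str) -> dict:
    """Hypothetical helper: relative frequency of each of the 16 dinucleotides.

    Assumptions: overlapping windows with step 1; U is treated as T; any pair
    containing a symbol outside A/C/G/T (e.g. N or IUPAC codes) is skipped.
    """
    seq = sequence.upper().replace("U", "T")
    counts = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in NUCLEOTIDES and seq[i + 1] in NUCLEOTIDES
    )
    total = sum(counts.values())
    if total == 0:
        # No valid dinucleotides: return an all-zero vector rather than divide by zero.
        return {d: 0.0 for d in DINUCLEOTIDES}
    return {d: counts[d] / total for d in DINUCLEOTIDES}


def balanced_accuracy(y_true, y_pred) -> float:
    """Hypothetical helper: mean of per-class recall over classes in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    freqs = dinucleotide_relative_frequencies("ACGTACGT")
    # Seven overlapping pairs: AC, CG, GT, TA, AC, CG, GT
    print({d: round(f, 3) for d, f in freqs.items() if f > 0})
    # {'AC': 0.286, 'CG': 0.286, 'GT': 0.286, 'TA': 0.143}

    # Toy "Asia vs. other" labels: recall(Asia) = 2/3, recall(Other) = 1
    y_true = ["Asia", "Asia", "Asia", "Other"]
    y_pred = ["Asia", "Asia", "Other", "Other"]
    print(round(balanced_accuracy(y_true, y_pred), 3))  # 0.833
```

Each genome's vector, `[freqs[d] for d in DINUCLEOTIDES]`, would then form one row of the training matrix for a gradient-boosted classifier such as `xgboost.XGBClassifier`. Balanced accuracy weights every class equally, so it is less flattering than plain accuracy when regional sample counts are unbalanced: in the toy call, plain accuracy is 0.75 while balanced accuracy is about 0.833. scikit-learn's `sklearn.metrics.balanced_accuracy_score` computes the same macro-averaged recall.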


Keywords:  Alignment-free sequence analysis; COVID-19; XGBoost; dinucleotide frequencies; feature representations; genomic signatures; human pathogens; machine learning

Year:  2020        PMID: 34976561      PMCID: PMC8675546          DOI: 10.1109/ACCESS.2020.3031387

Source DB:  PubMed          Journal:  IEEE Access        ISSN: 2169-3536            Impact factor:   3.367

