
Analysis of heterogeneous genomic samples using image normalization and machine learning.

Sunitha Basodi, Pelin Icer Baykal, Alex Zelikovsky, Pavel Skums, Yi Pan.

Abstract

BACKGROUND: Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging problems in bioinformatics. Machine learning models are increasingly used to analyze sequence data from such populations. However, applying them directly is difficult for several reasons: technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures.
RESULTS: We propose a novel preprocessing approach that transforms irregular genomic data into normalized image data. This representation lets the problems of classifying and comparing heterogeneous populations be restated as image classification problems, which can be solved with a variety of available machine learning tools. We then apply the approach to two important problems in molecular epidemiology: inference of the viral infection stage and detection of viral transmission clusters, both using next-generation sequencing data. The infection staging method was applied to HCV HVR1 samples collected from 108 recently infected and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both the recently and the chronically HCV-infected individuals. Clustering was performed on data collected from 33 epidemiologically curated outbreaks and achieved more than 97% accuracy.
CONCLUSIONS: The sequence image normalization method provides a robust conversion of genomic data into numerical data. It also overcomes several issues that arise when machine learning methods are applied to viral populations. The image data also help in visualizing genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and in the surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are as accurate as, or more accurate than, other models.
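
The abstract does not say how a sample is turned into an image. The sketch below is an illustration only, and it assumes one simple scheme that is not necessarily the authors' method. Each sample is treated as a set of aligned reads and summarized as a symbol-by-position frequency matrix. Each row of that matrix is then linearly resampled to a fixed width, so samples with different read counts and alignment lengths produce matrices of the same shape. The names `sample_to_image`, `ALPHABET`, `width`, `flatten` and `euclidean` are hypothetical. The flattened vectors are the kind of input an off-the-shelf SVM or clustering routine would take. The Euclidean distance here stands in for that downstream step and is not the classifier the abstract describes.

```python
from collections import Counter

# Hypothetical symbol set: four nucleotides plus the alignment gap.
ALPHABET = "ACGT-"


def sample_to_image(sequences, width=64):
    """Convert one sample (a list of aligned reads) into a fixed-size image.

    Returns a len(ALPHABET) x width list of lists whose entries are the
    per-position symbol frequencies, linearly resampled to `width` columns.
    Symbols outside ALPHABET (e.g. 'N') are ignored, so such columns may
    sum to less than 1.
    """
    if not sequences:
        raise ValueError("a sample must contain at least one sequence")
    if width < 1:
        raise ValueError("width must be positive")
    seqs = [s.upper() for s in sequences]
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to one length")

    # Frequency profile: rows are symbols, columns are alignment positions.
    profile = [[0.0] * length for _ in ALPHABET]
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        for row, symbol in enumerate(ALPHABET):
            profile[row][pos] = counts[symbol] / len(seqs)

    # Linear resampling to a common width, so samples with different
    # alignment lengths and read counts yield images of the same shape.
    image = []
    for row in profile:
        resampled = []
        for j in range(width):
            x = j * (length - 1) / (width - 1) if width > 1 else 0.0
            lo = int(x)
            hi = min(lo + 1, length - 1)
            frac = x - lo
            resampled.append(row[lo] * (1.0 - frac) + row[hi] * frac)
        image.append(resampled)
    return image


def flatten(image):
    """Row-major flattening into a feature vector for a downstream model."""
    return [value for row in image for value in row]


def euclidean(a, b):
    """Stand-in distance between two flattened images of equal size."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


if __name__ == "__main__":
    # Two samples with different read counts and alignment lengths.
    small = sample_to_image(["ACGT", "ACGA", "ACGT"], width=8)
    large = sample_to_image(["ACGTAC", "ACGTAA"], width=8)
    print(euclidean(flatten(small), flatten(large)))
```

Linear resampling is used here only because it is the simplest way to force a common shape. A real pipeline might instead use k-mer counts, sort the reads, or apply a dedicated image-resizing routine.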

Keywords:  Clustering; Image normalization; Next-generation sequencing data; Outbreaks investigations; Staging HCV infections

Year: 2020; PMID: 33349236; PMCID: PMC7751093; DOI: 10.1186/s12864-020-6661-6

Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969
