
Imbalanced class learning in epigenetics.

M Muksitul Haque, Michael K Skinner, Lawrence B Holder.

Abstract

In machine learning, a balanced dataset is one of the important criteria for achieving high classification accuracy. Datasets with a large ratio between the majority and minority classes hinder learning with any classifier. When the classes of the target concept differ in their number of instances by an order of magnitude or more, the result is an imbalanced class distribution. Such datasets arise in biology, sensor data, medical diagnostics, and any other domain where labeling instances of the minority class is time-consuming or costly, or where such data are not readily available. The current study investigates a number of imbalanced class algorithms for addressing the imbalanced class distribution present in epigenetic datasets. Epigenetic (DNA methylation) datasets inherently contain few differentially DNA methylated regions (DMRs) and a much larger number of non-DMR sites. For this class imbalance problem, a number of algorithms are compared, including the TAN+AdaBoost algorithm. Experiments performed on four epigenetic datasets and several known datasets show that learning on an imbalanced dataset can achieve accuracy similar to that of a regular learner on a balanced dataset.
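The imbalance problem described above can be made concrete with a small sketch. It is not the authors' method: it assumes hypothetical toy labels ("DMR" vs "non-DMR" at a 1:99 ratio) with placeholder features, and it uses plain random undersampling, a generic rebalancing technique, rather than the TAN+AdaBoost approach compared in the paper. The function names are illustrative only.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; a majority-only predictor scores 1/k for k classes."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(y_pred[i] == c for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def random_undersample(X, y, seed=0):
    """Randomly drop examples of larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, n_min):
            X_bal.append(xi)
            y_bal.append(label)
    return X_bal, y_bal


if __name__ == "__main__":
    # Hypothetical toy data: 10 "DMR" sites vs 990 "non-DMR" sites.
    y = ["DMR"] * 10 + ["non-DMR"] * 990
    X = list(range(len(y)))  # placeholder features (site indices)
    print(imbalance_ratio(y))  # 99.0
    always_majority = ["non-DMR"] * len(y)  # trivial classifier
    print(accuracy(y, always_majority))  # 0.99
    print(balanced_accuracy(y, always_majority))  # 0.5
    X_bal, y_bal = random_undersample(X, y)
    print(Counter(y_bal))  # 10 examples of each class
```

The trivial classifier that never predicts a DMR reaches 99% plain accuracy, which is why a per-class measure such as balanced accuracy is shown alongside it. Undersampling produces a balanced training set but discards most majority examples; one common remedy is to draw several independent undersampled subsets and combine the models trained on each.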

Keywords: DNA; biology; computational molecular biology; genomics; machine learning


Year: 2014; PMID: 24798423; PMCID: PMC4082351; DOI: 10.1089/cmb.2014.0008

Source DB: PubMed; Journal: J Comput Biol; ISSN: 1066-5277; Impact factor: 1.479


Cited by (4 in total):

1.  Integrated genetic and epigenetic prediction of coronary heart disease in the Framingham Heart Study.

Authors:  Meeshanthini V Dogan; Isabella M Grumbach; Jacob J Michaelson; Robert A Philibert
Journal:  PLoS One       Date:  2018-01-02       Impact factor: 3.240

2.  Predicting gastrointestinal drug effects using contextualized metabolic models.

Authors:  Marouen Ben Guebila; Ines Thiele
Journal:  PLoS Comput Biol       Date:  2019-06-26       Impact factor: 4.475

3.  Molecular Classification and Interpretation of Amyotrophic Lateral Sclerosis Using Deep Convolution Neural Networks and Shapley Values.

Authors:  Abdul Karim; Zheng Su; Phillip K West; Matthew Keon; Jannah Shamsani; Samuel Brennan; Ted Wong; Ognjen Milicevic; Guus Teunisse; Hima Nikafshan Rad; Abdul Sattar
Journal:  Genes (Basel)       Date:  2021-10-30       Impact factor: 4.096

4.  Genome-Wide Locations of Potential Epimutations Associated with Environmentally Induced Epigenetic Transgenerational Inheritance of Disease Using a Sequential Machine Learning Prediction Approach.

Authors:  M Muksitul Haque; Lawrence B Holder; Michael K Skinner
Journal:  PLoS One       Date:  2015-11-16       Impact factor: 3.240
