A heuristic method for simulating open-data of arbitrary complexity that can be used to compare and evaluate machine learning methods.

Jason H Moore, Maksim Shestov, Peter Schmitt, Randal S Olson.

Abstract

A central challenge of developing and evaluating artificial intelligence and machine learning methods for regression and classification is access to data that illuminates the strengths and weaknesses of different methods. Open data plays an important role in this process by making it easy for computational researchers to access real data for this purpose. Genomics has in some cases taken a leading role in the open data effort, starting with DNA microarrays. While real data from experimental and observational studies is necessary for developing computational methods, it is not sufficient, because the ground truth in real data cannot be known. Real data must therefore be accompanied by simulated data in which the balance between signal and noise is known and can be directly evaluated. Unfortunately, there is a lack of methods and software for simulating data with the kind of complexity found in real biological and biomedical systems. We present here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating complex biological and biomedical data. Further, we introduce new methods for developing simulation models that generate data specifically allowing discrimination between different machine learning methods.
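
The argument for simulated data with a known signal-to-noise balance can be illustrated with a minimal sketch. This is not the HIBACHI method or its software; the function names, the XOR-style interaction model, the sample sizes, and the 10% noise rate are all assumptions chosen for illustration. Because the labels are simulated, the best achievable accuracy (1 minus the noise rate) is known in advance, and the gap between two toy "methods" can be read against it.

```python
import random
from collections import Counter, defaultdict

def simulate(n, noise, rng):
    """Two binary features; the label is their XOR, flipped with probability `noise`.

    The XOR model is a hypothetical stand-in for a pure interaction effect:
    neither feature predicts the label on its own.
    """
    rows = []
    for _ in range(n):
        x1, x2 = rng.randint(0, 1), rng.randint(0, 1)
        label = x1 ^ x2
        if rng.random() < noise:
            label = 1 - label  # inject label noise at a known rate
        rows.append(((x1, x2), label))
    return rows

def fit_lookup(train, key):
    """Majority-vote classifier over the cells defined by key(features)."""
    votes = defaultdict(Counter)
    for x, y in train:
        votes[key(x)][y] += 1
    table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
    default = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: table.get(key(x), default)

def accuracy(model, test):
    return sum(model(x) == y for x, y in test) / len(test)

rng = random.Random(0)
noise = 0.1  # known by construction, so the accuracy ceiling is 1 - noise
train = simulate(2000, noise, rng)
test = simulate(2000, noise, rng)

single = fit_lookup(train, key=lambda x: x[0])  # sees only the first feature
joint = fit_lookup(train, key=lambda x: x)      # sees both features jointly

acc_single = accuracy(single, test)
acc_joint = accuracy(joint, test)
print(f"single-feature: {acc_single:.3f}  joint: {acc_joint:.3f}  ceiling: {1 - noise:.3f}")
```

Under these assumptions the joint model should land near the known ceiling while the single-feature model stays near chance, which is the kind of separation between methods that simulated data with a known ground truth makes measurable.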

Year: 2018   PMID: 29218887   PMCID: PMC5728661

Source DB: PubMed   Journal: Pac Symp Biocomput   ISSN: 2335-6928


