A Multi-Core Parallelization Strategy for Statistical Significance Testing in Learning Classifier Systems.

James Rudd, Jason H Moore, Ryan J Urbanowicz.

Abstract

Permutation-based statistics for evaluating the significance of class prediction, predictive attributes, and patterns of association have only appeared within the learning classifier system (LCS) literature since 2012. While still not widely utilized by the LCS research community, formal evaluations of test statistic confidence are imperative to large and complex real-world applications such as genetic epidemiology, where it is standard practice to quantify the likelihood that a seemingly meaningful statistic could have been obtained purely by chance. LCS algorithms are relatively computationally expensive on their own. The compounding requirements for generating permutation-based statistics may be a limiting factor for some researchers interested in applying LCS algorithms to real-world problems. Technology has made LCS parallelization strategies more accessible and thus more popular in recent years. In the present study we examine the benefits of externally parallelizing a series of independent LCS runs such that permutation testing with cross-validation becomes more feasible to complete on a single multi-core workstation. We test our Python implementation of this strategy in the context of a simulated complex genetic epidemiological data mining problem. Our evaluations indicate that as long as the number of concurrent processes does not exceed the number of CPU cores, the speedup achieved is approximately linear.
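
The external parallelization described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `run_one` is a hypothetical stand-in that returns a random score instead of training an LCS, the job layout (one job per permutation and fold) is an assumption, and the p-value formula is the common (b + 1) / (n + 1) permutation estimate rather than a statistic confirmed by the source. The only point taken from the abstract is that independent runs are farmed out to worker processes and that the process count is capped at the number of CPU cores.

```python
import os
import random
from multiprocessing import Pool


def run_one(job):
    """Hypothetical stand-in for one independent LCS run."""
    perm_id, fold, seed = job
    rng = random.Random(seed)
    # Placeholder score: a real run would train an LCS on this (possibly
    # label-permuted) training fold and score the held-out fold.
    return perm_id, fold, rng.random()


def build_jobs(n_permutations, n_folds):
    """One job per (permutation, fold); perm_id 0 is the unpermuted data."""
    return [(p, f, p * n_folds + f)
            for p in range(n_permutations + 1)
            for f in range(n_folds)]


def permutation_p_value(results, n_permutations, n_folds):
    """Share of permuted datasets whose mean score reaches the observed mean."""
    means = [0.0] * (n_permutations + 1)
    for perm_id, _fold, score in results:
        means[perm_id] += score / n_folds
    observed = means[0]
    at_least = sum(1 for m in means[1:] if m >= observed)
    return (at_least + 1) / (n_permutations + 1)


def run_all(n_permutations=20, n_folds=10, n_procs=None):
    """Run every job in a process pool no larger than the core count."""
    cores = os.cpu_count() or 1
    n_procs = min(n_procs or cores, cores)
    jobs = build_jobs(n_permutations, n_folds)
    with Pool(processes=n_procs) as pool:
        results = pool.map(run_one, jobs)
    return permutation_p_value(results, n_permutations, n_folds)


if __name__ == "__main__":
    print(run_all())
```

Because each job is independent and carries its own seed, the runs need no communication with one another, which is what makes this kind of parallelization "external" to the LCS algorithm itself.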

Keywords:  Algorithms; Design; LCS; Performance; multi-core processors; parallelization; scalability; significance testing

Year: 2013; PMID: 24358057; PMCID: PMC3864178; DOI: 10.1007/s12065-013-0092-0

Source DB: PubMed; Journal: Evol Intell; ISSN: 1864-5909


