Data mining, neural nets, trees--problems 2 and 3 of Genetic Analysis Workshop 15.

Andreas Ziegler, Anita L DeStefano, Inke R König, Claire Bardel, Dumitru Brinza, Shelley Bull, Zhaohui Cai, Beate Glaser, Wei Jiang, Kristine E Lee, Chuang Xing Li, Jing Li, Xin Li, Paul Majoram, Yan Meng, Kristin K Nicodemus, Alexander Platt, Daniel F Schwarz, Weilang Shi, Yin Yao Shugart, Hans H Stassen, Yan V Sun, Sungho Won, Wenyi Wang, Grace Wahba, Usumah A Zagaar, Zhenming Zhao.

Abstract

Genome-wide association studies using thousands to hundreds of thousands of single nucleotide polymorphism (SNP) markers and region-wide association studies using a dense panel of SNPs are already in use to identify disease susceptibility genes and to predict disease risk in individuals. Because these tasks become increasingly important, three different data sets were provided for the Genetic Analysis Workshop 15, thus allowing examination of various novel and existing data mining methods for both classification and identification of disease susceptibility genes, gene by gene or gene by environment interaction. The approach most often applied in this presentation group was random forests because of its simplicity, elegance, and robustness. It was used for prediction and for screening for interesting SNPs in a first step. The logistic tree with unbiased selection approach appeared to be an interesting alternative to efficiently select interesting SNPs. Machine learning, specifically ensemble methods, might be useful as pre-screening tools for large-scale association studies because they can be less prone to overfitting, can be less computer processor time intensive, can easily include pair-wise and higher-order interactions compared with standard statistical approaches and can also have a high capability for classification. However, improved implementations that are able to deal with hundreds of thousands of SNPs at a time are required.

Year: 2007    PMID: 18046765    DOI: 10.1002/gepi.20280

Source DB: PubMed    Journal: Genet Epidemiol    ISSN: 0741-0395    Impact factor: 2.135


  13 in total

1.  On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data.

Authors:  Daniel F Schwarz; Inke R König; Andreas Ziegler
Journal:  Bioinformatics       Date:  2010-05-26       Impact factor: 6.937

2.  Patient-centered yes/no prognosis using learning machines.

Authors:  I R König; J D Malley; S Pajevic; C Weimar; H-C Diener; A Ziegler
Journal:  Int J Data Min Bioinform       Date:  2008       Impact factor: 0.667

3.  Molecular signatures of cardiovascular disease risk: potential for test development and clinical application. (Review)

Authors:  Heribert Schunkert; Inke R König; Jeanette Erdmann
Journal:  Mol Diagn Ther       Date:  2008       Impact factor: 4.074

4.  Statistical learning approaches in the genetic epidemiology of complex diseases. (Review)

Authors:  Anne-Laure Boulesteix; Marvin N Wright; Sabine Hoffmann; Inke R König
Journal:  Hum Genet       Date:  2019-05-02       Impact factor: 4.132

5.  Genome-wide association studies: quality control and population-based measures.

Authors:  Andreas Ziegler
Journal:  Genet Epidemiol       Date:  2009       Impact factor: 2.135

6.  Association between protein signals and type 2 diabetes incidence.

Authors:  Troels Mygind Jensen; Daniel R Witte; Damiana Pieragostino; James N McGuire; Ellis D Schjerning; Chiara Nardi; Andrea Urbani; Mika Kivimäki; Eric J Brunner; Adam G Tabàk; Dorte Vistisen
Journal:  Acta Diabetol       Date:  2012-02-05       Impact factor: 4.280

7.  Machine learning and data mining in complex genomic data--a review on the lessons learned in Genetic Analysis Workshop 19.

Authors:  Inke R König; Jonathan Auerbach; Damian Gola; Elizabeth Held; Emily R Holzinger; Marc-André Legault; Rui Sun; Nathan Tintle; Hsin-Chou Yang
Journal:  BMC Genet       Date:  2016-02-03       Impact factor: 2.797

8.  Identification of genes and haplotypes that predict rheumatoid arthritis using random forests.

Authors:  Rui Tang; Jason P Sinnwell; Jia Li; David N Rider; Mariza de Andrade; Joanna M Biernacka
Journal:  BMC Proc       Date:  2009-12-15

9.  Parallel classification and feature selection in microarray data using SPRINT.

Authors:  Lawrence Mitchell; Terence M Sloan; Muriel Mewissen; Peter Ghazal; Thorsten Forster; Michal Piotrowski; Arthur Trew
Journal:  Concurr Comput       Date:  2014-03-25       Impact factor: 1.536

10.  Do little interactions get lost in dark random forests?

Authors:  Marvin N Wright; Andreas Ziegler; Inke R König
Journal:  BMC Bioinformatics       Date:  2016-03-31       Impact factor: 3.169
