Probability estimation with machine learning methods for dichotomous and multicategory outcome: applications.

Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R König, Andreas Ziegler.

Abstract

Machine learning methods are applied to three different large datasets, all dealing with probability estimation problems for dichotomous or multicategory data. Specifically, we investigate k-nearest neighbors, bagged nearest neighbors, random forests for probability estimation trees, and support vector machines with Bessel, linear, Laplacian, and radial basis kernels. Comparisons are made with logistic regression. The dataset from the German Stroke Study Collaboration, with dichotomous and three-category outcome variables, allows in particular for temporal and external validation. The other two datasets are freely available from the UCI Machine Learning Repository and provide dichotomous outcome variables. One of them, the Cleveland Clinic Foundation Heart Disease dataset, uses data from one clinic for training and from three clinics for external validation, while the other, the thyroid disease dataset, allows for temporal validation by separating data into training and test data by date of recruitment into the study. For dichotomous outcome variables, we use receiver operating characteristic (ROC) curves, area under the curve (AUC) values with bootstrapped 95% confidence intervals, and Hosmer-Lemeshow-type figures as comparison criteria. For dichotomous and multicategory outcomes, we calculate bootstrap Brier scores with 95% confidence intervals and also compare them through bootstrapping. In a supplement, we provide R code for performing the analyses and for random forest analyses in Random Jungle, version 2.1.0. The learning machines show promising performance over all constructed models. They are simple to apply and serve as an alternative approach to logistic or multinomial logistic regression analysis.
© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
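
The Brier score comparison mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' supplementary R code. It assumes the multicategory form of the Brier score (squared differences summed over all classes, which for two classes is twice the single-probability form) and a simple percentile bootstrap. The function names `brier_score` and `bootstrap_ci` and the toy data are hypothetical.

```python
import random


def brier_score(probs, labels):
    """Mean over observations of the squared distance between the
    predicted class-probability vector and the one-hot observed outcome.

    probs  : list of per-class probability lists, one per observation
    labels : list of observed class indices (0-based)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((p_k - (1.0 if k == y else 0.0)) ** 2
                     for k, p_k in enumerate(p))
    return total / len(labels)


def bootstrap_ci(probs, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the Brier score (assumed scheme:
    resample observations with replacement, recompute the score)."""
    rng = random.Random(seed)
    n = len(labels)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(brier_score([probs[i] for i in idx],
                                  [labels[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical three-category predictions for four observations.
toy_probs = [[0.8, 0.1, 0.1],
             [0.2, 0.6, 0.2],
             [0.3, 0.3, 0.4],
             [0.1, 0.2, 0.7]]
toy_labels = [0, 1, 2, 0]

score = brier_score(toy_probs, toy_labels)
ci_lower, ci_upper = bootstrap_ci(toy_probs, toy_labels)
print(f"Brier score: {score:.3f}, 95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")
```

Lower scores indicate better probability estimates. Under this definition a perfect prediction scores 0 and the worst possible score is 2, regardless of the number of classes.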

Keywords:  Brier score; German Stroke Study Collaboration; Probability estimation; Random Jungle; Random forest; Support vector machine

Year:  2014        PMID: 24989843     DOI: 10.1002/bimj.201300077

Source DB:  PubMed          Journal:  Biom J        ISSN: 0323-3847            Impact factor:   2.207


  9 in total

1.  Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data.

Authors:  Máté E Maros; David Capper; David T W Jones; Volker Hovestadt; Andreas von Deimling; Stefan M Pfister; Axel Benner; Manuela Zucknick; Martin Sill
Journal:  Nat Protoc       Date:  2020-01-13       Impact factor: 13.491

2.  A Signature Enrichment Design with Bayesian Adaptive Randomization.

Authors:  Fang Xia; Stephen L George; Jing Ning; Liang Li; Xuelin Huang
Journal:  J Appl Stat       Date:  2020-04-27       Impact factor: 1.404

3.  Machine learning and data mining in complex genomic data--a review on the lessons learned in Genetic Analysis Workshop 19.

Authors:  Inke R König; Jonathan Auerbach; Damian Gola; Elizabeth Held; Emily R Holzinger; Marc-André Legault; Rui Sun; Nathan Tintle; Hsin-Chou Yang
Journal:  BMC Genet       Date:  2016-02-03       Impact factor: 2.797

4.  Efficiency of different measures for defining the applicability domain of classification models.

Authors:  Waldemar Klingspohn; Miriam Mathea; Antonius Ter Laak; Nikolaus Heinrich; Knut Baumann
Journal:  J Cheminform       Date:  2017-08-03       Impact factor: 5.514

5.  Simulation of complex data structures for planning of studies with focus on biomarker comparison.

Authors:  Andreas Schulz; Daniela Zöller; Stefan Nickels; Manfred E Beutel; Maria Blettner; Philipp S Wild; Harald Binder
Journal:  BMC Med Res Methodol       Date:  2017-06-13       Impact factor: 4.615

6.  Comparison of Machine Learning Techniques for Prediction of Hospitalization in Heart Failure Patients.

Authors:  Giulia Lorenzoni; Stefano Santo Sabato; Corrado Lanera; Daniele Bottigliengo; Clara Minto; Honoria Ocagli; Paola De Paolis; Dario Gregori; Sabino Iliceto; Franco Pisanò
Journal:  J Clin Med       Date:  2019-08-24       Impact factor: 4.241

7.  Predicting Physician Consultations for Low Back Pain Using Claims Data and Population-Based Cohort Data-An Interpretable Machine Learning Approach.

Authors:  Adrian Richter; Julia Truthmann; Jean-François Chenot; Carsten Oliver Schmidt
Journal:  Int J Environ Res Public Health       Date:  2021-11-16       Impact factor: 3.390

8.  Predicting short and long-term mortality after acute ischemic stroke using EHR.

Authors:  Vida Abedi; Venkatesh Avula; Seyed-Mostafa Razavi; Shreya Bavishi; Durgesh Chaudhary; Shima Shahjouei; Ming Wang; Christoph J Griessenauer; Jiang Li; Ramin Zand
Journal:  J Neurol Sci       Date:  2021-06-29       Impact factor: 4.553

9.  A systematic review of machine learning models for predicting outcomes of stroke with structured data.

Authors:  Wenjuan Wang; Martin Kiik; Niels Peek; Vasa Curcin; Iain J Marshall; Anthony G Rudd; Yanzhong Wang; Abdel Douiri; Charles D Wolfe; Benjamin Bray
Journal:  PLoS One       Date:  2020-06-12       Impact factor: 3.240
