Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R König, James D Malley, Andreas Ziegler.

Abstract

Probability estimation for binary and multicategory outcomes using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs, we first review the classification problem and then dichotomous probability estimation. Next, we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVMs. In simulation studies for dichotomous and multicategory dependent variables, we demonstrate the general validity of the machine learning methods and compare them with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077).
© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
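
The abstract's point that k-NN probability estimation is a form of nonparametric regression can be illustrated with a minimal sketch: with a 0/1 outcome, the regression estimate at a point is the mean outcome among its k nearest neighbors, which serves as an estimate of P(Y = 1 | X = x). The function name `knn_probability`, the toy data, and the choice of Euclidean distance are assumptions made for illustration; this is not the code from the paper or its companion article.

```python
import math

def knn_probability(train_x, train_y, query, k):
    """Estimate P(Y = 1 | X = query) as the mean 0/1 outcome
    among the k training points closest to the query."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training point (assumed metric)
    dists = [math.dist(x, query) for x in train_x]
    # Indices of the k smallest distances; ties keep their original order
    nearest = sorted(range(len(train_x)), key=lambda i: dists[i])[:k]
    # Averaging 0/1 labels gives a regression estimate in [0, 1]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical toy data: two covariates, dichotomous outcome
train_x = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 0.8), (0.5, 0.5)]
train_y = [0, 0, 1, 1, 1]

p = knn_probability(train_x, train_y, (0.8, 0.9), k=3)
print(p)
```

Under the same assumptions, a multicategory version would report, for each category, the fraction of the k neighbors that fall in that category.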

Keywords:  Bagged nearest neighbor; Nonparametric regression; Probability estimation; Random forest; Support vector machine

Year: 2014  PMID: 24478134  DOI: 10.1002/bimj.201300068

Source DB: PubMed  Journal: Biom J  ISSN: 0323-3847  Impact factor: 2.207

