Victor E Staartjes, Julius M Kernbach.
Abstract
We illustrate the steps required to train and validate a simple, machine learning-based clinical prediction model for any binary outcome, such as the occurrence of a complication, in the statistical programming language R. To illustrate the methods applied, we supply a simulated database of 10,000 glioblastoma patients who underwent microsurgery, and predict 12-month survival. We walk the reader through each step, including import, checking, and splitting of datasets. In terms of pre-processing, we focus on how to practically implement imputation using a k-nearest neighbor algorithm, and how to perform feature selection using recursive feature elimination. When it comes to training models, we apply the theory discussed in Parts I-III. We show how to implement bootstrapping and how to evaluate and select models based on out-of-sample error. Specifically for classification, we discuss how to counteract class imbalance by using upsampling techniques. We discuss why reporting at least accuracy, area under the curve (AUC), sensitivity, and specificity for discrimination, as well as slope and intercept for calibration, if possible alongside a calibration plot, is paramount. Finally, we explain how to arrive at a measure of variable importance using a universal, AUC-based method. We provide the full, structured code, as well as the complete glioblastoma survival database, for readers to download and execute in parallel to this section.
Keywords: Artificial intelligence; Clinical prediction model; Machine intelligence; Machine learning; Prediction; Prognosis
Year: 2022 PMID: 34862525 DOI: 10.1007/978-3-030-85292-4_5
Source DB: PubMed Journal: Acta Neurochir Suppl ISSN: 0065-1419
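
As a rough illustration of the minimum reporting set named in the abstract (accuracy, AUC, sensitivity, and specificity for discrimination; slope and intercept for calibration), the following base-R sketch computes these measures on a held-out test set. It is not the authors' published code: the simulated data, the variable names (age, kps, outcome), the 80/20 split, the 0.5 classification threshold, and the use of logistic regression as a stand-in model are all assumptions made for this example.

```r
# Minimal base-R sketch; the data and all variable names are hypothetical.
set.seed(42)

# Simulate a small cohort with a binary outcome (1 = event, 0 = no event)
n <- 1000
dat <- data.frame(age = rnorm(n, 60, 10), kps = rnorm(n, 80, 10))
lin_pred <- -0.05 * (dat$age - 60) + 0.04 * (dat$kps - 80)
dat$outcome <- rbinom(n, 1, plogis(lin_pred))

# Split into 80% training data and 20% held-out test data
train_idx <- sample(seq_len(n), size = 0.8 * n)
train_set <- dat[train_idx, ]
test_set  <- dat[-train_idx, ]

# Logistic regression as a stand-in for any probabilistic classifier
fit  <- glm(outcome ~ age + kps, data = train_set, family = binomial)
prob <- predict(fit, newdata = test_set, type = "response")
pred <- as.integer(prob >= 0.5)  # assumed threshold of 0.5
obs  <- test_set$outcome

# Discrimination on the test set
accuracy    <- mean(pred == obs)
sensitivity <- sum(pred == 1 & obs == 1) / sum(obs == 1)
specificity <- sum(pred == 0 & obs == 0) / sum(obs == 0)

# AUC via the rank-based (Mann-Whitney) formulation
r   <- rank(prob)
n1  <- sum(obs == 1)
n0  <- sum(obs == 0)
auc <- (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Calibration: regress the observed outcome on the logit of the predictions.
# Slope: coefficient of the logit; intercept: estimated with the logit as offset.
lp <- qlogis(prob)
cal_slope     <- unname(coef(glm(obs ~ lp, family = binomial))[2])
cal_intercept <- unname(coef(glm(obs ~ 1, offset = lp, family = binomial))[1])

metrics <- c(accuracy = accuracy, auc = auc, sensitivity = sensitivity,
             specificity = specificity, slope = cal_slope,
             intercept = cal_intercept)
print(round(metrics, 3))
```

A calibration slope near 1 and an intercept near 0 would indicate well-calibrated predictions. In practice, the same measures would be computed on the predictions of whichever model was selected during training.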