
A primer on predictive models.

Akbar K Waljee¹, Peter D R Higgins², Amit G Singal³.

Abstract

Prediction research is becoming increasingly popular; however, the differences between traditional explanatory research and prediction research are often poorly understood, resulting in wide variation in the methodologic quality of prediction research. This primer describes the basic methods for conducting prediction research in gastroenterology and highlights differences between traditional explanatory research and predictive research.


Year: 2014 PMID: 24384866 PMCID: PMC3912317 DOI: 10.1038/ctg.2013.19

Source DB: PubMed Journal: Clin Transl Gastroenterol ISSN: 2155-384X Impact factor: 4.488

INTRODUCTION

Prediction research, which aims to predict future events or outcomes based on patterns within a set of variables, has become increasingly popular in medical research.[1] Accurate predictive models can inform patients and physicians about the future course of an illness or the risk of developing an illness and thereby help guide decisions on screening and/or treatment. For example, predictive models have been developed in gastroenterology to predict the risk of disease flares for inflammatory bowel disease and risk of hepatocellular carcinoma among patients with cirrhosis.[2, 3] There are several important differences between traditional explanatory research and prediction research. Explanatory research typically applies statistical methods to test causal hypotheses using a priori theoretical constructs (e.g., hepatocellular carcinoma surveillance underutilization is related to provider-level factors[4]). In contrast, predictive research applies statistical methods and/or data mining techniques, without preconceived theoretical constructs, to predict future outcomes (e.g., predicting the risk of hospital readmission[5]).[6] Although predictive models may be used to provide insight into causality of pathophysiology of the outcome, causality is neither a primary aim nor a requirement for variable inclusion.[6] Noncausal predictive factors may be surrogates for other drivers of disease, with tumor markers as predictors of cancer progression or recurrence being the most common example. Unfortunately, a poor understanding of the differences in methodology between explanatory and predictive research has led to a wide variation in the methodologic quality of prediction research.[7] The aim of this primer is to describe basic methods for conducting prediction research, which can be divided into three main steps: developing a predictive model, independently validating its performance, and prospectively studying its clinical impact.

TYPES OF PREDICTIVE MODELS

Although prediction research in medicine has traditionally used a Bayesian framework approach, with statistical techniques such as regression models, data mining techniques such as machine learning algorithms, a form of artificial intelligence, are being used with increasing frequency.[8] Machine learning has previously been used to predict behavior or outcomes in business, such as identifying consumer preferences for products based on prior purchasing history. A number of techniques for developing predictive algorithms exist, implemented in a variety of predictive analytic tools and software, and have been described in extensive detail elsewhere.[8, 9] Examples include neural networks, support vector machines, and decision trees. Decision tree methods, for example, use techniques such as classification and regression trees, boosting, and random forests to predict various outcomes. The analysis can be conducted using free software environments such as “R”[10] as well as vendor applications. Machine learning algorithms, such as random-forest approaches,[11, 12] have several advantages over traditional explanatory statistical modeling. Because they do not require a predefined hypothesis, they are less likely to overlook unexpected predictor variables or potential interactions. Approaching a predictive problem without a specific causal hypothesis can be quite effective when many potential predictors are available (increasingly common with electronic health records) and when there are interactions between predictors, which are common in biological and social causative processes. Predictive models using machine learning algorithms may therefore facilitate recognition of clinically important risk and predictor variables, in patients with several marginal risk factors, that might otherwise not be identified. In fact, many examples of discovery of unexpected predictor variables exist in the machine learning literature.[2, 3]

DEVELOPING A PREDICTIVE MODEL

The first step in developing a predictive model, when using traditional regression analysis, is selecting relevant candidate predictor variables for possible inclusion in the model; however, there is no consensus for the best strategy to do so.[13] A backward-elimination approach starts with all candidate variables, and hypothesis tests are sequentially applied to determine which variables should be removed from the final model, whereas a full-model approach includes all candidate variables to avoid potential overfitting and selection bias. Previously reported significant predictor variables should typically be included in the final model regardless of their statistical significance but the number of variables included is usually limited by the sample size of the data set.[14] Inappropriate selection of variables is an important and common cause of poor model performance in this situation. As described above, variable selection is less of an issue using machine learning techniques given that they are often not solely based on predefined hypotheses. There are several other important issues related to data management when developing a predictive model, such as dealing with missing data and variable transformation; however, these topics are beyond the scope of this primer and addressed elsewhere.[15, 16, 17]

VALIDATING A PREDICTIVE MODEL

For a prediction model to be valuable, it must not only have predictive ability in the derivation cohort but must also perform well in a validation cohort.[7, 18] A model's performance may differ substantially between derivation and validation cohorts for several reasons, including overfitting of the model, missing important predictor variables, interobserver variability of predictors leading to measurement errors, and differences in the patient cohort case mix.[18] Therefore, model performance in the derivation cohort may be overly optimistic and is not a guarantee that the model will perform equally well in new patients. For example, the HALT-C predictive model for hepatocellular carcinoma was recently shown to perform significantly worse in an external validation cohort.[3] Unfortunately, the majority of published prediction research focuses solely on model derivation, and validation studies are scarce.[1, 18] Validation can be internal or external. A common approach to internal validation is to split the data set into two portions: a “training set” and a “validation set”. If splitting the data set is not possible given the limited available data, measures such as cross-validation or bootstrapping can be used for internal validation.[19] Machine learning algorithms, more specifically the random-forest approach, use an alternative based on “in-bag” and “out-of-bag” sampling.[11] In a random-forest approach, the initial cohort is divided into two groups: “in-bag” and “out-of-bag” samples. The in-bag sample is created using random sampling with replacement from the initial cohort, creating a sample equivalent in size to the initial cohort. The out-of-bag sample is composed of the unsampled data from the initial cohort, and typically includes about one-third of the initial cohort. The “out-of-bag” cohort can serve as an internal validation cohort for the model derived using the “in-bag” sample.
However, internal validation nearly always yields optimistic results given that the derivation and validation data sets are very similar (as they are from the same cohort). Although external validation is more difficult as it requires data collected from similar patients in a different setting or a different center, it is always preferred to internal validation.[1, 18] When a validation study shows disappointing results, researchers are often tempted to reject the initial model and to develop a new predictive model using the validation cohort data. For example, there are over 60 published predictive models for breast cancer. This approach neglects the information captured from prior studies and predictive models. There are several methods to update prior predictive models with data from the patients of the validation cohort, but these are unfortunately rarely utilized.[1]
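The in-bag and out-of-bag sampling described above can be illustrated with a minimal Python sketch. It is not taken from the primer or from any random-forest package: the cohort size, the fixed seed, and the function name `in_bag_out_of_bag` are assumptions chosen for illustration. The sketch draws one sample with replacement and checks that the unsampled share is close to (1 - 1/n)^n, which approaches 1/e (about 0.37), consistent with the "about one-third" figure.

```python
import random


def in_bag_out_of_bag(n_patients, rng):
    """Draw one in-bag sample (with replacement) and return it with the out-of-bag indices."""
    # The in-bag sample has the same size as the initial cohort.
    in_bag = [rng.randrange(n_patients) for _ in range(n_patients)]
    # Patients who were never drawn form the out-of-bag sample.
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_patients) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed (assumption) so the sketch is reproducible
n = 10_000              # hypothetical cohort size
in_bag, out_of_bag = in_bag_out_of_bag(n, rng)

oob_fraction = len(out_of_bag) / n
expected_fraction = (1 - 1 / n) ** n  # chance that a given patient is never drawn
print(f"out-of-bag fraction: {oob_fraction:.3f}")
print(f"expected fraction:   {expected_fraction:.3f}")
```

In a random forest this draw is repeated for every tree, so each patient is out-of-bag for roughly a third of the trees and can be predicted using only those trees.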

ASSESSING THE PERFORMANCE OF A PREDICTIVE MODEL

When assessing model performance, it is important to remember that explanatory models are judged based on strength of associations, whereas predictive models are judged solely based on their ability to make accurate predictions. The performance of a predictive model is assessed using several complementary tests, which assess overall performance, calibration, discrimination, and reclassification (Table 1).[20] Performance characteristics should be determined and reported for both the derivation and validation data sets.

Table 1

Performance characteristics for a predictive model (measures of predictive error)

Aspect	Measure	Outcome measure	Description
Overall performance	R²	Continuous	Proportion of variation in the outcome explained by the model
	Adjusted R²	Continuous	Same as R², but penalizes for the number of predictors
	Brier score	Categorical	Average squared difference between the predicted probabilities and the observed outcomes
Discrimination	ROC curve (c-statistic)	Continuous or categorical	Overall measure of how effectively the model differentiates between events and non-events
	C-index	Cox-model
Calibration	Hosmer–Lemeshow test	Categorical	Agreement between predicted and observed risks
Reclassification	Reclassification table	Categoricalᵃ	Number of individuals that move from one category to another by improving the prediction model
	NRI		A quantitative assessment of the improvement in classification by improving the prediction model
	IDI		Similar to NRI but using all possible cutoffs to categorize events and non-events

IDI, integrated discrimination improvement; NRI, net reclassification improvement.

ᵃ Can be performed for continuous data as well if a risk cutoff is assigned.
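Several of the measures in Table 1 can be computed directly from predicted risks and observed outcomes. The following Python sketch is illustrative only: the eight predicted risks and outcomes are invented, and the function names are assumptions rather than part of any package. It computes the Brier score, the c-statistic as the proportion of concordant event/non-event pairs (ties counted as half), and a grouped comparison of predicted and observed risk, which is the kind of comparison that underlies the Hosmer–Lemeshow test (the test statistic and P value themselves are not computed here).

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)


def c_statistic(predicted, observed):
    """Share of event/non-event pairs in which the event has the higher predicted risk."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5  # tied predictions count as half
    return concordant / (len(events) * len(non_events))


def calibration_by_group(predicted, observed, n_groups):
    """Mean predicted risk and observed event rate within groups ordered by predicted risk."""
    pairs = sorted(zip(predicted, observed))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:end]
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_predicted, observed_rate))
    return rows


# Hypothetical predicted risks and observed outcomes (1 = event) for eight patients.
risk = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
outcome = [1, 1, 0, 1, 0, 1, 0, 0]

brier = brier_score(risk, outcome)
c_stat = c_statistic(risk, outcome)
groups = calibration_by_group(risk, outcome, n_groups=2)

print(f"Brier score: {brier:.3f}")
print(f"c-statistic: {c_stat:.4f}")
for mean_predicted, observed_rate in groups:
    print(f"mean predicted {mean_predicted:.2f} vs observed {observed_rate:.2f}")
```

A lower Brier score indicates better overall performance, whereas a c-statistic of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.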

The overall model performance can be measured using R², which characterizes the degree of variation in risk explained by the model.[21] The adjusted R² has been proposed as a better measure, as it accounts for the number of predictors and helps to prevent overfitting. The Brier score is a similar measure of performance that is used when the outcome of interest is categorical instead of continuous.[22] Calibration is the agreement between observed and predicted event rates for groups of patients and is assessed using the Hosmer–Lemeshow test.[23] Discrimination is the ability of a model to distinguish between patients who do and do not experience the outcome of interest, and it is most commonly assessed using receiver operating characteristic (ROC) curves.[24] However, ROC analysis alone is relatively insensitive for assessing differences between good predictive models;[25] therefore, several relatively novel performance measures have been proposed. The net reclassification improvement and integrated discrimination improvement are measures used to assess changes in predicted outcome classification between two models.[20, 26] Although it is common for prediction research studies to report results from ROC analysis, the other measures, such as overall model performance, calibration, and reclassification, are seldom reported.[7, 20]
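The two reclassification measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any cited study: the six patients, their predicted risks under an "old" and a "new" model, and the 10% and 20% risk cutoffs are all hypothetical. The net reclassification improvement is computed here as the net proportion of events moving to a higher risk category plus the net proportion of non-events moving to a lower one; the integrated discrimination improvement is the change in mean predicted risk among events minus the change among non-events.

```python
def risk_category(p, cutoffs):
    """Index of the risk category for probability p, given ascending cutoffs."""
    return sum(p >= c for c in cutoffs)


def net_reclassification_improvement(old, new, observed, cutoffs):
    """Net upward movement among events plus net downward movement among non-events."""
    up_events = down_events = up_non_events = down_non_events = 0
    for p_old, p_new, y in zip(old, new, observed):
        move = risk_category(p_new, cutoffs) - risk_category(p_old, cutoffs)
        if y == 1:
            up_events += move > 0
            down_events += move < 0
        else:
            up_non_events += move > 0
            down_non_events += move < 0
    n_events = sum(observed)
    n_non_events = len(observed) - n_events
    return ((up_events - down_events) / n_events
            + (down_non_events - up_non_events) / n_non_events)


def integrated_discrimination_improvement(old, new, observed):
    """Change in mean predicted risk among events minus the change among non-events."""
    def mean(values):
        return sum(values) / len(values)

    events_old = mean([p for p, y in zip(old, observed) if y == 1])
    events_new = mean([p for p, y in zip(new, observed) if y == 1])
    non_events_old = mean([p for p, y in zip(old, observed) if y == 0])
    non_events_new = mean([p for p, y in zip(new, observed) if y == 0])
    return (events_new - events_old) - (non_events_new - non_events_old)


# Hypothetical predicted risks from two models for six patients (1 = event).
old_risk = [0.15, 0.25, 0.05, 0.30, 0.12, 0.08]
new_risk = [0.25, 0.35, 0.04, 0.15, 0.08, 0.22]
outcome = [1, 1, 0, 0, 0, 1]
cutoffs = [0.10, 0.20]  # assumed low / intermediate / high risk boundaries

nri = net_reclassification_improvement(old_risk, new_risk, outcome, cutoffs)
idi = integrated_discrimination_improvement(old_risk, new_risk, outcome)
print(f"NRI: {nri:.3f}")
print(f"IDI: {idi:.3f}")
```

Positive values indicate that the new model moves predicted risk in the correct direction more often than the old model; the category-based measure depends on the chosen cutoffs, whereas the integrated measure does not.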

STUDYING THE CLINICAL IMPACT OF A PREDICTIVE MODEL

The performance of a predictive model may suffer when applied in clinical practice compared with testing in derivation or validation data sets owing to differences in the patient population and case mix.[27] The distributions of predictive factors and outcomes are often different when a model is applied broadly to general populations, rather than to the carefully selected populations in which it was derived and validated. Furthermore, high model performance does not necessarily guarantee provider acceptance and uptake in clinical practice.[1] For example, providers may not use a predictive model because they feel that the application of the model is not sufficiently user-friendly or that the model itself does not have sufficient face validity. Predictive models are developed with the goal of providing estimates of outcome probabilities to complement provider clinical intuition. They should ideally recommend decisions instead of simply providing risk estimates for an outcome. Predictive models that estimate risk without recommending particular decisions are less likely to change provider behavior and outcomes than those that translate risk into a decision recommendation.[27] With the growing implementation of electronic health records, predictive models can serve as the basis for electronic decision support tools with real-time risk assessments. Implementation of the predictive algorithm could be used to identify high-risk individual cases and transmit annotated data back to the provider, facilitating changes to their clinical assessment. If properly validated in several different populations, predictive algorithms could also form the basis for publicly available online risk calculators. Electronic predictive models are particularly attractive, as they can optimize user-friendliness and may be introduced quickly and cheaply after implementation of an electronic health record system.
Impact studies examine the effect of predictive models on provider behavior and patient outcomes.[28] This is often done using a design that compares outcomes between providers given output from the predictive model and a control group without the predictive model. Although this is best done using a site-randomized controlled trial, it may also be assessed using a pre-post study design. As a potential intermediate step, decision modeling techniques or Markov modeling can be used to estimate the potential consequences and benefits of using a predictive model. If this analysis does not reveal improved patient outcomes, this would obviate the need for formal impact studies.

EXAMPLE OF PREDICTIVE MODELING

An example of the analytic tools used in predictive modeling can be found in a recent publication examining the performance characteristics of predictive models for development of hepatocellular carcinoma among patients with cirrhosis.[3] In this study, the performance of a traditional regression model was compared with that of machine learning algorithms. The study highlights two important concepts. First, external validation is crucial. Internal validation overestimated the performance of the models, and each had substantially worse performance when externally validated. Second, it is important to use a wide range of complementary methods to assess predictive model performance, not just ROC curve analysis. The machine learning algorithm and traditional regression models had similar c-statistics using ROC curve analysis, but the machine learning algorithm, using random forest, outperformed the traditional regression model when assessed using net reclassification improvement, integrated discrimination improvement, and misclassification tables.

CONCLUSIONS

Although predictive models cannot replace clinical judgment, they can provide objective estimates about the future course of an illness and serve as important adjuncts in clinical practice. For example, predictive models have been used to risk stratify patients with regard to readmission risk, allowing for early interventions to reduce readmissions. Although low-risk patients could be considered for early discharge, high-risk patients might be triaged to specialized hospital services, intensive outpatient case management, and earlier clinic visits post discharge. Such applications may be particularly important to maximize cost-effectiveness under the Accountable Care Organization model.[29] However, predictive models must be properly developed and also validated in a separate cohort using modern assessment of their performance. Finally, the clinical impact of these predictive models must be prospectively assessed once implemented in clinical practice.

TAKE HOME POINTS

Prediction research may serve as an important adjunct to clinical practice. Prediction research involves developing a predictive model, independently validating its performance, and prospectively studying its clinical impact.

26 in total

1. Missing data in clinical studies: issues and methods.

Authors: Joseph G Ibrahim; Haitao Chu; Ming-Hui Chen
Journal: J Clin Oncol Date: 2012-05-29 Impact factor: 44.544

2. Translating clinical research into clinical practice: impact of using prediction rules to make decisions.

Authors: Brendan M Reilly; Arthur T Evans
Journal: Ann Intern Med Date: 2006-02-07 Impact factor: 25.391

Review 3. Validation, updating and impact of clinical prediction rules: a review.

Authors: D B Toll; K J M Janssen; Y Vergouwe; K G M Moons
Journal: J Clin Epidemiol Date: 2008-11 Impact factor: 6.437

Review 4. A comparison of goodness-of-fit tests for the logistic regression model.

Authors: D W Hosmer; T Hosmer; S Le Cessie; S Lemeshow
Journal: Stat Med Date: 1997-05-15 Impact factor: 2.373

5. Use and misuse of the receiver operating characteristic curve in risk prediction.

Authors: Nancy R Cook
Journal: Circulation Date: 2007-02-20 Impact factor: 29.690

6. Assessing the performance of prediction models: a framework for traditional and novel measures.

Authors: Ewout W Steyerberg; Andrew J Vickers; Nancy R Cook; Thomas Gerds; Mithat Gonen; Nancy Obuchowski; Michael J Pencina; Michael W Kattan
Journal: Epidemiology Date: 2010-01 Impact factor: 4.822

7. Failure rates in the hepatocellular carcinoma surveillance process.

Authors: Amit G Singal; Adam C Yopp; Samir Gupta; Celette Sugg Skinner; Ethan A Halm; Eucharia Okolo; Mahendra Nehra; William M Lee; Jorge A Marrero; Jasmin A Tiro
Journal: Cancer Prev Res (Phila) Date: 2012-07-30

8. Algorithms outperform metabolite tests in predicting response of patients with inflammatory bowel disease to thiopurines.

Authors: Akbar K Waljee; Joel C Joyce; Sijian Wang; Aditi Saxena; Margaret Hart; Ji Zhu; Peter D R Higgins
Journal: Clin Gastroenterol Hepatol Date: 2009-10-14 Impact factor: 11.382

9. An automated model using electronic medical record data identifies patients with cirrhosis at high risk for readmission.

Authors: Amit G Singal; Robert S Rahimi; Christopher Clark; Ying Ma; Jennifer A Cuthbert; Don C Rockey; Ruben Amarasingham
Journal: Clin Gastroenterol Hepatol Date: 2013-04-13 Impact factor: 11.382

10. Machine learning algorithms outperform conventional regression models in predicting development of hepatocellular carcinoma.

Authors: Amit G Singal; Ashin Mukherjee; B Joseph Elmunzer; Peter D R Higgins; Anna S Lok; Ji Zhu; Jorge A Marrero; Akbar K Waljee
Journal: Am J Gastroenterol Date: 2013-10-29 Impact factor: 10.864
