Predictive analytics in health care: how can we know it works?

Ben Van Calster^1,2, Laure Wynants^1, Dirk Timmerman^1,3, Ewout W Steyerberg^2, Gary S Collins^4,5.

Abstract

There is increasing awareness that the methodology and findings of research should be transparent. This includes studies using artificial intelligence to develop predictive algorithms that make individualized diagnostic or prognostic risk predictions. We argue that it is paramount to make the algorithm behind any prediction publicly available. This allows independent external validation, assessment of performance heterogeneity across settings and over time, and algorithm refinement or updating. Online calculators and apps may aid uptake if accompanied by sufficient information. For algorithms based on "black box" machine learning methods, software for algorithm implementation is a must. Hiding algorithms for commercial exploitation is unethical, because there is no possibility to assess whether algorithms work as advertised or to monitor when and how algorithms are updated. Journals and funders should demand maximal transparency for publications on predictive algorithms, and clinical guidelines should only recommend publicly available algorithms.

Keywords: artificial intelligence; external validation; machine learning; model performance; predictive analytics

Year: 2019 PMID: 31373357 PMCID: PMC6857503 DOI: 10.1093/jamia/ocz130

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497

The current interest in predictive analytics for improving health care is reflected by a surge in long-term investment in developing new technologies using artificial intelligence and machine learning to forecast future events (possibly in real time) to improve the health of individuals. Predictive algorithms or clinical prediction models, as they have historically been called, help identify individuals at increased likelihood of disease for diagnosis and prognosis (see Supplementary Material Table S1 for a glossary of terms used in this manuscript). In an era of personalized medicine, predictive algorithms are used to make clinical management decisions based on individual patient characteristics (rather than on population averages) and to counsel patients. The rate at which new algorithms are published shows no sign of abating, particularly with the increasing availability of Big Data, medical imaging, routinely collected electronic health records, and national registry data. The scientific community is making efforts to improve data sharing, increase study registration beyond clinical trials, and make reporting transparent and comprehensive with full disclosure of study results. We discuss the importance of transparency in the context of medical predictive analytics.

ALGORITHM PERFORMANCE IS NOT GUARANTEED: FULLY INDEPENDENT EXTERNAL VALIDATION IS KEY

Before recommending a predictive algorithm for clinical practice, it is important to know whether and for whom it works well. First, predictions should discriminate between individuals with and without the disease (ie, higher predictions in those with the disease compared to those without the disease). Risk predictions should also be accurate (often referred to as calibrated). Algorithm development may suffer from overfitting, which usually results in poorer discrimination and calibration when evaluated on new data. Although the clinical literature tends to focus on discrimination, calibration is clearly crucial. Inaccurate risk predictions can lead to inappropriate decisions or expectations, even when discrimination is good. Calibration has therefore been labeled the Achilles heel of prediction. In addition, there is often substantial heterogeneity between populations, as well as changes in populations over time. For example, there may be differences between patients in academic hospitals and patients at regional hospitals, between ethnicities, or between past and contemporary patients due to advances in patient care. Recent work indicated that the half-life of clinical data relevance can be remarkably short. Hence, algorithms are likely to perform differently across centers, settings, and time. On top of overfitting and heterogeneity between populations, operational heterogeneity can affect algorithm performance. Different hospitals may, for example, use different electronic health record (EHR) software, imaging machines, or marker kits. As a result, the clinical utility of predictive algorithms for decision-making may vary greatly. It is well established that “internal validation” of performance using, for example, a train–test split of available data is insufficient. Rather, algorithms should undergo “external validation” on a different data set. Notably, algorithms developed using traditional study designs may not validate well when applied on electronic health record data. It is important to stress 3 issues. First, external validation should be extensive: it should take place at various sites in contemporary cohorts of patients from the targeted population. Second, performance should be monitored over time. Third, external validation by independent investigators is imperative. It is a positive development to include an external validation as part of the algorithm development study, but one can imagine that algorithms with poor performance on a different data set may be less likely to get published in the first place. If performance in a specific setting is poor, an algorithm can be updated, specifically its calibration. To counter temporal changes in populations, continual updating strategies may help. For example, QRISK2 models (www.qrisk.org) are updated regularly as new data are continually being collected.
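The distinction between discrimination and calibration, and the idea of updating calibration in a new setting, can be illustrated with a minimal sketch. The validation sample below is invented for illustration, and the function names are our own. The c-statistic is computed by comparing all event/non-event pairs, calibration-in-the-large is summarized as the observed/expected ratio, and the update shown is a simple intercept shift on the logit scale (one of several possible updating strategies, not necessarily the one used by any particular algorithm).

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def c_statistic(risks, outcomes):
    """Share of (event, non-event) pairs in which the event has the higher risk."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5  # ties count as half a concordant pair
    return score / (len(events) * len(non_events))

def observed_expected_ratio(risks, outcomes):
    """Observed number of events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(risks)

def calibration_intercept(risks, outcomes, tol=1e-10):
    """Logit-scale shift that makes expected events equal observed events.

    Found by bisection; assumes 0 < risk < 1 and at least one event and
    one non-event in the sample.
    """
    target = sum(outcomes)
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(expit(logit(p) + mid) for p in risks)
        if expected < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def shift_risks(risks, delta):
    """Apply the intercept update to every predicted risk."""
    return [expit(logit(p) + delta) for p in risks]

# Hypothetical validation sample: well ranked, but risks are too high overall.
risks = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]

auc = c_statistic(risks, outcomes)
oe = observed_expected_ratio(risks, outcomes)
delta = calibration_intercept(risks, outcomes)
updated = shift_risks(risks, delta)

print(f"c-statistic: {auc:.3f}")
print(f"observed/expected: {oe:.3f}")
print(f"intercept update: {delta:.3f}")
print(f"expected events after update: {sum(updated):.3f}")
```

In this toy sample the ranking of patients is nearly perfect while the predicted risks overestimate the event rate, which is the situation described above: good discrimination does not protect against miscalibration. The intercept shift leaves the ranking, and therefore the c-statistic, unchanged.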

POTENTIAL HURDLES FOR MAKING PREDICTIVE ALGORITHMS PUBLICLY AVAILABLE

To allow others to independently evaluate the predictive accuracy, it is important to describe in full detail how the algorithm was developed. Algorithms should be available in a format that can readily be implemented by others. Not adhering to these principles severely limits the usefulness of the findings—surely a research waste. An analogous situation would be an article describing the findings from a randomized clinical trial without actually reporting the intervention effect or how to implement the intervention.

Transparent and full reporting

The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, a reporting guideline for studies on predictive algorithms, recommends that the equation behind an algorithm is presented in the publication describing its development. More explicitly, the mathematical formula of an algorithm should be available in full. This includes details such as which predictors are included, how they are coded (including ranges of any continuous predictors, units of measurement), and the values of the regression coefficients. Publications presenting new algorithms often fail to include key information such as specification of the baseline risk (namely, the intercept in logistic regression models for binary outcomes; the baseline hazard at 1 or more clinically relevant time points for time-to-event regression models). Without this information, making predictions is not possible. Below, we expand on modern artificial intelligence methods that do not produce straightforward mathematical equations.
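To make concrete why the intercept matters, consider a minimal sketch of how a logistic regression algorithm turns reported quantities into a risk. All predictor names, coefficients, and intercepts below are hypothetical and chosen only for illustration; they do not come from any published model.

```python
import math

def predicted_risk(intercept, coefficients, predictors):
    """Risk from a logistic model: 1 / (1 + exp(-(intercept + sum(beta * x))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients; the coding and units are part of the specification.
coefficients = {
    "age_years": 0.04,    # per year of age
    "log_marker": 0.8,    # per unit of the log-transformed marker
}
patient = {"age_years": 60, "log_marker": 3.0}

# The same coefficients with two different (hypothetical) intercepts.
risk_low_baseline = predicted_risk(-6.0, coefficients, patient)
risk_high_baseline = predicted_risk(-4.0, coefficients, patient)

print(f"risk with intercept -6.0: {risk_low_baseline:.3f}")
print(f"risk with intercept -4.0: {risk_high_baseline:.3f}")
```

The coefficients alone fix how patients are ranked, but not their absolute risk: the same patient receives very different predictions under the two intercepts. A publication that omits the intercept (or the baseline hazard for time-to-event models) therefore leaves the algorithm unusable for risk prediction.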

Online calculators and mobile apps

It has become customary to implement algorithms as online calculators or mobile apps. We then depend on the researchers’ openness to provide clear and honest information about algorithm development and results of validation studies, with references to relevant publications. For example, FRAX predicts the 10-year probability of hip fracture and major osteoporotic fracture (www.sheffield.ac.uk/FRAX/). FRAX is a collection of algorithms (eg, 68 country-specific equations), which are available both freely via a website interface and commercially via a desktop application. However, none of these algorithms has been published in full. The release notes indicate that the algorithms are continually revised, but do not offer detailed information. This lack of full disclosure prohibits independent evaluation. In theory, we can try “reverse engineering” by reconstructing the equation based on risk estimates for a sample of patients (see Supplementary Material). However, such reverse engineering is not a realistic solution. The solution is to avoid hidden algorithms. Online or mobile calculators allow the inclusion of algorithms into daily clinical routine, which is a positive evolution. However, such calculators are impractical for large-scale independent validation studies, because information for every single patient has to be entered manually.

Machine learning algorithms

Machine learning methods, such as random forests or deep learning, are becoming increasingly popular to develop predictive algorithms. The architecture of these algorithms is often too complex to fully disentangle and report the relation between a set of predictors and the outcome (“black box”). This is the problem most commonly raised when discussing transparency of predictive analytics based on machine learning. We argue that algorithm availability is at least as important. A similar problem can affect regression-based algorithms that use complex spline functions to model continuous predictors. Software implementations are therefore imperative for validation purposes, in particular because these algorithms have a higher risk of overfitting and unstable performance. Machine learning algorithms can be stored in computer files that may be transferred to other computers to allow validation studies. Initiatives in this direction have recently been set up.
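The idea of storing an algorithm in a file that others can load and apply to their own data can be sketched as follows. This is a toy illustration only: the small decision tree, its predictor names, thresholds, and leaf risks are hypothetical, and JSON is used here simply because it is readable and needs no extra software. Real machine learning models are typically shared in framework-specific file formats.

```python
import json
import os
import tempfile

# Hypothetical fitted tree: internal nodes split on a predictor, leaves hold a risk.
tree = {
    "predictor": "age_years",
    "threshold": 50,
    "left": {"risk": 0.05},
    "right": {
        "predictor": "marker",
        "threshold": 35.0,
        "left": {"risk": 0.20},
        "right": {"risk": 0.60},
    },
}

def predict(node, record):
    """Walk the tree until a leaf is reached and return its risk."""
    while "risk" not in node:
        go_left = record[node["predictor"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["risk"]

# Developer side: write the algorithm to a file that can be shared.
path = os.path.join(tempfile.mkdtemp(), "algorithm.json")
with open(path, "w") as handle:
    json.dump(tree, handle)

# Validator side: load the file and batch-process a local cohort.
with open(path) as handle:
    loaded = json.load(handle)

cohort = [
    {"age_years": 42, "marker": 80.0},
    {"age_years": 63, "marker": 20.0},
    {"age_years": 71, "marker": 90.0},
]
cohort_risks = [predict(loaded, record) for record in cohort]
print(cohort_risks)
```

Because the validator applies the loaded file to all records in one pass, no patient data need to be entered by hand, which is what makes large-scale external validation feasible.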

Proprietary algorithms

Developers may choose not to disclose an algorithm, and to offer the algorithm on a fee-for-service basis. For example, a biomarker-based algorithm to diagnose ovarian cancer has a cost of $897 per patient (http://vermillion.com/2436-2/). Assume we want to validate this algorithm in a center that has 20% malignancies in the target population. If we want to recruit at least 100 patients in each outcome group, following current recommendations for validation studies, the study needs at least 500 patients. This implies a minimum cost of $448 500 in order to obtain useful information about whether this algorithm works in this particular center. It is important to emphasize this is just the cost required to judge whether the algorithm has any validity in this setting; there is no guarantee that it will be clinically useful. Many predictive algorithms have been developed using financial support from public institutions. In that case, we believe that the results belong to the community and should be fully and publicly available. Provided the algorithm itself is publicly available, asking a small installation fee for an attractive and user-friendly calculator is defensible to cover software development and generate resources for maintenance and improvements. Such implementations facilitate uptake and inclusion into daily workflow. Private companies may invest in the development of an algorithm that uses predictors for which the company offers measurement tools (eg, kits, biomarkers). In these instances, the return on investment should focus on the measurement tools, not on selling the algorithm. We argue that it is ethically unacceptable to have a business model that focuses on selling an algorithm. However, such business models may facilitate Food and Drug Administration (FDA) approval or Conformité Européenne (CE) marking of predictive algorithms (eg, https://www.hcanews.com/news/predictive-patient-surveillance-system-receives-fda-clearance). It is important to realize that regulatory approval does not imply clinical validity or usefulness of a predictive algorithm in a specific clinical setting.
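The cost calculation in this section can be reproduced with a few lines of arithmetic. The figures (20% malignancies, 100 patients per outcome group, $897 per patient) are those given in the text; the function name is our own, and the calculation assumes that the observed proportion of malignancies in the study matches the expected 20%.

```python
import math
from fractions import Fraction

def minimum_validation_size(prevalence, min_per_group=100):
    """Smallest sample expected to hold min_per_group patients in the rarer outcome group."""
    rarer_group_share = min(prevalence, 1 - prevalence)
    return math.ceil(min_per_group / rarer_group_share)

prevalence = Fraction(20, 100)   # 20% malignancies, kept exact to avoid rounding issues
fee_per_patient = 897            # US dollars, as quoted in the text

n_patients = minimum_validation_size(prevalence)
total_cost = n_patients * fee_per_patient

print(n_patients)   # 500
print(total_cost)   # 448500
```

With 500 patients, the expected split is 100 malignant and 400 benign cases, so the rarer outcome group just reaches the recommended minimum. A lower prevalence would increase the required sample size, and hence the cost, proportionally.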

THE IMPORTANCE OF ALGORITHM METADATA IN ORDER TO MAKE ALGORITHMS WORK

Although making algorithms fully and publicly available is imperative, the context of the algorithm is equally important. This extends the abovementioned issue of full and transparent reporting according to the TRIPOD guidelines. Reporting should provide full details of algorithm development practices. This includes, but is not limited to, the source of study data (eg, retrospective EHR, randomized controlled trial data, or prospectively collected cohort data), the number and type of participating centers, the patient recruitment period, inclusion and exclusion criteria, clear definitions of predictors and the outcome, details on how variables were measured, detailed information on missing values and how these were handled, and a full account of the modeling strategy (eg, predictor selection, handling of continuous variables, hyperparameter tuning). Unfortunately, studies reveal time and again that such metadata are poorly reported. Even when authors develop an algorithm using sensible procedures (eg, with low risk of overfitting), poor reporting will lead to poor understanding of the context, which may contribute to decreased performance on external validation. Initiatives such as the Observational Health Data Sciences and Informatics (OHDSI; http://ohdsi.org) focus on such contextual differences and aim to standardize procedures (eg, in terms of terminology, data formats, and definitions of variables) in order to lead to better and more applicable predictive algorithms. In addition, when an algorithm is made available electronically, we recommend it include an indication of the extent to which the algorithm has been validated.
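One way to picture such metadata traveling with an electronically available algorithm is a small structured record. The sketch below is purely illustrative: the class names, field names, and example values are hypothetical, they cover only some of the items listed above, and they do not correspond to any existing standard or to the TRIPOD checklist itself.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ExternalValidation:
    setting: str
    period: str
    independent: bool  # carried out by investigators other than the developers

@dataclass
class AlgorithmMetadata:
    data_source: str
    n_centers: int
    recruitment_period: str
    inclusion_criteria: list
    outcome_definition: str
    missing_data_handling: str
    modeling_strategy: str
    validations: list = field(default_factory=list)

    def validation_summary(self):
        """Short statement of how far the algorithm has been validated."""
        independent = sum(1 for v in self.validations if v.independent)
        return (f"{len(self.validations)} external validation(s), "
                f"{independent} by independent investigators")

# Hypothetical example record.
metadata = AlgorithmMetadata(
    data_source="prospectively collected cohort data",
    n_centers=3,
    recruitment_period="2012-2015",
    inclusion_criteria=["adults", "referred for suspected disease"],
    outcome_definition="diagnosis confirmed by reference standard",
    missing_data_handling="multiple imputation",
    modeling_strategy="logistic regression with prespecified predictors",
    validations=[
        ExternalValidation("regional hospital", "2017-2018", independent=True),
        ExternalValidation("academic hospital", "2016", independent=False),
    ],
)

print(metadata.validation_summary())
exported = json.dumps(asdict(metadata), indent=2)  # can be shipped with the algorithm
```

Keeping the validation history in the same record as the development details lets a user see at a glance both where an algorithm came from and how far it has been tested.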

CONCLUSION

Predictive algorithms should be fully and publicly available to facilitate independent external validation across various settings (Table 1). For complex algorithms, alternative and innovative solutions are needed; a calculator is a minimal requirement, but downloadable software to batch process multiple records is more efficient. We believe that selling predictions from an undisclosed algorithm is unethical. This article does not touch on legal consequences of using predictive algorithms, where issues such as algorithm availability or black-box predictions cannot be easily ignored. When journals consider manuscripts introducing a predictive algorithm, its availability should be a minimum requirement before acceptance. Clinical guideline documents should focus on publicly available algorithms that have been independently validated.

Table 1.

Summary of arguments in favor of making predictive algorithms fully available, hurdles for doing so, and reasons why developers choose to hide and sell algorithms

Why should predictive algorithms be fully and publicly available?
- Facilitate external validation and assessment of heterogeneity in performance
- Facilitate uptake of algorithm by researchers and clinicians, avoid research waste
- Facilitate updating for specific settings
- For publicly funded research, this makes research results available to the community

Recommendations to maximize algorithm availability
- Report the full equation of a predictive algorithm, where possible (eg, regression-based models); this includes reporting of the intercept, or baseline hazard information for time-to-event regression models
- When making an algorithm available online or via a mobile app, provide relevant and complete background information
- For complex algorithms (eg, black-box machine learning), provide software to facilitate implementation and large-scale validation studies

Potential reasons why developers might choose to hide and sell algorithms
- Generate income for further research
- More control over how people use an algorithm
- Facilitate FDA approval or CE certification, because a commercial entity can be identified
- To install a profitable business model

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

FUNDING

This work was funded by Research Foundation – Flanders (grant G0B4716N) and Internal Funds KU Leuven (grant C24/15/037). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

CONTRIBUTIONS

Conception: BVC, LW, DT, EWS, GSC. Writing—original draft preparation: BVC. Writing—review and editing: BVC, LW, DT, EWS, GSC. All authors approved the submitted version and agreed to be accountable.

Conflict of interest statement

LW is a postdoctoral fellow of the Research Foundation – Flanders. GSC was supported by the NIHR Biomedical Research Centre, Oxford.
