Jason Denzil Morgenstern, Emmalin Buajitti, Meghan O'Neill, Thomas Piggott, Vivek Goel, Daniel Fridman, Kathy Kornas, Laura C Rosella.
Abstract
OBJECTIVE: To determine how machine learning has been applied to prediction applications in population health contexts; specifically, to describe which outcomes have been studied, which data sources are most widely used, and whether the reporting of machine learning predictive models aligns with established reporting guidelines.
Keywords: epidemiology; public health; statistics & research methods
Year: 2020 PMID: 33109649 PMCID: PMC7592293 DOI: 10.1136/bmjopen-2020-037860
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1: Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart of the article screening process.
Summary statistics of included articles
| Characteristic* | Number of articles† | Percent of articles‡ |
| Region | ||
| USA | 71 | 30.74% |
| Asia excluding China | 41 | 17.75% |
| China | 40 | 17.32% |
| Europe | 36 | 15.58% |
| Americas excluding the USA | 13 | 5.63% |
| Africa | 5 | 2.16% |
| Oceania | 2 | 0.87% |
| Multi-region | 15 | 6.49% |
| Not reported | 8 | 3.46% |
| Year published | ||
| Before 1990 | 1 | 0.4% |
| 1990–1999 | 3 | 1.3% |
| 2000–2004 | 13 | 5.6% |
| 2005–2009 | 18 | 7.8% |
| 2010–2014 | 70 | 30.3% |
| 2015–2018 | 126 | 54.5% |
| Outcome level§ | ||
| Individual risk prediction | 139 | 60.17% |
| Population risk prediction | 92 | 39.83% |
| Number of observations | Median=5414† | IQR=16 54‡ |
| Not reported | 72 | 31.2% |
| Number of features | Median=17† | IQR=31‡ |
| Not reported | 59 | 25.5% |
| Used any unstructured text | ||
| Yes | 24 | 10.4% |
| No | 207 | 89.6% |
| Machine learning model was compared with other statistical methods | 111 | 48.1% |
| Reported data preprocessing¶ | ||
| Yes | 160 | 69.3% |
| No | 71 | 30.7% |
| Reported method of feature selection | ||
| Yes | 164 | 71.0% |
| No | 67 | 29.0% |
| Reported hyperparameter search | ||
| Yes | 114 | 49.4% |
| No | 117 | 50.6% |
| Method of validation | ||
| Holdout | 112 | 48.5% |
| Cross-validation or bootstrap | 84 | 36.4% |
| External | 15 | 6.5% |
| Not reported | 32 | 13.9% |
| Reported descriptive statistics** | ||
| Yes | 140 | 60.6% |
| No | 91 | 39.4% |
| Discussed the practical costs of prediction errors†† | ||
| Yes | 36 | 15.6% |
| No | 195 | 84.4% |
| Stated rationale for using machine learning | ||
| Yes | 179 | 77.5% |
| No | 52 | 22.5% |
| Discussed model usability | ||
| Yes | 91 | 39.4% |
| No | 140 | 60.6% |
| Stated model limitations | ||
| Yes | 161 | 69.7% |
| No | 70 | 30.3% |
| Discussed model implementation | ||
| Yes | 184 | 79.7% |
| No | 47 | 20.3% |
| Dataset availability by study‡‡ | ||
| Closed | 149 | 64.5% |
| Public | 42 | 18.2% |
| Closed and public | 38 | 16.5% |
| Unknown | 1 | 0.4% |
*Refer to online supplemental table A for a description of each characteristic and rationales for including some elements.
†In rows where the characteristic being measured is an integer count (eg, number of features), this column refers to the median value.
‡In rows where the characteristic being measured is an integer count (eg, number of features), this column refers to the IQR (quartile 3 – quartile 1).
§Individual risk prediction refers to studies that developed models to predict the health outcomes of individuals, while population risk prediction refers to studies that developed models to predict aggregated population-level health outcomes.
¶Whether any aspects of data cleaning or preprocessing were reported. Examples include how missing data were handled, whether log transformations were done and if derived variables were generated.
**Included a broad array of descriptive statistics such as sample population demographics, feature distributions and outcome distributions.
††Whether the article discussed the relative risks of false negative and false positive results based on their predictive model in contexts where it might be used.
‡‡Closed refers to datasets that were not immediately available in the public domain or were not identifiable as such.
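To make the internal validation rows tallied above concrete (holdout, cross-validation or bootstrap), a minimal sketch follows. It assumes a scikit-learn workflow with a hypothetical feature matrix X and binary outcome y; none of the variable names or sample sizes come from the reviewed articles (the sizes simply mirror the medians reported above). External validation is omitted because it requires a second, independently collected dataset.

```python
# Minimal sketch of the internal validation strategies tallied above.
# Hypothetical data; sizes mirror the median observations/features reported.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=5414, n_features=17, random_state=0)
model = LogisticRegression(max_iter=1000)

# 1. Holdout: fit on a training split, evaluate once on the held-out split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model.fit(X_tr, y_tr)
holdout_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# 2. Cross-validation: average performance over k train/test partitions.
cv_auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

# 3. Bootstrap: refit on resampled rows, score on the out-of-bag rows.
rng = np.random.RandomState(0)
boot_aucs = []
for _ in range(100):
    idx = rng.choice(len(X), size=len(X), replace=True)
    oob = np.setdiff1d(np.arange(len(X)), idx)
    m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    boot_aucs.append(roc_auc_score(y[oob], m.predict_proba(X[oob])[:, 1]))

print(f"holdout AUC={holdout_auc:.3f}, CV AUC={cv_auc:.3f}, "
      f"bootstrap AUC={np.mean(boot_aucs):.3f}")
```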
Figure 2: Number of articles by outcome.
Data sources
| Sources of data used* | Number | Percent |
| Environmental | 42 | 18.2% |
| Geographical information database | 12 | 5.2% |
| Meteorological/air quality datasets | 32 | 13.9% |
| Satellite imagery | 21 | 9.1% |
| Health records database | 126 | 54.5% |
| Clinical record database† | 46 | 19.9% |
| Disease registry | 2 | 0.9% |
| Population health survey | 15 | 6.5% |
| Reportable disease database | 42 | 18.2% |
| Other health records database | 30 | 13.0% |
| Government database | 32 | 13.9% |
| Census | 11 | 4.8% |
| Vital statistics | 13 | 5.6% |
| Other government database | 14 | 6.1% |
| HealthMap | 3 | 1.3% |
| Private insurance data | 9 | 3.9% |
| Private insurance claims | 9 | 3.9% |
| Private insurance questionnaire | 3 | 1.3% |
| Internet based | 21 | 9.1% |
| Search engine | 12 | 5.2% |
| Social media | 12 | 5.2% |
| Investigator generated‡ | 86 | 37.2% |
| Public repositories§ | 19 | 8.2% |
| Health organisation reports¶ | 5 | 2.2% |
| Not reported | 6 | 2.6% |
*Categories are not mutually exclusive.
†Any dataset produced primarily for the purpose of delivering clinical care, such as electronic medical records and administrative healthcare databases produced by hospitals.
‡Any datasets resulting from researcher-driven studies, such as randomised controlled trials, cohort studies and case–control studies.
§Any freely available datasets such as Medical Information Mart for Intensive Care or the University of California, Irvine Machine Learning Repository.
¶Health-related reports, typically including disease burden estimates, produced by non-governmental or governmental organisations such as the WHO.
Prediction performance metrics
| Prediction performance metrics used | Number | Percent |
| Any overall performance metric | 77 | 33.33% |
| RMSE | 35 | 15.15% |
| MSE | 26 | 11.26% |
| MAE | 24 | 10.39% |
| MAPE | 23 | 9.96% |
| R2* | 19 | 8.23% |
| Correlation | 8 | 3.46% |
| AIC or BIC | 8 | 3.46% |
| Other performance metric† | 21 | 9.09% |
| Any discrimination metric | 172 | 74.46% |
| Area under the curve‡ | 98 | 42.42% |
| Accuracy§ | 76 | 32.90% |
| Recall¶ | 68 | 29.44% |
| Precision** | 39 | 16.88% |
| F statistics | 10 | 4.33% |
| Likelihood ratio†† | 4 | 1.73% |
| Youden Index | 3 | 1.30% |
| Manual or visual comparison | 3 | 1.30% |
| Other discrimination metric‡‡ | 4 | 1.73% |
| Any calibration metric | 21 | 9.09% |
| Manual or visual comparison§§ | 9 | 3.90% |
| Hosmer-Lemeshow | 8 | 3.46% |
| Observed/expected | 5 | 2.16% |
| Other calibration metric¶¶ | 3 | 1.30% |
| Any reclassification metric | 6 | 2.60% |
| Net Reclassification Index | 5 | 2.16% |
| Integrated discrimination improvement | 3 | 1.30% |
*Includes R2 and pseudo-R2 metrics.
†Includes penalty error, total sum of squares, proportional reduction in error, overall prediction error, specific prediction error, Nash-Sutcliffe, root mean squared percentage error (2), mean relative absolute error, analysis of variance F-stat, −2 log-likelihood, relative efficiency, deviance, Ljung-Box test, mean absolute deviation, SE, mean percentage error, Brier score and log score.
‡Includes c-statistic, s-index and area under the receiver operating characteristic curve.
§Includes accuracy, misclassification and error rate.
¶Includes sensitivity, specificity, true/false positive and true/false negative.
**Includes positive predictive value, negative predictive value and precision.
††Includes positive/negative likelihood ratios.
‡‡Includes G-means (2), k-statistic and Matthews correlation coefficient.
§§Includes calibration plots.
¶¶Includes mean bias (from Bland-Altman plot), calibration factoring and calibration statistic.
AIC, Akaike Information Criterion; BIC, Bayesian Information Criterion; MAE, mean absolute error; MAPE, mean absolute percentage error; MSE, mean squared error; RMSE, root mean squared error.
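The metric families in the table above can be illustrated with a short sketch on simulated predictions using numpy and scikit-learn; the data, threshold and variable names are hypothetical and not taken from any reviewed study. One representative metric is computed per family: overall performance (RMSE, MAE, Brier score), discrimination (AUC, recall, precision) and calibration (observed/expected ratio).

```python
# Illustrative prediction performance metrics on simulated predictions.
import numpy as np
from sklearn.metrics import (brier_score_loss, mean_absolute_error,
                             mean_squared_error, precision_score,
                             recall_score, roc_auc_score)

rng = np.random.RandomState(0)
y_true = rng.binomial(1, 0.3, size=1000)                           # observed outcome
y_prob = np.clip(0.6 * y_true + rng.uniform(0, 0.4, 1000), 0, 1)   # predicted risk
y_pred = (y_prob >= 0.5).astype(int)                               # 0.5 threshold

# Overall performance: error between predicted risks and observed outcomes.
rmse = np.sqrt(mean_squared_error(y_true, y_prob))
mae = mean_absolute_error(y_true, y_prob)
brier = brier_score_loss(y_true, y_prob)

# Discrimination: how well predictions separate cases from non-cases.
auc = roc_auc_score(y_true, y_prob)
sens = recall_score(y_true, y_pred)        # sensitivity / recall
ppv = precision_score(y_true, y_pred)      # precision / positive predictive value

# Calibration: agreement between predicted and observed risk,
# summarised here as the observed/expected ratio.
oe_ratio = y_true.mean() / y_prob.mean()

print(f"RMSE={rmse:.3f} MAE={mae:.3f} Brier={brier:.3f} "
      f"AUC={auc:.3f} sensitivity={sens:.3f} PPV={ppv:.3f} O/E={oe_ratio:.3f}")
```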