Tom M Seinen, Egill A Fridgeirsson, Solomon Ioannou, Daniel Jeannetot, Luis H John, Jan A Kors, Aniek F Markus, Victor Pera, Alexandros Rekkas, Ross D Williams, Cynthia Yang, Erik M van Mulligen, Peter R Rijnbeek.
Abstract
OBJECTIVE: This systematic review aims to assess how information from unstructured text is used to develop and validate clinical prognostic prediction models. We summarize the prediction problems and methodological landscape and determine whether using text data in addition to more commonly used structured data improves the prediction performance.
Keywords: clinical prediction model; electronic health records; machine learning; natural language processing; prognostic prediction
Year: 2022 PMID: 35475536 PMCID: PMC9196702 DOI: 10.1093/jamia/ocac058
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 7.942
Figure 1. Visualization of the prognostic prediction problem. The objective is to predict which patients from a target population will experience an outcome event within a prediction horizon, using predictors measured only in an observation window before the time of prediction. Predictors can be extracted from both the structured data and text data.
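The setup in Figure 1 can be sketched in code. The following is a minimal illustration only, not the method of any reviewed study: the `Record` type, the `build_example` function, the dates, and the boundary conventions (window closed at its start and open at the time of prediction, horizon open at the time of prediction and closed at its end) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    """One hypothetical patient record entry."""
    time: datetime
    kind: str   # "structured", "text", or "outcome"
    value: str


def build_example(records, prediction_time, observation_window, prediction_horizon):
    """Split one patient's records into predictors and an outcome label.

    Predictors: non-outcome records inside the observation window,
    strictly before the time of prediction.
    Label: True if an outcome event falls inside the prediction horizon,
    strictly after the time of prediction.
    """
    window_start = prediction_time - observation_window
    horizon_end = prediction_time + prediction_horizon

    predictors = [
        r for r in records
        if r.kind != "outcome" and window_start <= r.time < prediction_time
    ]
    label = any(
        r.kind == "outcome" and prediction_time < r.time <= horizon_end
        for r in records
    )
    return predictors, label


# Hypothetical patient: one note inside the window, one lab value outside it,
# and one outcome event inside the horizon.
records = [
    Record(datetime(2019, 10, 1), "structured", "creatinine=1.1"),
    Record(datetime(2019, 12, 20), "text", "patient reports chest pain"),
    Record(datetime(2020, 1, 15), "outcome", "readmission"),
]
predictors, label = build_example(
    records,
    prediction_time=datetime(2020, 1, 1),
    observation_window=timedelta(days=30),
    prediction_horizon=timedelta(days=30),
)
```

Keeping outcome records out of the predictor list and restricting predictors to times before the prediction time is what prevents information from the horizon leaking into the model inputs.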
Inclusion criteria
| Criterion | Description | Exclusion examples |
|---|---|---|
| 1 | The study described the development and evaluation of a prognostic clinical prediction model | |
| A | The model predicts a future clinical event or outcome for a patient | Exclude diagnostic, identification, phenotyping, or extraction models |
| B | The subject must be a patient or a limited group of patients | Exclude if the subject is anything else, such as a drug, bed, or gene |
| C | A parameterized prediction model must be developed and evaluated | Exclude if a study only reports the odds ratios of covariates, only runs statistical tests, or does not evaluate the developed model |
| D | Any clinical domain is relevant, such as intensive, radiology, general practitioner, or psychiatric care | |
| 2 | The model predictors were based on information extracted from unstructured text in an EHR database | |
| A | The information is extracted from human-readable text data in an EHR database | Exclude if all text data comes from other sources, such as social media, literature, recordings, transcripts, or genetic data |
| B | The extracted information is used as covariates in the model | Exclude when the study only uses the information to define the outcome or target patient cohorts |
| C | The model must at least use information from unstructured text, but a combination with structured data is allowed | |
| 3 | Information was automatically extracted from the unstructured text in a data-driven manner | |
| A | Data-driven means that the extraction is exploratory and it should not have been known beforehand what information from the text data was important for model development | Exclude if the extraction was driven by mere intuition, personal experience, or existing knowledge, for example, the extraction of a limited number or specific set of clinical concepts, such as the smoking status or a small set of vital signs |
| B | The extraction was done automatically | Exclude if the information is manually extracted from the text data |
List of data items for data extraction, by topic
| Item topic | Data item | Input type |
|---|---|---|
| 1. General information | Publication year | Year |
| | Journal | Free text |
| 2. Study setting | Dataset | Free text |
| | Country of data | Country |
| | Clinical setting | Free text |
| | Study dates | Range of years |
| 3. Population | Type of study | Cohort, case-control |
| | Target population (A, B) | Free text |
| | Prediction outcome (A) | Free text |
| | Prediction horizon (A) | Hours, days, years, relative time (free text), and timepoint |
| | Prediction outcome type | Binary, multi-class, and continuous |
| 4. Unstructured text predictors | Type of unstructured text | Free text |
| | Language of text | Language |
| | Observation window (B) | Hours, days, years, relative time (free text), and timepoint |
| | Preprocessing methods | Free text |
| | Text representation methods | |
| | Used ontologies/vocabularies | Free text |
| | Used software/program/package | Free text |
| | Number of predictors (A) | Number |
| 5. Structured data predictors | Types of structured data | Free text |
| | Observation window (B) | Hours, days, years, relative time (free text), and timepoint |
| | Number of predictors (A) | Number |
| 6. Model | Machine learning method (A) | |
| | Feature set | Structured, text, and combined |
| 7. Internal validation | Number of observations (A) | Number |
| | Number of observations with the outcome (outcome cases) (A) | Number |
| | AUC, AUPRC, F1-score (A) | Values |
| | Accuracy, sensitivity (or recall), specificity, and positive predictive value (or precision) reported? (A) | Yes or No |
| | MSE/MAE reported? (A) | Yes or No |
| | ROC/PR curves presented? (A) | Yes or No |
| | Calibration plot or metrics presented? (A, B) | Yes or No |
| 8. External validation | Type of external validation (A) | Same or another department, center, or country |
| | Same items as internal validation | |
| 9. Explainability | Global feature importance presented? | Yes or No |
| | Single patient (local) feature importance presented? | Yes or No |
| 10. Final model availability | Is the final model directly available to apply to different data? (A) | Yes or No |
| | Is the study code available to reproduce the methods? | Yes or No |
Notes: Data item sources indicated by A: CHARMS and B: TRIPOD; an asterisk (*) indicates data items added to the review protocol.
Abbreviations: BoW: Bag-of-Words; CE: concept extraction; WE: word embedding; DE: document embedding; TM: topic model; SS: summarizing score; LogR: logistic regression; LinR: linear regression; Cox: Cox proportional hazards regression; NB: Naïve Bayes; RFTB: random forests or other tree-based methods; GB: gradient boosting; SVM: support vector machines; NN: neural networks; RNN: recurrent neural networks; CNN: convolutional neural networks; DNN: deep neural networks; AUC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve; MSE: mean squared error; MAE: mean absolute error; CHARMS: critical appraisal and data extraction for systematic reviews of prediction modeling studies; TRIPOD: transparent reporting of a multivariable prediction model for individual prognosis or diagnosis.
Figure 2. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram with the search and screening results of the systematic review.
Number of included studies by publication year
| Year | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 (until March) |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of included studies | 4 | 0 | 5 | 4 | 9 | 5 | 19 | 30 | 41 | 9 |
Figure 3. Sankey diagram of the different categories of target populations and clinical outcomes, and clinical outcomes and prediction horizons, ordered by size. The number in parentheses indicates the number of prediction problems with these categories and the width of the connection between 2 categories represents the number of prediction problems with this combination of categories.
Figure 4. (A) Boxplots of the number of observations (left) and outcome cases (right) of 145 prediction problems. (B) Boxplot of the ratios between the number of observations and outcome cases. In both (A) and (B), the mean is indicated by the diamond and the points represent the underlying data.
Figure 5. (A) The use of different text representations (TR) and machine learning (ML) methods in text-based or combined-data prediction models over time. No eligible studies in 2013. (D)NN are all feedforward and deep neural network-based methods. (B) The combinations of text representations (left) and machine learning methods (right) in text-based or combined-data prediction models. The number in parentheses indicates the number of prediction problems with these categories and the width of the connection between 2 categories represents the number of prediction problems with this combination of categories. Both (A) and (B) share the same legend: the colors of the nodes indicate the types of text representations and machine learning methods.
Figure 6. (A) Area under the receiver operating characteristic curve (AUC) difference distribution boxplots of the combined and structured-data models (ΔAUC Combined−Structured), the text and structured-data models (ΔAUC Text−Structured), and combined and text-based models (ΔAUC Combined−Text). (B) Text and structured-data model AUC difference (ΔAUC Text−Structured) boxplots for 4 different clinical settings. In both (A) and (B), the means are indicated by a diamond, the points represent the underlying data, sample sizes are shown on top, and the dotted line indicates the AUC difference of zero. ns: not significant; *P < .05, ****P < .001.
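For readers unfamiliar with the ΔAUC quantity plotted in Figure 6, a minimal sketch follows. It assumes a pairwise (Mann-Whitney) definition of AUC with ties counted as one half, and that both models are evaluated on the same patients. The labels, the scores, and the names `auc`, `structured_scores`, and `text_scores` are invented for illustration and are not taken from any reviewed study.

```python
def auc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    A tie between a case and a non-case counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


# Hypothetical outcome labels and predicted risks for six patients.
labels = [0, 0, 1, 1, 0, 1]
structured_scores = [0.2, 0.6, 0.5, 0.7, 0.3, 0.4]  # structured-data model
text_scores = [0.1, 0.4, 0.6, 0.8, 0.3, 0.5]        # text-based model

auc_structured = auc(labels, structured_scores)
auc_text = auc(labels, text_scores)

# Positive values mean the text-based model discriminates better.
delta_auc_text_structured = auc_text - auc_structured
```

A difference of zero corresponds to the dotted line in the figure; each point in the boxplots would be one such difference computed for a single prediction problem.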