| Literature DB >> 35804834 |
Melissa Estevez, Corey M Benedum, Chengsheng Jiang, Aaron B Cohen, Sharang Phadke, Somnath Sarkar, Selen Bozkurt.
Abstract
A vast amount of real-world data, such as pathology reports and clinical notes, is captured as unstructured text in electronic health records (EHRs). However, this information is both difficult and costly to extract through human abstraction, especially when scaling to large datasets is needed. Fortunately, Natural Language Processing (NLP) and Machine Learning (ML) techniques provide promising solutions for a variety of information extraction tasks, such as identifying a group of patients who have a specific diagnosis, share common characteristics, or show progression of a disease. However, using these ML-extracted data for research still introduces unique challenges in assessing validity and generalizability to different cohorts of interest. To enable effective and accurate use of ML-extracted real-world data (RWD) to support research and real-world evidence generation, we propose a research-centric evaluation framework for model developers, ML-extracted data users, and other RWD stakeholders. This framework covers the fundamentals of evaluating RWD produced using ML methods to maximize the use of EHR data for research purposes.
Keywords: artificial intelligence; deep learning; machine learning; oncology; personalized medicine
Year: 2022 PMID: 35804834 PMCID: PMC9264846 DOI: 10.3390/cancers14133063
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. Evaluation Framework.
Example performance metrics.
| Variable Type | Example ML-Extracted Variable | Example Performance Metric |
|---|---|---|
| Categorical | Diagnosis (yes/no) | Sensitivity |
| Date | Diagnosis date | Sensitivity with a ±n-day window 1 |
| Continuous | Lab value | Sensitivity, PPV, and accuracy for classifying the result as within vs. outside the normal range |
1: The proportion of patients human-abstracted as having the diagnosis who are also correctly identified as having the diagnosis by the model, and for whom the ML-extracted diagnosis date is within ±n days of the abstracted diagnosis date or both the abstracted and ML-extracted dates are unknown.
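To make these metric definitions concrete, here is a minimal Python sketch of sensitivity for a categorical variable and of sensitivity with a ±n-day window for a date variable, following the footnote above. The records, field names, and 30-day window are hypothetical, chosen only for illustration.

```python
from datetime import date

# Hypothetical paired test-set records: human-abstracted (reference)
# vs. ML-extracted diagnosis flags and diagnosis dates.
records = [
    {"abstracted_dx": True,  "ml_dx": True,
     "abstracted_date": date(2019, 3, 1), "ml_date": date(2019, 3, 4)},
    {"abstracted_dx": True,  "ml_dx": False,
     "abstracted_date": date(2020, 7, 15), "ml_date": None},
    {"abstracted_dx": False, "ml_dx": False,
     "abstracted_date": None, "ml_date": None},
]

def sensitivity(records):
    """Proportion of abstracted-positive patients also ML-extracted as positive."""
    positives = [r for r in records if r["abstracted_dx"]]
    hits = [r for r in positives if r["ml_dx"]]
    return len(hits) / len(positives)

def date_sensitivity(records, window_days=30):
    """Sensitivity for a date variable: the model must identify the diagnosis,
    and the ML-extracted date must fall within +/-window_days of the abstracted
    date (or both dates must be unknown), per the footnote above."""
    positives = [r for r in records if r["abstracted_dx"]]
    hits = 0
    for r in positives:
        if not r["ml_dx"]:
            continue  # diagnosis missed entirely: no date credit
        if r["abstracted_date"] is None and r["ml_date"] is None:
            hits += 1
        elif r["abstracted_date"] is not None and r["ml_date"] is not None:
            if abs((r["ml_date"] - r["abstracted_date"]).days) <= window_days:
                hits += 1
    return hits / len(positives)

print(sensitivity(records))       # 0.5 (1 of 2 abstracted-positive patients)
print(date_sensitivity(records))  # 0.5 (the first record's dates are 3 days apart)
```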
Stratified analysis steps and example variables.
| Goal | Example Strata Variables |
|---|---|
| (I.) Understand performance in sub-cohorts of interest | Year of diagnosis (e.g., before vs. after year x) |
| (II.) Fairness | Race and ethnicity group |
| (III.) Risk for statistical bias in analysis | Treatment setting (Academic vs. Community) |
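As a sketch of how such stratified performance can be computed, the following Python/pandas snippet calculates sensitivity and PPV per stratum; the data frame, column names, and strata values are hypothetical.

```python
import pandas as pd

# Hypothetical test set: abstracted (reference) label, ML-extracted label,
# and one stratifying variable; a real evaluation would repeat this for each
# stratum of interest (year of diagnosis, treatment setting, ...).
df = pd.DataFrame({
    "abstracted": [1, 1, 0, 1, 0, 1, 0, 1],
    "ml":         [1, 0, 0, 1, 1, 1, 0, 0],
    "race_ethnicity": ["White", "White", "Black or African American",
                       "Black or African American", "White", "White",
                       "Black or African American", "White"],
})

def strat_metrics(group):
    tp = ((group.abstracted == 1) & (group.ml == 1)).sum()
    fn = ((group.abstracted == 1) & (group.ml == 0)).sum()
    fp = ((group.abstracted == 0) & (group.ml == 1)).sum()
    return pd.Series({
        "n": len(group),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
    })

print(df.groupby("race_ethnicity")[["abstracted", "ml"]].apply(strat_metrics))
```

Large between-stratum gaps in sensitivity or PPV flag sub-cohorts where the ML-extracted variable may need caution or remediation before use.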
Figure 2Confusion matrix for model errors.
Examples for comparison of errors and their interpretation.
| Comparison | Usefulness/Interpretation |
|---|---|
| True positives vs. False negatives | This comparison informs whether patients incorrectly excluded from the study cohort differ from those correctly included with respect to patient characteristics or outcomes. If misclassification is random, excluded patients will most likely have minimal impact on analysis results; if misclassification is systematic, excluded patients may impact analysis results. |
| True positives vs. False positives | This comparison informs whether patients incorrectly included in the study cohort differ from those correctly included with respect to patient characteristics or outcomes. If misclassification is random, incorrectly included patients may have minimal impact on analysis results; if misclassification is systematic, incorrectly included patients may impact analysis results. |
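The confusion-matrix cells above can be compared programmatically. This sketch (hypothetical columns and data) tabulates a baseline characteristic across true positives, false positives, and false negatives to screen for systematic misclassification:

```python
import pandas as pd

# Hypothetical test set: reference label, ML-extracted label, and one
# baseline characteristic (smoking history) to compare across error cells.
df = pd.DataFrame({
    "abstracted":      [1, 1, 1, 0, 0, 1, 0, 1],
    "ml":              [1, 1, 0, 1, 0, 1, 1, 0],
    "smoking_history": [1, 0, 1, 1, 0, 1, 0, 1],
})

def cell(df, abstracted, ml):
    return df[(df.abstracted == abstracted) & (df.ml == ml)]

groups = {"TP": cell(df, 1, 1), "FP": cell(df, 0, 1), "FN": cell(df, 1, 0)}

# If a characteristic differs sharply between TP and FP (or TP and FN),
# misclassification may be systematic rather than random.
for name, grp in groups.items():
    print(f"{name}: n={len(grp)}, smoking history={grp.smoking_history.mean():.0%}")
```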
Evaluation framework template for the illustrative example.
| Evaluation Step | Description | Example |
|---|---|---|
| Research Use Cases | Selecting a cohort of patients who have (or do not have) metastatic disease; using metastatic status as a covariate or stratifying variable in an analysis | |
| Test Set | The size of the test set is selected to achieve a target margin of error for the primary evaluation metric (e.g., sensitivity or PPV) within the minority class (metastatic disease); a sizing sketch follows the table notes. | Patients selected from the target population that were not included in model development |
| Overall Performance | As the primary use of this variable is to select a cohort of metastatic patients, sensitivity, PPV, specificity, and NPV are measured. | Sensitivity 2 = 0.94 |
| Stratified Performance | Sensitivity and PPV for both metastatic and non-metastatic classes are calculated across strata of variables of interest. Stratifying variables are selected with the following goals in mind: (I.) performance in sub-cohorts of interest (e.g., year of diagnosis); (II.) fairness (e.g., race and ethnicity); (III.) risk for statistical bias in analysis (e.g., cancer stage at diagnosis). | Example finding for race and ethnicity: sensitivity for the “metastatic” class is 5% higher for the “Black or African American” race group vs. “White”; PPV for the “metastatic” class is 5% lower for the “Black or African American” race group vs. “White”. |
| Quantitative Error Analysis | To understand the impact of model errors on the selected study cohort, baseline characteristics and rwOS are evaluated for the following comparisons: true positives vs. false negatives; true positives vs. false positives. | Example findings from the rwOS analysis *: rwOS ** for false positives (21 months) was similar to that for true positives (17 months). Compared to true positives, false positives were less likely to have a history of smoking (86% vs. 91%). |
| Replication of Use Cases | Evaluate rwOS from the metastatic diagnosis date for patients selected as metastatic by the ML-extracted variable vs. the abstracted counterpart (outcomes in the general population); a Kaplan–Meier sketch follows the table notes. | rwOS for the ML-extracted cohort: 9.8 months (95% CI 8.92–10.75) |
1: The model is constructed using snippets of text around key terms related to “metastasis,” processed by a long short-term memory (LSTM) network to produce a compact vector representation of each sentence. These representations are then processed by additional network layers to produce a final metastatic status prediction [31].
2: Sensitivity refers to the proportion of patients abstracted as having a value of a variable (e.g., metastasis = true) that are also ML-extracted as having the same value.
3: PPV refers to the proportion of patients ML-extracted as having a value of a variable (e.g., metastasis = true) that are also abstracted as having the same value.
4: Specificity refers to the proportion of patients abstracted as not having a value of a variable (e.g., metastasis = false) that are also ML-extracted as not having the same value.
5: NPV refers to the proportion of patients ML-extracted as not having a value of a variable (e.g., metastasis = false) that are also abstracted as not having the same value.
*: The rwOS analysis was performed using the Kaplan–Meier method [32].
**: The index date selected for the rwOS calculation can be changed based on the study goals. However, the selected index date should be available for all patients, regardless of the concordance of their abstracted and predicted values. In this illustrative example, we provide rwOS strictly as an example and do not specify the index date, as index date selection is case-dependent.
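The test-set sizing described in the table can be approximated with a standard margin-of-error calculation. This is a minimal sketch, assuming a Wald (normal-approximation) interval, an anticipated sensitivity, and an assumed minority-class prevalence; none of these values come from the paper.

```python
from math import ceil
from statistics import NormalDist

def positives_needed(expected_sens, margin, conf=0.95):
    """Minority-class (e.g., metastatic) patients needed so that a Wald
    confidence interval for sensitivity has half-width <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 * expected_sens * (1 - expected_sens) / margin**2)

n_pos = positives_needed(expected_sens=0.90, margin=0.03)  # -> 385
prevalence = 0.25   # assumed share of metastatic patients in the source data
print(n_pos, ceil(n_pos / prevalence))  # positives needed, total test-set size
```

The same calculation applies to PPV, with the denominator drawn from ML-extracted positives rather than abstracted positives.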
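For the replication step, the rwOS comparison can be sketched with the Kaplan–Meier method [32], e.g., via the lifelines package. The cohort, follow-up times, and column names below are hypothetical; index-date selection is left case-dependent, as the note above states.

```python
import pandas as pd
from lifelines import KaplanMeierFitter

# Hypothetical cohort: months from an (unspecified) index date to death or
# censoring, an event flag, and whether the patient was selected as
# metastatic by the ML-extracted variable or by abstraction.
cohort = pd.DataFrame({
    "months":    [3, 9, 12, 5, 20, 7, 15, 11],
    "death":     [1, 1, 0, 1, 0, 1, 1, 0],
    "selection": ["ml", "ml", "ml", "ml",
                  "abstracted", "abstracted", "abstracted", "abstracted"],
})

# Fit one Kaplan-Meier curve per selection method; closely matching medians
# and curves support using the ML-extracted variable for this use case.
for label, grp in cohort.groupby("selection"):
    kmf = KaplanMeierFitter()
    kmf.fit(grp["months"], event_observed=grp["death"], label=label)
    print(label, "median rwOS (months):", kmf.median_survival_time_)
```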