Sunyang Fu1, Lester Y Leung2, Anne-Olivia Raulli2, David F Kallmes3, Kristin A Kinsman3, Kristoff B Nelson2, Michael S Clark3, Patrick H Luetmer3, Paul R Kingsbury1, David M Kent4, Hongfang Liu5.
Abstract
BACKGROUND: The rapid adoption of electronic health records (EHRs) holds great promise for advancing medicine through practice-based knowledge discovery. However, the validity of EHR-based clinical research is questionable due to poor research reproducibility caused by the heterogeneity and complexity of healthcare institutions and EHR systems, the cross-disciplinary nature of the research team, and the lack of standard processes and best practices for conducting EHR-based clinical research.
Keywords: Clinical research informatics; Data quality; Electronic health records; Learning health system; Multi-site studies; Reproducibility
Year: 2020 PMID: 32228556 PMCID: PMC7106829 DOI: 10.1186/s12911-020-1072-9
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Data Abstraction Framework for EHR-based Clinical Research
Variation Assessment Table for Data Abstraction
| Variation type | Definition | Potential implication | Assessment method |
| --- | --- | --- | --- |
| Institutional variation | Variation in practice patterns, outcomes, and patient sociodemographic characteristics | Inconsistent phenotype definition; unbalanced concept distribution | • Compare clinical guideline, protocol, and definition • Calculate the number of eligible patients divided by screening population • Calculate the ratio of the proportion of the persons with the disease over the proportion with the exposure |
| EHR system variation | Variation in data type and format caused by different EHR system infrastructure | Inconsistent data type; different data collection processes | • Compare data type, document structure, and metadata • Conduct a semi-structured interview to obtain information about the context of use |
| Documentation variation | Variation in reporting schemes during the processes of generating clinical narratives | Noisy data | • Compare the cosine similarity between two documents represented by vectors • Conduct a sub-language analysis to assess syntactic variation |
| Process variation | Variation in data collection and corpus annotation process | Poor data reliability, validity, and reproducibility | • Calculate the degree of agreement among abstractors • Conduct a semi-structured interview to obtain information about the context of use |
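Two of the assessment methods listed in the table, cosine similarity between documents and agreement among abstractors, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the whitespace tokenizer, the raw term-count vectors, and the choice of Cohen's kappa as the agreement statistic are all assumptions, and the function names `cosine_similarity` and `cohen_kappa` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents represented as term-count vectors.

    Assumption: lowercase whitespace tokenization; the source does not say
    how documents were vectorized.
    """
    vec_a = Counter(doc_a.lower().split())
    vec_b = Counter(doc_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


def cohen_kappa(labels_a, labels_b) -> float:
    """Cohen's kappa for two abstractors labelling the same items.

    Assumption: kappa stands in for the table's "degree of agreement";
    the source does not name a specific statistic.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both abstractors must label the same non-empty item set")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: product of each abstractor's marginal label rates.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both abstractors used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Two phrasings of the same negative finding.
    print(cosine_similarity("No acute territorial infarct.",
                            "There is no acute territorial infarct."))
    # Hypothetical labels (1 = finding present) from two abstractors.
    print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

A low cosine similarity between reports describing the same finding suggests documentation variation, while a low kappa points to process variation in the annotation step.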
Fig. 2 Example of neuroimaging report annotation (left) and neuroimage interpretation (right) for SBI (yellow) and WMD (blue)
The prevalence of SBI and WMD for Mayo and TMC patients aged 50, 60, 70, and 80 years or older
| ≥ 50 | 12.5 | 7.4 | 11.3 | 7.7 | 28.7 | 55.0 | 69.2 | 51.7 |
| ≥ 60 | 16.0 | 9.4 | 14.0 | 9.7 | 35.1 | 65.9 | 75.3 | 60.2 |
| ≥ 70 | 23.5 | 11.4 | 20.2 | 12.2 | 47.1 | 80.7 | 84.6 | 65.3 |
| ≥ 80 | 26.3 | 18.4 | 26.5 | 20.8 | 52.6 | 94.7 | 85.3 | 66.7 |
Analysis of Cohort Characteristics Between Mayo and TMC
| Age (mean) | 65 (± 10.6) | 66 (± 9.7) | 0.1197 |
| Gender (female) | 243 (48.6) | 274 (54.8) | 0.0576 |
| SBI | 57 (11.4) | 38 (7.6) | 0.0516 |
| Acuity | |||
| Acute/subacute | 6 (1.2) | 6 (1.2) | 1.0000 |
| Chronic | 44 (8.8) | 29 (5.8) | 0.0882 |
| Non-specified | 7 (1.4) | 3 (0.6) | 0.3407 |
| Location | |||
| Lacunar/subcortical | 27 (5.4) | 10 (2.0) | |
| Cortical/juxtacortical | 9 (1.8) | 13 (2.6) | 0.5188 |
| Both | 0 (0) | 3 (0.6) | 0.2492 |
| Non-specified | 21 (4.2) | 12 (2.4) | 0.1558 |
| WMD | 291 (58.2) | 264 (52.8) | 0.9800 |
| WMD grading | |||
| Mild | 191 (38.2) | 154 (30.8) | |
| Mild/moderate | 21 (4.2) | 0 (0.0) | |
| Moderate | 42 (8.4) | 45 (9.0) | 0.8226 |
| Moderate/severe | 2 (0.4) | 0 (0) | 0.4995 |
| Severe | 8 (1.6) | 11 (2.2) | 0.6443 |
| No mention of quantification | 27 (5.4) | 54 (10.8) | |
Definition of abbreviations: CI confidence interval, OR odds ratio
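A between-site comparison of counts, such as the SBI row above, can be checked with a two-by-two test of proportions. The sketch below uses a chi-square test with optional Yates continuity correction, which is an assumption: the page does not state which test produced the reported p-values, so the result need not match the table exactly. The group size of 500 per site is inferred from the counts and percentages (for example, 57 is 11.4% of 500), and the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(events_a: int, total_a: int,
                   events_b: int, total_b: int,
                   yates: bool = True):
    """Chi-square test (1 degree of freedom) for a 2x2 table of counts.

    Returns (chi2 statistic, two-sided p-value). Assumption: this test is
    an illustrative choice, not necessarily the one used in the source.
    """
    a, b = events_a, total_a - events_a   # group A: events, non-events
    c, d = events_b, total_b - events_b   # group B: events, non-events
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction, floored at zero.
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # SBI counts from the table: 57 vs 38, assuming 500 patients per site.
    chi2, p = chi_square_2x2(57, 500, 38, 500)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

For rows with very small counts, such as the "Both" location row, an exact test would usually be preferred over the chi-square approximation.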
Example of Language Variation between Two Data Sources
| • No restricted diffusion. • No focal masses, focal atrophy, or foci of restricted water diffusion. • No evidence for acute ischemia on the diffusion weighted images. | • There is no acute territorial infarct. • No acute territorial infarct. • There is no decreased diffusion to indicate an acute infarct. |
| • Mild leukoaraiosis • Minimal leukoaraiosis • Moderate leukoaraiosis | • There are scattered foci of hypodensity in the subcortical and periventricular white matter, a non-specific finding but likely reflecting the sequela of chronic microangiopathy • Areas of white matter hypodensity are a non-specific finding but may represent the sequela of chronic microangiopathy • There are multiple foci of t2 flair hyperintensity in the periventricular, deep and subcortical white matter, a non-specific finding but likely reflecting the sequela of chronic microangiopathy |
Fig. 3 Overview of ESPRESSO Data Abstraction Process
Total Annotation Issues during Two Iterations