Daniel Fort, Chunhua Weng, Suzanne Bakken, Adam B Wilcox.
Abstract
Because clinical data are collected to support clinical decisions and processes, they may be subject to validity issues when used for research. The objective of this study is to examine methods and issues in summarizing and evaluating the accuracy of clinical data as compared to primary research data. We hypothesized that research survey data on a patient cohort could serve as a reference standard for uncovering potential biases in clinical data. We compared the summary statistics between clinical and research datasets. Seven clinical variables, i.e., height, weight, gender, ethnicity, systolic and diastolic blood pressure, and diabetes status, were included in the study. Our results show that the clinical data and research data had similar summary statistical profiles, but that there were detectable differences in definitions and measurements for individual variables such as height, diastolic blood pressure, and diabetes status. We discuss the implications of these results and confirm the important considerations for using research data to verify clinical data accuracy.
Year: 2014 PMID: 25717415 PMCID: PMC4333689
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Concepts and Definitions for sample summary
| Concept | Definition |
|---|---|
| Number of individuals in sample | |
| Average age of individuals in sample | |
| Proportion of sample labeled female | |
| Proportion of sample labeled Hispanic | |
| Average weight of individuals in sample | |
| Median height of individuals in sample | |
| Average BMI of individuals in sample, computed from individual weight and height | |
| Proportion of sample labeled positive for smoking | |
| Proportion labeled positive for smoking out of individuals with labeled smoking status | |
| Average systolic blood pressure of individuals in sample | |
| Average diastolic blood pressure of individuals in sample | |
| Proportion of sample with positive diabetes status | |
| Proportion with positive diabetes status out of individuals with recorded ICD-9 codes and test values | |
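Two of the definitions above are easy to confuse: a proportion taken over the whole sample and the same proportion taken only over individuals with a recorded label. The following minimal sketch illustrates that distinction for smoking status, together with an average BMI computed from individual weight and height. The `Record` type, its field names, and the units (kilograms and centimeters) are assumptions made for illustration and are not taken from the study.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Record:
    # Hypothetical record layout; units are assumed, not stated in the source.
    weight_kg: Optional[float]
    height_cm: Optional[float]
    smoker: Optional[bool]  # None means smoking status was never labeled


def mean_bmi(records: List[Record]) -> float:
    """Average BMI over individuals that have both weight and height."""
    bmis = [
        r.weight_kg / (r.height_cm / 100.0) ** 2
        for r in records
        if r.weight_kg is not None and r.height_cm
    ]
    return mean(bmis)  # raises StatisticsError if no record qualifies


def smoking_proportions(records: List[Record]) -> Tuple[float, float]:
    """Return (proportion of whole sample, proportion of labeled individuals)."""
    positives = sum(1 for r in records if r.smoker is True)
    labeled = sum(1 for r in records if r.smoker is not None)
    return positives / len(records), positives / labeled
```

When many individuals have no recorded status, the two proportions can diverge sharply, which is why both variants are reported for smoking and for diabetes status.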
Summary Results from alternate sampling methods
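| Concept | Survey | Census-weighted Survey | Clinical Raw | Clinical Resampled | Census-weighted Clinical |
|---|---|---|---|---|---|
| Number of individuals in sample | 4069 | 78418 | 56694 | | |
| Average age | 50.1 | 44.6 | 47.6 | 47.0 | 44.1 |
| Proportion labeled female | 0.708 | 0.528 | 0.619 | 0.714 | 0.528 |
| Proportion labeled Hispanic | 0.955 | 0.951 | 0.496 | 0.604 | 0.501 |
| Average weight | 75.4 | 77.0 | 75.7 | 74.8 | 78.2 |
| Median height | 161.2 | 163.7 | 160.3 | 159.1 | 162.7 |
| Average BMI | 28.2 | 27.7 | 28.1 | 28.3 | 28.1 |
| Proportion labeled positive for smoking (whole sample) | 0.058 | 0.064 | 0.089 | 0.078 | 0.101 |
| Proportion labeled positive for smoking (individuals with labeled smoking status) | 0.060 | 0.066 | 0.122 | 0.103 | 0.138 |
| Average systolic blood pressure | 127.7 | 125.5 | 127.2 | 126.5 | 126.8 |
| Average diastolic blood pressure | 81.0 | 80.7 | 73.1 | 72.7 | 73.4 |
| Proportion with positive diabetes status (whole sample) | 0.159 | 0.122 | 0.038 | 0.040 | 0.032 |
| Proportion with positive diabetes status (individuals with recorded ICD-9 codes and test values) | 0.162 | 0.124 | 0.284 | 0.286 | 0.313 |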
Clinical, Survey, and Matched Set data comparison. Bonferroni-corrected p-value = 1e-4
Systolic blood pressure summary values and patient cohort size for various data point selection methodologies
| Systolic BP | Survey | Closest Prior | Closest Subsequent | Random Point | Mean |
|---|---|---|---|---|---|
| Patient cohort size | 1290 | 1107 | 962 | 1185 | 1185 |
| Summary value | 127.8 | 127.9 | 130.3 | 129.3 | 128.5 |
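The column names in the systolic blood pressure table suggest four ways of reducing a patient's clinical measurements to a single value relative to the survey date. The sketch below is one plausible reading of those methods, not the authors' implementation. The function name `select_value`, the `(date, value)` tuple layout, and the choice to count a measurement taken on the survey date as both prior and subsequent are all assumptions.

```python
import random
from datetime import date
from statistics import mean
from typing import List, Optional, Tuple

Measurement = Tuple[date, float]  # (measurement date, systolic BP value)


def select_value(
    measurements: List[Measurement],
    survey_date: date,
    method: str,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Pick one value per patient; None means the patient drops out of the cohort."""
    if not measurements:
        return None
    if method == "closest_prior":
        # Assumption: a measurement on the survey date itself counts as prior.
        eligible = [m for m in measurements if m[0] <= survey_date]
        return max(eligible, key=lambda m: m[0])[1] if eligible else None
    if method == "closest_subsequent":
        # Assumption: a measurement on the survey date also counts as subsequent.
        eligible = [m for m in measurements if m[0] >= survey_date]
        return min(eligible, key=lambda m: m[0])[1] if eligible else None
    if method == "random_point":
        return (rng or random).choice(measurements)[1]
    if method == "mean":
        return mean(value for _, value in measurements)
    raise ValueError(f"unknown method: {method}")
```

Under this reading, a patient with no measurement on the required side of the survey date returns `None` for that method. That would be consistent with the closest-prior and closest-subsequent cohorts being smaller than the random-point and mean cohorts.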
Sensitivity, Specificity, F-measure, and Positive Predictive Value of components of a diabetes diagnosis
| Value | ALL | ANY | ≥ ICD-9 | ≥ ICD-9 | HIGH HBA1C | HIGH GLUCOSE, EVER | HIGH GLUCOSE, RECENT |
|---|---|---|---|---|---|---|---|
| Sensitivity | 0.33 | 0.81 | 0.90 | 0.84 | 0.48 | 0.72 | 0.52 |
| Specificity | 0.98 | 0.35 | 0.88 | 0.93 | 0.96 | 0.53 | 0.74 |
| F-measure | 0.49 | 0.49 | 0.89 | 0.88 | 0.64 | 0.61 | 0.61 |
| Positive Predictive Value | 0.82 | 0.27 | 0.68 | 0.78 | 0.79 | 0.31 | 0.37 |
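Reading the rows in the order given in the caption (sensitivity, specificity, F-measure, positive predictive value), the tabulated F-measure values agree to two decimals with the harmonic mean of sensitivity and specificity. This is an observation from the numbers, not a definition stated in the source. The sketch below shows how the measures could be computed from confusion-matrix counts and re-derives the F-measure row under that assumption. The function names and the `REPORTED` list are illustrative.

```python
from typing import List, Tuple


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = tp / (tp + fn)  # true positives among reference positives
    specificity = tn / (tn + fp)  # true negatives among reference negatives
    ppv = tp / (tp + fp)          # true positives among predicted positives
    return sensitivity, specificity, ppv


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two rates."""
    return 2 * a * b / (a + b)


# (sensitivity, specificity, reported F-measure) per column of the table.
REPORTED: List[Tuple[float, float, float]] = [
    (0.33, 0.98, 0.49),
    (0.81, 0.35, 0.49),
    (0.90, 0.88, 0.89),
    (0.84, 0.93, 0.88),
    (0.48, 0.96, 0.64),
    (0.72, 0.53, 0.61),
    (0.52, 0.74, 0.61),
]


def matches_reported() -> List[bool]:
    """Check each column: does the harmonic mean round to the reported value?"""
    return [round(harmonic_mean(se, sp), 2) == f for se, sp, f in REPORTED]
```

Because the more common F-measure combines positive predictive value with sensitivity instead, it helps to confirm which pair of rates was used before comparing these values across studies.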