Spiros Denaxas1,2,3,4,5, Arturo Gonzalez-Izquierdo1,2,4, Kenan Direk1,2,4, Natalie K Fitzpatrick1,2, Ghazaleh Fatemifar1,2, Amitava Banerjee1,2,5, Richard J B Dobson1,2,6,4,5, Laurence J Howe7, Valerie Kuan2,7, R Tom Lumbers1,2,5, Laura Pasea1,2, Riyaz S Patel7,5, Anoop D Shah1,2,5, Aroon D Hingorani2,7, Cathie Sudlow8,9, Harry Hemingway1,2,4,5.
Abstract
OBJECTIVE: Electronic health records (EHRs) are a rich source of information on human diseases, but the information is variably structured, fragmented, curated using different coding systems, and collected for purposes other than medical research. We describe an approach for developing, validating, and sharing reproducible phenotypes from national structured EHR in the United Kingdom with applications for translational research.
Keywords: electronic health records; medical informatics; personalized medicine; phenotyping
Year: 2019 PMID: 31329239 PMCID: PMC6857510 DOI: 10.1093/jamia/ocz105
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. The CALIBER platform (https://www.caliberresearch.org) links national structured electronic health records (EHRs) across primary care, secondary care, and mortality for research. EHR-derived phenotypes are created using an iterative methodology and 6 independent approaches of evidence are generated to assess algorithm accuracy. More than 50 phenotypes are published in an open-access resource, the CALIBER Portal (https://www.caliberresearch.org/portal), and are used in >60 publications.
Figure 2. CALIBER Portal entry for the heart failure phenotype (available at https://www.caliberresearch.org/portal/phenotypes/heartfailure). Each entry in the Portal contains implementation details on the logic and the terms from controlled clinical terminologies associated with the phenotyping algorithm. Additionally, the 6 approaches of validation evidence are presented and the research output that has used the phenotype is provided.
Overview of published, peer-reviewed EHR phenotypes derived from the CALIBER platform and the approaches of validation evidence - More information available on the CALIBER Portal https://www.caliberresearch.org/portal/phenotypes
| Column group | Columns |
|---|---|
| Phenotype | Phenotype name |
| EHR data source | Primary care; Secondary care; Death |
| Validation evidence | Cross-source; Case-note review; Prognosis; Etiology; Genetic; Cross-country |

[Per-phenotype check marks for data sources and validation evidence are not reproduced.] Phenotypes listed, in the table's three groups:

- AAA, AMI, AD, AF, Uveitis, Bleeding, Bullous disorder, CHD, Depression, Diabetes, Giant cell arteritis, HF, HIV, Hypertension, HCM, Influenza, MS, PAD, Polymyalgia, PBC, Psoriasis, Dementia NOS, RA, SA, Intracerebral hemorrhage, Ischemic stroke, SAH, Stroke NOS, SCD, Systemic sclerosis, TIA, UCD, UA, Vascular dementia, Obesity
- Blood pressure, Eosinophils, Heart rate, Lymphocytes, Neutrophils, White blood cells, LDL cholesterol, HDL cholesterol, Triglycerides, BMI
- Alcohol, Ethnicity, Pregnancy, Sex, Smoking, Deprivation
AAA: abdominal aortic aneurysm; AD: Alzheimer’s disease; AF: atrial fibrillation; AMI: acute myocardial infarction; BMI: body mass index; CHD: coronary heart disease; EHR: electronic health record; HCM: hypertrophic cardiomyopathy; HDL: high-density lipoprotein; HF: heart failure; HIV: human immunodeficiency virus; LDL: low-density lipoprotein; MS: multiple sclerosis; NOS: not otherwise specified; PAD: peripheral arterial disease; PBC: primary biliary cirrhosis; RA: rheumatoid arthritis; SA: stable angina; SAH: subarachnoid hemorrhage; SCD: sudden cardiac death; TIA: transient ischemic attack; UA: unstable angina; UCD: unheralded coronary death.
Figure 3. Assessing the recording and concordance of 3 electronic health record (EHR)–derived phenotypes (heart failure, nonfatal acute myocardial infarction [AMI], and bleeding) across 3 EHR data sources: primary care (Clinical Practice Research Datalink [CPRD]), hospital care (Hospital Episode Statistics [HES]), and mortality (Office for National Statistics [ONS]) or disease registry data (Myocardial Ischaemia National Audit Project [MINAP]). Only a very small proportion (9% for heart failure, 31% for AMI, and <1% for bleeding) of cases are identified concurrently by all 3 data sources. ICD-10: International Classification of Diseases–Tenth Revision.
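The concordance figures quoted in the Figure 3 caption are, in essence, set overlaps between the case lists produced by each data source. The following is a minimal sketch of that calculation, assuming each source has already been reduced to a set of patient identifiers. The function name `source_overlap`, the source labels, and the toy identifiers are all hypothetical and are not taken from CALIBER.

```python
from typing import Dict, Set


def source_overlap(cases_by_source: Dict[str, Set[str]]) -> dict:
    """Summarise how many cases are found in every source or in only one.

    `cases_by_source` maps a (hypothetical) source label to the set of
    patient identifiers that the phenotype flags in that source.
    """
    # Denominator: every patient flagged by at least one source.
    all_cases = set().union(*cases_by_source.values())
    if not all_cases:
        raise ValueError("no cases supplied")

    # Patients flagged by all sources at once.
    in_all = set.intersection(*cases_by_source.values())

    # Patients flagged by exactly one source.
    single = sum(
        1
        for pid in all_cases
        if sum(pid in ids for ids in cases_by_source.values()) == 1
    )

    return {
        "n_cases": len(all_cases),
        "pct_all_sources": 100 * len(in_all) / len(all_cases),
        "pct_single_source": 100 * single / len(all_cases),
    }


# Toy data, invented purely for illustration.
toy = {
    "primary_care": {"p1", "p2", "p3", "p4"},
    "hospital": {"p3", "p4", "p5"},
    "mortality": {"p4", "p6"},
}
result = source_overlap(toy)
print(result)
```

Using the union of all sources as the denominator is one possible choice. A study might instead restrict the denominator to patients eligible for linkage in every source, which would change the percentages.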
Figure 4. Risk factors for initial presentation of heart failure (HF) phenotype: hazard ratio (HR) and 95% confidence interval of smoking status, type 2 diabetes mellitus (T2DM), systolic blood pressure (BP) and heart rate based on previously published CALIBER studies, compared with estimates obtained from investigator-led studies derived using manually curated research data. All individual analyses have been adjusted for age and sex and other covariates.
Systematic validation of the CALIBER EHR-derived phenotypes for HF, AMI, and bleeding across 6 approaches of evidence: cross-EHR concordance, case-note review, etiology, prognosis, genetic associations, and external populations
| Validation domain | Description | What has been done |
|---|---|---|
| Cross-EHR concordance | To what extent is the phenotype concordant across EHR sources? | HF: The proportion of HF cases recorded in primary care and hospital care EHR was 27%. AMI: The proportion of nonfatal AMI defined across primary care, hospital care, and disease registry was 32%. Bleeding: The proportion of bleeding events recorded in primary care and hospital care was 12%, with 47% of bleeding events recorded only in primary care and 12% only in hospital care. |
| Case-note review | What is the PPV and the NPV when comparing the algorithm with clinician review of case notes or a “gold standard” source of information? | AMI: Compared with AMI defined in the disease registry, the PPV of AMI recorded in primary care was 92.2% (95% CI, 91.6%-92.8%) and in hospital admissions was 91.5% (95% CI, 90.8%-92.1%). Bleeding: On independent review by 2 clinicians, the PPV of bleeding events identified through the phenotyping algorithm was 0.88. |
| Etiology | Are the prospective associations with risk factors consistent with previous evidence? | HF: Type 2 diabetes. AMI: Type 2 diabetes. Bleeding: At 5 y, 29.1% (95% CI, 28.2%-29.9%) of atrial fibrillation patients, 21.9% (95% CI, 21.2%-22.5%) of myocardial infarction patients, 25.3% (95% CI, 24.2%-26.3%) of unstable angina patients, and 23.4% (95% CI, 23.0%-23.8%) of stable angina patients had bleeding of any kind. |
| Prognosis | Are the risks of subsequent events plausible? | HF: Corrected for age and sex, HF was strongly associated with mortality, with HRs for all-cause mortality ranging from 7.01 (95% CI, 6.83-7.20) to 7.23 (95% CI, 7.03-7.43), and up to 15.38 (95% CI, 15.02-15.83) for patients in primary care with acute HF hospitalization, primary care only, and patients hospitalized but no primary care record. AMI: Patients with myocardial infarction identified in the disease registry had lower crude 30-d mortality (10.8%; 95% CI, 10.2%-11.4%) than did those identified in hospital care (13.9%; 95% CI, 13.3%-14.4%) or in primary care (14.9%; 95% CI, 14.4%-15.5%). Of the 24 479 patients with AMI, 5775 (23.6%) developed HF during a median follow-up of 3.7 years (incidence rate per 1000 person-years, 63.8; 95% CI, 62.2-65.5). Bleeding: The HR for all-cause mortality was 1.98 (95% CI, 1.86-2.11) for primary care bleeding with markers of severity, and 1.99 (95% CI, 1.92-2.05) for hospitalized bleeding without markers of severity, compared with patients with no bleeding. |
| Genetic associations | Are the observed genetic associations plausible and concordant with previous evidence? | Consistent direction and magnitude of associations were replicated in 67 (97%) of previously reported genetic variants. |
| External populations | Has the algorithm been tested (in any of the previous validation domains) in different countries? | We observed high 3-y crude cumulative risks of all-cause death (from 19.6% [England] to 30.2% [United States]); the composite of AMI, stroke, or death (from 26.0% [France] to 36.2% [United States]); and hospitalized bleeding (from 3.1% [France] to 5.3% [United States]). |
AMI: acute myocardial infarction; CI: confidence interval; EHR: electronic health record; HF: heart failure; HR: hazard ratio; NPV: negative predictive value; PPV: positive predictive value.
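The case-note review row reports PPV (and asks about NPV) against a reference standard. As a rough illustration of how such values and a 95% CI can be computed from a 2 × 2 comparison, here is a minimal sketch. The counts are invented, and the Wilson score interval is an assumption made for this example; the source does not state which interval method was used.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> dict:
    """PPV and NPV of an algorithm versus a reference standard.

    tp/fp: algorithm-positive records confirmed / not confirmed on review.
    tn/fn: algorithm-negative records confirmed negative / found positive.
    """
    return {
        "ppv": tp / (tp + fp),
        "ppv_ci": wilson_interval(tp, tp + fp),
        "npv": tn / (tn + fn),
        "npv_ci": wilson_interval(tn, tn + fn),
    }


# Hypothetical review counts, not taken from any CALIBER study.
values = predictive_values(tp=45, fp=5, tn=180, fn=20)
print(values)
```

Note that PPV and NPV depend on how common the condition is in the reviewed sample, so values from one population do not transfer directly to another.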