Harry Hemingway1,2, Folkert W Asselbergs1,2,3, John Danesh4, Richard Dobson1,2,5, Nikolaos Maniadakis6, Aldo Maggioni6, Ghislaine J M van Thiel3, Maureen Cronin7, Gunnar Brobert8, Panos Vardas6, Stefan D Anker9,10, Diederick E Grobbee11, Spiros Denaxas1,2.
Abstract
Aims: Cohorts of millions of people's health records, whole genome sequencing, imaging, sensor, societal and publicly available data present a rapidly expanding digital trace of health. We aimed to critically review, for the first time, the challenges and potential of big data across early and late stages of translational cardiovascular disease research. Methods and results: We sought exemplars based on literature reviews and expertise across the BigData@Heart Consortium. We identified formidable challenges including: data quality, knowing what data exist, the legal and ethical framework for their use, data sharing, building and maintaining public trust, developing standards for defining disease, developing tools for scalable, replicable science and equipping the clinical and scientific work force with new inter-disciplinary skills. Opportunities claimed for big health record data include: richer profiles of health and disease from birth to death and from the molecular to the societal scale; accelerated understanding of disease causation and progression, discovery of new mechanisms and treatment-relevant disease sub-phenotypes, understanding health and diseases in whole populations and whole health systems and returning actionable feedback loops to improve (and potentially disrupt) existing models of research and care, with greater efficiency. In early translational research we identified exemplars including: discovery of fundamental biological processes e.g. linking exome sequences to lifelong electronic health records (EHR) (e.g. human knockout experiments); drug development: genomic approaches to drug target validation; precision medicine: e.g. DNA integrated into hospital EHR for pre-emptive pharmacogenomics. 
In late translational research we identified exemplars including: learning health systems with outcome trials integrated into clinical care; citizen driven health with 24/7 multi-parameter patient monitoring to improve outcomes; and population-based linkages of multiple EHR sources for higher resolution clinical epidemiology and public health.
Year: 2018 PMID: 29370377 PMCID: PMC6019015 DOI: 10.1093/eurheartj/ehx487
Source DB: PubMed Journal: Eur Heart J ISSN: 0195-668X Impact factor: 29.983
Early translation exemplars of big health record data research: discovery of disease mechanism, drug development, and precision medicine
| Health challenges | Example | Author/year | N patients (000’s) | Data sources (number and type)* | Phenotype at baseline | Longitudinal phenotypes, omics and imaging | Analysis approaches |
|---|---|---|---|---|---|---|---|
| Discovery | |||||||
| Human knockouts and health | Population based resource for experimental medicine in ‘human knockouts’ | Narasimhan | 3.222k | 3 Consented cohort Recall by genotype, EHR-1° | Parentally related Pakistani adults recruited from antenatal clinic | Exome sequencing: III rare variant genotype; predicts loss of gene function Result 1358 phenotypes | Genetics Experimental medicine Informatics |
| Discovery approaches agnostic to disease and biology | GWAS and phenome wide association studies (PheWAS) | Denny | 13.835k | EHR | 1358 phenotypes in Hospital treated patients | Genotyped 3144 SNPs | Informatics GWAS |
| Discovering new disease sub-types | Heart failure with preserved ejection fraction divided into two groups with differing outcomes | Shah | 0.397k | 5 67 parameters: physical characteristics, blood laboratory tests, ECG, echocardiography | Heart failure (preserved ejection fraction, HFpEF) | HF hospitalization | Machine learning: unbiased hierarchical cluster analysis on continuous values |
| Developing models of disease networks | Networks of more than 1000 longitudinal disease trajectories: gout important for cardiovascular disease progression | Jensen | 6.2 m | 1 EHR-2° (admissions, outpatients, casualty) | All diagnosed diseases | All diagnosed diseases (14.9 years) | Trajectory/network analysis |
| Drug discovery and repurposing | |||||||
| Drug target validation | Inactivating mutation in gene (NPC1L1) mimicking drug (ezetimibe) and effect on LDL cholesterol and coronary disease | Stitziel | 7.364k | Various, incl. BioVU & GoDARTS | CHD cases | Exon sequencing genetics of NPC1L1 | Genetics for drug target validation |
| Repurposing existing drugs | Mapping GWAS catalogues to druggable genome and 3 tiers of compounds: 8 drug target gene associations concordant, 19 discordant | Finan | >100k in 84 GWAS relevant to CVDs | 3 All GWAS, All compounds | 84 GWAS in 39 CVDs 388 associations in 670 genes, of which 135 genes druggable | All compounds with bioactivity against targets 18 844 in ChEMBL Druggable genome | Bioinformatics |
| Trial endpoint optimization | Heart failure and peripheral arterial disease are common diseases, seldom prominent in trial endpoints | Rapsomaniki | 2000k | 4 EHR-1°, QR, A, M | Healthy, free from diagnosed CVDs at baseline | 12 incident CVDs over follow up | Cohort epidemiology |
| Precision medicine | |||||||
| Pre-emptive pharmacogenomics in care | Genotypes used to select anti-platelet drug, or dosing in warfarin | Van Driest | 10k | EHR structured and text | Hospital patients at high risk of subsequent receipt of antithrombotics | Clopidogrel CYP2C19; Simvastatin SLCO1B1; Warfarin VKORC1; CYP2C9; thiopurine TPMT; tacrolimus CYP3A5 | Demonstration project |
| Tailoring drug treatment decisions to a patient's risk of benefit and harm | Prolonged dual anti-platelet therapy: Development and validation of risk prediction models for benefits (CVD death, MI and stroke) and harms (bleeding) | Pasea | 18.307k | 4 EHR-1°, QR, A, M | Stable CAD 12 months post-AMI | CVD, MI, stroke Bleeding | Multiple prognostic risk models and net benefit |
*EHR, electronic health records; QR, quality registry; A, Administrative data; M, mortality; GWAS, genome wide association study.
Late translation exemplars of big health record data research: learning health systems, citizen driven health, and public health
| Health challenges | Example | Author/year | N patients (000’s) | Data sources (number and type) | Phenotype at baseline | Longitudinal phenotypes, omics and imaging | Design/Analysis/Disciplines |
|---|---|---|---|---|---|---|---|
| Learning health systems | |||||||
| Integrating trials in clinical care | Thrombus aspiration at the time of primary coronary intervention (TASTE trial) has no impact on short or long term outcomes | Fröbert | 7.244k | 3 QR, A, M | STEMI, Angio findings | Follow up for ACM, stent restem, uf MI (1 yr) | RCT: Point of care, registry embedded, pragmatic |
| Comparing effectiveness of whole health systems | Large differences in care and outcomes between UK and Sweden (all hospitals) | Chung | 500k | 2 QR, M | NSTEMI STEMI | ACM (30 d) | Survival analysis |
| Vigilance for safety | Mining text could have detected the Vioxx – acute MI signal earlier than conventional pharmacoepidemiology approaches | LePendu | 1.8k | 1 hospital records structured and text | All diagnosed diseases rofecoxib | All drug safety signals including (acute MI) 11 m clinical notes | Text mining |
| Targeting cost effective care | Cost effectiveness decision models provide willingness to pay estimates in different risk groups, and different treatment benefits for stable coronary disease | Asaria | 100k | 4 EHR-primary care, QR, A, M | Stable CAD, stable angina (xxx AMI) | All hospitals, procedures, drug use, resource use | Health economics |
| Citizen driven health | |||||||
| Real time real world 24/7 monitoring: the ‘sensed self’ | Pacemaker monitoring might lower event rates | Hindricks | 0.716k | 4 Implantable monitor | Heart failure Multi-parameter monitoring | Primary outcome was: ACM, hospitalization for heart failure, worsening of NYHA class. | RCT of a detailed monitoring intervention |
| Delivering individualized interventions through mobile phones | Texts might increase smoking cessation | Free | 5.8k | | Smoker | Cessation Text messaging | RCT of behavioural intervention |
| Understanding the public through social media | Twitter language might predict community heart disease rates | Eichstaedt | 148 m county mapped | CDC atherosclerosis Twitter | Mapping of words used in tweets to psychological constructs | N/A | Ecological correlations at county level |
| Public health | |||||||
| Epidemiology of all CVDs and clinically relevant sub-types of disease | Incidence and survival of NSTEMI and STEMI, ICD9-CM | Yeh | 3000k | 2 HMO, M | 46 086 hospitalizations STEMI/NSTEMI | All-cause mortality, 30 day | Cohort |
| Rare disease epidemiology | Rare disease: valid EHR phenotypes & new associations with coronary disease [HCM] | Pujades-Rodriguez | 1.16k | 4 EHR-primary care A, QR, M | Hypertrophic cardiomyopathy | Coronary, stroke, HF, arrhythmia, bleeding, DVT/PE at 4 years follow up | Cohort |
| Evaluating population impact of interventions | Introduction of smoke free legislation in different countries at different times: impact on admissions to hospital with heart attack England smoke-free 1 July 2007 | Sims | millions | 1 HES | MI admission 1 July 2002–30 September 2008 | N/A | Natural experiment Time series analysis |
EHR, electronic health records; QR, quality registry; A, Administrative data; M, mortality; GWAS, genome wide association study; HCM, Hypertrophic Cardiomyopathy; MI, myocardial infarction; STEMI, ST-segment elevation MI; NSTEMI, Non ST-segment elevation MI; CAD, Coronary Artery Disease; ACM, All-Cause Mortality; RCT, Randomized Clinical Trial; NYHA, New York Heart Association; ICD9-CM, International Classification of Diseases 9th revision – Clinical Modifications; CVD, Cardiovascular Disease; HMO, Health Management Organisation; CDC, Centres for Disease Control and Prevention; HES, Hospital Episode Statistics; DVT, Deep Vein Thrombosis; PE, Pulmonary Embolism.