Abstract
The convergence of two rapidly developing technologies - high-throughput genotyping and electronic health records (EHRs) - gives scientists an unprecedented opportunity to utilize routine healthcare data to accelerate genomic discovery. Institutions and healthcare systems have been building EHR-linked DNA biobanks to enable such a vision. However, the precise extraction of detailed disease and drug-response phenotype information hidden in EHRs is not an easy task. EHR-based studies have successfully replicated known associations, made new discoveries for diseases and drug response traits, rapidly contributed cases and controls to large meta-analyses, and demonstrated the potential of EHRs for broad-based phenome-wide association studies. In this review, we summarize the advantages and challenges of repurposing EHR data for genetic research. We also highlight recent notable studies and novel approaches to provide an overview of advanced EHR-based phenotyping.
Year: 2015 PMID: 25937834 PMCID: PMC4416392 DOI: 10.1186/s13073-015-0166-y
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Efforts and incentives to leverage clinical data for genomics research
| Project | Country | Year | Goal |
|---|---|---|---|
| eMERGE | United States | 2007 | To develop methods and best practices for the utilization of EHRs for genetic research |
| i2b2 | United States | 2004 | To provide researchers with useful tools to leverage EHRs for clinical and genetic research |
| PGPop | United States | 2010 | To understand how a person’s genes affect his or her response to medicines |
| deCODE genetics | Iceland | 1996 | To leverage population-based and EHR-linked biosamples to investigate inherited causes of common diseases |
| UK Biobank | United Kingdom | 2007 | To improve the prevention, diagnosis and treatment of a wide range of serious and life-threatening illnesses through a collection of around 500,000 volunteers' biosamples and clinical information |
| MVP | United States | 2011 | To enroll one million volunteers and use their clinical and genetic data to improve health care for veterans |
| KP RPGEH | United States | 2009 | To examine the genetic and environmental factors that influence common diseases |
| CKB | China | 2004 | To explore the complex interplay between genes and environmental factors on the risks of common chronic diseases |
CKB, China Kadoorie Biobank; eMERGE, The Electronic Medical Records and Genomics Network; i2b2, Informatics for Integrating Biology and the Bedside; KP, Kaiser Permanente; MVP, Million Veteran Program; PGPop, Pharmacogenomic Discovery and Replication in Very Large Patient Populations; RPGEH, Research Program on Genes, Environment, and Health.
Figure 1. Algorithm for the identification of subjects with type 2 diabetes. Abnormal laboratory values are random glucose >200 mg/dl, fasting glucose >125 mg/dl, or HbA1c ≥6.5%. Dx, diagnosis; HbA1c, hemoglobin A1c; ICD-9, International Classification of Diseases, Ninth Revision; Rx, treatment; T1DM, type 1 diabetes mellitus; T2DM, type 2 diabetes mellitus. Figure reprinted with permission from Kho et al. [57].
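The laboratory cut-offs quoted in the Figure 1 caption can be written as a small check. The sketch below is illustrative only: it covers just the lab-value step, not the diagnosis-code or medication branches of the published algorithm, and the `LabResult` type, its field names and the test labels are assumptions introduced here rather than part of Kho et al. [57].

```python
from dataclasses import dataclass
from typing import Iterable

# Cut-offs taken from the Figure 1 caption.
RANDOM_GLUCOSE_MG_DL = 200.0   # flagged when strictly greater
FASTING_GLUCOSE_MG_DL = 125.0  # flagged when strictly greater
HBA1C_PERCENT = 6.5            # flagged when greater than or equal


@dataclass
class LabResult:
    """One lab measurement (hypothetical structure, for illustration)."""
    test: str     # "random_glucose", "fasting_glucose" or "hba1c"
    value: float  # mg/dl for glucose, percent for HbA1c


def is_abnormal(result: LabResult) -> bool:
    """Return True when a single result crosses its caption cut-off."""
    if result.test == "random_glucose":
        return result.value > RANDOM_GLUCOSE_MG_DL
    if result.test == "fasting_glucose":
        return result.value > FASTING_GLUCOSE_MG_DL
    if result.test == "hba1c":
        return result.value >= HBA1C_PERCENT
    raise ValueError(f"unknown test: {result.test}")


def has_abnormal_lab(results: Iterable[LabResult]) -> bool:
    """Return True when at least one result in the record is flagged."""
    return any(is_abnormal(r) for r in results)
```

For example, `has_abnormal_lab([LabResult("fasting_glucose", 110.0), LabResult("hba1c", 6.5)])` is true because the HbA1c cut-off is inclusive, whereas a random glucose of exactly 200 mg/dl would not be flagged.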
Figure 2. EHR data structure and accurate phenotyping. (a) Electronic health record (EHR) data can be structured or unstructured. Structured data are easy to retrieve whereas unstructured data require additional tools to be used for phenotyping, such as natural language processing (NLP). (b) Accurate phenotyping often requires extracting information from billing codes, prescriptions, laboratory tests and clinical notes. This information can be either structured or unstructured. ICD-9, International Classification of Diseases, Ninth Revision.
Figure 3. The numbers of GWAS papers and EHR-based genetic studies per year. The horizontal axis represents time. The vertical axis is the log of the number of publications. Data source: National Human Genome Research Institute GWAS Catalog and PubMed.