Shona M Kerr1, Archie Campbell2, Jonathan Marten1, Veronique Vitart1, Andrew M McIntosh3,4, David J Porteous2,5, Caroline Hayward1.
Abstract
This article provides the first detailed demonstration of the research value of the Electronic Health Record (EHR) linked to research data in Generation Scotland Scottish Family Health Study (GS:SFHS) participants, together with how to access this data. The structured, coded variables in the routine biochemistry, prescribing and morbidity records, in particular, represent highly valuable phenotypic data for a genomics research resource. Access to a wealth of other specialized datasets, including cancer, mental health and maternity inpatient information, is also possible through the same straightforward and transparent application process. The EHR linked dataset is a key component of GS:SFHS, a biobank conceived in 1999 for the purpose of studying the genetics of health areas of current and projected public health importance. Over 24,000 adults were recruited from 2006 to 2011, with broad and enduring written informed consent for biomedical research. Consent was obtained from 23,603 participants for GS:SFHS study data to be linked to their Scottish National Health Service (NHS) records, using their Community Health Index number. This identifying number is used for NHS Scotland procedures (registrations, attendances, samples, prescribing and investigations) and allows healthcare records for individuals to be linked across time and location. Here, we describe the NHS EHR dataset on the sub-cohort of 20,032 GS:SFHS participants with consent and mechanism for record linkage plus extensive genetic data. Together with existing study phenotypes, including family history and environmental exposures, such as smoking, the EHR is a rich resource of real-world data that can be used in research to characterise the health trajectory of participants, available at low cost and a high degree of timeliness, matched to DNA, urine and serum samples and genome-wide genetic information.
Keywords: Biobank; Data; Electronic Health Record; Generation Scotland; Genotype
Year: 2017 PMID: 29062915 PMCID: PMC5645708 DOI: 10.12688/wellcomeopenres.12600.1
Source DB: PubMed Journal: Wellcome Open Res ISSN: 2398-502X
Figure 1. Schematic illustrating the mechanism for research data analyses.
The datasets available in Generation Scotland and Electronic Health Records (EHRs) are indicated, with numbers of participants and records. The Manhattan plot displays the results of a genome-wide association analysis (GWAS) using genotyped SNPs and EHR-derived serum urate measurements, as an example of how the datasets can be used together in genetic research by approved researchers. The single highest serum urate reading was taken for each participant, with covariates and methods for accounting for relatedness as previously described [5]. The −log₁₀(P-value) is plotted on the y-axis, and chromosomal location is plotted on the x-axis. The genome-wide significance threshold accounting for multiple testing (p-value < 5 × 10⁻⁸) is indicated by a red line, while suggestive significance (p-value < 10⁻⁵) is indicated by a blue line.
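The two steps named in this caption, keeping the single highest reading per participant and placing p-values on the −log₁₀ scale against the two thresholds, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the participant IDs, urate values and function names are hypothetical, and the covariate adjustment and relatedness modelling described in [5] are not reproduced here.

```python
import math

# Hypothetical long-format EHR rows: (participant_id, serum urate reading).
# IDs and values are invented for illustration only.
records = [("P1", 0.31), ("P1", 0.42), ("P2", 0.28), ("P2", 0.25), ("P3", 0.39)]

def highest_reading_per_participant(rows):
    """Keep the single highest reading for each participant."""
    best = {}
    for pid, value in rows:
        if pid not in best or value > best[pid]:
            best[pid] = value
    return best

# Thresholds quoted in the figure caption.
GENOME_WIDE = 5e-8   # red line
SUGGESTIVE = 1e-5    # blue line

def neg_log10(p):
    """Transform a p-value to the y-axis scale of a Manhattan plot."""
    return -math.log10(p)

def classify(p):
    """Label a p-value relative to the two caption thresholds."""
    if p < GENOME_WIDE:
        return "genome-wide"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"

phenotype = highest_reading_per_participant(records)
```

On the plot's y-axis the red line therefore sits at about 7.3 and the blue line at 5.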
Figure 2. Generation Scotland Scottish Family Health Study participants with categories of Electronic Health Record (EHR) data available through Community Health Index (CHI) linkage.
The light blue bars indicate the numbers of participants with linked data but no genetic data available; exact numbers are given in parentheses below. The dark blue bars indicate the numbers of participants with genome-wide genotype data (Illumina OmniExpressExome array) available in each EHR dataset. CHI (n = 22,408); Any SMR (Scottish Morbidity Record) (n = 21,651); SMR00, Outpatient (n = 20,777); SMR01, General acute/Inpatient (n = 18,686); SMR02, Maternity Inpatient (n = 7,804); SMR11, Neonatal Inpatient (n = 3,284); SMR06, Scottish Cancer Registry (n = 2,562); SMR04, Mental Health Inpatient (n = 498); Died (n = 768); SIMD (Scottish Index of Multiple Deprivation) (n = 21,021); PIS (Prescribing Information System) (n = 21,029); Biochem (biochemistry laboratory data) (n = 19,233).
The 30 most frequently collected serum biochemistry measures, by number of unique participants.
The description of each measure, local code, Read Code, total number of records, number of unique participants and unique participants with genotype data are shown. Totals are for data collected from 2006 to 2016 on participants aged 18 or over.
| Description | Local code | Read Code | Total records | Unique participants | Unique participants with genotype data |
|---|---|---|---|---|---|
| Creatinine | CR | 44J3. | 206498 | 17393 | 16069 |
| Sodium | NA | 44I5. | 205738 | 17387 | 16056 |
| Urea | UREA | 44J9. | 193348 | 17355 | 16038 |
| Potassium | KA | 44I4. | 201746 | 17315 | 15995 |
| eGFR | eGFR | 451E. | 195396 | 17101 | 15885 |
| Glucose | GL | 44g.. | 89325 | 16604 | 15424 |
| Alkaline phosphatase | AP | 44F.. | 224274 | 15564 | 14148 |
| Bilirubin | BI | 44EC. | 148662 | 15220 | 14088 |
| Albumin | AL | 44M4. | 225920 | 15163 | 14107 |
| Total Cholesterol | CHO | 44P.. | 72951 | 15106 | 14187 |
| HDL Cholesterol | HDL | 44P5. | 68357 | 14748 | 13855 |
| ALT/SGPT | ALT | 44G3. | 136145 | 13545 | 12541 |
| TSH | TSH | 44TW. | 60750 | 13166 | 12203 |
| C-Reactive Protein | CRP | 44CS. | 81382 | 11807 | 10938 |
| Corrected calcium | CC | 44IC. | 47913 | 11642 | 10767 |
| Calcium | CA | 44I8. | 67949 | 11631 | 10757 |
| Triglycerides | TRIG | 44Q.. | 34374 | 8373 | 7769 |
| Gamma-Glutamyltransferase | GT | 44G9. | 38172 | 7351 | 6715 |
| Free Thyroxine | FT4 | 442V. | 26892 | 7272 | 6693 |
| Protein (Total) | TP | 44M3. | 39303 | 6925 | 6341 |
| Chloride | PCL | 44I6. | 80031 | 6749 | 6191 |
| Phosphate | PH | 44I9. | 33796 | 6360 | 5825 |
| Bicarbonate | BIC | 44I7. | 23877 | 4197 | 3843 |
| Creatine kinase | CK | 44HG. | 10614 | 3907 | 3623 |
| Magnesium | MG | 44LD. | 18331 | 3794 | 3470 |
| Amylase | AMY | 44CN. | 8856 | 3498 | 3222 |
| Urate | URIC | 44K5. | 6688 | 3002 | 2780 |
| LDL Cholesterol (calc) | CLDL | 44PI. | 15269 | 3223 | 2968 |
| FSH | FSH | 443h. | 4272 | 2766 | 2580 |
| Transferrin | TRF | 44CB. | 6255 | 2734 | 2507 |
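
As a rough guide to reading the table, two summaries can be derived from its columns: the mean number of records per participant, and the share of participants with a measure who also have genotype data. The sketch below is illustrative only. It copies three rows from the table, and the function and field names are hypothetical.

```python
# Three rows copied from the table:
# (description, total records, unique participants, unique with genotype data)
rows = [
    ("Creatinine", 206498, 17393, 16069),
    ("Glucose", 89325, 16604, 15424),
    ("Urate", 6688, 3002, 2780),
]

def summarise(table_rows):
    """Return per-measure mean records per participant and genotyped fraction."""
    summary = {}
    for name, total, unique, genotyped in table_rows:
        summary[name] = {
            "records_per_participant": total / unique,
            "genotyped_fraction": genotyped / unique,
        }
    return summary

summary = summarise(rows)
for name, stats in summary.items():
    print(f"{name}: {stats['records_per_participant']:.1f} records/participant, "
          f"{stats['genotyped_fraction']:.1%} genotyped")
```

Routinely ordered measures such as creatinine have many repeat readings per participant, whereas urate has only a few, which is relevant when choosing a single value per participant for analysis.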