Spiros Denaxas, Anoop D Shah, Bilal A Mateen, Valerie Kuan, Jennifer K Quint, Natalie Fitzpatrick, Ana Torralbo, Ghazaleh Fatemifar, Harry Hemingway.
Abstract
OBJECTIVES: The UK Biobank (UKB) is making primary care electronic health records (EHRs) for 500 000 participants available for COVID-19-related research. Data are extracted from four sources, recorded using five clinical terminologies and stored in different schemas. The aims of our research were to: (a) develop a semi-supervised approach for bootstrapping EHR phenotyping algorithms in UKB EHR, and (b) evaluate our approach by implementing and evaluating phenotypes for 31 common biomarkers.
Keywords: UK Biobank; electronic health records; medical informatics; phenotyping
Year: 2020 PMID: 33619467 PMCID: PMC7717266 DOI: 10.1093/jamiaopen/ooaa047
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Primary care electronic health record data made available on UK Biobank participants
| Country | Data source | Controlled clinical terminologies: clinical observations | Controlled clinical terminologies: prescriptions | Patients | Clinical events | Prescription events | Data fields |
|---|---|---|---|---|---|---|---|
| England | Vision | Read v2 | Read v2, DM+D | 17 860 | 11 973 249 | 6 350 259 | 2 |
| Scotland | EMIS, Vision | Read v2 | BNF | 26 269 | 11 365 300 | 4 301 151 | 3 |
| England | TPP | CTV3 | BNF | 158 894 | 87 493 722 | 39 515 266 | 1 |
| Wales | EMIS, Vision | Read v2 | Read v2 | 20 463 | 12 837 100 | 7 533 324 | 2 |
Note: The number of patients reported was extracted from the registrations table and includes patients with one or more unique registration periods.
BNF: British National Formulary; CTV3: Clinical Terms Version 3; DM+D: Dictionary of Medicines and Devices; EMIS: Egton Medical Information Systems; TPP: The Phoenix Partners.
Details on the 31 biomarkers used in this study spanning blood biochemistry, blood count and physical measures
| Phenotype | UK Biobank field id | Phenotype type | Units | UnitOntology |
|---|---|---|---|---|
| ALP | 30610 | Blood biochemistry | U/L | UO_0000179 |
| ALT | 30620 | Blood biochemistry | U/L | UO_0000179 |
| Albumin | 30600 | Blood biochemistry | g/L | UO_0000175 |
| CRP | 30710 | Blood biochemistry | mg/L | UO_0000273 |
| Calcium | 30680 | Blood biochemistry | mmol/L | UO_0010003 |
| Cholesterol | 30690 | Blood biochemistry | mmol/L | UO_0010003 |
| Creatinine | 30700 | Blood biochemistry | umol/L | UO_0010003 |
| Glucose | 30740 | Blood biochemistry | mmol/L | UO_0010003 |
| HDL | 30760 | Blood biochemistry | mmol/L | UO_0010003 |
| HbA1c | 30750 | Blood biochemistry | mmol/mol | UO_0010048 |
| Total bilirubin | 30840 | Blood biochemistry | umol/L | UO_0010003 |
| Triglycerides | 30870 | Blood biochemistry | mmol/L | UO_0010003 |
| Urea | 30670 | Blood biochemistry | mmol/L | UO_0010003 |
| Basophils | 30160 | Blood count | 10^9/L | UO_0000317 |
| Eosinophils | 30150 | Blood count | 10^9/L | UO_0000317 |
| Hematocrit perc | 30030 | Blood count | % | UO_0000187 |
| Hemoglobin conc | 30020 | Blood count | g/dL | UO_0000208 |
| Lymphocytes | 30120 | Blood count | 10^9/L | UO_0000317 |
| MCHb conc | 30060 | Blood count | g/dL | UO_0000208 |
| MCV | 30040 | Blood count | fL | UO_0000104 |
| Monocytes | 30130 | Blood count | 10^9/L | UO_0000317 |
| Neutrophils | 30140 | Blood count | 10^9/L | UO_0000317 |
| Platelets | 30080 | Blood count | 10^9/L | UO_0000317 |
| RBC | 30010 | Blood count | 10^12/L | UO_0000317 |
| WBC | 30000 | Blood count | 10^9/L | UO_0000317 |
| DBP | 4079 | Physical measures | mmHg | UO_0000272 |
| FEV1 | 3063 | Physical measures | L | UO_0000099 |
| FVC | 3062 | Physical measures | L | UO_0000099 |
| Height | 50 | Physical measures | cm | UO_0000015 |
| SBP | 4080 | Physical measures | mmHg | UO_0000272 |
| Weight | 21002 | Physical measures | kg | UO_0000009 |
Note: For units, we provide the UnitOntology entry identifier. The UK Biobank field id column provides the field identifier for the respective biomarker measure, if available, derived from the research data collected at baseline.
ALT: alanine aminotransferase level; ALP: alkaline phosphatase level; CRP: C-reactive protein; DBP: diastolic blood pressure; FEV1: forced expiratory volume in 1 second; FVC: forced vital capacity; HDL: high-density lipoprotein; MCHb conc: mean corpuscular hemoglobin concentration; MCV: mean corpuscular volume; RBC: red blood cell; SBP: systolic blood pressure; WBC: white blood cell.
Figure 1. Description of the main steps involved in the semi-supervised approach for rapidly creating electronic health record phenotyping algorithms for biomarkers in the UK Biobank. The main steps involved in the semi-supervised phenotyping process are: (1) seeding the algorithm definitions using existing phenotype algorithms from the CALIBER resource, (2) excluding generic, rare or semantically distant terms, (3) mapping Read version 2 terms to Clinical Terms Version 3 terms using the maps provided by the National Health Service (NHS) terminology service (TRUD), (4) expert review and manual inclusion/exclusion of terms, and (5) translation to SQL code and data extraction.
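The five steps in Figure 1 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the term codes (`R2_A`, `C3_A`, ...), the usage-count threshold, and the table and column names (`clinical_events`, `ctv3_code`, `patient_id`, `event_date`, `value`) are all hypothetical placeholders, and the "semantically distant" filter and expert review are reduced here to a simple exclusion set.

```python
def bootstrap_codelist(seed_read2, usage_counts, read2_to_ctv3,
                       min_count=10, excluded=frozenset()):
    """Return (read2_codes, ctv3_codes) for one phenotype.

    seed_read2    -- Read v2 terms taken from an existing algorithm (step 1)
    usage_counts  -- how often each term occurs in the data (assumed input)
    read2_to_ctv3 -- Read v2 -> CTV3 map, e.g. loaded from a mapping file
    excluded      -- terms rejected as generic/distant or by a reviewer
    """
    # Steps 2 and 4: drop rare terms and anything explicitly excluded.
    read2 = sorted(
        code for code in set(seed_read2)
        if usage_counts.get(code, 0) >= min_count and code not in excluded
    )
    # Step 3: translate the surviving Read v2 terms to CTV3.
    ctv3 = sorted({
        target
        for code in read2
        for target in read2_to_ctv3.get(code, ())
        if target not in excluded
    })
    return read2, ctv3


def to_sql(codes, column, table="clinical_events"):
    """Step 5: build a parameterised query for a list of codes."""
    if not codes:
        raise ValueError("empty code list")
    placeholders = ", ".join("?" for _ in codes)
    sql = (f"SELECT patient_id, event_date, value FROM {table} "
           f"WHERE {column} IN ({placeholders})")
    return sql, list(codes)


# Made-up example data, for illustration only.
seed = ["R2_A", "R2_B", "R2_C"]
counts = {"R2_A": 500, "R2_B": 3, "R2_C": 120}
mapping = {"R2_A": ["C3_A"], "R2_C": ["C3_C1", "C3_C2"]}

read2, ctv3 = bootstrap_codelist(seed, counts, mapping,
                                 min_count=10, excluded={"C3_C2"})
sql, params = to_sql(ctv3, column="ctv3_code")
```

Keeping the code list as query parameters rather than interpolating it into the SQL string means the same statement can be reused for every phenotype and code list.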
Figure 2. Flow diagram showing the number of Read v2 and CTV3 terms identified by the algorithm and subsequent inclusions and exclusions performed through expert review. CTV3: Clinical Terms Version 3.
Figure 3. Histogram plots showing the distribution of values extracted from primary care EHR for the clinical biomarkers defined in this study. The dashed red line represents the mean value of the biomarker when measured at baseline (across any of the three waves) in study participants (value extracted from the UK Biobank Showcase). Minimum and maximum graph values have been aligned to those reported on the baseline measurements. ALT: alanine aminotransferase level; ALP: alkaline phosphatase level; CRP: C-reactive protein; DBP: diastolic blood pressure; FEV1: forced expiratory volume in 1 second; FVC: forced vital capacity; HDL: high-density lipoprotein; MCHb conc: mean corpuscular hemoglobin concentration; MCV: mean corpuscular volume; RBC: red blood cell; SBP: systolic blood pressure; WBC: white blood cell.
Figure 4. Boxplot showing the distribution of values extracted from primary care EHR for the clinical biomarkers defined in this study. 1 = England Vision, 2 = Scotland EMIS and Vision, 3 = England TPP and 4 = Wales. Minimum and maximum graph values have been aligned to those reported on the baseline measurements. ALT: alanine aminotransferase level; ALP: alkaline phosphatase level; CRP: C-reactive protein; DBP: diastolic blood pressure; FEV1: forced expiratory volume in 1 second; FVC: forced vital capacity; HDL: high-density lipoprotein; MCHb conc: mean corpuscular hemoglobin concentration; MCV: mean corpuscular volume; RBC: red blood cell; SBP: systolic blood pressure; WBC: white blood cell.
Descriptive statistics (median and IQR) on the clinical biomarkers defined in this study covering blood counts, key biochemistry markers and physical measurements
| Phenotype | Category | UKB id | Units | Eng. Vision events | Eng. Vision median (IQR) | Scotland events | Scotland median (IQR) | Eng. TPP events | Eng. TPP median (IQR) | Wales events | Wales median (IQR) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ALP | Blood biochemistry | 30610 | U/L | 147 431 | 73.00 (27.00) | 65 017 | 74.00 (28.00) | 907 659 | 74.00 (31.00) | 169 068 | 73.00 (27.00) |
| ALT | Blood biochemistry | 30620 | U/L | 114 591 | 23.00 (14.00) | 54 724 | 20.00 (12.00) | 891 694 | 23.00 (13.00) | 92 361 | 22.00 (13.00) |
| Albumin | Blood biochemistry | 30600 | g/L | 144 475 | 42.00 (4.00) | 33 063 | 43.00 (5.00) | 954 748 | 42.00 (5.00) | 164 884 | 42.00 (5.00) |
| CRP | Blood biochemistry | 30710 | mg/L | 23 549 | 2.00 (3.00) | 2510 | 3.00 (4.00) | 89 406 | 4.00 (3.00) | 18 104 | 4.00 (3.30) |
| Calcium | Blood biochemistry | 30680 | mmol/L | 54 796 | 2.34 (0.14) | 7367 | 2.33 (0.14) | 282 652 | 2.35 (0.14) | 42 470 | 2.35 (0.14) |
| Cholesterol | Blood biochemistry | 30690 | mmol/L | 140 222 | 5.20 (1.60) | 75 828 | 5.00 (1.70) | 978 057 | 5.10 (1.70) | 138 801 | 5.10 (1.60) |
| Creatinine | Blood biochemistry | 30700 | umol/L | 164 360 | 79.00 (23.00) | 102 491 | 75.00 (22.00) | 1 147 028 | 81.00 (23.00) | 204 486 | 76.00 (21.00) |
| Glucose | Blood biochemistry | 30740 | mmol/L | 41 394 | 5.10 (0.90) | 17 603 | 5.10 (1.10) | 282 383 | 5.10 (1.00) | 28 830 | 5.20 (1.10) |
| HDL | Blood biochemistry | 30760 | mmol/L | 121 635 | 1.40 (0.50) | 46 453 | 1.40 (0.57) | 764 837 | 1.40 (0.52) | 92 854 | 1.30 (0.50) |
| HbA1c | Blood biochemistry | 30750 | mmol/mol | 33 799 | 40.00 (6.00) | 5915 | 44.00 (8.00) | 175 995 | 40.00 (7.00) | 21 659 | 41.00 (9.00) |
| Total bilirubin | Blood biochemistry | 30840 | umol/L | 137 775 | 9.00 (5.00) | 64 802 | 9.00 (5.00) | 903 641 | 10.00 (6.00) | 140 003 | 9.00 (5.00) |
| Triglycerides | Blood biochemistry | 30870 | mmol/L | 116 308 | 1.30 (0.90) | 40 272 | 1.40 (0.93) | 717 562 | 1.37 (0.90) | 128 464 | 1.36 (0.90) |
| Urea | Blood biochemistry | 30670 | mmol/L | 148 897 | 5.50 (1.80) | 75 630 | 5.60 (1.90) | 1 022 366 | 5.50 (1.90) | 58 332 | 5.30 (1.80) |
| Basophils | Blood count | 30160 | 10^9/L | 129 939 | 0.02 (0.10) | 53 759 | 0.02 (0.05) | 832 814 | 0.02 (0.06) | 138 889 | 0.03 (0.10) |
| Eosinophils | Blood count | 30150 | 10^9/L | 130 840 | 0.20 (0.10) | 52 681 | 0.15 (0.12) | 828 186 | 0.16 (0.12) | 142 025 | 0.20 (0.12) |
| Hematocrit perc | Blood count | 30030 | % | 4055 | 41.10 (5.10) | 11 | 45.00 (5.45) | 11 971 | 41.70 (5.30) | 11 192 | 41.20 (5.00) |
| Hemoglobin conc | Blood count | 30020 | g/dL | 78 815 | 13.70 (1.80) | 21 512 | 13.60 (1.90) | 903 286 | 13.70 (1.70) | 92 144 | 13.70 (1.80) |
| Lymphocytes | Blood count | 30120 | 10^9/L | 133 853 | 1.80 (0.80) | 50 032 | 1.79 (0.82) | 842 332 | 1.89 (0.80) | 142 637 | 1.80 (0.80) |
| MCHb conc | Blood count | 30060 | g/dL | 64 138 | 33.50 (1.40) | 28 062 | 33.50 (1.70) | 607 386 | 33.40 (1.40) | 33 023 | 33.60 (1.30) |
| MCV | Blood count | 30040 | fL | 136 768 | 91.00 (6.00) | 54 055 | 90.00 (6.30) | 872 756 | 90.90 (6.10) | 147 161 | 91.00 (6.00) |
| Monocytes | Blood count | 30130 | 10^9/L | 133 696 | 0.50 (0.20) | 52 727 | 0.50 (0.27) | 839 930 | 0.50 (0.21) | 141 545 | 0.50 (0.20) |
| Neutrophils | Blood count | 30140 | 10^9/L | 133 588 | 3.40 (1.65) | 53 374 | 3.57 (1.86) | 845 892 | 3.50 (1.79) | 143 045 | 3.50 (1.71) |
| Platelets | Blood count | 30080 | 10^9/L | 137 935 | 244.00 (84.00) | 55 317 | 249.00 (85.00) | 879 866 | 250.00 (84.00) | 145 591 | 251.00 (83.00) |
| RBC | Blood count | 30010 | 10^12/L | 135 140 | 4.50 (0.58) | 54 030 | 4.47 (0.63) | 869 832 | 4.55 (0.59) | 146 016 | 4.52 (0.58) |
| WBC | Blood count | 30000 | 10^9/L | 140 068 | 6.10 (2.25) | 55 830 | 6.20 (2.50) | 892 739 | 6.25 (2.30) | 147 751 | 6.20 (2.30) |
| DBP | Physical measures | 4079 | mmHg | 357 987 | 80.00 (14.00) | 393 765 | 80.00 (13.00) | 2 833 375 | 80.00 (15.00) | 417 257 | 80.00 (14.00) |
| FEV1 | Physical measures | 3063 | L | 6238 | 2.08 (0.99) | 1430 | 1.79 (0.95) | 47 847 | 2.03 (1.04) | 6214 | 2.00 (1.04) |
| FVC | Physical measures | 3062 | L | 2792 | 2.95 (1.29) | 233 | 2.96 (1.15) | 32 277 | 3.00 (1.31) | 3778 | 2.89 (1.26) |
| Height | Physical measures | 50 | cm | 144 | 168.00 (13.12) | 63 262 | 167.00 (14.00) | 1069 | 166.00 (15.00) | 27 449 | 167.64 (14.98) |
| SBP | Physical measures | 4080 | mmHg | 358 071 | 136.00 (22.00) | 212 521 | 135.00 (21.00) | 2 836 175 | 136.00 (22.00) | 418 084 | 137.00 (23.00) |
| Weight | Physical measures | 21002 | kg | 142 704 | 77.00 (23.50) | 181 172 | 77.90 (23.32) | 1 137 892 | 78.00 (23.50) | 148 988 | 80.00 (25.00) |
Note: Statistics were stratified by data provider: 1 = England Vision, 2 = Scotland EMIS and Vision, 3 = England TPP and 4 = Wales.
ALT: alanine aminotransferase level; ALP: alkaline phosphatase level; CRP: C-reactive protein; DBP: diastolic blood pressure; FEV1: forced expiratory volume in 1 second; FVC: forced vital capacity; HDL: high-density lipoprotein; MCHb conc: mean corpuscular hemoglobin concentration; MCV: mean corpuscular volume; RBC: red blood cell; SBP: systolic blood pressure; WBC: white blood cell.
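The per-provider event counts, medians and IQRs in the table above can be reproduced in outline with the standard library. The sketch below assumes the extracted measurements are available as `(data_provider, value)` pairs; that record layout, the function name and the example values are illustrative assumptions, and the quartile method used in the study is not stated, so the choice of `method="inclusive"` is also an assumption.

```python
from collections import defaultdict
from statistics import median, quantiles


def median_iqr_by_provider(records):
    """Summarise (provider, value) pairs as {provider: (n, median, iqr)}."""
    by_provider = defaultdict(list)
    for provider, value in records:
        by_provider[provider].append(value)

    summary = {}
    for provider, values in sorted(by_provider.items()):
        if len(values) >= 2:
            # Quartiles via linear interpolation over the observed range.
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
            iqr = q3 - q1
        else:
            iqr = 0.0  # a single observation has no spread
        summary[provider] = (len(values), median(values), iqr)
    return summary


# Made-up measurements: provider 1 = England Vision, 3 = England TPP.
records = [(1, 4.0), (1, 5.0), (1, 6.0), (1, 7.0), (1, 8.0),
           (3, 5.0), (3, 5.5)]
summary = median_iqr_by_provider(records)
```

Different quartile conventions give slightly different IQRs on small samples, which matters for sparsely recorded measures such as hematocrit in Scotland (11 events).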
Figure 5. Adjusted Cox proportional hazards regression restricted cubic spline models for all biomarkers and all-cause mortality. Analyses were adjusted for patient sex and age. In each panel, the blue line indicates the estimated hazard ratio (HR) and the gray shading denotes the 95% confidence limits. The horizontal dashed line corresponds to the reference HR of 1.0; values above it are associated with increased mortality risk, and values below it with decreased mortality risk, compared with the reference value. ALT: alanine aminotransferase level; ALP: alkaline phosphatase level; CRP: C-reactive protein; DBP: diastolic blood pressure; FEV1: forced expiratory volume in 1 second; FVC: forced vital capacity; HDL: high-density lipoprotein; MCHb conc: mean corpuscular hemoglobin concentration; MCV: mean corpuscular volume; RBC: red blood cell; SBP: systolic blood pressure; WBC: white blood cell.
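To make the curves in Figure 5 concrete, the sketch below shows how a restricted cubic spline turns a biomarker value into basis terms, and how fitted coefficients translate into a hazard ratio relative to a reference value. It uses the common truncated-power formulation of the basis; the knot positions, the coefficients and the reference value are invented for illustration, and fitting the Cox model itself (including the sex and age adjustment) is out of scope here.

```python
import math


def rcs_basis(x, knots):
    """Restricted cubic spline basis for x: k knots give k - 1 terms.

    The first term is x itself; the remaining terms are cubic between
    the knots and constrained to be linear beyond the outer knots.
    """
    knots = list(knots)
    if len(knots) < 3 or knots != sorted(set(knots)):
        raise ValueError("need at least 3 strictly increasing knots")
    t_last, t_prev = knots[-1], knots[-2]
    scale = (t_last - knots[0]) ** 2  # keeps terms on the scale of x

    def cube(u):
        return max(u, 0.0) ** 3  # truncated power (u)_+^3

    basis = [float(x)]
    for t_j in knots[:-2]:
        term = (cube(x - t_j)
                - cube(x - t_prev) * (t_last - t_j) / (t_last - t_prev)
                + cube(x - t_last) * (t_prev - t_j) / (t_last - t_prev))
        basis.append(term / scale)
    return basis


def hazard_ratio(x, reference, knots, coefs):
    """HR of value x versus a reference value, given spline coefficients."""
    bx, br = rcs_basis(x, knots), rcs_basis(reference, knots)
    if len(coefs) != len(bx):
        raise ValueError("one coefficient per basis term is required")
    return math.exp(sum(c * (a - b) for c, a, b in zip(coefs, bx, br)))


# Hypothetical knots and log-hazard coefficients, not taken from the study.
knots = [4.0, 5.0, 6.0, 7.0]
coefs = [-0.30, 0.90, -1.50]
hr_at_reference = hazard_ratio(5.0, 5.0, knots, coefs)
hr_high = hazard_ratio(8.0, 5.0, knots, coefs)
```

By construction the hazard ratio at the reference value is exactly 1.0, which corresponds to the horizontal dashed line in each panel.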