Nicole A Restrepo1, Eric Farber-Eger1, Robert Goodloe1, Jonathan L Haines2, Dana C Crawford2.
Abstract
Electronic medical records (EMRs) are being widely implemented for use in genetic and genomic studies. As a phenotypically rich resource, EMRs give researchers the opportunity to identify disease cohorts and perform genotype-phenotype association studies. The Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study, has genotyped more than 15,000 individuals of diverse genetic ancestry in BioVU, the Vanderbilt University Medical Center's biorepository linked to a de-identified version of the EMR (EAGLE BioVU). Here we develop and deploy an algorithm utilizing data mining techniques to identify primary open-angle glaucoma (POAG) in African Americans from EAGLE BioVU for genetic association studies. The algorithm was designed using a combination of diagnostic codes, Current Procedural Terminology (CPT) billing codes, and free-text searches to identify POAG status in situations where gold-standard digital photography cannot be accessed. The case algorithm identified 267 potential POAG subjects but underperformed on manual review, with a positive predictive value of 51.6% and an accuracy of 76.3%. The control algorithm identified controls with a negative predictive value of 98.3%. Although the case algorithm requires more downstream manual review for use in large-scale studies, it provides a basis for extracting a specific clinical subtype of glaucoma from EMRs in the absence of digital photographs.
Year: 2015 PMID: 26061293 PMCID: PMC4465698 DOI: 10.1371/journal.pone.0127817
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Criteria and decision tree used to classify individuals in EAGLE BioVU as POAG cases or controls.
(A) Flow diagram of the phenotype algorithm for POAG cases and controls. (B) List of glaucoma ICD-9 codes. (C) List of CPT codes for ophthalmology and general clinic procedures.
Fig 2. De-identified clinic notes extracted from the Synthetic Derivative.
De-identified letters from the Vanderbilt Eye Institute’s ophthalmologists and optometrists extracted from the Synthetic Derivative. These letters represent the primary type of record used to verify POAG case status in EAGLE BioVU. Shown are an example of a definite case, in which the specific clinical subtype of glaucoma is clearly stated, and an example of a potential case, which includes only a more general glaucoma diagnosis statement.
Evaluation of primary open-angle glaucoma phenotype algorithm in African Americans from EAGLE BioVU.
| | Sample Size | Manually reviewed | PPV | NPV | Accuracy |
|---|---|---|---|---|---|
| Cases | 267 | 267 | - | - | - |
| -Definite | 138 | - | 51.6% | - | 76.3% |
| -Potential | 67 | - | 76.7% | - | 83.1% |
| Controls | 4813 | 300 | - | 98.3% | - |
Definite cases were individuals whose POAG status could be determined with high likelihood. Potential cases were individuals whose medical records lacked sufficient information to make a definitive decision. Potential case results were calculated by including both potential and definite case numbers.
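The positive predictive values in the table can be checked with a short calculation. The following is a minimal sketch, assuming PPV is the share of algorithm-flagged subjects confirmed on manual review; the function and variable names are illustrative and do not come from the study. Under that assumption the ratios come out at roughly 0.517 and 0.768, which agree with the reported 51.6% and 76.7% to within the last digit. Accuracy is not recomputed here because the source does not give the counts needed for it.

```python
def ppv(confirmed: int, flagged: int) -> float:
    """Positive predictive value: confirmed cases / all algorithm-flagged cases."""
    if flagged <= 0:
        raise ValueError("flagged must be positive")
    return confirmed / flagged


# Counts taken from the evaluation table.
FLAGGED = 267    # subjects identified by the case algorithm
DEFINITE = 138   # definite cases after manual review
POTENTIAL = 67   # potential cases after manual review

# Definite cases only.
ppv_definite = ppv(DEFINITE, FLAGGED)

# "Potential" results pool potential and definite cases, as the table note states.
ppv_with_potential = ppv(DEFINITE + POTENTIAL, FLAGGED)

print(f"PPV (definite): {ppv_definite:.3f}")
print(f"PPV (definite + potential): {ppv_with_potential:.3f}")
```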
Fig 3. Histogram plots of the distribution of cup-to-disk ratios (CDR) of POAG cases from the Synthetic Derivative.
Distribution of CDR for the right eyes (A) and the left eyes (B) of POAG cases, based on values manually extracted from the Synthetic Derivative.
Study population characteristics of POAG definite cases and controls among African Americans in EAGLE BioVU.
| | Definite Cases > 20 yrs (SD) | Controls > 40 yrs (SD) |
|---|---|---|
| N | 138 | 4813 |
| Age at Diagnosis (years) | 62.0 (12.0) | -- |
| Age at Last Clinic (years) | -- | 54 (11.7) |
| Sex (% female) | 63.7 | 60 |
| Hypertensive (%) | 55.1 | 46.6 |
| BMI (kg/m²) | 30.1 (6.7) | 30.1 (8.0) |
| Diastolic (mmHg) | 74.5 (8.1) | 80 (33.6) |
| Systolic (mmHg) | 134.5 (14.1) | 124 (26.2) |
| Cholesterol (mg/dL) | 183 (40.6) | 161 (65.2) |
| HDL (mg/dL) | 52.5 (25.0) | 53 (38.6) |
| LDL (mg/dL) | 103 (42.9) | 99 (50.7) |
| Triglycerides (mg/dL) | 125 (76.3) | 98 (67.8) |
Median values were calculated for the following: Age at POAG diagnosis was taken as the date on which the POAG ICD-9 code (365.11) was first mentioned in the records. Age at last clinic visit (LCV) was taken as the date of the last CPT code mentioned in the records for controls. An individual was classified as hypertensive if he/she met one of three criteria: systolic blood pressure > 140 mmHg, diastolic blood pressure > 90 mmHg, or use of hypertension medications, each within a two-year window of POAG diagnosis for cases or of the LCV date for controls. Blood pressure (systolic and diastolic), lipids (total cholesterol, high-density cholesterol, low-density cholesterol, and triglycerides), and body mass index (height and weight) were calculated from labs or measurements within two years of POAG diagnosis or LCV. Abbreviations: standard deviation (SD).
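The hypertension rule in this note can be sketched in code. This is a minimal illustration, not the study's implementation: the `Observation` record, the `is_hypertensive` function, and the reading of the "two year window" as two years on either side of the index date (POAG diagnosis for cases, LCV for controls) are all assumptions. The note also does not say whether the thresholds apply to a single reading or to a summary value; this sketch flags on any single qualifying reading.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional

# Assumed interpretation: within two years before or after the index date.
WINDOW = timedelta(days=2 * 365)


@dataclass
class Observation:
    """Hypothetical record of one clinic measurement or medication entry."""
    when: date
    systolic: Optional[float] = None    # mmHg
    diastolic: Optional[float] = None   # mmHg
    on_bp_medication: bool = False


def is_hypertensive(observations: Iterable[Observation], index_date: date) -> bool:
    """Return True if any in-window observation meets one of the three criteria."""
    for obs in observations:
        if abs(obs.when - index_date) > WINDOW:
            continue  # outside the two-year window
        if obs.systolic is not None and obs.systolic > 140:
            return True
        if obs.diastolic is not None and obs.diastolic > 90:
            return True
        if obs.on_bp_medication:
            return True
    return False


# Usage: a systolic reading of 150 mmHg one year before the index date qualifies.
example = [Observation(date(2010, 1, 1), systolic=150)]
print(is_hypertensive(example, date(2011, 1, 1)))
```

The thresholds are strict inequalities, matching the "> 140" and "> 90" wording, so a reading of exactly 140/90 does not qualify on its own.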
Published index variants for the CDKN2B-AS1 region associated with POAG or a POAG-associated trait, and availability of these variants on the Metabochip.
| rs# | Population | OR (discovery) | p-value (discovery) | Discovery study | Current study (Metabochip availability) | OR (current study) | p-value (current study) |
|---|---|---|---|---|---|---|---|
| rs7865618 | Japanese | 1.78 | 9×10⁻¹¹ | Nakano et al[ | not available | - | - |
| rs1063192 | Japanese | 1.33 | 5×10⁻¹⁰ | Osman et al[ | - | 0.92 | 0.75 |
| rs2157719 | European American | 1.45 | 2×10⁻¹⁸ | Wiggs et al[ | - | 0.97 | 0.92 |
| rs4977756 | European | 1.50 | 4.7×10⁻⁹ | Burdon et al[ | not available | - | - |
| rs10120688 | African American | 1.21 | 0.002 | Liu et al[ | not available | - | - |
Shown are significant index variants listed in the NHGRI GWAS catalog and in PubMed. Also included are the availability of each index variant on the Metabochip and summary results from the current study's association analysis of African Americans with POAG in the CDKN2B-AS1 region.