Wei-Qi Wei, Lisa A Bastarache, Robert J Carroll, Joy E Marlo, Travis J Osterman, Eric R Gamazon, Nancy J Cox, Dan M Roden, Joshua C Denny.
Abstract
OBJECTIVE: To compare three groupings of electronic health record (EHR) billing codes for their ability to represent clinically meaningful phenotypes and to replicate known genetic associations. The three coding systems tested were the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, the Agency for Healthcare Research and Quality Clinical Classification Software for ICD-9-CM (CCS), and manually curated "phecodes" designed to facilitate phenome-wide association studies (PheWAS) in EHRs.

METHODS AND MATERIALS: We selected 100 disease phenotypes and compared the ability of each coding system to represent them accurately without performing additional groupings. The 100 phenotypes included 25 randomly chosen clinical phenotypes pursued in prior genome-wide association studies (GWAS) and another 75 common disease phenotypes mentioned across free-text problem lists from 189,289 individuals. We then evaluated how well each coding system replicated known associations for 440 SNP-phenotype pairs.
Year: 2017 PMID: 28686612 PMCID: PMC5501393 DOI: 10.1371/journal.pone.0175508
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of the evaluated coding schema.
| | ICD-9-CM | AHRQ CCS | Phecodes |
|---|---|---|---|
| Availability | n/a | | |
| Number of phenotypes | L: 22,401; T: 1,247 | L: 850; T: 294 | L: 1,360; T: 659 |
| Embedded control definition | No | No | Yes |
| Control definition | All individuals except those with relevant top codes | All individuals except those with parent CCS code | All individuals except those with exclude codes in PheWAS control definition |
| Example control exclusion for “Atrial fibrillation” (ICD-9-CM 427.31, CCS 7.2.9.3, phecode 427.21) | 427. | 7.2.9 | 426–427.99 |
L: leaf codes; T: top codes
* “Top codes” for ICD-9-CM are 3-digit ICD-9-CM codes.
** 153 codes are both top and leaf codes.
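The phecode control definition in the table above can be illustrated with a minimal sketch. It assumes each patient is represented by a set of phecode strings, and it uses the atrial fibrillation example from the table (case code 427.21, exclusion range 426–427.99). The function name `select_controls` and the toy patient data are hypothetical and are not taken from the study's software.

```python
from typing import Dict, Set, Tuple


def select_controls(
    patient_codes: Dict[str, Set[str]],
    case_code: str,
    exclude_range: Tuple[float, float],
) -> Tuple[Set[str], Set[str]]:
    """Split patients into cases and controls for one phecode.

    A patient with the case code is a case. A patient with any code
    inside the exclusion range (but not the case code) is dropped from
    the analysis. Everyone else is a control.
    """
    low, high = exclude_range
    cases: Set[str] = set()
    controls: Set[str] = set()
    for patient_id, codes in patient_codes.items():
        if case_code in codes:
            cases.add(patient_id)
        elif not any(low <= float(code) <= high for code in codes):
            controls.add(patient_id)
    return cases, controls


# Hypothetical toy data: patient ID -> set of phecodes.
patients = {
    "p1": {"427.21"},  # atrial fibrillation: case
    "p2": {"426.1"},   # code inside 426-427.99: excluded
    "p3": {"250.2"},   # unrelated code: control
    "p4": set(),       # no codes: control
}

cases, controls = select_controls(patients, "427.21", (426.0, 427.99))
print(sorted(cases), sorted(controls))
```

Under the ICD-9-CM and CCS definitions in the same table, the exclusion would instead be driven by the relevant top code or parent CCS code, so the range test would be replaced by a prefix match.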
Fig 1. Weighted Venn diagrams of the distributions of power-enabled tests, replicated associations, best ORs, and best P values with CCS, ICD-9-CM, and phecodes.
Each color represents a resource.
Results of tests for genetic replications on 440 SNP and phenotype pairs.
| | Phecode | ICD-9-CM | CCS |
|---|---|---|---|
| Replication (P<0.05) | 153 (34.8%) | 143 (32.5%) | 139 (31.6%) |
| Replication (Bonferroni) | 43 (9.8%) | 34 (7.7%) | 34 (7.7%) |
| Best OR | 103 | 83 | 58 |
| Best P Value | 78 | 45 | 59 |
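The replication percentages in the table follow directly from the counts and the 440 tested pairs, which a short sketch can check. The Bonferroni threshold computed here assumes the correction was applied across all 440 tests; the record does not state the divisor, so treat that value as an assumption rather than the authors' reported threshold. The variable and function names are illustrative.

```python
N_PAIRS = 440  # SNP-phenotype pairs tested, from the table caption

# Replication counts copied from the table.
replicated_nominal = {"Phecode": 153, "ICD-9-CM": 143, "CCS": 139}
replicated_bonferroni = {"Phecode": 43, "ICD-9-CM": 34, "CCS": 34}


def pct(count: int, total: int = N_PAIRS) -> float:
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


# ASSUMPTION: Bonferroni correction over all 440 tests.
bonferroni_alpha = 0.05 / N_PAIRS

for system in replicated_nominal:
    print(
        f"{system}: "
        f"P<0.05 {pct(replicated_nominal[system])}%, "
        f"Bonferroni {pct(replicated_bonferroni[system])}%"
    )
print(f"Assumed Bonferroni threshold: {bonferroni_alpha:.2e}")
```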
Fig 2. PheWAS results of three SNPs (rs35391, rs731839 and rs769449) showed that phecodes outperformed ICD-9-CM and CCS.
Fig 3. PEPD expression results suggest strong association with the gastrointestinal tract.
Fig 4. SNP rs731839 is a cis-acting eQTL for PEPD in esophagus mucosa.