Chia-Yen Chen, Phil H Lee, Victor M Castro, Jessica Minnier, Alexander W Charney, Eli A Stahl, Douglas M Ruderfer, Shawn N Murphy, Vivian Gainer, Tianxi Cai, Ian Jones, Carlos N Pato, Michele T Pato, Mikael Landén, Pamela Sklar, Roy H Perlis, Jordan W Smoller.
Abstract
Bipolar disorder (BD) is a heritable mood disorder characterized by episodes of mania and depression. Although genomewide association studies (GWAS) have successfully identified genetic loci contributing to BD risk, sample size has become a rate-limiting obstacle to genetic discovery. Electronic health records (EHRs) represent a vast but relatively untapped resource for high-throughput phenotyping. As part of the International Cohort Collection for Bipolar Disorder (ICCBD), we previously validated automated EHR-based phenotyping algorithms for BD against in-person diagnostic interviews (Castro et al. Am J Psychiatry 172:363-372, 2015). Here, we establish the genetic validity of these phenotypes by determining their genetic correlation with traditionally ascertained samples. Case and control algorithms were derived from structured and narrative text in the Partners Healthcare system comprising more than 4.6 million patients over 20 years. Genomewide genotype data for 3330 BD cases and 3952 controls of European ancestry were used to estimate SNP-based heritability (h2g) and genetic correlation (rg) between EHR-based phenotype definitions and traditionally ascertained BD cases in GWAS by the ICCBD and Psychiatric Genomics Consortium (PGC) using LD score regression. We evaluated BD cases identified using 4 EHR-based algorithms: an NLP-based algorithm (95-NLP) and three rule-based algorithms using codified EHR with decreasing levels of stringency, namely "coded-strict", "coded-broad", and "coded-broad based on a single clinical encounter" (coded-broad-SV). The analytic sample comprised 862 95-NLP, 1968 coded-strict, 2581 coded-broad, and 408 coded-broad-SV BD cases, and 3952 controls. The estimated h2g were 0.24 (p = 0.015), 0.09 (p = 0.064), 0.13 (p = 0.003), and 0.00 (p = 0.591) for 95-NLP, coded-strict, coded-broad, and coded-broad-SV BD, respectively. The h2g for all EHR-based cases combined except coded-broad-SV (excluded due to 0 h2g) was 0.12 (p = 0.004).
These h2g were lower than or similar to the h2g observed by the ICCBD + PGCBD (0.23, p = 3.17 × 10^-80, total N = 33,181). However, the rg between ICCBD + PGCBD and the EHR-based cases were high for 95-NLP (0.66, p = 3.69 × 10^-5), coded-strict (1.00, p = 2.40 × 10^-4), and coded-broad (0.74, p = 8.11 × 10^-7). The rg between EHR-based BD definitions ranged from 0.90 to 0.98. These results provide the first genetic validation of automated EHR-based phenotyping for BD and suggest that this approach identifies cases that are highly genetically correlated with those ascertained through conventional methods. High-throughput phenotyping using the large data resources available in EHRs represents a viable method for accelerating psychiatric genetic research.
Year: 2018 PMID: 29666432 PMCID: PMC5904248 DOI: 10.1038/s41398-018-0133-7
Source DB: PubMed Journal: Transl Psychiatry ISSN: 2158-3188 Impact factor: 6.222
SNP-based heritability (h2g) for EHR-based bipolar disorder from the Partners Healthcare Research Patient Data Registry
| Bipolar disorder algorithm | h2g, liability scale (SE) | h2g, observed scale (SE) | p-value^b | PPV | Sample size: cases | Sample size: controls |
|---|---|---|---|---|---|---|
| 95-NLP | 0.24 (0.10) | 0.25 (0.10) | 0.015 | 0.86 | 862 | 3952 |
| Coded-strict | 0.09 (0.05) | 0.15 (0.08) | 0.064 | 0.84 | 1968 | 3952 |
| Coded-broad | 0.13 (0.04) | 0.22 (0.08) | 0.003 | 0.80 | 2581 | 3952 |
| Coded-broad-SV | 0.00 (0.11) | 0.00 (0.18) | 0.591 | 0.50 | 408 | 3952 |
| All except coded-broad-SV | 0.12 (0.04) | 0.21 (0.07) | 0.004 | 0.83 | 3013 | 3952 |
| ICCBD + PGCBD^a | 0.23 (0.01) | 0.41 (0.02) | 3.17 × 10^-80 | NA | 13902 | 19279 |
SNP-based heritability on liability scale was converted from observed scale based on population prevalence of 1%
^a ICCBD + PGCBD: Bipolar disorder genome-wide association study from the ICCBD and PGC1 with cases ascertained by traditional methods (Charney et al.[7])
^b Test for h2g different from 0.
PPV: positive predictive values from clinical validation (Castro et al.[10]). 95-NLP: probabilistic algorithm with 95% specificity based on natural language processing. Coded-strict, Coded-broad, Coded-broad-SV: coded rule-based algorithms with decreasing stringency. SV: single visit. SE: standard error
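The note on converting observed-scale heritability to the liability scale at a 1% population prevalence can be illustrated with the standard liability-threshold transformation for case-control samples. The record does not spell out the formula, so treating it as the conversion the authors applied is an assumption; the function name `observed_to_liability` is hypothetical. The sketch rescales the observed-scale estimate by K²(1−K)² / (z² P(1−P)), where K is the population prevalence, P is the proportion of cases in the sample, and z is the standard normal density at the liability threshold.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence, case_fraction):
    """Rescale an observed-scale case-control h2 to the liability scale.

    Assumes the standard liability-threshold model; this is an illustrative
    sketch, not necessarily the exact routine used in the study.
    """
    std_normal = NormalDist()
    # Liability threshold above which an individual is affected.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Observed-scale h2g and sample sizes copied from the table above.
rows = {
    "95-NLP": (0.25, 862, 3952),
    "Coded-strict": (0.15, 1968, 3952),
    "Coded-broad": (0.22, 2581, 3952),
    "All except coded-broad-SV": (0.21, 3013, 3952),
    "ICCBD + PGCBD": (0.41, 13902, 19279),
}

for name, (h2_obs, n_cases, n_controls) in rows.items():
    case_fraction = n_cases / (n_cases + n_controls)
    h2_liab = observed_to_liability(h2_obs, 0.01, case_fraction)
    print(f"{name}: liability-scale h2g ~ {h2_liab:.2f}")
```

Because the tabulated observed-scale values are rounded to two decimals, the recomputed liability-scale values can differ from the published column in the second decimal place.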
SNP-based genetic correlation (rg) between EHR-based bipolar disorder and bipolar disorder ascertained by traditional methods from ICCBD + PGCBD
| Bipolar disorder algorithm | rg (SE) | p-value^a |
|---|---|---|
| 95-NLP | 0.66 (0.16) | 3.69 × 10^-5 |
| Coded-strict | 1.00 (0.29) | 2.40 × 10^-4 |
| Coded-broad | 0.74 (0.15) | 8.11 × 10^-7 |
| All except coded-broad-SV | 0.83 (0.17) | 7.19 × 10^-7 |
Genetic correlation was not estimated for coded-broad-SV due to SNP-based heritability estimate of 0
SE: standard error
^a Test for rg different from 0
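A test of whether rg differs from 0 can be sketched as a two-sided Wald test on the ratio of the estimate to its standard error. The record does not state how the p-values were computed, so the two-sided normal approximation is an assumption, and `wald_p_value` is a hypothetical helper name. Since the tabulated estimates and standard errors are rounded, the result only approximates the published p-values.

```python
import math


def wald_p_value(estimate, standard_error):
    """Two-sided p-value for H0: parameter = 0 under a normal approximation."""
    z = estimate / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# rg and SE for the 95-NLP algorithm, copied from the table above.
p_95_nlp = wald_p_value(0.66, 0.16)
print(f"95-NLP: z = {0.66 / 0.16:.2f}, p ~ {p_95_nlp:.2e}")
```

For the 95-NLP row this gives a value of the same order as the reported 3.69 × 10^-5; rows with larger rounding effects in the standard error will agree less closely.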
Fig. 1 SNP-based genetic correlation (with 95% confidence interval) between bipolar disorder based on different ascertainment methods and other traits
Fig. 2 Genome-wide Cochran’s Q-test for heterogeneity of SNP effects between ICCBD + PGCBD and EHR-based bipolar disorder.
Red line shows the Bonferroni-corrected significance level for the Q-test. SNPs are selected with association p-value threshold of 0.001 based on ICCBD + PGCBD analysis (total number of SNPs = 28,320)
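As a rough illustration of the heterogeneity test in Fig. 2, the sketch below computes Cochran's Q for one SNP across two studies and a Bonferroni threshold for 28,320 tested SNPs. The effect sizes and standard errors are made-up placeholder values, the function name `cochran_q_two_studies` is hypothetical, and the family-wise alpha of 0.05 is an assumption, since the caption does not state it.

```python
import math


def cochran_q_two_studies(beta1, se1, beta2, se2):
    """Cochran's Q and its p-value for one SNP effect estimated in two studies."""
    # Inverse-variance weights for each study.
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    # Fixed-effect pooled estimate.
    pooled = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = w1 * (beta1 - pooled) ** 2 + w2 * (beta2 - pooled) ** 2
    # With two studies Q has 1 degree of freedom; for a chi-square(1)
    # variable the upper-tail probability is erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value


N_SNPS = 28_320   # number of SNPs tested, from the figure caption
ALPHA = 0.05      # assumed family-wise error rate (not stated in the source)
bonferroni_threshold = ALPHA / N_SNPS

# Hypothetical effect sizes (log odds ratios) and standard errors for one SNP.
q, p = cochran_q_two_studies(beta1=0.10, se1=0.02, beta2=0.00, se2=0.05)
print(f"Q = {q:.3f}, p = {p:.4f}, threshold = {bonferroni_threshold:.2e}")
print("heterogeneous" if p < bonferroni_threshold else "no significant heterogeneity")
```

With only two studies, Q reduces to the squared difference in effects divided by the sum of the two sampling variances, which is why a single degree of freedom applies.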