High-throughput framework for genetic analyses of adverse drug reactions using electronic health records.

Neil S Zheng1, Cosby A Stone2, Lan Jiang3, Christian M Shaffer4, V Eric Kerchberger1,2, Cecilia P Chung3,4,5,6, QiPing Feng5, Nancy J Cox6,7, C Michael Stein5,8, Dan M Roden1,5,8,9, Joshua C Denny1, Elizabeth J Phillips10, Wei-Qi Wei1.   

Abstract

Understanding the contribution of genetic variation to drug response can improve the delivery of precision medicine. However, genome-wide association studies (GWAS) for drug response are uncommon and are often hindered by small sample sizes. We present a high-throughput framework to efficiently identify eligible patients for genetic studies of adverse drug reactions (ADRs) using "drug allergy" labels from electronic health records (EHRs). As a proof-of-concept, we conducted GWAS for ADRs to 14 common drugs or drug groups with 81,739 individuals from Vanderbilt University Medical Center's BioVU DNA Biobank. We identified 7 genetic loci associated with ADRs at P < 5 × 10−8, including known genetic associations such as CYP2D6 and OPRM1 for CYP2D6-metabolized opioid ADR. Additional expression quantitative trait loci and phenome-wide association analyses added evidence to the observed associations. Our high-throughput framework is both scalable and portable, enabling impactful pharmacogenomic research to improve precision medicine.

Year:  2021        PMID: 34061827      PMCID: PMC8195357          DOI: 10.1371/journal.pgen.1009593

Source DB:  PubMed          Journal:  PLoS Genet        ISSN: 1553-7390            Impact factor:   6.020


Introduction

Genome-wide association studies (GWAS) have contributed substantially to precision medicine, providing critical insights into the physiological and pathophysiological mechanisms of human complex traits and diseases. [1,2] However, fewer than 10% of published GWAS have focused on drug response. [3] Adverse drug reactions (ADRs) are a considerable burden on patients and healthcare systems as a major source of hospitalization, morbidity, and mortality. [4-7] The lack of pharmacogenomic GWAS on ADRs hinders our ability to deliver the right drug to the right person. [3,6-9]

A significant challenge for pharmacogenomics discovery is small sample size. [3,10] Drug response phenotypes, such as ADRs, are less often recorded than physiological traits and common diseases. [3,7,10] Traditional studies that recruit patient cohorts remain cumbersome and costly, and usually result in limited statistical power to detect genetic predictors with small effect sizes. [3,7,10] Biobanks that are linked to electronic health records (EHRs) can generate large datasets for efficient discovery and replication GWAS. [7,10,11] However, defining drug response using EHR data (i.e., pharmacological phenotyping) remains difficult. Unlike disease phenotypes, which can be represented with diagnostic codes, drug response information is often embedded in clinical notes, [11,12] complicating the development and implementation of uniform methods to extract drug response phenotypes. [11,13]

In this study, we investigated the feasibility of using the allergy section in EHRs to conduct high-throughput GWAS of reported ADRs. In routine practice, healthcare providers often use “drug allergy” labels in an allergy section to document a patient’s intolerance or allergy to a drug as reported by the patient or observed by a healthcare provider. [14,15] Despite being called an “allergy” section, the documented information most clearly satisfies the definition of an ADR, which includes any noxious, unintended or undesired effect of a drug experienced at normal therapeutic doses. [7,16] The ADR information in this section is meant to be reconciled at every patient encounter to capture new information. The allergy section is semi-structured (i.e., it has some structure but does not adhere to any rigorous format), which allows for easy retrieval of adverse reaction information without sophisticated natural language processing, enabling high-throughput analysis when linked to genetic data.

We hypothesized that drug allergy labels from the allergy sections in EHRs can be leveraged for efficient identification of reported ADRs. We developed and applied a high-throughput approach for identifying ADRs from allergy sections in EHRs from Vanderbilt University Medical Center’s (VUMC) Synthetic Derivative. Then, using VUMC’s BioVU DNA Biobank, [17] we conducted GWAS on 14 drug (drug class) ADRs in a subset of 67,323 individuals with self-reported European ancestry (EA), followed by trans-ethnic validation in 14,416 individuals with self-reported African ancestry (AA). Additional expression quantitative trait loci (eQTL) analyses and phenome-wide association analyses (PheWAS) were performed on the lead variants. [18,19]

Results

Identifying adverse drug reactions in electronic health records

A summary of selected EHR characteristics for all individuals with available EHRs at VUMC and the selected BioVU individuals is shown in Table 1. The BioVU cohort had mean EHR length of 10.6 years, which was more than double the length of the mean EHR length for all VUMC individuals (4.4 years). Additionally, a greater proportion of BioVU individuals (95.3%) had information documented in their allergy section compared to all VUMC individuals (62.4%). Similarly, a greater proportion of BioVU individuals (63.0%) had at least one reported ADR compared to all VUMC individuals (28.6%). While the proportion of individuals that have information documented in their allergy section is similar between EAs and AAs, we observed that the proportion of individuals with reported ADRs was greater among EAs compared to AAs for all VUMC individuals and the BioVU cohort.
Table 1

Summary of selected EHR characteristics for all VUMC individuals and the BioVU individuals selected for genetic analyses, stratified by self-reported ancestry.

| Cohort | N | EHR length (years), Mean ± SD | Have allergy section* | At least one ADR* |
|---|---|---|---|---|
| All individuals | 3,169,625 | 4.4 ± 6.0 | 1,979,220 (62.4) | 905,301 (28.6) |
| All individuals: European ancestry | 1,957,846 | 5.3 ± 6.3 | 1,376,127 (70.2) | 679,141 (34.7) |
| All individuals: African ancestry | 310,864 | 6.3 ± 7.0 | 214,149 (68.8) | 77,223 (24.8) |
| All individuals: Other | 900,915 | 1.6 ± 3.4 | 388,944 (43.2) | 148,937 (16.5) |
| BioVU individuals | 81,739 | 10.6 ± 7.3 | 77,907 (95.3) | 51,534 (63.0) |
| BioVU individuals: European ancestry | 67,323 | 10.7 ± 7.2 | 64,166 (95.3) | 44,407 (66.0) |
| BioVU individuals: African ancestry | 14,416 | 10.0 ± 7.8 | 13,714 (95.3) | 7,127 (48.4) |

* Reporting count and row percentages for the respective cohort
The most frequently documented ADRs were to penicillins (17.4%), sulfa drugs (11.6%), and codeine (9.1%). Cases and controls for GWAS of 14 adverse drug or drug group reactions are shown in Table 2. We selected the top 10 most frequent drugs or drug classes reported in the allergy sections: penicillins, sulfa drugs, codeine, morphine, aspirin, lisinopril, levofloxacin, erythromycin, meperidine, and cephalexin. The top 10 most frequently reported drugs in the allergy sections were the same for EAs and AAs, with differences in ordering. Additionally, we observed that ADRs to statins as a class of drugs were reported frequently. Therefore, we identified ADRs to any statin for a grouped analysis, since the class of drugs shares a similar metabolic pathway, and further broke down ADRs into atorvastatin only or simvastatin only. Likewise, we selected CYP2D6-metabolized opioid prodrugs, including codeine, hydrocodone, oxycodone, and tramadol, as a grouped analysis. [20] Types of adverse drug reactions for the 14 selected drugs or drug groups are summarized in S1 Table. The type of reaction is not always documented in the allergy section, and the percentage of entries with a missing reaction type ranges from 24.4% to 58.0%.
Table 2

Case and control counts for adverse drug reactions to 14 selected drugs or drug groups, stratified by self-reported ancestry.

| Drugs/Drug Groups | European ancestry (N = 67,323): Cases (%)^a | European ancestry: Controls | African ancestry (N = 14,416): Cases (%)^a | African ancestry: Controls |
|---|---|---|---|---|
| Penicillin | 12294 (18.3) | 38284 | 1894 (13.1) | 9539 |
| Sulfa | 8492 (12.6) | 46642 | 964 (6.7) | 11085 |
| Codeine | 6706 (10.0) | 24579 | 706 (4.9) | 7330 |
| Morphine | 3646 (5.4) | 38181 | 450 (3.1) | 9515 |
| Aspirin | 1800 (2.7) | 47351 | 401 (2.8) | 10264 |
| Lisinopril | 1591 (2.4) | 34096 | 439 (3.0) | 8959 |
| Levofloxacin | 1737 (2.6) | 35888 | 136 (0.9) | 8560 |
| Erythromycin | 1607 (2.4) | 23400 | 138 (1.0) | 7310 |
| Meperidine | 1499 (2.2) | 27261 | 112 (0.8) | 7590 |
| Cephalexin | 1460 (2.2) | 30479 | 124 (0.9) | 7995 |
| Any statin | 2927 (4.3) | 42551 | 258 (1.8) | 9897 |
| Atorvastatin | 1325 (2.0) | 31048 | 86 (0.6) | 8234 |
| Simvastatin | 1020 (1.5) | 31394 | 111 (0.8) | 8053 |
| CYP2D6-metabolized opioids^b | 10264 (15.2) | 48445 | 1343 (9.3) | 11288 |

a Reporting count and percentage of self-reported ancestry population identified with ADR

b CYP2D6-metabolized opioids include codeine, hydrocodone, oxycodone, and tramadol

Genome-wide analysis

The genetic analyses for EAs identified genome-wide significant signals (P < 5 × 10−8) for 7 of the 14 adverse drug reactions. The lead variant for each signal is shown in Table 3, and additional correlated variants are reported in S2 Table. The trans-ethnic validation of the identified signals for EAs in the AA cohort yielded no significant findings (S3 Table). Genome-wide analyses in AA individuals were excluded from our primary analysis due to the potential for unstable point estimates and inflated false discovery rates from limited sample size. Nonetheless, significant ADR-genetic associations in AAs may be informative for future studies and have been included in S4 Table.
Table 3

Lead variant per signal associated with adverse drug reactions for European ancestry patients.

| Adverse Drug Reaction | Variant | Mapped Gene | Consequence | Allele^a | EAF | R2 | OR (95% CI)^b | P |
|---|---|---|---|---|---|---|---|---|
| Aspirin | rs115346678 | SSBP2, ATG10 | intergenic | G/A | 0.01 | 0.98 | 2.03 (1.79 to 2.28) | 1.40 × 10−8 |
| Cephalexin | rs34545984 | LOC105376453, OTUD1 | intergenic | G/T | 0.01 | 0.50 | 2.03 (1.79 to 2.28) | 1.23 × 10−8 |
| Codeine | rs9620007 | WBP2NL (CYP2D6) | intronic | C/G | 0.30 | 0.98 | 0.84 (0.79 to 0.89) | 1.24 × 10−13 |
| CYP2D6-metabolized opioids | rs62436463 | OPRM1 | intronic | C/T | 0.10 | 0.94 | 0.84 (0.79 to 0.90) | 5.43 × 10−10 |
| CYP2D6-metabolized opioids | rs739296 | SEPTIN3 (CYP2D6) | intronic | G/A | 0.30 | 0.99 | 0.86 (0.83 to 0.90) | 1.08 × 10−16 |
| Meperidine | rs11049274 | PTHLH, LOC729291 | intergenic | G/A | 0.08 | 0.99 | 1.42 (1.30 to 1.54) | 2.09 × 10−8 |
| Meperidine | rs113100019 | FIP1L1 | intronic | T/G | 0.01 | 0.82 | 2.10 (1.84 to 2.36) | 2.26 × 10−8 |
| Meperidine | rs185462714 | SERINC5 | intronic | A/G | 0.01 | 0.82 | 2.09 (1.83 to 2.35) | 3.37 × 10−8 |
| Penicillin | rs115200108 | HLA-B, MICA-AS1 | intergenic | C/A | 0.02 | 0.99 | 1.30 (1.21 to 1.39) | 4.23 × 10−9 |
| Simvastatin | rs76103438 | DIPK2A, LNCSRLR | intergenic | T/A | 0.03 | 0.90 | 1.88 (1.65 to 2.09) | 2.56 × 10−8 |

EAF = Effect allele frequency; R2 = imputation quality

a Alleles are listed as reference/effect and are reported in the forward strand.

b OR and 95% CIs were derived from logistic regression models adjusted for sex, age, length of electronic health records (years), and first 10 principal components.

The opioids shown in Fig 1 are prodrugs metabolized to morphine or morphine-like active metabolites by CYP2D6. We identified a strong genome-wide significant association signal near the CYP2D6 gene for codeine and CYP2D6-metabolized opioid ADRs (Fig 1). Near the CYP2D6 locus, the minor allele of the variant rs9620007 (G) was associated with reduced risk of codeine ADRs (odds ratio [OR] = 0.84; 95% confidence interval [CI] = 0.79 to 0.89) and CYP2D6-metabolized opioid ADRs (OR = 0.86; 95% CI = 0.82 to 0.90). Additionally, the nearby variant rs739296 (A) was associated with reduced risk of CYP2D6-metabolized opioid ADRs (OR = 0.86; 95% CI = 0.83 to 0.90). The rs739296 (A) variant was also associated with reduced risk of nausea/vomiting reactions specifically to CYP2D6-metabolized opioids (OR = 0.80; 95% CI = 0.74 to 0.86). We found a significant association between OPRM1 and CYP2D6-metabolized opioid ADRs, where individuals carrying the minor allele of the lead variant rs62436463 (T) were less likely to have a reported ADR (OR = 0.84; 95% CI = 0.79 to 0.90). Notably, the minor allele of the exonic variant rs1799971 (G) in OPRM1, which is in high LD with the lead variant rs62436463, was also associated with reduced risk of CYP2D6-metabolized opioid ADRs (OR = 0.86; 95% CI = 0.82 to 0.91).
Fig 1

A) Manhattan plots of genome-wide association studies (GWAS) for codeine (left) and CYP2D6-metabolized opioid (right) adverse drug reactions (ADRs). Red lines on Manhattan plots show the genome-wide significance level (P < 5.0 × 10−8). B) CYP2D6 locus for CYP2D6-metabolized opioid ADRs. SNPs are colored according to their linkage disequilibrium (LD, based on 1000 Genome phase3 EUR reference panel) with the lead variant rs739296 (22:42389948), which is marked with a purple diamond. The lead variant rs9620007 (22:42405657) for codeine ADRs is also labeled. Dotted gray line shows the genome-wide significance level (P < 5.0 × 10−8).

For meperidine ADRs, the analysis revealed a genome-wide significant association signal upstream of PTHLH and two significantly associated variants in FIP1L1 and SERINC5 (Fig 2A). Additionally, we identified a genome-wide significant signal in the major histocompatibility complex (MHC) region for penicillin ADR (Fig 2B). The minor allele of the lead variant rs115200108, which is located between HLA-B and MICA, was significantly associated with increased risk of penicillin ADRs (OR = 1.30, 95% CI = 1.21 to 1.39).
Fig 2

Risk loci for meperidine (a) and penicillin (b) adverse drug reactions (ADRs). SNPs are colored according to their linkage disequilibrium (LD, based on 1000 Genome phase3 EUR reference panel) with the lead variants rs11049274 (12:28161055) for meperidine ADRs and rs115200108 (6:31327622) for penicillin ADRs, which are marked with a purple diamond. Dotted gray line shows the genome-wide significance level (P < 5.0 × 10−8).

We also identified three low-frequency variants (effect allele frequency [EAF] < 0.05) that were strongly associated with ADRs to aspirin (rs115346678; OR = 2.03; 95% CI = 1.79 to 2.28), cephalexin (rs34545984; OR = 2.03; 95% CI = 1.79 to 2.28), and simvastatin (rs76103438; OR = 1.88; 95% CI = 1.65 to 2.09).

Expression quantitative trait loci analyses

Using data from the Genotype-Tissue Expression (GTEx) project, we evaluated the correlation between the lead variants for the genetic loci identified by the GWAS and expression levels of putative target genes. For CYP2D6-metabolized opioid ADRs, the A allele of lead variant rs739296 in the CYP2D6 locus was most significantly associated with decreased WBP2NL expression in adipose tissue (normalized effect size [NES] = -0.33; P = 1.9 × 10−11) and increased CYP2D6 expression in brain tissue (NES = 0.55; P = 5.3 × 10−11). The T allele of the lead variant rs62436463 and the G allele of the exonic variant rs1799971 in OPRM1 were both associated with higher OPRM1 expression in the cerebellum, with NES of 0.70 (P = 9.5 × 10−8) and 0.63 (P = 1.4 × 10−7), respectively. The A allele of the lead variant rs11049274 in PTHLH for meperidine ADRs was significantly associated with increased PTHLH expression in musculoskeletal tissue (NES = 0.28; P = 6.5 × 10−5). Additionally, the A allele of rs115200108 for penicillin ADRs was most significantly associated with higher MIR6891 expression in adipose tissue (NES = 1.3; P = 2.0 × 10−13) and reduced MICA expression in whole blood tissue (NES = -0.72; P = 2.0 × 10−13).

Phenome-wide analyses

To compare our framework with the ability of diagnosis codes to identify ADRs, we performed PheWAS of the lead variants from the identified genetic loci (CYP2D6, OPRM1, PTHLH, HLA/MICA) (S1 Fig). The lead variant rs115200108 in the HLA/MICA risk-locus was associated with increased risk of ‘Poisoning by antibiotic’ (OR = 2.37; 95% CI = 1.90 to 2.84; P = 3.0 × 10−4), but the association did not reach phenome-wide significance (P < 5.0 × 10−5).

Discussion

In this study, we present a high-throughput and scalable approach to conduct large-scale, genome-wide analyses for adverse drug reactions. Our framework can be adapted or shared between institutions, helping facilitate collaboration between sites. Utilizing EHRs allowed us to study ADRs in individuals with diverse clinical and ethnic backgrounds under the conditions of routine clinical care. As shown in this study, what and how physicians choose to document clinical observations or patients’ self-reported details as drug allergies in the EHR may provide useful information. In addition, our results demonstrated the potential of utilizing EHRs and our framework to efficiently generate pharmacogenomic findings, which can provide insights for optimizing drug therapy with maximal efficacy and minimal adverse effects. We found that 28.6% of individuals at VUMC had at least one drug listed in the allergy section of their EHRs. This is consistent with other studies, which have reported that between 20 and 35 percent of their populations have at least one drug allergy label in their EHRs. [14,21] The genotyped BioVU cohort is a patient cohort (i.e., receives more frequent medical care than the general population) and has denser EHR data, which may explain the higher proportion of the BioVU cohort (63.0%) that reported at least one ADR. We also observed a lower proportion of reported ADRs among AAs than EAs, which is consistent with a previous report. [14] As noted by the previous study, the difference in the reported ADRs between AAs and EAs may reflect a documentation bias that has been reported in other clinical domains. [14] Using our ADR case-control definitions, analyses identified genetic loci for 7 of the 14 selected drug/drug group allergies. We found that variants in two well-known genetic loci, CYP2D6 and OPRM1, were associated with reduced risk of CYP2D6-metabolized opioid ADRs. 
The analysis of eQTL data from the GTEx project showed that variants in the CYP2D6 locus and in OPRM1 were associated with elevated expression of these genes in the brain. [22] Previous studies have implicated both of these genetic loci in opioid response and metabolism. [23-25] Notably, an independent report on variants associated with reduced risk of opioid-induced vomiting in a 23andMe cohort supported our findings that the minor alleles of rs9620007 near CYP2D6 and rs1799971 in OPRM1 were associated with reduced risk of CYP2D6-metabolized opioid ADRs. [26] Furthermore, our analysis of CYP2D6-metabolized opioid related nausea or vomiting also identified the same locus near CYP2D6 as being associated with reduced risk. However, CYP2D6 metabolic activity also varies greatly depending on copy number variation, [23] which was not available for this study. Therefore, further work is needed to better understand the contributions of genetic variations to CYP2D6-metabolized opioid ADRs. Additionally, studies have reported that patients who carried the G allele of rs1799971 in OPRM1 required higher doses of opioid for pain relief. [27,28] It is possible that patients carrying the minor allele for the significant variants in OPRM1 experienced reduced opioid effectiveness, which may affect their opioid sensitivity and risk of adverse reaction depending on the opioid dosage. We also identified HLA-MICA as a risk-locus for penicillin ADR, which is supported by a recent large-scale genetic analysis for penicillin allergy including data from UK Biobank, Estonian Biobank and BioVU. The previous study also showed a strong association between penicillin allergy label and the HLA-MICA region with a different lead variant. [29] The eQTL analysis showed that the minor allele of the lead variant rs115200108 in the HLA-MICA risk-locus for penicillin ADR was associated with reduced MICA expression in whole blood tissue. 
The PheWAS results found that the minor allele of rs115200108 was associated with increased risk of ‘Poisoning by antibiotic,’ but the association did not reach phenome-wide significance. This finding suggests that our approach to identifying ADRs not only offers ADR phenotypes that are not covered by diagnosis codes but may also provide more power for genetic analyses than using diagnostic codes alone. There have been no previous studies regarding the associations between PTHLH and meperidine allergy. In our eQTL analysis, we found that the lead variant in the PTHLH risk-locus for meperidine allergy was associated with increased PTHLH expression in musculoskeletal tissue. However, further investigation is needed to confirm this finding. Trans-ethnic validation among individuals with self-reported African ancestry did not replicate any associations of genome-wide significance, but this analysis may have been limited by smaller sample size. Additionally, we performed genetic imputation with reference panels from the Haplotype Reference Consortium, which were developed with individuals of predominantly European ancestry and therefore may not be adequate for individuals with self-reported African ancestry. [30] Likewise, genome-wide analyses in the African ancestry cohort were also limited by small sample sizes and predominantly European ancestry genetic reference panels. Further improvements in ADR documentation and genetic reference panels, as well as the continued growth of EHR data, may help us determine the generalizability of these findings in diverse populations. Due to the high-throughput nature of our framework, it should be easy to adapt to other large multi-ancestry EHR-based biobanks for future analyses. There are several additional limitations to this study and approach. 
Drug allergy labels in the allergy section are entered into the EHRs by healthcare providers, but this information is often self-reported or subject to interpretation bias by the individual receiving the information and entering the data, introducing potential documentation or selection bias. For instance, patients who communicate with their healthcare provider more frequently, whether due to their specific conditions or due to socio-behavioral factors, may be more likely to report their adverse drug reactions. A better understanding of the factors that affect the likelihood of receiving a drug allergy label may improve our ability to utilize EHRs to study ADRs. Additionally, it is likely there were some misclassification errors in the controls. Controls who were exposed to the drug and experienced an adverse reaction may not have reported the reaction to their clinician to be documented. Similarly, controls who were never exposed to the drug and only had the “no known drug allergy” label may experience an adverse reaction when exposed to the drug. However, misclassification of cases as controls most likely biases the results toward the null and leads to an underestimation of the true contribution of genetic variation to ADRs. While a drug allergy label in the allergy section is consistent with a previous adverse drug reaction to the drug, more detailed questioning often reveals that a true allergy is less certain. [15,31] For instance, the vast majority of patients who are labeled as having a penicillin allergy were typically labeled much earlier in childhood. [32-34] Studies in allergy practice show that >95% of these individuals who undergo validated skin testing and challenge will tolerate penicillin, in part due to waning of this allergic response over time. [31] Therefore, our analysis did not consider the possibility of patients having lost their allergic tendency and being delabeled for a drug allergy, and our results should be interpreted as ‘ever or never’ reported an adverse reaction to a drug. Indeed, it is more challenging to capture specific details in the EHR when identifying individuals who ever had a penicillin allergy label, rather than those who currently have a penicillin allergy. We also observed that clinicians often do not enter information in the allergy section in a standardized manner, especially in older EHRs. Drug allergies and drug intolerances are frequently documented together in the allergy section without clear distinguishers. In addition, allergy section entries often omit details such as severity, type of reaction (e.g., anaphylaxis vs. rash), specific dose, and time of administration, limiting nuanced analyses. Although the CYP2D6-metabolized opioid related nausea/vomiting findings demonstrate that our framework can extract more detailed ADR phenotypes, the frequency of missing reaction information hinders high-throughput analyses of specific adverse effects. Thus, the high-throughput nature of our framework means that our genetic analyses were likely driven by the milder, more frequent reactions (e.g., rash from penicillin) rather than rarer phenotypes like Stevens-Johnson syndrome. Nonetheless, genetic variants identified with our framework need further follow-up to better understand the potential risks of a medication for a patient. For instance, labeling a patient to be broadly ‘at risk’ for an ADR may cause the patient to be given suboptimal therapy even if the reaction may be a common, expected side effect. These observations highlight the need to emphasize efforts to capture more accurate and relevant drug response information. Our framework will yield better outcomes as newer EHR systems introduce more explicit semantic meaning (e.g., allergy vs. intolerance), structured inputs and questionnaires (e.g., drop-down menus or checkboxes), [15] and increased quantity of quality data to the allergy section. Although these improvements require time and planning, it is encouraging that our current study, in the context of these limitations, can successfully identify several known genetic associations for ADRs. In summary, our results demonstrate the utility and efficacy of a high-throughput framework for identifying ADRs and eligible individuals from EHRs for large-scale studies. Our approach is scalable and portable and can help accelerate the pace of impactful pharmacogenomic research for advancing precision medicine.

Methods

Ethics statement

This study was approved by VUMC Institutional Review Board (#150475). Written consent was obtained for use of genetic data (https://victr.vumc.org/biovu-consent/).

Identifying adverse drug reactions in EHRs

For a given patient, allergy sections across all their clinical notes were extracted as free text. The data in an allergy section is often semi-structured (e.g., pcn [rash] and sulfa [itching]), but formatting can vary depending on the healthcare provider who entered the data. Therefore, drugs that appear in the allergy section were identified using case-insensitive regular expressions for generic names, brand names, abbreviations (e.g., pcn for penicillin), and common misspellings. Regular expressions allow us to match drug keywords within a drug allergy label irrespective of formatting. A full list of regular expressions used to identify drugs in this study can be found in S5 Table. The type of reaction was identified similarly with regular expressions when available. For drug allergy labels that refer to a class of drugs (e.g., penicillin, sulfa, etc.), we grouped all the drugs in the class as one ADR phenotype. For each drug, we defined cases as individuals with any mention of the drug in the allergy section. For controls, we included individuals that met either of two criteria: 1) individuals who were prescribed the drug and had no mention of the drug in their allergy sections; or 2) individuals who only had labels for “no known drug allergy” or an equivalent description in their allergy sections. For the first criterion, we used RxNorm codes (a normalized naming system for generic and branded drugs) to identify individuals with prescriptions of the drug of interest.
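The case and control definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the patterns in `DRUG_PATTERNS` and `NO_ALLERGY` are hypothetical stand-ins for the expressions listed in S5 Table, the function names are invented for this example, and prescriptions are assumed to have already been mapped from RxNorm codes to drug names.

```python
import re

# Hypothetical patterns for illustration only; the study's actual
# expressions are listed in S5 Table.
DRUG_PATTERNS = {
    "penicillin": re.compile(r"\b(penicillins?|pcn|amoxicillin)\b", re.IGNORECASE),
    "sulfa": re.compile(r"\b(sulfa|sulpha|sulfamethoxazole)\b", re.IGNORECASE),
}
NO_ALLERGY = re.compile(r"\b(nkda|no known (drug )?allerg(y|ies))\b", re.IGNORECASE)


def drugs_mentioned(allergy_text):
    """Return the set of drug phenotypes matched in one allergy section."""
    return {drug for drug, pattern in DRUG_PATTERNS.items() if pattern.search(allergy_text)}


def classify(drug, allergy_sections, prescribed_drugs):
    """Return 'case', 'control', or None (excluded) for one patient and one drug.

    allergy_sections: free-text allergy sections from the patient's notes.
    prescribed_drugs: drug names the patient was prescribed (assumed to be
    derived from RxNorm codes upstream).
    """
    mentioned = set()
    for text in allergy_sections:
        mentioned |= drugs_mentioned(text)
    if drug in mentioned:
        return "case"  # any mention of the drug in an allergy section
    if drug in prescribed_drugs:
        return "control"  # criterion 1: prescribed, never mentioned
    # Criterion 2 (simplified): every section is a "no known drug allergy" label
    # and names none of the drugs in DRUG_PATTERNS.
    if allergy_sections and all(
        NO_ALLERGY.search(t) and not drugs_mentioned(t) for t in allergy_sections
    ):
        return "control"
    return None


print(classify("penicillin", ["pcn [rash] and sulfa [itching]"], set()))
```

Patients who meet neither the case definition nor a control criterion return `None` and would be left out of the analysis for that drug, which is why the case and control counts for a drug do not sum to the cohort size.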

Genotyping and SNP imputation

Genotyping was performed on the Infinium Multi-Ethnic Genotyping Array (MEGAchip). We excluded DNA samples: (1) with per-individual call rate < 95%; (2) with wrongly assigned sex; (3) with a cryptic relationship closer than a third-degree relative (proportion identity by descent ≥0.25); or (4) unexpected duplication. We performed whole genome imputation using the Michigan Imputation Server (https://imputationserver.sph.umich.edu) [35] with the Haplotype Reference Consortium (HRC), version r1.1, [36] as reference. Principal components for ancestry (PCs) were calculated using common variants (MAF > 0.01) with high variant call rate (> 98%), excluding variants in linkage and regions known to affect PCs (HLA region on chromosome 6, inversion on chromosome 8 (8135000–12000000) and inversion on chr 17 (40900000–45000000), GRCh37 build). For association analyses, we used EasyQC (www.genepi-regensburg.de/easyqc) [37] to filter (1) poorly imputed variants with imputation quality (R2) value of < 0.5, (2) EAF < 0.005, (3) deviation from Hardy-Weinberg equilibrium with a P-value ≤ 1×10−6 and (4) variants with EAF that deviated from the HRC reference panel by > 0.3.
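The four variant-level filters applied before association testing can be expressed as a single predicate. The sketch below is illustrative only: the study used EasyQC for this step, whereas this is plain Python, and the `Variant` class, its field names, and the example values are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Summary statistics for one imputed variant (hypothetical field names)."""
    rsid: str
    r2: float       # imputation quality
    eaf: float      # effect allele frequency in the study sample
    hwe_p: float    # Hardy-Weinberg equilibrium P-value
    ref_eaf: float  # frequency of the same allele in the HRC reference panel


def passes_qc(v):
    """Keep a variant only if it clears all four filters described in the text."""
    return (
        v.r2 >= 0.5                         # drop poorly imputed variants (R2 < 0.5)
        and v.eaf >= 0.005                  # drop EAF < 0.005
        and v.hwe_p > 1e-6                  # drop HWE deviation with P <= 1e-6
        and abs(v.eaf - v.ref_eaf) <= 0.3   # drop EAF deviating from HRC by > 0.3
    )


# Made-up example values, not taken from the study.
variants = [
    Variant("example_1", r2=0.98, eaf=0.30, hwe_p=0.41, ref_eaf=0.29),
    Variant("example_2", r2=0.42, eaf=0.10, hwe_p=0.75, ref_eaf=0.11),
]
kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)
```

The thresholds are written as "keep" conditions, so a variant sitting exactly on a boundary (for example R2 = 0.50) is retained, matching the exclusion rules as stated.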

Genetic analyses

All statistical analyses were performed with PLINK 2.0. [38] This study included 81,739 individuals from the Vanderbilt University Medical Center’s BioVU DNA Biobank, [17] including GWAS data from 67,323 individuals with self-reported European ancestry and trans-ethnic validation using 14,416 individuals with self-reported African ancestry. We applied logistic regression models to investigate the association of genetic variants with risk of ADR to any of the 14 drugs or drug groups selected for this study. All regression models were adjusted for sex, age, length of EHR, and the first 10 principal components of the genotyping array for ancestry. Association results were annotated with ANNOVAR. [39] Region plots were produced with LocusZoom. [40] Additional eQTL analysis used data from the Genotype-Tissue Expression (GTEx) project (www.gtexportal.org). [22] For PheWAS, we used logistic regression models with phecodes, which are diagnosis codes aggregated into meaningful disease phenotypes that have been commonly used in phenome-wide analyses. [18,19] Patients with ≥ 2 phecodes were classified as cases. All regression models were again adjusted for sex, age, length of EHR, and the first 10 principal components of the genotyping array for ancestry.
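For readers less familiar with logistic regression output, the sketch below shows the standard conversion from a log-odds coefficient and its standard error to an odds ratio with a Wald-type 95% confidence interval, together with a check against the genome-wide significance threshold. The text does not state how the reported intervals were computed, so this is a generic illustration; the function names and the example coefficient and standard error are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold used in the text


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def is_genome_wide_significant(p_value):
    """True when the P-value is below the genome-wide threshold."""
    return p_value < GENOME_WIDE_ALPHA


# Hypothetical coefficient and standard error, not taken from the study output.
odds_ratio, lower, upper = odds_ratio_with_ci(beta=-0.174, se=0.030)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A negative coefficient gives an odds ratio below 1, which corresponds to the "reduced risk" direction reported for the CYP2D6 and OPRM1 variants.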

Summary of types of adverse drug reactions stratified by ancestry.

(PDF) Click here for additional data file.

Genome-wide significant variants associated with adverse drug reactions in European ancestry individuals.

(PDF) Click here for additional data file.

Trans-ethnic replication of lead variant per signal associated with adverse drug reactions in individuals with self-reported African ancestry.

(PDF) Click here for additional data file.

Genome-wide significant variants associated with adverse drug reactions in self-reported African ancestry individuals.


Regular expressions for extracting adverse drug reactions from the ‘Allergy section’ of electronic health records.


Manhattan plots of phenome-wide analysis of lead variants in CYP2D6, OPRM1, PTHLH, and HLA/MICA associated with adverse drug reactions (ADRs).

Red lines on Manhattan plots show the phenome-wide level of significance (5.0 × 10⁻⁵). Phenotypes with P-values < 0.005 were annotated.

1 Mar 2021

Dear Dr Wei,

Thank you very much for submitting your Research Article entitled 'High-throughput genetic analyses of adverse drug reactions using electronic health records' to PLOS Genetics. The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Gregory M. Cooper, PhD
Associate Editor
PLOS Genetics

Scott Williams
Section Editor: Natural Variation
PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: While I agree that the drug allergy section in the EHR often lists ADRs, true allergies are also reported and captured (PMID:30100688 Figure 2).
I am concerned that combining anything listed in this section for a certain drug does not provide an appropriate phenotype. Given that allergy data is not uniformly collected and/or reported and can represent everything from mild, expected, patient-reported side effects to hypersensitivity reactions, knowing the types of “allergies” reported is important to be able to determine risk-benefit. Grouping these effects in this manner assumes that genetic factors associated with gastrointestinal patient-reported side effects from antibiotics are the same as those that contribute to the development of Stevens-Johnson syndrome or other hypersensitivity reactions. While the authors address the inability to ascertain the specific drug reaction as a limitation, they also state that in their previous study (Ariosto 2014), the majority of patients had gastrointestinal effects when opioid allergy alerts were evaluated. Therefore, it seems that the type of allergy/ADR can be obtained in some circumstances. I would like to see a breakdown of the types of drug reactions that were included for each drug/drug class when this data was available. Without this information, it is difficult to determine (1) if the phenotype was appropriate for the GWAS and (2) what the results actually indicate.

Broadly indicating that a genetic variant is associated with an ADR does not provide any information on the potential risk of the medication for the patient. Expected GI side effects and hypersensitivity reactions would not result in the same clinical management decisions, and deeming a patient "at risk" for an ADR/allergy could cause the patient to be given suboptimal therapy when the risk was actually only a common, expected side effect. Can the authors comment on this, as this was not addressed in the manuscript and is a major limitation?

Page 12: The authors conclude that this framework provides insights for optimizing drug therapy with maximal efficacy and minimal adverse effects. While this approach may help minimize adverse effects, it does not provide information on treatment efficacy, as lack of an ADR/allergy does not indicate that the drug is effective. Recommend rewording.

Page 12: The authors state that “utilizing EHRs allowed us to study ADRs in individuals with diverse clinical and ethnic backgrounds.” However, the GWAS was not conducted in patients of African ancestry and the identified associations were not validated in these patients, so this did not seem to be the best method to study ADRs in non-Europeans. I am wondering (1) if the top 10 most reported drugs in the allergy section were the same for patients of European and African ancestry, and (2) given that there are known ancestral differences in susceptibility to certain ADRs (e.g. ACE inhibitor-induced angioedema in African ancestry), other than sample size, was there a reason the GWAS was not also conducted in the African ancestry individuals?

Given that the associations did not validate in African Americans, was any attempt made to validate the identified associations in individuals of European ancestry?

Since previously identified and replicated associations for statin-induced myopathy (i.e. SLCO1B1) and aspirin-associated ADRs, such as aspirin-induced asthma (HLA-DPB1), were not identified in the study, I am wondering if this could be due to how the drug allergy phenotype was defined, or if the authors have another explanation.

Reviewer #2: Zheng et al report the results of a genome-wide association study of adverse drug reactions using the “allergy section” of the electronic health-care records of a large medical center coupled to a biobank. The authors investigated 14 common drug/drug groups and report 7 genetic loci associated with ADRs at genome-wide significance of P < 5 × 10⁻⁸. A considerable strength of the manuscript is the large sample size for a PGx analysis, including ~81.7k participants, combined with the use of EHR data. The authors may consider the following points to improve their manuscript.

Major comments:

• While a strength of the paper is the considerable size, the reported findings are confirmatory and unfortunately lack novelty. In fact the paper is more a method validation than a discovery study, and this should be reflected in the title and throughout the paper. Therefore I would encourage the authors to show more data on the text mining of the “allergy section”.
• It is not clear to me how the 14 drugs/groups were selected. This should be described in more detail. From the title I would have thought that the authors, for instance, would have focused on severe hypersensitivity reactions, i.e. Stevens-Johnson syndrome, toxic epidermal necrolysis, drug-induced liver injury.
• In the introduction the authors correctly address the challenge that “Drug response phenotypes, such as ADRs, are less often recorded than physiological traits and common diseases”. To be able to assess drug response phenotypes, detailed information regarding dose, interval and time relation with occurrence of the ADR is important. Was more detailed information regarding dose and time of administration available from the EHR and considered for inclusion in the analysis?
• The authors correct for multiple testing of the genetic variants, but it is not clear how many endpoints were tested and whether this is corrected for.
• The methods section is quite brief and should be expanded. For example, it would be useful to include detailed information on the identification of ADRs from the allergy section's free text.

Minor comments:

• The paper is very brief and little detail is provided. For example, the results section on the PheWAS is only 2 sentences.
• The figures are of low resolution and should be improved.

**********

Have all data underlying the figures and results presented in the manuscript been provided?
Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: None
Reviewer #2: No: Authors claim data are part of EHR and cannot be shared

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?

Reviewer #1: No
Reviewer #2: No

30 Apr 2021

Submitted filename: ADR_response.docx

10 May 2021

Dear Dr Wei,

We are pleased to inform you that your manuscript entitled "High-throughput framework for genetic analyses of adverse drug reactions using electronic health records" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Gregory M. Cooper, PhD
Associate Editor
PLOS Genetics

Scott Williams
Section Editor: Natural Variation
PLOS Genetics

25 May 2021

PGENETICS-D-20-01665R1
High-throughput framework for genetic analyses of adverse drug reactions using electronic health records

Dear Dr Wei,

We are pleased to inform you that your manuscript entitled "High-throughput framework for genetic analyses of adverse drug reactions using electronic health records" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofi Zombor
PLOS Genetics

On behalf of:
The PLOS Genetics Team
  39 in total

1.  Development of a large-scale de-identified DNA biobank to enable personalized medicine.

Authors:  D M Roden; J M Pulley; M A Basford; G R Bernard; E W Clayton; J R Balser; D R Masys
Journal:  Clin Pharmacol Ther       Date:  2008-05-21       Impact factor: 6.875

2.  Pharmacokinetics of codeine and its metabolite morphine in ultra-rapid metabolizers due to CYP2D6 duplication.

Authors:  J Kirchheiner; H Schmidt; M Tzvetkov; J-T H A Keulen; J Lötsch; I Roots; J Brockmöller
Journal:  Pharmacogenomics J       Date:  2006-07-04       Impact factor: 3.550

3.  Penicillin Allergy. Reply.

Authors:  Mariana Castells; David A Khan; Elizabeth J Phillips
Journal:  N Engl J Med       Date:  2020-04-02       Impact factor: 91.245

4.  Genotype imputation performance of three reference panels using African ancestry individuals.

Authors:  Candelaria Vergara; Margaret M Parker; Liliana Franco; Michael H Cho; Ana V Valencia-Duarte; Terri H Beaty; Priya Duggal
Journal:  Hum Genet       Date:  2018-04-10       Impact factor: 4.132

5.  Next-generation genotype imputation service and methods.

Authors:  Sayantan Das; Lukas Forer; Sebastian Schönherr; Carlo Sidore; Adam E Locke; Alan Kwong; Scott I Vrieze; Emily Y Chew; Shawn Levy; Matt McGue; David Schlessinger; Dwight Stambolian; Po-Ru Loh; William G Iacono; Anand Swaroop; Laura J Scott; Francesco Cucca; Florian Kronenberg; Michael Boehnke; Gonçalo R Abecasis; Christian Fuchsberger
Journal:  Nat Genet       Date:  2016-08-29       Impact factor: 38.330

Review 6.  The value of CYP2D6 and OPRM1 pharmacogenetic testing for opioid therapy.

Authors:  Kristen K Reynolds; Bronwyn Ramey-Hartung; Saeed A Jortani
Journal:  Clin Lab Med       Date:  2008-12       Impact factor: 1.935

7.  The Genotype-Tissue Expression (GTEx) project.

Authors:  GTEx Consortium
Journal:  Nat Genet       Date:  2013-06       Impact factor: 38.330

8.  Readiness for PENicillin allergy testing: Perception of Allergy Label (PEN-PAL) survey.

Authors:  David T Coleman; Cosby A Stone; Wei-Qi Wei; Elizabeth J Phillips
Journal:  J Allergy Clin Immunol Pract       Date:  2020-04-15

Review 9.  Pharmacogenomics.

Authors:  Dan M Roden; Howard L McLeod; Mary V Relling; Marc S Williams; George A Mensah; Josh F Peterson; Sara L Van Driest
Journal:  Lancet       Date:  2019-08-05       Impact factor: 79.321

10.  CYP2D6 phenotypes are associated with adverse outcomes related to opioid medications.

Authors:  Jennifer L St Sauver; Janet E Olson; Veronique L Roger; Wayne T Nicholson; John L Black; Paul Y Takahashi; Pedro J Caraballo; Elizabeth J Bell; Debra J Jacobson; Nicholas B Larson; Suzette J Bielinski
Journal:  Pharmgenomics Pers Med       Date:  2017-07-24
