
Multiphenotype association study of patients randomized to initiate antiretroviral regimens in AIDS Clinical Trials Group protocol A5202.

Anurag Verma, Yuki Bradford, Shefali S Verma, Sarah A Pendergrass, Eric S Daar, Charles Venuto, Gene D Morse, Marylyn D Ritchie, David W Haas.

Abstract

BACKGROUND: High-throughput approaches are increasingly being used to identify genetic associations across multiple phenotypes simultaneously. Here, we describe a pilot analysis that considered multiple on-treatment laboratory phenotypes from antiretroviral therapy-naive patients who were randomized to initiate antiretroviral regimens in a prospective clinical trial, AIDS Clinical Trials Group protocol A5202.
PARTICIPANTS AND METHODS: From among 5 954 294 polymorphisms imputed genome-wide, we analyzed 2544, including 2124 annotated in the PharmGKB, and 420 previously associated with traits in the GWAS Catalog. We derived 774 phenotypes on the basis of context from six variables: plasma atazanavir (ATV) pharmacokinetics, plasma efavirenz (EFV) pharmacokinetics, change in the CD4+ T-cell count, HIV-1 RNA suppression, fasting low-density lipoprotein-cholesterol, and fasting triglycerides. Permutation testing assessed the likelihood of associations being by chance alone. Pleiotropy was assessed for polymorphisms with the lowest P-values.
RESULTS: This analysis included 1181 patients. At P less than 1.5×10−4, most associations were not by chance alone. Polymorphisms with the lowest P-values for EFV pharmacokinetics (CYP2B6 rs3745274), low-density lipoprotein-cholesterol (APOE rs7412), and triglyceride (APOA5 rs651821) phenotypes had been associated with those traits in previous studies. The association between triglycerides and rs651821 was present with ATV-containing regimens, but not with EFV-containing regimens. Polymorphisms with the lowest P-values for ATV pharmacokinetics, CD4 T-cell count, and HIV-1 RNA phenotypes had not been reported previously to be associated with those traits.
CONCLUSION: Using data from a prospective HIV clinical trial, we identified expected genetic associations, potentially novel associations, and at least one context-dependent association. This study supports high-throughput strategies that simultaneously explore multiple phenotypes from clinical trials' datasets for genetic associations.

Year: 2017; PMID: 28099408; PMCID: PMC5285297; DOI: 10.1097/FPC.0000000000000263

Source DB: PubMed; Journal: Pharmacogenet Genomics; ISSN: 1744-6872; Impact factor: 2.089

Introduction

Access to safe and effective antiretroviral therapy (ART) is critical in the global response to the AIDS pandemic. Genetic polymorphisms in drug absorption, distribution, metabolism, and elimination (ADME) genes and off-target genes have convincingly been shown to be associated with adverse effects and/or pharmacokinetics of antiretroviral drugs including abacavir (ABC) 1, atazanavir (ATV) 2, dolutegravir 3, efavirenz (EFV) 4, etravirine 5, lopinavir 6, and nevirapine 7, and genetic screening to avoid ABC hypersensitivity reaction is now the standard of care in many resource-abundant countries. Genome-wide association studies (GWAS) explore whether an individual trait (i.e. phenotype) associates with single-nucleotide polymorphisms (SNPs) across the genome. Only one phenotype is typically considered in a GWAS. The term ‘phenome’ describes the aggregate of many phenotypes in a given dataset. Phenome-wide association studies (PheWAS) complement GWAS by testing for genotype–phenotype associations across numerous phenotypes 8–12. A PheWAS may interrogate a single SNP against the phenome or may interrogate numerous SNPs simultaneously. Also unique to PheWAS is the ability to identify pleiotropy, whereby one SNP is found to be associated with multiple seemingly unrelated phenotypes 13,14. Context-dependent genetic associations with antiretroviral drugs are well described. Failure to consider context may miss or underestimate important genetic associations. For EFV, some individuals with CYP2B6 slow metabolizer genotypes experience extremely high plasma EFV exposure only in the context of concomitant isoniazid 15–17. Among individuals with CYP2B6 slow metabolizer genotypes, the likelihood of EFV discontinuation for central nervous system side effects appears to be greater in the context of European versus African ancestry 18,19. 
For ATV, among individuals with UGT1A1 low expressor genotypes, the likelihood of bilirubin-related drug discontinuation is considerably greater in the context of European versus African ancestry 20. With nevirapine, among individuals with HLA risk alleles, severe cutaneous reactions occur largely when nevirapine is initiated in the context of higher CD4 T-cell counts 21. Prospective clinical trials that randomized HIV-infected patients to initiate different antiretroviral regimens, and that involve extensive data collection, offer a special opportunity to apply a multiphenotype analytical approach focused on pharmacogenomics. We previously applied PheWAS to pretreatment (i.e. baseline) laboratory data from National Institutes of Health-funded AIDS Clinical Trials Group (ACTG) protocols 22. That analysis established that our analysis pipeline for studying multiple phenotypes is robust, with 20 polymorphisms replicating associations with identical or related phenotypes reported in the National Human Genome Research Institute – European Bioinformatics Institute GWAS Catalog 23, including several not reported previously in HIV-positive cohorts. The present analyses explored associations with multiple on-treatment phenotypes from ACTG protocol A5202 24,25. We considered a total of 774 phenotypes representing ATV pharmacokinetics, CD4 T-cell count, EFV pharmacokinetics, fasting low-density lipoprotein (LDL) cholesterol, HIV-1 RNA, and fasting triglycerides. These phenotypes were derived by considering various contexts including sex, race/ethnicity, baseline age, baseline body mass index, baseline CD4+ T-cell count, baseline plasma HIV-1 RNA, randomized antiretroviral regimen, and component antiretroviral drug. These context-dependent phenotypes are useful in interpreting genome–phenome association results and highlight relationships of potential interest between these polymorphisms and phenotypes.

Participants and methods

Study participants

AIDS Clinical Trials Group protocol A5202 (ClinicalTrials.gov NCT00118898) was a phase IIIb equivalence study of four once-daily regimens for the initial treatment of HIV-1 infection. The primary results of A5202 have been reported previously 24,25. Patients enrolled from 2005 to 2007 were randomized to open-label ATV (300 mg) plus ritonavir (RTV, 100 mg) or EFV (600 mg) with either placebo-controlled ABC/lamivudine (3TC) (600 mg/300 mg) or tenofovir disoproxil fumarate/emtricitabine (TDF/FTC, 300 mg/200 mg). Study evaluations included laboratory testing at entry, at weeks 4, 8, 16, and 24, and every 12 weeks thereafter until the last enrolled patient was followed for 96 weeks. Analyses included A5202 participants who consented to provide DNA for genetic research under ACTG protocol A5128.

Phenotypes

For this analysis, we considered laboratory data from A5202 at entry and subsequent on-study weeks, and representing immunologic, virologic, metabolic, and pharmacologic domains. Immunologic phenotypes were derived from CD4+ T-cell count data, which are known to correlate with mortality on ART 26–30. Virologic phenotypes were derived from data on plasma HIV-1 RNA suppression to less than 200 copies/ml, which decreases transmission 31. Metabolic phenotypes were derived from data on fasting LDL cholesterol and fasting triglyceride levels, which are in the causal pathway to myocardial infarction 32,33. Pharmacologic phenotypes were derived from data on EFV and ATV pharmacokinetics, which relate to drug efficacy and toxicity 34–41. We define the terms variable, primary phenotype, and subphenotype as follows: variables represent data without regard to study week or context (e.g. among all study patients, all fasting triglyceride data). Primary phenotypes are derived from variables while also considering study week but without regard to context (e.g. among all study patients, fasting triglycerides at baseline, at study weeks 24, 48, and 96, and change in fasting triglycerides from baseline to week 24, to 48, and to 96). Subphenotypes are derived from primary phenotypes while considering context (e.g. the fasting triglyceride primary phenotype noted above, but only among patients randomized to receive ATV/RTV). Contexts for subphenotypes were defined as follows: categorical context included sex (male or female), self-identified race/ethnicity (White, Black, or Hispanic), randomized antiretroviral regimen (ATV+RTV+ABC/3TC, EFV+ABC/3TC, ATV+RTV+TDF/FTC, or EFV+TDF/FTC), and component antiretroviral drug (ATV+RTV, ABC/3TC, EFV, or TDF/FTC). Because ATV/RTV, ABC/3TC, and TDF/FTC were always prescribed as two-drug combinations, the component drugs could not be analyzed individually. 
For continuous baseline parameters, continuous context was derived on the basis of percentile cut-offs for age, BMI, CD4+ T-cell count, and plasma HIV-1 RNA (10th, 25th, 33rd, 50th, 67th, 75th, and 90th percentiles for each). With this approach, we generated a total of 774 primary phenotypes and subphenotypes for analysis, as listed in Supplemental Table 1 (Supplemental digital content 1). For each primary phenotype, we examined frequency distribution plots and reviewed summary information, identified phenotypes requiring transformation to approximate normality to fulfill assumptions for linear regression, assured consistent units of measurement, and censored outliers judged to be biologically implausible.
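The percentile-based derivation of continuous contexts can be illustrated with a minimal sketch. It assumes that each cut-off splits patients into a "below" and an "at or above" stratum; the record layout, the function names (`percentile_cutoff`, `context_strata`), and the toy ages are hypothetical and are not taken from the study.

```python
from statistics import quantiles

PERCENTILES = (10, 25, 33, 50, 67, 75, 90)  # cut-offs named in the text


def percentile_cutoff(values, pct):
    """Return the pct-th percentile (1..99) of values."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[pct - 1]


def context_strata(patients, covariate, percentiles=PERCENTILES):
    """Split patients into below / at-or-above strata at each cut-off.

    Assumption: two strata per cut-off. The study may have defined
    its strata differently.
    """
    values = [p[covariate] for p in patients]
    strata = {}
    for pct in percentiles:
        cut = percentile_cutoff(values, pct)
        below = [p["id"] for p in patients if p[covariate] < cut]
        above = [p["id"] for p in patients if p[covariate] >= cut]
        strata[f"{covariate}_below_p{pct}"] = below
        strata[f"{covariate}_at_or_above_p{pct}"] = above
    return strata


# Hypothetical toy cohort: 40 patients aged 20..59.
patients = [{"id": i, "age": 20 + i} for i in range(40)]
strata = context_strata(patients, "age")
```

Each stratum would then define one subphenotype per primary phenotype, which is how a small number of variables can expand into several hundred phenotypes.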

Imputation and QC of genetic data

Patients from A5202 were genotyped with the Illumina 1M duo array as part of a previous immunogenomics project 42. The PLINK program and R statistical programming language were used for QC procedures 43,44. Polymorphisms were censored for call rates below 98%. After excluding 10 samples where genetically inferred sex differed from clinical data or where missing sex status could not be inferred, 26 samples with overall genotyping call rates below 98%, and one sample with cryptic relatedness on the basis of identity-by-descent estimates of more than 0.3 from ~100 000 pruned SNPs, there were 1221 samples for imputation. Post-QC data were imputed to 1000 Genomes 45 after converting into genome build 37 using liftOver 46 and stratifying by chromosome to parallelize imputation processing. ShapeIt2 47 was used to check strand alignment and to phase data. The IMPUTE2 algorithm 48 was used to impute additional genotypes that were available in the 1000 Genomes reference panel, but not directly genotyped. Each chromosome was segmented into 6 Mb regions with at least 3500 reference variants in each region. Imputed genotypes were included if posterior probabilities exceeded 0.9. The quality of imputed data was assessed following the Electronic Medical Records and Genomics protocol 49. Each chromosome from each phase was checked for 100% concordance with genotyped data. We excluded imputed SNPs with imputation scores less than 0.3, genotyping call rates below 98%, and minor allele frequencies (MAF) less than 0.01.
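The final post-imputation exclusion step can be sketched as a simple filter. The thresholds below are the ones stated above; the `Variant` record, the `passes_qc` function, and the example identifiers are hypothetical illustrations and do not represent PLINK or IMPUTE2 interfaces.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str
    info: float       # imputation quality score
    call_rate: float  # fraction of samples with a called genotype
    maf: float        # minor allele frequency


def passes_qc(v, min_info=0.3, min_call_rate=0.98, min_maf=0.01):
    """Keep a variant only if it meets all three thresholds."""
    return (
        v.info >= min_info
        and v.call_rate >= min_call_rate
        and v.maf >= min_maf
    )


# Hypothetical variants, one failing each criterion in turn.
variants = [
    Variant("rs_example_1", info=0.95, call_rate=0.99, maf=0.20),
    Variant("rs_example_2", info=0.25, call_rate=0.99, maf=0.20),
    Variant("rs_example_3", info=0.95, call_rate=0.97, maf=0.20),
    Variant("rs_example_4", info=0.95, call_rate=0.99, maf=0.005),
]
kept = [v.rsid for v in variants if passes_qc(v)]
```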

Candidate polymorphisms for analysis

From the set of imputed SNPs, we included in this analysis only SNPs for which there was some a priori evidence of a pharmacogenetic association with any drug and phenotype on the basis of data from PharmGKB (Pharmacogenomics Knowledgebase 50). There were 2622 such SNPs in 761 genes that were annotated for a possible drug–phenotype association. Of these 2622 SNPs, we included in this analysis only a subset of 2124 SNPs that were also represented in the imputed, post-QC genome-wide data. In addition to PharmGKB SNPs, from the set of imputed SNPs, we also included SNPs for which previous GWAS had shown evidence of association with any lipid-related trait with a P-value of less than 10−8, as represented in the GWAS Catalog 23, which includes results from published GWAS fulfilling catalog criteria 51. There were 447 such SNPs, of which we included in this analysis only a subset of 420 SNPs that were also represented in the imputed, post-QC genome-wide data. A total of 2544 SNPs were included in the analysis (listed in Supplemental Table 2, Supplemental digital content 2).

Statistical analysis

When linked with available laboratory phenotypes, the final analysis dataset included 1181 patients, 2124 PharmGKB SNPs, and 420 GWAS Catalog SNPs. Using the R statistical package, continuous phenotypes were modeled with linear regression and the dichotomous phenotype with logistic regression 44. The first three principal components, calculated using EIGENSOFT 52, were used to adjust for global ancestry. Each analysis was also adjusted for sex and age. Consideration of context resulted in models of varying sample sizes. For models with at least 100 patients, we excluded SNPs with MAF of less than 0.05. For models with fewer than 100 patients, we excluded SNPs with MAF of less than 0.10. We did not infer or impute missing laboratory data. Permutation testing was used to empirically derive P-value cut-offs (PPT) 53. Briefly, within the analysis dataset, we permuted the connection between genotype and phenotype data. This randomly matches each patient’s genotypes to another patient’s phenotypes, while preserving relationships between genotypes (e.g. linkage disequilibrium) and between phenotypes (e.g. correlations). Permutation was repeated 1000 times, each generating a new dataset. We then carried out the association analysis on each of the 1000 datasets, from which we determined, at various P-value cut-offs, the average number of SNPs per analysis that pass that cut-off in the permuted data (i.e. by chance alone). We compared this average number with the actual number of SNPs that passed that same cut-off in the unpermuted data. This yields probabilities that SNP–phenotype associations at any given P-value threshold in the unpermuted data were by chance alone. Our approach differs from a more traditional permutation approach that would calculate permuted P-values for each association test, the latter permutation approach being computationally prohibitive.
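The permutation procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: covariate-adjusted regression is replaced by a Pearson correlation with a Fisher z approximation for the P-value, the "by chance" probability is taken as the ratio of the mean permuted hit count to the observed hit count, and all names and toy data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(x, y):
    """Two-sided P-value via the Fisher z approximation (assumption)."""
    r = max(-0.999999, min(0.999999, pearson_r(x, y)))
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def count_hits(genotypes, phenotypes, order, cutoff):
    """Count SNP-phenotype pairs with P < cutoff.

    `order` relinks all phenotypes to genotypes with one shared patient
    permutation, so genotype-genotype and phenotype-phenotype
    relationships are preserved.
    """
    hits = 0
    for geno in genotypes.values():
        for pheno in phenotypes.values():
            relinked = [pheno[i] for i in order]
            if approx_p_value(geno, relinked) < cutoff:
                hits += 1
    return hits


def chance_fraction(genotypes, phenotypes, cutoff, n_perm=1000, seed=0):
    rng = random.Random(seed)
    n = len(next(iter(genotypes.values())))
    identity = list(range(n))
    observed = count_hits(genotypes, phenotypes, identity, cutoff)
    null_total = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)
        null_total += count_hits(genotypes, phenotypes, order, cutoff)
    mean_null = null_total / n_perm
    fraction = min(1.0, mean_null / observed) if observed else float("nan")
    return observed, mean_null, fraction


# Hypothetical toy data: 60 patients, 5 SNPs, 2 phenotypes.
rng = random.Random(1)
n_patients = 60
genotypes = {
    f"snp{j}": [rng.choice((0, 1, 2)) for _ in range(n_patients)]
    for j in range(5)
}
phenotypes = {
    "driven_by_snp0": [2.0 * g + rng.gauss(0, 1) for g in genotypes["snp0"]],
    "pure_noise": [rng.gauss(0, 1) for _ in range(n_patients)],
}
observed, mean_null, fraction = chance_fraction(
    genotypes, phenotypes, cutoff=1e-3, n_perm=200
)
```

Because one threshold-level summary is computed across all tests, the cost is one full association run per permutation rather than a separate permuted P-value for every SNP–phenotype pair.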

Results

This multiphenotype analysis included data from 1181 patients from A5202, who had consented to provide DNA for genetic research under ACTG protocol A5128. The characteristics of the study patients are presented in Table 1. The characteristics of patients included in the analysis generally reflected the characteristics of all A5202 study patients. From the available baseline and subsequent on-study data, a total of 774 phenotypes were derived for analysis as described under ‘participants and methods’ section. These comprised 19 primary phenotypes as well as 755 subphenotypes that were derived on the basis of baseline age, sex, race/ethnicity, BMI, CD4+ T-cell count, plasma HIV-1 RNA, randomized antiretroviral regimen, and component antiretroviral drug. This generated 68 phenotypes for ATV pharmacokinetics, 84 for CD4 T-cell count, 34 for EFV pharmacokinetics, 252 for fasting LDL cholesterol, 84 for HIV-1 RNA, and 252 for fasting triglycerides. Definitions for each of the 774 phenotypes are provided in Supplemental Table 1 (Supplemental digital content 1).

Table 1

Baseline characteristics of study patients included in phenome-wide association studies

From the imputed genome-wide genotype data on these study patients, a total of 2501 SNPs (which were represented in either PharmGKB or the GWAS Catalog) provided at least one association result with at least one phenotype in this analysis. As noted in ‘participants and methods’ section, we excluded SNPs with MAF of less than 0.05 from models with at least 100 patients and SNPs with MAF of less than 0.10 from models with fewer than 100 patients. A total of 1 773 707 SNP–phenotype pairs provided P-values for association. To assess the likelihood that associations were by chance alone, permutation testing was used to empirically derive PPT to determine the probability that SNP–phenotype associations in the unpermuted data were by chance alone, as described in ‘participants and methods’ section. For example, at PPT less than 1.5×10−4, 50% of SNP–phenotype pairs in this analysis are likely not by chance alone (Fig. 1). Of the 1 773 707 SNP–phenotype pairs noted above, P-values for 737 (0.04%) were less than this PPT threshold. The number of patients included in each model ranged from 18 (e.g. for EFV concentrations in patients younger than 26 years of age) to 1080 (e.g. for HIV-1 RNA response at 48 weeks), with a median of 242 patients per model.

Fig. 1

Empirically derived P-values on the basis of permutation testing. Permutation testing was used to empirically derive P-value cut-offs (PPT). Briefly, within the dataset used for association analysis, we permuted the connection between genotype and phenotype data. Permutation was repeated 1000 times, each generating a new dataset. We then carried out analyses on each of the 1000 datasets, from which we determined, at various P-value cut-offs, the average number of single nucleotide polymorphisms (SNPs) per analysis that pass that cut-off in the permuted data. We compared this average number with the actual number of SNPs that passed that same cut-off in the unpermuted data, providing an empiric determination of the probability that SNP–phenotype associations in the unpermuted data were by chance alone.

Within each phenotype domain, association results for the five SNPs with the lowest P-value with at least one phenotype are presented in Table 2. For EFV pharmacokinetics, fasting LDL-cholesterol, and fasting triglyceride phenotypes, the SNP with the lowest P-value had previously been associated with that trait at P less than 5.0×10−8 in at least one GWAS 4,54. For the five SNPs with the lowest P-values in EFV pharmacokinetics, fasting LDL-cholesterol, and fasting triglyceride domains (15 SNPs total), Manhattan plots for associations between each SNP and as many as 774 phenotypes across all six domains are shown in Fig. 2.

Table 2

Association results for the five lowest P-value single nucleotide polymorphisms within each phenotype domain

Fig. 2

Manhattan plots representing all phenotype associations for the five single nucleotide polymorphisms (SNPs) with the lowest P-values for efavirenz pharmacokinetic, fasting low-density lipoprotein (LDL) cholesterol, and fasting triglyceride phenotypes. We analyzed SNPs that were annotated previously for any drug in the PharmGKB or associated previously with any trait in the GWAS Catalog, and that were also represented in the imputed, post-QC genome-wide data. Each marker represents, for each phenotype, the –log10 P-value for association with the indicated SNP. Color-coded phenotype categories are indicated at bottom left of figure. Note that the scale of the Y-axis differs between plots.

For EFV concentrations, the lowest P-value was with rs3745274 (P=1.1×10−28) among all 351 patients with evaluable data, but rs3745274 was also associated with numerous other context-derived EFV subphenotypes (Fig. 2). The –log10 P-values for association between rs3745274 and EFV concentrations correlated very strongly with sample size in the model (Spearman’s ρ=0.95, P<0.0001), suggesting that this genetic association was present irrespective of context (i.e. sex, race/ethnicity, randomized antiretroviral regimen, component antiretroviral drug, baseline age, BMI, CD4+ T-cell count, and plasma HIV-1 RNA). In contrast, an association between rs10871777 and EFV concentration (P=2.0×10−5) was only observed among 34 individuals with baseline BMI in the lowest 10th percentile, but not among 80 individuals with BMI in the lowest 25th percentile (P=0.01), nor among 108 individuals with BMI in the lowest 33rd percentile (P=0.25) (Fig. 2). Furthermore, there was no hint of association between rs10871777 and EFV concentration within any other decile of BMI (i.e. 10th to 20th percentile, 20th to 30th percentile, etc.), considering both P-values and β coefficients (data not shown).
For fasting LDL-cholesterol, the lowest P-value was between rs7412 in APOE and week 96 LDL-cholesterol among all 853 evaluable patients. As shown in Fig. 2, rs7412 was associated with numerous context-derived LDL-cholesterol phenotypes. Associations were only with absolute values of LDL-cholesterol at individual study weeks, not with LDL-cholesterol change from baseline. The –log10 P-values for association between rs7412 and LDL-cholesterol correlated directly with sample size in the model (Spearman’s ρ=0.64, P<0.0001), without strong evidence for context dependence. For example, rs7412 was associated with week 96 LDL-cholesterol among patients randomized to either ATV/RTV-containing ART (n=419, P=2.0×10−7) or to EFV-containing ART (n=435, P=2.7×10−4). In addition, rs9644568 (near LPL) was associated with LDL-cholesterol change to week 96 among 63 individuals with baseline BMI in the lowest 10th percentile and with few other LDL-cholesterol phenotypes, but was more broadly associated with triglyceride phenotypes. In contrast, an association between rs16998073 and LDL-cholesterol (P=2.9×10−7) was observed only at week 48 among 416 individuals in the lower 50th percentile for age (less than 38 years), and less so among individuals in the lowest 33rd percentile for age (n=270; P=3.6×10−3) or in the top 33rd percentile for age (n=277; P=0.042) (Fig. 2). For fasting triglycerides, the lowest P-value was between rs651821 in APOA5 and week 96 triglycerides among 439 individuals randomized to the ATV/RTV-containing ART. As shown in Fig. 2, rs651821 was associated with numerous context-derived triglyceride subphenotypes, including both absolute values at individual study weeks and change from baseline. Although –log10 P-values for associations between rs651821 and triglycerides tended to correlate with phenotype sample size (Spearman’s ρ=0.42, P<0.0001), there was some evidence for context dependence.
For example, rs651821 was associated with week 96 triglycerides among patients randomized to ATV/RTV-containing ART (n=439, P=4.3×10−7), but not EFV-containing ART (n=543, P=0.24). Furthermore, among patients randomized to ATV/RTV-containing ART, this association between rs651821 and week 96 triglycerides was also observed with concomitant TDF/FTC (n=219, P=2.3×10−4), with concomitant ABC/3TC (n=221, P=2.5×10−4), and was also observed at week 48 (n=481, P=3.2×10−4). Among patients randomized to EFV-containing ART, an association between rs651821 and week 96 triglycerides was also absent with concomitant TDF/FTC (n=224, P=0.04), with concomitant ABC/3TC (n=230, P=0.86), and at week 48 (n=488, P=0.02). In contrast, among individuals with baseline CD4 count of more than 302, there was an association between rs2302821 and change in triglycerides at week 96 (n=275, P=7.8×10−7), but not at week 48 (n=296, P=0.45). For ATV pharmacokinetics, CD4 T-cell count, and HIV-1 RNA phenotypes, the SNP with the lowest P-value had not been reported previously to be associated with that trait (Table 2). For the five SNPs with the lowest P-values in ATV pharmacokinetics, CD4 T-cell count, and HIV-1 RNA domains (15 SNPs total), Manhattan plots for associations between each SNP and as many as 774 phenotypes across all six domains are shown in Fig. 3. For ATV pharmacokinetics, the lowest P-value was with ATV clearance among patients with baseline CD4 T-cell count of less than 23 cells/mm3 (n=45), and was with rs12683493, which is intergenic between ABO and SURF6 (P=7.5×10−6). For CD4 T-cells, the lowest P-value was with change in CD4 T-cell count from baseline to week 96 among all patients (n=970) and was with rs2368393 in both MIR604 and SVIL (P=1.7×10−6). For HIV-1 RNA, the lowest P-value was with HIV-1 RNA control at week 96 among patients with baseline HIV-1 RNA of more than 5.0 log10 copies/ml (n=247) and was with rs7865618 in CDKN2B-AS1 (P=6.2×10−7).
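The use of rank correlation between strength of association and model sample size as a check on context dependence can be sketched as below. The three data points are the rs10871777 results reported above for the lowest 10th, 25th, and 33rd BMI percentiles; the helper names (`average_ranks`, `spearman_rho`) are hypothetical, and three points are far too few for inference, so this only illustrates the calculation.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation computed as Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# rs10871777 and EFV concentration, nested low-BMI strata (from the text).
sample_sizes = [34, 80, 108]
p_values = [2.0e-5, 0.01, 0.25]
neg_log10_p = [-math.log10(p) for p in p_values]

rho = spearman_rho(sample_sizes, neg_log10_p)
```

A strongly positive ρ, as reported for rs3745274, is what one would expect when the signal simply gains power with sample size; a signal that weakens as the stratum grows, as in this toy calculation, points instead to a stratum-specific and possibly spurious finding.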

Fig. 3

Manhattan plots representing all phenotype associations for the five single nucleotide polymorphisms (SNPs) with the lowest P-values for atazanavir pharmacokinetic, HIV-1 RNA, and CD4 T-cell phenotypes. We analyzed SNPs that were annotated previously for any drug in the PharmGKB or previously associated with any trait in the GWAS Catalog, and that were also represented in the imputed, post-QC genome-wide data. Each marker represents, for each phenotype, the –log10 P-value for association with the indicated SNP. Color-coded phenotype categories are indicated at the bottom left of the figure. Note that the scale of the Y-axis differs between plots.

Discussion

Phenome-wide association studies typically rely on observational data collected from electronic medical records, which may be subject to variability in the timing and completeness of data collection. The present multiphenotype analysis is unique in that it is the first to explore on-treatment data from a prospective clinical trial. The collection of specific data elements at predetermined intervals before and after initiation of therapy makes clinical trials an attractive resource of structured longitudinal data to evaluate pharmacogenomic associations. The present study characterized associations between 2544 SNPs from the PharmGKB and the GWAS Catalog and 774 context-derived phenotypes among 1181 HIV-infected participants from ACTG protocol A5202. Several associations replicated previous reports. We readily replicated the known association between CYP2B6 variants and plasma EFV concentrations 4,55,56. The lowest P-value was with rs3745274, which was associated with numerous context-derived EFV phenotypes. This genetic association appeared to persist irrespective of context (i.e. sex, race/ethnicity, randomized antiretroviral regimen, component antiretroviral drug, baseline age, BMI, CD4+ T-cell count, and plasma HIV-1 RNA). In contrast, the association between EFV concentration and rs10871777 (an SNP previously associated with obesity 57) is very likely spurious, as this was only observed among individuals with baseline BMI in the lowest 10th percentile. For LDL-cholesterol, rs7412 in APOE has been associated with LDL-cholesterol levels in previous GWAS 54,58, and was associated with numerous context-derived LDL-cholesterol phenotypes in this analysis. However, it was only associated with absolute values of LDL-cholesterol at individual study weeks, not with LDL-cholesterol change from baseline. In addition, there was no strong evidence for context dependence. An advantage of PheWAS is the ability to detect pleiotropy. 
In this respect, although rs9644568 (near LPL) was associated with LDL-cholesterol change at week 96 among 63 individuals with baseline BMI in the lowest 10th percentile, and with very few other LDL-cholesterol phenotypes, it was associated with numerous triglyceride phenotypes, consistent with its previously reported association with triglycerides in GWAS 59. In contrast, an association between LDL-cholesterol and rs16998073 (an SNP associated previously with diastolic blood pressure 60) is very likely spurious as an association was observed only at week 48 among individuals in the lower 50th percentile for age. For triglycerides, rs651821 in APOA5 has been associated with triglycerides in previous GWAS 61, as have three of our other top five SNPs (rs6589566 62; rs10790162 63; and rs1558861 64). The SNP rs651821 was associated with numerous context-derived triglyceride phenotypes, representing both absolute values at individual study weeks and change from baseline. In addition, there was some evidence that this association was context dependent, with rs651821 associated with week 96 triglycerides among patients randomized to ATV/RTV-containing ART, but not EFV-containing ART. An association between change in triglycerides and rs2302821 (an SNP associated weakly with cardiovascular toxicity in patients treated with celecoxib 60) is very likely spurious, as this was observed in a phenotype at week 96, but not at week 48. In contrast to the above findings, SNPs with the lowest P-values for association with ATV pharmacokinetics, CD4 T-cell count, and HIV-1 RNA phenotypes had not been reported previously to be associated with that trait.
Multiple other associations may be spurious, such as those between ATV pharmacokinetics and rs12683493 (intergenic between ABO and SURF6, in a haplotype implicated in cough with enalapril 65), change in CD4 T-cell count from baseline and rs2368393 (in both MIR604 and SVIL, not associated with risk of drug toxicity in children with lymphoblastic leukemia–lymphoma 66), and HIV-1 RNA control and rs7865618 (in CDKN2B-AS1, associated with cardiovascular disease 67 and glaucoma in GWAS 68). Replication for these and other SNP associations (many of which may be spurious) in independent cohorts is warranted. In this analysis, we used context as a strategy to derive multiple subphenotypes from each primary phenotype. Our rationale is that a given SNP–trait association may differ depending on context; thus, we expect that context-dependent associations may be more readily identified and understood using this approach. One advantage of this approach is that it allows for a very granular exploration of SNP–phenotype associations that may be influenced by context. We found such context dependence in the association between rs651821 and triglyceride phenotypes among patients randomized to ATV/RTV-containing ART regimens, but not among patients randomized to EFV-containing ART regimens. The random assignment of A5202 participants to receive either ATV/RTV-containing or EFV-containing ART decreases the likelihood that our finding was because of confounding, as unrecognized confounders should be equally distributed across arms. This is an advantage of using clinical trials datasets for genetic association analyses. Another advantage of granular exploration of SNP–phenotype associations is the ability to discern associations that are almost certainly spurious by comparing strengths of association between closely related phenotypes.
For example, among individuals with baseline CD4 count of more than 302, the association between rs2302821 and change in triglycerides at week 96 (P=7.8×10−7) is almost certainly spurious as there was no such association at week 48 (P=0.45). Multiple hypothesis testing is inherent in approaches that examine multiple phenotypes such as PheWAS, but Bonferroni correction is not appropriate because the assumption that tests are independent is violated. To address this issue, we performed 1000 permutations of the analysis to create an empirical null distribution. This showed that in the present analysis, the majority of models with P-value less than 1.5×10−4 were unlikely to be by chance alone. This analysis leveraged extensive a priori knowledge of genes and phenotypes from PharmGKB and the GWAS Catalog. This increases the likelihood of validity, biological plausibility, and supportive publications. As was apparent in both our previous PheWAS 22 and the present study, disease-specific knowledge is useful in interpreting genetic associations and in prioritizing associations for further replication and study. Because every phenome is unique, analyses that consider large numbers of phenotypes may benefit more than GWAS from disease-specific knowledge and understanding, including relationships among phenotypes. This study had limitations. A larger sample size may have shown additional associations and we have not yet sought to replicate associations in other datasets. We considered a limited number of phenotypes and contexts. We considered ART as intent to treat. We only used SNPs that were genotyped or imputed from genome-wide genotyping, without additional genotyping. We also focused analyses on individual SNPs, whereas multiple SNPs considered in combination may more strongly associate with some phenotypes. Data from prospective, randomized clinical trials offer distinct advantages (e.g.
randomization tends to evenly distribute covariates across study arms), but there are limitations. Although data of ACTG clinical trials are rigorously collected and validated, electronic medical records datasets will likely contain a wider range of variables. In addition, eligibility criteria for clinical trials may exclude some individuals who would otherwise be included in electronic medical records datasets. In summary, this pilot study supports a multiphenotype analysis strategy to explore clinical trials datasets for genetic associations and to ultimately identify genetic associations with the potential to optimize ART safety and efficacy. This approach complements more established GWAS by performing simultaneous calculations for identifying genotype–phenotype associations across numerous phenotypes. Work is ongoing to further evaluate and optimize multiphenotype analyses for clinical trials datasets. On the basis of results from this pilot study, we plan to extend the PheWAS approach both to a much more extensive set of traits and to multiple other clinical trials datasets. This will include replication of associations identified here. Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's website ().
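
The permutation strategy mentioned in the Discussion (shuffling the data many times to build an empirical null distribution when tests across related phenotypes are not independent) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the function names `abs_correlations` and `max_stat_permutation` are hypothetical, a simple Pearson correlation stands in for the regression models used in the study, and the max-statistic adjustment shown here is one common permutation technique that may differ from the procedure actually applied.

```python
import numpy as np


def abs_correlations(genotype, phenotypes):
    """Absolute Pearson correlation between one SNP and each phenotype column."""
    g = genotype - genotype.mean()
    p = phenotypes - phenotypes.mean(axis=0)
    denom = np.sqrt((g @ g) * (p * p).sum(axis=0))
    return np.abs(g @ p) / denom


def max_stat_permutation(genotype, phenotypes, n_perm=1000, seed=0):
    """Empirical, multiplicity-adjusted P-values from a max-statistic null.

    Genotypes are shuffled across individuals while phenotype rows are left
    intact, so the correlation among phenotypes is preserved in every
    permuted dataset.
    """
    rng = np.random.default_rng(seed)
    observed = abs_correlations(genotype, phenotypes)
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(genotype)
        null_max[i] = abs_correlations(shuffled, phenotypes).max()
    # Count permutations whose largest statistic matches or beats each observed one.
    exceed = (null_max[:, None] >= observed[None, :]).sum(axis=0)
    adjusted_p = (1 + exceed) / (1 + n_perm)
    return observed, adjusted_p


# Synthetic illustration only (assumed sizes and effect, not study data).
rng = np.random.default_rng(42)
n_people, n_pheno = 300, 5
genotype = rng.integers(0, 3, size=n_people)           # 0, 1, or 2 minor alleles
shared = rng.normal(size=(n_people, 1))                 # induces phenotype correlation
phenotypes = shared + rng.normal(size=(n_people, n_pheno))
phenotypes[:, 0] += 1.5 * genotype                      # one true association

observed, adjusted_p = max_stat_permutation(genotype, phenotypes, n_perm=1000)
for j, (r, p) in enumerate(zip(observed, adjusted_p)):
    print(f"phenotype {j}: |r| = {r:.3f}, empirical adjusted P = {p:.4f}")
```

Because the null distribution is built from the same correlated phenotypes, the adjustment reflects the dependence among tests instead of assuming that they are independent.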

