Hong Zhang, Aparna Chhibber, Peter M Shaw, Devan V Mehrotra, Judong Shen.
Abstract
In pharmacogenetic (PGx) studies, drug response phenotypes are often measured as the change in a quantitative trait before and after treatment. Recent literature debates baseline adjustment, that is, the inclusion of the pre-treatment or baseline value as a covariate, in PGx genome-wide association study (GWAS) analyses. Here, we provide a clear statistical perspective on this baseline adjustment issue by running extensive simulations based on nine statistical models to evaluate the influence of baseline adjustment on type I error and power. We then apply these nine models to the change in low-density lipoprotein cholesterol (LDL-C) levels with ezetimibe + simvastatin combination therapy compared with simvastatin monotherapy in the 5661 participants of the IMPROVE-IT (IMProved Reduction of Outcomes: Vytorin Efficacy International Trial) PGx GWAS, and the results support the conclusions drawn from our simulations. Both simulations and GWAS analyses consistently show that baseline-unadjusted models inflate type I error for variants associated with the baseline value if the baseline value is also associated with change from baseline (e.g., when the baseline value is a mediator between a variant and change from baseline), while baseline-adjusted models can control type I error in various scenarios. We thus recommend performing baseline-adjusted analyses in PGx GWASs of quantitative change.
Year: 2022 PMID: 35680959 PMCID: PMC9184591 DOI: 10.1038/s41525-022-00303-2
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 6.083
Fig. 1. Ratios between empirical type I error rates and the nominal α levels.
Upper panels (a and b): no measurement errors. Lower panels (c and d): white noise measurement error (normal relative error rate with mean zero). Dotted curves: baseline-unadjusted models. Horizontal dashed line: the ratio (α + 3*SE)/α = 1 + 3*SE/α, where SE is the margin of error calculated as √(α(1 − α)/N), α is the nominal level and N is the number of simulations. M1–M9 are defined in the “Methods” section.
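As a worked example of the dashed reference line, the sketch below evaluates 1 + 3*SE/α for a given nominal level and number of simulations. It assumes SE is the usual binomial standard error of an empirical rejection rate. The function name and the inputs (α = 0.05 with 10,000 simulations) are illustrative and not taken from the study.

```python
import math

def ratio_upper_bound(alpha: float, n_sim: int) -> float:
    """Upper reference line for (empirical type I error) / alpha.

    Assumes SE is the binomial standard error sqrt(alpha * (1 - alpha) / n_sim).
    """
    se = math.sqrt(alpha * (1.0 - alpha) / n_sim)
    return 1.0 + 3.0 * se / alpha

# Illustrative inputs only; not the settings used in the paper.
bound = ratio_upper_bound(alpha=0.05, n_sim=10_000)
print(round(bound, 4))
```

An empirical ratio above this line would suggest inflation beyond what simulation noise alone explains. The line sits closer to 1 as the number of simulations grows.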
Fig. 2. Power comparison between baseline-adjusted and unadjusted models.
Upper panels (a–c): . Lower panels (d–f): . First, second, third column: , respectively. , which is consistent with the type I error simulation. M1–M9 are defined in the “Methods” section.
Fig. 3. Manhattan plots for five genome-wide association studies (GWAS) of drug-induced change in low-density lipoprotein cholesterol (LDL-C) from the IMPROVE-IT PGx study.
These five Manhattan plots are from Model 1 (a, M1), Model 5 (b, M5), Model 3 (c, M3), Model 6 (d, M6) and Model 7 (e, M7). M1 used log-fold change from baseline (log-fold-CFB) as the phenotype, adjusted for baseline LDL-C, and used a 2-step approach in the regression, in which residuals were obtained by regressing out the covariates and were then inverse normally transformed. M3 was the same as M1 except that it did not adjust for baseline LDL-C in the model. Both M1 and M3 yielded the same three significant loci. M5 used log-fold-CFB as the phenotype, adjusted for baseline LDL-C, and used the 2-df test in the regression. M6 was the same as M5 except that it did not adjust for baseline LDL-C in the model. M5 yielded three significant loci, while M6 yielded two additional significant loci (STAG1/SLC35G2/NCK1, SLCO1B1) on chromosomes 3 and 12, respectively. M7 used log-baseline as the phenotype for the baseline association test, which yielded one significant locus on chromosome 19. The horizontal red line represents the genome-wide significance p value threshold of 5e−08. All tests were two-sided.
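The 2-step approach described for M1 and M3 can be sketched as follows, on simulated toy data. This is a minimal illustration under stated assumptions, not the authors' pipeline: the rank offset of 0.5 in the inverse normal transform, the `age` covariate, the helper names, and all effect sizes are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def residualize(y, covariates):
    """Step 1: residuals of y after OLS on an intercept plus covariates."""
    X = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def inverse_normal_transform(x, offset=0.5):
    """Rank-based inverse normal transform (the offset is an assumed convention)."""
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n
    probs = (ranks - offset) / len(x)
    inv = NormalDist().inv_cdf
    return np.array([inv(p) for p in probs])

def slope_and_z(y, g):
    """Step 2: simple regression of the transformed residuals on genotype."""
    X = np.column_stack([np.ones(len(y)), g])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], coef[1] / se

# Simulated toy data; every number here is arbitrary.
rng = np.random.default_rng(0)
n = 2000
g = rng.binomial(2, 0.25, size=n).astype(float)
age = rng.normal(60, 10, size=n)  # hypothetical covariate
log_baseline = 4.6 + rng.normal(0, 0.3, size=n)
log_fold_cfb = (-0.5 * (log_baseline - 4.6) - 0.1 * g
                + rng.normal(0, 0.3, size=n))

# M1-style: baseline is among the covariates regressed out in step 1.
y_m1 = inverse_normal_transform(
    residualize(log_fold_cfb, np.column_stack([age, log_baseline])))
beta_m1, z_m1 = slope_and_z(y_m1, g)

# M3-style: the same two steps, but baseline is left out of step 1.
y_m3 = inverse_normal_transform(residualize(log_fold_cfb, age))
beta_m3, z_m3 = slope_and_z(y_m3, g)

print(f"M1-style z = {z_m1:.2f}, M3-style z = {z_m3:.2f}")
```

Because the phenotype is rank-transformed after step 1, the genotype effect in step 2 is on the scale of standard normal quantiles rather than log-fold change.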
Results for all five lead variants from the GWAS analyses of natural log-transformed CFB of LDL-C, LDL-C percent change and natural log-transformed baseline LDL-C based on five statistical models: M1 (baseline-adjusted, 1-df test, 2-step regression), M3 (baseline-unadjusted, 1-df test, 2-step regression), M5 (baseline-adjusted, 2-df test), M6 (baseline-unadjusted, 2-df test) and M7 (baseline association only).
| Gene | SNP | CHR | BP | MA | MAF | Modelᵃ | β_Bᵇ | β_Gᶜ | P_G | β_GTᵈ | P_GTᵈ | P_2dfᵈ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | rs599839 | 1 | 109822166 | G | 0.240 | M1 (BJ-1dfT-2SR) | −0.490 | −0.172 | 2.92E−14 |  |  |  |
|  |  |  |  |  |  | M3 (BuJ-1dfT-2SR) |  | −0.156 | 6.19E−12 |  |  |  |
|  |  |  |  |  |  | M5 (BJ-2dfT) | −0.493 | −0.052 | 1.12E−12 | 0.026 | 8.12E−02 | 2.22E−12 |
|  |  |  |  |  |  | M6 (BuJ-2dfT) |  | −0.049 | 1.74E−10 | 0.029 | 6.04E−02 | 2.44E−10 |
|  |  |  |  |  |  | M7 (Baseline only) |  | −0.008 | 6.58E−02ᵉ |  |  |  |
| STAG1/SLC35G2/NCK1 | rs79929954 | 3 | 136623748 | G | 0.015 | M1 (BJ-1dfT-2SR) | −0.490 | 0.215 | 7.81E−03ᵉ |  |  |  |
|  |  |  |  |  |  | M3 (BuJ-1dfT-2SR) |  | 0.214 | 7.91E−03ᵉ |  |  |  |
|  |  |  |  |  |  | M5 (BJ-2dfT) | −0.486 | 0.079 | 2.68E−03 | −0.244 | 3.04E−06 | 2.05E−07ᵉ |
|  |  |  |  |  |  | M6 (BuJ-2dfT) |  | 0.080 | 3.23E−03 | −0.285 | 1.78E−07 | 1.58E−08 |
|  |  |  |  |  |  | M7 (Baseline only) |  | −0.002 | 8.92E−01ᵉ |  |  |  |
|  | rs10455872 | 6 | 161010118 | G | 0.080 | M1 (BJ-1dfT-2SR) | −0.490 | 0.287 | 5.94E−16 |  |  |  |
|  |  |  |  |  |  | M3 (BuJ-1dfT-2SR) |  | 0.240 | 1.38E−11 |  |  |  |
|  |  |  |  |  |  | M5 (BJ-2dfT) | −0.499 | 0.090 | 6.82E−15 | −0.050 | 3.06E−02 | 6.53E−15 |
|  |  |  |  |  |  | M6 (BuJ-2dfT) |  | 0.078 | 1.13E−10 | −0.049 | 4.28E−02 | 1.20E−10 |
|  |  |  |  |  |  | M7 (Baseline only) |  | 0.024 | 3.68E−04ᵉ |  |  |  |
| SLCO1B1 | rs4149056 | 12 | 21331549 | C | 0.162 | M1 (BJ-1dfT-2SR) | −0.490 | 0.129 | 6.92E−07ᵉ |  |  |  |
|  |  |  |  |  |  | M3 (BuJ-1dfT-2SR) |  | 0.141 | 5.20E−08ᵉ |  |  |  |
|  |  |  |  |  |  | M5 (BJ-2dfT) | −0.485 | 0.043 | 3.76E−07 | −0.028 | 9.10E−02 | 5.95E−07ᵉ |
|  |  |  |  |  |  | M6 (BuJ-2dfT) |  | 0.050 | 1.33E−08 | −0.036 | 3.75E−02 | 1.12E−08 |
|  |  |  |  |  |  | M7 (Baseline only) |  | −0.013 | 9.64E−03ᵉ |  |  |  |
|  | rs1065853 | 19 | 45413233 | T | 0.089 | M1 (BJ-1dfT-2SR) | −0.490 | −0.491 | 2.13E−48 |  |  |  |
|  |  |  |  |  |  | M3 (BuJ-1dfT-2SR) |  | −0.388 | 1.32E−30 |  |  |  |
|  |  |  |  |  |  | M5 (BJ-2dfT) | −0.535 | −0.164 | 3.91E−50 | 0.081 | 1.59E−04 | 5.87E−52 |
|  |  |  |  |  |  | M6 (BuJ-2dfT) |  | −0.130 | 7.99E−30 | 0.082 | 2.73E−04 | 1.52E−31 |
|  |  |  |  |  |  | M7 (Baseline only) |  | −0.064 | 2.35E−23 |  |  |  |

SNP single nucleotide polymorphism, CHR chromosome, BP base pair, MA minor allele, MAF minor allele frequency, β_B effect size of baseline variable, β_G effect size of G (genotype) on CFB (from M1, M3, M5 and M6) or on baseline (from M7), P_G p value of G (genotype), β_GT effect size of G*T (genotype by treatment interaction), P_GT p value of G*T (genotype by treatment interaction), P_2df p value of 2df test (joint test of genotype and genotype*treatment interaction).
ᵃ BJ: Baseline-adjusted; BuJ: Baseline-unadjusted; 1dfT: 1 degree of freedom test; 2dfT: 2 degree of freedom test or joint test of genotype and genotype*treatment interaction; 1SR: 1-step regression; 2SR: 2-step regression. In M1, M3, M5 and M6, the difference in natural log-transformed low-density lipoprotein cholesterol levels with Simvastatin and Ezetimibe/Simvastatin was used for analysis.
ᵇ Effects calculated for the natural log-transformed baseline LDL-C.
ᶜ Effects calculated with respect to the minor allele. A negative value indicates more intense drug (Simvastatin and Ezetimibe/Simvastatin) LDL-C lowering.
ᵈ Results were only available in the 2df test models M5 and M6, which also test the genotype*treatment interaction and the joint effect of genotype and genotype*treatment interaction.
ᵉ Not reaching genome-wide significance (p < 5E−08). For 2df test methods, p values from the 2df test (P_2df) were used.
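The joint 2df test of genotype and genotype*treatment interaction can be illustrated with a Wald statistic from an ordinary least-squares fit. The sketch below is one common way to build such a test and is not necessarily the procedure used in the study. It relies on the fact that a chi-square variable with 2 degrees of freedom has survival function exp(−x/2). The function name and the simulated effect sizes are hypothetical.

```python
import numpy as np

def two_df_test(y, g, t, covariates=None):
    """Joint Wald test of H0: beta_G = beta_GT = 0 in
    y ~ 1 + T + covariates + G + G*T (large-sample chi-square, 2 df)."""
    n = len(y)
    cols = [np.ones(n), t]
    if covariates is not None:
        cols.append(covariates)
    cols += [g, g * t]  # the last two columns are G and G*T
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    b = coef[-2:]        # (beta_G, beta_GT)
    V = cov[-2:, -2:]    # their 2x2 covariance block
    wald = float(b @ np.linalg.solve(V, b))
    p_2df = float(np.exp(-wald / 2.0))  # chi-square(2) survival function
    return b, wald, p_2df

# Simulated toy data; all coefficients are arbitrary.
rng = np.random.default_rng(0)
n = 3000
g = rng.binomial(2, 0.25, size=n).astype(float)
t = rng.integers(0, 2, size=n).astype(float)  # 0/1 treatment arm
baseline = rng.normal(0, 1, size=n)
y = (-0.5 * baseline - 0.2 * t - 0.1 * g + 0.05 * g * t
     + rng.normal(0, 0.3, size=n))

b_adj, wald_adj, p_adj = two_df_test(y, g, t, covariates=baseline)  # M5-style
b_unadj, wald_unadj, p_unadj = two_df_test(y, g, t)                 # M6-style
print(f"adjusted P_2df = {p_adj:.2e}, unadjusted P_2df = {p_unadj:.2e}")
```

A variant would be flagged as genome-wide significant under this sketch when `p_2df` falls below 5E−08, mirroring footnote e.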
Fig. 4. QQ plots of p values comparing the baseline-adjusted models (black) with the baseline-unadjusted models (green) from the four GWAS analyses based on four models with the log-CFB endpoint.
M1 vs. M3 (a); M5 vs. M6 (b). The variants were first filtered based on a baseline association p value <1e−03 from M7, and 10,187 SNPs were used for both plots. These variants showed a clear mediator effect. The red line is the diagonal, and the 95% confidence interval polygon in each QQ plot is based on the p values from the baseline-adjusted model (M1 in a and M5 in b).
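The filtering and QQ construction described in this caption can be sketched as below. The p values are simulated uniform draws standing in for real GWAS output, and the plotting positions (i − 0.5)/n are an assumed convention. Function and variable names are hypothetical.

```python
import numpy as np

def filter_by_baseline(p_model, p_baseline, threshold=1e-3):
    """Keep only variants whose baseline-association p value is below threshold."""
    keep = np.asarray(p_baseline) < threshold
    return np.asarray(p_model)[keep]

def qq_points(p_values):
    """Return (expected, observed) -log10 p values for a QQ plot."""
    p = np.sort(np.asarray(p_values, dtype=float))
    n = len(p)
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    observed = -np.log10(p)
    return expected, observed

# Placeholder p values; a real analysis would read these from GWAS results.
rng = np.random.default_rng(0)
n_variants = 100_000
p_baseline = rng.uniform(size=n_variants)   # stands in for M7
p_adjusted = rng.uniform(size=n_variants)   # stands in for M1 or M5

subset = filter_by_baseline(p_adjusted, p_baseline)
expected, observed = qq_points(subset)
print(len(subset), "variants retained")
```

Points lying well above the diagonal `observed == expected` in the filtered subset would indicate inflation among baseline-associated variants.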
Fig. 5. Illustration of baseline as a mediator in the analysis of CFB in PGx studies.
This mediating effect, if present, must be accounted for, e.g., by baseline adjustment.
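A small simulation makes the mediator scenario concrete. In the sketch below the variant affects baseline, baseline affects CFB, and the variant has no direct effect on CFB, so any association found by the unadjusted model is driven entirely by baseline. The sample size, effect sizes, number of replicates, and the normal approximation for the p value are all assumptions chosen for illustration. They are not the settings of the nine models in the paper.

```python
import math
import numpy as np

def genotype_p_value(y, g, baseline=None):
    """Two-sided p value for the genotype slope (normal approximation)."""
    cols = [np.ones(len(y)), g]
    if baseline is not None:
        cols.append(baseline)
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    z = coef[1] / se
    return math.erfc(abs(z) / math.sqrt(2.0))

def rejection_rates(n=500, n_sim=200, alpha=0.05, seed=1):
    """Share of replicates in which each model rejects 'no genotype effect'."""
    rng = np.random.default_rng(seed)
    hits_unadj = 0
    hits_adj = 0
    for _ in range(n_sim):
        g = rng.binomial(2, 0.3, size=n).astype(float)
        baseline = 0.5 * g + rng.normal(size=n)     # G -> baseline
        cfb = -0.5 * baseline + rng.normal(size=n)  # baseline -> CFB only
        hits_unadj += genotype_p_value(cfb, g) < alpha
        hits_adj += genotype_p_value(cfb, g, baseline) < alpha
    return hits_unadj / n_sim, hits_adj / n_sim

unadj_rate, adj_rate = rejection_rates()
print(f"unadjusted: {unadj_rate:.3f}, adjusted: {adj_rate:.3f}")
```

With these settings the unadjusted model rejects far more often than the nominal level, while the adjusted model stays close to it, which is the pattern the figure is meant to convey.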