Yilin Song, Joanna M Biernacka, Stacey J Winham.
Abstract
Interest in analyzing X chromosome single nucleotide polymorphisms (SNPs) is growing, and several approaches have been proposed. Prior studies have compared the power of different approaches, but bias and the interpretation of coefficients have received less attention. We performed simulations to demonstrate the impact of X chromosome model assumptions on effect estimates. We investigated the coefficient biases of SNP and sex effects with commonly used models for X chromosome SNPs, including models with and without assumptions of X chromosome inactivation (XCI), and with and without SNP-sex interaction terms. Sex and SNP coefficient biases were observed when the assumptions made in the analysis model about XCI and sex differences in SNP effect were inconsistent with the data-generating model. However, including a SNP-sex interaction term often eliminated these biases. To illustrate these findings, estimates under different genetic model assumptions are compared and interpreted in a real data example. Models to analyze X chromosome SNPs make assumptions beyond those made in autosomal variant analysis. Assumptions made about X chromosome SNP effects should be stated clearly when reporting and interpreting X chromosome associations. Fitting models with SNP × Sex interaction terms can avoid reliance on assumptions, eliminating coefficient bias even in the absence of sex differences in SNP effect.
Keywords: SNP coefficient; X chromosome variants; bias; model assumptions; sex coefficient
Year: 2021 PMID: 34082482 PMCID: PMC8453908 DOI: 10.1002/gepi.22393
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
Table 1. Data‐generating models in the absence of SNP × Sex interaction effects using (a) XCI (Clayton) coding and (b) eXCI (PLINK) coding

(a) XCI (Clayton) coding

| Sex | 0 copies of effect allele | 1 copy | 2 copies |
|---|---|---|---|
| Male | 0 | 2 | NA |
| Female | 0 | 1 | 2 |

| βsex | βSNP | ORsex (e^βsex) | ORSNP\|M (e^(2βSNP)) | ORSNP\|W1 (e^βSNP) | ORSNP\|W2 (e^(2βSNP)) | Prevalence, overall | Prevalence, female | Prevalence, male |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.75 | e^0 | e^1.5 | e^0.75 | e^1.5 | 0.664 | 0.668 | 0.660 |
| 0.1 | 0.5 | e^0.1 | e^1 | e^0.5 | e^1 | 0.628 | 0.618 | 0.638 |
| 0.2 | 0.2 | e^0.2 | e^0.4 | e^0.2 | e^0.4 | 0.574 | 0.550 | 0.598 |
| 0.5 | 0.1 | e^0.5 | e^0.2 | e^0.1 | e^0.2 | 0.585 | 0.514 | 0.646 |
| 0.75 | 0 | e^0.75 | e^0 | e^0 | e^0 | 0.589 | 0.500 | 0.678 |

(b) eXCI (PLINK) coding

| Sex | 0 copies of effect allele | 1 copy | 2 copies |
|---|---|---|---|
| Male | 0 | 1 | NA |
| Female | 0 | 1 | 2 |

| βsex | βSNP | ORsex (e^βsex) | ORSNP\|M (e^βSNP) | ORSNP\|W1 (e^βSNP) | ORSNP\|W2 (e^(2βSNP)) | Prevalence, overall | Prevalence, female | Prevalence, male |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.75 | e^0 | e^0.75 | e^0.75 | e^1.5 | 0.629 | 0.668 | 0.590 |
| 0.1 | 0.5 | e^0.1 | e^0.5 | e^0.5 | e^1 | 0.602 | 0.618 | 0.586 |
| 0.2 | 0.2 | e^0.2 | e^0.2 | e^0.2 | e^0.4 | 0.562 | 0.550 | 0.574 |
| 0.5 | 0.1 | e^0.5 | e^0.1 | e^0.1 | e^0.2 | 0.579 | 0.524 | 0.634 |
| 0.75 | 0 | e^0.75 | e^0 | e^0 | e^0 | 0.589 | 0.500 | 0.678 |

Note: For each coding scheme used to generate the data, the first sub-table shows how SNPs are coded within sex. The model coefficient columns (βsex, βSNP) provide the five coefficient combinations we used to generate the data in this simulation study. We then calculated the odds ratio for the effect of sex and for the effect of SNP given sex. ORsex refers to the odds ratio for the effect of sex with female as the reference level and male equal to 1. ORSNP|M refers to the odds ratio for the effect of SNP, comparing SNP = 0 with SNP = 2 in XCI (Clayton) coding or with SNP = 1 in eXCI (PLINK) coding, given sex = male. ORSNP|W1 refers to the odds ratio for the effect of SNP comparing SNP = 0 with SNP = 1, and ORSNP|W2 refers to the odds ratio for the effect of SNP comparing SNP = 0 with SNP = 2, given sex = female. In the “Prevalence” columns, we calculated the proportion of cases in the overall population (1000 cases), in females (500 cases), and in males (500 cases).
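The odds ratio columns of this table follow directly from the coefficients and the male genotype code. The sketch below illustrates that arithmetic; the function name and dictionary keys are hypothetical labels chosen for this example, not identifiers from the study.

```python
import math

# Code assigned to a male carrying one copy of the effect allele
# (2 under XCI/Clayton coding, 1 under eXCI/PLINK coding).
MALE_CODE = {"XCI": 2, "eXCI": 1}

def odds_ratios_no_interaction(beta_sex, beta_snp, coding):
    """Odds ratios implied by a model without a SNP x Sex term.

    Hypothetical helper for illustration only.
    """
    male_code = MALE_CODE[coding]
    return {
        "OR_sex": math.exp(beta_sex),                # male vs. female
        "OR_SNP|M": math.exp(male_code * beta_snp),  # male carrier vs. non-carrier
        "OR_SNP|W1": math.exp(beta_snp),             # female, 1 copy vs. 0
        "OR_SNP|W2": math.exp(2 * beta_snp),         # female, 2 copies vs. 0
    }

xci = odds_ratios_no_interaction(0.1, 0.5, "XCI")
exci = odds_ratios_no_interaction(0.1, 0.5, "eXCI")
print(round(xci["OR_SNP|M"], 3))   # 2.718, i.e. e^1
print(round(exci["OR_SNP|M"], 3))  # 1.649, i.e. e^0.5
```

With the same βSNP, only the male SNP odds ratio differs between the two codings; the female odds ratios and the sex odds ratio are unchanged.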
Table 2. Data‐generating models in the presence of SNP × Sex interaction effects using (a) XCI (Clayton) coding and (b) eXCI (PLINK) coding

(a) XCI (Clayton) coding

| Sex | 0 copies of effect allele | 1 copy | 2 copies |
|---|---|---|---|
| Male | 0 | 2 | NA |
| Female | 0 | 1 | 2 |

| βsex | βSNP | βint | ORsex\|SNP0 (e^βsex) | ORsex\|SNP2 (e^(βsex+2βint)) | ORSNP\|M (e^(2βSNP+2βint)) | ORSNP\|W1 (e^βSNP) | ORSNP\|W2 (e^(2βSNP)) | Prevalence, overall | Prevalence, female | Prevalence, male |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 0.2 | 0.1 | e^0.2 | e^0.4 | e^0.6 | e^0.2 | e^0.4 | 0.585 | 0.550 | 0.620 |
| 0.2 | 0.2 | 0.2 | e^0.2 | e^0.6 | e^0.8 | e^0.2 | e^0.4 | 0.595 | 0.550 | 0.640 |
| 0.2 | 0.2 | 0.3 | e^0.2 | e^0.8 | e^1 | e^0.2 | e^0.4 | 0.604 | 0.550 | 0.658 |

(b) eXCI (PLINK) coding

| Sex | 0 copies of effect allele | 1 copy | 2 copies |
|---|---|---|---|
| Male | 0 | 1 | NA |
| Female | 0 | 1 | 2 |

| βsex | βSNP | βint | ORsex\|SNP0 (e^βsex) | ORsex\|SNP1 (e^(βsex+βint)) | ORSNP\|M (e^(βSNP+βint)) | ORSNP\|W1 (e^βSNP) | ORSNP\|W2 (e^(2βSNP)) | Prevalence, overall | Prevalence, female | Prevalence, male |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 0.2 | 0.1 | e^0.2 | e^0.3 | e^0.3 | e^0.2 | e^0.4 | 0.568 | 0.550 | 0.586 |
| 0.2 | 0.2 | 0.2 | e^0.2 | e^0.4 | e^0.4 | e^0.2 | e^0.4 | 0.574 | 0.550 | 0.598 |
| 0.2 | 0.2 | 0.3 | e^0.2 | e^0.5 | e^0.5 | e^0.2 | e^0.4 | 0.579 | 0.550 | 0.608 |

Note: For each coding scheme used to generate the data, the first sub-table shows how SNPs are coded within sex. The model coefficient columns (βsex, βSNP, βint) provide the three coefficient combinations we used to generate the data in this simulation study. We then calculated the odds ratio for the effect of sex given SNP and for the effect of SNP given sex. ORsex|SNP0 refers to the odds ratio for the effect of sex (with female as the reference level and male equal to 1) given SNP = 0 for both coding schemes. For XCI (Clayton) coding, ORsex|SNP2 refers to the odds ratio for the effect of sex given SNP = 2 (with female as the reference level); for eXCI (PLINK) coding, ORsex|SNP1 refers to the odds ratio for the effect of sex given SNP = 1 (with female as the reference level). ORSNP|M refers to the odds ratio for the effect of SNP, comparing SNP = 0 with SNP = 2 in XCI coding or with SNP = 1 in eXCI coding, given sex = male. ORSNP|W1 refers to the odds ratio for the effect of SNP comparing SNP = 0 with SNP = 1, and ORSNP|W2 refers to the odds ratio for the effect of SNP comparing SNP = 0 with SNP = 2, given sex = female. In the “Prevalence” columns, we calculated the proportion of cases in the overall population (1000 cases), in females (500 cases), and in males (500 cases).
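When an interaction term is present, the male SNP effect and the sex effect among carriers both pick up βint, scaled by the male genotype code. A minimal sketch of that calculation follows; the function name and keys are hypothetical, and it returns log odds ratios so the values can be compared directly with the exponents shown in the table.

```python
# Code assigned to a male carrying one copy of the effect allele
# (2 under XCI/Clayton coding, 1 under eXCI/PLINK coding).
MALE_CODE = {"XCI": 2, "eXCI": 1}

def log_odds_ratios_with_interaction(beta_sex, beta_snp, beta_int, coding):
    """Log odds ratios implied by a model with a SNP x Sex term.

    Hypothetical helper for illustration only.
    """
    c = MALE_CODE[coding]
    return {
        "sex|SNP0": beta_sex,                    # male vs. female, no effect allele
        "sex|male_carrier": beta_sex + c * beta_int,  # both sexes coded SNP = c
        "SNP|M": c * (beta_snp + beta_int),      # male carrier vs. non-carrier
        "SNP|W1": beta_snp,                      # female, 1 copy vs. 0
        "SNP|W2": 2 * beta_snp,                  # female, 2 copies vs. 0
    }

xci = log_odds_ratios_with_interaction(0.2, 0.2, 0.1, "XCI")
exci = log_odds_ratios_with_interaction(0.2, 0.2, 0.1, "eXCI")
print({k: round(v, 2) for k, v in xci.items()})
print({k: round(v, 2) for k, v in exci.items()})
```

For βsex = βSNP = 0.2 and βint = 0.1, this gives exponents of 0.4 and 0.6 for the sex and male SNP effects under XCI coding, and 0.3 and 0.3 under eXCI coding.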
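Table 3. Logistic regression models fit to each of the simulated data sets, using either XCI (Clayton) or eXCI (PLINK) coding for the SNP effects, and with and without SNP × Sex interaction terms

| Model | Specification |
|---|---|
| 1 | Logit(P) = β0 + βsex·Sex + βSNP·SNP, with SNP in XCI (Clayton) coding |
| 2 | Logit(P) = β0 + βsex·Sex + βSNP·SNP, with SNP in eXCI (PLINK) coding |
| 3 | Logit(P) = β0 + βsex·Sex + βSNP·SNP + βint·Sex × SNP, with SNP in XCI (Clayton) coding |
| 4 | Logit(P) = β0 + βsex·Sex + βSNP·SNP + βint·Sex × SNP, with SNP in eXCI (PLINK) coding |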
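The fitted models differ only in how a male carrier is coded and in whether a Sex × SNP product term is included. The sketch below builds the predictor row for one individual under each choice; the function names are hypothetical, and the leading 1 for an intercept is an assumption of this example.

```python
def code_snp(copies, is_male, coding):
    """Return the SNP code for one individual.

    copies: number of effect alleles (0 or 1 for males, 0 to 2 for females).
    coding: "XCI" (Clayton; male carriers coded 2) or "eXCI" (PLINK; coded 1).
    """
    if coding not in ("XCI", "eXCI"):
        raise ValueError("coding must be 'XCI' or 'eXCI'")
    if is_male:
        if copies not in (0, 1):
            raise ValueError("males carry 0 or 1 copy of an X chromosome allele")
        return copies * (2 if coding == "XCI" else 1)
    if copies not in (0, 1, 2):
        raise ValueError("females carry 0, 1, or 2 copies")
    return copies

def design_row(copies, is_male, coding, interaction):
    """Predictor row [1, Sex, SNP] or [1, Sex, SNP, Sex x SNP]; female is the reference."""
    sex = 1 if is_male else 0
    snp = code_snp(copies, is_male, coding)
    row = [1, sex, snp]
    if interaction:
        row.append(sex * snp)
    return row

print(design_row(1, True, "XCI", interaction=True))    # [1, 1, 2, 2]
print(design_row(1, True, "eXCI", interaction=True))   # [1, 1, 1, 1]
print(design_row(2, False, "XCI", interaction=True))   # [1, 0, 2, 0]
```

Female rows are identical under both codings, so any difference between the fitted models comes from the male carriers.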
Figure 1. Bias and p‐values of sex and SNP coefficients when generating data in the absence of sex differences in SNP effect and fitting models without SNP–sex interaction terms. Top row: Boxplots of bias (Y‐axis; estimate minus true coefficient) of sex and SNP coefficients across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI (PLINK) coding (left) or XCI (Clayton) coding (right). Bottom row: Boxplots of p‐values (Y‐axis) of sex and SNP coefficients across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI coding (left) or XCI coding (right). Color indicates the model that was fit (Model 1 or Model 2, XCI or eXCI coding without a SNP–sex interaction term).
Figure 2. Bias and p‐values of sex and SNP coefficients when generating data in the absence of sex differences in SNP effect and fitting models with SNP–sex interaction terms. Top row: Boxplots of bias (Y‐axis; estimate minus true coefficient) of sex and SNP coefficients across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI (PLINK) coding (left) or XCI (Clayton) coding (right). Bottom row: Boxplots of p‐values (Y‐axis) of sex and SNP coefficients across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI coding (left) or XCI coding (right). Color indicates the model that was fit (Model 3 or Model 4, XCI or eXCI coding with a SNP–sex interaction term).
Figure 3. Two degree‐of‐freedom F tests for SNP coefficients for data generated in the absence of sex differences and fitting models with SNP–sex interaction terms. (a) Boxplots of p‐values (Y‐axis) of df = 2 F tests across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI (PLINK) coding (left) or XCI (Clayton) coding (right). (b) Power, defined as the proportion of p < 0.01 (Y‐axis), across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI coding (left) or XCI coding (right). Color indicates the model that was fit (Model 3 or Model 4, XCI or eXCI coding with a SNP–sex interaction term).
Figure 4. Bias and p‐values of sex and SNP coefficients when generating data in the presence of sex differences in SNP effect and fitting models without SNP–sex interaction terms. Top row: Boxplots of bias (Y‐axis; estimate minus true coefficient) of sex and SNP coefficients across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI (PLINK) coding (left) or XCI (Clayton) coding (right). Bottom row: Boxplots of p‐values (Y‐axis) of sex and SNP coefficients across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI coding (left) or XCI coding (right). Color indicates the model that was fit (Model 1 or Model 2, XCI or eXCI coding without a SNP–sex interaction term).
Figure 5. Bias and p‐values of sex and SNP coefficients when generating data in the presence of sex differences in SNP effect and fitting models with SNP–sex interaction terms. Top row: Boxplots of bias (Y‐axis; estimate minus true coefficient) of sex and SNP coefficients across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI (PLINK) coding (left) or XCI (Clayton) coding (right). Bottom row: Boxplots of p‐values (Y‐axis) of sex and SNP coefficients across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI coding (left) or XCI coding (right). Color indicates the model that was fit (Model 3 or Model 4, XCI or eXCI coding with a SNP–sex interaction term).
Figure 6. Two degree‐of‐freedom F tests for SNP coefficients for data generated in the presence of sex differences and fitting models with SNP–sex interaction terms. (a) Boxplots of p‐values (Y‐axis) of df = 2 F tests across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI (PLINK) coding (left) or XCI (Clayton) coding (right). (b) Power, defined as the proportion of p < 0.01 (Y‐axis), across 1000 simulation runs for various simulation settings (X‐axis) when data is generated using eXCI (PLINK) coding (left) or XCI (Clayton) coding (right). Color indicates the model that was fit (Model 3 or Model 4, XCI (Clayton) or eXCI (PLINK) coding with a SNP–sex interaction term).
Table 4. Results of Table 3 logistic regression models on obesity (defined as BMI > 30) in UK Biobank sample

| SNP coding | Interaction | Effect | Beta | SE | P | exp(beta) | ORsex\|SNP0 | ORsex\|SNP1 | ORsex\|SNP2 | ORSNP\|M | ORSNP\|W1 | ORSNP\|W2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XCI/Clayton | No | Sex | 0.111 | 0.008 | 5.77E−48 | 1.118 | 1.118 | | 1.118 | 1.091 | 1.044 | 1.091 |
| | | SNP | 0.044 | 0.006 | 1.15E−13 | 1.044 | | | | | | |
| eXCI/PLINK | No | Sex | 0.121 | 0.008 | 1.55E−54 | 1.129 | 1.129 | 1.129 | | 1.057 | 1.057 | 1.118 |
| | | SNP | 0.056 | 0.008 | 9.11E−12 | 1.057 | | | | | | |
| XCI/Clayton | Yes | Sex | 0.109 | 0.009 | 1.26E−35 | 1.116 | 1.116 | | 1.123 | 1.096 | 1.040 | 1.082 |
| | | SNP | 0.039 | 0.010 | 6.79E−05 | 1.040 | | | | | | |
| | | Sex × SNP | 0.006 | 0.012 | 6.10E−01 | 1.006 | | | | | | |
| eXCI/PLINK | Yes | Sex | 0.109 | 0.009 | 1.26E−35 | 1.116 | 1.116 | 1.175 | | 1.096 | 1.040 | 1.082 |
| | | SNP | 0.039 | 0.010 | 6.79E−05 | 1.040 | | | | | | |
| | | Sex × SNP | 0.052 | 0.018 | 3.12E−03 | 1.053 | | | | | | |
Note: Female is coded as the reference level, and logistic regression models are adjusted for age, assessment center, genotyping batch, and principal components (PCs).
Abbreviations: ORsex|SNP0, odds ratio for the effect of sex, given SNP = 0; ORsex|SNP1, odds ratio for the effect of sex, given SNP = 1; ORsex|SNP2, odds ratio for the effect of sex, given SNP = 2; ORSNP|M, odds ratio for the effect of SNP, comparing SNP = 0 with SNP = 2 in XCI coding or with SNP = 1 in eXCI coding, given sex = male; ORSNP|W1, odds ratio for the effect of SNP comparing SNP = 0 with SNP = 1, given sex = female; ORSNP|W2, odds ratio for the effect of SNP comparing SNP = 0 with SNP = 2, given sex = female.
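The derived odds ratios in the interaction rows can be recomputed from the reported Beta values. The sketch below does so for the eXCI/PLINK model with interaction and then checks how the two interaction coefficients relate. The relation βint(eXCI) = βSNP + 2·βint(XCI) is an algebraic consequence of doubling the male SNP code, stated here as an illustration rather than as a result reported in the source; small discrepancies are expected because the inputs are rounded to three decimals.

```python
import math

# Reported coefficients from the two models that include a Sex x SNP term.
beta_sex, beta_snp = 0.109, 0.039   # identical under both codings
beta_int_xci = 0.006                # XCI/Clayton coding
beta_int_exci = 0.052               # eXCI/PLINK coding

# eXCI coding: a male carrier is coded SNP = 1.
or_sex_given_snp1 = math.exp(beta_sex + beta_int_exci)
or_snp_male_exci = math.exp(beta_snp + beta_int_exci)

# XCI coding: a male carrier is coded SNP = 2.
or_snp_male_xci = math.exp(2 * (beta_snp + beta_int_xci))

# Female odds ratios do not involve the interaction term.
or_snp_w1 = math.exp(beta_snp)
or_snp_w2 = math.exp(2 * beta_snp)

print(round(or_sex_given_snp1, 3))
print(round(or_snp_male_exci, 3), round(or_snp_male_xci, 3))
print(round(or_snp_w1, 3), round(or_snp_w2, 3))

# Both codings describe the same male SNP effect, so the eXCI interaction
# coefficient should be close to beta_snp + 2 * beta_int_xci.
implied_beta_int_exci = beta_snp + 2 * beta_int_xci
print(round(implied_beta_int_exci, 3), beta_int_exci)
```

The male SNP odds ratio agrees between the two codings up to rounding, which is why the Sex and SNP rows are identical in both interaction models while the Sex × SNP estimate and its p-value differ.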