Bo Chen, Radu V Craiu, Lisa J Strug, Lei Sun.
Abstract
The X-chromosome is often excluded from genome-wide association studies because of analytical challenges. Some of the problems, such as the random, skewed, or no X-inactivation model uncertainty, have been investigated. Other considerations have received little to no attention, such as the value in considering nonadditive and gene-sex interaction effects, and the inferential consequence of choosing different baseline alleles (i.e., the reference vs. the alternative allele). Here we propose a unified and flexible regression-based association test for X-chromosomal variants. We provide theoretical justifications for its robustness in the presence of various model uncertainties, as well as for its improved power when compared with the existing approaches under certain scenarios. For completeness, we also revisit the autosomes and show that the proposed framework leads to a more robust approach than the standard method. Finally, we provide supporting evidence by revisiting several published association studies. Supporting Information for this article is available online.
Keywords: confounding; dominance; interaction; model uncertainty; regression
Year: 2021 PMID: 34224641 PMCID: PMC9292551 DOI: 10.1002/gepi.22422
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.344
Table 1. Eight analytical considerations and challenges, C1–C8, present in X‐chromosome‐inclusive association studies
| Problem | Solution | Relevant sections |
|---|---|---|
| Allele‐based association tests, comparing allele frequency differences between cases and controls, are locally most powerful. However, they analyze binary outcomes only and are sensitive to the Hardy–Weinberg equilibrium (HWE) assumption (Sasieni, | Genotype‐based regression models, | Sections |
| For the autosomes, switching the two alleles does not affect the association inference. Is this true for the X‐chromosome? | It is | Sections |
| Unlike the autosomes, sex is a confounder when analyzing the X‐chromosome for traits exhibiting sexual dimorphism (e.g., height and weight). Even for the autosomes, sex can be a confounder if allele frequencies differ significantly between males and females. | To maintain correct type I error rate control, the sex main effect must be considered, particularly when analyzing the X‐chromosome. The resulting association test is also invariant to the choice of the baseline allele. | Section |
| Gene–sex interaction might exist, but there is a concern over loss of power due to increased degrees of freedom. In addition, what is the interpretation of a gene–sex interaction effect in the presence of X‐inactivation? | Under no interaction, the power loss from modeling the interaction is capped at 11.4%. Models including the | Sections |
| XCI occurs if one of the two alleles in a genotype of a female is silenced. Individual‐level XCI status requires additional biological information that is not typically available to genetic association studies. Assuming XCI or no XCI at the sample level leads to different genotype coding strategies (Table | XCI uncertainty implies a sex‐stratified genetic effect, which can be analytically represented by the | Sections |
| If the choice of the silenced allele in females is skewed toward a specific allele, the average effect of the | XCI skewness is statistically equivalent to a dominance genetic effect. | Section |
| For both the autosomes and the X‐chromosome, the most common practice is to use the additive test, which has better power than the genotypic test under (approximate) additivity but cannot capture dominance effects. The exact trade‐off, however, is not clear. | We provide analytical and empirical evidence supporting the use of the genotypic model when analyzing either the autosomes or the X‐chromosome. For an X‐chromosomal variant, including the dominance effect term has the added benefit of resolving the skewed X‐inactivation uncertainty issue. | Sections |
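The cap on power loss quoted in the table can be explored numerically. The sketch below is a minimal illustration, assuming the comparison is between a 1 df and a 2 df chi-square test that share the same noncentrality parameter (that is, the extra interaction term adds a degree of freedom but no signal). The function names, the significance levels, and the search grid are assumptions made for illustration, so the printed values need not reproduce the 11.4% figure.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(ncp, alpha):
    """Power of a 1 df chi-square test with noncentrality parameter ncp."""
    crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)  # two-sided normal critical value
    shift = math.sqrt(ncp)
    return _STD_NORMAL.cdf(shift - crit) + _STD_NORMAL.cdf(-shift - crit)


def power_2df(ncp, alpha, terms=400):
    """Power of a 2 df chi-square test, as a Poisson mixture of central chi-squares."""
    half_crit = -math.log(alpha)   # the 2 df critical value is -2 * ln(alpha)
    half_ncp = ncp / 2.0
    weight = math.exp(-half_ncp)   # Poisson(half_ncp) probability of j = 0
    term = math.exp(-half_crit)
    tail = term                    # P(central chi-square, 2 + 2j df, exceeds the cutoff) at j = 0
    total = 0.0
    for j in range(terms):
        total += weight * tail
        weight *= half_ncp / (j + 1)   # move to the Poisson probability of j + 1
        term *= half_crit / (j + 1)
        tail += term                   # tail probability for 2 + 2(j + 1) df
    return total


def max_power_loss(alpha, ncp_max=80.0, step=0.1):
    """Largest power gap (1 df minus 2 df) over a grid of noncentrality values."""
    steps = int(ncp_max / step)
    return max(power_1df(k * step, alpha) - power_2df(k * step, alpha)
               for k in range(steps + 1))


if __name__ == "__main__":
    for alpha in (0.05, 5e-8):  # assumed significance levels
        print(f"alpha={alpha:g}: max power loss = {max_power_loss(alpha):.3f}")
```

Both power functions reduce to the significance level when the noncentrality parameter is zero, which is a quick sanity check on the implementation.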
Table 2. Covariate coding schemes for examining the additive, dominance, gene–sex interaction, and sex effects under different assumptions of the X‐chromosome inactivation status and the choice of the baseline allele
| Effect interpretation | Covariate notation | Non‐baseline allele | X‐chromosome inactivation (XCI) status | Coding schemes: females | | | Coding schemes: males | |
|---|---|---|---|---|---|---|---|---|
| Additive | | | Yes | 0 | 0.5 | 1 | 0 | 1 |
| Additive | | | Yes | 1 | 0.5 | 0 | 1 | 0 |
| Additive | | | No | 0 | 1 | 2 | 0 | 1 |
| Additive | | | No | 2 | 1 | 0 | 1 | 0 |
| Dominance | | Either | Either | 0 | 1 | 0 | 0 | 0 |
| Gene–sex interaction | | | Either | 0 | 0 | 0 | 0 | 1 |
| Gene–sex interaction | | | Either | 0 | 0 | 0 | 1 | 0 |
| Sex | | Either | Either | 0 | 0 | 0 | 1 | 1 |
Note: The covariate subscripts indicate the effect type (additive or dominance), the non‐baseline allele whose copies are counted in a genotype, and whether the X‐chromosome is inactivated or not inactivated.
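The coding schemes in Table 2 can be written as a small function. This is a minimal sketch: the function name `x_covariates`, its arguments, and the dictionary keys are hypothetical and chosen for illustration, and the three female columns are assumed to correspond to 0, 1, and 2 copies of one allele (with the two male columns corresponding to 0 and 1 copies).

```python
def x_covariates(copies, is_male, xci=True):
    """Return Table 2 style covariates for one individual.

    copies  : copies of the counted (non-baseline) allele,
              0-2 for females and 0-1 for males.
    is_male : True for males, False for females.
    xci     : True to code females under X-chromosome inactivation.
    """
    max_copies = 1 if is_male else 2
    if copies not in range(max_copies + 1):
        raise ValueError("invalid allele count for this sex")
    if is_male:
        additive = float(copies)                  # males: 0 or 1 either way
    else:
        additive = copies / 2.0 if xci else float(copies)
    return {
        "additive": additive,
        "dominance": 1 if (not is_male and copies == 1) else 0,  # female heterozygotes only
        "gene_sex": copies if is_male else 0,     # nonzero only for male carriers
        "sex": 1 if is_male else 0,
    }


if __name__ == "__main__":
    groups = [(0, False), (1, False), (2, False), (0, True), (1, True)]
    for xci in (True, False):
        row = [x_covariates(c, m, xci=xci)["additive"] for c, m in groups]
        print("additive, XCI" if xci else "additive, no XCI", row)
    # Switching the baseline allele amounts to counting the other allele.
    swapped = [x_covariates((1 if m else 2) - c, m)["additive"] for c, m in groups]
    print("additive, XCI, other allele counted", swapped)
```

Counting the other allele reverses the additive and gene–sex interaction codes within each sex, which matches the paired rows of the table.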
Figure 1. Equivalency between different regression models for association analysis of an X‐chromosomal bi‐allelic SNP. The covariate subscripts indicate the non‐baseline allele whose copies are counted in a genotype and whether the X‐chromosome is inactivated or not inactivated; see Table 2 for additional covariate coding details. Two groups of coding are connected by a line if there is an invertible linear transformation between the design matrices as specified in Theorem 1, in which case the resulting test statistics for the specified hypothesis are identical. Part (a) corresponds to models and tests without the dominance covariate, and part (b) corresponds to models and tests with the dominance covariate included. Including the dominance covariate has no effect on the linear relationships established in part (a), because its coding in Table 2 is invariant to the choice of the baseline allele and the XCI status. However, the dominance effect is statistically equivalent to skewed XCI, as shown in Section 2.4.
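The idea of an invertible linear transformation between design matrices can be checked directly on the Table 2 codings. The sketch below is illustrative only: Theorem 1 is not reproduced in this excerpt, so the transformation matrices here were worked out from the table rather than taken from the paper, and all variable and function names are hypothetical. Each design matrix has one row per genotype group (three female, two male) and columns for the intercept, additive, sex, and gene–sex interaction covariates.

```python
# Rows: females with 0, 1, 2 copies; males with 0, 1 copies of the counted allele.
# Columns: intercept, additive, sex, gene-sex interaction.
X_XCI = [[1, 0.0, 0, 0], [1, 0.5, 0, 0], [1, 1.0, 0, 0], [1, 0.0, 1, 0], [1, 1.0, 1, 1]]
X_NO_XCI = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 2, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]
# Same genotype groups under XCI, but counting copies of the other allele.
X_XCI_SWAPPED = [[1, 1.0, 0, 0], [1, 0.5, 0, 0], [1, 0.0, 0, 0], [1, 1.0, 1, 1], [1, 0.0, 1, 0]]

# additive(no XCI) = 2 * additive(XCI) - interaction; other columns unchanged.
T_TO_NO_XCI = [[1, 0, 0, 0],
               [0, 2, 0, 0],
               [0, 0, 1, 0],
               [0, -1, 0, 1]]

# additive(swapped) = 1 - additive; interaction(swapped) = sex - interaction.
T_SWAP_ALLELE = [[1, 1, 0, 0],
                 [0, -1, 0, 0],
                 [0, 0, 1, 1],
                 [0, 0, 0, -1]]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def det(m):
    """Determinant by cofactor expansion along the first row (fine for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


if __name__ == "__main__":
    print(matmul(X_XCI, T_TO_NO_XCI) == X_NO_XCI)         # codings are linearly related
    print(matmul(X_XCI, T_SWAP_ALLELE) == X_XCI_SWAPPED)
    print(det(T_TO_NO_XCI), det(T_SWAP_ALLELE))           # nonzero, so both maps are invertible
```

Both transformations rely on the sex and gene–sex interaction columns being present, which is consistent with the caption's point that equivalence holds only between models that include the appropriate covariates.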
Table 3. Properties of different regression models in the presence of the eight analytical challenges, as detailed in Table 1
| Model | Testing | C3/C4 | C5/C6 | C7/C8 |
|---|---|---|---|---|
Note: Whole‐genome considerations such as C1 (continuous vs. binary traits) and C2 (Hardy–Weinberg equilibrium vs. disequilibrium) are naturally dealt with by the genotype‐based regression approach. X‐chromosome‐specific considerations include C3 (choice of the baseline allele), C4 (sex as a confounder and type I error control), C5 (gene‐sex interaction), C6 (X‐chromosome inactivation (XCI) vs. no XCI), C7 (random vs. skewed XCI), and C8 (the dominance effect). In the table, one mark indicates a problem for the corresponding model and test, and another indicates no problem. Relevant covariates should be included in the model but are omitted here for notational simplicity. Joint testing based on the proposed model is the recommended, most robust approach; see Figures 2 and S5 for power comparisons among the models.
Figure 2. Power comparison for analyzing X‐chromosomal single nucleotide polymorphisms (SNPs). Black, green, and blue curves show tests based on three of the models specified in Table 3, and red curves show the test based on the proposed model. (a) f_female = f_male = 0.2 and (b) f_female = f_male = 0.5, where f is the allele frequency of the non‐baseline allele. Upper panels in (a) and (b) examine power as a function of the “dominance” effect. Lower panels in (a) and (b) examine power as a function of the gene–sex “interaction” effect. Note that the biological dominance effect and skewed X‐chromosome inactivation (XCI) are statistically confounded with each other, as are the gene–sex interaction effect and the XCI status; see Section 3.2. Results for other parameter values, including allele frequencies that differ between males and females, are shown in Figure S5. The analyses based on the other models assume that the true baseline allele is known and that the true XCI status is known at the population level. Unlike the other methods, the proposed method is invariant to the assumptions about the baseline allele and the XCI status.
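The statistical confounding between skewed XCI and a dominance effect can be seen from the Table 2 codings with a short worked equation. This is a sketch under stated assumptions, not the derivation in Section 2.4: the symbol γ (the code assigned to a female heterozygote under skewed XCI, between 0 and 1) and the labels G_A, G_D, β_A, and β_D are introduced here for illustration. With random XCI, the additive covariate takes the values 0, 0.5, 1 across the three female genotypes, and the dominance covariate takes the values 0, 1, 0.

```latex
% Assumed notation (not from the source): \gamma, G_A, G_D, \beta_A, \beta_D.
\begin{align*}
G_{\text{skew}} &= G_A + \left(\gamma - \tfrac{1}{2}\right) G_D,
  \qquad G_A \in \{0, \tfrac{1}{2}, 1\},\quad G_D \in \{0, 1, 0\}, \\
\beta_A\, G_{\text{skew}} &= \beta_A\, G_A + \beta_D\, G_D,
  \qquad \beta_D = \beta_A \left(\gamma - \tfrac{1}{2}\right).
\end{align*}
```

Under random XCI, γ = 1/2 and the implied dominance coefficient is zero. Any skew away from 1/2 appears as a nonzero dominance term, so a model that already includes the dominance covariate absorbs it. Males are unaffected because their dominance covariate is always zero.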
Figure 3. Results of a genome‐wide association study of meconium ileus in cystic fibrosis subjects. In total, 3199 independent cystic fibrosis subjects, 14,279 X‐chromosomal single nucleotide polymorphisms (SNPs), and 556,445 autosomal SNPs are analyzed. The SNPs are ordered by the minimal p value across the tests considered, and the lines connecting the SNPs are for visualization only, to demonstrate the robustness of a particular method. (a) X‐chromosome results. The top 15 ranked X‐chromosomal SNPs are selected by any of the six tests based on the models in Table 3: the black curve for a test assuming X‐chromosome inactivation (XCI), the brown curve for a test assuming no XCI, the green curve for a test assuming XCI, the orange curve for a test assuming no XCI, the blue curve for a test that is invariant to the XCI assumptions if the relevant covariate is included in the model and tested, and the red curve for the test based on the recommended model that is most robust for analyzing the X‐chromosome. (b) Autosome results. The top 15 ranked autosomal SNPs are selected by either the 1 df additive test or the 2 df genotypic test: the black curve for the standard additive model, and the red curve for the recommended genotypic model that is most robust for analyzing the autosomes.