Dajiang J Liu, Suzanne M Leal.
Abstract
Next-generation sequencing has made possible the detection of rare variant (RV) associations with quantitative traits (QT). Due to high sequencing cost, many studies can only sequence a modest number of selected samples with extreme QT. Therefore, association testing in individual studies can be underpowered. Besides the primary trait, many clinically important secondary traits are often measured. It is highly beneficial if multiple studies can be jointly analyzed for detecting associations with commonly measured traits. However, analyzing secondary traits in selected samples can be biased if sample ascertainment is not properly modeled. Some methods exist for analyzing secondary traits in selected samples, where some burden tests can be implemented. However, p-values can only be evaluated analytically via asymptotic approximations, which may not be accurate. Additionally, potentially more powerful sequence kernel association tests, variable selection-based methods, and burden tests that require permutations cannot be incorporated. To overcome these limitations, we developed a unified method for analyzing secondary trait associations with RVs (STAR) in selected samples, incorporating all RV tests. Statistical significance can be evaluated either through permutations or analytically. STAR makes it possible to apply more powerful RV tests to analyze secondary trait associations. It also enables jointly analyzing multiple cohorts ascertained under different study designs, which greatly boosts power. The performance of STAR and commonly used RV association tests was comprehensively evaluated using simulation studies. STAR was also applied to a dataset from the SardiNIA project where samples with extreme low-density lipoprotein levels were sequenced. A significant association between LDLR and systolic blood pressure was identified, which is supported by pharmacogenetic studies. In summary, for sequencing studies, STAR is an important tool for detecting secondary-trait RV associations.
Year: 2012 PMID: 23166519 PMCID: PMC3499373 DOI: 10.1371/journal.pgen.1003075
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Quantile-quantile plot of p-values for rare variant tests in STAR under the null hypothesis of no gene/secondary trait associations.
Five tests were evaluated: CMC, WSS, KBAC, VT, and SKAT. Empirical p-values for each test were plotted against their theoretical expectations. Panels (A) through (D) show scenarios with different primary trait effects and trait residual correlations. P-values were obtained with 5,000 permutations. Type I error was evaluated using 10,000 replicates. For each replicate, 5,000 individuals with extreme quantitative traits were selected from a cohort of 100,000.
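The simulation design in this caption (extreme-trait selection from a cohort, followed by permutation-based p-values) can be sketched as below. This is a minimal illustration, not the STAR procedure: it permutes trait values naively and does not model sample ascertainment, which is the step STAR adds. The helper names, the half-from-each-tail selection rule, the toy statistic, and the reduced sample sizes are all assumptions made for the example.

```python
import numpy as np

def select_extremes(trait, n_selected):
    """Indices of the n_selected most extreme trait values.

    Assumption: half are taken from each tail; the caption does not
    state how the extremes were split.
    """
    if n_selected < 1:
        raise ValueError("n_selected must be at least 1")
    order = np.argsort(trait)
    half = n_selected // 2
    return np.concatenate([order[:half], order[-(n_selected - half):]])

def abs_mean_difference(burden, trait):
    """Toy burden statistic: |mean trait of carriers - mean trait of non-carriers|."""
    carriers = burden > 0
    if carriers.all() or not carriers.any():
        return 0.0
    return abs(trait[carriers].mean() - trait[~carriers].mean())

def permutation_pvalue(stat_fn, burden, trait, n_perm, rng):
    """Empirical p-value: (exceedances + 1) / (n_perm + 1)."""
    observed = stat_fn(burden, trait)
    exceed = 0
    for _ in range(n_perm):
        permuted = rng.permutation(trait)  # shuffled copy of the trait values
        if stat_fn(burden, permuted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Small demo with reduced sizes (the caption uses 5,000 of 100,000 and 5,000 permutations).
rng = np.random.default_rng(0)
primary = rng.normal(size=10_000)
selected = select_extremes(primary, 500)
burden = rng.binomial(1, 0.05, size=selected.size)   # hypothetical carrier indicator
secondary = rng.normal(size=selected.size)           # null: independent of everything
print(permutation_pvalue(abs_mean_difference, burden, secondary, 200, rng))
```

Adding 1 to both numerator and denominator keeps the empirical p-value strictly positive, so with 5,000 permutations the smallest attainable value is 1/5,001.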
Figure 2. The power for detecting associations with secondary traits in selected samples.
Power is calculated for CMC, WSS, KBAC, VT, and SKAT implemented in the STAR framework. Secondary trait effects are assumed to be fixed and unidirectional. Panels (A) through (F) show scenarios with different primary trait effects and trait residual correlations. P-values were obtained with 5,000 permutations. Power was evaluated using 10,000 replicates at a fixed significance level. For each replicate, 5,000 individuals with extreme quantitative traits were selected from a cohort of 100,000 individuals.
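Estimating power from replicates, as described in this caption, amounts to counting the fraction of replicates whose p-value falls at or below the significance level. The sketch below shows that calculation with a binomial standard error; the function name and the `alpha` value used in the example call are hypothetical, since the significance level is not given in the text here.

```python
import math

def estimate_power(pvalues, alpha):
    """Return (power, standard error) from per-replicate p-values.

    power = fraction of replicates with p <= alpha;
    the standard error uses the binomial formula sqrt(p(1-p)/n).
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one replicate")
    hits = sum(1 for p in pvalues if p <= alpha)
    power = hits / n
    return power, math.sqrt(power * (1.0 - power) / n)

# Illustrative call on four made-up replicate p-values (alpha is an assumption).
print(estimate_power([0.01, 0.20, 0.03, 0.50], alpha=0.05))
```

Applied to replicates simulated under the null hypothesis, the same function gives an estimate of the type I error rate.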
Figure 3. The power for detecting association with secondary traits in selected samples.
Power is calculated for CMC, WSS, KBAC, VT, and SKAT implemented in the STAR framework. Secondary trait effects are assumed to be bidirectional with fixed magnitude, where 80% of the causal variants increase the mean secondary trait value and the other 20% decrease it. Panels (A) through (F) show scenarios with different primary trait effects and trait residual correlations. P-values were obtained with 5,000 permutations. Power was evaluated using 10,000 replicates at a fixed significance level. For each replicate, 5,000 individuals with extreme quantitative traits were selected from a cohort of 100,000 individuals.
Association Analysis of APOB, B3GA4, LDLR, and PCSK9 genes with LDL levels.
| Gene | CMC | WSS | KBAC | VT | SKAT |
|---|---|---|---|---|---|
| APOB | 0.014* | 0.029* | 0.026* | 0.045* | 0.317 |
| B3GA4 | 0.820 | 0.946 | 0.942 | 0.964 | 0.971 |
| LDLR | 0.050* | 0.025* | 0.035* | 0.009# | 0.234 |
| PCSK9 | 0.272 | 0.299 | 0.381 | 0.491 | 0.491 |
For CMC, WSS, KBAC, and SKAT, only variants with MAF≤1% were analyzed.
For VT, variants with MAF≤5% were analyzed.
The statistical significance of all tests was obtained empirically via 5,000 permutations. Nominally significant p-values are labeled with an asterisk. P-values that are significant after Bonferroni corrections are labeled with a pound sign.
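The asterisk and pound-sign labeling described in this footnote can be sketched as a simple thresholding rule. The nominal level of 0.05 and the number of tests used for the Bonferroni correction are assumptions here: the footnote does not state either, so `n_tests=4` in the example call is purely illustrative.

```python
def significance_labels(pvalues, n_tests, alpha=0.05):
    """Label each p-value: '#' if p <= alpha / n_tests (Bonferroni),
    '*' if only nominally significant (p <= alpha), '' otherwise.

    alpha and n_tests are assumptions; the source does not state them.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    threshold = alpha / n_tests
    labels = []
    for p in pvalues:
        if p <= threshold:
            labels.append("#")
        elif p <= alpha:
            labels.append("*")
        else:
            labels.append("")
    return labels

# Illustrative call on five p-values with a hypothetical correction for 4 tests.
print(significance_labels([0.050, 0.025, 0.035, 0.009, 0.234], n_tests=4))
```

Because Bonferroni divides the nominal level by the number of tests, the choice of `n_tests` directly determines which p-values earn the pound sign.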
Association Analyses of APOB, B3GA4, LDLR, and PCSK9 genes.
| Gene | Trait | CMC | WSS | KBAC | VT | SKAT |
|---|---|---|---|---|---|---|
| APOB | TCL | 3.07E-01 | 6.04E-01 | 6.75E-01 | 2.76E-01 | 8.88E-01 |
| APOB | HDL | 7.00E-01 | 9.71E-01 | 5.35E-01 | 9.21E-01 | 6.30E-01 |
| APOB | BMI | 5.98E-01 | 2.57E-01 | 7.05E-01 | 5.22E-01 | 2.29E-01 |
| APOB | DiasBP | 9.10E-01 | 1.71E-01 | 1.52E-01 | 2.22E-01 | 3.79E-01 |
| APOB | SysBP | 7.54E-01 | 6.74E-01 | 5.37E-01 | 9.22E-01 | 7.69E-01 |
| APOB | TG | 8.76E-01 | 3.60E-01 | 2.46E-01 | 4.16E-01 | 7.06E-01 |
| APOB | INSULIN | 8.30E-01 | 4.85E-01 | 4.07E-01 | 5.96E-01 | 1.68E-01 |
| B3GA4 | TCL | 6.67E-01 | 8.18E-01 | 6.97E-01 | 3.93E-01 | 1.54E-01 |
| B3GA4 | HDL | 8.71E-01 | 2.78E-01 | 2.21E-01 | 4.14E-01 | 6.90E-02 |
| B3GA4 | BMI | 3.81E-01 | 7.72E-01 | 7.66E-01 | 9.50E-01 | 9.84E-01 |
| B3GA4 | DiasBP | 5.63E-01 | 8.10E-01 | 8.58E-01 | 4.41E-01 | 5.29E-01 |
| B3GA4 | SysBP | 5.39E-01 | 9.47E-01 | 9.22E-01 | 8.26E-01 | 8.62E-01 |
| B3GA4 | TG | 5.60E-01 | 9.22E-01 | 9.14E-01 | 6.12E-01 | 4.98E-01 |
| B3GA4 | INSULIN | 5.14E-01 | 9.74E-01 | 9.79E-01 | 9.73E-01 | 9.85E-01 |
| LDLR | TCL | 2.31E-02* | 3.60E-02* | 2.90E-02* | 4.90E-02* | 8.93E-01 |
| LDLR | HDL | 1.13E-01 | 1.59E-01 | 2.35E-01 | 4.19E-01 | 4.61E-01 |
| LDLR | BMI | 1.01E-01 | 2.62E-01 | 1.94E-01 | 3.74E-01 | 7.41E-01 |
| LDLR | DiasBP | 1.64E-02* | 2.70E-02* | 2.50E-02* | 4.70E-02* | 2.33E-01 |
| LDLR | SysBP | 9.14E-04# | 3.08E-04# | 1.20E-03# | 3.00E-03# | 6.00E-03# |
| LDLR | TG | 4.73E-01 | 9.21E-01 | 9.64E-01 | 9.88E-01 | 9.97E-01 |
| LDLR | INSULIN | 3.76E-01 | 7.91E-01 | 7.77E-01 | 4.88E-01 | 9.67E-01 |
| PCSK9 | TCL | 1.98E-02* | 4.80E-02* | 2.30E-02* | 1.52E-01 | 9.22E-01 |
| PCSK9 | HDL | 4.98E-02* | 6.70E-02 | 5.80E-02 | 1.44E-01 | 2.33E-01 |
| PCSK9 | BMI | 3.85E-01 | 6.81E-01 | 7.27E-01 | 4.11E-01 | 8.73E-01 |
| PCSK9 | DiasBP | 4.24E-01 | 8.03E-01 | 8.42E-01 | 1.18E-01 | 5.05E-01 |
| PCSK9 | SysBP | 2.76E-01 | 5.67E-01 | 5.79E-01 | 1.28E-01 | 7.43E-01 |
| PCSK9 | TG | 3.29E-01 | 5.25E-01 | 6.53E-01 | 6.56E-01 | 6.31E-01 |
| PCSK9 | INSULIN | 7.53E-01 | 6.15E-01 | 4.83E-01 | 1.12E-01 | 5.46E-01 |
The secondary traits studied were total cholesterol level (TCL), high-density lipoprotein (HDL), body mass index (BMI), diastolic blood pressure (DiasBP), systolic blood pressure (SysBP), triglyceride (TG), and insulin level (INSULIN).
For CMC, WSS, KBAC, and SKAT, variants with MAF≤1% were analyzed.
For VT, variants with MAF≤5% were analyzed.
Statistical significance for all tests was obtained empirically via 5,000 permutations. Nominally significant p-values are labeled with an asterisk, while the associations that are significant after Bonferroni corrections are labeled with a pound sign.