Yildiz E Yilmaz, Shelley B Bull.
Abstract
Use of trait-dependent sampling designs in whole-genome association studies of sequence data can reduce total sequencing costs with modest losses of statistical efficiency. In a quantitative trait (QT) analysis of data from the Genetic Analysis Workshop 17 mini-exome for unrelated individuals in the Asian subpopulation, we investigate alternative designs that sequence only 50% of the entire cohort. In addition to a simple random sampling design, we consider extreme-phenotype designs that are of increasing interest in genetic association analysis of QTs, especially in studies concerned with the detection of rare genetic variants. We also evaluate a novel sampling design in which all individuals have a nonzero probability of being selected into the sample but in which individuals with extreme phenotypes have a proportionately larger probability. We account for differential sampling of individuals with informative trait values by inverse probability weighting using standard survey methods, so that the analysis generalizes to the source population. In replicate 1 data, we applied the designs in association analysis of Q1 with both rare and common variants in the FLT1 gene, based on knowledge of the generating model. Using all 200 replicate data sets, we similarly analyzed Q1 and Q4 (which is known to be free of association with FLT1) to evaluate relative efficiency, type I error, and power. Simulation study results suggest that the QT-dependent selection designs generally yield greater than 50% relative efficiency compared to using the entire cohort, implying cost-effectiveness of 50% sample selection and worthwhile reduction of sequencing costs.
Year: 2011 PMID: 22373146 PMCID: PMC3287835 DOI: 10.1186/1753-6561-5-S9-S111
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Sampling designs and analytical methods for QT association analysis
| Sampling design | Method of analysis: common variant | Method of analysis: rare variant score | Software |
|---|---|---|---|
| 1. Entire cohort (100%) | a. Linear regression of QT on genotype | c. Linear regression of QT on rare allele count | Designs 1–5: generalized linear regression (glm) function in R for fitting all models |
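The abstract describes sequencing a trait-dependent 50% subsample and correcting for the unequal selection probabilities by inverse probability weighting. The following is a minimal Python sketch of that idea, not the authors' analysis (which used `glm` in R). Everything specific in it is an assumption made for illustration: the simulated cohort, the allele frequency, the effect size `beta`, and the selection rule (probability 0.8 in the outer QT quartiles, 0.2 in the middle half). It produces a point estimate only; survey-based standard errors are not computed.

```python
import random

def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x, with an intercept."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

def simulate(n=5000, beta=0.5, seed=0):
    rng = random.Random(seed)
    # Hypothetical cohort: genotype = number of minor alleles (0, 1, or 2).
    g = [int(rng.random() < 0.3) + int(rng.random() < 0.3) for _ in range(n)]
    y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]

    # Hypothetical selection rule: everyone can be selected, but the outer
    # QT quartiles are favored (0.8 versus 0.2), about 50% selected overall.
    ranked = sorted(y)
    lower, upper = ranked[n // 4], ranked[(3 * n) // 4]
    prob = [0.8 if (yi < lower or yi > upper) else 0.2 for yi in y]
    chosen = [i for i in range(n) if rng.random() < prob[i]]

    gs = [g[i] for i in chosen]
    ys = [y[i] for i in chosen]
    ws = [1.0 / prob[i] for i in chosen]  # inverse probability weights
    return {
        "fraction_sequenced": len(chosen) / n,
        "ipw_slope": weighted_slope(gs, ys, ws),
        "unweighted_slope": weighted_slope(gs, ys, [1.0] * len(chosen)),
    }

result = simulate()
print(result)
```

The unweighted slope is returned only for comparison: under trait-dependent selection it is generally not an unbiased estimate of the cohort-level slope, which is the reason for weighting.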
Results of regression analysis of Q1 with the FLT1 gene in replicate 1
| Sampling design | Linear: coefficient | Linear: SE | Linear: test statistic | Linear: P-value | Poisson: coefficient | Poisson: SE | Poisson: test statistic | Poisson: P-value |
|---|---|---|---|---|---|---|---|---|
| Rare variant score | | | | | | | | |
| 1 | 0.44 | 0.08 | 5.18 | 4 × 10⁻⁷ | 0.60 | 0.11 | 5.59 | 2 × 10⁻⁸ |
| 2 | 0.43 | 0.11 | 3.81 | 2 × 10⁻⁴ | 0.55 | 0.13 | 4.14 | 3 × 10⁻⁵ |
| 3 | 0.71 | 0.14 | 4.94 | 2 × 10⁻⁶ | 0.68 | 0.13 | 5.26 | 1 × 10⁻⁷ |
| 4 | 0.80 | 0.14 | 5.59 | 1 × 10⁻⁷ | 0.79 | 0.14 | 5.81 | 6 × 10⁻⁹ |
| 5.1 | 0.30 | 0.11 | 2.82 | 5 × 10⁻³ | 0.61 | 0.11 | 5.37 | 8 × 10⁻⁸ |
| 5.2 | 0.15 | 0.10 | 1.48 | 0.14 | 0.63 | 0.11 | 5.58 | 2 × 10⁻⁸ |
| Common variant C13S523 | | | | | | | | |
| 1 | 1.00 | 0.12 | 8.11 | 1 × 10⁻¹⁴ | 1.39 | 0.21 | 6.51 | 8 × 10⁻¹¹ |
| 2 | 0.96 | 0.18 | 5.24 | 5 × 10⁻⁷ | 1.40 | 0.32 | 4.36 | 1 × 10⁻⁵ |
| 3 | 1.67 | 0.21 | 7.95 | 3 × 10⁻¹³ | 1.66 | 0.34 | 4.88 | 1 × 10⁻⁶ |
| 4 | 1.71 | 0.21 | 8.17 | 9 × 10⁻¹⁴ | 1.86 | 0.37 | 5.01 | 5 × 10⁻⁷ |
| 5.1 | 0.90 | 0.17 | 5.45 | 2 × 10⁻⁷ | 1.39 | 0.27 | 5.18 | 2 × 10⁻⁷ |
| 5.2 | 0.26 | 0.22 | 1.16 | 0.25 | 1.39 | 0.26 | 5.44 | 5 × 10⁻⁸ |
P-values are based on the asymptotic distribution of the test statistic.
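As a rough check on how the test statistics map to asymptotic P-values, the sketch below refers a statistic to the standard normal distribution and computes the two-sided tail area. The normal reference is an assumption: the extract does not say which reference distribution was used. It does, however, match the order of magnitude of the P-values reported in the Poisson regression columns. The linear regression columns are left out because their reference distribution is not stated here.

```python
import math

def two_sided_normal_p(z):
    """Two-sided P-value for a statistic referred to the standard normal."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# Poisson regression test statistics for the rare variant score, designs 1-3.
for z in (5.59, 4.14, 5.26):
    print(f"z = {z:.2f}  P = {two_sided_normal_p(z):.1e}")
```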
Simulation results for analysis of Q4 with the FLT1 rare variant score, replicates 1–200
| Sampling design | Linear: mean coefficient | Linear: empirical SD | Linear: mean SE | Linear: type I error | Poisson: mean coefficient | Poisson: empirical SD | Poisson: mean SE | Poisson: type I error |
|---|---|---|---|---|---|---|---|---|
| 1 | −0.0014 | 0.042 | 0.044 | 0.040 | −0.0089 | 0.254 | 0.233 | 0.075 |
| 2 | −0.0044 | 0.060 | 0.062 | 0.040 | −0.0280 | 0.364 | 0.333 | 0.065 |
| 3 | −0.0025 | 0.083 | 0.086 | 0.040 | −0.0076 | 0.270 | 0.246 | 0.060 |
| 4 | −0.0031 | 0.079 | 0.083 | 0.030 | −0.0127 | 0.274 | 0.252 | 0.050 |
| 5.1 | −0.0007 | 0.045 | 0.046 | 0.045 | −0.0025 | 0.304 | 0.281 | 0.070 |
| 5.2 | −0.0004 | 0.051 | 0.046 | 0.105 | −0.0142 | 0.274 | 0.251 | 0.070 |
SD, standard deviation. SE, standard error.
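With only 200 replicates, an empirical type I error rate carries noticeable Monte Carlo error. The sketch below computes the approximate range in which an estimated rate should fall if the true rate equals the nominal level, then flags the linear regression rates from the table that fall outside it. The nominal level of 0.05 is an assumption, since the extract does not state it, and the normal approximation to the binomial is used for simplicity.

```python
import math

def monte_carlo_band(alpha=0.05, n_rep=200, z=1.96):
    """Approximate 95% band for an empirical rejection rate under the null."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_rep)
    return alpha - z * se, alpha + z * se

# Linear regression type I error rates from the table, keyed by design.
observed = {"1": 0.040, "2": 0.040, "3": 0.040,
            "4": 0.030, "5.1": 0.045, "5.2": 0.105}

low, high = monte_carlo_band()
flagged = [design for design, rate in observed.items()
           if not low <= rate <= high]
print(f"band: ({low:.3f}, {high:.3f}); outside band: {flagged}")
```

Under these assumptions only design 5.2 falls outside the band, which is also the one row in which the mean SE (0.046) is smaller than the empirical SD (0.051).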
Figure 1. Linear (upper panels) and Poisson (lower panels) regression test statistics in Q1 rare variant analysis. MRT, mean ratio of test statistics.
Figure 2. Linear (upper panels) and Poisson (lower panels) regression coefficients in Q1 rare variant analysis.
Relative efficiencies (%) of regression coefficients for analysis of Q1 and Q4 with FLT1, replicates 1–200
| Sampling design | Rare variant, linear: Q4 | Rare variant, linear: Q1 | Rare variant, Poisson: Q4 | Rare variant, Poisson: Q1 | Common variant, linear: Q4 | Common variant, linear: Q1 | Common variant, Poisson: Q4 | Common variant, Poisson: Q1 |
|---|---|---|---|---|---|---|---|---|
| 2 | 50 | 41 | 49 | 43 | 46 | 46 | 45 | 45 |
| 3 | 26 | 36 | 89 | 70 | 26 | 55 | 87 | 50 |
| 4 | 29 | 34 | 86 | 68 | 26 | 50 | 78 | 53 |
| 5.1 | 88 | 62 | 70 | 53 | 88 | 57 | 72 | 46 |
| 5.2 | 70 | 44 | 86 | 76 | 56 | 31 | 84 | 52 |
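The extract does not define relative efficiency. One common definition, assumed here, is the ratio of the coefficient's variance in the entire cohort (design 1) to its variance under the subsampling design, expressed as a percentage. The sketch below applies that definition to the empirical SDs reported for Q4 with the rare variant score and compares the results with the two Q4 rare variant columns of the table above. Because the SDs are rounded to three decimals, agreement to within a few percentage points is the most that can be expected.

```python
def relative_efficiency(sd_full, sd_design):
    """Variance ratio (full cohort over design), as a percentage."""
    return 100.0 * (sd_full / sd_design) ** 2

# Empirical SDs for Q4 with the rare variant score, keyed by design.
sd_first = {"1": 0.042, "2": 0.060, "3": 0.083,
            "4": 0.079, "5.1": 0.045, "5.2": 0.051}   # linear regression
sd_second = {"1": 0.254, "2": 0.364, "3": 0.270,
             "4": 0.274, "5.1": 0.304, "5.2": 0.274}  # second regression model

# Q4 rare variant columns of the relative efficiency table.
table_first = {"2": 50, "3": 26, "4": 29, "5.1": 88, "5.2": 70}
table_second = {"2": 49, "3": 89, "4": 86, "5.1": 70, "5.2": 86}

for label, sds, table in (("first", sd_first, table_first),
                          ("second", sd_second, table_second)):
    for design, reported in table.items():
        computed = relative_efficiency(sds["1"], sds[design])
        print(f"{label:6s} design {design:3s} "
              f"computed {computed:5.1f}  reported {reported}")
```

The close agreement for the second model indicates that the second rare variant column reports the same model as the simulation table of empirical SDs.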