Abstract
Methods that can evaluate the aggregate effects of rare and common variants are limited. Therefore, we applied a two-stage approach to evaluate aggregate gene effects in the 1000 Genomes Project data, which contain 24,487 single-nucleotide polymorphisms (SNPs) in 697 unrelated individuals from 7 populations. In stage 1, we identified potentially interesting genes (PIGs) as those having at least one SNP meeting Bonferroni correction using univariate, multiple regression models. In stage 2, we evaluated aggregate PIG effects on the trait Q1 by modeling each gene as a latent construct, defined by multiple common and rare variants, using the multivariate statistical framework of structural equation modeling (SEM). In stage 1, we found that PIGs varied markedly between a randomly selected replicate (replicate 137) and 100 other replicates, with the exception of FLT1. Also in stage 1, collapsing rare variants decreased false positives but increased false negatives. In stage 2, we developed a good-fitting SEM model that included all nine genes simulated to affect Q1 (FLT1, KDR, ARNT, ELAV4, FLT4, HIF1A, HIF3A, VEGFA, VEGFC) and found that FLT1 had the largest effect on Q1 (βstd = 0.33 ± 0.05). Using replicate 137 estimates as population values, we found that the mean relative bias in the parameters (loadings, paths, residuals) and their standard errors across 100 replicates was, on average, less than 5%. Our latent variable SEM approach provides a viable framework for modeling aggregate effects of rare and common variants in multiple genes, but more elegant methods are needed in stage 1 to minimize type I and type II error.
Year: 2011 PMID: 22373404 PMCID: PMC3287884 DOI: 10.1186/1753-6561-5-S9-S47
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
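The abstract reports that collapsing rare variants in stage 1 changed the balance of false positives and false negatives, but it does not say how the collapsing was done. The following is a minimal sketch of one common scheme (a per-gene carrier indicator), assuming genotypes are coded as minor-allele counts (0, 1, or 2). The function names and the 1% frequency cutoff are hypothetical and are not taken from the source.

```python
def minor_allele_freq(genotypes):
    """Minor allele frequency of one SNP from per-individual allele counts (0/1/2)."""
    return sum(genotypes) / (2 * len(genotypes))


def collapse_rare(snp_genotypes, maf_cutoff=0.01):
    """Collapse the rare SNPs of one gene into a single carrier indicator.

    snp_genotypes: dict mapping SNP id -> list of minor-allele counts,
                   one entry per individual (same order for every SNP).
    maf_cutoff:    hypothetical rarity threshold (assumption, not from the source).

    Returns (rare_ids, carrier), where carrier[i] is 1 if individual i carries
    at least one minor allele at any rare SNP, and 0 otherwise.
    """
    if not snp_genotypes:
        return [], []
    # A SNP counts as rare if it is polymorphic and below the cutoff.
    rare_ids = [snp for snp, g in snp_genotypes.items()
                if 0 < minor_allele_freq(g) < maf_cutoff]
    n_individuals = len(next(iter(snp_genotypes.values())))
    carrier = [0] * n_individuals
    for snp in rare_ids:
        for i, count in enumerate(snp_genotypes[snp]):
            if count > 0:
                carrier[i] = 1
    return rare_ids, carrier


# Toy data for 100 individuals: two rare SNPs and one common SNP.
toy_gene = {
    "snp_a": [1 if i == 3 else 0 for i in range(100)],   # MAF 0.005
    "snp_b": [1 if i == 7 else 0 for i in range(100)],   # MAF 0.005
    "snp_c": [1 if i < 20 else 0 for i in range(100)],   # MAF 0.10
}
rare_ids, carrier = collapse_rare(toy_gene)
print(rare_ids, sum(carrier))
```

The collapsed indicator would then replace the individual rare SNPs as a single predictor in the stage 1 regression, which reduces the number of tests at the cost of diluting any single strong rare-variant signal.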
Figure 1. Modeling the aggregate effects of common and rare variants in FLT1. Adding rare variants (B) to the FLT1 latent construct composed of common variants (A) improved the model fit (A: CFI = 0.90, RMSEA = 0.03, SRMR = 0.08; vs. B: CFI = 0.96, RMSEA = 0.02, SRMR = 0.05) and the variance explained in Q1 (R²: 0.36 ± 0.04 (B) vs. 0.30 ± 0.04 (A)). Standardized parameters and standard errors are shown above the arrows. Yellow, rare variant; blue, population substructure (PopStr; principal component, PC); red, gene; green, trait. * p ≤ 0.05; ** p ≤ 0.001. Residuals not shown for clarity.
Top potentially interesting genes (PIGs) with SNPs associated with Q1 in replicate 137 of GAW17 exon sequencing data (unrelated individuals)
| Gene | Chromosome | Total number of SNPs | Distance (bp) | Crude model: Number of SNPs with | Crude model: Number of SNPs with | Crude model: Highest | Adjusted model 1ᵃ: Number of SNPs with | Adjusted model 1ᵃ: Number of SNPs with | Adjusted model 1ᵃ: Highest | Adjusted model 2ᵇ: Number of SNPs with | Adjusted model 2ᵇ: Number of SNPs with | Adjusted model 2ᵇ: Highest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FLT1 | 13 | 35 | 16,389 | 2 (C13S522 n, C13S523 n) | 10 (8 n, 2 s) | 3.41 × 10⁻¹⁸ (C13S523) | 3 (C13S522 n, C13S523 n, C13S524 n) | 11 (7 n, 4 s) | 5.64 × 10⁻²¹ (C13S423) | 2 (C13S522 n, C13S523 n) | 11 (7 n, 4 s) | 2.10 × 10⁻¹¹ (C13S423) |
| | 11 | 9 | 15,457 | 1 (C11S3071 n) | 1 (1 n) | 2.37 × 10⁻⁷ (C11S3071) | 1 (C11S3071 n) | 2 (1 n, 1 s) | 1.33 × 10⁻⁷ (C11S3071) | 1 (C11S3071 n) | 1 (1 n) | 8.68 × 10⁻⁷ (C11S3071) |
| | 5 | 22 | 55,401 | 1 (C5S4371 n) | 2 (1 n, 1 s) | 3.99 × 10⁻⁵ (C5S4371) | 1 (C5S4371 n) | 2 (1 n, 1 s) | 3.45 × 10⁻⁷ (C5S4371) | 0 | 2 (1 n, 1 s) | 4.56 × 10⁻⁵ (C5S4371) |
| | 15 | 163 | 223,193 | 1 (C15S4393 n) | 10 (8 n, 2 s) | 1.58 × 10⁻⁶ (C15S4393) | 1 (C15S4393 n) | 11 (8 n, 3 s) | 1.37 × 10⁻⁶ (C15S4393) | 0 | 14 (10 n, 4 s) | 3.89 × 10⁻⁵ (C15S4393) |
| | 1 | 16 | 495 | 3 (C1S11528 n, C1S11529 s, C1S11541 n) | 11 (8 n, 3 s) | 8.06 × 10⁻⁹ (C1S11541) | 3 (C1S11528 n, C1S11529 s, C1S11541 n) | 11 (8 n, 3 s) | 2.80 × 10⁻¹⁰ (C1S11541) | 0 | 6 (3 n, 3 s) | 3.28 × 10⁻⁴ (C1S11541) |
a Adjusted for Age, Sex, Smoking, Pop1.
b Adjusted for Age, Sex, Smoking, Pop1 and top 12 principal components (PCs).
c n=nonsynonymous SNP; s=synonymous SNP
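Stage 1 of the abstract flags a gene as a PIG when at least one of its SNPs meets a Bonferroni-corrected threshold. A minimal sketch of that screening rule follows. The function name, SNP ids, gene labels, and p-values are hypothetical, and using all 24,487 SNPs as the number of tests is an assumption: the source does not state which family of tests the correction was applied to.

```python
def flag_genes(snp_pvalues, snp_to_gene, alpha=0.05, n_tests=None):
    """Return {gene: [SNP ids]} for genes with at least one SNP whose
    p-value is at or below the Bonferroni threshold alpha / n_tests.

    n_tests defaults to the number of SNPs supplied; pass it explicitly
    when the correction family is larger than the SNPs being screened.
    """
    if n_tests is None:
        n_tests = len(snp_pvalues)
    threshold = alpha / n_tests
    flagged = {}
    for snp, p in snp_pvalues.items():
        if p <= threshold:
            flagged.setdefault(snp_to_gene[snp], []).append(snp)
    return flagged


# Hypothetical p-values for three SNPs in two genes.
pvalues = {"snp1": 1e-8, "snp2": 0.5, "snp3": 1e-4}
genes = {"snp1": "GENE_A", "snp2": "GENE_A", "snp3": "GENE_B"}

# Assumed correction family: all 24,487 SNPs mentioned in the abstract,
# giving a threshold of roughly 2.04e-6.
print(0.05 / 24487)
print(flag_genes(pvalues, genes, n_tests=24487))
```

Under this assumed threshold only GENE_A is flagged; a looser correction family (for example, the SNPs within one gene) would flag more genes, which is one way the choice of family affects the false-positive counts.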
Select FLT1 SNPs in GAW17 exon sequencing data (replicate 137; unrelated individuals) and associations with Q1
| SNP | Minor allele frequency | Crude model: β (SE) | Crude model: p | Adjusted model 1ᵃ: β (SE) | Adjusted model 1ᵃ: p | Adjusted model 2ᵇ: β (SE) | Adjusted model 2ᵇ: p | Adjusted model 3ᶜ: β (SE) | Adjusted model 3ᶜ: p | Adjusted model 4ᵈ: β (SE) | Adjusted model 4ᵈ: p |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.0014 | 1.18 (0.71) | 0.0954 | 0.81 (0.67) | 0.2227 | 0.78 (0.67) | 0.2438 | 0.96 (0.65) | 0.1375 | 0.96 (0.65) | 0.1385 |
| | 0.0007 | 0.30 (1.00) | 0.7676 | 0.13 (0.95) | 0.8947 | 0.10 (0.95) | 0.9145 | −0.02 (0.92) | 0.9819 | −0.07 (0.92) | 0.9383 |
| | 0.0172 | 0.80 (0.21) | 1.06 × 10⁻⁴ | 0.66 (0.19) | 6.94 × 10⁻⁴ | 0.69 (0.20) | 4.72 × 10⁻⁴ | 0.62 (0.22) | 5.39 × 10⁻³ | 0.59 (0.22) | 8.07 × 10⁻³ |
| C13S458 | 0.0014 | 1.84 (0.71) | 9.38 × 10⁻³ | 1.32 (0.67) | 0.0490 | 1.29 (0.67) | 0.0552 | 1.30 (0.65) | 0.0465 | 1.32 (0.65) | 0.0432 |
| | 0.0007 | 0.22 (1.00) | 0.8263 | 0.39 (0.94) | 0.6780 | 0.42 (0.94) | 0.6546 | 0.29 (0.91) | 0.7500 | 0.37 (0.91) | 0.6821 |
| | 0.0007 | −0.14 (1.00) | 0.8926 | 0.25 (0.94) | 0.7933 | 0.23 (0.95) | 0.8094 | 0.34 (0.91) | 0.7101 | 0.41 (0.92) | 0.6583 |
| | 0.0007 | 1.03 (1.00) | 0.3052 | 1.05 (0.94) | 0.2660 | 1.08 (0.94) | 0.2507 | 1.43 (0.91) | 0.1173 | 1.43 (0.91) | 0.1172 |
| | 0.0280 | 1.14 (0.16) | 2.0 × 10⁻¹² | 1.12 (0.15) | 1.7 × 10⁻¹³ | 1.12 (0.15) | 2.1 × 10⁻¹³ | 0.96 (0.16) | 1.12 × 10⁻⁹ | 0.98 (0.16) | 9.3 × 10⁻¹⁰ |
| | 0.0667 | 0.94 (0.11) | 3.4 × 10⁻¹⁸ | 0.96 (0.10) | 3.3 × 10⁻²¹ | 0.96 (0.10) | 5.6 × 10⁻²¹ | 0.81 (0.12) | 1.9 × 10⁻¹¹ | 0.81 (0.12) | 2.1 × 10⁻¹¹ |
| | 0.0043 | 1.68 (0.41) | 3.7 × 10⁻⁵ | 1.98 (0.38) | 2.66 × 10⁻⁷ | 1.97 (0.38) | 3.17 × 10⁻⁷ | 1.58 (0.38) | 3.12 × 10⁻⁵ | 1.59 (0.38) | 3.04 × 10⁻⁵ |
| | 0.0007 | 0.08 (1.00) | 0.9330 | 0.54 (0.94) | 0.5652 | 0.57 (0.95) | 0.5457 | 0.70 (0.91) | 0.4443 | 0.69 (0.91) | 0.4501 |
| C13S557 | 0.0072 | 0.92 (0.32) | 3.63 × 10⁻³ | 0.90 (0.30) | 2.78 × 10⁻³ | 0.89 (0.30) | 2.97 × 10⁻³ | 0.69 (0.29) | 0.0186 | 0.70 (0.29) | 0.0171 |
| | 0.0007 | 0.43 (1.00) | 0.6694 | 0.27 (0.94) | 0.7758 | 0.29 (0.95) | 0.7564 | 0.13 (0.92) | 0.8837 | 0.15 (0.91) | 0.8673 |
a Adjusted for Age and Smoking.
b Adjusted for Age, Smoking, Sex, and Pop1.
c Adjusted for Age, Smoking, Sex, Pop1 and top 10 principal components (PCs).
d Adjusted for Age, Smoking, Sex, Pop1 and top 12 PCs.
Figure 2. Modeling the aggregate effects of common and rare variants in multiple potentially interesting genes (without knowledge of the GAW17 answers) using latent variable structural equation modeling. Model of the associations between 7 putative genes (26 SNPs) and Q1 (Q1 R² = 0.36, CFI = 0.90, RMSEA = 0.05, SRMR = 0.07). * p ≤ 0.05; ** p ≤ 0.001. Residuals not shown for clarity.
True-positive (TP), false-positive (FP), and false-negative (FN) genes for Q1 over 100 replicates (99–136 and 138–200) in GAW17 exon sequencing data (unrelated individuals)
| Statistic | Adjusted model 1ᵃ: TP | Adjusted model 1ᵃ: FP | Adjusted model 1ᵃ: FN | Adjusted model 2ᵇ: TP | Adjusted model 2ᵇ: FP | Adjusted model 2ᵇ: FN | Adjusted model 3ᶜ: TP | Adjusted model 3ᶜ: FP | Adjusted model 3ᶜ: FN | Adjusted model 4ᵈ: TP | Adjusted model 4ᵈ: FP | Adjusted model 4ᵈ: FN | Adjusted model 5ᵉ: TP | Adjusted model 5ᵉ: FP | Adjusted model 5ᵉ: FN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 1.80 | 43.48 | 7.20 | 1.23 | 5.69 | 7.77 | 1.23 | 5.43 | 7.77 | 1.13 | 1.25 | 7.87 | 1.13 | 1.20 | 7.87 |
| Standard deviation | 0.72 | 26.78 | 0.72 | 0.42 | 11.38 | 0.42 | 0.42 | 11.20 | 0.42 | 0.34 | 2.35 | 0.34 | 0.34 | 2.26 | 0.34 |
| Range | 1–4 | 2–122 | 5–8 | 1–2 | 0–43 | 7–8 | 1–2 | 0–42 | 7–8 | 1–2 | 0–14 | 7–8 | 1–2 | 0–14 | 7–8 |
a Adjusted for Age, Smoking, Sex, and population (Pop1).
b Adjusted for Age, Smoking, Sex, Pop1 and top 10 PCs.
c Adjusted for Age, Smoking, Sex, Pop1 and top 12 PCs.
d Rare variants collapsed, adjusted for Age, Smoking, Sex, Pop1 and top 10 PCs.
e Rare variants collapsed, adjusted for Age, Smoking, Sex, Pop1 and top 12 PCs.
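The true-positive, false-positive, and false-negative counts in the table above can be obtained by comparing each replicate's flagged genes with the nine genes that the abstract lists as simulated to affect Q1. A minimal sketch follows; the function names and the flagged sets are hypothetical, and the use of the sample standard deviation is an assumption because the source does not say which form was reported.

```python
from statistics import mean, stdev

# The nine genes simulated to affect Q1, as listed in the abstract.
TRUE_Q1_GENES = frozenset({
    "FLT1", "KDR", "ARNT", "ELAV4", "FLT4",
    "HIF1A", "HIF3A", "VEGFA", "VEGFC",
})


def classify(flagged, truth=TRUE_Q1_GENES):
    """Return (TP, FP, FN) gene counts for one replicate."""
    flagged = set(flagged)
    tp = len(flagged & truth)   # flagged and truly causal
    fp = len(flagged - truth)   # flagged but not causal
    fn = len(truth - flagged)   # causal but missed
    return tp, fp, fn


def summarize(counts):
    """Mean, sample standard deviation, and range of a list of counts."""
    return {
        "mean": mean(counts),
        "sd": stdev(counts) if len(counts) > 1 else 0.0,
        "range": (min(counts), max(counts)),
    }


# Hypothetical flagged gene sets for three replicates.
replicates = [
    {"FLT1", "GENE_X"},
    {"FLT1", "KDR"},
    {"FLT1", "GENE_X", "GENE_Y", "GENE_Z"},
]
results = [classify(r) for r in replicates]
print(results)
print(summarize([fp for _, fp, _ in results]))
```

Because every causal gene is either detected or missed, TP + FN equals 9 in each replicate, which matches the table (for example, mean TP 1.80 and mean FN 7.20 under adjusted model 1).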
Figure 3. Modeling the aggregate effects of common and rare variants in multiple genes (with knowledge of the answers) using latent variable structural equation modeling. Model of the associations between 9 genes (19 SNPs) simulated to affect Q1 (Q1 R² = 0.42, CFI = 0.90, RMSEA = 0.04, SRMR = 0.03). * p < 0.10; ** p ≤ 0.05; *** p ≤ 0.01. Residuals and paths from population structure not shown for clarity.
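The abstract summarizes the stability of this model's parameters as a mean relative bias across 100 replicates, with the replicate 137 estimates treated as population values. A minimal sketch of that calculation for a single parameter follows, assuming the usual definition (mean estimate minus population value, divided by the population value). The function name and the replicate estimates are hypothetical; only the population value 0.33 (the standardized FLT1 path) comes from the abstract, and the source may have computed the bias of the standard errors differently.

```python
from statistics import mean


def relative_bias(estimates, population_value):
    """Relative bias of a parameter: (mean estimate - population) / population."""
    if population_value == 0:
        raise ValueError("relative bias is undefined for a population value of 0")
    return (mean(estimates) - population_value) / population_value


# Population value from the abstract (standardized FLT1 path in replicate 137).
population_beta = 0.33
# Hypothetical estimates of the same path from four other replicates.
replicate_betas = [0.30, 0.36, 0.35, 0.33]

bias = relative_bias(replicate_betas, population_beta)
print(f"relative bias = {bias:.1%}")
```

Repeating this for every loading, path, and residual and averaging the results would give a summary figure comparable in spirit to the "less than 5%" reported in the abstract.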