Marika Kaakinen, Reedik Mägi, Krista Fischer, Jani Heikkinen, Marjo-Riitta Järvelin, Andrew P Morris, Inga Prokopenko.
Abstract
BACKGROUND: Genome-wide association studies have enabled identification of thousands of loci for hundreds of traits. Yet, for most human traits a substantial part of the estimated heritability is unexplained. This and recent advances in technology to produce high-dimensional data cost-effectively have led to method development beyond standard common variant analysis, including single-phenotype rare variant and multi-phenotype common variant analysis, with the latter increasing power for locus discovery and providing suggestions of pleiotropic effects. However, there are currently no optimal methods and tools for the combined analysis of rare variants and multiple phenotypes.
Keywords: High-dimensional data; Multi-phenotype analysis; Rare variant analysis
Year: 2017 PMID: 28209135 PMCID: PMC5311849 DOI: 10.1186/s12859-017-1530-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. Workflow of a MARV run including required files, commands and resulting output files
Fig. 2. Examples of the required input file formats for MARV and the resulting output files
Results for loci reaching genome-wide significance in the multi-phenotype rare variant analysis of NFBC1966 (N = 4,721). Regression coefficients with their standard errors (SE) are reported, followed by the P-value and the Bayesian Information Criterion (BIC) for the analysed model. TG, triglycerides; ln(FI), natural logarithm transformed fasting insulin; WHR, waist-to-hip ratio
| Model | β (SE) | P; BIC | β (SE) | P; BIC |
|---|---|---|---|---|
| TG + ln(FI) + WHR, full model ᵃ | | 3.32 × 10⁻⁸; −19877.3 | | 6.3 × 10⁻⁸; −25069.6 |
| TG | 0.011 (0.002) | - | 0.007 (0.001) | - |
| ln(FI) | −0.010 (0.004) | - | −0.008 (0.002) | - |
| WHR | 0.027 (0.019) | - | 0.010 (0.011) | - |
| TG + ln(FI) | | 2.00 × 10⁻⁸; −19883.5 | | 1.8 × 10⁻⁹; −25077.3 |
| TG | 0.011 (0.002) | - | 0.007 (0.001) | - |
| ln(FI) | −0.010 (0.004) | - | −0.007 (0.002) | - |
| TG + WHR | | 3.34 × 10⁻⁷; −19877.9 | | 4.1 × 10⁻⁷; −25066.5 |
| TG | 0.009 (0.002) | - | 0.005 (0.001) | - |
| WHR | 0.020 (0.019) | - | 0.005 (0.001) | - |
| ln(FI) + WHR | | 0.08; −19853.1 | | 0.10; −25427.6 |
| ln(FI) | −0.003 (0.004) | - | −0.003 (0.002) | - |
| WHR | 0.041 (0.019) | - | 0.019 (0.011) | - |
| Univariate ᵇ | | | | |
| TG | 0.009 (0.002) | 9.15 × 10⁻⁸; −19885.1 | 0.005 (0.001) | 6.5 × 10⁻⁸; −25074.7 |
| ln(FI) | −0.002 (0.003) | 0.62; −19856.8 | −0.002 (0.002) | 0.27; −25046.7 |
| WHR | 0.038 (0.019) | 0.04; −19860.7 | 0.016 (0.011) | 0.16; −25047.5 |
ᵃ For a genome-wide joint analysis, the level of significance is P < 1.67 × 10⁻⁶ after Bonferroni correction for 30,000 genes
ᵇ For univariate analysis, the level of significance is P < 5.56 × 10⁻⁷ after Bonferroni correction for 30,000 genes and three phenotypes
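The two significance levels in the footnotes can be reproduced with a short calculation. The sketch below assumes a family-wise error rate of 0.05, which the footnotes do not state explicitly but which matches both reported thresholds; the function name `bonferroni_threshold` is illustrative and not part of MARV.

```python
# Assumption: family-wise error rate of 0.05 (not stated in the footnotes,
# but consistent with both reported thresholds).
ALPHA = 0.05


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Return the per-test significance level after Bonferroni correction."""
    return alpha / n_tests


n_genes = 30_000      # number of genes tested, from the footnotes
n_phenotypes = 3      # TG, ln(FI) and WHR

# Joint (multi-phenotype) analysis: one test per gene.
joint = bonferroni_threshold(n_genes)

# Univariate analysis: one test per gene and per phenotype.
univariate = bonferroni_threshold(n_genes * n_phenotypes)

print(f"joint:      P < {joint:.2e}")       # joint:      P < 1.67e-06
print(f"univariate: P < {univariate:.2e}")  # univariate: P < 5.56e-07
```

The univariate threshold is three times stricter because each gene is tested once per phenotype.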
Fig. 3. QQ-plot of MARV analysis results on triglycerides, fasting insulin and waist-to-hip ratio in the NFBC1966
Fig. 4. Manhattan plot of MARV analysis results on triglycerides, fasting insulin and waist-to-hip ratio in the NFBC1966. Genes reaching genome-wide significance (P < 1.67 × 10⁻⁶) are annotated
Computational time and peak memory usage of MARV by varying sample size, chromosomal size and number of phenotypes
| Number of phenotypes (number of fitted models) | Chr 1 (249 Mbp) | Chr 22 (35 Mbp) |
|---|---|---|
| | h:min:s (memory) | h:min:s (memory) |
| 2 (3) | 04:06:54 (215 MB) | 00:38:04 (260 MB) |
| 4 (15) | 03:51:23 (215 MB) | 00:38:37 (260 MB) |
| 8 (63) | 05:07:11 (215 MB) | 00:55:58 (260 MB) |
| | h:min:s (memory) | h:min:s (memory) |
| 2 (3) | 14:47:34 (780 MB) | 02:26:00 (500 MB) |
| 4 (15) | 13:40:11 (780 MB) | 02:26:10 (600 MB) |
| 8 (63) | 17:26:08 (780 MB) | 03:03:00 (600 MB) |