Sungyoung Lee, Sungkyoung Choi, Dandi Qiao, Michael Cho, Edwin K Silverman, Taesung Park, Sungho Won.
Abstract
BACKGROUND: Mendelian transmission produces phenotypic and genetic relatedness between family members, giving family-based analytical methods an important role in genetic epidemiological studies, from heritability estimation to genetic association analysis. With advances in genotyping technologies, whole-genome sequence data can be used in genetic epidemiological studies, and family-based samples may become more useful for detecting de novo mutations. However, genetic analyses of family-based samples usually suffer from the complexity of the computational and statistical algorithms, and certain family designs, such as those incorporating data from extended families, have rarely been used.
Keywords: Family-based design; Genome-wide association analyses; Multi-threaded analyses; Next generation sequencing; Related samples
Year: 2018 PMID: 29697360 PMCID: PMC5918457 DOI: 10.1186/s12920-018-0345-y
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 List of tasks supported by WISARD. famUniq (family-unique variants), popUniq (population-unique variants), fastEpi (fast epistasis method), FQLS (family QLS method), inbCoef (inbreeding coefficients), MFQLS (multivariate FQLS)
A list of association tests supported by WISARD
| Variant type | Phenotype | Population-based samples | Family-based samples, or population-based samples with population substructure |
|---|---|---|---|
| Common | Binary | Cochran-Armitage trend test | TDT/SDT (family-based samples only) |
| Common | Continuous | Linear regression | Score test for linear mixed model |
| Rare | Binary | CMC | PEDCMC |
| Rare | Continuous | VT | FARVAT |
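The table only names the tests. For orientation, here is a minimal sketch of two of them in their standard textbook form: the Cochran-Armitage trend test computed from case and control genotype counts, and the transmission disequilibrium test (TDT) computed from the transmissions of heterozygous parents. It is not WISARD's implementation; the function names, the additive scores (0, 1, 2) and the 1-df chi-square p-values are assumptions made for this example.

```python
import math


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a 1-df chi-square: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


def cochran_armitage_trend(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (hypothetical helper, textbook formula).

    case_counts / control_counts: counts of individuals carrying 0, 1, 2
    copies of the coded allele. The additive weights are an assumption.
    Returns (chi-square statistic, p-value).
    """
    cases = sum(case_counts)
    controls = sum(control_counts)
    n = cases + controls
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    # T = sum_i t_i * (r_i * S - s_i * R)
    t_stat = sum(t * (r * controls - s * cases)
                 for t, r, s in zip(weights, case_counts, control_counts))
    # Null variance (R * S / N) * [sum t_i^2 n_i (N - n_i) - 2 sum_{i<j} t_i t_j n_i n_j],
    # where the bracket simplifies to N * sum t_i^2 n_i - (sum t_i n_i)^2
    s1 = sum(t * m for t, m in zip(weights, totals))
    s2 = sum(t * t * m for t, m in zip(weights, totals))
    var_t = cases * controls / n * (n * s2 - s1 * s1)
    chi2 = t_stat ** 2 / var_t
    return chi2, chi2_1df_pvalue(chi2)


def tdt(transmitted, untransmitted):
    """Classic TDT (hypothetical helper): (b - c)^2 / (b + c).

    b = heterozygous parents transmitting the allele to an affected child,
    c = heterozygous parents not transmitting it.
    """
    chi2 = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    return chi2, chi2_1df_pvalue(chi2)
```

For example, `cochran_armitage_trend((10, 20, 30), (30, 20, 10))` gives a statistic of 20.0, and `tdt(60, 40)` gives 4.0 (p ≈ 0.046).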
Comparison of functions available in WISARD and existing toolsets (O, supported; X, not supported)
| Category | WISARD | PLINK2 | GCTA | FREGAT | Rvtests |
|---|---|---|---|---|---|
| Input format | |||||
| PED | O | O | X | X | X |
| Binary PED | O | O | O | O | X |
| VCF | O | O | X | O | O |
| Binary VCF | O | O | X | X | X |
| Dosage | O | O | O | X | X |
| Others | O | O | X | O | X |
| Random dataset | O | O | X | X | X |
| Recode dataset | |||||
| PED | O | O | X | X | X |
| Binary PED | O | O | O | X | X |
| VCF | O | O | X | X | X |
| Binary VCF | O | O | X | X | X |
| Others | O | O | O | X | X |
| Data manipulation | |||||
| # of variant filters | 38 | 27 | 8 | 0 | 11 |
| # of gene filters | 4 | 0 | 0 | 0 | 2 |
| # of subject filters | 28 | 27 | 2 | 0 | 4 |
| Family-specific filters | O | X | X | X | X |
| VCF-specific filters | O | X | X | X | O |
| Data merge | O | O | X | X | X |
| Covariate filters | O | O | X | X | X |
| Data split | O | O | X | X | X |
| Distance matrix | |||||
| # of input formats | 4 | 1 | 1 | 0 | 1 |
| # of output formats | 4 | 1 | 1 | 0 | 1 |
| # of producible distances | 7 | 2 | 1 | 0 | 4 |
| Data summary | |||||
| Variant summary | O | O | X | X | X |
| Gene summary functions | O | X | X | O | X |
| Variant-level analysis of unrelated samples | |||||
| binary phenotypes | O | O | O | X | O |
| continuous phenotypes | O | O | O | X | O |
| multivariate phenotypes | O | O | O | X | O |
| Gene-level analysis of unrelated samples | |||||
| binary phenotypes | O | O | X | O | O |
| continuous phenotypes | O | O | X | O | O |
| multivariate phenotypes | O | X | X | X | O |
| X-chromosome | O | X | X | X | X |
| Variant-level analysis of related samples | |||||
| binary phenotypes | O | X | O | X | O |
| continuous phenotypes | O | X | O | X | O |
| multivariate phenotypes | O | X | O | X | O |
| Gene-level analysis of related samples | |||||
| binary phenotypes | O | X | X | O | O |
| continuous phenotypes | O | X | X | O | O |
| multivariate phenotypes | O | X | X | X | O |
| Other features | |||||
| Variant-level meta-analysis | O | O | X | X | O |
| Gene-level meta-analysis | O | X | X | X | O |
| R connectivity | O | O | X | O | X |
| Multi-thread analyses | O | O | O | O | O |
| Programming Language | C/C++ | C/C++ | C/C++ | R | C/C++ |
| # of supported platforms | 5 | 3 | 1 | 3 | 1 |
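The comparison table counts the distance (relatedness) matrices each toolset can produce, and Fig. 2 benchmarks two of them, the genomic relationship matrix (GRM) and identity-by-state (IBS). A minimal sketch of common textbook definitions follows, assuming genotypes coded as 0/1/2 allele counts with no missing calls. WISARD's estimators may differ in detail (for example, some tools use a separate estimator for the GRM diagonal), and the function names are hypothetical.

```python
import math


def ibs_similarity(g1, g2):
    """Mean identity-by-state between two 0/1/2 genotype vectors (1.0 = identical)."""
    shared = [2 - abs(a - b) for a, b in zip(g1, g2)]  # alleles shared per variant
    return sum(shared) / (2 * len(shared))


def genomic_relationship_matrix(genotypes):
    """Standardized GRM: average over variants of z_a * z_b.

    genotypes: one list of 0/1/2 allele counts per individual.
    The allele frequency p is estimated from the sample itself (assumption).
    """
    n = len(genotypes)
    standardized = []
    for i in range(len(genotypes[0])):
        column = [g[i] for g in genotypes]
        p = sum(column) / (2 * n)
        if p == 0.0 or p == 1.0:  # monomorphic: variance 2p(1-p) is zero
            continue
        sd = math.sqrt(2 * p * (1 - p))
        standardized.append([(x - 2 * p) / sd for x in column])
    if not standardized:
        raise ValueError("no polymorphic variants")
    m = len(standardized)
    return [[sum(z[a] * z[b] for z in standardized) / m for b in range(n)]
            for a in range(n)]
```

The same loop over variants runs once per pair of individuals, which is why these matrices dominate run time on large samples and benefit from the multi-threading compared in Fig. 2.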
Fig. 2 Comparison of computational time. Computational times were compared using the GAW18 simulation data. In each plot, bars show execution time in seconds (left y-axis), with confidence intervals calculated from five runs. Red lines (right y-axis) show the relative ratio between each existing toolset and WISARD; ratios larger than 1 indicate that WISARD is faster, and the horizontal blue dashed line marks a ratio of 1. Regression and Fisher’s exact test in WISARD were compared with R. In the plots for GRM and IBS, dashed, dotted and dash-dotted red lines indicate relative ratios when WISARD uses 2, 4 and 8 threads, respectively, compared with Rvtests using the same number of threads.
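By the caption's description, the relative ratio is a ratio of execution times in which values above 1 favor WISARD, i.e. the other toolset's time divided by WISARD's. A small sketch of that arithmetic follows, together with one plausible way to form a 95% interval from five runs: a Student t-interval with 4 degrees of freedom. The caption does not say how its intervals were computed, so the interval method and the function names are assumptions.

```python
import math
from statistics import mean, stdev

# Two-sided 95% Student t quantile for 4 degrees of freedom (five runs).
T_975_DF4 = 2.7764451


def ci95_five_runs(times):
    """Mean and 95% t-interval for exactly five timing runs (assumed method)."""
    if len(times) != 5:
        raise ValueError("this sketch hard-codes the t quantile for five runs")
    m = mean(times)
    half = T_975_DF4 * stdev(times) / math.sqrt(len(times))
    return m, m - half, m + half


def relative_ratio(other_times, wisard_times):
    """Mean time of another toolset divided by WISARD's; > 1 means WISARD is faster."""
    return mean(other_times) / mean(wisard_times)
```

For example, a toolset averaging 12 s against WISARD's 3 s gives `relative_ratio([12.0] * 5, [3.0] * 5) == 4.0`, a point four units above the blue dashed line.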
Estimated type-1 error rates
| Method (toolset) | α = 0.1 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| cFARVAT-s (WISARD) | 0.093 (±0.024) | 0.047 (±0.016) | 0.01 (±0.006) |
| cFARVAT-b (WISARD) | 0.096 (±0.023) | 0.048 (±0.017) | 0.01 (±0.007) |
| cFARVAT-o (WISARD) | 0.093 (±0.022) | 0.048 (±0.016) | 0.011 (±0.007) |
| famVT (WISARD) | 0.081 (±0.02) | 0.043 (±0.016) | 0.012 (±0.008) |
| famSKAT | 0.097 (±0.029) | 0.05 (±0.02) | 0.011 (±0.008) |
| MONSTER | 0.128 (±0.021) | 0.072 (±0.019) | 0.022 (±0.006) |
| famBT (FREGAT) | 0.1 (±0.03) | 0.048 (±0.02) | 0.01 (±0.008) |
| famFLM (FREGAT) | 0.104 (±0.032) | 0.051 (±0.023) | 0.016 (±0.014) |
| FFBSKAT (FREGAT) | 0.101 (±0.032) | 0.058 (±0.022) | 0.012 (±0.009) |
| MLR (FREGAT) | 0.104 (±0.032) | 0.053 (±0.023) | 0.016 (±0.013) |
Empirical type-1 error rates at several significance levels, with their standard errors in parentheses, were estimated from the GAW17 simulation data.
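An empirical type-1 error rate is the proportion of tests under the null hypothesis whose p-values fall at or below the significance level; a well-calibrated method gives a rate close to that level. A minimal sketch follows, with a hypothetical function name. The binomial standard error is one simple choice and is an assumption here, since the table's standard errors may have been computed differently (for example, across simulated replicates).

```python
import math


def empirical_type1_error(null_p_values, alpha):
    """Fraction of null-hypothesis p-values at or below alpha, with a binomial SE."""
    n = len(null_p_values)
    rate = sum(p <= alpha for p in null_p_values) / n
    se = math.sqrt(rate * (1 - rate) / n)  # assumed SE: sqrt(r(1-r)/n)
    return rate, se
```

Applied to p-values for genes that truly contain causal variants, the same proportion is the power estimate plotted in Fig. 3.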
Fig. 3 Power comparison of the proposed methods (cFARVAT-b, cFARVAT-o and famVT) and the compared methods. Statistical power was estimated with the GAW17 dataset. The x-axis shows the significance level used for power evaluation and the y-axis the estimated power. Panels (a) and (b) show results without and with principal component (PC) adjustment, respectively.
Fig. 4 QQ plots of results from WISARD and the compared methods for phenotypes in the EOCOPD dataset. The red straight line indicates y = x, and the black lines indicate its confidence interval. (a) QQ plots of results from WISARD methods. (b) QQ plots of results from the compared methods.
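A QQ plot like those in Fig. 4 compares observed p-values with those expected under the null, both on a -log10 scale; points along y = x indicate a well-calibrated test, and points rising above it indicate inflation or true signals. The sketch below computes the plotting coordinates, assuming the common (i - 0.5)/n expected quantiles (other conventions such as i/(n + 1) are also used). The confidence band drawn in Fig. 4 is not reproduced, and the function name is hypothetical.

```python
import math
import sys


def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs, most significant first."""
    n = len(p_values)
    points = []
    for i, p in enumerate(sorted(p_values), start=1):
        expected = (i - 0.5) / n             # assumed plotting position
        observed = max(p, sys.float_info.min)  # guard against log10(0)
        points.append((-math.log10(expected), -math.log10(observed)))
    return points
```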
Number of significant genes at the Bonferroni-adjusted 0.05 significance level
| Phenotype | WISARD | |||||
| PedCMC | famVT | SKAT | cFARVAT-s | cFARVAT-b | cFARVAT-o | |
| DPOF2575 | 4 |
| 0 |
|
|
|
| F2575RAT | 0 |
| 1 |
|
|
|
| FEVPRE | 1 |
| 1 |
|
|
|
| FVCPST | 4 |
| 2 |
|
|
|
| RATIO | 1 |
| 0 |
|
|
|

| Phenotype | famSKAT | MONSTER | famBT (FREGAT) | famFLM (FREGAT) | FFBSKAT (FREGAT) | MLR (FREGAT) |
|---|---|---|---|---|---|---|
| DPOF2575 | 5 | 1 | 1 | 11 | 5 | 11 |
| F2575RAT | 0 | 0 | 0 | 4 | 0 | 4 |
| FEVPRE | 2 | 0 | 0 | 11 | 3 | 11 |
| FVCPST | 9 | 0 | 0 | 34 | 9 | 34 |
| RATIO | 0 | 0 | 0 | 0 | 0 | 0 |
Rare variant association analyses of DPOF2575, F2575RAT, FEVPRE, FVCPST and RATIO were conducted with the EOCOPD data. The upper and lower tables show results from WISARD and the other toolsets, respectively. Bold numbers indicate the number of genes identified by the newly proposed methods.
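The Bonferroni-adjusted level divides the nominal 0.05 by the number of genes tested. With 20,000 genes (a hypothetical count; the number tested is not given here), a gene needs p ≤ 2.5 × 10^-6 to be counted as significant. A minimal sketch, with a hypothetical function name:

```python
def bonferroni_significant(gene_p_values, alpha=0.05):
    """Return the Bonferroni threshold alpha/m and the genes whose p-value meets it.

    gene_p_values: mapping from gene name to its gene-level p-value.
    """
    threshold = alpha / len(gene_p_values)
    hits = sorted(gene for gene, p in gene_p_values.items() if p <= threshold)
    return threshold, hits
```

Counting `len(hits)` for each phenotype and method yields entries of the kind shown in the tables above.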
Fig. 5 An example of plots and summary tables generated by Web-WISARD