Arne Schillert, Stefan Konigorski.
Abstract
For Genetic Analysis Workshop 19, 2 extensive data sets were provided, including whole genome and whole exome sequence data, gene expression data, and longitudinal blood pressure outcomes, together with nongenetic covariates. These data sets gave researchers the chance to investigate different aspects of more complex relationships within the data, and the contributions in our working group focused on statistical methods for the joint analysis of multiple phenotypes, which is part of the research field of data integration. The analysis of data from different sources poses challenges to researchers but provides the opportunity to model real-life situations more realistically.

Our 4 contributions all used the provided real data to identify genetic predictors for blood pressure. In the contributions, novel multivariate rare variant tests, copula models, structural equation models, and a sparse matrix representation variable selection approach were applied. Each of these statistical models can be used to investigate specific hypothesized relationships, which are described together with their biological assumptions.

The results showed that all methods are ready for application on a genome-wide scale and can be used or extended to include multiple omics data sets. The results provide potentially interesting genetic targets for future investigation and replication. Furthermore, all contributions demonstrated that the analysis of complex data sets could benefit from modeling correlated phenotypes jointly as well as from adding further bioinformatics information.
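To make "modeling correlated phenotypes jointly" concrete, here is a minimal sketch of a bivariate Gaussian copula that links two normal regressions on one genotype (for example, SBP and the expression of one transcript), fitted by maximum likelihood with SciPy. The normal marginals, the simulated data, and every name in the code are assumptions for illustration; this is a generic textbook construction, not the R code of the copula contribution, which is available only upon request.

```python
import numpy as np
from scipy import optimize, stats


def gaussian_copula_logdensity(u1, u2, rho):
    """Log-density of the bivariate Gaussian copula evaluated at uniforms (u1, u2)."""
    q1, q2 = stats.norm.ppf(u1), stats.norm.ppf(u2)
    return (-0.5 * np.log1p(-rho**2)
            - (rho**2 * (q1**2 + q2**2) - 2 * rho * q1 * q2) / (2 * (1 - rho**2)))


def negloglik(params, y1, y2, x):
    """Joint negative log-likelihood: two normal regressions on x linked by a Gaussian copula."""
    a1, b1, log_s1, a2, b2, log_s2, z = params
    s1, s2, rho = np.exp(log_s1), np.exp(log_s2), np.tanh(z)  # tanh keeps rho in (-1, 1)
    mu1, mu2 = a1 + b1 * x, a2 + b2 * x
    # Marginal part: each phenotype modeled separately given the genotype x
    ll = stats.norm.logpdf(y1, mu1, s1).sum() + stats.norm.logpdf(y2, mu2, s2).sum()
    # Dependence part: probability-integral transform, clipped away from 0 and 1
    eps = 1e-12
    u1 = np.clip(stats.norm.cdf(y1, mu1, s1), eps, 1 - eps)
    u2 = np.clip(stats.norm.cdf(y2, mu2, s2), eps, 1 - eps)
    ll += gaussian_copula_logdensity(u1, u2, rho).sum()
    return -ll


# Hypothetical simulated data: genotype dosage x and two standardized phenotypes
rng = np.random.default_rng(1)
n, true_rho = 2000, 0.5
x = rng.binomial(2, 0.3, size=n).astype(float)
e = rng.multivariate_normal([0.0, 0.0], [[1.0, true_rho], [true_rho, 1.0]], size=n)
y1 = 0.2 * x + e[:, 0]  # e.g. SBP (standardized)
y2 = 0.1 * x + e[:, 1]  # e.g. expression of one transcript (standardized)

fit = optimize.minimize(negloglik, np.zeros(7), args=(y1, y2, x), method="L-BFGS-B")
b1_hat, rho_hat = fit.x[1], np.tanh(fit.x[6])
```

Setting rho to 0 makes the copula term vanish, so the model reduces to two independent regressions; comparing the two fits with a likelihood ratio is one way to see what the joint model adds. With normal marginals, as here, the copula model coincides with a bivariate normal model; the copula form becomes useful when the marginals are non-normal.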
Year: 2016 PMID: 26866608 PMCID: PMC4895558 DOI: 10.1186/s12863-015-0317-6
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Fig. 1 Results of PubMed literature search. Results of a literature search on PubMed on June 25, 2015, for articles published between January 1, 1990 and June 1, 2015, containing any of “(data integration OR joint model OR joint analysis OR multiple phenotypes OR multivariate model OR multivariate statistics)” as well as any of “(omics OR NGS OR high-throughput OR xx)”, where xx was any permutation of 2 omics measures from genomics, transcriptomics, epigenomics, proteomics, and metabolomics, each parameterized by different possible keywords. Because the search window ended on June 1, 2015, the number of articles retrieved for 2015 (left panel) is multiplied by 2.4 (12 months / 5 months) to predict the number of articles published in all of 2015. The right panel shows the 20 journals with the most publications.
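The query logic in the Fig. 1 caption can be sketched as string construction. The keyword lists per omics layer are hypothetical placeholders, since the caption does not list them, and combining the two layers of a pair with AND is an assumption. Because AND is symmetric, the sketch uses the 10 unordered pairs of the 5 layers rather than all 20 ordered permutations. The last function reproduces the 2.4 scaling of the partial 2015 count.

```python
from itertools import combinations

# Hypothetical keyword lists: the caption says only that each omics layer was
# "parameterized by different possible keywords", not which ones.
OMICS_KEYWORDS = {
    "genomics": ["genomics", "genomic"],
    "transcriptomics": ["transcriptomics", "gene expression"],
    "epigenomics": ["epigenomics", "methylation"],
    "proteomics": ["proteomics", "proteome"],
    "metabolomics": ["metabolomics", "metabolome"],
}
METHOD_TERMS = ["data integration", "joint model", "joint analysis",
                "multiple phenotypes", "multivariate model", "multivariate statistics"]
DATE_RANGE = "1990/01/01:2015/06/01[dp]"  # PubMed publication-date range


def or_block(terms):
    """Join terms into one parenthesized OR clause, quoting multi-word phrases."""
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"


def build_query():
    # AND is symmetric, so each unordered pair of layers suffices (10 pairs, not 20).
    pairs = [f"({or_block(OMICS_KEYWORDS[a])} AND {or_block(OMICS_KEYWORDS[b])})"
             for a, b in combinations(OMICS_KEYWORDS, 2)]
    data_clause = "(" + " OR ".join(["omics", "NGS", "high-throughput", *pairs]) + ")"
    return f"{or_block(METHOD_TERMS)} AND {data_clause} AND {DATE_RANGE}"


def annualize(count_jan_to_may):
    """Extrapolate a Jan 1 to Jun 1 article count to a full year (factor 12/5 = 2.4)."""
    return count_jan_to_may * 12 / 5
```

For example, `annualize(100)` returns `240.0`, the full-year estimate for 100 articles retrieved in the first 5 months of 2015.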
Fig. 2 Simplified sketches of the underlying biological models and assumptions of the working group papers. BP_i, BP measure at the ith visit; DBP, diastolic blood pressure; GE, gene expression; LBP, latent variable affecting both systolic and diastolic blood pressure; MURAT, multivariate rare-variant association test; SBP, systolic blood pressure; SEM, structural equation modeling; SNV, single nucleotide variant; SRVS, sparse representation-based variable selection. For a more detailed presentation of the models, please refer to the original articles.
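Fig. 2 includes an autoregressive structure in which BP at each visit depends on BP at the previous visit. The following is a minimal simulated sketch, assuming a single SNP, a single BP measure over 3 visits, and hypothetical path coefficients that are equal at visits 2 and 3. For a recursive path model with observed variables and uncorrelated disturbances, the likelihood factorizes over equations, so equation-wise least squares gives the maximum likelihood path coefficients. This does not reproduce the bivariate SBP/DBP models or the latent growth curve model of Song et al.

```python
import numpy as np

# Hypothetical simulation: one SNP, one BP measure at 3 visits
rng = np.random.default_rng(7)
n = 5000
snp = rng.binomial(2, 0.25, size=n).astype(float)  # genotype dosage 0/1/2

# Assumed paths: BP_1 = g1*SNP + e1; BP_t = b*BP_{t-1} + g*SNP + e_t for t = 2, 3
g1, b, g = 0.3, 0.6, 0.1
bp = np.empty((n, 3))
bp[:, 0] = g1 * snp + rng.normal(size=n)
for t in (1, 2):
    bp[:, t] = b * bp[:, t - 1] + g * snp + rng.normal(size=n)


def ols(y, *predictors):
    """Least-squares coefficients (intercept first) for one structural equation."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Recursive model with uncorrelated disturbances: estimate each equation separately
c1 = ols(bp[:, 0], snp)            # [intercept, g1]
c2 = ols(bp[:, 1], bp[:, 0], snp)  # [intercept, b, g]
c3 = ols(bp[:, 2], bp[:, 1], snp)  # [intercept, b, g]

# Path tracing: total SNP effect on BP_3 = direct + via BP_2 + via BP_1 and BP_2
total_effect = c3[2] + c3[1] * c2[2] + c3[1] * c2[1] * c1[1]
true_total = g + b * g + b * b * g1  # 0.1 + 0.06 + 0.108 = 0.268
```

The total effect shows why a SNP with a small direct effect at each visit can still have a sizable cumulative effect on later BP measures in an autoregressive model. A latent growth curve model would instead place intercept and slope factors behind the 3 visits and is typically fitted with dedicated SEM software.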
Overview of the analyzed sample and data in the contributions
| Ref. # | Contribution | Sample | BP data | GE and genetic data | Method | Software | Main findings |
|---|---|---|---|---|---|---|---|
| […] | Cao et al. [2015] | | Real data: SBP at time point 3 | GE and SNP data: k = 11,522 transcripts, l = 354,893 SNPs | SRVS | Matlab toolbox | Of the top 1000 variables associated with BP, 575 are SNPs and 425 are GE variables; 302 have plausible relevance for BP, 173 are associated with body weight, and 84 are associated with left ventricular contractility |
| […] | Konigorski et al. [2015] | | Real data: SBP at time point 1 | GE and WGS data on chromosome 19: k = 848 transcripts, l = 68,727 SNVs | Copula | R functions, available upon request | Higher power of bivariate copula models compared with univariate regression and univariate SKAT and SKAT-O. Identification of 5 SNVs in the CEACAM5 gene relevant for SBP, and 1075 … |
| […] | Song et al. [2015] | | Real data: SBP and DBP at time points 1–3 | SNP data: l = 460,359 SNPs | SEM | R package | The 2 tested models (autoregressive and latent growth curve) show similar rankings of relevant SNPs. Identification of 10 SNPs related to both SBP and DBP, mostly on chromosome 1 |
| […] | Sun et al. [2015] | | Real data: SBP and DBP | WES data: l = 152,337 SNVs | MURAT | R functions, available upon request | Multivariate tests tend to give smaller … Identification of 2 SNPs in CYP4A22 and near APOC4, which were previously reported to be associated with BP |
BP blood pressure, eQTL expression quantitative trait locus, GE gene expression, MURAT multivariate rare-variant association test, SBP/DBP systolic/diastolic blood pressure, SEM structural equation modeling, SKAT sequence kernel association test, SKAT-O optimal sequence kernel association test, SNP single nucleotide polymorphism, SNV single nucleotide variant, SRVS sparse representation variable selection, WES whole exome sequence, WGS whole genome sequence
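Cao et al. used a sparse representation-based variable selection (SRVS) Matlab toolbox, which is not reproduced here. As a rough stand-in, the sketch below runs an ordinary L1-penalized (lasso) regression, solved by iterative soft thresholding (ISTA), on one combined matrix of standardized SNP and gene expression variables, then ranks all variables by absolute coefficient, mirroring the "top variables" summary in the table. The data, effect sizes, and penalty choice are hypothetical.

```python
import numpy as np


def soft_threshold(v, thr):
    """Proximal operator of the L1 norm: shrink each entry toward zero by thr."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)


def ista_lasso(X, y, lam, n_iter=1000):
    """Minimize 0.5*||y - X w||^2 + lam*||w||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w


def standardize(M):
    """Center and scale each column to unit variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0)


# Hypothetical toy data: 200 individuals, 300 SNPs and 200 expression traits
rng = np.random.default_rng(0)
n, p_snp, p_ge = 200, 300, 200
maf = rng.uniform(0.1, 0.5, size=p_snp)
snps = rng.binomial(2, maf, size=(n, p_snp)).astype(float)
ge = rng.normal(size=(n, p_ge))
X = np.hstack([standardize(snps), standardize(ge)])  # joint SNP + GE design matrix
beta = np.zeros(p_snp + p_ge)
beta[[0, 1, p_snp, p_snp + 1]] = 1.5  # two causal SNPs and two causal transcripts
y = X @ beta + rng.normal(size=n)     # hypothetical SBP on a standardized scale
y = y - y.mean()

lam = 0.1 * np.max(np.abs(X.T @ y))  # 10% of the smallest lam that gives w = 0
w = ista_lasso(X, y, lam)
order = np.argsort(-np.abs(w))       # rank all variables by |coefficient|
labels = ["SNP" if j < p_snp else "GE" for j in order[:10]]
```

Placing SNPs and transcripts in a single design matrix lets both data types compete for the same sparse set of predictors, which is what allows a summary such as "575 are SNPs and 425 are GE variables" among the top-ranked variables.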