Jack M Wolf, Martha Barnard, Xueting Xia, Nathan Ryder, Jason Westra, Nathan Tintle.
Abstract
The growing popularity of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to study the relationship between genetics and human health. Despite the opportunities these datasets offer, they also pose many problems related to computational time and cost, data size and transfer, and privacy and security. Publishing summary statistics from these biobanks, and using them in a variety of downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach that uses basic summary statistics (estimates from single-marker regressions on single phenotypes) to evaluate more complex phenotypes with multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) using only biobank summary statistics. Through simulation, we validate exact formulas for this method and provide an estimation framework for cases where specific summary statistics are not available. We apply our method to a real dataset of fatty acid and genomic data.
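The sketch below illustrates, under stated assumptions, why per-phenotype summary statistics can suffice for a PC phenotype. It is not the authors' published formulas. Because ordinary least squares is linear in the response, the covariate-adjusted slope of a SNP on a PC score `Y @ w` equals the loading-weighted sum of the per-phenotype slopes. Its standard error follows from the per-phenotype SEs once the residual correlation between phenotypes is known. The helper names (`ols_slope_se`, `pc_stats_from_summary`), the simulated data, and the use of NumPy are all hypothetical choices for illustration.

```python
import numpy as np


def ols_slope_se(y, X, col):
    """Fit y ~ X by ordinary least squares; return the coefficient and its
    standard error for column `col`, plus the residual vector."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - p)  # residual variance estimate
    return beta[col], np.sqrt(sigma2 * xtx_inv[col, col]), resid


def pc_stats_from_summary(w, slopes, ses, resid_corr):
    """Slope and SE of the SNP on the composite phenotype Y @ w, computed only
    from per-phenotype slopes, their SEs, and the residual correlation."""
    slope = w @ slopes
    cov = resid_corr * np.outer(ses, ses)  # covariance of the per-phenotype slope estimates
    return slope, np.sqrt(w @ cov @ w)


# Hypothetical simulated data: one SNP, two covariates, four correlated phenotypes.
rng = np.random.default_rng(0)
n, k = 2000, 4
snp = rng.binomial(2, 0.3, n).astype(float)
age = rng.normal(50.0, 10.0, n)
sex = rng.integers(0, 2, n).astype(float)
X = np.column_stack([np.ones(n), snp, age, sex])  # intercept, SNP, covariates
mixing = rng.normal(size=(k, k))
Y = (np.outer(snp, [0.10, 0.05, 0.0, -0.05])
     + np.outer(age, [0.02, 0.01, 0.03, 0.0])
     + rng.normal(size=(n, k)) @ mixing)

# PC1 loadings from the phenotype covariance matrix (eigh sorts eigenvalues ascending).
_, eigvecs = np.linalg.eigh(np.cov(Y, rowvar=False))
w = eigvecs[:, -1]

# "Summary statistics": one covariate-adjusted regression per phenotype.
fits = [ols_slope_se(Y[:, j], X, col=1) for j in range(k)]
slopes = np.array([f[0] for f in fits])
ses = np.array([f[1] for f in fits])
resid_corr = np.corrcoef(np.column_stack([f[2] for f in fits]), rowvar=False)

slope_ss, se_ss = pc_stats_from_summary(w, slopes, ses, resid_corr)

# Reference: regress the PC1 score directly on the same design matrix.
slope_direct, se_direct, _ = ols_slope_se(Y @ w, X, col=1)
print(slope_ss - slope_direct, se_ss - se_direct)  # should differ only by rounding error
```

The residual correlation used here is not part of standard GWAS output. Substituting an estimate, such as the plain phenotype correlation, turns the exact identity into an approximation. That is one plausible reading of the estimation framework mentioned above, not a description of it.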
Year: 2020 PMID: 31797641 PMCID: PMC6907735
Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928
Fig. 1: Differences between our method’s approximations of the slope, the standard error of the slope, and p-values and the values obtained by fitting a model for the first principal component on the raw data. These panels illustrate the high accuracy of our method, even when the covariance structure of the phenotypes is approximated.
(a) Difference between observed and predicted SNP slope coefficients on simulated data when approximating phenotype covariance.
(b) Difference between observed and predicted standard errors of the SNP slope coefficient on simulated data when approximating phenotype covariance.
(c) Difference between observed and predicted p-values for the association of SNPs with the first principal component on simulated data when approximating phenotype covariance (−log10 scale).
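Panel (c) compares p-values on a −log10 scale. The following is a minimal sketch of that comparison. It assumes a two-sided Wald test with a normal reference distribution, a reasonable approximation at biobank sample sizes, though the authors may use a t distribution instead. Each slope/SE pair is converted to a p-value, and the −log10 values are then subtracted. The function names are hypothetical.

```python
import math


def wald_p_value(slope, se):
    """Two-sided p-value for H0: slope = 0, using a normal reference distribution."""
    z = abs(slope / se)
    return math.erfc(z / math.sqrt(2.0))


def neg_log10_p_diff(slope_obs, se_obs, slope_pred, se_pred):
    """Difference in -log10(p) between observed (raw-data) and predicted statistics."""
    p_obs = wald_p_value(slope_obs, se_obs)
    p_pred = wald_p_value(slope_pred, se_pred)
    return -math.log10(p_obs) + math.log10(p_pred)


# Usage with hypothetical numbers: a predicted SE 1% larger than the observed SE.
print(wald_p_value(1.96, 1.0))  # approximately 0.05
print(neg_log10_p_diff(0.02, 0.004, 0.02, 0.00404))
```

For extremely strong signals (|z| above roughly 37), `math.erfc` underflows to zero and the logarithm fails. A production version would compute the log p-value directly.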
Accuracy of our method in estimating SNP slopes and standard errors for the first and second principal components of Omega-3 and Omega-6 fatty acids. Errors were minimal, with low variance in all cases. Some of these errors can be explained by deviations from Hardy–Weinberg equilibrium (HWE) and missing genotype data.
| Response | Adjustments | Mean Slope Error | Mean % Slope Error | Variance Slope Error | Mean SE Error | Variance SE Error |
|---|---|---|---|---|---|---|
| Omega-3, PC1 | Age, Sex, Cohort | 1.03 × 10⁻⁷ | 2% | 9.19 × 10⁻¹¹ | −1.57 × 10⁻⁷ | 3.66 × 10⁻¹² |
| Omega-3, PC2 | Age, Sex, Cohort | −1.67 × 10⁻⁸ | 2% | 1.13 × 10⁻¹¹ | 2.04 × 10⁻⁹ | 4.34 × 10⁻¹³ |
| Omega-3, PC1 | Age, Sex, Cohort, Omega-6 FA | 4.95 × 10⁻⁸ | 4% | 6.53 × 10⁻¹¹ | 1.17 × 10⁻⁸ | 2.42 × 10⁻¹¹ |
| Omega-3, PC2 | Age, Sex, Cohort, Omega-6 FA | −1.45 × 10⁻⁸ | 4% | 1.27 × 10⁻¹¹ | 2.50 × 10⁻⁸ | 4.14 × 10⁻¹³ |
| Omega-6, PC1 | Age, Sex, Cohort | 1.71 × 10⁻⁷ | 3% | 2.82 × 10⁻¹⁰ | 2.04 × 10⁻⁸ | 1.86 × 10⁻¹¹ |
| Omega-6, PC2 | Age, Sex, Cohort | 4.88 × 10⁻⁸ | 2% | 8.07 × 10⁻¹¹ | −8.72 × 10⁻⁸ | 4.17 × 10⁻¹² |
| Omega-6, PC1 | Age, Sex, Cohort, Omega-3 FA | 9.96 × 10⁻⁸ | 2% | 2.59 × 10⁻¹⁰ | −2.18 × 10⁻⁸ | 8.64 × 10⁻¹² |
| Omega-6, PC2 | Age, Sex, Cohort, Omega-3 FA | 5.27 × 10⁻⁸ | 3% | 7.98 × 10⁻¹¹ | −4.07 × 10⁻⁸ | 3.11 × 10⁻¹² |
Fig. 2: Differences between our method’s approximations of SNP slope coefficients, slope standard errors, and p-values and the true values for the first principal component of Omega-3 fatty acids, adjusting for age, sex, and cohort, using data from the Framingham Heart Study (FHS). These panels show the high accuracy of our method.
(a) Approximated and true slopes for the first principal component of Omega-3 fatty acids on FHS data.
(b) Approximated and true standard errors of the slope for the first principal component of Omega-3 fatty acids on FHS data.
(c) Difference between observed and predicted p-values for the first principal component of Omega-3 fatty acids on FHS data (−log10 scale).
SNPs significantly associated (p < 2 × 10⁻⁷) with fatty acid phenotypes, comparing models with and without fatty acids as covariates. Our method and traditional analyses of the raw data identified the same significant SNPs in all cases.
| # of SNPs | Chr | Position | Gene | Significant without FA Covariates | Significant with FA Covariates |
|---|---|---|---|---|---|
| 11 | 6 | 10954307–11050290 | ELOVL2 | DPA, O3PC2 | O3PC2, O3PC1 |
| 1 | 6 | 161187057 | AGPAT4 | O6PC3 | |
| 10 | 11 | 61781986–61888710 | FADS1 | LA, ADA, Adrenic, O6PC1, O6PC2 | O6PC1, O6PC2, O3PC1, O3PC3 |
| 5 | 12 | 6966719–7013532 | LPCAT3 | LA, O6PC1 | O6PC1, O3PC1 |
| 2 | 12 | 7057810–7069674 | None | LA, O6PC1 | |
| 1 | 18 | 7881144 | PTPRM | O3PC3 | |