Nasa Sinnott-Armstrong, Yosuke Tanigawa, Manuel A Rivas, David Amar, Nina Mars, Christian Benner, Matthew Aguirre, Guhan Ram Venkataraman, Michael Wainberg, Hanna M Ollila, Tuomo Kiiskinen, Aki S Havulinna, James P Pirruccello, Junyang Qian, Anna Shcherbina, Fatima Rodriguez, Themistocles L Assimes, Vineeta Agarwala, Robert Tibshirani, Trevor Hastie, Samuli Ripatti, Jonathan K Pritchard, Mark J Daly.
Abstract
Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.
Year: 2021 PMID: 33462484 PMCID: PMC7867639 DOI: 10.1038/s41588-020-00757-z
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Figure 1. Schematic overview of the study.
We prepared a dataset of 35 serum and urine biomarkers from 363,228 individuals in UK Biobank. We analyzed the genetic basis of these biomarkers, assessed their relationship to medically relevant phenotypes, and generated predictive models of disease outcomes from genome-wide data.
Figure 2. Genetics of 35 biomarkers.
(top left inset) Correlation plot of phenotypic (lower triangular matrix) and genetic (upper triangular matrix) effects between the 35 lab phenotypes, estimated using LD Score regression. The absolute heritability estimates with standard errors are in Supplementary Table 11a. (main panel) Fuji plot of lab phenotypes across the six categories provided by UK Biobank, with genetic variant associations shown for LD-independent variants with meta-analysis p < 5 × 10⁻⁹. Large-effect protein-truncating and protein-altering variants (labeled when abs(beta) >= 0.1 standard deviation [SD]) are annotated with the category of association (colored fill boxes) and highlighted if the loci were not previously reported in the comparison studies (Methods). Pleiotropic and trait-specific associations are shown by differently sized circles. The p-values were from two-sided tests and were not corrected for multiple hypothesis testing.
Figure 3. Summary of fine-mapped associations across 35 biomarker traits.
(a) FINEMAP analysis summary. (top) The number of distinct association signals identified (color gradient from green to blue) in each region with at least one genome-wide significant (UK Biobank meta-analysis p < 5 × 10⁻⁹) association, together with the number of regions; for example, a single signal at 33 regions and two to forty signals at 5,330 regions across 35 traits. (bottom) The number of candidate causal variants in the credible set with >= 99% posterior probability (color gradient from green to blue), together with the number of signals; for example, 2,547 signals were mapped to a single variant in the credible set across 35 traits. (b) Breakdown of the number of fine-mapped associations with posterior probability greater than 0.95 or 0.99 across all biomarkers. Orange, posterior greater than 0.99; green, posterior between 0.95 and 0.99. The total variance explained for each trait is shown here and in Supplementary Table 14b. (c) Allelic series showing the combined effects of missense, non-coding, and rare copy number variants at the SLCO1B1/SLCO1B3 locus on total bilirubin levels. Copy number variants are annotated below the axis, and SNPs and short indels above the axis. (d) Pleiotropic effects of fine-mapped rare coding (rs114303452, left) and common non-coding (rs59950280, right) variants at the HGFAC locus. Darker shades of purple indicate more significant associations. The p-values were from two-sided tests and were not corrected for multiple hypothesis testing. The error bars represent standard deviations.
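The notion of a credible set with >= 99% posterior probability in panel (a) can be illustrated with a minimal sketch. It assumes each variant in a signal already has a posterior probability of being causal and simply accumulates the highest-ranked variants until the requested coverage is reached. The function name, variant labels, and probabilities are hypothetical, and this generic construction is not claimed to match FINEMAP's internal procedure.

```python
def credible_set(posteriors, coverage=0.99):
    """Return the smallest top-ranked set of variants whose summed
    posterior probability reaches `coverage`, plus the mass it holds.

    `posteriors` maps a variant label to its posterior probability of
    being causal for one signal (hypothetical input format).
    """
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen, total


# Made-up posteriors for two signals (illustration only).
single_variant_signal = {"var_a": 0.995, "var_b": 0.004, "var_c": 0.001}
two_variant_signal = {"var_a": 0.600, "var_b": 0.395, "var_c": 0.005}

set_one, _ = credible_set(single_variant_signal)
set_two, _ = credible_set(two_variant_signal)
print(len(set_one), len(set_two))
```

In this toy input the first signal resolves to a single variant, which is the situation the caption counts when it reports signals mapped to a single variant in the credible set.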
Figure 4. Causal inference, transferability of polygenic risk scores, and complex trait association in polygenic risk tails.
(a) Mendelian randomization estimates of causal links between biomarkers (blue nodes) and selected complex traits (red nodes). Association arrows are drawn based on effect direction (red decreasing, blue increasing). Associations shown pass a 5% FDR cutoff across all tests (Methods, Supplementary Table 16). Edge width is proportional to the absolute causal effect size (log odds per standard deviation). (b) Summary of prediction accuracy of the snpnet polygenic scores across traits, evaluated on a held-out test set in White British individuals as well as four other populations in UK Biobank. (c) (x-axis) Biomarker polygenic risk scores for the top 1%, top 10%, bottom 1%, and bottom 10% of individuals and their association with different diseases in UK Biobank, represented as the odds ratio of the disease in this group relative to the 40–60% quantiles. Traits without rows did not have any outcomes with FDR-adjusted significant associations.
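The tail comparison in panel (c) can be sketched as a simple odds ratio between a score quantile band and the 40–60% reference band. The version below is an unadjusted, counts-based illustration; the function names and inputs are hypothetical, and the published estimates may come from a covariate-adjusted model, which this sketch does not attempt to reproduce.

```python
def odds(cases, total):
    """Odds of disease in a group: cases divided by non-cases.
    Raises ZeroDivisionError if the group has no non-cases."""
    return cases / (total - cases)


def tail_odds_ratio(scores, is_case, lower, upper, ref=(0.40, 0.60)):
    """Odds ratio of disease for individuals whose score falls in the
    [lower, upper) quantile band, relative to the reference band.

    `scores` and `is_case` are parallel lists (hypothetical inputs);
    `is_case` holds 1 for cases and 0 for controls.
    Raises ZeroDivisionError if the reference band has no cases.
    """
    n = len(scores)
    # Indices of individuals ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])

    def band(lo, hi):
        idx = order[round(lo * n):round(hi * n)]
        return sum(is_case[i] for i in idx), len(idx)

    group_cases, group_n = band(lower, upper)
    ref_cases, ref_n = band(*ref)
    return odds(group_cases, group_n) / odds(ref_cases, ref_n)
```

For example, `tail_odds_ratio(scores, is_case, 0.99, 1.00)` compares the top 1% of scores with the middle band, and `tail_odds_ratio(scores, is_case, 0.00, 0.10)` does the same for the bottom 10%.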
Figure 5. Multiple regression with biomarker polygenic scores improves prevalent and incident disease prediction.
(a) (x-axis) Quantiles of polygenic risk score, spaced to linearly represent the mean of the corresponding bin of scores. (y-axis) Prevalence of chronic kidney disease (n = 2,780 cases and n = 89,409 total, defined by verbal questionnaire and hospital in-patient record ICD code data) within each quantile bin of the polygenic risk score. Error bars represent the standard error around each measurement, and individuals evaluated are held-out European-ancestry individuals in UK Biobank. (b) ROC curve with AUC for chronic kidney disease, comparing the snpnet-derived polygenic score to a multi-PRS model that is also trained across biomarkers. Individuals evaluated are held-out European-ancestry individuals in UK Biobank. (c) AUC-ROC estimates for prediction of 10 disease outcomes in a held-out test set of the UK Biobank. Diabetes was run using both a strict definition (excluding from control individuals with HbA1c < 39) and the complete sample (Methods). (d) Hazard ratios for the incidence of type 2 diabetes (n = 17,519), chronic kidney disease (n = 3,058), myocardial infarction (n = 7,913), heart failure (n = 13,965), gout (n = 1,936), gallstones (n = 11,629), and cirrhosis (n = 845) in FinnGen using the standard single-disease PRS trained on UK Biobank using snpnet versus the multi-PRS including both biomarker PRSs and the trait PRS. The strict definition of type 2 diabetes is shown. Error bars represent 95% confidence intervals and points represent mean hazard ratio estimates.
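The single-PRS versus multi-PRS comparison in panels (b) and (c) can be illustrated with a small sketch: a multi-PRS is treated here as a weighted sum of a disease PRS and biomarker PRSs, and each score is evaluated by its AUC, computed as the probability that a randomly chosen case outranks a randomly chosen control. All names, weights, and values are made up for illustration. In practice the weights would be fitted by regression on training data, which this sketch does not do.

```python
def auc(scores, labels):
    """AUC as the fraction of case/control pairs in which the case has
    the higher score (ties count as half a win)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def multi_prs(prs_columns, weights):
    """Weighted sum of several PRS columns, one value per individual.
    The weights are assumed to be given (hypothetical here)."""
    n = len(prs_columns[0])
    return [sum(w * col[i] for w, col in zip(weights, prs_columns))
            for i in range(n)]


# Toy data for four individuals (not from the study).
labels = [0, 0, 1, 1]
disease_prs = [0.10, 0.40, 0.35, 0.80]
biomarker_prs = [0.00, 0.00, 1.00, 0.00]

combined = multi_prs([disease_prs, biomarker_prs], weights=[1.0, 0.5])
print(auc(disease_prs, labels), auc(combined, labels))
```

The pairwise AUC is quadratic in sample size, so it suits small illustrations; a rank-based implementation would be the usual choice for biobank-scale data.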