Genetics of 35 blood and urine biomarkers in the UK Biobank.

Nasa Sinnott-Armstrong^1,2,3, Yosuke Tanigawa^4, Manuel A Rivas^5, David Amar^6,7, Nina Mars^8, Christian Benner^8, Matthew Aguirre^6, Guhan Ram Venkataraman^6, Michael Wainberg^9, Hanna M Ollila^8,10,11, Tuomo Kiiskinen^8,12, Aki S Havulinna^8,12, James P Pirruccello^13,14, Junyang Qian^15, Anna Shcherbina^8,7, Fatima Rodriguez^7, Themistocles L Assimes^16,7, Vineeta Agarwala^7, Robert Tibshirani^6,15, Trevor Hastie^6,15, Samuli Ripatti^8,14,17, Jonathan K Pritchard^18,19, Mark J Daly^8,14,20.

Abstract

Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.

Year: 2021 PMID: 33462484 PMCID: PMC7867639 DOI: 10.1038/s41588-020-00757-z

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

Introduction

Serum and urine biomarkers are frequently measured to diagnose and monitor chronic disease conditions. Understanding the genetic predisposition to particular biomarker states, and the factors that confound them, may have implications for disease treatment. While the genetics of some biomarkers have been extensively studied, most notably lipids[1,2,3], glycemic traits[4-6], and measurements of kidney function[7-9], the genetic basis of most biomarkers has not been queried in large population-scale datasets. To this end, UK Biobank (UKB) has performed laboratory testing of >30 commonly measured biomarkers in serum and urine on a cohort of >480,000 individuals with extensive phenotype and genome-wide genotype data, including the unrelated individuals in this study (Supplementary Figure 1)[10]. Here, we 1) performed a systematic analysis of the genetic architecture and detailed fine-mapping of biomarker-associated loci in 363,228 individuals including protein-altering, protein-truncating (PTV), non-coding, human leukocyte antigen (HLA), and copy number variants; 2) built phenome-wide associations for implicated genetic variants; 3) evaluated causal relationships between biomarkers and 40 medically relevant phenotypes; and 4) constructed polygenic prediction models (Figure 1).

Figure 1.

Schematic overview of the study.

We prepared a dataset of 35 serum and urine biomarkers from 363,228 individuals in UK Biobank. We analyzed the genetic basis of these biomarkers, assessed their relationship to medically relevant phenotypes, and generated predictive models of disease outcomes from genome-wide data.

Results

Biomarker phenotype distributions

We first examined the consistency of the biomarker measurements[10,11]. After adjusting for statin usage (Supplementary Table 1a–c), we fit a regression model with multiple covariates (see Methods). For each biomarker, we measured the proportion of phenotypic variance explained by these covariates; this ranged from 1.7% (Rheumatoid factor) to 90% (Testosterone) depending on the biomarker (Supplementary Figure 2a–c, Supplementary Table 2). We evaluated body mass index as a confounder in associations, and there were minimal differences in genetic effects under this model (Supplementary Tables 3–4). Taking all the 35 lab phenotypes together, we recover several previously estimated phenotype correlations (Figure 2)[12,13].

Figure 2.

Genetics of 35 biomarkers.

(top left inset) Correlation plot of phenotypic (lower triangular matrix) and genetic (upper triangular matrix) effects between the 35 lab phenotypes, estimated using LD Score regression. The absolute heritability estimates with standard errors are in Supplementary Table 11a. (main panel) Fuji plot of lab phenotypes across the six categories provided by UK Biobank and genetic variant associations shown for LD independent variants with meta-analysis p < 5 × 10−9. Large-effect protein-truncating and protein-altering variants (labeled when abs(beta) >= 0.1 standard deviation [SD]) are annotated with the category of association displayed (colored fill boxes) and highlighted if the loci were not previously reported in the comparison studies (Methods). Pleiotropic and trait-specific associations are shown by different sized circles. The p-values were from two-sided tests and were not corrected for multiple hypothesis testing.

Genetics of biomarkers

We performed association analysis between directly genotyped and imputed autosomal genetic variants, copy number variations (CNVs), and HLA allelotypes and 35 biomarkers in the unrelated individuals in UK Biobank across 5 population groups (N = 318,953 for White British, 23,582 for non-British White, 6,019 for African, 7,338 for South Asian, and 1,082 for East Asian) followed by meta-analysis of all but East Asian populations (N in meta-analysis = 355,891, Methods, Figure 2, Supplementary Figure 3). We stratified the genetic variants into three bins: 1) protein-truncating (27,816), 2) protein-altering (87,430), and 3) synonymous and non-coding variants (minor allele frequency [MAF] > 0.1% and INFO score > 0.3, imputed variants present in Haplotype Reference Consortium [HRC], 9,444,561[14]) (Figure 2). Comparison of effect sizes estimated across 42 other previously published study cohorts for 25 of the biomarkers showed overall high agreement (Supplementary Figure 4, Supplementary Table 5). This was true when comparing to previous studies of lipids[1,2,15,16], glycemic traits[17,18], kidney function tests[19,20], liver function tests[17], and other biomarker measurements[21,22]. We adjusted the nominal association p-values for multiple hypothesis testing and identified over 10,000 significant associations (Bonferroni-corrected meta-analysis p < 5 × 10−9 for assayed and imputed variants; Bonferroni-corrected p < 1 × 10−6 for non-rare [MAF > 0.1%] CNVs and CNV burden test for 23,598 genes; and Benjamini-Yekutieli [BY] adjusted p < 0.05 for HLA alleles, Methods, Supplementary Figure 5, Supplementary Tables 6–10). Linkage disequilibrium (LD) Score intercepts for single-variant association results were between 0.999 and 1.137 for all 35 phenotypes, consistent with anthropometric traits in UKB and suggesting that population structure in our analysis is well-controlled[23] (Supplementary Table 11a).
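The two kinds of multiple-testing control named above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the Benjamini-Yekutieli step is the standard textbook step-up adjustment, and the test count passed to the Bonferroni helper is whatever the analyst chooses, since the exact counts used in the study are described in Methods rather than here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted p-values in the input order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic correction factor c(m) = 1 + 1/2 + ... + 1/m, which makes
    # the procedure valid under arbitrary dependence between tests.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (illustrative only, not taken from the study).
toy_adjusted = benjamini_yekutieli([0.01, 0.04, 0.03])
```

An allele-trait pair would then be called significant when its adjusted value falls below 0.05, mirroring the "BY adjusted p < 0.05" criterion quoted for HLA alleles.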

Global and local heritability of biomarkers

To characterize the heritability of the 35 biomarkers we first applied LD Score regression[24] to stratify heritability into 10 tissue types and 53 general genomic features (e.g. coding variants and regulatory variants) and further applied the Heritability Estimator from Summary Statistics (HESS)[25,26]. We found that both LD Score regression and HESS indicate common SNPs explain substantial heritability of some but not all biomarkers (0.6% [Lipoprotein A, also referred to as lipoprotein(a)] to 23.9% [IGF-1] using LD Score regression and 3.2% [Microalbumin in urine] to 57% [Total bilirubin] using HESS across the studied continuous phenotypes, Supplementary Tables 11a,b). Estimates were lower in LD Score regression than HESS for traits with lower polygenicity (e.g. Lipoprotein A, h2ldsc = 0.6% and h2HESS = 24%), as LD Score regression estimates polygenic heritability[24]. We compared the polygenicity of all 35 biomarkers by computing the fraction of total SNP heritability attributable to loci by the top 1% of SNPs. We found that three biomarkers have more than 50% of the SNP heritability explained by the top 1% of loci (Lipoprotein A 67.7%, total bilirubin 60.9%, and direct bilirubin 57.5%) while the remaining 32 phenotypes show patterns of moderate to high polygenicity (Supplementary Table 11b).
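The polygenicity comparison described above reduces to a simple ratio once per-locus heritability estimates are available. The sketch below assumes a plain list of local heritability values (for example, from a HESS-style analysis); the function name, the clipping of negative estimates to zero, and the toy numbers are illustrative assumptions rather than details taken from the study.

```python
import math


def top_fraction_share(local_h2, fraction=0.01):
    """Share of total heritability carried by the top `fraction` of loci."""
    if not local_h2:
        raise ValueError("need at least one locus")
    # Noisy local estimates can be negative; clip them at zero (an assumption).
    values = sorted((max(h, 0.0) for h in local_h2), reverse=True)
    total = sum(values)
    if total == 0:
        return 0.0
    # Number of loci in the top fraction, at least one; the small epsilon
    # guards against floating-point overshoot before rounding up.
    n_top = max(1, math.ceil(fraction * len(values) - 1e-9))
    return sum(values[:n_top]) / total


# Toy trait: one dominant locus plus 99 small ones (illustrative values).
toy_share = top_fraction_share([0.10] + [0.001] * 99)
```

A trait with a share above 0.5 under this definition would fall in the same low-polygenicity group as Lipoprotein A and the bilirubin measures reported above.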

Associated variants prioritize therapeutic targets

We found 58 (43 rare, MAF < 1%, and 55 not reported in comparison study, Methods) PTV associations and 1,323 (306 rare, 1,079 not reported in comparison studies) protein-altering variant (PAV) associations outside the major histocompatibility complex (MHC) region (hg19 chr6:25,477,797–36,448,354; meta-analyzed p < 5 × 10−9). We found 19 non-MHC PTV associations (17 rare [MAF < 1%]) with large estimated biomarker-lowering effects (>0.1 sd) and 26 (24 rare) with biomarker-raising effects (>0.1 sd) across 31 (27 rare) PTVs and at least one biomarker phenotype, where the same PTV may have both increasing and decreasing associations across different biomarkers (Figure 2, Supplementary Table 6). Similarly, there were 240 (161 rare) and 182 (125 rare) non-MHC PAV associations with large estimated lowering and raising effects (>0.1 sd) across 241 (179 rare) PAVs and at least one biomarker phenotype, respectively (Figure 2, Supplementary Table 7). To assess whether the variants associated with biomarkers impact medically relevant phenotypes, we performed a phenome-wide association analysis (PheWAS) across 166 traits in UK Biobank, compared our findings with previously published literature, and sought independent replication in the FinnGen R2 cohort (Supplementary Tables 12–13, Methods). We found 57 phenotype associations (33 and 24 for increasing and decreasing disease risk, respectively) across 26 medically relevant phenotypes for 2 PTVs and 31 PAVs (p < 1 × 10−7), of which 31 associations were previously reported and 26 were novel (Supplementary Tables 13a, Methods). 
For eight cardiovascular biomarkers (Supplementary Table 4a), we identified a stop-gain variant in PDE3B with documented protection against high cholesterol and a range of effects on increasing HDL cholesterol and Apolipoprotein A (0.40, 0.27 sd) and decreasing triglycerides and Apolipoprotein B (0.43, 0.27 sd)[2,27]; a stop gain variant in ANGPTL8, where we replicated a previously-reported effect on HDL cholesterol (0.06 sd in our dataset) and discovered a triglyceride-lowering effect (0.06 sd)[28]; two PTVs in LPA with lowering effects on Lipoprotein A levels (0.37, 0.42 sd), of which one is known to be associated with decreased risk of coronary artery disease (p = 3 × 10−11; OR = 0.89 [95% CI 0.86, 0.92])[29]; a 0.2% MAF missense allele in ACACB associated with LDL, triglyceride, ApoB, and alkaline phosphatase[30]; two independent missense alleles in PLA2G12A with increasing effects on triglycerides, sex-hormone binding globulin (SHBG), and testosterone, and lowering effects on HDL cholesterol, ApoA, and HbA1c levels (Supplementary Table 6); a splice region variant in CPT1A, with lowering effects on triglycerides; and a missense variant in PCSK6 with ApoB- and LDL-lowering effects (Supplementary Table 7). 
For seven liver biomarkers (Supplementary Table 4a), we found a 0.05% MAF inframe deletion in GOT1 with a lowering effect on aspartate aminotransferase (2.6 sd); a 0.1% MAF missense allele in SLC30A10 with increasing effects on alanine and aspartate aminotransferases; four missense alleles in GPT with alanine aminotransferase lowering effects; a missense variant in ABCB4 with increasing effect on alanine aminotransferase and increased risk of gallstones in UK Biobank (p = 1.2 × 10−8, OR = 1.38 [95% CI: 1.23, 1.38]); an allelic series of 3 missense variants in SERPINA1 with pleiotropic increasing effects on albumin, aspartate aminotransferase, direct bilirubin, and gamma glutamyltransferase, and lowering effects on AST to ALT ratio, with one of these missense alleles associated with increased risk of gallstones (p = 8.1 × 10−17, OR = 1.36 [95% CI: 1.27, 1.47]) and cholecystitis (p = 1.6 × 10−8, OR = 1.26 [95% CI: 1.16, 1.37]) in UK Biobank; and two missense alleles in DGKD, with raising and lowering effects, respectively, on direct and total bilirubin (Supplementary Tables 7, 13a). 
For 12 renal biomarkers (Supplementary Table 4a), we found a PTV in COL4A4 associated with an increasing effect on microalbumin in urine (0.77 sd) and an increased risk of kidney disease (p = 6.7 × 10−13, OR = 6.9 [95% CI: 4.06, 11.6]) in UK Biobank, where kidney disease is defined using a combination of hospital in-patient records (ICD-10 codes: Q60 [Renal agenesis and other reduction defects of kidney], and its sub-concepts) and self-reported kidney diseases (coded as 1405 [other renal/kidney problem] in UK Biobank)[31]; a frame-shift variant in SLC22A2 with strong lowering effects on eGFR (0.52 sd) and increasing effect on creatinine (0.52 sd); a stop-gain variant in SLC22A11 with raising effects on urate (0.14 sd; Supplementary Tables 6, 13a); a rare (MAF 0.1%) missense allele in SLC34A3 with strong eGFR and phosphate lowering and serum creatinine, Cystatin C, and urea raising effects; missense alleles in SLC6A19, LRP2, ALDOB, and SLC7A9, and two missense variants in SLC25A45, all associated with creatinine lowering and eGFR raising, among other examples (Supplementary Table 7). Notably, the majority of these genes are known to have high expression levels in renal tissue[32]. For three bone and joint biomarkers (Supplementary Table 4a), we found an allelic series of two frame-shift variants and a missense variant in GPLD1[33], in addition to an allelic series of missense variants in ALPL. Similarly, we found an allelic series in CASR that is associated with both calcium increasing and lowering effects (Supplementary Tables 6, 7). For glucose and HbA1C (biomarkers for diabetes, Supplementary Table 4a), we found a known missense variant association in ANKH (−0.11 and −0.17 sd for glucose and HbA1C, respectively), for which we also replicated the documented protective effect against diabetes (p = 1.2 × 10−8, OR = 0.66 [95% CI: 0.57, 0.76]). 
We also found a splice-donor variant in RHAG that is strongly associated with lower HbA1c (0.80 sd) and allelic series containing 4 missense variants each in G6PC2 and TMC8 (Figure 2, Supplementary Tables 6, 7). For three hormone biomarkers (Supplementary Table 4a), we found PTVs in ADH1C, MSR1, and NUBP2 affecting serum IGF-1 levels, and an allelic series including the hepatocyte growth factor genes HGFAC, HGF, and HNF4A with effects on SHBG. Among those, we identified novel associations with HNF4A alleles: a missense variant with MAF 0.02% was associated with increased risk for diabetic eye disorders (p = 3.1 × 10−8, OR = 9.60 [95% CI: 4.30, 21.4]) and diabetes (p = 4.7 × 10−8, OR = 3.8 [95% CI: 2.34, 6.09]) and another missense variant (MAF 3.1%) was associated with increased risk for cholecystitis (p = 2.2 × 10−13, OR = 1.27 [95% CI: 1.22, 1.38] in UK Biobank, and also replicated in FinnGen R2, p = 2.9 × 10−17, OR = 1.46 [95% CI: 1.34, 1.60]) (Figure 2, Supplementary Tables 6, 7, 13a). These results suggest that the genetic underpinning of biomarker levels could aid in prioritizing and better understanding the mechanisms of disease-associated variants.

CNVs and HLA allelotypes influencing biomarkers

Copy number variations (CNVs) constitute a significant fraction of all base pair differences between individuals. We found 13 unique associations across 10 individual CNVs (Bonferroni p < 1 × 10−6, MAF > 0.01%, Supplementary Table 10a)[34]. We performed aggregate rare (MAF < 0.1%) CNV burden tests, pooling CNVs in each gene, for 23,598 genes. We found 29 gene-level associations (Bonferroni p < 1 × 10−6; Supplementary Table 10a) including a burden of rare CNVs overlapping HNF1B associated with serum urea, eGFR, creatinine, and cystatin C (the least significant p = 8.8 × 10−13) estimated to have large effects (beta = 0.77, −0.90, 0.93, 0.98 sd, respectively; Supplementary Figure 6a). Previous studies have associated mutations in HNF1B with maturity onset diabetes of the young (MODY5) and altered kidney function[35]. The rare CNVs overlapping HNF1B were associated with chronic kidney disease (p = 1 × 10−7; OR = 4.94, SE = 0.30; Supplementary Figure 6a)[36,37] in a diabetes-dependent fashion (Supplementary Table 10b). We found a rare duplication in the CST3 gene associated with increased levels of cystatin C, the protein it encodes, of opposite effect to a rare PTV at the same locus (Supplementary Figure 6b). These results highlight the value of analyzing CNVs, which can have large effects on lab measurements. To identify HLA allelotype associations that are not driven by pervasive LD structure in the HLA region, we applied Bayesian model averaging (Methods) to the significant allelotype-trait pairs (BY adjusted p-value < 0.05). We found 58 associations across 28 biomarker traits and allelotypes (Supplementary Table 9).

Fine-mapping of common associated variants

To nominate potentially causal variants at loci with common (MAF > 1%) variant associations, we performed fine-mapping analysis. Specifically, focusing on the White British summary statistics, we applied FINEMAP[38,39]. From over 9,000 biomarker-associated loci, we identified 27,853 distinct signals in 5,363 regions across 35 traits. In the identified credible sets, 17,696 signals were fine-mapped to 50 or fewer variants with posterior probability of including the causal variant > 0.99; at 2,547 biomarker-associated loci, we resolved the signal to a single nominated causal variant (Figure 3a, Supplementary Table 14a). Moreover, we identified 3,374 unique trait-variant associations with a posterior probability > 0.99 of being the causal variant (Figure 3b, Supplementary Table 14a). These explain between 0% (urine potassium) and 48% (Lipoprotein A) of the residual trait variance (Supplementary Table 14b).
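The notion of a credible set used above can be made concrete with a small sketch. It assumes that per-variant posterior probabilities for a single association signal are already available (FINEMAP itself performs a much richer search over causal configurations), and the function name and variant labels are hypothetical.

```python
def credible_set(posteriors, coverage=0.99):
    """Smallest set of variants whose posteriors sum to at least `coverage`.

    `posteriors` maps a variant label to its posterior probability of being
    the causal variant for one association signal.
    Returns (list of variant labels, cumulative posterior of the set).
    """
    # Rank variants from most to least likely to be causal.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen, cumulative


# A signal resolved to a single variant (illustrative posteriors).
single, _ = credible_set({"var1": 0.995, "var2": 0.004, "var3": 0.001})

# A signal that needs three variants to reach 99% coverage.
several, _ = credible_set({"var1": 0.6, "var2": 0.3, "var3": 0.095, "var4": 0.005})
```

Under this construction, a signal "resolved to a single nominated causal variant" is one whose credible set has length one, and a set of "50 or fewer variants" is one whose length is at most 50.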

Figure 3.

Summary of fine-mapped associations across 35 biomarker traits.

(a) FINEMAP analysis summary. (top) The number of identified distinct association signals (color gradient from green to blue) in each region with at least one genome-wide significant (UK Biobank meta-analysis p < 5 × 10−9) association and the number of regions are shown, such as a single signal at 33 regions and two to forty signals at 5,330 regions across 35 traits. (bottom) The number of identified candidate causal variants in the credible set with >= 99% posterior probability (color gradient from green to blue) and the number of signals are shown, such as 2,547 signals were mapped to a single variant in the credible set across 35 traits. (b) Breakdown of the number of fine-mapped associations with posterior probability greater than 0.95 or 0.99 across all biomarkers. Orange, posterior greater than 0.99, green, posterior between 0.95 and 0.99. The total variance explained for each trait is shown and in Supplementary Table 14b. (c) Allelic series showing combined missense, non-coding, and rare copy number variants at the SLCO1B1/SLCO1B3 on total bilirubin levels. Copy number variants annotated below axis and SNPs and short indels annotated above the axis. (d) Pleiotropic effects of fine-mapped rare coding (rs114303452, left) and common non-coding (rs59950280, right) variants at the HGFAC locus. Darker colors of purple indicate more significant associations. The p-values were from two-sided tests and were not corrected for multiple hypothesis testing. The error bars represent standard deviations.

Glycemic trait fine-mapping

We discovered fine-mapped associations for glycemic traits, including multiple variants at the TGFB1/AXL locus; rare missense variants in PFN1 and GYPC (previously implicated in a small GWAS of Mexican Americans)[40]; an intronic variant at the cytokine receptor IL6R; a downstream variant at VEGFA[41]; a missense variant at HFE, the gene responsible for hemochromatosis[42,43]; and an intronic variant at CD33 and a 3’UTR variant at CD36 (Supplementary Table 14a). CD36 encodes a well-studied fatty acid receptor and biomarker for type 2 diabetes (T2D)[44,45], and CD33 levels are known to be perturbed in T2D cases[46].

Allelic series at the SLCO1B locus

We discovered several alleles implicated in the genetic control of bilirubin levels at the SLCO1B locus (Figure 3c). We found several heterozygous deletion events, as well as single-nucleotide variants that we fine-mapped to two main signals: a missense variant in SLCO1B1 (rs34671512, marginal beta = −0.11 sd, p = 1.25 × 10−95) and a non-coding association in an intron of SLCO1B3 (rs11045598, marginal beta = 0.076 sd, p = 1.31 × 10−139). Although there are two PTVs (one in SLCO1B1 and one in SLCO1B3) at the locus, neither had a conditionally independent effect on bilirubin levels. The diversity of variant types at this critical bilirubin and drug transporter suggests that large-effect loci can harbor variants with multiple independent genetic mechanisms contributing to their trait associations.

HGFAC pleiotropy

We scanned for loci with large effects across multiple biomarkers. The most prominent of these is HGFAC, the gene encoding hepatocyte growth factor activator. At this locus, we discovered two independent fine-mapped variants, rs114303452 (a missense variant with MAF = 1%) and rs59950280 (a non-coding variant with MAF = 34%). These two variants show significant associations with a number of diverse biomarker traits, including lipids, IGF-1, albumin, and calcium (Figure 3d). In addition, rs114303452 has been previously associated with serum HGF levels[47], supporting the role of HGFAC in control of a number of other serum biomarkers through regulation of hepatocyte growth factor.

Targeted phenome-wide association analysis

We conducted PheWAS of the fine-mapped imputed variants across 166 UK Biobank phenotypes and identified 14 coding and 263 non-coding associations, of which 109 were not previously reported in the literature (p < 10−7, Supplementary Tables 12, 13b–c, Methods). For example, a common (MAF = 33%) intronic variant in DPEP1 has protective effects against skin cancers (OR = 0.88 [95% CI: 0.86, 0.90], 0.81 [0.77, 0.84], and 0.89 [0.87, 0.91] for skin cancer, malignant melanoma, and non-melanoma skin cancer, respectively), with replication in FinnGen R2 (p = 3.1 × 10−5, OR = 0.81 [95% CI: 0.74, 0.90] for malignant neoplasm of skin). An allelic series of two intronic variants in ABCG2 was identified with urate-increasing and urate-lowering associations that have risk-increasing (p = 2.8 × 10−67, OR = 1.38 [95% CI: 1.33, 1.44]) and protective (OR = 0.72 [95% CI: 0.69, 0.74]) associations with gout, respectively. Both of these associations with gout are also replicated in FinnGen R2 (p = 6.3 × 10−6, OR = 1.25 [95% CI: 1.13, 1.37] and p = 8.4 × 10−5, OR = 0.84 [95% CI: 0.78, 0.92]). Those two variants (r2 = 0.47) are in low linkage disequilibrium with a known common protein-altering variant in ABCG2 (r2 = 0.22 and 0.11 in UKB White British for rs2231142 [Q141K]) which contributes to risk of gout[48]. These results indicate that variants with effects on biomarkers may have pleiotropic effects across medically relevant phenotypes.

Causal inference

Given the relevance of several of the biomarkers studied to disease conditions we estimated causal effects of biomarker levels on 40 medically relevant phenotypes (including 32 diseases; Supplementary Table 15) using two-sample Mendelian Randomization with the genome-wide significant variants for each biomarker as instrumental variables[49-52] (see Methods). We identified 51 significant causal relationships at FDR of 5% (Figure 4a, Supplementary Table 16). Many of these and their causal effects are well-described. We found genetic evidence supporting the protective effect of sex hormone binding globulin on diabetes (0.7 OR/SD) consistent with existing reports[53-54], of ApoA on fasting glucose (0.84 OR/SD, FDR-adjusted p = 0.02)[55-57], as well as an increasing effect of alanine aminotransferase levels on diabetes (1.53 OR/SD, FDR-adjusted p = 0.0018)[58,59]. There was a consistent effect of cystatin C on stroke (1.2 OR/SD, FDR-adjusted p = 8.7×10−4 for any stroke and 1.21 OR/SD, FDR-adjusted p = 2.8×10−3 for ischemic stroke)[60,61]. Finally, both HDL and ApoA were associated with increased risk of age-related macular degeneration[62], as was cystatin C[63-65].

Figure 4.

Causal inference, transferability of polygenic risk scores, and complex trait association in polygenic risk tails.

(a) Mendelian Randomization estimates causal links between biomarkers (blue nodes) and selected complex traits (red nodes). Association arrows are drawn based on effect direction (red decreasing, blue increasing). Associations were adjusted for FDR 5% cutoff across all tests (Methods, Supplementary Table 16). Edge width is proportional to the absolute causal effect size (log odds per standard deviation). (b) Summary of prediction accuracy of the snpnet polygenic scores across traits, evaluated on a held-out test set in White British as well as the other 4 populations in UK Biobank. (c) (x-axis) Biomarker polygenic risk scores for the top 1%, top 10%, bottom 1%, and bottom 10% of individuals and their association to different diseases in UK Biobank, represented as the odds ratio of the disease in this group relative to the 40–60% quantiles. Traits without rows did not have any outcomes with FDR-adjusted significant associations.

We estimated a causal protective effect of testosterone on inflammatory bowel disease (0.70 OR/SD, FDR-adjusted p = 3.86×10−3)[66,67], and a protective effect of urate on breast cancer risk (0.87 OR/SD, FDR-adjusted p = 0.033)[68].
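To make the two-sample Mendelian Randomization step more tangible, here is a minimal sketch of the inverse-variance weighted (IVW) estimator, one common way to combine per-variant effects into a single causal estimate. Whether IVW matches the exact estimator used in the study is an assumption (the details are in Methods); the function name and toy effect sizes are illustrative only.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: variant effects on the biomarker (in SD units)
    beta_outcome:  variant effects on the disease (log odds)
    se_outcome:    standard errors of the outcome effects
    Returns (estimate in log odds per SD, standard error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("need at least one instrument")
    # Each variant's ratio estimate by/bx is weighted by bx^2 / se^2.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Three toy instruments whose outcome effects are exactly half their
# exposure effects, so the causal estimate is 0.5 log odds per SD.
estimate, std_error = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01] * 3)
odds_ratio_per_sd = math.exp(estimate)
```

Exponentiating the estimate gives an odds ratio per standard deviation, the same unit as the OR/SD values quoted in this section.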

Polygenic prediction of biomarkers

The vast size of the UK Biobank cohort affords the opportunity to build predictive polygenic risk models of biomarkers from genotype data alone[69]. We constructed PRS for all 35 biomarkers using batch screening iterative lasso (BASIL) implemented in the R snpnet package[70,71]. Specifically, we split the White British individuals into 70% training, 10% validation (to identify the optimal sparsity parameter), and 20% test sets and evaluated the predictive performance (R2) in the held-out test set (n = 63,818) as well as in 4 populations in UK Biobank (see Methods). We found the mean predictive performance relative to the White British test set for these 4 populations was 93%, 70%, 51%, and 24%, respectively, suggesting these polygenic models have limited generalizability across populations (Figure 4b, Supplementary Figure 7, Supplementary Table 17)[72]. As an external validation, we found that the PRSs had high portability to self-identified white individuals from the MESA cohort (Supplementary Table 18)[73].
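The 70%/10%/20% partition described above can be sketched as follows. This is only an illustration of the splitting idea: the function name, the seed, and the use of integer percentages are assumptions, and the study's actual assignment of individuals is described in Methods.

```python
import random


def split_indices(n, seed=0, train_pct=70, val_pct=10):
    """Randomly partition range(n) into training, validation, and test sets.

    Percentages are integers so the set sizes are computed exactly;
    whatever remains after training and validation becomes the test set.
    """
    if train_pct < 0 or val_pct < 0 or train_pct + val_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    indices = list(range(n))
    # A dedicated generator keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_ids, val_ids, test_ids = split_indices(1000)
```

The validation set would be used only to choose the lasso sparsity parameter, and predictive performance would be reported on the untouched test set.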

Multiple regression with PRSs

We hypothesized that the 35 biomarker PRSs may improve the prediction of higher-level traits and diseases in combination with the PRS for the trait or disease itself. To this end, we constructed multi-PRS models for traits by using multiple regression to predict the trait or disease from a) its own PRS, b) the PRSs for each of the 35 biomarkers, and c) the covariates age, sex, and principal components (Methods). We selected disease endpoints for multi-PRSs analysis by considering the enrichment of disease prevalence at the tails of the distribution of the single-trait biomarker PRSs (Figure 4c, Supplementary Table 19, Supplementary Figure 8). We focused on traits with three or more associated biomarkers (Supplementary Figure 9), as we reasoned these would benefit most from the combination of multiple biomarker PRSs. For chronic kidney disease, the multi-PRS stratified individuals according to disease status better than the snpnet PRS (Figure 5a–b, Supplementary Table 20). In contrast, the myocardial infarction snpnet PRS was equally stratifying as compared to the multi-PRS, with both explaining a substantial portion of trait heritability (AUC 0.58–0.59; Figure 5c). This trend held after including additional existing polygenic scores for type 2 diabetes and myocardial infarction as well (AUC 0.594 and 0.611 respectively; Supplementary Figure 10, Supplementary Tables 20–23). This suggests that the genetic basis of myocardial infarction, as previously reported[74], already captures the majority of the genetic component of serum lipids and other biomarkers. Similar weak effects of biomarkers were estimated in hypertension, angina, and gallstones, while alcoholic cirrhosis, gout, type 2 diabetes, and heart failure were better predicted with multi-PRS models (Figure 5c, Supplementary Table 20). 
Improved predictions relied on relevant and variable biomarkers across these traits (Supplementary Figure 11), including eGFR, creatinine, cystatin C, and bilirubin for CKD; creatinine, bilirubin, total and LDL cholesterol, cystatin C, and eGFR for heart failure; and bilirubin, GGT, eGFR, HDL cholesterol, and IGF-1 for alcoholic cirrhosis.
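The comparison between a single-trait PRS and a multi-PRS can be illustrated with a toy sketch. It combines a trait PRS with biomarker PRSs using fixed weights and scores each by AUC. In practice the weights would be fitted by multiple regression together with covariates such as age, sex, and principal components, which this sketch omits; all names, weights, and data below are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve as the probability that a random case
    scores higher than a random control (ties count as one half)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def multi_prs_score(trait_prs, biomarker_prs, w_trait, w_biomarkers):
    """Linear combination of the trait PRS and each biomarker PRS.

    biomarker_prs is a list of per-biomarker score lists, each aligned
    with trait_prs; w_biomarkers holds one weight per biomarker.
    """
    return [
        w_trait * trait_prs[i]
        + sum(w * prs[i] for w, prs in zip(w_biomarkers, biomarker_prs))
        for i in range(len(trait_prs))
    ]


# Four toy individuals: two controls (0) followed by two cases (1).
labels = [0, 0, 1, 1]
trait_prs = [0.1, 0.4, 0.35, 0.8]
one_biomarker_prs = [[0.0, -0.2, 0.3, 0.1]]

single_auc = auc(labels, trait_prs)
combined = multi_prs_score(trait_prs, one_biomarker_prs, 1.0, [1.0])
multi_auc = auc(labels, combined)
```

In this toy data the combined score separates cases from controls better than the trait PRS alone, which is the pattern reported above for chronic kidney disease; for a trait such as myocardial infarction the two AUCs would be expected to be close.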

Figure 5.

Multiple regression with biomarker polygenic scores improves prevalent and incident disease prediction.

(a) (x-axis) Quantiles of polygenic risk score, spaced to linearly represent the mean of the corresponding bin of scores. (y-axis) Prevalence of chronic kidney disease (n = 2,780 cases and n = 89,409 total, defined by verbal questionnaire and hospital in-patient record ICD code data) within each quantile bin of the polygenic risk score. Error bars represent the standard error around each measurement, and individuals evaluated are held-out European-ancestry individuals in UK Biobank. (b) ROC curve with AUC for chronic kidney disease, comparing the snpnet-derived polygenic score to a multi-PRS model trained across biomarkers as well. Individuals evaluated are held-out European-ancestry individuals in UK Biobank. (c) AUC-ROC estimates for prediction of 10 disease outcomes in a held-out test set of the UK Biobank. Diabetes was run using both a strict definition (excluding from control individuals with HbA1c < 39) and the complete sample (Methods). (d) Hazard ratios for the incidence of type 2 diabetes (n = 17,519), chronic kidney disease (n = 3,058), myocardial infarction (n = 7,913), heart failure (n = 13,965), gout (n = 1,936), gallstones (n = 11,629), and cirrhosis (n = 845) in FinnGen using the standard single-disease PRS trained on UK Biobank using snpnet versus the multi-PRS including both biomarker PRSs and the trait PRS. The strict definition of type 2 diabetes is shown. Error bars represent 95% confidence intervals and points represent mean hazard ratio estimates.

Encouraged by these findings, we evaluated the potential of these improved polygenic scores in identifying disease cases by applying both trait-specific PRS and combined PRS in an independent replication cohort, FinnGen (R3, n = 135,500, Supplementary Tables 24–26). Here, we found evidence that the combination of PRS increased the effect size in chronic kidney disease (hazard ratio (HR) = 0.99, p = 0.46 for snpnet PRS and hazard ratio = 1.12, p = 2.09 × 10−10 for multi-PRS, Figure 5d, Supplementary Figure 12, Supplementary Table 24), type 2 diabetes (HR = 1.37, p < 2 × 10−16 for snpnet PRS and HR = 1.49 for multi-PRS), gout (HR = 1.39, p < 2 × 10−16 for snpnet PRS and HR = 1.58 for multi-PRS), heart failure (HR = 1.01, p = 0.38 for snpnet PRS and HR = 1.08, p < 2 × 10−16 for multi-PRS), and alcoholic cirrhosis (HR = 0.97, p = 0.35 for snpnet PRS and HR = 1.18, p = 1.04 × 10−6 for multi-PRS, Supplementary Table 24). Similar results to UK Biobank were found in models including existing polygenic scores (Supplementary Table 26, Supplementary Figure 13), with the integrated type 2 diabetes model, including both pre-existing PRS and biomarker PRS, resulting in 1.67 HR change per standard deviation. This suggests that multiple regression of polygenic risk for biomarkers might capture multiple underlying disease states and/or underlying causes, and that these multiple states are predictive of disease.

Discussion

Using data from 35 biomarkers in ~363,000 UKB samples, we provide an assessment of genetic associations with biomarker levels, the relevance of these associations in disease phenotypes, and their utility in risk stratification. Protein-altering variants that modify biomarker levels and disease risk can provide in vivo validation of therapeutic targets[75,76]. Here, we found multiple protein-altering variants that directly implicate genes associated with the studied biomarkers, and we hypothesize that some of these genes may provide potential therapeutic targets. To assess the translatability of our findings, we built predictive models aggregating trait PRS with those of the biomarkers, improving the predictive accuracy of multiple disease outcomes both overall and especially at the extremes of genetic risk. Given that biomarker values are already routinely collected in structured data formats, we anticipate that the multi-PRS methods could inform clinical practice in the coming years, as a larger fraction of the population is genotyped and sequenced. In addition to the discovery of multiple individual loci and candidate causal variants, we can also draw some general conclusions across the traits evaluated with our multi-PRS models. Traits and diseases were predicted best when they had individually predictive biomarkers and a complex etiology (e.g. chronic kidney disease), but underpowered genetic studies. We believe that a large number of disease cases is typically most useful in developing well-powered models, as it helps both with the baseline polygenic score and fitting of the multi-PRS components. Further exploration of the conditions where multi-PRS models perform particularly well is an area of future study. Numerous limitations to this work are present. We assigned individuals to ancestry groups based on self-reported ethnicity categories and the top two principal components of the genotype matrix. 
We included many technical covariates in order to reduce bias in the measurements of the biomarkers, but doing so has the potential to reduce power. We fine-map based on imputed genotypes and summary statistics, and both reduce power to detect true causal variants. In addition, the large and complex linkage present at some loci, including notably the LPA locus, might result in spurious fine-mapped and rare coding variant associations, though conditional analyses (e.g. of a rare coding variant in SLC22A2) are inconclusive (Supplementary Figure 14). Similarly, causal inference using individual-level data[77] can increase power and reduce bias, and we recommend it for future studies. Lastly, we anticipate that including other genetic risk scores will fit well into the multi-PRS framework to further improve prediction of common complex disease. The genome-wide resource made available with this study, including the association summary statistics, fine-mapped regions, and polygenic prediction models (Data availability)[78], provides a starting point for causal mapping of genetic variants affecting the 35 biomarkers and their relevance to medical phenotypes. These results highlight the benefits of direct measurements of biomarkers in population cohorts for interpreting the genetic basis of biomarkers and improved prediction of multiple common diseases.

Online Methods

Genotype and phenotype data in UK Biobank

We used genotype datasets from the UK Biobank (release version 2 for the directly genotyped variants and the imputed HLA allelotype datasets and release version 3 for the imputed genotype dataset), the copy number variation dataset[34], and the hg19 human genome reference for all analyses in the study[10]. To minimize the variability due to population structure in our dataset, we restricted our analyses to unrelated individuals based on the following four criteria reported by the UK Biobank in the sample QC file, “ukb_sqc_v2.txt”: 1) used to compute principal components (“used_in_pca_calculation” column); 2) not marked as outliers for heterozygosity and missing rates (“het_missing_outliers” column); 3) do not show putative sex chromosome aneuploidy (“putative_sex_chromosome_aneuploidy” column); and 4) have at most 10 putative third-degree relatives (“excess_relatives” column). Additionally, we used the “in_white_British_ancestry_subset” column in the sample QC file as a part of the population definition as shown below. We used a combination of self-reported ethnicity (UK Biobank field ID: 21000) and principal component analysis and analyzed 5 subpopulations in the study: self-identified White British (n = 337,151 individuals), African (6,498), East Asian (1,772), South Asian (7,962), and self-identified non-British White (24,909). We first used the genotype principal components (PCs) of the genotyped variants from the UK Biobank and defined thresholds on PC1 and PC2 to further refine the population definition (described in Supplementary Note). We subsequently focused on a subset of individuals with non-missing values for covariates and biomarkers as described below.

Variant annotation and quality control

Detailed information on variant annotation and quality control is described in Supplementary Note.

Biomarker phenotype definition

Phenotype and covariate quality control excluded rheumatoid factor and estradiol from further analyses, and fasting glucose (available for 17,439 self-reported fasting individuals) was used as a phenotype-level quality control for the glucose measurements -- throughout the text, “glucose” refers to glucose levels adjusted for fasting time rather than the GWAS among only fasting individuals (self-report of more than 7 and less than 24 hours of fasting, n = 17,439) unless otherwise noted. We focused on 32 biomarkers for genetic analysis and also defined three derived phenotypes, estimated glomerular filtration rate (eGFR), non-albumin protein, and AST to ALT ratio, for a total of 35 biomarkers (Supplementary Table 4a). The eGFR measure is an indicator of renal function and is defined by the CKD-EPI equation[79]. We defined non-albumin protein as the difference between the total protein and albumin. Then after applying covariate correction (see Covariate correction below, Supplementary Table 4b), we additionally defined the AST to ALT ratio as the difference of the (log-transformed) estimates for aspartate aminotransferase and alanine aminotransferase.
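The three derived phenotypes can be illustrated with a minimal Python sketch. It assumes that the CKD-EPI equation cited above is the 2009 creatinine-based form, that creatinine has been converted to mg/dL, and that the AST and ALT inputs are already log-transformed and covariate-adjusted; the function names and the optional `black` coefficient flag are illustrative and are not taken from the study's code.

```python
# Conversion factor for serum creatinine (assumption: input assays report umol/L).
UMOL_PER_L_PER_MG_PER_DL = 88.4


def non_albumin_protein(total_protein, albumin):
    """Non-albumin protein: total protein minus albumin (same units for both)."""
    return total_protein - albumin


def log_ast_alt_ratio(ast_log_adjusted, alt_log_adjusted):
    """AST to ALT ratio on the log scale: a difference of log-scale values."""
    return ast_log_adjusted - alt_log_adjusted


def egfr_ckd_epi_2009(creatinine_mg_dl, age_years, female, black=False):
    """CKD-EPI 2009 creatinine equation (mL/min/1.73 m^2).

    Assumption: this is the form referred to in the text; whether the
    study applied the race coefficient is not stated there.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = creatinine_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr
```

For example, `egfr_ckd_epi_2009(80.0 / UMOL_PER_L_PER_MG_PER_DL, 55, female=True)` converts a creatinine value of 80 µmol/L before applying the equation.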

Statin identification and LDL adjustment

Statin identification and LDL adjustment is described in Supplementary Note.

Covariate correction

Covariate adjustment is described in Supplementary Note.

Definition of type 2 diabetes

We used the definition of type 2 diabetes from our previous paper, including removal of type 1 diabetes cases from both cases and controls[80]. We use the terminology from Eastwood et al. throughout this description[81]. Type 2 diabetes was assigned case status for “probable type 2 diabetes” and “possible type 2 diabetes” and control status for “type 2 diabetes unlikely”; in addition, individuals with “probable type 1 diabetes,” “possible type 1 diabetes,” or “probable gestational diabetes” were excluded. Finally, for the “strict” type 2 diabetes definition, we removed controls with HbA1c ≥ 39 mmol/mol.
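The assignment rules above can be written as a small lookup. This is a sketch only: the label strings follow the categories quoted in the text, while the function name, the return convention, and the handling of a missing HbA1c value under the strict definition (the control is kept) are assumptions.

```python
CASE_LABELS = {"probable type 2 diabetes", "possible type 2 diabetes"}
CONTROL_LABELS = {"type 2 diabetes unlikely"}
EXCLUDED_LABELS = {
    "probable type 1 diabetes",
    "possible type 1 diabetes",
    "probable gestational diabetes",
}
STRICT_HBA1C_CUTOFF = 39.0  # mmol/mol; controls at or above this are dropped


def t2d_status(label, hba1c_mmol_mol=None, strict=False):
    """Return 1 for a case, 0 for a control, None if the individual is excluded."""
    if label in EXCLUDED_LABELS:
        return None
    if label in CASE_LABELS:
        return 1
    if label in CONTROL_LABELS:
        # Strict definition: remove controls with elevated HbA1c.
        # Assumption: controls with a missing HbA1c value are retained.
        if (strict and hba1c_mmol_mol is not None
                and hba1c_mmol_mol >= STRICT_HBA1C_CUTOFF):
            return None
        return 0
    return None  # unrecognized label: treated as excluded in this sketch
```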

Genome-wide association analysis

We performed genome-wide association analyses using the following four datasets: 1) The directly genotyped variants on the array (for protein-truncating and protein-altering variants); 2) The imputed variants (version 3); 3) The imputed HLA alleles; and 4) The copy number variations (CNVs) and gene-level aggregated CNV burden[34]. All the p-values from the association analyses are from two-sided tests. A detailed description of the association analysis is provided in Supplementary Note.

Meta-analysis

Using the GWAS summary statistics for four analyzed populations (White British, Non-British White, South Asian, and African; East Asian GWAS were excluded) in UK Biobank, we performed inverse-variance weighted meta-analysis using METAL (version 2011-03-25) and included heterogeneity of effects analysis. For the summary statistics from the meta-analysis, we checked whether the A1 and A2 alleles match with the alternate and reference allele in GRCh37/hg19 reference genome (fasta file) using bedtools getfasta subcommand[82], and canonicalized our association summary statistics such that we always report the effect size with respect to the alternate allele in the reference genome.
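As an illustration of the two steps described above, the following Python sketch pools per-population effect estimates for one variant with inverse-variance weights, reports Cochran's Q as the heterogeneity statistic, and flips the sign of an effect so that it refers to the alternate allele. It is not METAL itself; the function names are hypothetical and strand flips are not handled.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis for one variant.

    Returns (pooled beta, pooled standard error, Cochran's Q).
    Q is compared against a chi-square with len(betas) - 1 degrees of freedom.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, q


def align_to_alt(beta, effect_allele, ref, alt):
    """Report the effect with respect to the alternate allele of the reference genome."""
    if effect_allele == alt:
        return beta
    if effect_allele == ref:
        return -beta  # effect was given for the reference allele: flip the sign
    raise ValueError("effect allele matches neither REF nor ALT")
```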

Derivation of independent loci

Once we ran the GWAS, full summary statistics were clumped at an r2 threshold of 0.1 using the following clump command: plink1.9 --bfile <1000G Phase 3 European plink file> --clump --clump-p1 1e-6 --clump-p2 1e-4 --clump-r2 0.1 --clump-kb 10000 --clump-field P --clump-snp-field ID

Then, to avoid calling very large signals as multiple associations, these were further filtered such that any SNPs within 0.1 cM of each other (as annotated by 1000 Genomes) were considered part of the same association signal, with the cM annotation derived from the 1000G Phase 3 European samples (n = 489)[24]; among variants within 0.1 cM of each other, only the one with the minimum p-value was retained. In order to report independent signals, we ran the following plink command: plink1.9 --bfile <1000G Phase 3 European plink file> --extract --indep 50 5 2 and counted the number of independent SNPs it reported.
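The genetic-distance filter can be sketched as follows. The sketch assumes that clumped lead SNPs are chained together whenever consecutive SNPs on the same chromosome lie within 0.1 cM, and that each chain is represented by its minimum p-value SNP; the record layout (`id`, `chrom`, `cm`, `p`) and the chaining rule are illustrative choices, not a description of the authors' script.

```python
def merge_signals_by_cm(hits, max_gap_cm=0.1):
    """Collapse clumped lead SNPs that sit within max_gap_cm of each other.

    hits: list of dicts with keys 'id', 'chrom', 'cm' (genetic position) and 'p'.
    Returns one record per merged signal: the SNP with the smallest p-value.
    """
    merged = []
    best = None                      # best (minimum p) SNP of the current signal
    last_chrom, last_cm = None, None
    for hit in sorted(hits, key=lambda h: (h["chrom"], h["cm"])):
        same_signal = (best is not None
                       and hit["chrom"] == last_chrom
                       and hit["cm"] - last_cm <= max_gap_cm)
        if same_signal:
            if hit["p"] < best["p"]:
                best = hit
        else:
            if best is not None:
                merged.append(best)
            best = hit
        last_chrom, last_cm = hit["chrom"], hit["cm"]
    if best is not None:
        merged.append(best)
    return merged
```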

Comparison of effect sizes with published studies

Full summary statistics from comparison studies (PMID in Supplementary Table 5) were downloaded and overlapped with our GWAS summary statistics using the munging framework from LD Score regression to align alleles (modified to additionally report the unnormalized beta). The observed correlation coefficients and linear effect regression coefficients across variants with p < 1 × 10−6 in either study (subthreshold) or p < 5 × 10−8 in our study (GWAS hits) are listed in Supplementary Table 5. Using the same set of comparison studies, we also checked whether the protein-truncating and protein-altering associations were previously reported for a given trait by calling the association reported if the p-value of the variant is less than 1 × 10−6 in any comparison study for a given trait.

Fine-mapping biomarker-associated regions

Independent loci were defined by clumping White British GWAS summary statistics (see section “Derivation of independent loci”). For each putative SNP, we defined distance-independent regions by collating all variants in linkage disequilibrium with the following plink command: plink1.9 --clump-p1 1e-3 --clump-p2 1e-3 --clump-r2 0.0001 --clump-kb 10000 --clump-field P-value --clump-snp-field MarkerName In this way, we defined the individual loci contributing to the fine-mapping. We identified putative causal SNPs in each locus by using the FINEMAP software version 1.3 and 1.4[39]. The output from FINEMAP is (1) a list of potential causal configurations together with their posterior probabilities and Bayes factors, (2) for each SNP, the posterior probability, and Bayes factor of being causal, and (3) credible sets for each identified causal signal. We applied FINEMAP with its default settings while allowing for a maximum of forty causal SNPs and by using pairwise correlations between SNPs computed from the original GWAS genotype data as previously recommended[38]. We ran fine mapping on all associations with more than one variant for which the most significantly associated variant had a p-value less than 1 × 10−3. We filtered regions based on the unique variant ID (in MFI file from UK Biobank) to those regions for which at least one of the variants in the region was annotated as an association lead SNP in our analysis (p < 5 × 10−9).

Heritability estimates

Heritability analysis is described in Supplementary Note.

Targeted phenome-wide association analysis

We curated a list of 166 medically relevant phenotypes from previously-reported binary phenotypes in Global Biobank Engine (GBE)[27,31,76]. Specifically, we selected phenotypes with at least 700 cases in the White British group and removed phenotypes that were likely to be duplicated (Supplementary Table 12). Those phenotypes include non-cancer disease-outcome endpoints derived from a combination of the ICD codes from hospital inpatient records as well as self-reported disease ascertainment status[31], family history phenotype (UK Biobank data category 100034), cancer phenotypes derived from a combination of the UK cancer registry data and questionnaire data[76], and an additional set of medically relevant phenotypes derived from the following data fields in UK Biobank: 2247, 2463, 2834, 3591, 6148, 6149, 6152, 6153, 20126, 20406, 20483, and 21068. For example, the chronic kidney disease phenotype is defined based on the combination of self-reported kidney disease (coded as “1192” in UKB Data coding ID 6) and ICD-10 code (N17 [“Acute kidney failure”], N18 [“Chronic kidney disease (CKD)”], N19 [“Unspecified kidney failure”], and its sub-concepts) from hospital inpatient data (Supplementary Figure 8a), which is visualized with R UpSetR package version 1.4.0. The data source of the phenotype definitions is described in “Source of the phenotype” column in Supplementary Table 12. After performing LD-pruning using PLINK with “--indep 50 5 2” as previously described[27,76], we prioritized (A) the 632 LD-independent protein-altering or protein-truncating variants outside of the MHC region that showed significant associations (p < 5×10−9) on the genotyping array, as well as (B) 43 non-synonymous and (C) 2,442 synonymous or non-coding variants with significant associations (p < 5×10−9) from the imputation dataset (Supplementary Tables 13 a–c). We applied the PheWAS analysis for those variants with a p-value threshold of p < 1×10−7.
For the resulting associations, we checked the NHGRI-EBI GWAS catalog to see whether they had already been reported in previous studies[83]. Specifically, we identified the LD proxies (r2 > 0.9) of the PheWAS target variants and manually inspected the reported associations for those variants. For associations with no supporting prior studies, we additionally queried Open Target Genetics and manually assessed the novelty of the associations[84]. In addition, we also checked the FinnGen study (Freeze R2, http://r2.finngen.fi/) and asked whether the PheWAS target variants and their LD proxies have similar associations. Those PheWAS results and the references to the prior association reports are summarized in Supplementary Tables 13 a–c. For the CNV PheWAS, we queried summary statistics from previous CNV association tests for the 173 traits of interest[34]. Results for a burden of HNF1B copy number variation are shown in Supplementary Figure 6a, along with the corresponding meta-analyzed summary statistics for biomarker traits described in this work.

Correlation of genetic effects across relevant phenotypes

We used LD Score regression in genetic correlation mode to estimate genetic correlation effects between biomarkers and the 166 medically relevant phenotypes used in the PheWAS analysis. The exact arguments were: ldsc.py --rg --ref-ld-chr ldsc/1000G.EUR.QC/ --w-ld-chr ldsc/weights_hm3_no_hla/weights. For the final results, all lead variants with p < 5 × 10−8 were kept for the Mendelian Randomization analyses. All MR calculations were done using TwoSampleMR, which was also used to perform trait munging[85]. We used the Rücker model-selection framework for causal inference as follows[50-52,86]. For each exposure-outcome pair, we started with a simple fixed-effects inverse variance weighted (FE-IVW) MR analysis and computed the model’s significance and the Q statistic for heterogeneity. If the p-value of Q was < 0.01, we took it as evidence for heterogeneity and switched to a mixed-effects IVW (ME-IVW) model instead. We then computed an MR-Egger model and compared it to the selected IVW model. Let Q be the Q statistic of the IVW model and Q′ be the Q statistic of the MR-Egger model. We computed the significance of the difference Q − Q′ using a χ² distribution with one degree of freedom and switched to the MR-Egger model if the result was significant (p < 0.01). The significance of all selected models was adjusted using a BY FDR correction at 5%[87]. Network visualization of the results was done using Cytoscape version 3.7 and 3.8[88].
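The model-selection steps can be made concrete with a standard-library Python sketch operating on per-variant summary statistics. It is not the TwoSampleMR implementation: the function names are hypothetical, weights are taken as the inverse variance of the outcome effects, variants are oriented so that exposure effects are positive (an assumption commonly made for MR-Egger), and only the selected model and its point estimate are returned.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        a, sf = 0.5, math.erfc(math.sqrt(z))   # df = 1
    else:
        a, sf = 1.0, math.exp(-z)              # df = 2
    while a < df / 2.0:
        # Recurrence Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Gamma(a + 1)
        sf += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(sf, 1.0)


def rucker_select(beta_exp, beta_out, se_out, alpha=0.01):
    """Choose FE-IVW, ME-IVW or MR-Egger; needs at least three variants."""
    # Orient each variant so that its exposure effect is positive.
    x = [abs(bx) for bx in beta_exp]
    y = [by if bx >= 0 else -by for bx, by in zip(beta_exp, beta_out)]
    w = [1.0 / s ** 2 for s in se_out]
    n = len(x)
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

    # IVW: weighted regression through the origin, with Cochran's Q.
    b_ivw = swxy / swxx
    q_ivw = sum(wi * (yi - b_ivw * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    # Heterogeneity changes the standard error, not the point estimate.
    model = "ME-IVW" if chi2_sf(q_ivw, n - 1) < alpha else "FE-IVW"

    # MR-Egger: weighted regression with an intercept, with Ruecker's Q'.
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    q_egger = sum(wi * (yi - intercept - slope * xi) ** 2
                  for wi, xi, yi in zip(w, x, y))

    # Q - Q' is referred to a chi-square distribution with one degree of freedom.
    if chi2_sf(q_ivw - q_egger, 1) < alpha:
        return "MR-Egger", slope
    return model, b_ivw
```

The selected models' p-values would then be corrected jointly across all exposure-outcome pairs, which this sketch leaves out.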

Polygenic prediction within and across populations

To construct polygenic risk scores for each of the traits, we applied the batch screening iterative lasso (BASIL) algorithm implemented in the R snpnet package. This method is capable of finding the exact solution for L1-penalized multivariate regression (lasso) on an ultra-high dimensional large dataset through an iterative procedure built on top of the R glmnet package[70,71,89]. Because this method considers all of the genetic variants available in the input dataset and performs variable selection and multivariate regression fit simultaneously, it is suitable for polygenic risk prediction from a large-scale dataset. We randomly split the White British individuals into training (70%, n = 223,327 with non-missing phenotype for at least one biomarker trait), validation (10%, n = 31,929), and test (20%, n = 63,818) sets and used both training and validation sets to fit multivariate Lasso regression models. The validation set is used to find the optimal penalization (sparsity) parameter with respect to the predictive performance (R2). To maximize the performance of polygenic prediction, we combined the directly genotyped variants, the imputed HLA allelotypes, and the CNV dataset with PLINK version 1.9, and used the result as the input genotype dataset, which consists of 1,080,968 variants. For each biomarker phenotype, we applied the R snpnet package to the log-transformed and covariate-adjusted phenotypes and obtained the regression coefficients (BETAs)[71]. Using the beta values from multivariate Lasso regression, we computed the polygenic risk score for each individual with the PLINK2 --score subcommand[90]. To evaluate the performance of the models, we computed R2 values for log-transformed phenotypes using individuals in the held-out White British test set (n = 63,818), as well as self-identified non-British white (n = 23,595), African (n = 6,021), South Asian (n = 7,341), and East Asian (n = 1,082) populations. 
To assess the incremental predictive performance compared to the covariates, we evaluated the R2 values for the risk score computed from the covariate (defined as the difference between the log-transformed phenotype value and log-transformed and covariate-adjusted phenotype values) as well as the combined risk score (the sum of the covariate score and genotype PRS, Supplementary Table 17a). Polygenic score accuracy was generally independent of residualization strategy (Supplementary Table 17b). For the evaluation of multi-PRS models, we also trained snpnet PRS models for disease outcomes using the R snpnet package in the same way as in the biomarker phenotypes, except that we used the binomial family for logistic regression and AUC as the criterion to select the sparsity parameter. Evaluation of snpnet PRS models with MESA cohort is described in Supplementary Note.
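To make the scoring and evaluation steps concrete, the sketch below computes a polygenic score as a weighted sum of allele dosages and evaluates it with R2. Two assumptions are made: the score is a plain sum (PLINK 2 can report sums or per-allele averages depending on its options), and R2 is taken as the squared Pearson correlation between observed and predicted values. The function names are illustrative.

```python
def polygenic_score(dosages, betas):
    """Sum of beta * dosage over variants that have a non-zero lasso coefficient.

    dosages: dict variant ID -> effect-allele dosage (0 to 2) for one individual.
    betas:   dict variant ID -> regression coefficient from the fitted model.
    """
    return sum(betas[v] * d for v, d in dosages.items() if v in betas)


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)
```

In the same spirit, the combined risk score described above would be the sum of the covariate score and the value returned by `polygenic_score` for each individual.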

Single-trait biomarker PRS-PheWAS

We started by enumerating all our 166 high-confidence traits which were replicated between ICD codes and self-reported, cancer, family history, and manually curated traits[31,76] described in the PheWAS analysis above. For each of the 35 biomarkers, we used R’s fisher.test implementation of the Fisher’s Exact test between the 40–60 percentile and the top and bottom 1%ile and 1–10%ile of PRS in the union of the unrelated non-British White and held-out test set of unrelated White British individuals. We then corrected for multiple hypotheses using a Bonferroni-adjusted q-value less than 5% within each biomarker and reported the enrichment as the odds ratio estimate from Fisher’s exact test.
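The enrichment test for one PRS bin against the 40–60 percentile reference bin can be sketched with a two-sided Fisher's exact test on a 2×2 table of case and control counts. Note one deliberate simplification: the sketch reports the sample odds ratio (a·d)/(b·c), whereas R's fisher.test reports a conditional maximum-likelihood estimate, so the two odds ratios will differ slightly. The function name and table layout are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: extreme PRS bin, reference bin. Columns: cases, controls.
    Returns (sample odds ratio, p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k cases in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(prob(k) for k in range(lo, hi + 1)
                  if prob(k) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, min(p_value, 1.0)
```

With biobank-scale counts the exact enumeration becomes slow, so this form is meant for illustration rather than production use.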

Models for multi-PRS prediction of disease outcomes

In order to perform out-of-sample validation, we trained L1-regularized logistic regression models with glmnet using just the 35 biomarker PRSs and the snpnet PRS for the trait of interest as predictors[70]. Results were evaluated using the area under the receiver operating characteristic curve (AUC-ROC) in the union of the held-out test set of self-identified White British individuals and all unrelated, self-identified non-British White individuals for which the corresponding phenotype was available (as used in the cross-population testing; see above). We also performed the lasso regression additionally including age, sex, genotyping array, and the top ten global principal components of the genotyping matrix as covariates for each outcome (referred to as “Age/sex/PCs”); additional information is provided in Supplementary Note. Finally, we derived versions of the multi-PRS model with these covariates and relevant pre-existing polygenic scores for gallstones[91], type 2 diabetes[92,93], and heart attack[94,95] in the model as well (Supplementary Table 20). We refer to models trained just on covariates and trait polygenic scores as “Baseline” models and those which additionally include the 35 biomarker PRSs as “multi-PRS” models throughout the manuscript.
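The AUC-ROC used to compare Baseline and multi-PRS models has a simple rank interpretation: it is the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison, which is quadratic in sample size and therefore only meant as an illustration; the function name is hypothetical.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve from binary labels (1 = case) and model scores."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5   # ties contribute one half
    return wins / (len(cases) * len(controls))
```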

Evaluation of multi-PRS prediction in an external cohort

The FinnGen Data Freeze 3 comprised 135,300 Finnish participants, with phenotypes derived from International Classification of Diseases (8th, 9th, and 10th revision) diagnosis codes obtained from national registries, including the national Finnish hospital discharge and cause-of-death registries as a part of the FinnGen project (Supplementary Table 24). FinnGen samples were genotyped with Illumina and Affymetrix arrays (Thermo Fisher Scientific, Santa Clara, CA, USA). Genotype imputation was carried out by using the population-specific SISu v3 imputation reference panel with Beagle 4.1 (version 08Jun17.d8b, https://faculty.washington.edu/browning/beagle/b4_1.html) as described in the following protocol: dx.doi.org/10.17504/protocols.io.nmndc5e. Post-imputation quality control involved excluding variants with INFO score < 0.6. We estimated a full weighting matrix for each SNP from the corresponding coefficients of the regression model, then applied the per-SNP weighted model to individuals in FinnGen. To assess the risk for first disease events, hazard ratios, and 95% confidence intervals per SD increment were estimated with Cox proportional hazards models after evaluation of the proportionality assumption. For the comparison, on type 2 diabetes and myocardial infarction, with models including the existing polygenic scores, scores were standardized within the population before being applied using the weights of standardized PRSs (Supplementary Table 22–23) in order to capture the differences in SNP sets used across the scores. An R script to perform these analyses, which takes as input the raw PRS for each outcome of interest, is available in the code repository. With age as the time scale, the survival models were stratified by sex and adjusted for batch, and the first ten principal components of ancestry calculated within Finns.
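Because a multi-PRS is a linear combination of PRSs, and each PRS is itself a linear combination of genotypes, the whole model can be collapsed into a single weight per SNP before it is applied to an external cohort. The sketch below illustrates that algebra under stated assumptions: `model_coefs` holds the regression coefficient of each PRS, `prs_sds` optionally holds the standard deviation used to standardize each PRS, and the constant shift from centering is ignored because it does not change the ranking of individuals. All names are hypothetical and this is not the script from the code repository.

```python
def combine_snp_weights(prs_betas, model_coefs, prs_sds=None):
    """Collapse a multi-PRS model into one weight per SNP.

    prs_betas:   dict PRS name -> dict of SNP ID -> per-SNP beta.
    model_coefs: dict PRS name -> coefficient of that PRS in the multi-PRS model.
    prs_sds:     optional dict PRS name -> SD of the PRS, for coefficients that
                 were fitted on standardized scores.
    """
    combined = {}
    for name, betas in prs_betas.items():
        coef = model_coefs.get(name, 0.0)
        if prs_sds is not None:
            coef /= prs_sds[name]   # undo the standardization of this PRS
        if coef == 0.0:
            continue                # PRS dropped by the L1 penalty
        for snp, beta in betas.items():
            combined[snp] = combined.get(snp, 0.0) + coef * beta
    return combined
```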

Statistics

For computational and statistical analysis, we used Jupyter notebook with Python (3.6 and 2.7) and R kernels (http://jupyter.org/), R (version 3.5.2 and 3.4.0), R studio (3.5.2), R tidyverse package version 1.3.0, and Stata version 15. Software and packages used for specific analysis are listed in the corresponding subsection above. The p-values are computed from two-sided tests, unless otherwise specified.

71 in total

1. Rx exercise: effects and side effects.

Authors: R E Leach
Journal: Hosp Pract (Off Ed) Date: 1981-01

2. Meta-analysis identifies multiple loci associated with kidney function-related traits in east Asian populations.

Authors: Yukinori Okada; Xueling Sim; Min Jin Go; Jer-Yuarn Wu; Dongfeng Gu; Fumihiko Takeuchi; Atsushi Takahashi; Shiro Maeda; Tatsuhiko Tsunoda; Peng Chen; Su-Chi Lim; Tien-Yin Wong; Jianjun Liu; Terri L Young; Tin Aung; Mark Seielstad; Yik-Ying Teo; Young Jin Kim; Jong-Young Lee; Bok-Ghee Han; Daehee Kang; Chien-Hsiun Chen; Fuu-Jen Tsai; Li-Ching Chang; S-J Cathy Fann; Hao Mei; Dabeeru C Rao; James E Hixson; Shufeng Chen; Tomohiro Katsuya; Masato Isono; Toshio Ogihara; John C Chambers; Weihua Zhang; Jaspal S Kooner; Eva Albrecht; Kazuhiko Yamamoto; Michiaki Kubo; Yusuke Nakamura; Naoyuki Kamatani; Norihiro Kato; Jiang He; Yuan-Tsong Chen; Yoon Shin Cho; E-Shyong Tai; Toshihiro Tanaka
Journal: Nat Genet Date: 2012-07-15 Impact factor: 38.330

3. A catalog of genetic loci associated with kidney function from analyses of a million individuals.

Authors: Matthias Wuttke; Yong Li; Man Li; Karsten B Sieber; Mary F Feitosa; Mathias Gorski; Adrienne Tin; Lihua Wang; Audrey Y Chu; Anselm Hoppmann; Holger Kirsten; Ayush Giri; Jin-Fang Chai; Gardar Sveinbjornsson; Bamidele O Tayo; Teresa Nutile; Christian Fuchsberger; Jonathan Marten; Massimiliano Cocca; Sahar Ghasemi; Yizhe Xu; Katrin Horn; Damia Noce; Peter J van der Most; Sanaz Sedaghat; Zhi Yu; Masato Akiyama; Saima Afaq; Tarunveer S Ahluwalia; Peter Almgren; Najaf Amin; Johan Ärnlöv; Stephan J L Bakker; Nisha Bansal; Daniela Baptista; Sven Bergmann; Mary L Biggs; Ginevra Biino; Michael Boehnke; Eric Boerwinkle; Mathilde Boissel; Erwin P Bottinger; Thibaud S Boutin; Hermann Brenner; Marco Brumat; Ralph Burkhardt; Adam S Butterworth; Eric Campana; Archie Campbell; Harry Campbell; Mickaël Canouil; Robert J Carroll; Eulalia Catamo; John C Chambers; Miao-Ling Chee; Miao-Li Chee; Xu Chen; Ching-Yu Cheng; Yurong Cheng; Kaare Christensen; Renata Cifkova; Marina Ciullo; Maria Pina Concas; James P Cook; Josef Coresh; Tanguy Corre; Cinzia Felicita Sala; Daniele Cusi; John Danesh; E Warwick Daw; Martin H de Borst; Alessandro De Grandi; Renée de Mutsert; Aiko P J de Vries; Frauke Degenhardt; Graciela Delgado; Ayse Demirkan; Emanuele Di Angelantonio; Katalin Dittrich; Jasmin Divers; Rajkumar Dorajoo; Kai-Uwe Eckardt; Georg Ehret; Paul Elliott; Karlhans Endlich; Michele K Evans; Janine F Felix; Valencia Hui Xian Foo; Oscar H Franco; Andre Franke; Barry I Freedman; Sandra Freitag-Wolf; Yechiel Friedlander; Philippe Froguel; Ron T Gansevoort; He Gao; Paolo Gasparini; J Michael Gaziano; Vilmantas Giedraitis; Christian Gieger; Giorgia Girotto; Franco Giulianini; Martin Gögele; Scott D Gordon; Daniel F Gudbjartsson; Vilmundur Gudnason; Toomas Haller; Pavel Hamet; Tamara B Harris; Catharina A Hartman; Caroline Hayward; Jacklyn N Hellwege; Chew-Kiat Heng; Andrew A Hicks; Edith Hofer; Wei Huang; Nina Hutri-Kähönen; Shih-Jen Hwang; M Arfan Ikram; Olafur S Indridason; Erik 
Ingelsson; Marcus Ising; Vincent W V Jaddoe; Johanna Jakobsdottir; Jost B Jonas; Peter K Joshi; Navya Shilpa Josyula; Bettina Jung; Mika Kähönen; Yoichiro Kamatani; Candace M Kammerer; Masahiro Kanai; Mika Kastarinen; Shona M Kerr; Chiea-Chuen Khor; Wieland Kiess; Marcus E Kleber; Wolfgang Koenig; Jaspal S Kooner; Antje Körner; Peter Kovacs; Aldi T Kraja; Alena Krajcoviechova; Holly Kramer; Bernhard K Krämer; Florian Kronenberg; Michiaki Kubo; Brigitte Kühnel; Mikko Kuokkanen; Johanna Kuusisto; Martina La Bianca; Markku Laakso; Leslie A Lange; Carl D Langefeld; Jeannette Jen-Mai Lee; Benjamin Lehne; Terho Lehtimäki; Wolfgang Lieb; Su-Chi Lim; Lars Lind; Cecilia M Lindgren; Jun Liu; Jianjun Liu; Markus Loeffler; Ruth J F Loos; Susanne Lucae; Mary Ann Lukas; Leo-Pekka Lyytikäinen; Reedik Mägi; Patrik K E Magnusson; Anubha Mahajan; Nicholas G Martin; Jade Martins; Winfried März; Deborah Mascalzoni; Koichi Matsuda; Christa Meisinger; Thomas Meitinger; Olle Melander; Andres Metspalu; Evgenia K Mikaelsdottir; Yuri Milaneschi; Kozeta Miliku; Pashupati P Mishra; Karen L Mohlke; Nina Mononen; Grant W Montgomery; Dennis O Mook-Kanamori; Josyf C Mychaleckyj; Girish N Nadkarni; Mike A Nalls; Matthias Nauck; Kjell Nikus; Boting Ning; Ilja M Nolte; Raymond Noordam; Jeffrey O'Connell; Michelle L O'Donoghue; Isleifur Olafsson; Albertine J Oldehinkel; Marju Orho-Melander; Willem H Ouwehand; Sandosh Padmanabhan; Nicholette D Palmer; Runolfur Palsson; Brenda W J H Penninx; Thomas Perls; Markus Perola; Mario Pirastu; Nicola Pirastu; Giorgio Pistis; Anna I Podgornaia; Ozren Polasek; Belen Ponte; David J Porteous; Tanja Poulain; Peter P Pramstaller; Michael H Preuss; Bram P Prins; Michael A Province; Ton J Rabelink; Laura M Raffield; Olli T Raitakari; Dermot F Reilly; Rainer Rettig; Myriam Rheinberger; Kenneth M Rice; Paul M Ridker; Fernando Rivadeneira; Federica Rizzi; David J Roberts; Antonietta Robino; Peter Rossing; Igor Rudan; Rico Rueedi; Daniela Ruggiero; Kathleen A Ryan; Yasaman 
Saba; Charumathi Sabanayagam; Veikko Salomaa; Erika Salvi; Kai-Uwe Saum; Helena Schmidt; Reinhold Schmidt; Ben Schöttker; Christina-Alexandra Schulz; Nicole Schupf; Christian M Shaffer; Yuan Shi; Albert V Smith; Blair H Smith; Nicole Soranzo; Cassandra N Spracklen; Konstantin Strauch; Heather M Stringham; Michael Stumvoll; Per O Svensson; Silke Szymczak; E-Shyong Tai; Salman M Tajuddin; Nicholas Y Q Tan; Kent D Taylor; Andrej Teren; Yih-Chung Tham; Joachim Thiery; Chris H L Thio; Hauke Thomsen; Gudmar Thorleifsson; Daniela Toniolo; Anke Tönjes; Johanne Tremblay; Ioanna Tzoulaki; André G Uitterlinden; Simona Vaccargiu; Rob M van Dam; Pim van der Harst; Cornelia M van Duijn; Digna R Velez Edward; Niek Verweij; Suzanne Vogelezang; Uwe Völker; Peter Vollenweider; Gerard Waeber; Melanie Waldenberger; Lars Wallentin; Ya Xing Wang; Chaolong Wang; Dawn M Waterworth; Wen Bin Wei; Harvey White; John B Whitfield; Sarah H Wild; James F Wilson; Mary K Wojczynski; Charlene Wong; Tien-Yin Wong; Liang Xu; Qiong Yang; Masayuki Yasuda; Laura M Yerges-Armstrong; Weihua Zhang; Alan B Zonderman; Jerome I Rotter; Murielle Bochud; Bruce M Psaty; Veronique Vitart; James G Wilson; Abbas Dehghan; Afshin Parsa; Daniel I Chasman; Kevin Ho; Andrew P Morris; Olivier Devuyst; Shreeram Akilesh; Sarah A Pendergrass; Xueling Sim; Carsten A Böger; Yukinori Okada; Todd L Edwards; Harold Snieder; Kari Stefansson; Adriana M Hung; Iris M Heid; Markus Scholz; Alexander Teumer; Anna Köttgen; Cristian Pattaro
Journal: Nat Genet Date: 2019-05-31 Impact factor: 38.330

4. Exome-wide association study of plasma lipids in >300,000 individuals.

Authors: Dajiang J Liu; Gina M Peloso; Haojie Yu; Adam S Butterworth; Xiao Wang; Anubha Mahajan; Danish Saleheen; Connor Emdin; Dewan Alam; Alexessander Couto Alves; Philippe Amouyel; Emanuele Di Angelantonio; Dominique Arveiler; Themistocles L Assimes; Paul L Auer; Usman Baber; Christie M Ballantyne; Lia E Bang; Marianne Benn; Joshua C Bis; Michael Boehnke; Eric Boerwinkle; Jette Bork-Jensen; Erwin P Bottinger; Ivan Brandslund; Morris Brown; Fabio Busonero; Mark J Caulfield; John C Chambers; Daniel I Chasman; Y Eugene Chen; Yii-Der Ida Chen; Rajiv Chowdhury; Cramer Christensen; Audrey Y Chu; John M Connell; Francesco Cucca; L Adrienne Cupples; Scott M Damrauer; Gail Davies; Ian J Deary; George Dedoussis; Joshua C Denny; Anna Dominiczak; Marie-Pierre Dubé; Tapani Ebeling; Gudny Eiriksdottir; Tõnu Esko; Aliki-Eleni Farmaki; Mary F Feitosa; Marco Ferrario; Jean Ferrieres; Ian Ford; Myriam Fornage; Paul W Franks; Timothy M Frayling; Ruth Frikke-Schmidt; Lars G Fritsche; Philippe Frossard; Valentin Fuster; Santhi K Ganesh; Wei Gao; Melissa E Garcia; Christian Gieger; Franco Giulianini; Mark O Goodarzi; Harald Grallert; Niels Grarup; Leif Groop; Megan L Grove; Vilmundur Gudnason; Torben Hansen; Tamara B Harris; Caroline Hayward; Joel N Hirschhorn; Oddgeir L Holmen; Jennifer Huffman; Yong Huo; Kristian Hveem; Sehrish Jabeen; Anne U Jackson; Johanna Jakobsdottir; Marjo-Riitta Jarvelin; Gorm B Jensen; Marit E Jørgensen; J Wouter Jukema; Johanne M Justesen; Pia R Kamstrup; Stavroula Kanoni; Fredrik Karpe; Frank Kee; Amit V Khera; Derek Klarin; Heikki A Koistinen; Jaspal S Kooner; Charles Kooperberg; Kari Kuulasmaa; Johanna Kuusisto; Markku Laakso; Timo Lakka; Claudia Langenberg; Anne Langsted; Lenore J Launer; Torsten Lauritzen; David C M Liewald; Li An Lin; Allan Linneberg; Ruth J F Loos; Yingchang Lu; Xiangfeng Lu; Reedik Mägi; Anders Malarstig; Ani Manichaikul; Alisa K Manning; Pekka Mäntyselkä; Eirini Marouli; Nicholas G D Masca; Andrea Maschio; James B Meigs; Olle Melander; Andres Metspalu; Andrew P Morris; Alanna C Morrison; Antonella Mulas; Martina Müller-Nurasyid; Patricia B Munroe; Matt J Neville; Jonas B Nielsen; Sune F Nielsen; Børge G Nordestgaard; Jose M Ordovas; Roxana Mehran; Christopher J O'Donnell; Marju Orho-Melander; Cliona M Molony; Pieter Muntendam; Sandosh Padmanabhan; Colin N A Palmer; Dorota Pasko; Aniruddh P Patel; Oluf Pedersen; Markus Perola; Annette Peters; Charlotta Pisinger; Giorgio Pistis; Ozren Polasek; Neil Poulter; Bruce M Psaty; Daniel J Rader; Asif Rasheed; Rainer Rauramaa; Dermot F Reilly; Alex P Reiner; Frida Renström; Stephen S Rich; Paul M Ridker; John D Rioux; Neil R Robertson; Dan M Roden; Jerome I Rotter; Igor Rudan; Veikko Salomaa; Nilesh J Samani; Serena Sanna; Naveed Sattar; Ellen M Schmidt; Robert A Scott; Peter Sever; Raquel S Sevilla; Christian M Shaffer; Xueling Sim; Suthesh Sivapalaratnam; Kerrin S Small; Albert V Smith; Blair H Smith; Sangeetha Somayajula; Lorraine Southam; Timothy D Spector; Elizabeth K Speliotes; John M Starr; Kathleen E Stirrups; Nathan Stitziel; Konstantin Strauch; Heather M Stringham; Praveen Surendran; Hayato Tada; Alan R Tall; Hua Tang; Jean-Claude Tardif; Kent D Taylor; Stella Trompet; Philip S Tsao; Jaakko Tuomilehto; Anne Tybjaerg-Hansen; Natalie R van Zuydam; Anette Varbo; Tibor V Varga; Jarmo Virtamo; Melanie Waldenberger; Nan Wang; Nick J Wareham; Helen R Warren; Peter E Weeke; Joshua Weinstock; Jennifer Wessel; James G Wilson; Peter W F Wilson; Ming Xu; Hanieh Yaghootkar; Robin Young; Eleftheria Zeggini; He Zhang; Neil S Zheng; Weihua Zhang; Yan Zhang; Wei Zhou; Yanhua Zhou; Magdalena Zoledziewska; Joanna M M Howson; John Danesh; Mark I McCarthy; Chad A Cowan; Goncalo Abecasis; Panos Deloukas; Kiran Musunuru; Cristen J Willer; Sekar Kathiresan
Journal: Nat Genet Date: 2017-10-30 Impact factor: 38.330

5. Identification and functional analysis of glycemic trait loci in the China Health and Nutrition Survey.

Authors: Cassandra N Spracklen; Jinxiu Shi; Swarooparani Vadlamudi; Ying Wu; Meng Zou; Chelsea K Raulerson; James P Davis; Monica Zeynalzadeh; Kayla Jackson; Wentao Yuan; Haifeng Wang; Weihua Shou; Ying Wang; Jingchun Luo; Leslie A Lange; Ethan M Lange; Barry M Popkin; Penny Gordon-Larsen; Shufa Du; Wei Huang; Karen L Mohlke
Journal: PLoS Genet Date: 2018-04-05 Impact factor: 5.917

6. Sex-specific and pleiotropic effects underlying kidney function identified from GWAS meta-analysis.

Authors: Sarah E Graham; Jonas B Nielsen; Matthew Zawistowski; Wei Zhou; Lars G Fritsche; Maiken E Gabrielsen; Anne Heidi Skogholt; Ida Surakka; Whitney E Hornsby; Damian Fermin; Daniel B Larach; Sachin Kheterpal; Chad M Brummett; Seunggeun Lee; Hyun Min Kang; Goncalo R Abecasis; Solfrid Romundstad; Stein Hallan; Matthew G Sampson; Kristian Hveem; Cristen J Willer
Journal: Nat Commun Date: 2019-04-23 Impact factor: 14.919

7. Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways.

Authors: Robert A Scott; Vasiliki Lagou; Ryan P Welch; Eleanor Wheeler; May E Montasser; Jian'an Luan; Reedik Mägi; Rona J Strawbridge; Emil Rehnberg; Stefan Gustafsson; Stavroula Kanoni; Laura J Rasmussen-Torvik; Loïc Yengo; Cecile Lecoeur; Dmitry Shungin; Serena Sanna; Carlo Sidore; Paul C D Johnson; J Wouter Jukema; Toby Johnson; Anubha Mahajan; Niek Verweij; Gudmar Thorleifsson; Jouke-Jan Hottenga; Sonia Shah; Albert V Smith; Bengt Sennblad; Christian Gieger; Perttu Salo; Markus Perola; Nicholas J Timpson; David M Evans; Beate St Pourcain; Ying Wu; Jeanette S Andrews; Jennie Hui; Lawrence F Bielak; Wei Zhao; Momoko Horikoshi; Pau Navarro; Aaron Isaacs; Jeffrey R O'Connell; Kathleen Stirrups; Veronique Vitart; Caroline Hayward; Tõnu Esko; Evelin Mihailov; Ross M Fraser; Tove Fall; Benjamin F Voight; Soumya Raychaudhuri; Han Chen; Cecilia M Lindgren; Andrew P Morris; Nigel W Rayner; Neil Robertson; Denis Rybin; Ching-Ti Liu; Jacques S Beckmann; Sara M Willems; Peter S Chines; Anne U Jackson; Hyun Min Kang; Heather M Stringham; Kijoung Song; Toshiko Tanaka; John F Peden; Anuj Goel; Andrew A Hicks; Ping An; Martina Müller-Nurasyid; Anders Franco-Cereceda; Lasse Folkersen; Letizia Marullo; Hanneke Jansen; Albertine J Oldehinkel; Marcel Bruinenberg; James S Pankow; Kari E North; Nita G Forouhi; Ruth J F Loos; Sarah Edkins; Tibor V Varga; Göran Hallmans; Heikki Oksa; Mulas Antonella; Ramaiah Nagaraja; Stella Trompet; Ian Ford; Stephan J L Bakker; Augustine Kong; Meena Kumari; Bruna Gigante; Christian Herder; Patricia B Munroe; Mark Caulfield; Jula Antti; Massimo Mangino; Kerrin Small; Iva Miljkovic; Yongmei Liu; Mustafa Atalay; Wieland Kiess; Alan L James; Fernando Rivadeneira; Andre G Uitterlinden; Colin N A Palmer; Alex S F Doney; Gonneke Willemsen; Johannes H Smit; Susan Campbell; Ozren Polasek; Lori L Bonnycastle; Serge Hercberg; Maria Dimitriou; Jennifer L Bolton; Gerard R Fowkes; Peter Kovacs; Jaana Lindström; Tatijana Zemunik; Stefania Bandinelli; Sarah H Wild; Hanneke V Basart; Wolfgang Rathmann; Harald Grallert; Winfried Maerz; Marcus E Kleber; Bernhard O Boehm; Annette Peters; Peter P Pramstaller; Michael A Province; Ingrid B Borecki; Nicholas D Hastie; Igor Rudan; Harry Campbell; Hugh Watkins; Martin Farrall; Michael Stumvoll; Luigi Ferrucci; Dawn M Waterworth; Richard N Bergman; Francis S Collins; Jaakko Tuomilehto; Richard M Watanabe; Eco J C de Geus; Brenda W Penninx; Albert Hofman; Ben A Oostra; Bruce M Psaty; Peter Vollenweider; James F Wilson; Alan F Wright; G Kees Hovingh; Andres Metspalu; Matti Uusitupa; Patrik K E Magnusson; Kirsten O Kyvik; Jaakko Kaprio; Jackie F Price; George V Dedoussis; Panos Deloukas; Pierre Meneton; Lars Lind; Michael Boehnke; Alan R Shuldiner; Cornelia M van Duijn; Andrew D Morris; Anke Toenjes; Patricia A Peyser; John P Beilby; Antje Körner; Johanna Kuusisto; Markku Laakso; Stefan R Bornstein; Peter E H Schwarz; Timo A Lakka; Rainer Rauramaa; Linda S Adair; George Davey Smith; Tim D Spector; Thomas Illig; Ulf de Faire; Anders Hamsten; Vilmundur Gudnason; Mika Kivimaki; Aroon Hingorani; Sirkka M Keinanen-Kiukaanniemi; Timo E Saaristo; Dorret I Boomsma; Kari Stefansson; Pim van der Harst; Josée Dupuis; Nancy L Pedersen; Naveed Sattar; Tamara B Harris; Francesco Cucca; Samuli Ripatti; Veikko Salomaa; Karen L Mohlke; Beverley Balkau; Philippe Froguel; Anneli Pouta; Marjo-Riitta Jarvelin; Nicholas J Wareham; Nabila Bouatia-Naji; Mark I McCarthy; Paul W Franks; James B Meigs; Tanya M Teslovich; Jose C Florez; Claudia Langenberg; Erik Ingelsson; Inga Prokopenko; Inês Barroso
Journal: Nat Genet Date: 2012-08-12 Impact factor: 38.330

8. Discovery and refinement of loci associated with lipid levels.

Authors: Cristen J Willer; Ellen M Schmidt; Sebanti Sengupta; Michael Boehnke; Panos Deloukas; Sekar Kathiresan; Karen L Mohlke; Erik Ingelsson; Gonçalo R Abecasis; Gina M Peloso; Stefan Gustafsson; Stavroula Kanoni; Andrea Ganna; Jin Chen; Martin L Buchkovich; Samia Mora; Jacques S Beckmann; Jennifer L Bragg-Gresham; Hsing-Yi Chang; Ayşe Demirkan; Heleen M Den Hertog; Ron Do; Louise A Donnelly; Georg B Ehret; Tõnu Esko; Mary F Feitosa; Teresa Ferreira; Krista Fischer; Pierre Fontanillas; Ross M Fraser; Daniel F Freitag; Deepti Gurdasani; Kauko Heikkilä; Elina Hyppönen; Aaron Isaacs; Anne U Jackson; Åsa Johansson; Toby Johnson; Marika Kaakinen; Johannes Kettunen; Marcus E Kleber; Xiaohui Li; Jian'an Luan; Leo-Pekka Lyytikäinen; Patrik K E Magnusson; Massimo Mangino; Evelin Mihailov; May E Montasser; Martina Müller-Nurasyid; Ilja M Nolte; Jeffrey R O'Connell; Cameron D Palmer; Markus Perola; Ann-Kristin Petersen; Serena Sanna; Richa Saxena; Susan K Service; Sonia Shah; Dmitry Shungin; Carlo Sidore; Ci Song; Rona J Strawbridge; Ida Surakka; Toshiko Tanaka; Tanya M Teslovich; Gudmar Thorleifsson; Evita G Van den Herik; Benjamin F Voight; Kelly A Volcik; Lindsay L Waite; Andrew Wong; Ying Wu; Weihua Zhang; Devin Absher; Gershim Asiki; Inês Barroso; Latonya F Been; Jennifer L Bolton; Lori L Bonnycastle; Paolo Brambilla; Mary S Burnett; Giancarlo Cesana; Maria Dimitriou; Alex S F Doney; Angela Döring; Paul Elliott; Stephen E Epstein; Gudmundur Ingi Eyjolfsson; Bruna Gigante; Mark O Goodarzi; Harald Grallert; Martha L Gravito; Christopher J Groves; Göran Hallmans; Anna-Liisa Hartikainen; Caroline Hayward; Dena Hernandez; Andrew A Hicks; Hilma Holm; Yi-Jen Hung; Thomas Illig; Michelle R Jones; Pontiano Kaleebu; John J P Kastelein; Kay-Tee Khaw; Eric Kim; Norman Klopp; Pirjo Komulainen; Meena Kumari; Claudia Langenberg; Terho Lehtimäki; Shih-Yi Lin; Jaana Lindström; Ruth J F Loos; François Mach; Wendy L McArdle; Christa Meisinger; Braxton D Mitchell; Gabrielle Müller; Ramaiah Nagaraja; Narisu Narisu; Tuomo V M Nieminen; Rebecca N Nsubuga; Isleifur Olafsson; Ken K Ong; Aarno Palotie; Theodore Papamarkou; Cristina Pomilla; Anneli Pouta; Daniel J Rader; Muredach P Reilly; Paul M Ridker; Fernando Rivadeneira; Igor Rudan; Aimo Ruokonen; Nilesh Samani; Hubert Scharnagl; Janet Seeley; Kaisa Silander; Alena Stančáková; Kathleen Stirrups; Amy J Swift; Laurence Tiret; Andre G Uitterlinden; L Joost van Pelt; Sailaja Vedantam; Nicholas Wainwright; Cisca Wijmenga; Sarah H Wild; Gonneke Willemsen; Tom Wilsgaard; James F Wilson; Elizabeth H Young; Jing Hua Zhao; Linda S Adair; Dominique Arveiler; Themistocles L Assimes; Stefania Bandinelli; Franklyn Bennett; Murielle Bochud; Bernhard O Boehm; Dorret I Boomsma; Ingrid B Borecki; Stefan R Bornstein; Pascal Bovet; Michel Burnier; Harry Campbell; Aravinda Chakravarti; John C Chambers; Yii-Der Ida Chen; Francis S Collins; Richard S Cooper; John Danesh; George Dedoussis; Ulf de Faire; Alan B Feranil; Jean Ferrières; Luigi Ferrucci; Nelson B Freimer; Christian Gieger; Leif C Groop; Vilmundur Gudnason; Ulf Gyllensten; Anders Hamsten; Tamara B Harris; Aroon Hingorani; Joel N Hirschhorn; Albert Hofman; G Kees Hovingh; Chao Agnes Hsiung; Steve E Humphries; Steven C Hunt; Kristian Hveem; Carlos Iribarren; Marjo-Riitta Järvelin; Antti Jula; Mika Kähönen; Jaakko Kaprio; Antero Kesäniemi; Mika Kivimaki; Jaspal S Kooner; Peter J Koudstaal; Ronald M Krauss; Diana Kuh; Johanna Kuusisto; Kirsten O Kyvik; Markku Laakso; Timo A Lakka; Lars Lind; Cecilia M Lindgren; Nicholas G Martin; Winfried März; Mark I McCarthy; Colin A McKenzie; Pierre Meneton; Andres Metspalu; Leena Moilanen; Andrew D Morris; Patricia B Munroe; Inger Njølstad; Nancy L Pedersen; Chris Power; Peter P Pramstaller; Jackie F Price; Bruce M Psaty; Thomas Quertermous; Rainer Rauramaa; Danish Saleheen; Veikko Salomaa; Dharambir K Sanghera; Jouko Saramies; Peter E H Schwarz; Wayne H-H Sheu; Alan R Shuldiner; Agneta Siegbahn; Tim D Spector; Kari Stefansson; David P Strachan; Bamidele O Tayo; Elena Tremoli; Jaakko Tuomilehto; Matti Uusitupa; Cornelia M van Duijn; Peter Vollenweider; Lars Wallentin; Nicholas J Wareham; John B Whitfield; Bruce H R Wolffenbuttel; Jose M Ordovas; Eric Boerwinkle; Colin N A Palmer; Unnur Thorsteinsdottir; Daniel I Chasman; Jerome I Rotter; Paul W Franks; Samuli Ripatti; L Adrienne Cupples; Manjinder S Sandhu; Stephen S Rich
Journal: Nat Genet Date: 2013-10-06 Impact factor: 38.330

9. Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: A transethnic genome-wide meta-analysis.

Authors: Eleanor Wheeler; Aaron Leong; Ching-Ti Liu; Marie-France Hivert; Rona J Strawbridge; Clara Podmore; Man Li; Jie Yao; Xueling Sim; Jaeyoung Hong; Audrey Y Chu; Weihua Zhang; Xu Wang; Peng Chen; Nisa M Maruthur; Bianca C Porneala; Stephen J Sharp; Yucheng Jia; Edmond K Kabagambe; Li-Ching Chang; Wei-Min Chen; Cathy E Elks; Daniel S Evans; Qiao Fan; Franco Giulianini; Min Jin Go; Jouke-Jan Hottenga; Yao Hu; Anne U Jackson; Stavroula Kanoni; Young Jin Kim; Marcus E Kleber; Claes Ladenvall; Cecile Lecoeur; Sing-Hui Lim; Yingchang Lu; Anubha Mahajan; Carola Marzi; Mike A Nalls; Pau Navarro; Ilja M Nolte; Lynda M Rose; Denis V Rybin; Serena Sanna; Yuan Shi; Daniel O Stram; Fumihiko Takeuchi; Shu Pei Tan; Peter J van der Most; Jana V Van Vliet-Ostaptchouk; Andrew Wong; Loic Yengo; Wanting Zhao; Anuj Goel; Maria Teresa Martinez Larrad; Dörte Radke; Perttu Salo; Toshiko Tanaka; Erik P A van Iperen; Goncalo Abecasis; Saima Afaq; Behrooz Z Alizadeh; Alain G Bertoni; Amelie Bonnefond; Yvonne Böttcher; Erwin P Bottinger; Harry Campbell; Olga D Carlson; Chien-Hsiun Chen; Yoon Shin Cho; W Timothy Garvey; Christian Gieger; Mark O Goodarzi; Harald Grallert; Anders Hamsten; Catharina A Hartman; Christian Herder; Chao Agnes Hsiung; Jie Huang; Michiya Igase; Masato Isono; Tomohiro Katsuya; Chiea-Chuen Khor; Wieland Kiess; Katsuhiko Kohara; Peter Kovacs; Juyoung Lee; Wen-Jane Lee; Benjamin Lehne; Huaixing Li; Jianjun Liu; Stephane Lobbens; Jian'an Luan; Valeriya Lyssenko; Thomas Meitinger; Tetsuro Miki; Iva Miljkovic; Sanghoon Moon; Antonella Mulas; Gabriele Müller; Martina Müller-Nurasyid; Ramaiah Nagaraja; Matthias Nauck; James S Pankow; Ozren Polasek; Inga Prokopenko; Paula S Ramos; Laura Rasmussen-Torvik; Wolfgang Rathmann; Stephen S Rich; Neil R Robertson; Michael Roden; Ronan Roussel; Igor Rudan; Robert A Scott; William R Scott; Bengt Sennblad; David S Siscovick; Konstantin Strauch; Liang Sun; Morris Swertz; Salman M Tajuddin; Kent D Taylor; Yik-Ying Teo; Yih Chung Tham; Anke Tönjes; Nicholas J Wareham; Gonneke Willemsen; Tom Wilsgaard; Aroon D Hingorani; Josephine Egan; Luigi Ferrucci; G Kees Hovingh; Antti Jula; Mika Kivimaki; Meena Kumari; Inger Njølstad; Colin N A Palmer; Manuel Serrano Ríos; Michael Stumvoll; Hugh Watkins; Tin Aung; Matthias Blüher; Michael Boehnke; Dorret I Boomsma; Stefan R Bornstein; John C Chambers; Daniel I Chasman; Yii-Der Ida Chen; Yduan-Tsong Chen; Ching-Yu Cheng; Francesco Cucca; Eco J C de Geus; Panos Deloukas; Michele K Evans; Myriam Fornage; Yechiel Friedlander; Philippe Froguel; Leif Groop; Myron D Gross; Tamara B Harris; Caroline Hayward; Chew-Kiat Heng; Erik Ingelsson; Norihiro Kato; Bong-Jo Kim; Woon-Puay Koh; Jaspal S Kooner; Antje Körner; Diana Kuh; Johanna Kuusisto; Markku Laakso; Xu Lin; Yongmei Liu; Ruth J F Loos; Patrik K E Magnusson; Winfried März; Mark I McCarthy; Albertine J Oldehinkel; Ken K Ong; Nancy L Pedersen; Mark A Pereira; Annette Peters; Paul M Ridker; Charumathi Sabanayagam; Michele Sale; Danish Saleheen; Juha Saltevo; Peter Eh Schwarz; Wayne H H Sheu; Harold Snieder; Timothy D Spector; Yasuharu Tabara; Jaakko Tuomilehto; Rob M van Dam; James G Wilson; James F Wilson; Bruce H R Wolffenbuttel; Tien Yin Wong; Jer-Yuarn Wu; Jian-Min Yuan; Alan B Zonderman; Nicole Soranzo; Xiuqing Guo; David J Roberts; Jose C Florez; Robert Sladek; Josée Dupuis; Andrew P Morris; E-Shyong Tai; Elizabeth Selvin; Jerome I Rotter; Claudia Langenberg; Inês Barroso; James B Meigs
Journal: PLoS Med Date: 2017-09-12 Impact factor: 11.069

10. Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program.

Authors: Derek Klarin; Scott M Damrauer; Kelly Cho; Yan V Sun; Tanya M Teslovich; Jacqueline Honerlaw; David R Gagnon; Scott L DuVall; Jin Li; Gina M Peloso; Mark Chaffin; Aeron M Small; Jie Huang; Hua Tang; Julie A Lynch; Yuk-Lam Ho; Dajiang J Liu; Connor A Emdin; Alexander H Li; Jennifer E Huffman; Jennifer S Lee; Pradeep Natarajan; Rajiv Chowdhury; Danish Saleheen; Marijana Vujkovic; Aris Baras; Saiju Pyarajan; Emanuele Di Angelantonio; Benjamin M Neale; Aliya Naheed; Amit V Khera; John Danesh; Kyong-Mi Chang; Gonçalo Abecasis; Cristen Willer; Frederick E Dewey; David J Carey; John Concato; J Michael Gaziano; Christopher J O'Donnell; Philip S Tsao; Sekar Kathiresan; Daniel J Rader; Peter W F Wilson; Themistocles L Assimes
Journal: Nat Genet Date: 2018-10-01 Impact factor: 38.330

86 in total

1. Biomarker genetics.

Authors: Susan J Allison
Journal: Nat Rev Nephrol Date: 2021-01-29 Impact factor: 28.314

2. Prognostic significance of circulating insulin growth-like factor 1 and insulin growth-like factor binding protein 3 in renal cell carcinoma patients.

Authors: Chia-Wen Tsai; Wen-Shin Chang; Yifan Xu; Maosheng Huang; Pheroze Tamboli; Christopher G Wood; Da-Tian Bau; Jian Gu
Journal: Am J Cancer Res Date: 2022-02-15 Impact factor: 6.166

10. Clinical laboratory tests and five-year incidence of major depressive disorder: a prospective cohort study of 433,890 participants from the UK Biobank.

Authors: Michael Wainberg; Stefan Kloiber; Breno Diniz; Roger S McIntyre; Daniel Felsky; Shreejoy J Tripathy
Journal: Transl Psychiatry Date: 2021-07-07 Impact factor: 6.222

Introduction

Results

Biomarker phenotype distributions

Genetics of biomarkers

Global and local heritability of biomarkers

Associated variants prioritize therapeutic targets

CNVs and HLA allelotypes influencing biomarkers

Fine-mapping of common associated variants

Glycemic trait fine-mapping

Allelic series at the SLCO1B locus

HGFAC pleiotropy

Targeted phenome-wide association analysis

Causal inference

Polygenic prediction of biomarkers

Multiple regression with PRSs

Discussion

Online Methods

Genotype and phenotype data in UK Biobank

Variant annotation and quality control

Biomarker phenotype definition

Statin identification and LDL adjustment

Covariate correction

Definition of type 2 diabetes

Genome-wide association analysis

Meta-analysis

Derivation of independent loci

Comparison of effect sizes with published studies

Fine-mapping biomarker-associated regions

Heritability estimates

Targeted phenome-wide association analysis

Correlation of genetic effects across relevant phenotypes

Polygenic prediction within and across populations

Single-trait biomarker PRS-PheWAS

Models for multi-PRS prediction of disease outcomes

Evaluation of multi-PRS prediction in an external cohort

Statistics

Review 9. Genetics of Type 2 Diabetes: Opportunities for Precision Medicine: JACC Focus Seminar.