The Value of Rare Genetic Variation in the Prediction of Common Obesity in European Ancestry Populations.

Zhe Wang^1,2, Shing Wan Choi³, Nathalie Chami^1,2, Eric Boerwinkle^4,5, Myriam Fornage⁶, Susan Redline^7,8, Joshua C Bis⁹, Jennifer A Brody⁹, Bruce M Psaty^9,10, Wonji Kim¹¹, Merry-Lynn N McDonald¹², Elizabeth A Regan¹³, Edwin K Silverman^14,15, Ching-Ti Liu¹⁶, Ramachandran S Vasan^17,18,19, Rita R Kalyani²⁰, Rasika A Mathias²⁰, Lisa R Yanek²⁰, Donna K Arnett²¹, Anne E Justice²², Kari E North²³, Robert Kaplan²⁴, Susan R Heckbert^10,25, Mariza de Andrade²⁶, Xiuqing Guo²⁷, Leslie A Lange²⁸, Stephen S Rich²⁹, Jerome I Rotter²⁷, Patrick T Ellinor^30,31, Steven A Lubitz^30,31, John Blangero³², M Benjamin Shoemaker³³, Dawood Darbar³⁴, Mark T Gladwin³⁵, Christine M Albert^36,37, Daniel I Chasman^15,37, Rebecca D Jackson³⁸, Charles Kooperberg³⁹, Alexander P Reiner^10,39, Paul F O'Reilly³, Ruth J F Loos^1,2,40.

Abstract

Polygenic risk scores (PRSs) aggregate the effects of genetic variants across the genome and are used to predict risk of complex diseases, such as obesity. Current PRSs only include common variants (minor allele frequency (MAF) ≥ 1%), whereas the contribution of rare variants to PRS-based disease prediction remains unknown. Here, we examine whether augmenting the standard common variant PRS (PRScommon) with a rare variant PRS (PRSrare) improves prediction of obesity. We used genome-wide genotyped and imputed data on 451,145 European-ancestry participants of the UK Biobank, as well as whole exome sequencing (WES) data on 184,385 participants. We performed single variant analyses (for both common and rare variants) and gene-based analyses (for rare variants) for association with BMI (kg/m²), obesity (BMI ≥ 30 kg/m²), and extreme obesity (BMI ≥ 40 kg/m²). We built PRSscommon and PRSsrare using a range of methods (Clumping+Thresholding [C+T], PRS-CS, lassosum, gene-burden test). We selected the best-performing PRSs and assessed their performance in 36,757 unrelated European-ancestry participants with whole genome sequencing (WGS) data from the Trans-Omics for Precision Medicine (TOPMed) program. The best-performing PRScommon explained 10.1% of variation in BMI, and 18.3% and 22.5% of the susceptibility to obesity and extreme obesity, respectively, whereas the best-performing PRSrare explained 1.49%, 2.97% and 3.68%, respectively. The PRSrare was associated with an increased risk of obesity and extreme obesity (obesity: OR = 1.37 per SD of the PRS, P = 1.7×10⁻⁸⁵; extreme obesity: OR = 1.55 per SD of the PRS, P = 3.8×10⁻⁴⁰), an association that was attenuated after adjusting for PRScommon (obesity: OR = 1.08 per SD of the PRS, P = 9.8×10⁻⁶; extreme obesity: OR = 1.09 per SD of the PRS, P = 0.02). When PRSrare and PRScommon were combined, the increase in explained variance attributed to PRSrare was small (incremental Nagelkerke R² = 0.24% for obesity and 0.51% for extreme obesity). Consistently, adding PRSrare to PRScommon provided little improvement in the prediction of obesity (PRSrare AUC = 0.591; PRScommon AUC = 0.708; PRScombined AUC = 0.710). In summary, while rare variants show convincing association with BMI, obesity and extreme obesity, the PRSrare provides limited improvement over PRScommon in the prediction of obesity risk, based on these large populations.

Copyright © 2022 Wang, Choi, Chami, Boerwinkle, Fornage, Redline, Bis, Brody, Psaty, Kim, McDonald, Regan, Silverman, Liu, Vasan, Kalyani, Mathias, Yanek, Arnett, Justice, North, Kaplan, Heckbert, de Andrade, Guo, Lange, Rich, Rotter, Ellinor, Lubitz, Blangero, Shoemaker, Darbar, Gladwin, Albert, Chasman, Jackson, Kooperberg, Reiner, O’Reilly and Loos.

Keywords: BMI - body mass index; C+T; PRS-CS; burden score; lassosum; obesity risk; polygenic risk score; rare variants

Year: 2022 PMID: 35592775 PMCID: PMC9110787 DOI: 10.3389/fendo.2022.863893

Source DB: PubMed Journal: Front Endocrinol (Lausanne) ISSN: 1664-2392 Impact factor: 6.055

Introduction

With an estimated prevalence of 12% among adults worldwide and up to 42% in the US (1, 2), obesity is a growing epidemic, causing major public health concerns (1, 3). Risk prediction and early prevention of weight gain are key to reducing the personal and global burden of obesity and its comorbidities (4). The development of obesity across the lifespan is the result of an interaction between environmental factors and innate biological factors encoded by our genomes. Twin and family studies have reported heritability estimates of obesity that range between 40% and 70% (5). In the past 15 years, genome-wide association studies (GWAS) have identified thousands of variants associated with obesity-related traits (6). Polygenic risk scores (PRSs), which are based on GWAS summary statistics, represent an individual’s overall genetic predisposition to obesity. In recent years, PRSs have been studied for their use in the prediction of future obesity and the identification of individuals at risk of obesity early in life (7). The promise is that accurate estimation of people’s genetic predisposition would allow more targeted lifestyle intervention for those at risk. However, current PRSs, which are based on traditional GWAS, have been shown to be suboptimal, with unsolved challenges remaining (8). For example, existing methods to develop PRSs only include common variants (MAF ≥ 1%); the resulting PRSs explain little of the variation (< 10%) in BMI and thus have limited ability to predict obesity (7, 9). There is a pressing need to incorporate rare variants (MAF < 1%), which have been shown to capture a proportion of the ‘missing heritability’ (10) but are currently not considered in PRS construction. Including rare variants in the PRS may improve the accuracy with which we estimate individuals’ genetic predisposition. Because of the large sample size of studies such as the UK Biobank, association summary statistics for rare variants (0.1% ≤ MAF < 1%) can be assessed by single variant testing (11). 
However, for ultra-rare variants (MAF < 0.1%), which by definition occur very infrequently in the population, even current large-scale studies are not large enough to study their individual effects (12). The accuracy of the PRS depends largely on the power of the discovery GWAS summary statistics (13). Therefore, aggregating ultra-rare variants in genes, based on their predicted functional consequences, offers a potentially powerful complementary approach to single variant testing (14) and, subsequently, to building rare variant PRSs. The aim of our study is to leverage sequencing data from the UK Biobank and the Trans-Omics for Precision Medicine (TOPMed) program to build obesity PRSs that use rare variants (PRSsrare) and test their associations with obesity and extreme obesity. In addition, we test the predictive power of PRSsrare for obesity outcomes, alone or in combination with PRSscommon.

Materials and Methods

Study Design

We built and tested PRSs from common variants (MAF ≥ 1%), rare variants (MAF < 1%) and ultra-rare variants (MAF < 0.1%) for three traits: BMI, obesity and extreme obesity. We used data from the UK Biobank to conduct single variant GWAS analyses and gene burden analyses (ultra-rare variants). The GWAS summary statistics calculated from the UK Biobank data were then used to build PRSs, whose predictive performance we tested in the TOPMed program (Figure 1).

Figure 1

Overview of the study framework.

Study Populations

UK Biobank

All GWAS analyses were performed using data from the UK Biobank, a prospective cohort study with extensive genetic and phenotypic data collected in approximately 500,000 individuals aged 40–69 years (11). Briefly, participants were enrolled from 2006 to 2010 at one of 22 assessment centers across the UK to provide baseline information, physical measures, and biological samples according to standardized procedures (11, 15). All participants provided written informed consent. We restricted analyses to individuals of European ancestry (described in detail below) and excluded individuals who underwent weight loss surgery before recruitment and women who were pregnant at the time of recruitment. Data for 451,145 individuals were available for analyses.

TOPMed

For constructing and testing the PRS, we used data from 22 parent studies of the TOPMed program ( ). We restricted analyses to 43,251 individuals of European ancestry who had cleaned phenotype data (described in detail below) and Whole Genome Sequencing (WGS) data. We removed one individual from each related pair (Nexcl = 6,494; genetic relatedness ≥ 0.0625). Data for a total of 36,757 individuals were available for analyses ( ).
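The relatedness filter described above (removing one individual from each pair with genetic relatedness ≥ 0.0625) can be illustrated with a minimal sketch. The text does not state how the individual to drop was chosen, so the greedy rule below (drop whoever has the most remaining relatives) is an assumption for illustration only, as are the function name `prune_related` and the toy identifiers.

```python
from collections import defaultdict


def prune_related(pairs, threshold=0.0625):
    """Return a set of individuals to remove so that no retained pair
    has relatedness >= threshold.

    pairs: iterable of (id_a, id_b, relatedness) tuples.
    The greedy choice of whom to drop is an assumption of this sketch,
    not the procedure reported in the study.
    """
    relatives = defaultdict(set)
    for a, b, relatedness in pairs:
        if relatedness >= threshold:
            relatives[a].add(b)
            relatives[b].add(a)

    removed = set()
    while True:
        # Individuals that still have at least one retained relative.
        counts = {i: len(r) for i, r in relatives.items() if r}
        if not counts:
            break
        # Drop the individual with the most retained relatives
        # (ties broken alphabetically for reproducibility).
        worst = max(sorted(counts), key=lambda i: counts[i])
        removed.add(worst)
        for other in relatives.pop(worst):
            relatives[other].discard(worst)
    return removed


# Hypothetical toy input: B is related to both A and C; D-E is below threshold.
toy_pairs = [("A", "B", 0.25), ("B", "C", 0.125), ("D", "E", 0.03)]
to_remove = prune_related(toy_pairs)
```

In this toy input, dropping the single individual B is enough to leave A, C, D and E mutually unrelated at the chosen threshold.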

Phenotype Definitions

In the UK Biobank, height and weight were collected at the baseline visit and used to calculate BMI as weight (kg) divided by height squared (m²). BMI was used to categorize individuals as having underweight (BMI < 18.5 kg/m²), normal weight (18.5 kg/m² ≤ BMI < 25 kg/m²), overweight (25 kg/m² ≤ BMI < 30 kg/m²), obesity (BMI ≥ 30 kg/m²) or extreme obesity (BMI ≥ 40 kg/m²). More details can be found elsewhere (11, 15). In TOPMed, data on height and weight were collected by the participating studies, harmonized across studies by the TOPMed Anthropometry Working Group, and used to calculate BMI. We excluded individuals with known pregnancy at measurement, those with implausibly high BMI values (> 100 kg/m²), and those < 18 years old. In the presence of duplicated samples, the sample with the highest sequencing depth was retained.
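The BMI formula and category thresholds above can be written out as a short sketch. The function names are hypothetical; the cut-offs are those given in the text. Note that extreme obesity (BMI ≥ 40 kg/m²) is a subset of obesity (BMI ≥ 30 kg/m²), so it is returned here as a separate flag rather than as a fifth exclusive category.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Weight category using the thresholds stated in the text."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obesity"


def is_extreme_obesity(value):
    """Extreme obesity flag (BMI >= 40 kg/m^2), a subset of obesity."""
    return value >= 40


def is_plausible(value):
    """Exclusion rule from the text: BMI > 100 kg/m^2 is treated as implausible."""
    return value <= 100


example = bmi(130, 1.75)  # roughly 42.4 kg/m^2
```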

Genotyping, Imputation and Sequencing Data

All UK Biobank participants were genotyped using the UK Biobank Axiom Array. More than 800,000 variants were directly genotyped and > 90 million variants were imputed, using the Haplotype Reference Consortium or UK10K + 1000G reference panels (11). Common variants (MAF ≥ 1%) with an imputation INFO score of ≥ 0.3 and rare variants (MAF < 1%) with an imputation INFO score of ≥ 0.8 were included in analyses. We identified individuals of European ancestry based on their genetic information, using k-means clustering. First, we calculated principal components and their loadings for 488,377 genotyped UK Biobank participants based on the intersection of ~121,000 variants after quality control and the 1000G Phase 3v5 reference panel. Reference ancestries are 503 European (EUR), 347 American Admixed (AMR), 661 African (AFR), 504 East Asian (EAS) and 489 South Asian (SAS) samples (overall 2504). We projected the 1000G reference panel dataset based on the calculated PCA loadings from UK Biobank. We then applied k-means clustering with a pre-specified number of 4 clusters to the UK Biobank PCA and the projected 1000G reference panel dataset. Individuals that clustered within the EUR individual cluster from the 1000G reference panel were assigned as individuals of European ancestry (N = 453,812). Because PRSs based on current methods generalize poorly across other ancestries, and because of the smaller sample sizes of non-European ancestry populations, we performed analyses only in European ancestry populations. In addition to the genotyped and imputed data, we used data from the first release of exome sequencing (N = 184,385). The approach used to perform exome sequencing and quality control is described in detail elsewhere (16, 17). We annotated variants using Variant Effect Predictor (VEP) v104.3 with genome build GRCh38 (18).

In TOPMed, WGS, targeting a mean depth of > 30X coverage, was performed at seven different Sequencing Centers. For this study, we used WGS data from the Freeze 8 release (19). Information about genome sequencing, variant calling, and quality control procedures can be accessed through the TOPMed website (20). The genetic relationship was estimated using the PC-Relate algorithm (21). We removed one individual from each pair with a genetic relationship closer than 3rd degree of relatedness (≥ 0.0625) (21). Population groups in TOPMed were based on a combination of participants’ self-reported race/ethnicity and genetic ancestry represented by PCs. When participants’ self-reported race/ethnicity values were “Other”, “Multiple” or missing, the HARE method was used to classify individuals into “Asian”, “Black”, “White”, or “Hispanic/Latino” subgroups using the first nine PC-AiR PCs (22). For this project, we limited our analyses to those who either self-identified as “White” or had overall genetic ancestry that closely resembled groups of European ancestry (HARE strata classified as “White”).

Genome-Wide Association Testing: Single Variant and Gene Burden Tests in UK Biobank

BMI residuals were generated in men and women separately, adjusting for age, age², and the first 10 genetic principal components (PCs). Residuals underwent inverse normal transformation to achieve a normal distribution with a mean of 0 and a standard deviation of 1.
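A rank-based inverse normal transformation of the residuals can be sketched as follows. The text does not specify the rank offset, so the Blom offset (c = 3/8) used here is an assumption, as is the function name; the regression step that produces the residuals is not shown.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=3 / 8):
    """Rank-based inverse normal transformation.

    Each value is replaced by the standard normal quantile of
    (rank - c) / (n - 2c + 1). Tied values receive their average rank.
    The offset c = 3/8 (Blom) is an assumption of this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the next sorted value is tied.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Hypothetical residuals; in the study these would be sex-specific BMI residuals.
z = inverse_normal_transform([0.3, -1.2, 2.5, 0.0, 0.9])
```

The transformed values keep the ordering of the input and are symmetric around zero, which is what gives the approximately standard normal distribution described above.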

Single Variant Association Testing

Association analyses of the inverse normal BMI residuals, obesity, and extreme obesity were carried out using a (generalized) linear mixed-model approach in BOLT-LMM (23) and REGENIE (24). For obesity and extreme obesity, models were adjusted for age, age², sex and the first 10 PCs. For all single variant association testing, variants with a minor allele count of ≤ 20 were excluded. We performed single variant association testing using [1] genotyped and imputed variants, and [2] WES data, separately.

Gene Burden Testing

We aggregated ultra-rare variants (MAF < 0.1%) from the WES data for gene burden testing. For each gene, we considered five categories of masks (i.e. variant sets considered in the burden test): [M1] a strict burden of rare loss-of-function (LoF) variants (i.e. splice_acceptor, splice_donor, stop_gained, frameshift, stop_lost, and start_lost), [M2] a permissive burden of rare LoF variants and inframe indels, [M3] a more permissive burden of all high and moderate impact rare variants (including LoF, inframe indels, and missense variants), [M4] moderate impact variants (inframe indels and missense variants), and [M5] high, moderate and low impact variants (LoF, inframe indels, missense and synonymous variants) ( ). We aggregated variants with MAF ≤ 0.1% for each of these masks, resulting in up to five burden tests per gene.
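The five masks can be expressed as sets of variant consequences, and a per-individual burden can then be formed by aggregating qualifying ultra-rare variants within a gene. The sketch below makes several assumptions: the short consequence names in the text are mapped onto full VEP consequence terms, the burden is taken as the sum of alternate-allele counts (the text does not say whether counts or carrier indicators were used), and all function and variant names are hypothetical.

```python
# Consequence groups; the mapping to full VEP terms is an assumption.
LOF = frozenset({
    "splice_acceptor_variant", "splice_donor_variant", "stop_gained",
    "frameshift_variant", "stop_lost", "start_lost",
})
INFRAME = frozenset({"inframe_insertion", "inframe_deletion"})
MISSENSE = frozenset({"missense_variant"})
SYNONYMOUS = frozenset({"synonymous_variant"})

MASKS = {
    "M1": LOF,                                    # strict LoF
    "M2": LOF | INFRAME,                          # LoF + inframe indels
    "M3": LOF | INFRAME | MISSENSE,               # high + moderate impact
    "M4": INFRAME | MISSENSE,                     # moderate impact only
    "M5": LOF | INFRAME | MISSENSE | SYNONYMOUS,  # high + moderate + low
}


def burden_scores(variants, genotypes, mask, maf_cutoff=0.001):
    """Per-individual burden for one gene under one mask.

    variants: list of (variant_id, consequence, maf) for a single gene.
    genotypes: dict variant_id -> list of alternate-allele counts,
               one entry per individual.
    Only variants in the mask with MAF below maf_cutoff (0.1%) contribute.
    """
    allowed = MASKS[mask]
    qualifying = [vid for vid, consequence, maf in variants
                  if consequence in allowed and maf < maf_cutoff]
    n_individuals = len(next(iter(genotypes.values())))
    scores = [0] * n_individuals
    for vid in qualifying:
        for i, allele_count in enumerate(genotypes[vid]):
            scores[i] += allele_count
    return scores


# Hypothetical toy gene with four variants in three individuals.
toy_variants = [
    ("v1", "stop_gained", 0.0002),
    ("v2", "missense_variant", 0.0005),
    ("v3", "synonymous_variant", 0.0004),
    ("v4", "frameshift_variant", 0.02),  # too common, never aggregated
]
toy_genotypes = {"v1": [1, 0, 0], "v2": [0, 1, 0],
                 "v3": [0, 0, 1], "v4": [1, 1, 1]}
m1 = burden_scores(toy_variants, toy_genotypes, "M1")
```

Running the same gene through all five masks yields up to five burden vectors, each of which would then be tested for association with the trait.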

Figure 2

Allele frequency spectrum of imputed variants and number of aggregated sequenced variants captured in the UK Biobank and TOPMed. (A) Minor allele frequency spectrum of imputed variants present in the UK Biobank (rare variants imputation INFO ≥ 0.8, common Hapmap3 variants imputation INFO ≥ 0.3) and TOPMed; (B) Number of variants for different functional classes of variants and masks (aggregation model) in the UK Biobank WES ultra-rare variants (MAF < 0.1%).

Polygenic Risk Score Derivation in TOPMed

Based on the single variant association testing and gene burden testing results in UK Biobank, we generated PRSscommon and PRSsrare, using three different approaches for each (PRScommon: Clumping + Thresholding [C+T], PRS-CS (27), lassosum (25); PRSrare: C+T, lassosum, gene-burden test), in 36,757 unrelated individuals of European ancestry of TOPMed. Summary statistics from GWAS of the UK Biobank were filtered for variants present in TOPMed ( ). C+T denotes the Linkage Disequilibrium (LD) clumping and P value thresholding method, which was conducted using the PRSice-2 software (26). For clumping, we used the entire sample of 36,757 unrelated individuals of European ancestry as the reference panel for LD and set clumping parameters to r² = 0.2, 0.5 and 0.8, with each region being 250 kb in size. We varied the P value thresholds from 5×10⁻⁵ to 0.8, with a step-wise increase of 1×10⁻⁴. The C+T method was used to build both PRScommon and PRSrare. PRS-CS is a Bayesian method that infers the posterior mean effect size of each variant using GWAS summary statistics and an external LD reference panel (27); it is distinct from previous methods in placing a continuous shrinkage (CS) prior on the variant effect sizes (27). A 1000G LD reference panel for European ancestry populations was provided by the developers. We followed the protocol recommended by the PRS-CS authors by removing ambiguous A/T or G/C variants and restricting to common variants (MAF ≥ 1%) included in HapMap3. Therefore, this method was used only to build PRScommon. We considered fixed values of the global shrinkage parameter (phi = 1×10⁻³ and 1×10⁻⁴) and the PRS-CS auto option, which allows the software to learn the continuous shrinkage prior from the data. lassosum is an approach that uses penalized regression on summary statistics and accounts for LD using an external reference panel or target sample to produce more accurate weights for building PRSs (25). 
To assess LD accurately, which is particularly important for rare variants, we used the entire sample of 36,757 unrelated individuals of European ancestry in TOPMed as the reference panel. We tuned lassosum’s model parameters (s, the shrinkage parameter: 0.2, 0.5, 0.9 and 1; and λ, the penalty parameter: varied from 0.001 to 0.1). We applied the lassosum method to common and rare variants separately to build PRScommon and PRSrare. Lastly, we built ultra-rare variant burden scores using the gene burden test results from the UK Biobank. For each of the five masks, we tested the following P value thresholds of the gene burden tests: P = 0.05, 0.001, 0.0001, 10⁻⁵, and 2.8×10⁻⁶ (i.e. the exome-wide significance level). For assigning weights to variants within each gene, we tested two methods: 1) a simple method, which assigned the same weight to all variants in the same mask (i.e. applying the aggregate effect size estimated for the LoF mask (mask1) of gene A in UK Biobank to the LoF (mask1) variants in gene A in the TOPMed samples); 2) a nested method, which assigned a weight to each variant equal to the aggregate effect size of variants with annotation at least as severe as the variant ( provides an example to illustrate the nested method). For each individual in the testing sets (TOPMed), PRSs were calculated as the sum of the dosages multiplied by the given weight at each variant. Taken together, we generated six sets of PRSs (PRScommon-C+T, PRScommon-lassosum, PRScommon-PRS-CS, PRSrare-C+T, PRSrare-lassosum, and PRSrare-burden) for each trait (BMI, obesity and extreme obesity) using the different methods under a range of tuning parameters.
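The scoring step described above (the PRS as the sum of dosages multiplied by variant weights) and the P value thresholding part of C+T can be sketched as follows. This is a simplified illustration and not the PRSice-2 implementation: LD clumping is omitted, and the function names, variant identifiers and effect sizes are hypothetical.

```python
def select_by_pvalue(sumstats, p_threshold):
    """Keep variants whose GWAS P value is at or below the threshold.

    sumstats: dict variant_id -> (effect_size, p_value).
    Returns dict variant_id -> effect_size (the PRS weights).
    LD clumping, which would precede this step in C+T, is not shown.
    """
    return {vid: beta for vid, (beta, p) in sumstats.items()
            if p <= p_threshold}


def polygenic_score(dosages, weights):
    """PRS for one individual: sum over variants of dosage * weight.

    dosages: dict variant_id -> effect-allele dosage (0 to 2).
    Variants missing from the individual's data contribute 0.
    """
    return sum(dosages.get(vid, 0.0) * w for vid, w in weights.items())


# Hypothetical summary statistics and one individual's dosages.
toy_sumstats = {
    "rs1": (0.10, 1e-8),
    "rs2": (-0.05, 0.03),
    "rs3": (0.02, 0.6),
}
toy_dosages = {"rs1": 2.0, "rs2": 1.0, "rs3": 1.0}

weights = select_by_pvalue(toy_sumstats, p_threshold=0.05)
score = polygenic_score(toy_dosages, weights)  # 2*0.10 + 1*(-0.05)
```

Repeating the selection over a grid of P value thresholds (and, in the full method, clumping r² values) yields the family of candidate scores from which the best-performing PRS is chosen.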

Statistical Analyses

BMI in TOPMed was inverse rank normalized, in men and women separately. We split unrelated individuals in TOPMed by randomly selecting 20% for PRS training (N = 7,433; tuning parameters and selecting the best-performing PRS) and 80% for evaluation (N = 29,324; validating R² and predictive performance). For each PRS method applied, we calculated adjusted R² values for BMI and Nagelkerke R² values for (extreme) obesity. Models were adjusted for age, sex, the first ten PCs and study. 95% confidence intervals were calculated using bootstrapping. We selected the best-performing PRS for each method and PRS combination (i.e. the one with the largest variance explained, by adjusted R² or Nagelkerke R²), resulting in six best-performing PRSs in total (one each from PRScommon-C+T, PRScommon-lassosum, PRScommon-PRS-CS, PRSrare-C+T, PRSrare-lassosum, and PRSrare-burden). In the 80% withheld TOPMed individuals, we tested the association between each PRS and obesity/extreme obesity status using logistic regression. The best-performing PRScommon and PRSrare across multiple methods were then combined to study the joint effects of PRScommon and PRSrare to predict obesity. To evaluate the prediction performance of PRSrare, we calculated the area under the receiver operating characteristic curve (AUC) in a Cox regression model with the obesity/extreme obesity status as the outcome. We also assessed the net reclassification index (NRI) and the Integrated Discrimination Increment (IDI), which evaluate the model improvement in reclassification and discrimination, respectively.
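Nagelkerke R² for a binary outcome, and the incremental R² of a PRS on top of covariates, can be sketched as below. The functions take fitted probabilities as input, which are assumed to come from maximum-likelihood logistic models fitted elsewhere (a covariates-only model and a covariates-plus-PRS model); the function names and toy values are hypothetical.

```python
import math


def bernoulli_loglik(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def nagelkerke_r2(y, p):
    """Nagelkerke R^2: Cox-Snell R^2 divided by its maximum attainable value.

    The null model is intercept-only, i.e. it predicts the sample prevalence.
    """
    n = len(y)
    prevalence = sum(y) / n
    ll_null = bernoulli_loglik(y, [prevalence] * n)
    ll_model = bernoulli_loglik(y, p)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def incremental_r2(y, p_covariates, p_full):
    """Gain in Nagelkerke R^2 from adding the PRS to the covariate model."""
    return nagelkerke_r2(y, p_full) - nagelkerke_r2(y, p_covariates)


# Hypothetical outcomes and fitted probabilities for four individuals.
toy_y = [1, 0, 1, 0]
toy_p_cov = [0.6, 0.4, 0.6, 0.4]
toy_p_full = [0.9, 0.1, 0.9, 0.1]
gain = incremental_r2(toy_y, toy_p_cov, toy_p_full)
```

A model that predicts only the prevalence gives an R² of zero, and the incremental value isolates what the PRS adds beyond age, sex, PCs and study.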

Results

Best-Performing Polygenic Risk Scores Based on Common Variants (PRSscommon)

Using BMI-GWAS summary statistics derived in the UK Biobank ( ), the PRScommon built with the lassosum method ( and ) explained the most variation in BMI (R² = 10.1%, 95% CI = 9.4-10.7%).

Figure 3

Variance in BMI, obesity, and extreme obesity explained by PRSs for BMI, obesity, and extreme obesity. (A) PRScommon; (B) PRSrare. We report adjusted R² for BMI and Nagelkerke’s R² for (extreme) obesity, on top of covariates including age, sex, study and PCs. C+T: Clumping and Thresholding method. Error bars indicate 95% CI.

Similarly, the best-performing PRSscommon based on summary statistics of obesity and extreme obesity GWASs were built using lassosum (Nagelkerke R² = 16.7% for obesity and 20.7% for extreme obesity, and ). Of interest is that the PRScommon based on BMI-GWAS summary statistics explained more of the variation in (extreme) obesity (Nagelkerke R² = 18.3% for obesity and 22.5% for extreme obesity) than the PRScommon based on (extreme) obesity GWAS summary statistics ( ). This likely reflects the relatively higher power of the BMI GWAS.

Best-Performing Polygenic Risk Scores Based on Rare Variants (PRSsrare) at Single Variant Level

The best-performing PRSrare for BMI was built using the lassosum method, based on BMI-GWAS summary statistics, explaining 1.49% of variation in BMI (95% CI = 1.23-1.77%, and ). Consistent with our observations for the PRSscommon, a PRSrare based on BMI-GWAS summary statistics explained more of the variance for (extreme) obesity liability (Nagelkerke R² = 2.97% for obesity and 3.68% for extreme obesity) than a PRSrare based on (extreme) obesity GWAS (Nagelkerke R² = 2.28% for obesity and 2.55% for extreme obesity) ( ).

Best-Performing Polygenic Risk Score Based on Ultra-Rare Variants (PRSrare-Burden) Using Gene Burden Score

Aggregating variants using mask1 (LoF variants) with an association significance of P < 2.8×10⁻⁶ resulted in the best-performing PRS, explaining a mere 0.03% (95% CI = 0.002-0.08%) of variation in BMI (Methods, and ). Notably, this PRS aggregated LoF variants in only two genes (MC4R and UBN2) and identified 2,957 individuals (8% of the TOPMed population) with non-zero values of the score ( ). We repeated the gene burden score approach using summary statistics of obesity and extreme obesity ( ), yielding slightly better results than for a PRSrare-burden based on BMI summary statistics. Mask3, restricted to genes that reached exome-wide significance (P < 2.8×10⁻⁶; only MC4R meets this threshold), provided the best-performing PRSrare-burden score, explaining 0.08% of variation in obesity and 0.39% of variation in extreme obesity liability.

Association of PRSscommon and PRSsrare With Risk of Obesity

We next tested the association of the best-performing PRSs (i.e. PRScommon-lassosum and PRSrare-lassosum based on BMI-GWAS summary statistics and PRSrare-burden based on obesity-GWAS summary statistics) with the obesity outcome. Each SD increase in the BMI-GWAS based PRSrare-lassosum was associated with 1.37-fold higher odds of obesity (P = 1.7×10⁻⁸⁵) ( ). Adding PRScommon-lassosum to the model substantially attenuated the association between PRSrare-lassosum and risk of obesity (OR = 1.08 per SD, P = 9.8×10⁻⁶). This attenuation is likely due to the correlation between PRSrare-lassosum and PRScommon-lassosum (r = 0.31). Each 0.1 increase in the obesity-GWAS based PRSrare-burden (range: 0-0.41) was associated with 1.83-fold higher odds of obesity (P = 0.02). Adding PRScommon-lassosum (r = 0.008) and/or PRSrare-lassosum (r = 0.01) to the model had little impact on this association ( ). We observed a similar pattern for the PRSs’ associations with extreme obesity ( ). Consistently, the gain in explained variance from adding both PRSrare-lassosum and PRSrare-burden to a model with PRScommon was extremely small (incremental Nagelkerke R² = 0.24% for obesity and 0.51% for extreme obesity, ). Using the PRScommon-lassosum and PRSrare-lassosum to identify individuals at high risk of obesity (top PRS decile), we observed that, relative to the reference group (deciles 1-9), individuals in the top decile for both PRSs had the highest risk of obesity and extreme obesity (OR [95% CI] = 5.3 [4.2-6.7] and 13.5 [9.6-18.9], respectively), as compared to individuals who were defined as high risk by only one of the two PRSs ( ).

Figure 4

Risk of obesity among individuals with high PRSrare and PRScommon. Reference: deciles 1-9 of PRScommon and PRSrare, PRSrare High: top decile of PRSrare, PRScommon High: top decile of PRScommon, Both PRS High: top decile of PRScommon and PRSrare.
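The four groups compared in Figure 4 can be derived from the two scores with a short sketch. The exact percentile convention used in the study is not stated, so the cut-off rule below (the value at position 9n/10 of the sorted scores, with scores at or above it counted as top decile) is an assumption, as are the function names and toy values.

```python
def top_decile_cutoff(scores):
    """Smallest score belonging to the top decile (an assumed convention)."""
    ordered = sorted(scores)
    return ordered[(9 * len(ordered)) // 10]


def risk_group(prs_common, prs_rare, cut_common, cut_rare):
    """Assign one of the four Figure 4 groups from the two PRS values."""
    high_common = prs_common >= cut_common
    high_rare = prs_rare >= cut_rare
    if high_common and high_rare:
        return "Both PRS High"
    if high_common:
        return "PRScommon High"
    if high_rare:
        return "PRSrare High"
    return "Reference"


# Hypothetical scores for ten individuals.
toy_common = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_rare = [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]
cut_c = top_decile_cutoff(toy_common)
cut_r = top_decile_cutoff(toy_rare)
groups = [risk_group(c, r, cut_c, cut_r)
          for c, r in zip(toy_common, toy_rare)]
```

The resulting group label can then be entered as a categorical predictor in a logistic model, with "Reference" as the baseline level, to obtain odds ratios of the kind reported in the text.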

Using PRScommon and PRSrare to Predict Common Obesity

Adding both PRSrare-lassosum and PRSrare-burden to PRScommon-lassosum in the prediction model did not improve the prediction of obesity (PRScommon only AUC [95% CI] 0.708 [0.701-0.716] vs all three PRSs 0.710 [0.702-0.717], ). Adding both PRSrare-lassosum and PRSrare-burden to a model with PRScommon-lassosum only slightly improved the discrimination of the model (IDI = 0.0014 [0.0008-0.0019], ). Knowledge of individuals’ PRSrare-lassosum and PRSrare-burden, in addition to the PRScommon-lassosum, would only reassign 0.9% of individuals to their appropriate risk category (NRI = 0.9%; 95% CI = 0.49-1.32%; P = 2×10⁻⁵). Using extreme obesity as the outcome yielded similarly small improvements in predictive accuracy ( , ).
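The two discrimination measures used above can be computed from predicted risks with a minimal sketch. The AUC is calculated here as the proportion of case-control pairs in which the case has the higher predicted risk (ties count as half), and the IDI as the change in discrimination slope between the new and the old model. The pairwise AUC is quadratic in sample size and is meant for illustration, not for cohort-scale data; function names and toy values are hypothetical.

```python
def auc(y, scores):
    """Area under the ROC curve via pairwise case-control comparison."""
    cases = [s for yi, s in zip(y, scores) if yi]
    controls = [s for yi, s in zip(y, scores) if not yi]
    concordant = sum((c > k) + 0.5 * (c == k)
                     for c in cases for k in controls)
    return concordant / (len(cases) * len(controls))


def discrimination_slope(y, p):
    """Mean predicted risk in cases minus mean predicted risk in controls."""
    cases = [pi for yi, pi in zip(y, p) if yi]
    controls = [pi for yi, pi in zip(y, p) if not yi]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def idi(y, p_old, p_new):
    """Integrated Discrimination Increment: gain in discrimination slope."""
    return discrimination_slope(y, p_new) - discrimination_slope(y, p_old)


# Hypothetical outcomes and predicted risks from two nested models.
toy_outcome = [1, 1, 0, 0]
toy_old = [0.6, 0.5, 0.5, 0.4]   # e.g. covariates + PRScommon
toy_new = [0.8, 0.6, 0.4, 0.2]   # e.g. additionally including PRSrare
toy_auc = auc(toy_outcome, [0.9, 0.4, 0.5, 0.1])
toy_idi = idi(toy_outcome, toy_old, toy_new)
```

An IDI close to zero, as reported in the text, means that adding the rare-variant scores barely widens the gap between the average predicted risk of cases and that of controls.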

Figure 5

The receiver operating characteristic curve (ROC) of obesity. (A) Model only included PCs as baseline covariates. (B) Additionally included age, sex, and study. PRSrare includes PRSrare-lassosum and PRSrare-burden.

Discussion

In this study, we examined the contribution of rare variants to the polygenic prediction of obesity by leveraging data from 451,145 European-ancestry individuals in UK Biobank and 36,757 in TOPMed. We observed that PRSsrare were associated with an increased risk of obesity and extreme obesity, partially independent of PRScommon. Nevertheless, their explained variance (up to 1.49%) as well as their predictive accuracy (AUC 0.591 for obesity and 0.630 for extreme obesity) were small, and particularly limited when considered in combination with PRScommon. As PRSs are becoming a standard tool in translational research and clinical practice, there has been an increasing interest in studying the role of rare variants, in addition to common ones, for a range of common diseases, such as breast cancer, prostate cancer, coronary artery disease (CAD) and obesity (28–31). Most previous studies that have reported on the contribution of rare variants studied the role of pathogenic variants in one or a few high-penetrance genes and did not investigate their predictive accuracy at a population level (28, 29, 31). Consistent with our findings, though, these studies demonstrated that rare variants act, at least in part, independently from common variant PRSs and add to people’s polygenic susceptibility to disease (28, 29, 31). Thus, knowing an individual’s PRSrare, in addition to PRScommon, may contribute to identifying individuals at high risk of obesity. However, given the limited explained variance observed in our analyses, we expect that few individuals will indeed score high on both scores. Nevertheless, for these few individuals, knowing their high risk may be valuable. Recently, a new framework was developed to aggregate rare variant burden into a rare variant PRS (30). As an example, a rare variant genetic risk score for CAD was built, using UK Biobank data. 
Similar to our findings for obesity and extreme obesity, a significant association of this PRSrare with risk of CAD was observed, although the explained variation was only 0.1% of the population variance (30). We report a similar explained variance of 0.2% for obesity and 0.5% for extreme obesity. The reasons why the PRSrare’s explained variance is small, in particular in addition to the PRScommon, are threefold. First, the PRSrare was not completely independent from PRScommon, even after including only non-overlapping variants. It is likely that the true causal (rare) variants were tagged by common variants in LD. Second, any new (rare) variant added to the PRS increases the PRS’s uncertainty due to the statistical noise associated with estimating a new weight (32). The PRSrare might have suffered more from this, as accurately estimating weights for rare variants generally requires larger sample sizes. Third, rare variants, although more likely to have larger effects (12), are too rare to explain much of the obesity epidemic in the general population. Consistent with the low variance explained, the predictive power added by the PRSrare over that of the PRScommon was limited. The improvement in AUC for obesity (from 0.708 to 0.710) was negligible, although the AUC for the PRSrare alone was up to 0.59. This supports our observation that the predictive power of the PRSrare in part overlapped with that of the PRScommon. So far, no other studies have reported on the contribution of PRSrare in the presence of PRScommon. In addition to using BMI summary statistics to build PRSs and test their predictive performance for obesity and extreme obesity, we built PRSscommon and PRSsrare based on obesity and extreme obesity GWAS summary statistics. 
The PRScommon and PRSrare based on BMI-GWAS summary statistics outperformed those based on obesity or extreme obesity GWAS summary statistics, which is in line with previous findings that a PRScommon based on the full distribution explains a larger proportion of the variance than one based on the tails of the distribution (33). For the ultra-rare variants, the PRSrare-burden based on obesity summary statistics performed better than that based on BMI summary statistics, which may be due to ultra-rare variants playing a role in (extreme) obesity, but less so in BMI. Our discovery GWASs were conducted in the relatively healthy and less deprived UK Biobank population (34), which may have limited our ability to capture the genetic contribution of rare variants to obesity and extreme obesity. We acknowledge that our samples for analyses were restricted to one ancestry only. We focused our analyses on European-ancestry populations, for which the most data are available. Because allele frequencies, LD patterns, and effect sizes differ between ancestries, the accuracy of European-derived PRSs decays rapidly when applied to other ancestries (35). PRSs derived from other ancestries are currently underpowered because of relatively small sample sizes. As more data, both GWAS and sequencing data, become available for other ancestries, the analyses described here should be repeated to examine whether our observations are generalizable across ancestries. Furthermore, we focused solely on obesity, a common multifactorial trait that is moderately heritable. While many complex traits have similar features, we cannot guarantee that our observations can be extrapolated to other outcomes, as the genetic architecture, explained variance from common variants, and contribution from rare pathogenic variants may differ (36). 
Taken together, we demonstrate that while rare variants, aggregated in PRSsrare, associate independently with obesity risk, they provide only a minimal improvement over PRScommon in the accuracy of predicting obesity risk in the general population. Our findings cast an important light on the potential value of rare variants in the prediction of complex diseases, such as obesity.

Data Availability Statement

Publicly available datasets were analyzed in this study. UK Biobank data can be found here: UK Biobank (https://www.ukbiobank.ac.uk/). All TOPMed data for each participating study can be accessed through dbGaP with the corresponding accession number listed in Acknowledgments.

Ethics Statement

All phenotypic and genetic data were collected with approval from the Institutional Review Board with patient consent at each institution. This study was approved by the Institutional Review Board (IRB) of the Icahn School of Medicine at Mount Sinai in New York, New York.

Author Contributions

Study concept and design: ZW and RL. Acquisition of cohort level data: EB, RL, ZW, NC, MF, SR, BP, JAB, JCB, ES, M-LM, ER, WK, RV, C-TL, RM, LY, RRK, DA, RK, KN, AJ, SH, MA, JR, XG, LL, SSR, PE, SL, JB, MS, DD, MG, CA, DC, CK, RJ, and AR. Statistical analysis: ZW and SC. Interpretation of data: ZW, PFO, and RL. Manuscript writing group: ZW, PFO, SC, and RL. Supervision: PFO and RL. All authors contributed to the article and approved the submitted version.

Author Disclaimer

The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services.

Conflict of Interest

BP serves on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. PE has received sponsored research support from Bayer AG and from IBM Research and has also served on advisory boards or consulted for Bayer AG, Quest Diagnostics, MyoKardia, and Novartis. SL receives sponsored research support from Bristol Myers Squibb/Pfizer, Bayer AG, Boehringer Ingelheim, Fitbit, and IBM, and has consulted for Bristol Myers Squibb/Pfizer, Blackstone Life Sciences, and Invitae. ES has received grant support from GSK and Bayer. The handling editor declared a past co-authorship with one of the authors, RL. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

35 in total

1. Polygenic Risk, Fitness, and Obesity in the Coronary Artery Risk Development in Young Adults (CARDIA) Study.

Authors: Venkatesh L Murthy; Rui Xia; Abigail S Baldridge; Mercedes R Carnethon; Stephen Sidney; Claude Bouchard; Mark A Sarzynski; João A C Lima; Gregory D Lewis; Sanjiv J Shah; Myriam Fornage; Ravi V Shah
Journal: JAMA Cardiol Date: 2020-03-01 Impact factor: 14.676

2. Prevalence of Obesity and Severe Obesity Among Adults: United States, 2017-2018.

Authors: Craig M Hales; Margaret D Carroll; Cheryl D Fryar; Cynthia L Ogden
Journal: NCHS Data Brief Date: 2020-02

3. Computationally efficient whole-genome regression for quantitative and binary traits.

Authors: Joelle Mbatchou; Leland Barnard; Joshua Backman; Anthony Marcketta; Jack A Kosmicki; Andrey Ziyatdinov; Christian Benner; Colm O'Dushlaine; Mathew Barber; Boris Boutkov; Lukas Habegger; Manuel Ferreira; Aris Baras; Jeffrey Reid; Goncalo Abecasis; Evan Maxwell; Jonathan Marchini
Journal: Nat Genet Date: 2021-05-20 Impact factor: 38.330

4. Variability in the heritability of body mass index: a systematic review and meta-regression.

Authors: Cathy E Elks; Marcel den Hoed; Jing Hua Zhao; Stephen J Sharp; Nicholas J Wareham; Ruth J F Loos; Ken K Ong
Journal: Front Endocrinol (Lausanne) Date: 2012-02-28 Impact factor: 5.555

5. Health Effects of Overweight and Obesity in 195 Countries over 25 Years.

Authors: Ashkan Afshin; Mohammad H Forouzanfar; Marissa B Reitsma; Patrick Sur; Kara Estep; Alex Lee; Laurie Marczak; Ali H Mokdad; Maziar Moradi-Lakeh; Mohsen Naghavi; Joseph S Salama; Theo Vos; Kalkidan H Abate; Cristiana Abbafati; Muktar B Ahmed; Ziyad Al-Aly; Ala’a Alkerwi; Rajaa Al-Raddadi; Azmeraw T Amare; Alemayehu Amberbir; Adeladza K Amegah; Erfan Amini; Stephen M Amrock; Ranjit M Anjana; Johan Ärnlöv; Hamid Asayesh; Amitava Banerjee; Aleksandra Barac; Estifanos Baye; Derrick A Bennett; Addisu S Beyene; Sibhatu Biadgilign; Stan Biryukov; Espen Bjertness; Dube J Boneya; Ismael Campos-Nonato; Juan J Carrero; Pedro Cecilio; Kelly Cercy; Liliana G Ciobanu; Leslie Cornaby; Solomon A Damtew; Lalit Dandona; Rakhi Dandona; Samath D Dharmaratne; Bruce B Duncan; Babak Eshrati; Alireza Esteghamati; Valery L Feigin; João C Fernandes; Thomas Fürst; Tsegaye T Gebrehiwot; Audra Gold; Philimon N Gona; Atsushi Goto; Tesfa D Habtewold; Kokeb T Hadush; Nima Hafezi-Nejad; Simon I Hay; Masako Horino; Farhad Islami; Ritul Kamal; Amir Kasaeian; Srinivasa V Katikireddi; Andre P Kengne; Chandrasekharan N Kesavachandran; Yousef S Khader; Young-Ho Khang; Jagdish Khubchandani; Daniel Kim; Yun J Kim; Yohannes Kinfu; Soewarta Kosen; Tiffany Ku; Barthelemy Kuate Defo; G Anil Kumar; Heidi J Larson; Mall Leinsalu; Xiaofeng Liang; Stephen S Lim; Patrick Liu; Alan D Lopez; Rafael Lozano; Azeem Majeed; Reza Malekzadeh; Deborah C Malta; Mohsen Mazidi; Colm McAlinden; Stephen T McGarvey; Desalegn T Mengistu; George A Mensah; Gert B M Mensink; Haftay B Mezgebe; Erkin M Mirrakhimov; Ulrich O Mueller; Jean J Noubiap; Carla M Obermeyer; Felix A Ogbo; Mayowa O Owolabi; George C Patton; Farshad Pourmalek; Mostafa Qorbani; Anwar Rafay; Rajesh K Rai; Chhabi L Ranabhat; Nikolas Reinig; Saeid Safiri; Joshua A Salomon; Juan R Sanabria; Itamar S Santos; Benn Sartorius; Monika Sawhney; Josef Schmidhuber; Aletta E Schutte; Maria I Schmidt; Sadaf G Sepanlou; Moretza Shamsizadeh; Sara Sheikhbahaei; Min-Jeong 
Shin; Rahman Shiri; Ivy Shiue; Hirbo S Roba; Diego A S Silva; Jonathan I Silverberg; Jasvinder A Singh; Saverio Stranges; Soumya Swaminathan; Rafael Tabarés-Seisdedos; Fentaw Tadese; Bemnet A Tedla; Balewgizie S Tegegne; Abdullah S Terkawi; J S Thakur; Marcello Tonelli; Roman Topor-Madry; Stefanos Tyrovolas; Kingsley N Ukwaja; Olalekan A Uthman; Masoud Vaezghasemi; Tommi Vasankari; Vasiliy V Vlassov; Stein E Vollset; Elisabete Weiderpass; Andrea Werdecker; Joshua Wesana; Ronny Westerman; Yuichiro Yano; Naohiro Yonemoto; Gerald Yonga; Zoubida Zaidi; Zerihun M Zenebe; Ben Zipkin; Christopher J L Murray
Journal: N Engl J Med Date: 2017-06-12 Impact factor: 91.245

6. Association of a Polygenic Risk Score With Breast Cancer Among Women Carriers of High- and Moderate-Risk Breast Cancer Genes.

Authors: Shannon Gallagher; Elisha Hughes; Susanne Wagner; Placede Tshiaba; Eric Rosenthal; Benjamin B Roa; Allison W Kurian; Susan M Domchek; Judy Garber; Johnathan Lancaster; Jeffrey N Weitzel; Alexander Gutin; Jerry S Lanchbury; Mark Robson
Journal: JAMA Netw Open Date: 2020-07-01

7. The role of polygenic susceptibility to obesity among carriers of pathogenic mutations in MC4R in the UK Biobank population.

Authors: Nathalie Chami; Michael Preuss; Ryan W Walker; Arden Moscati; Ruth J F Loos
Journal: PLoS Med Date: 2020-07-21 Impact factor: 11.069

8. Power and predictive accuracy of polygenic risk scores.

Authors: Frank Dudbridge
Journal: PLoS Genet Date: 2013-03-21 Impact factor: 5.917

9. Nearly a decade on - trends, risk factors and policy implications in global obesity.

Authors: Vasanti S Malik; Walter C Willet; Frank B Hu
Journal: Nat Rev Endocrinol Date: 2020-11 Impact factor: 43.330