
Electronic health record-based genome-wide meta-analysis provides insights on the genetic architecture of non-alcoholic fatty liver disease.

Nooshin Ghodsian1, Erik Abner2, Connor A Emdin3,4, Émilie Gobeil1, Nele Taba2,5, Mary E Haas3,6, Nicolas Perrot1, Hasanga D Manikpurage1, Éloi Gagnon1, Jérôme Bourgault1, Alexis St-Amand1, Christian Couture1, Patricia L Mitchell1, Yohan Bossé1,7, Patrick Mathieu1,8, Marie-Claude Vohl9,10, André Tchernof1,10, Sébastien Thériault1,11, Amit V Khera3,4,12, Tõnu Esko2, Benoit J Arsenault1,13.   

Abstract

Non-alcoholic fatty liver disease (NAFLD) is a complex disease linked to several chronic diseases. We aimed to identify genetic variants associated with NAFLD and to evaluate their functional consequences. We performed a genome-wide meta-analysis of 4 cohorts of electronic health record-documented NAFLD in participants of European ancestry (8,434 cases and 770,180 controls). We identify 5 potential susceptibility loci for NAFLD (located at or near GCKR, TRIB1, MAU2/TM6SF2, APOE, and PNPLA3). We also report a potentially causal effect of lower LPL expression in adipose tissue on NAFLD susceptibility and an effect of the FTO genotype on NAFLD. Positive genetic correlations between NAFLD and cardiometabolic diseases and risk factors such as body fat accumulation/distribution, lipoprotein-lipid levels, insulin resistance, and coronary artery disease and negative genetic correlations with parental lifespan, socio-economic status, and acetoacetate levels are observed. This large GWAS meta-analysis reveals insights into the genetic architecture of NAFLD.
© 2021 The Author(s).

Keywords:  adipose tissue; electronic health records; genetics; genome-wide association study; lipoprotein lipase; non-alcoholic fatty liver disease

Year:  2021        PMID: 34841290      PMCID: PMC8606899          DOI: 10.1016/j.xcrm.2021.100437

Source DB:  PubMed          Journal:  Cell Rep Med        ISSN: 2666-3791


Introduction

Non-alcoholic fatty liver disease (NAFLD) is one of the most prevalent chronic liver diseases. According to recent estimates, ∼25% of the adult population worldwide may have NAFLD. This disease has been predicted to become the most frequent indication for liver transplantation in Western countries by 2030. NAFLD is a progressive liver disease with potential consequences for several other chronic disorders such as cardiovascular disease (CVD) (the leading cause of death in patients with NAFLD) (refs. 6-9), type 2 diabetes (T2D), dyslipidemia, and other extrahepatic manifestations such as chronic kidney disease and gastrointestinal neoplasms. To better understand the etiology of complex diseases such as NAFLD and to develop therapies that may help patients with this disease live longer and healthier lives, the genetic architecture of NAFLD needs to be better understood. Although genome-wide association studies (GWASs) have identified genetic variants associated with liver fat accumulation, liver enzymes, and different forms of liver diseases, fewer than a handful of small GWASs have sought to identify genetic variants associated with a clinical diagnosis of NAFLD. The GWAS of the Electronic Medical Records and Genomics (eMERGE) network, which included 1,106 NAFLD cases and 8,571 controls, identified only 1 NAFLD susceptibility locus (PNPLA3). The NAFLD GWAS of the UK Biobank, which included 1,664 NAFLD cases and 400,055 controls, identified only 2 regions robustly associated with NAFLD (PNPLA3 and PBX4/TM6SF2). The UK Biobank analysis did not exclude participants with secondary causes of NAFLD (e.g., hepatitis, alcoholism) and used a rather vague definition of NAFLD (phecode 571.5: other forms of nonalcoholic liver disease). Genetic variation at these 2 loci is also associated with NAFLD in data freeze 4 of the FinnGen cohorts (651 NAFLD cases and 176,248 controls).
Here, we present the results of a meta-analysis of electronic health record (EHR)-based GWASs to identify genetic variants associated with NAFLD. This analysis included GWAS summary statistics from the eMERGE and FinnGen cohorts, an updated NAFLD GWAS in the UK Biobank (2,558 cases and 395,241 controls), and a new GWAS performed in the Estonian Biobank (4,119 cases and 190,120 controls), for a total of 8,434 NAFLD cases and 770,180 controls.

Results

Identification of genetic variants associated with NAFLD

To identify genetic variants associated with NAFLD, we performed 2 new GWASs in the UK Biobank and Estonian Biobank and performed a meta-analysis of 4 cohorts (UK Biobank, Estonian Biobank, eMERGE, and FinnGen), totaling 8,434 NAFLD cases, all identified through EHRs, and 770,180 controls. We identified 4 genetic loci that harbored at least 1 SNP that passed the genome-wide significance threshold of p ≤ 5 × 10^−8 (TRIB1, MAU2 [TM6SF2], APOE, and PNPLA3). Figure 1A presents the Manhattan plot of the NAFLD GWAS meta-analysis identifying genetic regions with a p value for association with NAFLD ≤ 5 × 10^−8. The associated quantile-quantile plot is presented in Figure S1. The genomic inflation factor (λ) was 1.02 and the linkage disequilibrium score regression (LDSC) intercept was 1.002. To identify potentially new relevant NAFLD genetic loci, we used a Bayesian approach (bGWAS) recently described by Mounier and Kutalik. This method seeks to identify new variants associated with complex diseases using inference from risk factors of these diseases. By leveraging GWAS summary statistics from risk factors likely causally associated with NAFLD in a previous Mendelian randomization (MR) study (body mass index [BMI] and triglyceride levels) as priors, this analysis revealed genetic variation at 3 additional loci (GCKR, LPL, and FTO) associated with NAFLD (Table S1; STAR Methods). Figure S2 presents the multivariable causal effect estimates for the 2 risk factors (BMI and triglycerides) used to create the prior. Variation at these loci was identified on the basis of Bayes factors (Figure 1B), meaning that these SNPs act on NAFLD through their effect on the selected risk factors, rather than on the basis of direct effects (Figure 1C; i.e., effects not acting through the selected risk factors) or posterior effects (Figure 1D).
The associations with NAFLD of the lead SNPs at these loci, as well as of those from the conventional GWAS, are presented in Table S2 for each cohort separately and for the GWAS meta-analysis. Because some of these SNPs showed evidence of heterogeneity, p values are presented from both fixed-effects and random-effects meta-analyses. Through a combination of conventional GWAS and risk factor-informed GWAS, our analysis identified genetic variation at 7 loci that may influence susceptibility to NAFLD.
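The genomic inflation factor quoted above is conventionally defined as the median of the observed 1-degree-of-freedom chi-square statistics divided by the median of the null chi-square distribution (about 0.4549). The following is a minimal sketch of that definition, assuming a plain list of per-SNP Z-scores; the function name and the toy input are illustrative and are not the authors' pipeline, which relied on the GenomicSEM R package.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom,
# obtained from the normal quantile: (Phi^-1(0.75))^2, about 0.4549.
NULL_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Return lambda = median(z^2) / median of the null chi-square(1)."""
    chi2 = [z * z for z in z_scores]
    return median(chi2) / NULL_MEDIAN_CHI2


# Hypothetical input: Z-scores placed at evenly spaced null quantiles,
# so lambda is expected to be very close to 1.
n = 9999
null_z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
lam = genomic_inflation(null_z)
print(round(lam, 3))
```

A value close to 1, such as the 1.02 reported here, indicates little inflation of the test statistics.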
Figure 1

Main results of the meta-analysis of genome-wide association studies (GWASs)

(A) Manhattan plot depicting single-nucleotide polymorphisms (SNPs) associated with non-alcoholic fatty liver disease in the GWAS meta-analysis of the eMERGE, FinnGen, UK Biobank, and Estonian Biobank cohorts. Identification of genetic variants linked with NAFLD via a risk factor-informed Bayesian GWAS based on (B) Bayes Factors (BFs), (C) direct effects, and (D) posterior effects. Genetic loci harboring SNPs associated with NAFLD (p < 5.0e−8) are shown.


Impact of the 7 variants on NAFLD after accounting for obesity

To determine whether these 7 SNPs were associated with NAFLD independently of obesity, we performed another GWAS meta-analysis using the same models described in the Method details section but adding BMI as a covariate. The GWAS from eMERGE already provided summary statistics adjusted for BMI. Because BMI was not available for every participant of the UK Biobank and especially the Estonian Biobank, we performed another GWAS in slightly fewer individuals in the UK Biobank (2,541 cases and 394,053 controls) and in the Estonian Biobank participants with available BMI values (2,817 cases and 133,909 controls). The total number of NAFLD cases for this analysis was 6,464 and the total number of controls was 536,533. The Manhattan plot of this GWAS meta-analysis is presented in Figure S3. The impact of the 7 SNPs on NAFLD in BMI-adjusted analyses is presented in Table S3. The effects of the 7 variants on NAFLD appeared to remain in the same range, with the exception of FTO, which was no longer statistically significant after adjusting for BMI. Interestingly, the variant at the GCKR locus (rs1260326) became associated with NAFLD, with a p value below the GWAS significance threshold of p ≤ 5 × 10^−8. This analysis did not reveal any new NAFLD susceptibility loci beyond the variant at the GCKR locus.
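The sample sizes quoted in the text can be checked by simple addition. The sketch below sums the per-cohort counts as they appear in this article; the dictionary and function names are illustrative. It is only an arithmetic check on the quoted numbers: the BMI-adjusted totals match the sum of the eMERGE, UK Biobank, and Estonian Biobank counts, which is an observation about the figures rather than a statement from the authors about cohort inclusion.

```python
# Per-cohort (cases, controls) as quoted in the text.
full = {
    "eMERGE": (1106, 8571),
    "UK Biobank": (2558, 395241),
    "Estonian Biobank": (4119, 190120),
    "FinnGen": (651, 176248),
}
bmi_adjusted = {
    "eMERGE": (1106, 8571),
    "UK Biobank": (2541, 394053),
    "Estonian Biobank": (2817, 133909),
}


def totals(cohorts):
    """Sum cases and controls over a mapping of cohort -> (cases, controls)."""
    cases = sum(c for c, _ in cohorts.values())
    controls = sum(k for _, k in cohorts.values())
    return cases, controls


print(totals(full))          # main meta-analysis
print(totals(bmi_adjusted))  # BMI-adjusted meta-analysis
```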

Evaluation of the functionality of variants associated with NAFLD

Some of the top variants linked with NAFLD in this analysis may have functional consequences. For instance, rs1260326 at GCKR is a missense variant (p.P446L). The rs429358 variant at APOE is also a missense variant (p.R130C). The lead variant at MAU2/TM6SF2, rs73001065, is in linkage disequilibrium (r² = 0.90) with the missense variant p.E167K at TM6SF2, and the lead variant at PNPLA3 is in high linkage disequilibrium (r² = 0.98) with the missense variant p.I148M at PNPLA3. Table 1 presents the details of these results as well as the effect of other variants previously associated with NAFLD (p.A165T at MTARC1, a splice variant at HSD17B13, and another variant at MBOAT7). This analysis confirmed previous NAFLD functional variants at MTARC1 and MBOAT7, but not at HSD17B13. Genetic variation at the PNPLA3, TM6SF2, and GCKR loci has been linked with NAFLD-related traits in previous studies. Recent studies identified APOE, TRIB1, and FTO as potential new loci for liver enzymes. Our study extends the results of these studies by linking variation at these loci with a clinical diagnosis of NAFLD and identifies LPL as a potential new susceptibility locus for NAFLD. Interestingly, the minor allele (C) at rs13702 associated here with protection against NAFLD has been predicted to disrupt a microRNA recognition element seed site for human microRNA miR-410, resulting in higher LPL expression. We therefore sought to determine whether genetically predicted LPL expression was associated with NAFLD. We performed a transcriptome-wide association study for NAFLD to map genetically regulated genes from the Genotype Tissue Expression (GTEx, version 8) consortium with NAFLD using S-PrediXcan. This analysis did not reveal new NAFLD genes outside those that had a genome-wide signal such as PNPLA3 and TM6SF2 (data not shown). Genetically predicted LPL expression could be estimated in 11 tissues.
The association between genetically predicted LPL expression in these 11 tissues and NAFLD is presented in Table S4. This analysis suggests a negative association between genetically predicted LPL expression in subcutaneous adipose tissue and NAFLD (p = 3.1e−4). The LocusCompare plot (Figure 2) further suggests shared genetic etiology at this locus with the rs13702 variant being significantly associated with both subcutaneous adipose tissue expression of LPL and NAFLD. In summary, most of the 7 SNPs identified in this analysis or SNPs in close proximity may be considered functional SNPs.
Table 1

Association of previously identified functional variants linked with liver diseases in the present genome-wide association study

Gene | CHR | SNP | Impact on protein | Minor allele | Major allele | Association with NAFLD: β (minor allele) | SE | p
MTARC1 | 1 | rs2642438 | missense (p.A165T) | A | G | −0.0674 | 0.0178 | 1.54E−4
GCKR | 2 | rs1260326 | missense (p.P446L) | T | C | 0.0755 | 0.0167 | 5.98E−6
HSD17B13 (a) | 4 | rs72613567 | splice variant | C | G | −0.0304 | 0.0186 | 1.02E−1
MBOAT7 | 19 | rs641738 | linked to 3' UTR | T | C | 0.0519 | 0.0164 | 1.53E−3
APOE | 19 | rs429358 | missense (p.R130C) | C | T | −0.1366 | 0.0239 | 1.14E−8
TM6SF2 | 19 | rs58542926 | missense (p.E167K) | T | C | 0.2676 | 0.0320 | 6.90E−17
PNPLA3 | 22 | rs738409 | missense (p.I148M) | G | C | 0.2869 | 0.0198 | 1.23E−47

(a) The effect of a SNP in linkage disequilibrium (r² = 0.96) with this variant (rs10433879) is presented.
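The β, SE, and p columns of Table 1 are related through a Z-statistic. A minimal sketch, assuming the reported p values are two-sided p values from a Wald test (Z = β / SE), which the source does not state explicitly; the function name is illustrative and the inputs are the rounded values from the table, so the results only approximate the published p values.

```python
import math


def wald_test(beta, se):
    """Return (Z, two-sided p) for an effect estimate and its standard error."""
    z = beta / se
    # Two-sided normal tail probability: P(|N(0,1)| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Rounded beta and SE values copied from Table 1.
table1 = {
    "MTARC1 rs2642438": (-0.0674, 0.0178),
    "APOE rs429358": (-0.1366, 0.0239),
    "PNPLA3 rs738409": (0.2869, 0.0198),
}

for label, (beta, se) in table1.items():
    z, p = wald_test(beta, se)
    print(f"{label}: Z = {z:.2f}, p = {p:.2e}")
```

Small differences from the tabulated p values are expected because β and SE are rounded to four decimals.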

Figure 2

Shared genetic etiology at the LPL locus

LocusCompare plot depicting colocalization of the top SNPs associated with subcutaneous adipose tissue LPL expression and NAFLD. Each dot represents a SNP at the LPL locus. In the left panel, these SNPs are plotted to represent their effect on LPL expression (top right) against their effect on NAFLD (bottom right).


Association of variants associated with NAFLD with NAFLD-related phenotypes

We investigated the effect of these variants in another cohort and with NAFLD-related traits such as liver fat accumulation and liver enzymes in the UK Biobank. In the Mass General Brigham Biobank, 4,312 patients with non-alcoholic steatohepatitis (NASH) or NAFLD (diagnosed by computed tomography and/or MRI) were compared to 26,404 controls. The direction of the effects of the 7 SNPs was concordant with that observed in the GWAS meta-analysis. All SNPs were significantly associated with NAFLD in the Mass General Brigham Biobank, with the exception of the variants at the FTO and LPL loci (Table S5). Liver fat accumulation in the UK Biobank was quantified via machine learning of abdominal MRI images, as previously described. We analyzed liver fat accumulation as a continuous trait in 32,976 study participants. The direction of the effects of the 7 SNPs on liver fat accumulation was concordant with that observed in the GWAS meta-analysis, and all SNPs were significantly associated with liver fat accumulation, with the exception of the variant at the LPL locus (Table S5). Finally, the association between the 7 NAFLD-associated variants and the liver enzymes ALT (alanine aminotransferase), AST (aspartate aminotransferase), GGT (γ-glutamyl transferase), and ALP (alkaline phosphatase) was investigated in 361,194 participants of the UK Biobank. Results presented in Table S5 suggest that all of the variants were positively associated with liver enzymes, except that the variant at GCKR was not associated with ALT levels, the variant at APOE was not associated with AST levels, and the variant at PNPLA3 was not associated with GGT levels. Variants at the GCKR, LPL, TRIB1, and APOE loci were positively associated with ALP levels, the variant at FTO was not associated with ALP levels, and the variants at MAU2/TM6SF2 and PNPLA3 were negatively associated with ALP levels.
Overall, the results of this analysis suggest that the 7 variants associated with NAFLD are associated with NAFLD-related traits such as liver fat accumulation and/or liver enzymes.

Association of NAFLD with human metabolic and phenotypic traits

We performed cross-trait genetic correlation analyses between NAFLD and 240 human traits centralized in the LD Hub database. LD Hub includes publicly available GWAS summary statistics on hundreds of human traits and enables the assessment of LD score regression among those traits. The results presented in Figure 3 show high levels of genetic correlation between NAFLD and cardiometabolic traits and diseases such as obesity, insulin resistance, triglycerides, coronary artery disease (CAD), and T2D, and negative genetic correlations with parental lifespan, education, and the ketone body acetoacetate.
Figure 3

Results of the LD regression analysis between NAFLD and other human diseases and traits

LD regression analyses were performed in LD Hub to test the genetic correlation of NAFLD with 240 human diseases and traits. Statistically significant (p < 0.05) genetic correlation coefficients (Rg) and their 95% confidence intervals are presented. adjBMI, adjusted for body mass index; FEV1/FVC, forced expiratory volume in 1 s/forced vital capacity; HOMA-IR, homeostatic model of insulin resistance; VLDL, very-low-density lipoproteins.


Discussion

We performed 2 genome-wide association studies for NAFLD in the UK Biobank and in the Estonian Biobank and combined these results with those of 2 publicly available NAFLD GWASs (from the eMERGE network and FinnGen). This GWAS meta-analysis included 8,434 NAFLD cases available via EHRs and 770,180 controls, making it the largest genome-wide analysis for a clinical diagnosis of NAFLD. In combination with a risk factor-informed bGWAS, this analysis identified 2 known susceptibility loci for NAFLD (TM6SF2 and PNPLA3) and 5 potentially new candidate genetic regions for a clinical diagnosis of NAFLD based on EHRs (GCKR, TRIB1, LPL, FTO, APOE). Our conventional GWAS analyses (adjusted for BMI or not) report that variation at the GCKR, TRIB1, MAU2/TM6SF2, APOE, and PNPLA3 loci may be linked to NAFLD. While genetic variants at these loci have been associated with some liver phenotypes, this GWAS meta-analysis revealed important information on the genetic architecture of NAFLD. Using bGWAS, our study identified known and potentially new loci for NAFLD (LPL and FTO) that may be associated with NAFLD through their effects on NAFLD risk factors (BMI and triglycerides). A recent preprint identified a variant at the FTO locus as a susceptibility locus for having high ALT levels in the Million Veteran Program. Although the biological relevance of variation at the FTO locus is still a matter of debate, FTO is a well-characterized genetic locus for obesity. Upon adjusting for BMI, the variant at the FTO locus was no longer significantly associated with NAFLD, confirming that the effect of this variant on NAFLD is dependent on its effect on body weight. Although variants at the GCKR locus were not associated with NAFLD in the main analysis, the bGWAS analysis and the conventional GWAS adjusted for BMI identified GCKR as a susceptibility locus for NAFLD.
Other studies reported an association of variants at the GCKR locus with liver fat accumulation and liver enzymes. This analysis suggests that genetic variation at the GCKR locus may modulate NAFLD risk associated with obesity and/or elevated triglyceride levels. The same may be true for variants at the LPL locus, the gene that encodes lipoprotein lipase (LPL). LPL is a key enzyme that regulates the catabolism of triglyceride-rich lipoproteins such as chylomicrons and very-low-density lipoproteins in adipose tissue, skeletal muscle, and the heart. Gain-of-function mutations in LPL were associated with lower triglyceride levels and lower risk of CAD. In the present study, we found a potentially causal inverse association between genetically predicted LPL expression in subcutaneous adipose tissue and NAFLD. These results are in line with the recent study of Maltais et al., who reported that 4 in 10 patients with familial chylomicronemia syndrome and almost 3 in 4 patients with multifactorial chylomicronemia syndrome (2 disorders of impaired LPL function) met the criteria of NAFLD independently of their BMI. It should be noted that although the variant at the LPL locus linked with higher NAFLD risk was associated with higher liver enzyme levels in the UK Biobank, it was not associated with liver fat accumulation in the UK Biobank or with NAFLD in the Mass General Brigham Biobank. In addition, although these results did not reach the level of genome-wide significance, we found significant associations at the MTARC1 and MBOAT7 loci, thereby confirming the role of these genes in the etiology of NAFLD. Previous studies have shown that NAFLD could be associated with or predict the risk of chronic diseases such as CVD or T2D. Our genetic correlation analyses revealed associations with these diseases as well as risk factors for these diseases such as obesity and insulin resistance.
We also report interesting negative correlations between NAFLD and the ketone body acetoacetate (as previously suggested in an observational study), as well as parental lifespan, suggesting that NAFLD may be a critical component of long-term disease risk potentially influencing human lifespan. Whether the resolution of NAFLD will influence these traits and outcomes remains to be determined. Interestingly, combined with the results of other studies that have linked variation at LPL with lower lipid levels and lower risk of CAD, our analysis suggests that targeting the LPL pathway may prevent NAFLD as well as other diseases such as hyperlipidemia and CAD without increasing the risk of other human diseases. Drugs targeting the LPL pathway under investigation for NAFLD include the angiopoietin-like protein-3 (ANGPTL3) inhibitors, glucagon-like peptide-1 (GLP-1) receptor agonists, and dual glucose-dependent insulinotropic peptide (GIP)/GLP-1 receptor agonists. Drugs targeting obesity such as semaglutide were also recently associated with NASH resolution without worsening of liver fibrosis.

Limitations of the study

Our study has limitations. For instance, although we have excluded secondary causes of NAFLD whenever possible, an EHR-based diagnosis of complex diseases such as NAFLD may be prone to misclassification of cases and controls. Our analysis revealed FTO and LPL, 2 potentially new NAFLD loci. However, although the top variants at these loci were associated with liver fat accumulation and/or liver enzymes in the UK Biobank, these variants did not replicate in a smaller NAFLD GWAS. It should also be re-emphasized that variation at these loci acts on NAFLD through selected risk factors and therefore may lead to NAFLD via indirect mechanisms. Although our study reports 2 conventional GWAS analyses (adjusting or not adjusting for BMI), we could not perform a GWAS meta-analysis adjusting for triglyceride levels. Therefore, studies with larger sample sizes and accounting for triglyceride levels will be needed to document whether variation at the LPL locus is strongly associated with NAFLD and whether its effects are entirely mediated by triglyceride levels. In conclusion, we conducted a large NAFLD GWAS based on EHRs from 4 cohorts to identify genetic variants of NAFLD susceptibility. We identified known NAFLD variants and show that variants associated with liver fat accumulation and liver enzymes may also be associated with the presence of NAFLD. Our analysis revealed a potentially causal effect of lower adipose-tissue expression of LPL on NAFLD that will need confirmation by other, larger studies.

STAR★Methods

Key resources table

Resource availability

Lead Contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Benoit Arsenault (benoit.arsenault@criucpq.ulaval.ca).

Materials availability

No materials were used to perform the genome-wide association meta-analysis and follow-up studies.

Experimental model and subject details

Study participants

To obtain a comprehensive set of NAFLD GWAS summary statistics, we performed a GWAS meta-analysis of four cohorts: the Electronic Medical Records and Genomics (eMERGE) network, the UK Biobank, the Estonian Biobank and FinnGen. The NAFLD GWAS in the eMERGE network has previously been published. The study sample included 1106 NAFLD cases and 8571 control participants of European ancestry. Of them, 396 NAFLD cases and 846 control participants (47% males) were derived from a pediatric population and 710 NAFLD cases and 7725 control participants (42% males) were derived from an adult population. NAFLD was defined by the use of EHR codes (ICD9: 571.5, ICD9: 571.8, ICD9: 571.9, ICD10: K75.81, ICD10: K76.0 and ICD10: K76.9). Exclusion criteria included, but were not limited to, alcohol dependence, alcoholic liver disease, alpha-1 antitrypsin deficiency, Alagille syndrome, liver transplant, cystic fibrosis, hepatitis, abetalipoproteinemia, LCAT deficiency, lipodystrophy, disorders of copper metabolism, Reye’s syndrome, inborn errors of metabolism, HELLP syndrome, starvation and acute fatty liver (as suggested by the American Association for the Study of Liver Diseases [AASLD]). We performed a new GWAS for NAFLD in the UK Biobank (data application number 25205). NAFLD diagnosis was established from hospital records (ICD10: K74.0 and K74.2 [hepatic fibrosis], K75.8 [NASH], K76.0 [NAFLD] and K76.9 [other specified diseases of the liver]). Exclusion criteria were the same as those used in the eMERGE study. In the UK Biobank analysis, we included 2558 NAFLD cases and 395,241 controls. We also performed a GWAS for NAFLD in the Estonian Biobank. This study and the use of data from 4119 cases and 190,120 controls were approved by the Research Ethics Committee of the University of Tartu (Approval number 288/M-18). We used the same case definition and inclusion/exclusion criteria as in the UK Biobank.
In the FinnGen data freeze 4 (November 30, 2020), 651 patients had a NAFLD diagnosis (EHR code K76.0). They were compared to 176,248 controls. The Mass General Brigham Biobank is a hospital-based biorepository with genetic data linked to clinical records as previously described. Patients were defined as having NAFLD or NASH according to diagnosis codes in the electronic health care record and were compared to controls without such diagnoses.

Method details

In the eMERGE study, logistic regression analysis was performed on over 7 million SNPs with MAF > 1%, adjusted for age, gender, body mass index, genotyping site and the first three ancestry-based principal components. In the UK Biobank, genome-wide genotyping was available for over 28 million genetic markers directly genotyped or imputed by the Haplotype Reference Consortium (HRC) panel. In FinnGen, GWAS was performed using over 16 million genetic markers genotyped with the Illumina or Affymetrix arrays or imputed using the population-specific SISu v3 reference panel. Variables included in the models were gender, age, the 10 main ancestry-based principal components and genotyping batch.

Quantification and statistical analysis

Genome-wide association study summary statistics for NAFLD

We used the SAIGE (Scalable and Accurate Implementation of Generalized Mixed Models) method to perform the GWAS in the UK Biobank and in the Estonian Biobank. This method is based on generalized mixed models and was developed to control for case-control imbalance, sample relatedness and population structure. In this analysis, gender, age and the 10 main ancestry-based principal components were used as covariates. Finally, SAIGE was also used to obtain GWAS summary statistics of the FinnGen cohort. We performed a fixed-effect GWAS meta-analysis of the eMERGE, UK Biobank, FinnGen and Estonian Biobank cohorts using the METAL package. When variants showed evidence of heterogeneity, we performed a random-effects meta-analysis. A total of 6,797,908 SNPs with a minor allele frequency equal to or above 0.01 were investigated. The genomic inflation factor and the LDSC intercept were computed using the GenomicSEM R package.
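To illustrate the meta-analysis step, the sketch below pools per-cohort effect estimates for a single SNP. It assumes inverse-variance weighting for the fixed-effect model (METAL also offers a sample-size-weighted scheme, and the text does not say which was used) and a DerSimonian-Laird estimator for the random-effects model (also an assumption). Function names and input values are hypothetical; this is not the authors' code.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect estimate, its SE and Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, q


def random_effects_meta(betas, ses):
    """DerSimonian-Laird random-effects estimate, its SE and tau^2."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    _, _, q = fixed_effect_meta(betas, ses)
    df = len(betas) - 1
    c = w_sum - sum(w * w for w in weights) / w_sum
    # Between-cohort variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    beta = sum(w * b for w, b in zip(w_star, betas)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return beta, se, tau2


# Hypothetical per-cohort log-odds estimates for one SNP (not values from the paper).
cohort_betas = [0.2, 0.4]
cohort_ses = [0.1, 0.1]
fe_beta, fe_se, q_stat = fixed_effect_meta(cohort_betas, cohort_ses)
re_beta, re_se, tau2 = random_effects_meta(cohort_betas, cohort_ses)
print(fe_beta, fe_se, q_stat)
print(re_beta, re_se, tau2)
```

When the cohorts disagree, the random-effects model adds the between-cohort variance to each cohort's variance, which widens the pooled standard error relative to the fixed-effect estimate.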

Risk-factor informed Bayesian genome-wide association study

We used bGWAS to identify more SNPs associated with NAFLD. The aim of bGWAS is to identify new variants associated with complex diseases using inference from risk factors of focal traits. We used GWAS summary statistics from two risk factors causally associated with NAFLD in a previous MR study (BMI and triglyceride levels) as priors and worked with default parameters of the package as these two risk factors showed significant multivariable causal effects (Figure S2). The bGWAS approach increases power over conventional GWAS by comparing the observed Z-statistics (the observed effect size for each SNP divided by its standard error) from the focal phenotype (i.e., NAFLD) to prior effects using Bayes factors. The prior effects are calculated from publicly available GWAS summary statistics for related risk factors, which are included in the bGWAS package. These were obtained from the Global Lipids Genetics Consortium and the Genetic Investigation of Anthropometric Traits (GIANT) consortium. Briefly, bGWAS derives informative prior effects from these risk factors and their causal effect on NAFLD using multivariable MR. Prior estimates (mu) are calculated for each SNP by multiplying the SNP-risk factor effect by the risk factor-NAFLD causal effect estimates. By combining observed effects from the NAFLD GWAS meta-analysis and prior effects, Bayes factors, posterior effects and direct effects and their corresponding p values are generated. The direct effect of each SNP is the part of the observed effect that is not mediated through the selected risk factors.
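The prior and direct effects described above can be sketched in a few lines. This is a deliberately simplified illustration of the two quantities defined in the text (prior = SNP-risk factor effect multiplied by the risk factor-NAFLD causal effect, summed over risk factors; direct effect = observed effect minus prior). It ignores the uncertainty of the prior and any other refinements the bGWAS package applies, and it does not compute Bayes factors or posterior effects. All function names and numbers are hypothetical.

```python
def z_statistic(beta, se):
    """Observed Z-statistic: effect size divided by its standard error."""
    return beta / se


def prior_effect(snp_risk_factor_z, causal_effects):
    """Prior (mu): sum over risk factors of SNP-risk factor effect x causal effect."""
    return sum(snp_risk_factor_z[name] * theta
               for name, theta in causal_effects.items())


def direct_effect(observed_z, prior_mu):
    """Part of the observed effect not explained by the selected risk factors."""
    return observed_z - prior_mu


# All numbers below are hypothetical and chosen only for illustration.
causal_effects = {"BMI": 0.3, "triglycerides": 0.2}  # risk factor -> NAFLD
snp_z = {"BMI": 10.0, "triglycerides": -2.0}         # SNP -> risk factor
observed = z_statistic(beta=0.08, se=0.02)           # SNP -> NAFLD
mu = prior_effect(snp_z, causal_effects)
print(mu, direct_effect(observed, mu))
```

In this toy case most of the observed signal is accounted for by the prior, which is the pattern the text describes for loci acting on NAFLD through the selected risk factors.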

Transcriptome-wide association study of NAFLD

Tissues from the GTEx consortium (version 8) with fewer than 70 samples were excluded to ensure sufficient statistical power for eQTL discovery, resulting in a set of 48 tissues. Only non-gender-specific tissues (N = 43) were analyzed. Alignment to the human reference genome hg38/GRCh38 was performed using STAR v2.6.1d, based on the GENCODE v30 annotation. RNA-seq expression outliers were excluded using a multidimensional extension of the statistic described by Wright et al. Samples with fewer than 10 million mapped reads were removed. For samples with replicates, the replicate with the greatest number of reads was selected. Expression values were normalized between samples using TMM as implemented in edgeR. For each gene, expression values were normalized across samples using an inverse normal transformation. eQTL prediction models were built using elastic net, a regularized regression method, as implemented in S-PrediXcan. We used SNPs with a minor allele frequency greater than 1% from European ancestry participants. The locuscompare function from the LocuscompareR R package was used to depict the colocalization event at the LPL locus. Locuscompare enables visualization of the strengths of eQTL and outcome associations by plotting p values for each within a given genomic location, thereby helping to distinguish candidate genes from false-positive genes.

Replication of variants associated with NAFLD in the Mass General Brigham Biobank

In this cohort, genotyping was performed using the Illumina MEGA array. Association of each of the seven variants associated with NAFLD was assessed using logistic regression of disease status with age, gender and five principal components of ancestry as covariates.

Impact of NAFLD variants on liver fat accumulation in the UK Biobank

As part of the study protocol of the UK Biobank, a subset of individuals underwent detailed imaging between 2014 and 2019, including abdominal MRI. Liver fat in this cohort was quantified via machine learning of abdominal MRI images as previously described. We excluded samples that had no imputed genetic data, a genotyping call rate < 0.98, a mismatch between submitted and inferred gender, sex chromosome aneuploidy, exclusion from kinship inference, excessive third-degree relatives, or that were outliers in heterozygosity or genotype missingness rates, all of which were previously defined centrally by the UK Biobank. Due to the small percentage of samples of non-European ancestries, to avoid artifacts from population stratification we restricted our GWAS to samples of European ancestries, determined via self-reported ancestry of British, Irish, or other white and outlier detection using the R package aberrant, resulting in a total of 32,976 individuals. We did not remove related individuals from this analysis as we used a linear mixed model able to account for cryptic relatedness in common variant association studies. For analysis of liver fat as a continuous trait, we applied a rank-based inverse normal transformation. We took the residuals of liver fat in a linear model that included gender, year of birth, age at time of MRI, age at time of MRI squared, genotyping array, MRI device serial number, and the first ten principal components of ancestry. We then performed the inverse normal transform on the residuals from this model, yielding a standardized output with mean 0 and standard deviation of 1. We measured the association of genetic variants with rank inverse normal transformed liver fat via a linear mixed model using BOLT-LMM (version 2.3.4) to account for ancestry, cryptic population structure, and sample relatedness. The default European linkage disequilibrium panel provided with BOLT was used.
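The rank-based inverse normal transformation mentioned above can be sketched as follows. The sketch covers only the transformation step, applied to values that are assumed to already be residuals from the covariate model; the Blom offset (c = 3/8) and the averaging of tied ranks are assumptions, since the text does not specify them, and the input values are hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal quantiles of their ranks (Blom offset by default)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Hypothetical, right-skewed residuals (not data from the study).
residuals = [3.2, 0.5, 10.0, 1.1, 2.0]
transformed = rank_inverse_normal(residuals)
print([round(t, 3) for t in transformed])
```

The transformation preserves the ordering of the input while removing skew, which is why it is commonly applied to traits such as liver fat before linear modeling.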

Impact of NAFLD variants on liver enzymes in the UK Biobank

GWAS summary statistics adjusted for age, gender, and ancestry-based principal components for ALT, AST, GGT, and ALP concentrations in 361,194 UK Biobank participants of European ancestry were obtained from the Neale lab. Details on the protocols used to measure these biomarkers are available on the UK Biobank website: https://biobank.ndph.ox.ac.uk/showcase/showcase/docs/serum_biochemistry.pdf.

RESOURCE | SOURCE | IDENTIFIER

Deposited data

Scripts | This paper | https://github.com/LaboArsenault

Software and algorithms

SAIGE | Zhou et al.39 | https://github.com/weizhouUMICH/SAIGE
METAL package | Willer et al.40 | https://github.com/statgen/METAL
GenomicSEM R package | Grotzinger et al.41 | https://github.com/GenomicSEM/GenomicSEM
STAR v2.6.1d | GENCODE v30 | https://github.com/alexdobin/STAR
TMM (edgeR) | Robinson et al.42 | https://www.biostars.org/p/317701/
S-PrediXcan | Gamazon et al.43 and Barbeira et al.44 | N/A
LocuscompareR (R package) | Liu et al.28 | https://github.com/boxiangliu/locuscomparer
R package aberrant | Bellenguez et al.45 | https://github.com/carbocation/aberrant
BOLT-LMM (version 2.3.4) | Loh et al.46 and Kang et al.47 | https://alkesgroup.broadinstitute.org/BOLT-LMM/BOLT-LMM_manual.html
bGWAS R package | Mounier et al.20 | https://github.com/n-mounier/bGWAS

Other

GWAS summary statistic of NAFLD (eMERGE) | Namjou et al.48 | https://www.ebi.ac.uk/gwas/studies/GCST008468
GTEx consortium (version 8) | GTEx Consortium49 | https://gtexportal.org/home/publicationsPage
GWAS summary statistics on liver enzymes (UK Biobank) | NA | http://www.nealelab.is/blog/2019/9/16/biomarkers-gwas-results
GWAS summary statistic for FinnGen | NA | https://www.finngen.fi/en/access_results
Research Ethics Committee of the University of Tartu | NA | Approval number 288/M-18
UK Biobank | NA | Data application number 25205

  53 in total

Review 1.  Risk of cardiovascular disease in patients with nonalcoholic fatty liver disease.

Authors:  Giovanni Targher; Christopher P Day; Enzo Bonora
Journal:  N Engl J Med       Date:  2010-09-30       Impact factor: 91.245

2.  Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts.

Authors:  Wei Zhou; Zhangchen Zhao; Jonas B Nielsen; Lars G Fritsche; Jonathon LeFaive; Sarah A Gagliano Taliun; Wenjian Bi; Maiken E Gabrielsen; Mark J Daly; Benjamin M Neale; Kristian Hveem; Goncalo R Abecasis; Cristen J Willer; Seunggeun Lee
Journal:  Nat Genet       Date:  2020-05-18       Impact factor: 38.330

Review 3.  NAFLD and liver transplantation: Current burden and expected challenges.

Authors:  Raluca Pais; A Sidney Barritt; Yvon Calmus; Olivier Scatton; Thomas Runge; Pascal Lebray; Thierry Poynard; Vlad Ratziu; Filomena Conti
Journal:  J Hepatol       Date:  2016-07-30       Impact factor: 25.083

4.  Liraglutide Increases the Catabolism of Apolipoprotein B100-Containing Lipoproteins in Patients With Type 2 Diabetes and Reduces Proprotein Convertase Subtilisin/Kexin Type 9 Expression.

Authors:  Bruno Vergès; Laurence Duvillard; Jean Paul Pais de Barros; Benjamin Bouillet; Sabine Baillot-Rudoni; Alexia Rouland; Jean Michel Petit; Pascal Degrace; Laurent Demizieux
Journal:  Diabetes Care       Date:  2021-02-02       Impact factor: 19.112

5.  Prevalence and associated metabolic factors of nonalcoholic fatty liver disease in the general population from 2009 to 2010 in Japan: a multicenter large retrospective study.

Authors:  Yuichiro Eguchi; Hideyuki Hyogo; Masafumi Ono; Toshihiko Mizuta; Naofumi Ono; Kazuma Fujimoto; Kazuaki Chayama; Toshiji Saibara
Journal:  J Gastroenterol       Date:  2012-02-11       Impact factor: 7.527

6.  Genome-wide association study of non-alcoholic fatty liver and steatohepatitis in a histologically characterised cohort.

Authors:  Quentin M Anstee; Rebecca Darlay; Simon Cockell; Marica Meroni; Olivier Govaere; Dina Tiniakos; Alastair D Burt; Pierre Bedossa; Jeremy Palmer; Yang-Lin Liu; Guruprasad P Aithal; Michael Allison; Hannele Yki-Järvinen; Michele Vacca; Jean-Francois Dufour; Pietro Invernizzi; Daniele Prati; Mattias Ekstedt; Stergios Kechagias; Sven Francque; Salvatore Petta; Elisabetta Bugianesi; Karine Clement; Vlad Ratziu; Jörn M Schattenberg; Luca Valenti; Christopher P Day; Heather J Cordell; Ann K Daly
Journal:  J Hepatol       Date:  2020-04-13       Impact factor: 25.083

7.  Gain-of-function lipoprotein lipase variant rs13702 modulates lipid traits through disruption of a microRNA-410 seed site.

Authors:  Kris Richardson; Jennifer A Nettleton; Noemi Rotllan; Toshiko Tanaka; Caren E Smith; Chao-Qiang Lai; Laurence D Parnell; Yu-Chi Lee; Jari Lahti; Rozenn N Lemaitre; Ani Manichaikul; Margaux Keller; Vera Mikkilä; Julius Ngwa; Frank J A van Rooij; Christie M Ballentyne; Ingrid B Borecki; L Adrienne Cupples; Melissa Garcia; Albert Hofman; Luigi Ferrucci; Dariush Mozaffarian; Mia-Maria Perälä; Olli Raitakari; Russell P Tracy; Donna K Arnett; Stefania Bandinelli; Eric Boerwinkle; Johan G Eriksson; Oscar H Franco; Mika Kähönen; Michael Nalls; David S Siscovick; Denise K Houston; Bruce M Psaty; Jorma Viikari; Jacqueline C M Witteman; Mark O Goodarzi; Terho Lehtimäki; Yongmei Liu; M Carola Zillikens; Yii-Der I Chen; André G Uitterlinden; Jerome I Rotter; Carlos Fernandez-Hernando; Jose M Ordovas
Journal:  Am J Hum Genet       Date:  2012-12-13       Impact factor: 11.025

8.  Efficient Bayesian mixed-model analysis increases association power in large cohorts.

Authors:  Po-Ru Loh; George Tucker; Brendan K Bulik-Sullivan; Bjarni J Vilhjálmsson; Hilary K Finucane; Rany M Salem; Daniel I Chasman; Paul M Ridker; Benjamin M Neale; Bonnie Berger; Nick Patterson; Alkes L Price
Journal:  Nat Genet       Date:  2015-02-02       Impact factor: 38.330

9.  GWAS and enrichment analyses of non-alcoholic fatty liver disease identify new trait-associated genes and pathways across eMERGE Network.

Authors:  Bahram Namjou; Todd Lingren; Yongbo Huang; Sreeja Parameswaran; Beth L Cobb; Ian B Stanaway; John J Connolly; Frank D Mentch; Barbara Benoit; Xinnan Niu; Wei-Qi Wei; Robert J Carroll; Jennifer A Pacheco; Isaac T W Harley; Senad Divanovic; David S Carrell; Eric B Larson; David J Carey; Shefali Verma; Marylyn D Ritchie; Ali G Gharavi; Shawn Murphy; Marc S Williams; David R Crosslin; Gail P Jarvik; Iftikhar J Kullo; Hakon Hakonarson; Rongling Li; Stavra A Xanthakos; John B Harley
Journal:  BMC Med       Date:  2019-07-17       Impact factor: 8.775

10.  Coding Variation in ANGPTL4, LPL, and SVEP1 and the Risk of Coronary Disease.

Authors:  Nathan O Stitziel; Kathleen E Stirrups; Nicholas G D Masca; Jeanette Erdmann; Paola G Ferrario; Inke R König; Peter E Weeke; Thomas R Webb; Paul L Auer; Ursula M Schick; Yingchang Lu; He Zhang; Marie-Pierre Dube; Anuj Goel; Martin Farrall; Gina M Peloso; Hong-Hee Won; Ron Do; Erik van Iperen; Stavroula Kanoni; Jochen Kruppa; Anubha Mahajan; Robert A Scott; Christina Willenberg; Peter S Braund; Julian C van Capelleveen; Alex S F Doney; Louise A Donnelly; Rosanna Asselta; Piera A Merlini; Stefano Duga; Nicola Marziliano; Josh C Denny; Christian M Shaffer; Nour Eddine El-Mokhtari; Andre Franke; Omri Gottesman; Stefanie Heilmann; Christian Hengstenberg; Per Hoffman; Oddgeir L Holmen; Kristian Hveem; Jan-Håkan Jansson; Karl-Heinz Jöckel; Thorsten Kessler; Jennifer Kriebel; Karl L Laugwitz; Eirini Marouli; Nicola Martinelli; Mark I McCarthy; Natalie R Van Zuydam; Christa Meisinger; Tõnu Esko; Evelin Mihailov; Stefan A Escher; Maris Alver; Susanne Moebus; Andrew D Morris; Martina Müller-Nurasyid; Majid Nikpay; Oliviero Olivieri; Louis-Philippe Lemieux Perreault; Alaa AlQarawi; Neil R Robertson; Karen O Akinsanya; Dermot F Reilly; Thomas F Vogt; Wu Yin; Folkert W Asselbergs; Charles Kooperberg; Rebecca D Jackson; Eli Stahl; Konstantin Strauch; Tibor V Varga; Melanie Waldenberger; Lingyao Zeng; Aldi T Kraja; Chunyu Liu; George B Ehret; Christopher Newton-Cheh; Daniel I Chasman; Rajiv Chowdhury; Marco Ferrario; Ian Ford; J Wouter Jukema; Frank Kee; Kari Kuulasmaa; Børge G Nordestgaard; Markus Perola; Danish Saleheen; Naveed Sattar; Praveen Surendran; David Tregouet; Robin Young; Joanna M M Howson; Adam S Butterworth; John Danesh; Diego Ardissino; Erwin P Bottinger; Raimund Erbel; Paul W Franks; Domenico Girelli; Alistair S Hall; G Kees Hovingh; Adnan Kastrati; Wolfgang Lieb; Thomas Meitinger; William E Kraus; Svati H Shah; Ruth McPherson; Marju Orho-Melander; Olle Melander; Andres Metspalu; Colin N A Palmer; Annette Peters; Daniel Rader; Muredach P Reilly; Ruth J F Loos; Alex P Reiner; Dan M Roden; Jean-Claude Tardif; John R Thompson; Nicholas J Wareham; Hugh Watkins; Cristen J Willer; Sekkar Kathiresan; Panos Deloukas; Nilesh J Samani; Heribert Schunkert
Journal:  N Engl J Med       Date:  2016-03-02       Impact factor: 91.245

  7 in total

1.  Mendelian Randomization Analysis Reveals No Causal Relationship Between Nonalcoholic Fatty Liver Disease and Severe COVID-19.

Authors:  Jiuling Li; Aowen Tian; Haoxue Zhu; Lanlan Chen; Jianping Wen; Wanqing Liu; Peng Chen
Journal:  Clin Gastroenterol Hepatol       Date:  2022-02-03       Impact factor: 13.576

2.  Mendelian Randomization Analysis Identifies Blood Tyrosine Levels as a Biomarker of Non-Alcoholic Fatty Liver Disease.

Authors:  Émilie Gobeil; Ina Maltais-Payette; Nele Taba; Francis Brière; Nooshin Ghodsian; Erik Abner; Jérôme Bourgault; Eloi Gagnon; Hasanga D Manikpurage; Christian Couture; Patricia L Mitchell; Patrick Mathieu; François Julien; Jacques Corbeil; Marie-Claude Vohl; Sébastien Thériault; Tõnu Esko; André Tchernof; Benoit J Arsenault
Journal:  Metabolites       Date:  2022-05-13

3.  Single Nucleotide Polymorphism of Genes Associated with Metabolic Fatty Liver Disease.

Authors:  Tong Mu; Linrui Peng; Xinglei Xie; He He; Qing Shao; Xiran Wang; Yuwei Zhang
Journal:  J Oncol       Date:  2022-02-03       Impact factor: 4.375

4.  Lifestyle and metabolic factors for nonalcoholic fatty liver disease: Mendelian randomization study.

Authors:  Ju-Sheng Zheng; Susanna C Larsson; Shuai Yuan; Jie Chen; Xue Li; Rongrong Fan; Benoit Arsenault; Dipender Gill; Edward L Giovannucci
Journal:  Eur J Epidemiol       Date:  2022-04-30       Impact factor: 12.434

5.  Mendelian randomization prioritizes abdominal adiposity as an independent causal factor for liver fat accumulation and cardiometabolic diseases.

Authors:  Eloi Gagnon; William Pelletier; Émilie Gobeil; Jérôme Bourgault; Hasanga D Manikpurage; Ina Maltais-Payette; Erik Abner; Nele Taba; Tõnu Esko; Patricia L Mitchell; Nooshin Ghodsian; Jean-Pierre Després; Marie-Claude Vohl; André Tchernof; Sébastien Thériault; Benoit J Arsenault
Journal:  Commun Med (Lond)       Date:  2022-10-13

6.  Genetic effects of iron levels on liver injury and risk of liver diseases: A two-sample Mendelian randomization analysis.

Authors:  Kai Wang; Fangkun Yang; Pengcheng Zhang; Yang Yang; Li Jiang
Journal:  Front Nutr       Date:  2022-09-16

7.  Blood Levels of the SMOC1 Hepatokine Are Not Causally Linked with Type 2 Diabetes: A Bidirectional Mendelian Randomization Study.

Authors:  Nooshin Ghodsian; Eloi Gagnon; Jérôme Bourgault; Émilie Gobeil; Hasanga D Manikpurage; Nicolas Perrot; Arnaud Girard; Patricia L Mitchell; Benoit J Arsenault
Journal:  Nutrients       Date:  2021-11-24       Impact factor: 5.717

