Genome-Wide Association Study of NAFLD Using Electronic Health Records.

Ewen M Harrison¹,², Athina Spiliopoulou³, Cameron J Fairfield¹, Thomas M Drake¹, Riinu Pius¹, Andrew D Bretherick⁴, Archie Campbell¹,⁵,⁶, David W Clark³, Jonathan A Fallowfield⁷, Caroline Hayward⁴, Neil C Henderson⁷, Peter K Joshi³, Nicholas L Mills⁸, David J Porteous⁵, Prakash Ramachandran⁷, Robert K Semple⁸, Catherine A Shaw¹, Cathie L M Sudlow¹, Paul R H J Timmers⁴,³, James F Wilson⁴,³, Stephen J Wigmore².

Abstract

Genome-wide association studies (GWAS) have identified several risk loci for nonalcoholic fatty liver disease (NAFLD). Previous studies have largely relied on small sample sizes and have assessed quantitative traits. We performed a case-control GWAS in the UK Biobank using recorded diagnosis of NAFLD based on diagnostic codes recommended in recent consensus guidelines. The GWAS included 4,761 cases of NAFLD and 373,227 healthy controls without evidence of NAFLD. Sensitivity analyses were performed excluding other co-existing hepatic pathology, adjusting for body mass index (BMI), and adjusting for alcohol intake. A total of 9,723,654 variants were assessed by logistic regression adjusted for age, sex, genetic principal components, and genotyping batch. We performed a GWAS meta-analysis using available summary association statistics. Six risk loci were identified (P < 5*10−8) (apolipoprotein E [APOE], patatin-like phospholipase domain containing 3 [PNPLA3], transmembrane 6 superfamily member 2 [TM6SF2], glucokinase regulator [GCKR], mitochondrial amidoxime reducing component 1 [MARC1], and tribbles pseudokinase 1 [TRIB1]). All loci retained significance in sensitivity analyses without co-existent hepatic pathology and after adjustment for BMI. PNPLA3 and TM6SF2 remained significant after adjustment for alcohol (alcohol intake was known in only 158,388 individuals), with the others demonstrating consistent direction and magnitude of effect. All six loci were significant on meta-analysis. Rs429358 (P = 2.17*10−11) is a missense variant within the APOE gene determining ε4 versus ε2/ε3 alleles. The ε4 allele of APOE offered protection against NAFLD (odds ratio for heterozygotes 0.84 [95% confidence interval 0.78-0.90] and homozygotes 0.64 [0.50-0.79]).
Conclusion: This GWAS replicates six known NAFLD-susceptibility loci and confirms that the ε4 allele of APOE is associated with protection against NAFLD. The results are consistent with published GWAS using histological and radiological measures of NAFLD, confirming that NAFLD identified through diagnostic codes from consensus guidelines is a valid alternative to more invasive and costly approaches.

Year: 2021 PMID: 34535985 PMCID: PMC8793997 DOI: 10.1002/hep4.1805

Source DB: PubMed Journal: Hepatol Commun ISSN: 2471-254X

Abbreviations: ALT, alanine aminotransferase; APOE, apolipoprotein E; BMI, body mass index; CI, confidence interval; EHR, electronic health record; GCKR, glucokinase regulator; GS‐SFHS, Generation Scotland: Scottish Family Health Study; GWAS, genome‐wide association study; HbA1c, glycated hemoglobin; HSD17B13, hydroxysteroid 17‐beta dehydrogenase 13; LD, linkage disequilibrium; MARC1, mitochondrial amidoxime reducing component 1; NAFLD, nonalcoholic fatty liver disease; OR, odds ratio; PNPLA3, patatin‐like phospholipase domain containing 3; SNP, single nucleotide polymorphism; TM6SF2, transmembrane 6 superfamily member 2; TRIB1, tribbles pseudokinase 1; UKB, UK Biobank; VLDL, very‐low‐density lipoprotein.

Nonalcoholic fatty liver disease (NAFLD) is the most common form of metabolic disease worldwide with an estimated prevalence of 24%, which appears to be increasing in all populations. NAFLD has an even higher prevalence in those with other forms of metabolic disease including obesity, type 2 diabetes mellitus, and hyperlipidemia, which are rising in prevalence. Therefore, the prevalence of NAFLD is also anticipated to rise. NAFLD covers a spectrum of disease severity including elevated hepatic triglyceride content (isolated steatosis), inflammation (nonalcoholic steatohepatitis), fibrosis, and cirrhosis, and is associated with elevated risk of further morbidity. It has risen to the second‐most‐common indication for liver transplantation in the United States and carries a significantly elevated risk of hepatocellular carcinoma, cardiovascular disease, diabetes mellitus, and all‐cause mortality. However, the pathogenesis of NAFLD is complex, and disease progression is highly variable.
Although NAFLD contributes to significant morbidity and mortality, there are no licensed pharmacological therapies for it, and several agents under investigation have thus far shown limited effectiveness. Development of NAFLD is influenced by both environmental and genetic factors, with heritability estimated at 22%‐50%. Several genome‐wide association studies (GWASs) have been published examining various phenotype definitions for NAFLD or NAFLD severity encompassing radiological evidence of hepatic steatosis, visceral fat content, lean NAFLD (NAFLD in a nonobese individual), and histological severity of fibrosis. These studies have highlighted several loci and candidate mechanisms underlying NAFLD pathogenesis. Studies using a case‐control GWAS methodology have identified five loci associated with NAFLD at genome‐wide significance (P < 5*10−8): mitochondrial amidoxime reducing component 1 (MARC1), glucokinase regulator (GCKR), hydroxysteroid 17‐beta dehydrogenase 13 (HSD17B13), transmembrane 6 superfamily member 2 (TM6SF2), and patatin‐like phospholipase domain containing 3 (PNPLA3). Known loci are shown in Supporting Tables S1 and S2. Most case‐control analyses have relied on histological and radiological confirmation of NAFLD, limiting the sample size in these studies. Use of routinely collected administrative data allows for larger cohorts and reduces the need for invasive or expensive investigations such as biopsy or magnetic resonance imaging (MRI). One published GWAS relied on natural language processing to identify NAFLD cases from electronic health record (EHR) results and found significant associations at loci previously identified in cohorts with complete histological classification. Studies such as the UK Biobank (UKB) have extensive genotypic data linked to hospital and primary care discharge codes.
A subgroup of 14,440 participants in the UKB has been analyzed in a published GWAS focusing on MRI‐based measures of steatohepatitis and fibrosis.( ) Separately a case control for all‐cause cirrhosis using several sources including UKB has been published with an additional single variant analysis for NAFLD at rs2642438 using only limited International Classification of Diseases, 10th Revision diagnostic codes for NAFLD.( ) Finally, a recent exome‐wide association study analyzed association of missense and nonsense mutations with serum alanine aminotransferase (ALT) before analyzing significant variants for association with hepatic fat content in a subset of 8,930 UKB participants.( ) The analysis identified several known NAFLD‐susceptibility variants as well as three NAFLD‐susceptibility loci (apolipoprotein E [APOE], glycerol‐3‐phosphate acyltransferase 1 mitochondrial [GPAM], and olfactory receptor family 12 subfamily D member 2 [OR12D2]), although APOE was identified in an earlier subanalysis.( ) There are no published case‐control GWAS using administrative data based on recent consensus recommendations.( ) We therefore undertook a GWAS of participants with recorded diagnostic codes attributed to NAFLD compared with healthy controls.

Materials and Methods

Diagnostic codes used for the identification of NAFLD are shown in Supporting Tables S3 and S4. Methods for identification of participants, determination of NAFLD status, and genotyping for the UKB and Generation Scotland–Scottish Family Health Study (GS‐SFHS) are provided in Supporting Materials 2. The UKB received ethical approval (research ethical committee reference 11/NW/0382). UKB data access was approved under projects 30439 (phenotype data) and 19655 (genotype data). Ethical approval for GS‐SFHS was granted by National Health Service Tayside Research Ethics Committee (REC reference number 05/S1401/89).

GWAS

Initial Analysis

Genotyped and imputed single nucleotide polymorphisms (SNPs) were analyzed. Participants of self‐reported European ancestry were considered eligible for inclusion. Outliers for heterozygosity and participants with unexpected runs of homozygosity were excluded. One participant from each pair of related individuals in the UKB (kinship > 0.0884) was excluded. Association with NAFLD was analyzed using logistic regression adjusted for age, sex, the first 20 genetic principal components, and batch, with batch included as a random effect. Imputation dosage was used for imputed SNPs. Quality control was applied to exclude SNPs with low minor allele frequency (), low imputation quality score (), or deviation from Hardy‐Weinberg equilibrium (HWE: P < 5*10−6). Genome‐wide significance was defined as P < 5*10−8 in the UKB, and the replication threshold as P < 0.0083 in the GS‐SFHS replication cohort (P value of 0.05 subjected to Bonferroni correction for each locus tested). Replication was undertaken by analyzing the lead SNP within each locus in the GS‐SFHS study.
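
The replication threshold quoted above is consistent with dividing 0.05 by six tests, one per locus reported in the Results. A minimal Python sketch of that arithmetic follows; the function name `bonferroni_threshold` is ours and is not part of the study's code.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: six lead SNPs (one per locus) were tested in the replication cohort.
replication_threshold = bonferroni_threshold(0.05, 6)
print(f"{replication_threshold:.4f}")  # 0.0083
```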

Linkage Disequilibrium Clumping and Conditional Analyses

SNPs showing genome‐wide significance (P < 5*10−8) were considered significant. Linkage disequilibrium (LD) clumping was performed using the functional mapping and annotation of GWAS (FUMA) web application v1.3.5d (https://fuma.ctglab.nl/). Loci were established for lead SNPs with a minimum distance of 250 kb between loci and using r² < 0.25 to indicate independent SNPs within the same locus. Full details of the parameters passed to FUMA are available in Supporting Materials 3A. Each locus was re‐analyzed while conditioning on the lead SNP, and further signals with genome‐wide significance were identified. This process was repeated until no remaining SNPs reached genome‐wide significance.
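
The distance criterion above can be sketched in a few lines of Python. This is a simplified illustration and not FUMA's implementation: it ignores the r² threshold and LD structure entirely, merges genome‐wide significant SNPs on the same chromosome when neighbouring positions are less than 250 kb apart, and takes the smallest P value in each group as the lead SNP. The names `group_into_loci` and `lead_snp` are hypothetical, and the second chromosome 19 SNP in the toy input is invented for illustration; the other rows reuse positions and P values from Table 1.

```python
from typing import List, Tuple

Snp = Tuple[str, int, float]  # (chromosome, base-pair position, P value)


def group_into_loci(snps: List[Snp], max_gap: int = 250_000,
                    p_threshold: float = 5e-8) -> List[List[Snp]]:
    """Group significant SNPs into loci using physical distance only."""
    significant = sorted((s for s in snps if s[2] < p_threshold),
                         key=lambda s: (s[0], s[1]))
    loci: List[List[Snp]] = []
    for snp in significant:
        previous = loci[-1][-1] if loci else None
        same_locus = (previous is not None
                      and previous[0] == snp[0]
                      and snp[1] - previous[1] < max_gap)
        if same_locus:
            loci[-1].append(snp)
        else:
            loci.append([snp])
    return loci


def lead_snp(locus: List[Snp]) -> Snp:
    """The lead SNP is taken to be the one with the smallest P value."""
    return min(locus, key=lambda s: s[2])


toy_snps = [
    ("1", 220973563, 7.67e-10),
    ("19", 19460541, 1.08e-24),
    ("19", 19379549, 1.0e-20),   # hypothetical neighbour, not from the paper
    ("19", 45411941, 2.17e-11),
    ("22", 44324855, 6.74e-60),
    ("1", 100000, 1.0e-3),       # hypothetical non-significant SNP
]
loci = group_into_loci(toy_snps)
for locus in loci:
    print(lead_snp(locus))
```

A conditional analysis would then refit the association model within each locus with the lead SNP added as a covariate, repeating until nothing remains below the threshold; that step needs individual‐level genotypes and is not shown here.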

Population Stratification

The genomic inflation factor was calculated using the summary association statistics. Evidence of test statistic inflation was investigated with LD score regression.
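
The text does not state how the genomic inflation factor was computed. A common definition, assumed here, is the median of the observed 1‐degree‐of‐freedom chi‐square statistics divided by the median expected under the null (about 0.455). The sketch below applies that definition to z‐scores; `genomic_inflation_factor` is a hypothetical helper, not the authors' code.

```python
from statistics import median
from typing import Iterable

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(z_scores: Iterable[float]) -> float:
    """Median observed chi-square statistic divided by the null median."""
    chi2 = [z * z for z in z_scores]
    if not chi2:
        raise ValueError("at least one test statistic is required")
    return median(chi2) / CHI2_1DF_MEDIAN


# Toy input: every statistic sits exactly at the null median, so lambda is about 1.
null_like = [0.6744898, -0.6744898, 0.6744898]
print(round(genomic_inflation_factor(null_like), 3))
```

Values above 1 indicate inflation, which LD score regression then helps attribute to polygenicity or to confounding.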

Sensitivity Analyses

Sensitivity analyses were conducted to ensure the robustness of the case definition. First, the analysis was conducted at each of the NAFLD‐susceptibility loci after exclusion of individuals with alternative hepatic pathology as described previously. Second, each of the analyses was conducted adjusting for body mass index (BMI) as a covariate. Third, the analysis was conducted adjusting for estimated consumption of alcohol (units per day). The alcohol estimate was derived from a 24‐hour dietary recall questionnaire during online follow‐up and was available for about 40% of participants. The logistic regression methodology for the sensitivity analyses was identical to the main GWAS. An additional analysis in the UKB cohort was undertaken to assess the relationship between APOE genotype and NAFLD based on the GWAS results (one lead SNP was a missense mutation within APOE). See Supporting Materials 2 for methods explaining the determination of APOE alleles and statistical analysis of association with NAFLD.

Association of NAFLD‐Susceptibility Loci With Phenotypic Traits

NAFLD is strongly associated with obesity, hyperlipidemia, hyperglycemia, inflammation, and deranged liver function tests, in particular ALT.( , ) Associations with serum biochemistry values for each identified variant were assessed in an age‐adjusted and sex‐adjusted linear regression model. Serum ALT, aspartate aminotransferase, gamma‐glutamyltransferase, alkaline phosphatase, total cholesterol (TC), low‐density lipoprotein cholesterol (LDL‐C), high‐density lipoprotein cholesterol, triglycerides, apolipoprotein A and B, lipoprotein A, glucose, glycated hemoglobin (HbA1c), C‐reactive protein, BMI, waist circumference, hip circumference, and waist‐to‐hip ratio were assessed. Each trait was assessed visually through histogram and log‐transformed in the case of skewed distribution. To evaluate potential amplification of steatogenic effects in the context of obesity, insulin resistance and alcohol intake,( , ) a gene‐environment interaction was assessed for each NAFLD‐susceptibility variant with BMI, HbA1c, and alcohol intake. A linear regression adjusted for age, sex, the first 20 genetic principal components, and genotyping batch was undertaken for the association of NAFLD‐susceptibility variants with the Fibrosis‐4 Index (FIB‐4) score and the NAFLD fibrosis score; both validated noninvasive measures associated with NAFLD‐related fibrosis.( , )
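
The source does not spell out the FIB‐4 formula. The sketch below uses the widely published definition, age × AST divided by platelet count × √ALT, and assumes it is the one applied here; the function name, argument names, and example values are illustrative only.

```python
import math


def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_1e9_per_l: float) -> float:
    """FIB-4 index: (age * AST) / (platelets * sqrt(ALT))."""
    if alt_u_per_l <= 0 or platelets_1e9_per_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


# Hypothetical participant: 57 years, AST 40 U/L, ALT 64 U/L, platelets 200 x 10^9/L.
print(fib4(57, 40, 64, 200))  # 1.425
```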

Genome‐wide Association Meta‐analysis

Two NAFLD case‐control GWASs with available summary association statistics (Namjou et al,( ) 1,106 cases and 8,571 controls; and Anstee et al,( ) 1,483 cases and 17,781 controls) from independent populations of European ancestry were also assessed. Namjou et al. reported on participants from American health centers and relied on natural language processing to ascertain cases of NAFLD from the EHR, and therefore was more closely related to the methodology used in this study. Namjou et al. performed a case‐control logistic regression adjusted for age, sex, BMI, the medical center, and the first three genetic principal components. Anstee et al. recruited participants from European tertiary liver centers with biopsy‐proven NAFLD and compared them to population controls with a linear mixed model adjusting for sex and the first five genetic principal components before repeating the analysis as a logistic regression. Identified lead SNPs from the UKB GWAS were inspected in these studies for direction and magnitude of effect. Meta‐analysis was conducted using METAL( ) with an inverse‐variance fixed‐effects meta‐analysis. Although between‐study heterogeneity was expected, the fixed‐effects model was used in preference to the random effects model due to the low number of studies and anticipated deviation from the Gaussian distribution required for a random effects model.( )
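
As a rough illustration of inverse‐variance fixed‐effects pooling, the sketch below combines two odds ratios on the log scale, recovering each standard error from its 95% CI. It is not METAL and does not reproduce the published meta‐analysis, which used full summary statistics from all studies; the inputs are the rs3747207 estimates for UKB (Table 1) and Anstee et al. (Table 2), and both function names are hypothetical.

```python
import math
from typing import List, Tuple


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float,
                  z_crit: float = 1.959964) -> Tuple[float, float]:
    """Convert an OR with a 95% CI to a log-OR and its standard error."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return beta, se


def fixed_effects_meta(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se


ukb = log_or_and_se(1.45, 1.38, 1.51)      # rs3747207, UKB
anstee = log_or_and_se(1.83, 1.68, 1.98)   # rs3747207, Anstee et al.
pooled_beta, pooled_se = fixed_effects_meta([ukb, anstee])
print(f"pooled OR = {math.exp(pooled_beta):.2f}, z = {pooled_beta / pooled_se:.1f}")
```

Each study is weighted by the inverse of its variance, so the larger UKB sample dominates the pooled estimate. Studies that report no CI or standard error, as with Namjou et al. in Table 2, cannot be handled this way without further information.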

Plotting and Statistical Analysis of Results

Genomic analyses were performed using SNPtest (version 2.5.2),( ) QCTOOLS (version 2.0.6), and METAL (2018 version) on the University of Edinburgh Linux high‐performance compute cluster. Post‐GWAS analysis, regression analyses, and plotting were performed using R version 3.6.3.( ) Methods and results are reported in accordance with the STREGA (STrengthening the REporting of Genetic Association Studies) guidelines( ) (Supporting Materials 4).

Results

NAFLD Cohorts

A total of 502,616 participants entered the UKB study with 377,998 taken forward for GWAS. A history of NAFLD was present in 4,761 with 373,227 controls. After exclusion of alternative hepatic pathology there were 3,954 NAFLD cases and 355,942 controls. The median follow‐up was 9.0 years (range 7.3‐11.9). Compared with controls, cases were more likely to be male (51.0% vs. 46.3%), older (mean 57.4 vs. 56.9 years), heavier (mean 89.2 vs. 78.2 kg), and diabetic (32.9% vs. 7.7%). The baseline characteristics are shown in Supporting Materials 1 (Supporting Table S5). A total of 24,096 participants entered GS‐SFHS with 6,317 taken forward for GWAS. A history of NAFLD was present in 67 with 6,250 controls (both without alternative pathologies). The mean follow‐up was 11.2 years (range 9.6‐14.7 years) (Fig. 1).

FIG. 1

Flowchart describing participant recruitment in UKB and GS‐SFHS.

GWAS

Initial Analysis

A total of 460 SNPs were identified with genome‐wide significant P values after removal of low‐quality SNPs in the UKB. A total of 1,313 SNPs demonstrated borderline significance (P < 5*10−5). LD clumping revealed six loci with at least one significant NAFLD‐associated signal. Significant associations were seen with rs2642442 (MARC1 intron), rs1260326 (GCKR exon), rs17321515 (tribbles pseudokinase 1 [TRIB1] intergenic), rs73001065 (maternal‐effect uncoordinated sister chromatid cohesion factor [MAU2] intron), rs429358 (APOE exon), and rs3747207 (PNPLA3 intron). Rs73001065 is in strong linkage disequilibrium with the previously identified missense variant (rs58542926) within TM6SF2 (R² = 0.82), which is thought to be the causal variant. Rs3747207 is in perfect linkage disequilibrium (R² = 1) with the established protein‐altering variant rs738409. In both cases, P values of the lead SNP and the putative causal variants were almost identical. Locus boundaries are shown in Supporting Materials 3B and the results at the 460 significant SNPs in Supporting Materials 5. All six loci are known from previous case‐control GWASs of NAFLD or related traits such as hepatic steatosis, including the recently identified APOE variant (rs429358) (Table 1). One locus (rs1260326) was replicated within the GS‐SFHS cohort. Of the five loci established by previous case‐control format GWASs of NAFLD, we replicated all five with four reaching genome‐wide significance (MARC1, GCKR, TM6SF2, and PNPLA3) and one (HSD17B13, P = 7.41*10−3) reaching the Bonferroni‐corrected threshold (see Table 2, Supporting Materials 1, and Supporting Table S2). We replicated three of the four significant loci reported by Anstee et al. and the only significant locus identified by Namjou et al. in their case‐control GWAS (P < 0.0083; Supporting Materials 1 and Supporting Table S2).
Of the two loci identified previously by quantitative trait GWAS but not case‐control GWAS, TRIB1 reached the replication threshold in both Anstee et al. and Namjou et al., whereas APOE reached the replication threshold in Anstee et al. and nominal significance in Namjou et al. Both TRIB1 and APOE have been identified in other GWASs.

TABLE 1

Loci Associated With NAFLD in the UKB Cohort

RSID	Chr:Pos	Reference Allele/Effect Allele	EAF	OR (95% CI)	P Value	Consequence (Gene)	Hypothesized Functional Gene
rs2642442	1:220973563	C/T	0.317	1.15 (1.10‐1.20)	7.67*10−10	Intron (MARC1)	MARC1
rs1260326	2:27730940	T/C	0.392	0.87 (0.84‐0.91)	2.54*10−11	Missense (GCKR)	GCKR
rs17321515	8:126486409	A/G	0.476	0.86 (0.82‐0.89)	1.81*10−13	Intergenic (TRIB1)	TRIB1
rs73001065	19:19460541	G/C	0.071	1.41 (1.32‐1.51)	1.08*10−24	Intron (MAU2)	TM6SF2
rs429358	19:45411941	T/C	0.156	0.82 (0.77‐0.87)	2.17*10−11	Missense (APOE)	APOE
rs3747207	22:44324855	G/A	0.215	1.45 (1.38‐1.51)	6.74*10−60	Intron (PNPLA3)	PNPLA3

Functional role is based on assessment of published literature. Chromosome and position based on Genome Reference Consortium Human Build 37. Effect allele is the minor allele.

Abbreviations: Chr:Pos, chromosome:position; EAF, effect allele frequency; P Value, P value using allelic model.

TABLE 2

Association of Identified Loci with NAFLD in the Replication Cohort

RSID	Chr:Pos	Reference Allele/Effect Allele	GS‐SFHS EAF	GS‐SFHS OR (95% CI)	GS‐SFHS P Value	Namjou et al. OR*	Namjou et al. P Value	Anstee et al. OR (95% CI)	Anstee et al. P Value	Meta‐analysis P Value	Gene
rs2642442	1:220973563	C/T	0.311	1.06 (0.73‐1.54)	0.551	1.19	5.96*10−3	1.16 (1.06‐1.26)	9.70*10−4	5.83*10−12	MARC1
rs1260326	2:27730940	T/C	0.387	0.73 (0.52‐1.03)	0.00737	0.90	7.34*10−2	0.78 (0.73‐0.84)	1.06*10−10	3.08*10−15	GCKR
rs17321515	8:126486409	A/G	0.478	0.94 (0.67‐1.32)	0.913	0.80	1.08*10−4	0.86 (0.79‐0.93)	1.99*10−4	1.24*10−16	TRIB1
rs73001065	19:19460541	G/C	0.065	0.79 (0.37‐1.69)	0.591	1.30	1.19*10−2	1.58 (1.37‐1.82)	1.59*10−10	7.51*10−30	TM6SF2
rs429358	19:45411941	T/C	0.162	0.70 (0.41‐1.18)	0.126	0.81	9.57*10−3	0.85 (0.77‐0.95)	4.16*10−3	3.42*10−13	APOE
rs3747207	22:44324855	G/A	0.194	1.37 (0.92‐2.03)	0.142	1.78	2.63*10−20	1.83 (1.68‐1.98)	2.58*10−49	1.67*10−87	PNPLA3

Chromosome and position based on Genome Reference Consortium Human Build 37.

*The summary association statistics for Namjou et al. did not provide a CI or standard error for the odds ratio.

Abbreviations: Chr:Pos, chromosome:position; EAF, effect allele frequency; P Value, P value using allelic model.

GWAS meta‐analysis resulted in broadly similar results with no additional loci reaching genome‐wide significance. Direction and magnitude of effect were similar for all six loci other than rs73001065‐C, for which GS‐SFHS demonstrated a nonsignificant reduction in NAFLD risk but with substantially wider confidence intervals (CIs). Forest plots showing effect size in the four studies (UKB GWAS cohort, GS‐SFHS replication cohort, Namjou et al. summary association statistics, and Anstee et al. summary association statistics) along with a Manhattan plot from the GWAS meta‐analysis are shown in Supporting Materials 1 (Supporting Figs. S1 and S2). A Manhattan plot for association of variants with NAFLD in the UKB is shown in Fig. 2.

FIG. 2

Manhattan plot for the association with NAFLD (4,761 cases and 373,227 controls). Each variant is plotted based on chromosome and position on the x‐axis and ‐log10 P values on the y‐axis. The horizontal dotted line represents genome‐wide significance (P = 5*10−8).
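
For readers unfamiliar with the y‐axis scaling, the short Python sketch below converts a P value to the −log10 scale used in the plot and shows where the significance line falls. The helper name `neg_log10_p` is ours.

```python
import math

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold used in the text


def neg_log10_p(p_value: float) -> float:
    """Return -log10(P), the y-axis value on a Manhattan plot."""
    if not 0 < p_value <= 1:
        raise ValueError("P value must lie in (0, 1]")
    return -math.log10(p_value)


threshold_height = neg_log10_p(GENOME_WIDE_P)
print(f"{threshold_height:.2f}")  # 7.30
```

Any variant plotted above this height, such as the PNPLA3 lead SNP with P = 6.74*10−60, is genome‐wide significant.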

Conditional Analyses

After conditional analyses, one locus was found to have a further independent signal. Rs182611493 within the TM6SF2 locus showed significant association (P = 9.30*10−13) after conditioning on rs73001065 (see Supporting Materials 5). The remaining five loci did not have any SNPs reaching genome‐wide significance (P < 5*10−8) after conditioning on the lead SNP.

After adjusting for the first 20 genetic principal components, there was evidence of test statistic inflation (). Inflation may be due to polygenicity rather than unmeasured population substructure. The LD score regression intercept was 1.006, and the proportion of test statistic inflation ascribed to causes other than polygenicity was estimated to be 7.52%, indicating that polygenicity is the main driver of test statistic inflation. The quantile‐quantile plot is shown in Supporting Materials 1 (Supporting Fig. S3).

Across all sensitivity analyses, the estimated genetic effects at each lead SNP had the same direction and broadly similar magnitude. All SNPs demonstrated a significant effect when alternative hepatic pathologies were excluded and when adjusting for BMI. Two SNPs retained significance after additionally adjusting for alcohol intake (rs73001065, rs3747207), with one retaining suggestive significance (rs17321515, P = 1.39*10−7). The other three SNPs no longer retained suggestive significance, although this is likely to be due to the greatly reduced sample size in those individuals who had completed the alcohol intake questionnaire (158,388 vs. 377,998), with all six SNPs showing greatly attenuated significance. Visual inspection of the lattice plots showed that odds ratio (OR) estimates at each SNP were broadly similar with wider CIs in the alcohol‐adjusted analysis (see Supporting Materials 1 and Supporting Fig. S4).

A total of 377,998 individuals had sufficient data available to calculate the APOE genotype.
As expected, the most common genotype was ε3/ε3 (219,869), whereas a total of 19 individuals who were either [...] or [...] were excluded (there were no [...] homozygotes). The distribution of individuals by genotype is shown in Supporting Materials 1 (Supporting Table S6). The APOE genotype was significantly associated with NAFLD (). The ε4 allele was strongly associated with reduced risk. The OR for NAFLD risk for ε4 heterozygotes was 0.84 (95% CI 0.78‐0.90) and for ε4 homozygotes 0.64 (0.50‐0.79). The ε2 allele was not associated with any significant change in NAFLD risk. The OR for NAFLD risk for ε2 heterozygotes was 1.02 (95% CI 0.93‐1.11) and for ε2 homozygotes 0.76 (0.50‐1.12). All ORs relate to the ε3 homozygote reference group. The results of the logistic regression are shown in Figure 3.

FIG. 3

OR plot demonstrating odds of NAFLD by APOE genotype. Each APOE genotype is compared with the ε3 homozygotes reference group (model adjusted for age, sex, genotyping batch, and the first 20 genetic principal components).
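
The authors' procedure for determining APOE alleles is described in Supporting Materials 2 and is not reproduced in this text. The sketch below shows the conventional mapping from the two defining SNPs, rs429358 and rs7412, to APOE genotypes, assuming unphased allele counts. The function name is hypothetical, and the handling of the double heterozygote (called ε2/ε4, although unphased data cannot distinguish it from the rare ε1/ε3) is an assumption.

```python
def apoe_genotype(rs429358_c_count: int, rs7412_t_count: int) -> str:
    """Map allele counts at rs429358 (C) and rs7412 (T) to an APOE genotype.

    Haplotypes: e2 = T/T, e3 = T/C, e4 = C/C (rs429358/rs7412); C/T is the rare e1.
    """
    if rs429358_c_count not in (0, 1, 2) or rs7412_t_count not in (0, 1, 2):
        raise ValueError("allele counts must be 0, 1, or 2")
    table = {
        (0, 0): "ε3/ε3",
        (0, 1): "ε2/ε3",
        (0, 2): "ε2/ε2",
        (1, 0): "ε3/ε4",
        (2, 0): "ε4/ε4",
        (1, 1): "ε2/ε4",  # assumed; indistinguishable from ε1/ε3 without phasing
    }
    # Any remaining combination requires at least one rare ε1 haplotype.
    return table.get((rs429358_c_count, rs7412_t_count), "other (carries ε1)")


print(apoe_genotype(0, 0))  # ε3/ε3
print(apoe_genotype(1, 0))  # ε3/ε4
```

Under this mapping, each copy of the rs429358 C allele corresponds to one ε4 allele in the common genotypes, which is why the GWAS lead SNP and the ε4 association track each other.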

Association of NAFLD‐susceptibility Loci with Phenotypic Traits

Assessment of serum lipids was undertaken for each lead SNP in all 377,998 individuals taken forward for GWAS. The NAFLD‐susceptibility alleles were heterogeneous in their influence on serum lipids with most demonstrating reduced levels of TC and LDL other than rs17321515 (see Fig. 4). Four variants were associated with elevated ALT (rs17321515, rs73001065, rs429358, rs3747207). Figures for the other serum biochemistry markers, anthropometric features and disease associations are available in Supporting Materials 1 (Supporting Figs. S5‐S10).

FIG. 4

Impact of each NAFLD‐susceptibility allele on the measured serum cholesterol fractions. Each point represents the beta‐coefficient from an age‐adjusted and sex‐adjusted linear regression, and the error bar represents the 95% CI. Triglycerides were log‐transformed before the analysis. Abbreviations: Chol, total cholesterol; trigs, triglycerides.

The 6 variants were each tested for 3 gene‐environment interactions and a Bonferroni‐corrected P value of 0.0028 was considered significant. All 6 variants demonstrated a significant gene‐environment interaction with HbA1c and all but rs1260326 demonstrated significant interaction with BMI. None of the 6 variants demonstrated an interaction with alcohol intake. In patients with NAFLD, only the locus with the strongest signal was associated with a change in FIB‐4 score on linear regression (rs3747207: Beta 0.08 [95% CI 0.03‐0.14]; P = 0.001). The NAFLD fibrosis score was not significantly influenced by any of the NAFLD‐susceptibility loci. Numeric results for the GWAS, sensitivity analyses, and additional analyses are shown in Supporting Materials 5 and 6.

Discussion

We performed a GWAS of NAFLD using 4,761 cases and 373,227 controls from the UKB study. We identified six NAFLD‐susceptibility variants previously identified in GWAS of quantitative NAFLD traits such as hepatic steatosis, including GCKR, TM6SF2, and MARC1, and confirm the recently identified NAFLD‐susceptibility variant of rs429358‐C within APOE, a protein‐altering variant, which is protective against NAFLD. We replicate all five previously established loci associated with NAFLD in case‐control GWASs including HSD17B13. Rs429358 is a recently identified missense variant within APOE, which in combination with rs7412 defines the three main alleles of APOE, namely, ε2, ε3, and ε4. APOE plays various roles in peripheral lipid and lipoprotein metabolism, and the three common alleles influence metabolic and cardiovascular disease and Alzheimer’s disease. The role of APOE in NAFLD has been examined previously in candidate‐gene studies and has recently been detected by GWAS.
While this manuscript was under preparation, three independent analyses were published confirming an association at the APOE locus, although it was first identified in a subgroup analysis of an earlier paper. An exome‐wide array meta‐analysis was published, in which rs2075650 in TOMM40 was identified; the authors suggested, based on conditional analysis, that rs429358 in APOE is the causal variant with the C allele conferring protection. Candidate‐gene studies have previously shown a decreased risk with the [...] allele and the [...] allele, although some studies reported no difference in the risk of NAFLD. Elevated serum levels of APOE appear to correlate with higher fatty liver index, regardless of genotype. Perhaps surprisingly, the [...] allele may be associated with greater NAFLD fibrosis severity, although this finding was made in a very small sample. The existence of additional populations demonstrating a similar association suggests that the APOE finding is likely to be a genuine association. The apparent lack of effect in some of the earlier studies is likely related to smaller sample sizes. The mechanisms by which APOE influences NAFLD development remain unclear. There is a linear increase in both cardiovascular risk and serum LDL and total cholesterol, with transition from ε2 to ε3 and from ε3 to ε4. The association between NAFLD and cardiovascular and metabolic disease is unlikely to be explained by APOE activity, given that the ε4 allele simultaneously offers protection against NAFLD and increases the risk of cardiovascular and metabolic disease. APOE influences hepatic very‐low‐density lipoprotein (VLDL) secretion. Apoe‐deficient mice demonstrate reduced VLDL secretion and greater steatohepatitis severity.
This is not corrected by APOE‐producing bone marrow transplants with hepatic VLDL secretion remaining low, confirming that APOE plays an important role in liver autonomous VLDL secretion. The ε4 allele is associated with enhanced hepatic VLDL secretion, and this may explain the association with hypertriglyceridemia and cardiovascular disease as well as protection against hepatic steatosis. In our study, the overall rate of NAFLD detected using the EHR is lower than would be expected for NAFLD diagnosed by histological or radiological approaches. This is an expected limitation of the approach, as not all individuals with NAFLD undergo histological or radiological evaluations and may live in the community unaware of their disease. Despite the difference between the detected and anticipated rates of NAFLD in this study cohort, there are several elements of the analysis that suggest that the phenotype is valid for NAFLD research. First, all six loci have been identified by previous GWASs with strength of signal commensurate to that seen in earlier studies. Detection of such a unique genetic architecture strongly supports the validity and specificity of the EHR‐based phenotype. Second, we base our case definition on recent, independently authored expert consensus guidelines, and based on these guidelines conduct sensitivity analyses after exclusion of alternative hepatic pathology, in which our results are consistent. Third, the baseline characteristics of the NAFLD cohort compared with controls are similar to those expected, as shown in Supporting Materials 1 (Supporting Table S5).
Fourth, the UKB is affected by healthy volunteer bias, suggesting that the overall rate of NAFLD may be lower than an age‐matched population.( ) Finally, the results of our analysis are consistent with a histological analysis of a subgroup of UKB participants, in which several variants including the APOE variant were identified.( , , , ) The strengths of this study include using a larger sample size than previous studies, with greater power to detect association. The study also used a second cohort for external replication as well as available GWAS summary association statistics from other populations, resulting in replication of all identified loci. The use of administrative records for identification of cases also demonstrates that this technique can be used to study NAFLD without the requirement for invasive procedures or radiological assessment and, as discussed, the results are consistent with published literature. Our study has confirmed four loci identified in the same cohort using radiological assessment of NAFLD in a smaller subset (up to 14,400 participants; APOE, GCKR, TM6SF2, and PNPLA3)( ) and two other SNPs identified in other population. Furthermore, the strongest known signal at PNPLA3 was verified in this study with a highly significant association, and relative effect sizes at each locus were highly similar to results from histologically or radiologically characterized cohorts. The study also benefits from demonstrating the robustness of the associations within sensitivity analyses using diagnostic codes based on published recommendations( ) and provides mechanistic evidence by using validated biomarkers to determine the potential influence of each locus on NAFLD development. 
Notably, using consensus recommendations to define NAFLD has resulted in 4,761 eligible cases, whereas previous studies have classified only 704 individuals from within the UKB as NAFLD cases, with the remainder misclassified as controls.

The limitations of this study include misclassification of individuals with alternative hepatic pathology on administrative records and a low detection rate of NAFLD due to limited documentation in the EHR. The definition of cases based on administrative records also prevents any assessment of NAFLD severity; thus, variants that contribute solely to the progression of steatohepatitis, fibrosis, or cirrhosis without promoting initial occurrence of steatosis may not be identified. The binary case definition offers lower power to detect associations than continuous traits, although this is partly overcome by the very large sample size. The results of our study are, however, consistent with published literature. The GS‐SFHS replication cohort demonstrated a significant association only at the GCKR locus and at none of the other loci. This is likely due to limited power, with only 67 NAFLD cases compared with almost 5,000 analyzed in the UKB cohort. The overall rate of NAFLD ascertained using the EHR‐based approach was similar between the cohorts (1.2% in the UKB and 1.1% in the GS‐SFHS cohort), suggesting that the lack of replication may be determined by sample size rather than differences in clinical coding. Despite this, all six identified loci have been detected by earlier GWASs. Although the definition of NAFLD is based on consensus recommendations, it is possible that regional variation in recording before these recommendations has influenced documentation of NAFLD.

This paper supports the use of administrative data as a means to conduct research into NAFLD. Such approaches are likely to require large sample sizes, but do overcome the need for invasive and costly recruitment and investigations.
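
The sample-size argument for the replication cohort can be illustrated with a rough power calculation. The sketch below is not part of the study's analysis. It approximates the power of a per-allele case-control test using the Woolf variance of the log odds ratio. The case and control counts for the UKB and the odds ratio of 0.84 come from the abstract (where 0.84 is the heterozygote estimate, used here as a per-allele effect by assumption). The allele frequency of 0.15 and the GS‐SFHS control count of about 6,024 (inferred from 67 cases at a rate of roughly 1.1%) are assumptions, as is the function name `allelic_power`.

```python
from math import log, sqrt
from statistics import NormalDist


def allelic_power(n_cases, n_controls, odds_ratio, allele_freq, alpha):
    """Approximate two-sided power of a per-allele case-control test.

    Uses the Woolf variance of the log odds ratio from a 2x2 table of
    allele counts, treating the allele frequency as roughly equal in
    cases and controls (reasonable only for modest effect sizes).
    """
    std_normal = NormalDist()
    # Standard error of the log odds ratio from allele counts.
    se = sqrt((1 / n_cases + 1 / n_controls)
              / (2 * allele_freq * (1 - allele_freq)))
    z_effect = abs(log(odds_ratio)) / se
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic exceeds the threshold in either tail.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Discovery cohort (counts from the abstract), genome-wide threshold.
power_ukb = allelic_power(4761, 373227, odds_ratio=0.84,
                          allele_freq=0.15, alpha=5e-8)

# Replication cohort: 67 cases; control count is an assumption (see lead-in).
# Nominal threshold of 0.05 is also an assumption for illustration.
power_gs = allelic_power(67, 6024, odds_ratio=0.84,
                         allele_freq=0.15, alpha=0.05)

print(f"Approximate power, UKB discovery: {power_ukb:.2f}")
print(f"Approximate power, GS-SFHS replication: {power_gs:.2f}")
```

Under these assumptions, the discovery cohort has better-than-even power at the genome-wide threshold, whereas the replication cohort has low power even at a nominal threshold, which is in line with the interpretation that the lack of replication reflects sample size.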
In summary, we have performed a GWAS of NAFLD using UKB data and identified six loci associated with NAFLD. We have also demonstrated the feasibility of EHR‐based NAFLD research without reliance on invasive investigations, validating the consensus guidelines.
