Literature DB >> 35474605

A minority of somatically mutated genes in pre-existing fatty liver disease have prognostic importance in the development of NAFLD.

Jake P Mann1,2, Matthew Hoare2,3.   

Abstract

BACKGROUND: Understanding the genetics of liver disease has the potential to facilitate clinical risk stratification. We recently identified acquired somatic mutations in six genes and one lncRNA in pre-existing fatty liver disease. We hypothesised that germline variation in these genes might be associated with the risk of developing steatosis and contribute to the prediction of disease severity.
METHODS: Genome-wide association study (GWAS) summary statistics were extracted from seven studies (>1.7 million participants) for variants near ACVR2A, ALB, CIDEB, FOXO1, GPAM, NEAT1 and TNRC6B for: aminotransferases, liver fat, HbA1c, diagnosis of NAFLD, ARLD and cirrhosis. Findings were replicated using GWAS data from multiple independent cohorts. A phenome-wide association study was performed to examine for related metabolic traits, using both common and rare variants, including gene-burden testing.
RESULTS: There was no evidence of association between rare germline variants or SNPs near five genes (ACVR2A, ALB, CIDEB, FOXO1 and TNRC6B) and risk or severity of liver disease. Variants in GPAM (proxies for p.Ile43Val) were associated with liver fat (p = 3.6 × 10-13 ), ALT (p = 2.8 × 10-39 ) and serum lipid concentrations. Variants in NEAT1 demonstrated borderline significant associations with ALT (p = 1.9 × 10-11 ) and HbA1c, but not with liver fat, as well as influencing waist-to-hip ratio, adjusted for BMI.
CONCLUSIONS: Despite the acquisition of somatic mutations at these loci during progressive fatty liver disease, we did not find associations between germline variation and markers of liver disease, except in GPAM. In the future, larger sample sizes may identify associations. Currently, germline polygenic risk scores will not capture data from genes affected by somatic mutations.
© 2022 The Authors. Liver International published by John Wiley & Sons Ltd.

Entities:  

Keywords:  GPAM; NAFLD; alcohol-related liver disease; genomic analysis; precision medicine; variation

Mesh:

Substances:

Year:  2022        PMID: 35474605      PMCID: PMC9544140          DOI: 10.1111/liv.15283

Source DB:  PubMed          Journal:  Liver Int        ISSN: 1478-3223            Impact factor:   8.754


Lay Summary We have recently found recurrent genetic mutations in the liver of patients with end‐stage fatty liver disease. We believe these mutations develop to protect liver cells from damage. However, when we study very large numbers of patients, hereditary genetic mutations at the same places in DNA, do not seem to affect the risk that a person will develop fatty liver disease in the first place.

INTRODUCTION

Non‐alcoholic fatty liver disease (NAFLD) affects around 25% of the worldwide population and is emerging as the fastest‐growing form of liver disease in developed countries. , It encompasses a spectrum of diseases from simple hepatic steatosis, through non‐alcoholic steatohepatitis (NASH) to cirrhosis, with the attendant risks of liver failure and hepatocellular carcinoma (HCC). Hepatic steatosis is strongly associated with insulin resistance, obesity and other features of the metabolic syndrome, which has led to the development of the term metabolic dysfunction‐associated fatty liver disease (MAFLD). Given the number of patients at risk of development of NAFLD and subsequent health problems, there is an urgent need to stratify patients based upon long‐term risks, so that even countries with well‐resourced healthcare schemes can cope with the patient numbers predicted to develop end‐stage liver disease over the next few decades. At present, there are a number of clinically deployed non‐invasive tests, including elastography and serum biomarkers that can predict levels of hepatic fibrosis and long‐term risk in NAFLD. Significant interest has been generated in the potential of germline genetic variation in the pathogenesis and prognostication of NAFLD. Single‐nucleotide polymorphisms (SNPs) near several genes, including PNPLA3, HSD17B13, TM6SF2 and MBOAT7, , have been shown to be associated with long‐term risk of several disease‐related outcomes in NAFLD, as well as in alcohol‐related liver disease (ARLD), through genome‐wide association studies (GWAS). Their mechanistic contribution to fatty liver disease initiation or progression is only starting to become clear. These variants are common but associated with small effect sizes, when studied in isolation. This has suggested the potential for combining multiple genotypes into a polygenic risk score: the analysis of several SNPs in patients at baseline as a predictive tool for long‐term outcomes in both NAFLD and ARLD. Similarly, other studies have focused on rare pathogenic germline variants, such as in APOB (encoding apolipoprotein B), that lead to NAFLD with high penetrance. Although these cases can inform us about underlying disease pathogenesis, their low frequency suggests that they are unlikely to be informative in the stratification of most patients within the general NAFLD population. We have recently identified recurrent somatic mutations within the liver of patients with NAFLD and ARLD Recurrent non‐synonymous mutations were seen in six coding genes: ACVR2A, ALB, CIDEB, FOXO1, GPAM, TNRC6B, plus the long non‐coding (lnc)RNA NEAT1. These mutations were found in multiple hepatocyte clones throughout the liver, with evidence of convergent evolution: the identification of independent clones with mutations in the same genes, suggesting strong selection pressure to acquire these variants during progressive fatty liver disease. Many of these variants were predicted to result in an absence of functional protein. Amongst these genes, three are involved in lipid metabolism suggesting a potential functional role in disease pathogenesis: CIDEB, mediates fusion and cargo transfer of cytoplasmic lipid droplets ; FOXO1, the main transcription factor downstream of insulin, but also a major regulator of lipid metabolism and; GPAM, encoding GPAT1, the enzyme catalysing the initial step in triglyceride synthesis. Through functional validation of the hot‐spot mutations in FOXO1, we identified that this impacted the response to insulin, glycolysis and lipid metabolism. Although promising for improving the mechanistic understanding of disease pathogenesis, our current evidence only implicates them as acquired mutations developing in pre‐existing fatty liver disease. However, the prognostic or therapeutic role that these acquired mutations could play in NAFLD and ARLD remains unknown. As there was strong selective pressure to acquire these mutations in the context of pre‐existing NAFLD, we were interested to explore whether common SNPs or rare germline variation near these recurrently mutated genes might be associated with the development of fatty liver and liver‐related outcomes in NAFLD. Data from well‐established variants have shown that hepatic fat accumulation has been causally linked to clinical liver events (e.g. hepatocellular carcinoma). We hypothesised that germline variation providing weaker modulation of disease phenotypes would exist and might improve the performance of polygenic risk scores being developed for prognostication in NAFLD and ARLD. Amongst these genes, germline coding variants at GPAM have been associated with serum ALT levels in an exome‐wide association study of the UK Biobank cohort, particularly p.Ile43Val. Further study found that these variants were also associated with hepatic fat content and histological markers of liver damage in independent NAFLD cohorts. Other studies using GWAS have replicated these findings in further cohorts, , , where rs10787429 in GPAM was associated with elevated ALT in both NAFLD and ARLD. Here we use data from multiple GWAS to investigate whether similar associations are observed for the other five recurrently mutated genes and one recurrently mutated locus.

METHODS

Identification of genes enriched for somatic mutations in ARLD and NAFLD

As described in detail elsewhere, whole‐genome sequencing was performed on diseased, non‐malignant hepatic tissue from patients with NAFLD and ARLD undergoing tumour resection or liver transplant. The dN/dScv method was used to identify six coding genes with higher numbers of nonsynonymous mutations relative than expected.

Annotation of somatic mutants and genomic regions

After removal of duplicates, somatic mutations at ACVR2A, ALB, CIDEB, FOXO1, GPAM and TNRC6B, plus one non‐coding lncRNA NEAT1 14 were annotated with predicted consequence, impact on transcript and prevalence within the 1000 Genomes dataset (n = 2504), Exome Sequencing Project (n = 6503) and gnomAD (n = 141 456) using the Ensembl Variant Effect Predictor. All non‐synonymous single nucleotide variants were also annotated with predicted functional consequences using dbNSFP. For the variants that had previously been identified in any of the above population genomic sequencing datasets, we searched Phenoscanner and the Common Metabolic Disease Portal for any evidence of association with metabolic traits. No data were available on Phenoscanner for these ultra‐rare variants. The Common Metabolic Disease Portal search yielded 96 variant‐trait associations: therefore, rather than apply genome‐wide association significance cut‐off, the critical p‐value for significance for this analysis only was p < 5.2 × 10−4 (i.e. 0.05/96). In addition, we annotated each of the six genes with their expected and observed number of predicted loss of function (pLoF) and missense mutations from gnomAD. In brief, the ratio between observed and expected pLoF mutants is an indicator of each gene's tolerance to haploinsufficiency. Such that a gene with a low observed/expected ratio (defined as the upper 95% confidence interval <0.35) has fewer pLoF mutants than other genes, suggesting that humans are relatively intolerant of haploinsufficiency.

Association with markers of liver disease from genome‐wide association studies

Summary statistics from genome‐wide association studies (GWAS) were searched for all variants within the genomic regions for the seven regions of interest (Table S1). Regions included were (GRCh37): ACVR2A: chr2:148602086‐148 688 393; ALB: chr4:74262831‐74 287 129; CIDEB: chr14:24774302‐24 780 636; FOXO1: chr13:41129804‐41 240 734; GPAM: chr10:113909624‐113 975 135; NEAT1: chr11:65190245‐65 213 011; TNRC6B: chr22:40440821‐40 731 812. In addition, for comparison, we extracted summary statistics for four well‐validated genome‐wide significance risk variants for fatty liver disease: rs738409C > G in PNPLA3, rs58542926C > T in TM6SF2, rs2642438A > G in MTARC1 and rs72613567TA > T in HSD17B13. However, for some studies data on rs72613567TA > T were not available therefore we used rs13125522A > G as a proxy, which is in strong linkage disequilibrium (R 2 = 0.97) with rs72613567. Summary statistics were obtained from the Pan‐UK BioBank analysis for alanine aminotransferase (ALT), aspartate aminotransferase (AST), glycosylated haemoglobin (HbA1c), diagnosis of NAFLD (phecode‐571.5), diagnosis of ARLD (phecode‐317.11), liver fibrosis (icd10‐K74), cirrhosis (phecode‐571) and other liver diseases (phecode‐571.5). We also obtained summary statistics for diagnosis of NAFLD from Anstee et al. and MRI liver fat from Liu et al., which utilises UK BioBank data. These data were accessed through GWAS Catalogue. For replication of findings, we first obtained categorical data on liver‐related diagnoses from the FinnGen study: NAFLD, ARLD, hepatocellular carcinoma and intrahepatic cholangiocarcinoma and cirrhosis. For replication of observations for ALT, we obtained data from BioBank Japan and Pazoki et al. For replication of findings for HbA1c, we obtained summary statistics on HbA1c from the MAGIC consortium (trans‐ancestry meta‐analysis by Chen et al. ) and BioBank Japan, plus diagnosis of type 2 diabetes mellitus (T2DM) in East Asian individuals from Spracklen et al. The total number of unique participants included from these GWAS summary statistics was 1 628 945. All variants from the above regions were extracted, and coordinates from FinnGen were carried over from GRCh38 to GRCh37 using the Ensembl Assembly Converter. Manhattan plots were produced for each trait, illustrating only variants within the regions of interest. Significance was defined as p < 5 × 10−8. Within regions that had variants with a significant association, we used FIVEx to look for expression quantitative trait loci (eQTL), which extracts data from the European Bioinformatic Institute eQTL Catalogue.

Gene‐based phenome‐wide association study for common variants

We sought to explore associations between germline variation in the coding genes and lncRNA of interest and metabolic traits using a phenome‐wide association study approach. Phenoscanner and the Common Metabolic Diseases Knowledge Portal were searched for each of the six coding genes plus NEAT1. Rare variants (mean allele frequency [MAF] <0.01) were excluded and results were filtered for traits relevant to ARLD and NAFLD. Significance was defined as p < 5 × 10−8. Data on eQTLs were obtained using the Qtlizer package for R for all significant associations. We also searched for any metabolite‐wide associations within the regions of interest using data from Lotta et al. (https://omicscience.org/apps/crossplatform/); however, we did not identify any significant (p < 4.9 × 10−10, as defined by the authors) associations.

Association between rare coding variants and traits related to NAFLD or ARLD

We next investigated whether rare coding variants individually, or in combination using gene‐burden analyses, were associated with markers of liver disease or related metabolic traits. We used data from https://genebass.org/ and https://azphewas.com/, which derive data from the UK BioBank 300 k Exomes. These analyses were not available for the lncRNA NEAT1. Extracted associations for individual variants, and gene‐burden tests for predicted loss of function (pLoF), missense and synonymous variants. Significance, as defined by the original studies, was p < 2.5 × 10−8 for GeneBass (using SKAT‐O test) and p < 2.0 × 10−9 (−log10[8.7]) for AZPheWAS.

Analyses

Linkage disequilibrium between variants, including the previously reported rs2792751T > V (p.Ile43Val) in GPAM, was calculated using SNiPA. Data were analysed using R 4.0.2 and the code used in the analyses is available from https://doi.org/10.5281/zenodo.4656979.

RESULTS

We have recently identified recurrent somatic mutations in non‐malignant liver tissue from individuals with ARLD and NAFLD through laser capture microdissection and whole‐genome sequencing. Through this approach we identified six coding genes (ACVR2A, ALB, CIDEB, FOXO1, GPAM and TNRC6B) and one lncRNA (NEAT1) significantly enriched for acquired somatic variants. We hypothesised that given the selective advantage these variants must endow during disease progression, rare pathogenic variants of these regions, associated with features of liver disease, might be identified.

Specific somatic mutations in recurrently mutated genes are not found in the germline

We previously identified 129 unique variants across these six coding genes and one lncRNA (Table S2), the majority of which were either missense (49/129, 38%) or non‐coding exonic variants in NEAT1 (49/129, 38%), and predicted to have a moderate or high impact upon protein structure (61/129, 47%, Table S3). Fifteen percent (11/71) of coding single‐nucleotide variants result in premature stops. 96% (51/53) of non‐synonymous coding variants had a CADD‐Phred score > 15, suggestive of deleterious impact on the protein (Table S4). These variants are extremely rare in the germline, with 118/129 (92%) having never been identified across 150 463 individuals from gnomAD, 1000G, or the Exome Sequencing Project. The most common of these 11 previously reported variants was rs368997599 G > A (p.Arg45Trp) in CIDEB, which was identified in 10 individuals from gnomAD in the heterozygous state, seven of whom are of Ashkenazi Jewish ancestry. Even in this genetic ancestry p.Arg45Trp in CIDEB remained an ultra‐rare mutation with an allele frequency of 7.0 × 10−4. There was no evidence of associations between any of the previously identified variants and metabolic traits in the Common Metabolic Disease Portal (Table S5). None of the 11 previously identified variants were associated with significant eQTLs (Table S4). Whilst all of the six coding genes had the expected number of missense mutations, four of the six (ACVR2A, ALB, FOXO1 and TNRC6B) are under selection pressure to prevent against haploinsufficiency; in gnomAD there were significantly fewer predicted loss of function (pLoF) variants observed than expected (Table S3). This implies that there are a reduced number of germline variants in these genes that will cause pLoF, compared to other coding genes and potential haploinsufficiency in these genes is deleterious. The exception to this was CIDEB, where there was an expected number of missense and pLoF mutations, but no reports of these rare variants associated with liver phenotypes. Overall, the specific somatic mutations that we had previously identified are very rare in the germline, because of negative selection pressure and are not known to be associated with the development of liver disease.

Among recurrently mutated genes only germline exonic variation in is associated with liver phenotypes

Given the rarity (or absence) of these specific 129 variants in the germline, we investigated whether other rare variants at these loci were associated with liver disease (or related metabolic traits). This methodology combines the effect of multiple rare variants (e.g. loss of function or missense) within genes to account for the rarity of individual variants. We used data from https://genebass.org/ and https://azphewas.com/ (n = 281 852 from UK BioBank), which tests whether exonic germline variants either individually, or cumulatively using a gene‐based burden method, demonstrated an association with liver or metabolic phenotypes. Analysis of individual coding variants found that p.Ile43Val in GPAM was associated with differences in serum lipids and the synonymous mutation 10–112 157 327‐T‐A (p.Pro681Pro) in GPAM was associated with ALT (Table 1). Exonic variants in other genes were associated with related metabolic traits, but not with liver disease phenotypes (Table 1 and Table S6). Using gene‐based burden testing, which adds together all variants within a single category (e.g. pLoF or missense), variants in ALB demonstrated associations with serum lipids, but no other markers of liver disease. Burden testing for pLoF variants in TNRC6B demonstrated a significant association with alcohol consumption habits, but no other markers of alcohol‐related disease (Table S6). Therefore, among the six recurrently mutated genes only rare germline coding variants in GPAM have been identified to be associated with serum lipid levels, rather than liver phenotypes.
TABLE 1

Top associations from rare variant and gene‐burden analyses

Variant‐level analyses
GenePhenotypeVariantAA changeAllele frequencyModelBeta p valueTotalSource
ACVR2A Creatinine2‐147 899 548‐G‐Ap.Pro118Pro0.30genotypic0.036.84E‐25254 544AZ
GPAM Alanine aminotransferase10–112 157 327‐T‐Ap.Pro681Pro0.28genotypic0.039.11E‐19255 248AZ
GPAM Alkaline phosphatase10–112 157 327‐T‐Ap.Pro681Pro0.28dominant0.042.61E‐24255 341AZ
GPAM Apolipoprotein A10–112 180 571‐T‐Cp.Ile43Val0.27genotypic0.041.16E‐42231 005AZ
GPAM Cholesterol10–112 180 571‐T‐Cp.Ile43Val0.27genotypic0.032.12E‐17253 523AZ
GPAM Direct bilirubin10–112 157 327‐T‐Ap.Pro681Pro0.28genotypic0.021.20E‐08217 133AZ
GPAM HDL cholesterol10–112 180 571‐T‐Cp.Ile43Val0.27genotypic0.043.95E‐43232 376AZ
GPAM Hip circumference10–112 180 571‐T‐Cp.Ile43Val0.27genotypic−0.021.59E‐08265 294AZ
GPAM LDL direct10–112 180 571‐T‐Cp.Ile43Val0.27genotypic0.021.03E‐09253 059AZ
GPAM Leg fat mass (right)10–112 180 571‐T‐Cp.Ile43Val0.27genotypic−0.013.47E‐08261 190AZ
GPAM Total bilirubin10–112 157 327‐T‐Ap.Pro681Pro0.28genotypic0.021.34E‐12254 316AZ
GPAM Triglycerides10–112 180 571‐T‐Cp.Ile43Val0.27genotypic−0.022.07E‐13253 317AZ
TNRC6B Body mass index (BMI)22‐40 301 373‐A‐G0.34genotypic−0.022.19E‐10267 287AZ
TNRC6B Creatinine22‐40 301 172‐TGCA‐Tp.Gln524del0.30genotypic0.022.58E‐09255 188AZ
TNRC6B Impedance of whole body22‐40 301 373‐A‐G0.34genotypic0.021.93E‐16263 528AZ
TNRC6B Total bilirubin22‐40 301 172‐TGCA‐Tp.Gln524del0.30genotypic−0.023.87E‐08254 308AZ

Note: Summary statistics were obtained from analyses of individual rare variants (using UK BioBank 300 k Exomes) or gene‐burden testing for pLoF or missense variants. AA, amino acid; AZ, Astrazeneca PheWAS; mis, missense variants; pLoF, predicted loss of function; ptv, protein‐truncating variant; snv, single nucleotide variant. Significance threshold adjusted for multiplicity was p < 2.5 × 10−8 for GeneBass (using SKAT‐O test) and p < 2.0 × 10−9 (−log10[8.7]) for AZPheWAS.

Top associations from rare variant and gene‐burden analyses Note: Summary statistics were obtained from analyses of individual rare variants (using UK BioBank 300 k Exomes) or gene‐burden testing for pLoF or missense variants. AA, amino acid; AZ, Astrazeneca PheWAS; mis, missense variants; pLoF, predicted loss of function; ptv, protein‐truncating variant; snv, single nucleotide variant. Significance threshold adjusted for multiplicity was p < 2.5 × 10−8 for GeneBass (using SKAT‐O test) and p < 2.0 × 10−9 (−log10[8.7]) for AZPheWAS.

Germline variation at , but not other somatically mutated genes, is associated with liver phenotypes

We next explored whether more common germline variants might be associated with liver phenotypes in previously published datasets. We utilised the summary statistics from the UK BioBank cohort of 420 000 subjects and searched for variants within these seven regions (Figures 1 and 2, Table 2 and Figure S1). As previously described, , we observed genome‐wide significant variants within GPAM, associated with elevated serum levels of ALT (Table 2, e.g. rs10787429 C > T beta = .006, p = 2.8 × 10−39 [p‐value significance threshold adjusted for multiplicity p < 5 × 10−8]), AST and liver fat by MR imaging (e.g. rs11446981 T > TA beta = −.003, p = 3.6 × 10−13). This acted as a useful positive control in our analyses that found no significant associations between SNPs near any of the other recurrently mutated coding genes and disease correlates in NAFLD or ARLD.
FIGURE 1

Association between common variants at recurrently mutated regions and markers of liver disease or glycaemic control. Manhattan plots focusing on the six protein‐coding genes and one lncRNA (NEAT1) of interest, illustrating all variants within their genomic coordinates. −log10 p‐value was obtained from summary statistics from the UK BioBank for alanine aminotransferase (ALT), liver fat (from Liu et al., 2021), glycosylated haemoglobin (HbA1c) and alcohol‐related liver disease (ARLD). Significance threshold adjusted for multiplicity was p < 5 × 10−8

FIGURE 2

Association between common variants at recurrently mutated regions and markers of liver disease or glycaemic control. Manhattan plots focusing on the six protein‐coding genes and one lncRNA (NEAT1) of interest, illustrating all variants within their genomic coordinates. −log10 p‐value was obtained from summary statistics from the UK BioBank for aspartate aminotransferase (AST), hepatic fibrosis and cirrhosis. Data on diagnosis of NAFLD were obtained from Anstee et al. (2021). Significance threshold adjusted for multiplicity was p < 5 × 10−8

TABLE 2

Genome‐wide significant associations of common variants within regions of interest with eQTLs

VariantGeneTraitBeta p valueEAFSource n eQTL gene (tissue)eQTL beta (p‐value)eQTL_study
rs10787429 C > T GPAM ALT0.0062.80E‐390.27European GWAS meta‐analysis753 010Nil
rs7096937 T > C GPAM AST−0.022.77E‐090.73UK BioBank419 034Nil
rs11446981 T > TA GPAM Liver fat−0.063.60E‐130.70UK BioBank32 858Nil
rs595366 T > A NEAT1 ALT−0.0031.90E‐110.27European GWAS meta‐analysis753 010NEAT1 (adipose)−0.4 (p = 6.5 × 10−17)FUSION
rs34743766 C > CA NEAT1 HbA1c0.018.49E‐090.17UK BioBank419 446NEAT1 (adipose)0.27 (p = 9.3 × 10−17)TwinsUK

Note: Lead variants from the six protein‐coding genes and one lncRNA (NEAT1) of interest are associated with markers of liver disease or glycaemic control. GWAS summary statistics were obtained from the UK BioBank or Pazoki et al. (2021). Expression quantitative trait locus (eQTL) data were obtained from FIVEx. EAF, effect allele frequency. Significance threshold adjusted for multiplicity was p < 5 × 10−8.

Association between common variants at recurrently mutated regions and markers of liver disease or glycaemic control. Manhattan plots focusing on the six protein‐coding genes and one lncRNA (NEAT1) of interest, illustrating all variants within their genomic coordinates. −log10 p‐value was obtained from summary statistics from the UK BioBank for alanine aminotransferase (ALT), liver fat (from Liu et al., 2021), glycosylated haemoglobin (HbA1c) and alcohol‐related liver disease (ARLD). Significance threshold adjusted for multiplicity was p < 5 × 10−8 Association between common variants at recurrently mutated regions and markers of liver disease or glycaemic control. Manhattan plots focusing on the six protein‐coding genes and one lncRNA (NEAT1) of interest, illustrating all variants within their genomic coordinates. −log10 p‐value was obtained from summary statistics from the UK BioBank for aspartate aminotransferase (AST), hepatic fibrosis and cirrhosis. Data on diagnosis of NAFLD were obtained from Anstee et al. (2021). Significance threshold adjusted for multiplicity was p < 5 × 10−8 Genome‐wide significant associations of common variants within regions of interest with eQTLs Note: Lead variants from the six protein‐coding genes and one lncRNA (NEAT1) of interest are associated with markers of liver disease or glycaemic control. GWAS summary statistics were obtained from the UK BioBank or Pazoki et al. (2021). Expression quantitative trait locus (eQTL) data were obtained from FIVEx. EAF, effect allele frequency. Significance threshold adjusted for multiplicity was p < 5 × 10−8. We validated these associations across a European‐ancestry GWAS meta‐analysis (n = 753 010) and in the BioBank Japan cohort (n = 134182 ; [p‐value significance threshold adjusted for multiplicity p < 5 × 10−8] Figure 2, Figure S1 and Table S7). The magnitude of effect size for the lead variants within GPAM was similar to that for well‐established loci in HSD17B13 and MTARC1, but smaller than observed for risk variants in PNPLA3 and TM6SF2 (Table S8). For example, for change in hepatic fat : 10–113 950 257‐T‐TA in GPAM beta = −.06, p = 3.60 × 10−13; compared to rs738409C > G in PNPLA3: beta = 0.22, p = 1.5 × 10−133 and rs58542926T > C in TM6SF2 beta = 0.33, p = 2.5 × 10−116. Therefore, the only somatically mutated gene for which germline variation was associated with liver disease was GPAM. Genome‐wide significant associations between variants in GPAM and fatty liver disease have been previously described, particularly for rs2792751T > C p.Ile43Val. , All the GPAM variants identified in the above analyses were non‐coding variants and in strong linkage disequilibrium (LD r2 > 0.93) with rs2792751T > C (Figure S2A). Somatic variants in GPAM were associated with NAFLD and alcohol‐related liver disease (ARLD) in our previous study. Germline rs2792751T > C (p.Ile43Val) was also associated with ARLD or NAFLD diagnosis in these individuals (p = 0.002, Figure S3). Unlike the somatic mutations in GPAM, p.Ile43Val is not predicted to have a major functional consequence (Table S4). rs2792751T > C (and non‐coding GPAM variants identified in this study) were only associated with significant reductions in GPAM mRNA in tibial artery tissue (Table S4). No significant eQTLs in liver were identified. We did not find evidence of significant associations between p.Ile43Val with clinical liver‐related events in the UKBB cohort, though the absolute number of cases was comparatively small (Table S9).

Germline variation at the long non‐coding RNA is associated with liver phenotypes and glycaemic control

In the UK BioBank cohort, we observed genome‐wide significant variants within the lncRNA NEAT1 for elevated serum ALT (Figure 1). This was replicated in the European‐ancestry ALT meta‐analysis, though not in the BioBank Japan ALT results (Figure 2). The lead variant for NEAT1 (rs595366T > A) was also found to have an eQTL for NEAT1 in both adipose and liver tissue. Germline variation at rs595366T > A was also associated with the diagnosis of ARLD or NAFLD in the small cohort of individuals characterised in Ng et al. (p = 0.002, Figure S3). However, no associations were identified between variants at NEAT1 and categorical definitions of liver disease (e.g. diagnosis of NAFLD or ARLD, Figures 1 and 2) or severity of liver disease (e.g. cirrhosis). This was consistent across data from the UK BioBank and FinnGen (Figure S1 and Tables S7 and S9) datasets. Given the causal associations between insulin resistance and hepatic steatosis, we investigated whether variants were associated with HbA1c or a diagnosis of type 2 diabetes mellitus (T2DM). Variants in NEAT1 were associated with HbA1c levels in the UK BioBank (e.g. rs34743766 C > CA, beta = 0.1, p = 8.5 × 10−9 [p‐value significance threshold adjusted for multiplicity p < 5 × 10−8]) and had an associated eQTL for NEAT1 in adipose tissue, but this was not replicated for diagnosis of T2DM or other analyses of HbA1c (Figure S1). These associations are interesting as expression levels of NEAT1 have previously been associated with both the presence of diabetes and the progression of diabetic complications. , All variants in or near NEAT1 identified to have significant genome‐wide association were in strong linkage disequilibrium with each other (LD r 2 > 0.82, Figure S2B). Many of the variants were found to have significant eQTLs for NEAT1 in a range of highly metabolically active tissues (Table 3), but not in the liver.
TABLE 3

Associations between common variants in or near regions of interest and related metabolic traits

GeneVariantTraitBeta (SE) p value n eQTL
ACVR2A rs13008838 A > GeGFR‐creat (serum creatinine)−0.003 (0.0004)4.97E‐15754 661ACVR2A: Adipose ‐ Subcutaneous (+), Muscle skeletal (+), Artery ‐ Tibial (+), Artery ‐ Aorta (+), Pancreas (+), Artery ‐ Coronary (+), Adipose ‐ Visceral (Omentum) (+)
ACVR2A rs3764955 G > CSerum creatinine0.020 (0.003)2.01E‐08182 901ACVR2A: Artery ‐ Coronary (+), Adipose ‐ Visceral (Omentum) (+), Adipose ‐ Subcutaneous (+), Pancreas (+), Artery ‐ Aorta (+), Artery ‐ Tibial (+), Muscle skeletal (+)
CIDEB rs12590407 G > ADiastolic blood pressure−0.010 (0.002)9.11E‐12552 754
FOXO1 rs2253001 A > TBasal metabolic rate0.010 (0.002)1.03E‐08331 307
GPAM rs10787429 T > CAlanine transaminase−0.030 (0.005)3.44E‐09141 341GPAM: Artery ‐ Tibial (−)
GPAM rs7898213 T > CAlkaline phosphatase−0.030 (0.004)5.00E‐11112 189
GPAM rs2792759 C > TBilirubin−0.010 (0.001)4.10E‐09467 109GPAM: Artery ‐ Tibial (−)
GPAM rs2297991 T > CHDL cholesterol−0.030 (0.003)5.21E‐20191 159GPAM: Artery ‐ Tibial (−)
GPAM rs1129555 A > GLDL cholesterol0.030 (0.004)1.00E‐15GPAM: Artery ‐ Tibial (−)
GPAM rs1129555 A > GTotal cholesterol0.030 (0.004)2.00E‐18GPAM: Artery ‐ Tibial (−)
GPAM rs2254537 T > ATriglycerides0.020 (0.002)8.23E‐17482 392GPAM: Artery ‐ Tibial (−)
NEAT1 rs10896037 A > GChronic kidney disease0.060 (0.009)5.98E‐10140 966NEAT1: Adipose ‐ Visceral (Omentum) (−), Muscle skeletal (−), Adipose ‐ Subcutaneous (−), Artery ‐ Aorta (−)
NEAT1 rs12801636 G > ACoronary artery disease−0.040 (0.004)6.74E‐171 546 260
NEAT1 rs2306363 G > TDiastolic blood pressure−0.030 (0.002)1.39E‐28552 754
NEAT1 rs4930319 G > CeGFR‐creat (serum creatinine)0.003 (0.0004)8.69E‐27757 454NEAT1: Adipose ‐ Subcutaneous (−), Adipose ‐ Visceral (Omentum) (−), Artery ‐ Aorta (−), Muscle skeletal (−)
NEAT1 rs2236682 G > TSerum creatinine0.020 (0.004)2.66E‐10182 901NEAT1: Muscle skeletal (−), Artery ‐ Aorta (−), Adipose ‐ Visceral (Omentum) (−), Adipose ‐ Subcutaneous (−)
NEAT1 rs2306363 G > TSystolic blood pressure−0.020 (0.002)3.90E‐20550 853
NEAT1 rs10750766 C > ATriglycerides0.020 (0.002)9.96E‐15459 761
NEAT1 rs947791 G > AType 2 diabetes0.050 (0.004)2.71E‐161 016 100NEAT1: Artery ‐ Aorta (+), Pancreas (+), Muscle skeletal (+), Artery ‐ Tibial (+), Adipose ‐ Visceral (Omentum) (+), Adipose ‐ Subcutaneous (+)
NEAT1 rs11227217 C > TWaist‐hip ratio0.020 (0.002)2.60E‐15997 776NEAT1: Adipose ‐ Visceral (Omentum) (+), Muscle skeletal (+), Adipose ‐ Subcutaneous (+), Artery ‐ Aorta (+), Artery ‐ Tibial (+), Pancreas (+)
NEAT1 rs4645917 G > AWaist‐hip ratio adj BMI−0.030 (0.002)5.05E‐18969 126NEAT1: Muscle skeletal (−), Artery ‐ Tibial (−)
TNCR6B rs4820410 A > GBMI−0.020 (0.001)1.72E‐261 326 100TNRC6B: Artery ‐ Tibial (+)
TNCR6B rs2294352 G > AeGFR‐creat (serum creatinine)0.010 (0.001)2.90E‐32772 751
TNCR6B rs470113 A > GPulse pressure0.020 (0.003)4.33E‐08317 539TNRC6B: Artery ‐ Aorta (+), Artery ‐ Tibial (+), Artery ‐ Aorta (+), Artery ‐ Tibial (+)
TNCR6B rs2294352 G > ASerum creatinine−0.030 (0.004)1.06E‐15191 380
TNRC6B rs5995840 T > CBody mass index0.030 (0.004)1.02E‐11173 430TNRC6B: Artery ‐ Aorta (+), Artery ‐ Tibial (+), Atherosclerotic aortic root NA

Note: Gene‐based PheWAS performed using Phenoscanner and Common Metabolic Disease Portal to identify common variants significantly associated with the six protein‐coding genes and one lncRNA (NEAT1) of interest. eQTL column provides the gene that has significant eQTLs for that variant, the tissues and the direction (+, positive eQTL; −, negative eQTL). eQTL data were obtained using the Qtlizer package for R. Significance threshold adjusted for multiplicity was p < 5 × 10−8.

Associations between common variants in or near regions of interest and related metabolic traits Note: Gene‐based PheWAS performed using Phenoscanner and Common Metabolic Disease Portal to identify common variants significantly associated with the six protein‐coding genes and one lncRNA (NEAT1) of interest. eQTL column provides the gene that has significant eQTLs for that variant, the tissues and the direction (+, positive eQTL; −, negative eQTL). eQTL data were obtained using the Qtlizer package for R. Significance threshold adjusted for multiplicity was p < 5 × 10−8.

Germline variation at recurrently mutated genes is associated with metabolic phenotypes

We then explored whether common variants in or near these regions were associated with other related metabolic traits that affect patients with NAFLD, using a gene‐ (or region‐) based phenome‐WAS from two sources (Phenoscanner and Common Metabolic Disease Portal, n = 7637‐1 546 260, Table S10). In addition to its association with ALT, variants in or near GPAM influenced the concentration of many serum lipids (Table 3), including LDL cholesterol and triglyceride levels. We also identified variants near NEAT1 that were associated with a diagnosis of T2DM, waist‐to‐hip ratio and had a significant eQTL for NEAT1 in adipose tissue. In addition, variants in or near TNRC6B were associated with BMI and creatinine, as were those in ACVR2A. Therefore, germline variation in these genes may impact on related phenotypes that accompany the metabolic syndrome.

DISCUSSION

Investigating the genetics of fatty liver disease has the potential to inform our understanding of disease biology and facilitate clinical risk stratification for affected patients. We recently identified six protein‐coding genes and one lncRNA (NEAT1) that are enriched for loss of function somatic mutations in patients with NAFLD or ARLD. In this study, we find that germline variation in only one, GPAM, was robustly associated with markers of liver disease. The variants in GPAM are in strong linkage disequilibrium with a benign (or gain of function) variant: p.Ile43Val. This implies that although strong selection pressure exists to acquire loss of function mutations in existing fatty liver, we did not find evidence that similar germline variation contributes to disease initiation. This important negative result has important implications: (1) different pathophysiological mechanisms likely operate in fatty liver disease initiation (e.g. gain of function in GPAM) and progression driving stage‐specific selective advantage (e.g. loss of function); (2) genetic risk scores capturing only germline variation will not include the pathogenicity conveyed by these regions and additional strategies may be needed to understand the prognostic implications of these somatic mutations. Our analysis has replicated the known associations between variants in or near GPAM with liver fat, as well as ALT and AST, acting as a positive ‘control’ for analyses of the other loci. These variants are likely proxies for the coding variant p.Ile43Val. This variant has no significant hepatic eQTLs and is not predicted to disturb the protein structure, therefore may cause a gain of function in GPAM. This is in marked contrast to the somatic mutants in GPAM identified by Ng et al., all of which were predicted to be deleterious. Whilst we did not show these variants to influence the diagnosis of NAFLD, this has been demonstrated by others with larger sample sizes, variants in GPAM (particularly p.Ile43Val) are associated with radiological and histological diagnosis of NAFLD. , , , However, it is important to note that variants in GPAM have a comparatively small effect size compared to variants in PNPLA3 and TM6SF2, but similar to those in HSD17B13 and MTARC1. For five (ACVR2A, ALB, CIDEB, FOXO1, TNRC6B) of the six protein‐coding genes under investigation in this study, we found no evidence for the association between germline variation and liver disease. These observations were consistent across multiple data sources, traits, genetic ancestries and analysis methodologies, for example both common variant analyses and rare‐variant gene burden testing. This was an unexpected observation, given the genetic evidence for their selective advantage in hepatocyte clones in diseased non‐malignant NAFLD and ARLD. Moreover, CIDEB and FOXO1 have well‐established functions as a lipid droplet‐associated protein and a component of the insulin signalling cascade respectively. Conversely, our previous study did not identify acquired somatic mutations in genes with strong germline associations with liver disease (e.g. PNPLA3, TM6SF2). Collectively, this suggests that the influence of germline and acquired variants on parenchymal liver disease occurs through independent mechanisms (and genes). It should be noted that the methodology employed in this study does not exclude the possibility that individuals with rare loss‐of‐function mutations in these genes may have liver‐related phenotypes. Identification and studying human knock‐outs for these genes is an alternative strategy for investigating whether germline variation plays a role in liver disease, as has been illustrated for other conditions. , However, our data suggest that these individuals will be very rare. We identified borderline associations between variants in NEAT1, a long non‐coding RNA (lncRNA), with ALT and HbA1c. Our broader analyses implicated NEAT1 in influencing multiple metabolic traits (e.g. serum triglycerides, diagnosis of T2DM and coronary artery disease), that are of potential relevance to patients with NAFLD. These results point towards a primary role on insulin resistance, potentially through modulation of adipose tissue biology, as several variants in this lncRNA also had significant eQTLs in adipose tissue, but not in the liver. The biology of NEAT1 is poorly understood, but there is some in vitro evidence for its role in adipogenesis. We suggest that the subtle effects of germline variants in NEAT1 on ALT are likely indirect, via perturbation of insulin resistance and/or development of T2DM, however further work is required to establish this. These data also underline the principle that the genomic regions enriched for somatic mutations in our original analysis are principally those involved in metabolism. We found that germline variation in lead variants in GPAM and NEAT1 was associated with the diagnosis of NAFLD or ARLD, using data from our previous study. Therefore, in this small cohort, we found enrichment of both somatic and germline variation in these two genomic regions. Clinically, one aim of human genetics is to stratify patients into high‐ and low‐risk groups for disease progression using polygenic gene scores. Such an approach can identify individuals with a five‐fold increased risk of coronary artery disease. This would be of particular use for NAFLD and ARLD, both common conditions where only a minority of individuals progress to liver‐related clinical events. To date, there have been four PGS published for liver disease, , , , all derived using genome‐wide significant hits, and therefore none of our seven genomic regions of interest were included. If a genome‐wide PGS were derived, which included weighting from sub‐genome wide‐significant variants, then variants in or near GPAM would contribute. However, they would still receive comparatively minimal weighting compared to variants in PNPLA3 and TM6SF2. More broadly, it is not clear how the magnitude of prognostic implication would compare for germline variation risk scores compared to somatic mutations, as the prognostic implication of these remains unknown. Our results illustrate that the integration of somatic mutants into prognostic tools will be a complex process and separate from existing methods for polygenic gene scores. One limitation of this study is that rare variant associations may not be observed because of a lack of power. Larger population‐based datasets and disease‐specific cohort studies may in future identify links between variants and liver‐related outcomes that we have not been able to observe. In addition, we have not investigated evidence of interaction between genetic and environmental triggers (e.g. body mass index, alcohol consumption), as has been shown for other variants that influence liver fat.

CONCLUSION

Out of seven genomic regions with selective pressure for acquired loss of function mutations secondary to NAFLD and ARLD, only germline variation in GPAM is predictive of liver disease. Unlikely somatic mutations, the lead coding variant in GPAM (p.Ile43Val) is not predicted to deleteriously affect protein structure. This suggests that different pathophysiological mechanisms occur in disease initiation and progression. Therefore, genes with pathogenic somatic mutations in NAFLD and ARLD are distinct from those that confer germline risk and would not be captured by polygenic risk scores. Novel approaches will be required to integrate somatic and germline variation with clinical variables for risk prediction algorithms. These observations may be refined when larger sample sizes facilitate observations of subtle in rare variants.

FUNDING INFORMATION

JPM is supported by a Wellcome Trust fellowship (216329/Z/19/Z); MH is supported by a CRUK Advanced Clinician Scientist fellowship (C52489/A19924); CRUK‐OHSU Project Award (C52489/A29681) and CRUK Accelerator award to the HUNTER consortium (C18873/A26813).

CONFLICT OF INTEREST

MH is a co‐inventor on a patent detailing the finding of recurrent somatic mutations in chronic liver disease. Figure S1 Click here for additional data file. Figure S2 Click here for additional data file. Figure S3 Click here for additional data file. Table S1 Table S2 Table S3 Table S4 Table S5 Table S6 Table S7 Table S8 Table S9 Table S10 Click here for additional data file.
  55 in total

Review 1.  Regulation of Triglyceride Metabolism. II. Function of mitochondrial GPAT1 in the regulation of triacylglycerol biosynthesis and insulin action.

Authors:  Maria R Gonzalez-Baró; Tal M Lewin; Rosalind A Coleman
Journal:  Am J Physiol Gastrointest Liver Physiol       Date:  2006-12-07       Impact factor: 4.052

2.  Genome-wide association study of non-alcoholic fatty liver and steatohepatitis in a histologically characterised cohort.

Authors:  Quentin M Anstee; Rebecca Darlay; Simon Cockell; Marica Meroni; Olivier Govaere; Dina Tiniakos; Alastair D Burt; Pierre Bedossa; Jeremy Palmer; Yang-Lin Liu; Guruprasad P Aithal; Michael Allison; Hannele Yki-Järvinen; Michele Vacca; Jean-Francois Dufour; Pietro Invernizzi; Daniele Prati; Mattias Ekstedt; Stergios Kechagias; Sven Francque; Salvatore Petta; Elisabetta Bugianesi; Karine Clement; Vlad Ratziu; Jörn M Schattenberg; Luca Valenti; Christopher P Day; Heather J Cordell; Ann K Daly
Journal:  J Hepatol       Date:  2020-04-13       Impact factor: 25.083

3.  Cideb, an ER- and lipid droplet-associated protein, mediates VLDL lipidation and maturation by interacting with apolipoprotein B.

Authors:  Jing Ye; John Zhong Li; Yang Liu; Xuanhe Li; Tianshu Yang; Xiaodong Ma; Qing Li; Zemin Yao; Peng Li
Journal:  Cell Metab       Date:  2009-02       Impact factor: 27.287

4.  Association of Genetic Variation With Cirrhosis: A Multi-Trait Genome-Wide Association and Gene-Environment Interaction Study.

Authors:  Connor A Emdin; Mary Haas; Veeral Ajmera; Tracey G Simon; Julian Homburger; Cynthia Neben; Lan Jiang; Wei-Qi Wei; Qiping Feng; Alicia Zhou; Joshua Denny; Kathleen Corey; Rohit Loomba; Sekar Kathiresan; Amit V Khera
Journal:  Gastroenterology       Date:  2020-12-11       Impact factor: 22.682

5.  SNiPA: an interactive, genetic variant-centered annotation browser.

Authors:  Matthias Arnold; Johannes Raffler; Arne Pfeufer; Karsten Suhre; Gabi Kastenmüller
Journal:  Bioinformatics       Date:  2014-11-26       Impact factor: 6.937

6.  The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

Authors:  Annalisa Buniello; Jacqueline A L MacArthur; Maria Cerezo; Laura W Harris; James Hayhurst; Cinzia Malangone; Aoife McMahon; Joannella Morales; Edward Mountjoy; Elliot Sollis; Daniel Suveges; Olga Vrousgou; Patricia L Whetzel; Ridwan Amode; Jose A Guillen; Harpreet S Riat; Stephen J Trevanion; Peggy Hall; Heather Junkins; Paul Flicek; Tony Burdett; Lucia A Hindorff; Fiona Cunningham; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

7.  Genetic analysis in European ancestry individuals identifies 517 loci associated with liver enzymes.

Authors:  Raha Pazoki; Marijana Vujkovic; Benjamin F Voight; Kyong-Mi Chang; Mark R Thursz; Paul Elliott; Joshua Elliott; Evangelos Evangelou; Dipender Gill; Mohsen Ghanbari; Peter J van der Most; Rui Climaco Pinto; Matthias Wielscher; Matthias Farlik; Verena Zuber; Robert J de Knegt; Harold Snieder; André G Uitterlinden; Julie A Lynch; Xiyun Jiang; Saredo Said; David E Kaplan; Kyung Min Lee; Marina Serper; Rotonya M Carr; Philip S Tsao; Stephen R Atkinson; Abbas Dehghan; Ioanna Tzoulaki; M Arfan Ikram; Karl-Heinz Herzig; Marjo-Riitta Järvelin; Behrooz Z Alizadeh; Christopher J O'Donnell; Danish Saleheen
Journal:  Nat Commun       Date:  2021-05-10       Impact factor: 14.919

8.  Genetic Variation in the Mitochondrial Glycerol-3-Phosphate Acyltransferase Is Associated With Liver Injury.

Authors:  Aaron Hakim; Matthew Moll; Joseph Brancale; Jiangyuan Liu; Jessica A Lasky-Su; Edwin K Silverman; Silvia Vilarinho; Z Gordon Jiang; Yered H Pita-Juárez; Ioannis S Vlachos; Xuehong Zhang; Fredrik Åberg; Nezam H Afdhal; Brian D Hobbs; Michael H Cho
Journal:  Hepatology       Date:  2021-11-09       Impact factor: 17.298

9.  Exome-wide association study identifies a TM6SF2 variant that confers susceptibility to nonalcoholic fatty liver disease.

Authors:  Julia Kozlitina; Eriks Smagris; Stefan Stender; Børge G Nordestgaard; Heather H Zhou; Anne Tybjærg-Hansen; Thomas F Vogt; Helen H Hobbs; Jonathan C Cohen
Journal:  Nat Genet       Date:  2014-02-16       Impact factor: 38.330

10.  Universal Patterns of Selection in Cancer and Somatic Tissues.

Authors:  Iñigo Martincorena; Keiran M Raine; Moritz Gerstung; Kevin J Dawson; Kerstin Haase; Peter Van Loo; Helen Davies; Michael R Stratton; Peter J Campbell
Journal:  Cell       Date:  2018-06-14       Impact factor: 41.582

View more
  1 in total

1.  A minority of somatically mutated genes in pre-existing fatty liver disease have prognostic importance in the development of NAFLD.

Authors:  Jake P Mann; Matthew Hoare
Journal:  Liver Int       Date:  2022-05-11       Impact factor: 8.754

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.