Yehudit Hasin-Brumshtein, Suraj Sakaram, Purvesh Khatri, Yudong D He, Timothy E Sweeney.
Abstract
Non-Alcoholic Fatty Liver Disease (NAFLD) is a progressive liver disease that affects up to 30% of the worldwide population. Up to 25% of those affected progress to Non-Alcoholic SteatoHepatitis (NASH), a severe form of the disease that involves inflammation and predisposes the patient to liver cirrhosis. Despite the epidemic proportions of the disease, no reliable diagnostic for distinguishing NASH from NAFLD generalizes to the global patient population. We performed a comprehensive multicohort analysis of publicly available transcriptome data from liver biopsies of Healthy Controls (HC), NAFLD patients, and NASH patients. Altogether, we analyzed 812 samples from 12 datasets across 7 countries, encompassing real-world patient heterogeneity. We used 7 datasets for discovery and held out 5 datasets for independent validation. We identified 130 genes that were significantly differentially expressed in NASH versus a mixed group of NAFLD and HC. We show that our signature is not driven by one particular group (NAFLD or HC) and reflects a true biological signal. Using a forward search, we down-selected to a parsimonious 19-mRNA signature with a mean AUROC of 0.98 in discovery and 0.79 in independent validation. Methods for consistent diagnosis of NASH relative to NAFLD are urgently needed. We show that gene expression data combined with advanced statistical methodology has the potential to serve as a basis for developing diagnostic tests for this unmet clinical need.
Year: 2022 PMID: 35173224 PMCID: PMC8850484 DOI: 10.1038/s41598-022-06512-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Datasets included in multicohort analysis capture real world heterogeneity.
| Dataset ID | PMID | Last author | Year | Country | Age (years) | NASH/NAFLD | HC | Available phenotypes | Platform |
|---|---|---|---|---|---|---|---|---|---|
| Discovery | | | | | | | | | |
| E-MEXP-3291 | 24048683 | Cherrington, NJ | 2013 | US | 16–70 | N = 16/10. Postmortem samples were acquired from the NIH-funded Liver Tissue Cell Distribution System. Classification based on presence of inflammation and fibrosis for NASH (regardless of fat deposition), and fat deposition of > 10% for steatosis | N = 19 | Sex, age | GPL6244 Affymetrix Human Gene 1.0 ST Array |
| GSE126848 | 30653341 | Knop, FK | 2019 | Denmark | | N = 16/15. Histological evaluation of steatosis, activity, and fibrosis (SAF) + Kleiner fibrosis stage<sup>a</sup> | N = 26. Healthy normal weight and overweight individuals, no diabetes or excessive alcohol intake | Sex | GPL18573 RNAseq, Illumina NextSeq 500 |
| GSE33814 | 23071592 | Sültmann, H | 2012 | Austria | 25–78 | N = 12/19. Presence of ballooning in combination with variable degree of steatosis and/or inflammation | N = 13. Explant and tumor surgery | | GPL6884 Illumina HumanWG-6 v3.0 expression beadchip |
| GSE37031 | 23492103 | Titos, E | 2014 | Spain | | N = 8/0 | N = 7 | | GPL14877 Affymetrix Human Genome U133 Plus 2.0 Array |
| GSE63067 | 25993042 | Martínez-Chantar, ML | 2015 | Spain | | N = 9/2. Histology | N = 7 | | GPL570 Affymetrix Human Genome U133 Plus 2.0 Array |
| GSE66676 | 26026390 | Inge, TH (Teen-LABS Consortium) | 2015 | US | 13–20 | N = 7/26. NASH Clinical Research Network scoring system<sup>c</sup> | N = 34. Obese, undergoing bariatric surgery, no evidence of steatosis in biopsy | Sex, age, BMI, histology, HDL, LDL, cholesterol, triglycerides | GPL6244 Affymetrix Human Gene 1.0 ST Array |
| GSE89632 | 25581263 | Allard, JP | 2014 | Canada | 22–68 | N = 19/20. Necro-inflammatory Grading System<sup>b</sup> | N = 24. Live donor liver transplant, no steatosis or cirrhosis by imaging or histology | Sex, age, BMI, histology, biochemistry | GPL14951 Illumina expression beadchip |
| Validation | | | | | | | | | |
| GSE105127 | 30297808 | Hampe, J | 2018 | Germany | 29–68 | N = 5/5. Kleiner NAFLD activity score<sup>a</sup> (NAS) | N = 9. Scheduled liver resection, exclusion of liver malignancy or bariatric surgery | | GPL16791 Illumina HiSeq 2500 |
| GSE130970 | 31467298 | Sanyal, AJ | 2019 | US | | N = 42/30. NASH Clinical Research Network scoring system<sup>c</sup> | N = 6. Live donor liver transplant or patients with ALT fluctuations related biopsy | Sex, age, histology | GPL16791 Illumina HiSeq 2500 |
| GSE48452 | 23931760 | Hampe, J | 2013 | Germany | 38–72 | N = 18/14. Kleiner NAFLD activity score<sup>a</sup> (NAS) | N = 41. Exclusion of liver malignancy during major oncological surgery | Sex, age, BMI, histology, biochemistry | GPL11532 Affymetrix Human Gene 1.1 ST Array |
| GSE61260 | 25313081 | Hampe, J | 2014 | Germany | 20–86 | N = 24/23. Kleiner NAFLD activity score<sup>a</sup> (NAS) | N = 62. Exclusion of liver malignancy during major oncological surgery | Sex, age, BMI | GPL11532 Affymetrix Human Gene 1.1 ST Array |
| GSE83452 | 28679947 | Staels, B | 2017 | Belgium | 20–74 | N = 126/0. NASH Clinical Research Network scoring system<sup>c</sup> | N = 98. Obese + suspected NAFLD | Sex, age | GPL16686 Affymetrix Human Gene 2.0 ST Array |
The 12 datasets included in our analysis span multiple countries, age groups, diagnostic approaches, and gene expression platforms (a source of technical variation). In the NASH/NAFLD and HC columns, N indicates the number of samples in the relevant group.
<sup>a</sup> Kleiner DE, Brunt EM 2005.
<sup>b</sup> Brunt EM et al. 1999.
<sup>c</sup> Xanthakos S 2006.
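The per-dataset counts in the table can be checked against the totals quoted elsewhere in this record. The sketch below transcribes the N values by hand and assumes that the first seven datasets are the discovery cohorts and the remaining five are the held-out validation cohorts, as the abstract's 7/5 split suggests. The dictionary and function names are illustrative only.

```python
# (NASH, NAFLD, HC) sample counts transcribed from the dataset table.
# Assumption: first seven datasets = discovery, last five = validation.
DISCOVERY = {
    "E-MEXP-3291": (16, 10, 19),
    "GSE126848": (16, 15, 26),
    "GSE33814": (12, 19, 13),
    "GSE37031": (8, 0, 7),
    "GSE63067": (9, 2, 7),
    "GSE66676": (7, 26, 34),
    "GSE89632": (19, 20, 24),
}
VALIDATION = {
    "GSE105127": (5, 5, 9),
    "GSE130970": (42, 30, 6),
    "GSE48452": (18, 14, 41),
    "GSE61260": (24, 23, 62),
    "GSE83452": (126, 0, 98),
}


def class_totals(cohorts):
    """Sum (NASH, NAFLD, HC) counts column-wise across cohorts."""
    return tuple(sum(column) for column in zip(*cohorts.values()))


def total_samples(cohorts):
    """Total number of samples across all cohorts and classes."""
    return sum(class_totals(cohorts))


def nash_fraction(cohorts):
    """Share of samples labelled NASH (class 1 in [NASH]vs[NAFLD + HC])."""
    nash, _nafld, _hc = class_totals(cohorts)
    return nash / total_samples(cohorts)


if __name__ == "__main__":
    for name, cohorts in (("discovery", DISCOVERY), ("validation", VALIDATION)):
        print(name, total_samples(cohorts), f"{nash_fraction(cohorts):.0%}")
    print("all", total_samples(DISCOVERY) + total_samples(VALIDATION))
```

Under that split, the totals come to 309 discovery and 503 validation samples (812 overall), with NASH making up about 28% and 43% of each set. These figures agree with the abstract and with the [NASH]vs[NAFLD + HC] row of the signature table.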
Figure 1. The 130-mRNA score robustly distinguishes NASH from NAFLD or HC. (A) Study design overview. (B,C) ROC curves for the [NASH]vs[NAFLD + HC] signature in (B) the 7 discovery datasets and (C) the 5 independent validation datasets. ROCs for individual studies are shown in color; the summary ROC is shown in black with its 95% confidence interval. (D) Violin plot of the [NASH]vs[NAFLD + HC] 130-mRNA score in discovery and validation for each group; n indicates the number of samples in each class. (E) FABP4 effect sizes across datasets; studies in bold are validation. (F) Performance (AUROCs) of all six possible signatures. Summary ROC performance and 95% CI are shown as a solid symbol and line; smaller empty symbols show performance in individual studies. Triangles indicate discovery, and circles validation. Color coding is the same as in (A).
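The AUROC values in panels B, C, and F summarize how well a per-sample score separates class 1 from class 0. As a minimal sketch of that metric, the function below uses the rank-based definition: the probability that a randomly chosen class 1 sample scores higher than a randomly chosen class 0 sample, with ties counted as one half. The scores and labels are made up for illustration, and this is not the authors' code.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of class 1 and class 0 scores.

    labels must contain 1 (e.g. NASH) and 0 (e.g. NAFLD + HC).
    Ties between a positive and a negative score count as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample in each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical scores: one class 1 sample falls below a class 0 sample.
    print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for cohorts of this size. A rank-sum implementation gives the same value more efficiently on larger data.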
Number of samples used, and number of genes identified, for each of the six possible gene signatures.
| Signature | Class 1 | Class 0 | N samples, discovery (% class = 1) | N samples, validation (% class = 1) | N genes (up + down) |
|---|---|---|---|---|---|
| [NASH]vs[NAFLD + HC] | NASH | NAFLD + HC | 309 (28%) | 503 (43%) | 130 (85 + 45) |
| [NASH]vs[HC] | NASH | HC | 217 (40%) | 431 (50%) | 173 (101 + 72) |
| [NASH]vs[NAFLD] | NASH | NAFLD | 160 (44%) | 161 (55%) | 170 (112 + 58) |
| [NASH + NAFLD]vs[HC] | NASH + NAFLD | HC | 309 (58%) | 539 (57%) | 50 (34 + 16) |
| [NAFLD]vs[HC] | NAFLD | HC | 206 (44%) | 226 (40%) | 55 (30 + 25) |
| [NAFLD]vs[NASH + HC] | NAFLD | NASH + HC | 276 (33%) | 315 (29%) | 41 (20 + 21) |
Class indicates the assignment to comparison groups in MetaIntegrator. HC, healthy control.
Figure 2. NASH signature gene composition. (A) Effect sizes (ES) of the union of all genes (n = 428) in each signature. ES of genes that are not significant for a particular signature were coded as 0 (grey). (B) Each signature is represented by a pie chart. The number of genes in each signature is represented by pie size; the colored part of the chart represents the proportion of genes that are unique to that signature. The thickness of the lines represents the number of genes shared by each pair of signatures (as in the legend). (C) Overlap of the [NASH]vs[NAFLD + HC], [NASH]vs[NAFLD] and [NASH]vs[HC] signatures. Genes common to all three signatures (n = 21) are listed according to their direction of change (up or down) in the [NASH]vs[NAFLD + HC] signature. (D) Violin plots of expression z-scores for 2 representative genes listed in (C): FAT1 (over-expressed) and SLC6A16 (under-expressed). Groups are color coded based on classification (HN = healthy normal BMI, HO = healthy obese, HU = healthy unknown BMI, NAFLD and NASH). In the analysis, all healthy samples were considered one group, regardless of BMI status. (E) Pathway enrichment analysis of the [NASH]vs[NAFLD] signature.
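This record does not say how the per-gene effect sizes (ES) are defined. A common choice in multicohort expression analysis is Hedges' g, the standardized mean difference between class 1 and class 0 with a small-sample correction. The sketch below assumes that definition, and the expression values are invented for illustration.

```python
from math import sqrt
from statistics import fmean, variance


def hedges_g(class1, class0):
    """Hedges' g for one gene in one dataset (assumed ES definition).

    class1 / class0: expression values for the two comparison groups.
    Positive g means higher expression in class 1.
    """
    n1, n0 = len(class1), len(class0)
    if n1 < 2 or n0 < 2:
        raise ValueError("each class needs at least two samples")
    # Pooled standard deviation from the two sample variances.
    pooled_sd = sqrt(
        ((n1 - 1) * variance(class1) + (n0 - 1) * variance(class0))
        / (n1 + n0 - 2)
    )
    cohens_d = (fmean(class1) - fmean(class0)) / pooled_sd
    # Small-sample bias correction: J = 1 - 3 / (4 * df - 1), df = n1 + n0 - 2.
    correction = 1 - 3 / (4 * (n1 + n0) - 9)
    return correction * cohens_d


if __name__ == "__main__":
    # Hypothetical log-expression values for one gene.
    print(round(hedges_g([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]), 3))
```

A gene with a positive value would fall in the "up" group of a signature and a negative value in the "down" group. Pooling such values across datasets is a separate meta-analysis step not shown here.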
Figure 3. A parsimonious set of 19 genes retains the full performance of the [NASH]vs[NAFLD + HC] signature. (A) Performance of the 19-gene signature in discovery studies. (B) Performance of the 19-gene signature in validation studies.
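The abstract attributes the 19-gene set to a forward search over the 130-gene signature. The general idea can be sketched as a greedy loop that repeatedly adds whichever gene most improves AUROC and stops when nothing improves it further. The following is a hedged illustration, not the authors' procedure: the scoring rule (mean of up genes minus mean of down genes), the stopping threshold `min_gain`, and the toy data are all assumptions.

```python
from statistics import fmean


def auroc(scores, labels):
    """Probability that a class 1 score exceeds a class 0 score (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def signature_score(sample, up, down):
    """Assumed score: mean of up genes minus mean of down genes."""
    up_part = fmean(sample[g] for g in up) if up else 0.0
    down_part = fmean(sample[g] for g in down) if down else 0.0
    return up_part - down_part


def forward_search(samples, labels, up_genes, down_genes, min_gain=0.0):
    """Greedily add the gene that most increases AUROC until gain <= min_gain."""
    chosen_up, chosen_down = [], []
    best = 0.5  # an empty signature scores every sample equally
    candidates = [(g, "up") for g in up_genes] + [(g, "down") for g in down_genes]
    while candidates:
        trials = []
        for gene, direction in candidates:
            up = chosen_up + [gene] if direction == "up" else chosen_up
            down = chosen_down + [gene] if direction == "down" else chosen_down
            scores = [signature_score(s, up, down) for s in samples]
            trials.append((auroc(scores, labels), gene, direction))
        top_auc, gene, direction = max(trials)
        if top_auc - best <= min_gain:
            break
        (chosen_up if direction == "up" else chosen_down).append(gene)
        candidates.remove((gene, direction))
        best = top_auc
    return chosen_up, chosen_down, best


if __name__ == "__main__":
    # Toy data: gene "A" separates the classes; "B" and "C" are noise.
    samples = [
        {"A": 5, "B": 1, "C": 2},
        {"A": 4, "B": 3, "C": 1},
        {"A": 1, "B": 2, "C": 2},
        {"A": 2, "B": 1, "C": 1},
    ]
    labels = [1, 1, 0, 0]
    print(forward_search(samples, labels, ["A", "B"], ["C"]))
```

Because the loop optimizes AUROC on the data it searches over, discovery performance tends to be optimistic. That is one reason a held-out validation set, as used in the study, matters when reading the reported discovery and validation AUROCs.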