Stanislav Listopad, Christophe Magnan, Aliya Asghar, Andrew Stolz, John A Tayek, Zhang-Xu Liu, Timothy R Morgan, Trina M Norden-Krichmar.
Abstract
Background & Aims: Liver disease carries a significant healthcare burden and frequently requires a combination of blood tests, imaging, and invasive liver biopsy to diagnose. Distinguishing between inflammatory liver diseases, which may have similar clinical presentations, is particularly challenging. In this study, we implemented a machine learning pipeline for the identification of diagnostic gene expression biomarkers across several alcohol-associated and non-alcohol-associated liver diseases, using either liver tissue or blood-based samples.
Keywords: AC, alcohol-associated cirrhosis; AH, alcohol-associated hepatitis; AKR1B10, aldo-keto reductase family 1 member B10; BTM, blood transcription module; Classification; DE, differential expression; FPKM, fragments per kilobase of exon model per million reads mapped; GSEA, gene set-enrichment analysis; IG, information gain; IPA, Ingenuity Pathway Analysis; LR, logistic regression; LTCDS, liver tissue cell distribution system; LV, liver tissue; ML, machine learning; MMP, matrix metalloproteases; NAFLD, non-alcohol-associated fatty liver disease; PBMCs, peripheral blood mononuclear cells; RNA sequencing; RNA-seq, RNA sequencing; SCAHC, Southern California Alcoholic Hepatitis Consortium; SVM, support vector machine; TNF, tumor necrosis factor; alcohol-associated liver disease; biomarker discovery; kNN, k-nearest neighbors
Year: 2022 PMID: 36119721 PMCID: PMC9472076 DOI: 10.1016/j.jhepr.2022.100560
Source DB: PubMed Journal: JHEP Rep ISSN: 2589-5559
Study population demographics (PBMCs).
| PBMC samples | AH (n = 38) | CT (n = 20) | AC (n = 40) | NF (n = 20) | HP (n = 19) |
|---|---|---|---|---|---|
| Age, mean ± SD | 47.3 ± 11.5 | 35.9 ± 15.6 | 54.5 ± 9.7 | 52.2 ± 14.9 | 58.9 ± 7.4 |
| MELD, mean ± SD | 25 ± 3.8 | 7.3 ± 2.6 | 13.4 ± 5.8 | 8.9 ± 4 | 8.9 ± 2.8 |
| Maddrey’s DF, mean ± SD | 52.6 ± 20.7 | 2.4 ± 8.1 | 21.1 ± 19.1 | 7.7 ± 14.1 | 6.7 ± 7.1 |
| BMI, mean ± SD | 30 ± 6.2 | 27 ± 3.5 | 30.4 ± 5.1 | 36.5 ± 6 | 29.6 ± 5.9 |
| Sex, n (%) | | | | | |
| Female | 1 (2.6%) | 8 (40.0%) | 0 (0.0%) | 4 (20.0%) | 8 (42.1%) |
| Male | 37 (97.4%) | 12 (60.0%) | 40 (100.0%) | 16 (80.0%) | 11 (57.9%) |
| Ethnicity, n (%) | | | | | |
| Hispanic | 25 (65.8%) | 8 (40.0%) | 25 (62.5%) | 9 (45.0%) | 10 (52.6%) |
| NHW | 10 (26.3%) | 0 (0.0%) | 13 (32.5%) | 7 (35.0%) | 4 (21.1%) |
| Black | 2 (5.3%) | 2 (10.0%) | 1 (2.5%) | 2 (10.0%) | 5 (26.3%) |
| Other | 1 (2.6%) | 10 (50.0%) | 1 (2.5%) | 2 (10.0%) | 0 (0.0%) |
| Source | SCAHC | SCAHC | SCAHC | SCAHC | SCAHC |
AC, alcohol-associated cirrhosis; AH, alcohol-associated hepatitis; CT, healthy controls; DF, discriminant function; HP, HCV infection; MELD, model for end-stage liver disease; NF, non-alcoholic fatty liver disease; NHW, non-Hispanic White; SCAHC, Southern California Alcoholic Hepatitis Consortium.
Study population demographics (Liver).
| Liver tissue samples | AH (n = 32) | CT (n = 8) | AC (n = 8) | NF (n = 10) | HP (n = 9) |
|---|---|---|---|---|---|
| Age, mean ± SD | 43.3 ± 11.3 | 55.4 ± 4.3 | 54.2 ± 6.9 | 56.8 ± 11.6 | 56.8 ± 7.6 |
| MELD, mean ± SD | 25.1 ± 5.7 | NA | NA | 28 ± 5.9 | 27.2 ± 7.5 |
| Maddrey’s DF, mean ± SD | 52.3 ± 22.1 | NA | NA | NA | NA |
| BMI, mean ± SD | 29.4 ± 5.9 | NA | NA | NA | NA |
| Sex, n (%) | | | | | |
| Female | 3 (9.4%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Male | 29 (90.6%) | 7 (87.5%) | 5 (62.5%) | 10 (100.0%) | 9 (100.0%) |
| Ethnicity, n (%) | | | | | |
| Hispanic | 25 (78.1%) | NA | 0 (0.0%) | 0 (0.0%) | 1 (11.1%) |
| NHW | 5 (15.6%) | NA | 4 (50.0%) | 7 (70.0%) | 5 (55.5%) |
| Black | 1 (3.1%) | NA | 0 (0.0%) | 1 (10.0%) | 2 (22.2%) |
| Other | 1 (3.1%) | NA | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Source | SCAHC | LTCDS | LTCDS | LTCDS | LTCDS |
The ethnicity and sex percentages may not add up to 100% due to missing data.
AC, alcohol-associated cirrhosis; AH, alcohol-associated hepatitis; CT, healthy controls; DF, discriminant function; HP, HCV infection; LTCDS, liver tissue cell distribution system; MELD, model for end-stage liver disease; NF, non-alcoholic fatty liver disease; NHW, non-Hispanic White; SCAHC, Southern California Alcoholic Hepatitis Consortium.
Missing age for 3 AC participants, MELD for 2 NF participants, and MELD for 4 HP participants.
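The note on percentages not summing to 100% can be checked arithmetically. The sketch below assumes that each reported percentage uses the full group size as its denominator, which is consistent with the table values (for example, 7 of 8 CT liver samples gives 87.5%). The function and variable names are illustrative and do not come from the study.

```python
def percent_of_group(count, group_size):
    """Percentage of the whole group, rounded to one decimal place."""
    return round(100.0 * count / group_size, 1)

# Counts copied from the liver tissue table (CT and AC columns).
liver_sex = {
    "CT": {"n": 8, "Female": 0, "Male": 7},
    "AC": {"n": 8, "Female": 0, "Male": 5},
}

report = {}
for group, row in liver_sex.items():
    recorded = row["Female"] + row["Male"]
    report[group] = {
        "male_pct": percent_of_group(row["Male"], row["n"]),
        # Assumption: participants not counted in either row are the missing data.
        "unrecorded": row["n"] - recorded,
    }

print(report)
```

Under that assumption, the gap between the sex rows and the group size accounts for the percentages that fall short of 100%.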
Fig. 1. Diagram outlining the flow of processes in the machine learning feature selection and classification pipeline.
ML, machine learning; RNA-seq, RNA sequencing.
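As a rough illustration of what a filter feature selection and classification pipeline of this kind might look like, the sketch below ranks genes by information gain (IG) and classifies samples with k-nearest neighbors (kNN), two of the methods named in the keywords. It is not the authors' implementation: the median binarization, the leave-one-out evaluation, all function names, and the toy expression values are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """IG of one gene after binarizing its expression at the median (assumed)."""
    cut = sorted(values)[len(values) // 2]
    high = [y for v, y in zip(values, labels) if v >= cut]
    low = [y for v, y in zip(values, labels) if v < cut]
    cond = sum(len(p) / len(labels) * entropy(p) for p in (high, low) if p)
    return entropy(labels) - cond

def top_genes(X, y, k):
    """Filter selection: rank gene indices by IG and keep the k best."""
    n_genes = len(X[0])
    scores = [information_gain([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]

def knn_predict(train_X, train_y, x, genes, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dists = sorted(
        (math.dist([r[g] for g in genes], [x[g] for g in genes]), label)
        for r, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def leave_one_out_accuracy(X, y, n_genes=1, k=3):
    """Select genes and classify inside each fold so the held-out sample is unseen."""
    hits = 0
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(tr_X, tr_y, n_genes)
        hits += knn_predict(tr_X, tr_y, X[i], genes, k) == y[i]
    return hits / len(X)

# Toy data (made up): gene 0 separates the classes, gene 1 is noise.
X = [[9.0, 1.0], [8.5, 5.0], [9.2, 3.0], [1.0, 4.0], [1.5, 1.2], [0.8, 5.1]]
y = ["AH", "AH", "AH", "CT", "CT", "CT"]

print(top_genes(X, y, 1), leave_one_out_accuracy(X, y))
```

Running the selection inside each fold keeps the held-out sample from influencing which genes are chosen, a common precaution when the number of genes far exceeds the number of samples.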
Fig. 2. Confusion matrices and RNA-seq count heatmaps corresponding to the best gene set of LV 2-Way dataset.
(A) Confusion matrix for classification of LV 2-Way dataset using best gene set. The diagonal contains the number and percentage of the correctly predicted samples. (B) Heatmap of best LV 2-Way gene set averaged per condition. (C) Per replicate heatmap of best LV 2-Way gene set. (D) Confusion matrix for classification of AH and CT samples within validation dataset. (E) Heatmap of best gene set within validation dataset averaged per condition. (F) Per replicate heatmap of best gene set within validation dataset. AH, alcohol-associated hepatitis; CT, healthy controls; LV, liver tissue; RNA-seq, RNA sequencing.
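To make the structure of such a confusion matrix concrete, the following minimal sketch tabulates true versus predicted labels and reports the diagonal as a count and a percentage. The caption does not state how the percentage is normalized; the sketch assumes it is relative to the number of samples in each true class. The labels are made up for illustration and are not results from the study.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

def diagonal_summary(matrix, classes):
    """Correct count and row-normalized percentage for each true class."""
    summary = {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])
        correct = matrix[i][i]
        pct = 100.0 * correct / row_total if row_total else 0.0
        summary[name] = (correct, round(pct, 1))
    return summary

# Made-up labels for illustration only.
classes = ["AH", "CT"]
truth = ["AH", "AH", "AH", "AH", "CT", "CT"]
preds = ["AH", "AH", "AH", "CT", "CT", "CT"]

cm = confusion_matrix(truth, preds, classes)
summary = diagonal_summary(cm, classes)
print(cm, summary)
```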
Fig. 3. Confusion matrices and RNA-seq count heatmap corresponding to the best gene set of LV 3-Way dataset.
(A) Confusion matrix for classification of LV 3-Way dataset using best gene set identified by filter feature selection. (B) RNA-seq count heatmap of best LV 3-Way gene set averaged per condition. (C) Confusion matrix for classification of AH, AC, and CT samples within independent validation dataset. (D) RNA-seq count heatmap of best gene set within independent validation dataset (AH, AC, and CT) averaged per condition. AC, alcohol-associated cirrhosis; AH, alcohol-associated hepatitis; CT, healthy controls; LV, liver tissue; RNA-seq, RNA sequencing.
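The "averaged per condition" heatmaps imply collapsing replicate-level values into one value per gene and condition. A minimal sketch of that step follows, assuming a plain arithmetic mean over replicates; the excerpt does not say whether counts were transformed or scaled first. The gene names and values are placeholders, not data from the study.

```python
from collections import defaultdict
from statistics import mean

def average_per_condition(samples):
    """samples: list of (condition, {gene: value}) -> {condition: {gene: mean}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for condition, expression in samples:
        for gene, value in expression.items():
            grouped[condition][gene].append(value)
    return {
        condition: {gene: mean(values) for gene, values in genes.items()}
        for condition, genes in grouped.items()
    }

# Placeholder replicates (hypothetical genes and values).
samples = [
    ("AH", {"GENE_A": 10.0, "GENE_B": 2.0}),
    ("AH", {"GENE_A": 14.0, "GENE_B": 4.0}),
    ("CT", {"GENE_A": 1.0, "GENE_B": 3.0}),
]

averaged = average_per_condition(samples)
print(averaged)
```

Each inner dictionary corresponds to one column of a per-condition heatmap, while the unaveraged `samples` list corresponds to the per-replicate view.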
Fig. 4. Confusion matrices and RNA-seq count heatmaps corresponding to the best gene set of LV 5-Way dataset.
(A) Confusion matrix for classification of LV 5-Way dataset using best gene set identified by filter feature selection. (B) RNA-seq count heatmap of best LV 5-Way gene set averaged per condition. (C) Confusion matrix for classification of AH, AC, and CT samples within independent validation dataset. (D) RNA-seq count heatmap of best gene set within independent validation dataset (AH, AC, and CT) averaged per condition. AC, alcohol-associated cirrhosis; AH, alcohol-associated hepatitis; CT, healthy controls; HP, chronic HCV infection; LV, liver tissue; NF, non-alcohol-associated fatty liver disease; RNA-seq, RNA sequencing.
Fig. 5. Confusion matrices and RNA-seq count heatmaps corresponding to the best gene set of PBMC 5-Way dataset.
(A) Confusion matrix for classification of PBMC 5-Way dataset using best gene set identified by filter feature selection. (B) RNA-seq count heatmap of best PBMC 5-Way gene set averaged per condition. AC, alcohol-associated cirrhosis; AH, alcohol-associated hepatitis; CT, healthy controls; HP, chronic HCV infection; NF, non-alcohol-associated fatty liver disease; PBMC, peripheral blood mononuclear cells; RNA-seq, RNA sequencing.