Literature DB >> 36033523

Screening of co-pathogenic genes of non-alcoholic fatty liver disease and hepatocellular carcinoma.

Ting Chen1, Siwen Zhang2, Dongmei Zhou1, Peipei Lu3, Xianglai Mo3, Rashi Tamrakar1, Xi Yang3,4,5.   

Abstract

Background: Non-alcoholic fatty liver disease (NAFLD) is a risk factor for hepatocellular carcinoma (HCC). However, its carcinogenic mechanism is still unclear, looking for both diseases' transcriptome levels, the same changes as we are looking for NAFLD may provide a potential mechanism of action of HCC. Thus, our study aimed to discover the coexisting pathogenic genes of NAFLD and HCC.
Methods: We performed a variance analysis with public data for both diseases. At the same time, weighted gene correlation network analysis (WGCNA) was used to find highly correlated gene modules in both diseases. The darkturquoise gene module was found to be highly correlated with both diseases. Based on the diagnosis related module genes and the differential genes of the two diseases, we constructed diagnostic and prognostic models by logistic regression, univariate Cox regression, and LASSO regression. Public datasets verified the results. Meanwhile, we built a competing endogenous RNA (ceRNA) network based on the model genes and explored the related pathways and immune correlation involved in the two diseases by using Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, and gene set enrichment analyses. Immunohistochemistry was used to verify the different expression of ABCC5 and TUBG1 among the normal liver, NAFLD, and HCC tissues. Sodium palmitate/sodium oleate was used to establish high-fat cell models, and Real Time Quantitative Polymerase Chain Reaction (RT-qPCR) was used to verify the messenger RNA (mRNA) expression of ABCC5 in lipidization cells.
Results: A total of 26 upregulated genes and 87 downregulated genes were found using limma package identification analysis. According to WGCNA, the darkturquoise gene module was highly correlated with the prognosis of both diseases. The coexisting genes acquired by the two groups were only three central genes, that is, ABCC5, DHODH and TUBG1. The results indicated that the diagnostic and prognostic models constructed by ABCC5 and TUBG1 genes had high accuracy in both diseases. The results of immunohistochemistry showed that ABCC5 and TUBG1 were significantly overexpressed in NAFLD and HCC tissues compared with normal liver tissues. The Oil Red O staining and triglyceride identified the successful construction of HepG2 and LO2 high-fat models using PA/OA. The results of RT-qPCR showed that the lipidization of LO2 and HepG2 increased the mRNA expression of ABCC5. Conclusions: The gene model constructed by ABCC5 and TUBG1 has high sensibility and veracity in the diagnosis of NAFLD as well as the diagnosis and prognosis of HCC. ABCC5 and TUBG1 may play an important role in the development of NAFLD to HCC. In addition, lipidization could upregulate the mRNA expression of ABCC5 in HCC.
Copyright © 2022 Chen, Zhang, Zhou, Lu, Mo, Tamrakar and Yang.

Entities:  

Keywords:  ABCC5; TUBG1; bioinformatics; hepatocellular carcinoma; nonalcoholic fatty liver disease

Year:  2022        PMID: 36033523      PMCID: PMC9410624          DOI: 10.3389/fonc.2022.911808

Source DB:  PubMed          Journal:  Front Oncol        ISSN: 2234-943X            Impact factor:   5.738


Introduction

Cancers like hepatocellular carcinoma (HCC) have one of the highest mortalities in the world. Pathological metabolic diseases are the primary causes of HCC, such as hepatitis C or B and alcoholic and non-alcoholic fatty liver disease (NAFLD) (1). Due to the rapid increase in obesity and NAFLD patients, the prevalence of NAFLD-related HCC has increased accordingly (2). As the harmful effects of NAFLD can ultimately cause cirrhosis and even cancer, the prevalence of HCC has also started to rise suddenly. Based on some research findings (3), NAFLD can act synergistically with other HCC risk factors to accelerate cancer development. Over the past decade, an analysis of US health insurance databases found (4), according to the report, that 59% of HCC patients enrolled in medical plans are believed to have NAFLD as their primary cause. In a meta-analysis by Younossi et al. (5), all studies from 1989 to 2015 involving NAFLD were screened and estimated to determine point estimates [95% confidence intervals (CIs)] of HCC incidence, mortality, and incidence rates using a random-effects model. In NAFLD patients, HCC incidence was found to be 0.44/1,000 person-years (range: 0.29–0.66). A retrospective cohort study (6) evaluated the cancer risk in patients with NAFLD; it is also required to follow up the successive prevalence of HCC. Tumor biomarkers can assist medical researchers with early screening, diagnosis, treatment evaluation, recurrence, and prognosis prediction of tumors. With the prevalence of HCC and the estimated continued increase in this tumor (7), it is suggested that early monitoring and screening of tumor markers can reduce the mortality rate. In fact, in most patients, even after a complete resection or ablation of HCC tumors, the remaining carcinogenic tissue will progress to incurable terminal disease (8), thus, early detection and prevention of HCC development are considered to be the most effective strategies worldwide for improving patient outcomes. Up to now, the exact pathogenesis of NAFLD transition to HCC has not been explained. Identifying new marker genes can be used to predict risk and target treatment for individual patients. Hence, the prediction of candidate genes based on NAFLD-HCC pathogenesis is also an option. Based on bioinformatics analysis, earlier studies have established that telomerase reverse transcriptase (hTERT) (9), hypoxia-inducible transcription factor-2a (HIF-2α) (10), S100 calcium-binding protein A11(S100A11) (11), squalene epoxidase (SQLE) (12), and deoxycytidine kinase (DCK) (13) were associated with NAFLD to HCC progression and prognosis and could act as a related marker. Therefore, in an era of precision medicine, the diagnostic markers of NAFLD progressing to HCC need to be explored further. In the human genome, the ATP-binding cassette (ABC) transporter is one of the largest protein families. More than 48 genes encoding human ABC transporters have been identified and sequenced, and the largest member of the family is multidrug resistance-associated protein (MRP), including MRP1 (ABCC1), MRP5 (ABCC5), and MRP6 (ABCC6) (14). In addition to its role in chemotherapy-resistant tumors, as one of the major classes of membrane ATPase, it is also implicated in the treatment failure of HCC with a multidrug-resistant phenotype (15). Studies in NAFLD have shown that several ABC transporter family members, such as ABCA1 and ABCC5, are closely related to the progression of NAFLD (16). γ-Tubulin (TUBG1) is a member of the human tubulin family. There are five known tubulin subtypes in the human tubulin family based on the c-terminal region with the greatest variation: α-tubulin, β-tubulin, γ-tubulin, δ-tubulin, and ϵ-tubulin. Complex cortical malformations related to tubulin gene mutations, also known as tubulin diseases, are a heterogeneous group of diseases with a wide range of clinical severity (14). A family of GTP enzymes called tubulin is highly enriched in microtubules and centrosomes. This study aimed to identify candidate biomarkers for predicting NAFLD and HCC diagnoses by using the full public repository for Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases. We screened differentially expressed genes (DEGs) from public datasets of NAFLD and HCC and also screened overlapping genes between DEGs and the constructed weighted gene correlation network analysis (WGCNA); constructed NAFLD and HCC models, functional and pathway enrichment analyses, established ceRNA network, and protein–protein Interaction (PPI) network, respectively, and finally identified the hub gene. Therefore, this study hopes to provide new candidate diagnostic markers during the progression of NAFLD to HCC, offer the best screening strategy for high-risk patients, and advance the accuracy of HCC prevention.

Materials and methods

Data download and processing

Through the GEOquery package (17), NAFLD datasets GSE48452, GSE89632, and GSE37031 were downloaded from the GEO database. The GSE48452 dataset is from Homo sapiens, and the data platform is GPL11532, which has 73 samples in total, including 14 normal liver tissue control samples, 27 liver tissue samples from obese patients, and 32 NAFLD tissue samples (18).This study included 14 normal liver tissue control samples and 32 NAFLD tissue samples from the dataset. The GSE89632 dataset is from Homo sapiens, and the data platform is GPL14951, which contains 63 samples, including 24 normal liver tissue control samples and 39 NAFLD tissue samples. This study included all the samples (19). The GSE37031 dataset is from Homo sapiens, and the data platform is GPL14877, containing 15 samples, including 7 normal liver tissue control samples and 8 NAFLD tissue samples. In this study, all samples were included as a validation set (20). GSE48452 and GSE89632 datasets are combined, and batches are removed through the SVA package (21). The data were normalized and standardized through the limma package (22). Through the TCGAbiolinks package (23), the hepatoma dataset was downloaded from the TCGA database, TCGA-LIHC (Liver Hepatocellular Carcinoma, n=445 cases). The data type Count was selected and converted to the TPM format. HCC data were taken from the International Cancer Genome Consortium database, and TCGA-LIHC (n = 419 cases) was used as a validation set.

Model building and validation

For the construction of diagnostic models, the minimization of absolute contraction and selection operator LASSO regression are commonly used machine learning algorithms. In curve fitting, regularization is used to solve the overfitting and improve model accuracy. The glmnet package (24) was used to construct the model, and the parameter was set as: seed (2), family = “ binomial “. The prognostic model of HCC patients was developed using univariate and multivariate COX regression.

Variance analysis

The limma package was used for the gene difference analysis of GEO chip data in the normal group and disease group, and R package DESeq2 was used for the gene difference analysis of the normal group and disease group (25). LogFC >0.5 and the adjp value <0.01 were fixed as the threshold of differential genes. LogFC >0.5 was considered to be upregulated in the high-risk group, while logFC <0.5 was downregulated. The outcomes of difference analysis were represented by R package pheatmap heatmap and GGplot2 volcano map (26).

Analysis of enrichment (gene ontology/kyoto encyclopedia of genes and genomes/gene set enrichment analysis/GSVA/ssGSEA)

Gene Ontology (GO) is a common analysis for large-scale functional enrichment studies, such as biological processes (BPs), molecular functions (MFs), and cellular component (CC). The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a widely used database keeping information about genomes, biological pathways, diseases, and drugs. The Cluster Profiler R software package was used to analyze the GO annotation and KEGG pathway enrichment analysis of differential genes, and the critical value of False discovery rate(FDR) < 0.05 was considered statistically significant (27). The gene set enrichment analysis (GSEA) method analyzes whether a particular gene set is statistically different between two biological states and is commonly used to estimate changes in the pathway and biological process activities in expression dataset samples (27). For GSEA analysis, the “msigdb. v7.0. Symbols” gene set was downloaded from the MSigDB database. Furthermore, the enrichment fractions of related pathways in the MSigDB database were calculated according to the gene expression matrix of each sample using the Gene Set Variation Analysis (GSVA) method by R-packet (28). The limma package was used to screen the differences, and the relevant enrichment items with statistically significant differences were displayed by a heat map. Using the R-packet GSVA and single-sample Gene Set Enrichment Analysis (ssGSEA) methods combined with the 28 characteristic gene matrices of immune cells, we calculated the immune cell enrichment fraction of each sample.

Weighted gene correlation network analysis

The aim (29) is to identify coexisting gene modules, explore the relationship between gene networks and phenotypes, and study the core genes in the network. Soft threshold was calculated by pick soft threshold function, and 5 were considered to be the optimal soft threshold. Scale-free networks were subsequently constructed based on soft thresholds, topology matrices were created, and hierarchical clustering was performed. Considering 50 as the minimum number of genes in the module, the identification gene module was dynamically cut and eigengenes were calculated. Based on eigengenes, the correlation between modules was constructed, and hierarchical clustering was carried out. The modules with a correlation above 0.7 were merged again, and lastly, 13 modules were obtained. The correlation between modules and clinical features was understood through Pearson’s correlation analysis.

Construction of ceRNA network

CeRNA discloses the interaction mechanism among lncRNA, microRNA, and mRNA. The possible upstream miRNAs of TUBG1 and ABCC5 were searched through the mirTarbase database using multiMiR package (30). The StarBase database was used to download lncRNA–miRNA data (starBaseV3_hg19_CLIP-seq_LncRNA_all). Using pancancerNum >10 and clipExpNum>4 as criteria, lncRNAs are screened and intersected with those with significant differences in HCC data, and finally, four lncRNAs and 30 miRNAs are obtained. We completed the visual analysis using Cytoscape.

PPI network building (STRING)

The STRING database (https://cn.string-db.org/) searches known and predicted protein interactions across 2,031 species, comprising 9.6 million proteins and 1.38 million PPIs. It includes results obtained from experimental data, PubMed Abstracts in Chinese, and the synthesis of data from other databases, as well as results predicted using bioinformatics methods. We used the STRING database to construct the PPI network of genes with common differences between the two diseases and set the parameter as correlation coefficient 0.7. The results from PPI are exported from the STRING database and visualized through Cytoscape. Furthermore, the CytoHubba plug-in was used to analyze Hub genes in the PPI network.

Specimen collection

The study was approved by the Ethics Committee of the First Affiliated Hospital of Guangxi Medical University [NO.2022-KY-E-(115)]. NAFLD, HCC, and normal liver tissues were collected from tissue sections preserved in the Department of Pathology of our hospital. Informed consent was obtained.

Cell model construction and cell culture

HepG2 and LO2 were cultured with Dulbecco's modified eagle medium (DMEM) (Multicell) containing 1% penicillin–streptomycin and 10% fetal bovine serum (FBS; Gibco) in a moist cell incubator at 37°C and 5% CO2. High-fat LO2 cells and HepG2 cells (HF-LO2 group, HF-HepG2 group) were constructed by using sodium palmitate/sodium oleate (PA/OA) to simulate a high-free fatty acid environment. After the growth density of HepG2 and LO2 cells reached 70%–80% according to the manufacturer’s instructions, approximately 1 × 105 cells were inoculated in 6-well plates and incubated for 24 h. The NAFLD model was established by a complete culture medium containing 500-μM sodium oleate and 250-μM sodium palmitate for 24 and 48 h. A complete culture medium supplemented with solvent was used as the control group (Control-LO2 group, Control-HepG2 group). The lipid accumulation level was detected by oil red O staining and triglyceride (TG). The TG assay kit (Jiancheng Bioengineering Institute, Nanjing, China) was utilized for extracting the contents of TG using the GPO-PAP method. According to the instruction, the cell homogenate and triglyceride assay reagents were added to the 96-well plate, mixed, and incubated at 37°C for 10 min, and the optical density (OD value) of each well was measured with the microplate reader (546 nm). The calculation formula was as follows: TG content = (sample OD value − blank OD value)/(calibration sample OD value − blank OD value) * Calibration sample concentration/sample protein concentration.

RNA extraction and RT-QPCR

According to the manufacturer’s instructions, total RNA was isolated from cultured cells using a TRIzol reagent (TAKERA, Tokyo, Japan), and cDNA was obtained by reverse transcription using RT SuperMix for qPCR (Vazyme). The extracted cDNA was stored at -80°C. The target genes were amplified and detected using SYBR qPCR Master Mix (Vazyme, Nanjing, China) in the Applied Biosystems 7500 Fast real-time PCR system (ABI, Waltham, Massachusetts, USA). Primers for ABCC5 and Glyceraldehyde-3-Phosphate Dehydrogenase(GAPDH) are as follows: ABCC5-F (5’-ATC ATG GCT TGA GTG CTC TGA-3’) and ABCC5-R (5’-AGA CCA CAC GTC CAT TGA-3’); GAPDH-F (5’-AAT CAA GTG GGG CGA TGC TG-3’) and GAPDH-R (5’-GCA AAT GAG CCC CAG CCT TC-3’). The setting parameters were as follows: the denaturation temperature was 95°C for 15 min, the annealing temperature was 60°C for 1 min, and the extension was 40 cycles. The relative mRNA expression of ABCC5 was calculated by the 2-ΔΔCT method (GAPDH as a housekeeping gene).

Immunohistochemistry

Samples were prepared into paraffin sections, followed by dewaxing and hydration, and high-pressure and high-temperature antigen recovery in sodium citrate buffer (pH 6.0). Subsequently, it was incubated with endogenous peroxidase blockers and then incubated with a primary antibody for 12 h at 4°C. Sections were then rinsed with phosphate-buffered saline (PBS), followed by washing with an enzyme-coupled goat anti-rabbit/mouse IgG polymer for another 20 min and rinsed with PBS three times. Then, the sections were incubated with a DAB staining solution for 5 min. After washing with water, the paraffin sections were immersed in a hematoxylin- staining solution for 15 s and finally identified, rinsed, and returned to the laboratory for staining. The antibodies used included ABCC5 (Cat No. 19503-1-AP, 1:100; Proteintech, Wuhan, China) (31) and TUBG1 (Cat No. 15176-1-AP, 1:100; Proteintech, Wuhan, China) (32). Two pathologists observed the results. The number of positive cells was divided into five grades (0–4 score) according to the ratio (<5%, 5%–25%, 26%–50%, 51%–75%, 75%–100%), and the intensity of staining was divided into four grades (0–3 score). The group is obtained by multiplying the two fractions (from 0 to 12). A score of >4 was suggested to be positive.

Statistical analysis

We performed all data calculations and statistical analysis using R programming (https://www.r-project.org/, version 4.0.2). The statistical significance of the normally distributed variables was calculated using an independent Student’s t-test, and the differences between the non-normally distributed variables were analyzed using the Mann–Whitney U test (i.e., Wilcoxon rank-sum test). All statistical p-values were bilateral, and P <0.05 was considered statistically significant.

Results

Weighted gene correlation network analysis consensus module identification

NAFLD can ultimately cause HCC, but the mechanism by which hepatitis leads to HCC remains unclear. We assumed that the same gene modules expressed in both diseases might play a role in the copathogenesis of both diseases. NAFLD datasets GSE48452 and GSE89632 were downloaded from the GEO database and HCC dataset from the TCGA database. WGCNA analysis was used to identify gene modules that were expressed uniformly in both NAFLD and HCC. We removed large outliers with hierarchical clustering analysis ( ), scale-free networks and topological matrices were constructed by soft threshold ( ), and then topological overlapping matrices were scaled to make them comparable between two disease sets ( ). By extracting the minimum of parallel positions in two topological overlapping matrices [we now calculate the consensus topological overlap by taking the component-wise [“Parallel”)], the minimum of the topological matrices in individual sets acted as the consensus topological overlap matrix. Finally, we obtained 13 gene modules with similar expression in both diseases ( ). As clinical features, we explored modules related to the diagnosis of both diseases based on whether they were sick or not. The dark turquoise module had a high correlation and statistical significance with NAFLD and HCC among the 13 modules ( ). Then, we performed GO and KEGG function analysis on 58 genes in the dark turquoise module, and the results indicated that these genes were primarily involved in DNA-related enzyme activity, DNA shear, and the repair process ( ).
Figure 1

WGCNA. (A) The datasets of non-alcoholic fatty disease (NAFLD) and hepatocellular carcinoma (HCC) were based on the Euclidean distance clustering tree of samples. The red line of the dataset of NAFLD was the tangent line of outlier detection. Samples cut by this line are considered outliers. The tangents of the corresponding liver cancer samples are outside the tree, indicating that no samples from the HCC set are considered outliers. (B) Soft threshold power (x-axis) selection. NAFLD is approximately scale-free when the soft threshold is 7, while HCC is implemented at 20. With the increase of the soft threshold power, the comprehensive connectivity measure decreases sharply. Finally, we choose 7 as the soft threshold to construct the scale-free network. (C) Quantile–quantile plots of the dataset topological matrices (TOMs) for NAFLD and HCC. The black dots are TOMs before scaling, and the red dots are TOMs after scaling. After scaling, the two TOMs are comparable. (D) The gene tree was obtained by similarity clustering based on the overlap of consensus topology. As shown in the figure, there were 13 consensus gene modules in the two diseases, among which gray indicated unmatched genes.

Figure 2

Identification of diagnostic-related modules. (A) Correlation analysis was used to find the correlation between 13 gene modules in NAFLD and whether they were sick or not. Red indicated positive correlation, blue indicated negative correlation, and P<0.05 was considered statistically significant. (B) Correlation analysis was used to find the correlation between 13 gene modules in HCC and whether they were sick or not. Red indicated positive correlation, blue indicated negative correlation, and P<0.05 was considered statistically significant. (C) The module dark turquoise is associated with the diagnosis of both diseases. (D) Gene Ontology (GO) function analysis of the dark turquoise module gene. (E) Kyoto Encyclopedia of Genes and Genomes (KEGG) function analysis of dark turquoise module gene.

WGCNA. (A) The datasets of non-alcoholic fatty disease (NAFLD) and hepatocellular carcinoma (HCC) were based on the Euclidean distance clustering tree of samples. The red line of the dataset of NAFLD was the tangent line of outlier detection. Samples cut by this line are considered outliers. The tangents of the corresponding liver cancer samples are outside the tree, indicating that no samples from the HCC set are considered outliers. (B) Soft threshold power (x-axis) selection. NAFLD is approximately scale-free when the soft threshold is 7, while HCC is implemented at 20. With the increase of the soft threshold power, the comprehensive connectivity measure decreases sharply. Finally, we choose 7 as the soft threshold to construct the scale-free network. (C) Quantile–quantile plots of the dataset topological matrices (TOMs) for NAFLD and HCC. The black dots are TOMs before scaling, and the red dots are TOMs after scaling. After scaling, the two TOMs are comparable. (D) The gene tree was obtained by similarity clustering based on the overlap of consensus topology. As shown in the figure, there were 13 consensus gene modules in the two diseases, among which gray indicated unmatched genes. Identification of diagnostic-related modules. (A) Correlation analysis was used to find the correlation between 13 gene modules in NAFLD and whether they were sick or not. Red indicated positive correlation, blue indicated negative correlation, and P<0.05 was considered statistically significant. (B) Correlation analysis was used to find the correlation between 13 gene modules in HCC and whether they were sick or not. Red indicated positive correlation, blue indicated negative correlation, and P<0.05 was considered statistically significant. (C) The module dark turquoise is associated with the diagnosis of both diseases. (D) Gene Ontology (GO) function analysis of the dark turquoise module gene. (E) Kyoto Encyclopedia of Genes and Genomes (KEGG) function analysis of dark turquoise module gene.

Differences in gene

We performed difference analysis on the two datasets to understand the correlation of transcriptional level changes in the two diseases. Using principal component analysis (PCA), we found some differences in expression profiles between the NAFLD group and the normal group. We used the limma package for difference analysis to obtain DEGs in the two data groups, including 181 upregulated genes and 232 downregulated genes, which were visualized by the volcanic diagram and heat maps ( ). Significant differences were found between HCC and the control group gene expression profiles. We used the Deseq2 package to analyze the differences of genes in different groups, comprising 2,490 upregulated genes and 1,077 downregulated genes, which were also used and visualized by the volcano map and heat map ( ).
Figure 3

Variance analysis. (A) Volcanic diagram shows the differentially expressed genes (DEGs) of the NAFLD gene dataset, the horizontal axis is logFC value, the y-axis is -log10 (adjusted p-value), and the dotted line in the figure representslogFC> 0.5, adj-P < 0.05). Red represents upregulated genes, and blue represents downregulated genes. The first five gene labels are displayed in the adjusted p-value arrangement. (B) PCA principal component analysis of the NAFLD dataset: light green represents the samples of NAFLD, and red represents the samples of normal liver tissue (normal). (C) Heat map of differential genes. Red represents the sample of NAFLD, and dark blue represents the sample of normal liver tissue (normal). The top 20 genes are, respectively, visualized in adjusted p-value arrangement (yellow indicates high expression, and blue indicates low expression). (D) Volcanic diagram shows the DEGs of the HCC gene dataset; the horizontal axis is logFC value; the y-axis is -log10 (adjusted p-value), and the dotted line in the figure representslogFC> 1, adj. P < 0.05). Red represents upregulated genes, and blue represents downregulated genes. The first five gene labels are displayed in the adjusted p-value arrangement. (E) Principal component analysis (PCA) of HCC datasets: light green represents HCC tissue samples (cancer), and red represents normal liver tissue samples (normal). (F) Heat map of differential genes. Red represents HCC tissue samples (cancer), and dark blue represents normal liver tissue samples (normal). The top 20 genes are, respectively, visualized by adjusted p-value arrangement (red indicates high expression, and dark green indicates low expression).

Variance analysis. (A) Volcanic diagram shows the differentially expressed genes (DEGs) of the NAFLD gene dataset, the horizontal axis is logFC value, the y-axis is -log10 (adjusted p-value), and the dotted line in the figure representslogFC> 0.5, adj-P < 0.05). Red represents upregulated genes, and blue represents downregulated genes. The first five gene labels are displayed in the adjusted p-value arrangement. (B) PCA principal component analysis of the NAFLD dataset: light green represents the samples of NAFLD, and red represents the samples of normal liver tissue (normal). (C) Heat map of differential genes. Red represents the sample of NAFLD, and dark blue represents the sample of normal liver tissue (normal). The top 20 genes are, respectively, visualized in adjusted p-value arrangement (yellow indicates high expression, and blue indicates low expression). (D) Volcanic diagram shows the DEGs of the HCC gene dataset; the horizontal axis is logFC value; the y-axis is -log10 (adjusted p-value), and the dotted line in the figure representslogFC> 1, adj. P < 0.05). Red represents upregulated genes, and blue represents downregulated genes. The first five gene labels are displayed in the adjusted p-value arrangement. (E) Principal component analysis (PCA) of HCC datasets: light green represents HCC tissue samples (cancer), and red represents normal liver tissue samples (normal). (F) Heat map of differential genes. Red represents HCC tissue samples (cancer), and dark blue represents normal liver tissue samples (normal). The top 20 genes are, respectively, visualized by adjusted p-value arrangement (red indicates high expression, and dark green indicates low expression). Subsequently, we searched for both upregulated and downregulated genes in the two diseases and demonstrated them through the Venn diagram. The results showed that 27 genes were upregulated in the two diseases compared with the control group, while 83 genes were downregulated in the two diseases compared with the control group ( ). We performed the GO function analysis of genes to further understand the biological function of these genes with same trend. The results indicated that these genes were mainly involved in cell chemotaxis, neuroresponse, response to lipopolysaccharide, response to the molecule of bacterial origin, response to steroid hormone; a variety of lipid-related functional items are involved, and then we used the mesh diagram to visualize the main items ( ; ). Then, we used GSEA enrichment analysis to enrich the pathway-related dataset in MSigDb based on the above gene list. The results indicated that it was associated with multiple immune, HCC, and cancer-related pathways ( ; ). Consequently, we constructed the PPI network through STRING (correlation coefficient 0.7) and visualized it using Cytoscape ( ). The top three genes interacting with other proteins were EGR1, FOSB, and CCL2.
Figure 4

Common differential genes. (A) Venn diagram shows the coupregulated and codownregulated genes in the two diseases. (B) Network diagram showing the GO function analysis of common differential genes. (C) Gene set enrichment analysis (GSEA) of common differential genes: the logFC value was used to arrange genes, and the GSEA algorithm was used to calculate the first two enrichment items and the last two enrichment items in descending order. (D) GSEA analysis: the top 10 enrichment pathway were visualized in the bubble diagram, arranged with the descending order of the absolute value of enrichment fraction (NES). (E) PPI network construction, from red to blue; continuous color change represents the logFC value from high to low.

Common differential genes. (A) Venn diagram shows the coupregulated and codownregulated genes in the two diseases. (B) Network diagram showing the GO function analysis of common differential genes. (C) Gene set enrichment analysis (GSEA) of common differential genes: the logFC value was used to arrange genes, and the GSEA algorithm was used to calculate the first two enrichment items and the last two enrichment items in descending order. (D) GSEA analysis: the top 10 enrichment pathway were visualized in the bubble diagram, arranged with the descending order of the absolute value of enrichment fraction (NES). (E) PPI network construction, from red to blue; continuous color change represents the logFC value from high to low.

To construct a diagnostic model of non-alcoholic fatty liver disease

The consensus module genes related to the diagnosis of the two diseases obtained by WGCNA were intersected with the different genes (DEG) with the same change trend in the two diseases to find efficient diagnostic and therapeutic markers ( ), suggesting three coexisting genes (ABCC5, DHODH, and TUBG1). After univariate and multivariate logistic regression analyses, ABCC5 and TUBG1 genes were found to be independent risk factors for the diagnosis of NAFLD and HCC (OR>1; P <0.05), while DHODH was an independent protective factor in HCC (OR<1; P < 0, 05) ( , ). These findings propose that ABCC5 and TUBG1 may have certain roles in the occurrence and development of the two diseases. We visualized the expression of the two genes by a heat map in the group of NAFLD ( ). The above three genes were reverified using minimal absolute contraction and selection operator LASSO regression. The results were in line with multivariate logistic regression. Two genes, ABCC5 and TUBG1, were involved in constructing a diagnostic model of NAFLD ( ). Its correlation coefficient was used to calculate the risk values of all samples, and a box plot was used to visualize the difference of risk values between the NAFLD group and the normal group, indicating that the risk values of NAFLD were significantly higher ( ). According to the ROC curve results, the diagnostic model constructed by ABCC5 and TUBG1 genes had very good diagnostic efficiency (AUC = 0.893) ( ). As further validation of our results, we downloaded the NAFLD dataset GSE37031 through the GEO database. The results showed that the risk value of the NAFLD group was significantly higher, and the TUBG1 expression level was also higher in the NAFLD group (P<0.01) ( ). While the box plot indicated that the ABCC5 expression level was high in the NAFLD group (P = 0.064), the p-value was insignificant as well as the AUC value of the ROC curve was 1, which may result from the small sample size. ( , ). Results from these studies suggest that the high expression of ABCC5 and TUBG1 is an independent risk factor for NAFLD and can be used as a diagnostic indicator.
Figure 5

Establishment of a diagnostic model of NAFLD. (A) Venn diagram shows the coexisting genes (ABCC5, DHODH, TUBG1) of the diagnosis-related gene module darkturquoise in WGCNA and common differential genes. (B) Visualizing the expression profile of ABCC5 and TUBG1 in NAFLD by a heat map. (C) LASSO model tuning parameters were selected through 10-fold cross-validation. (D) LASSO coefficient spectra of diagnosis-relevant gene sets with dotted lines representing values selected by 10 cross-validations. A diagnostic model containing two genes (ABCC5 and TUBG1) was obtained. (E) Box plot presenting the difference in the risk score between the NAFLD group and the normal group. (F) ROC curve confirms that the diagnostic model has high diagnostic efficiency (AUC = 0.893). (G) Based on the new dataset (GSE37031), the risk score was calculated according to the model; the risk value in the NAFLD group was significantly higher than that in the normal group. (H) The expression level of TUBG1 in the GSE37031 dataset visualized by box plot in the NAFLD group and normal liver tissue sample group (P<0.05 is considered statistically significant). (I) Box plot visualization of ABCC5 expression in the GSE37031 dataset between the NAFLD group and the normal group.

Table 1

Univariate and multivariate logistic regression (non-alcoholic fatty liver disease).

Univariate analysisMultivariate analysis
OR (95% CI)P valueOR (95% CI)P-value
ABCC518.133 (5.445–60.391)<0.0017.188 (1.900–27.196)0.004
DHODH0.195 (0.090–0.426)<0.001NA
TUBG163.821 (13.06–311.663)<0.00132.515 (5.976–176.904)0
Table 2

Univariate and multivariate logistic regression (The Cancer Genome Atlas).

Univariate analysisMultivariate analysis
OR (95% CI)P-valueOR (95% CI)P-value
ABCC540.238 (15.785–102.571)<0.0015.514 (1.884–16.135)0.002
DHODH0.245 (0.167–0.358)<0.0010.429 (0.247– 0.744)0.003
TUBG144.468 (16.669–118.632)<0.00113.254 (4.369–40.214)0
Establishment of a diagnostic model of NAFLD. (A) Venn diagram shows the coexisting genes (ABCC5, DHODH, TUBG1) of the diagnosis-related gene module darkturquoise in WGCNA and common differential genes. (B) Visualizing the expression profile of ABCC5 and TUBG1 in NAFLD by a heat map. (C) LASSO model tuning parameters were selected through 10-fold cross-validation. (D) LASSO coefficient spectra of diagnosis-relevant gene sets with dotted lines representing values selected by 10 cross-validations. A diagnostic model containing two genes (ABCC5 and TUBG1) was obtained. (E) Box plot presenting the difference in the risk score between the NAFLD group and the normal group. (F) ROC curve confirms that the diagnostic model has high diagnostic efficiency (AUC = 0.893). (G) Based on the new dataset (GSE37031), the risk score was calculated according to the model; the risk value in the NAFLD group was significantly higher than that in the normal group. (H) The expression level of TUBG1 in the GSE37031 dataset visualized by box plot in the NAFLD group and normal liver tissue sample group (P<0.05 is considered statistically significant). (I) Box plot visualization of ABCC5 expression in the GSE37031 dataset between the NAFLD group and the normal group. Univariate and multivariate logistic regression (non-alcoholic fatty liver disease). Univariate and multivariate logistic regression (The Cancer Genome Atlas).

ABCC5 and TUBG1 are independent risk factors for diagnosis of hepatocellular carcinoma

By performing univariate and multivariate logistic regression analysis on three coexisting genes (ABCC5, DHODH, and TUBG1) in TCGA HCC dataset, it was found that ABCC5 and TUBG1 were independent risk factors for HCC (OR>1; P <0.05), whereas DHODH was an independent protective factor in HCC (OR<1; < 0, 05) ( ). To further confirm our conclusions, we used HCC dataset (LIRI-JP) from the ICGC database for validation and found ABCC5 and TUBG1 genes were independent risk factors for HCC (OR>1; P <0,05), while DHODH was not ( ). These outcomes showed that ABCC5 and TUBG1 were independent risk factors for NAFLD and HCC, suggesting that the two genes possessed potential diagnostic value in both diseases.
Table 3

Univariate and multivariate logistic regression (ICGC).

Univariate analysisMultivariate analysis
OR (95% CI)P-valueOR (95% CI)P value
ABCC510.444 (6.676–16.339)<0.0012.109 (1.197- 3.715)0.01
DHODH0.353 (0.281–0.445)<0.001NA
TUBG150.090 (24.340–103.081)<0.00126.418 (12.231–57.063)0
Univariate and multivariate logistic regression (ICGC).

Construction of prognostic model in hepatocellular carcinoma

Univariate and multivariate Cox regression analyses were performed to explore whether three coexisting genes (ABCC5, DHODH, and TUBG1) were associated with the prognosis of HCC, and the results showed that ABCC5 and TUBG1 were independent risk factors for the prognosis of HCC (HR>1, P<0.05) ( ). Subsequently, we calculated the risk value of each sample based on the correlation coefficient. There were more deaths, shorter survival time, and higher expression levels of ABCC5 and TUBG1 in the high-risk group ( ). Afterward, we verified our prognostic model using ROC curves with survival time nodes of 1, 2, and 3 years. The results presented the AUC value of 1-year, 2-year, and 3-year ROC curves to be 0.80, 0.68, and 0.68, respectively ( ). As shown in , the risk value was higher in the death group (P<0.05). Using the nomogram, we found that when combined with age, gender, stage, and the depth of invasion, the risk score significantly contributed to survival risk ( ). We performed a multivariate Cox analysis in combination with clinical features to determine the influence of clinicopathological features on riskScore. The findings indicated that the predictive model’s riskScore remained an independent risk factor in combination with other clinical features (HR = 2.225 95%, CI [1.716–2.885]; P < 0.001) ( ). Next, we tested the predictive accuracy of the model using the calibration curve, and the results showed that the prediction accuracy of 1-year, 2-year and 3-year survival rates was very high ( ). The survival curve showed that both ABCC5 and TUBG1 were significantly correlated with poor prognosis in HCC and the survival rate of patients in the high-risk group was significantly lower than that in the low-risk group ( ). So as to further verify our prognostic model, HCC-related data (LIRI-JP) from the ICGC database were used for confirmation. The ABCC5 and TUBG1 expressions were significantly increased in the HCC group. In the meantime, based on the model, it was determined that the HCC group had a significantly higher risk score than the normal group ( ). Then, we verified our prognostic model by using ROC curves with survival time nodes of 1, 2, and 3 years. The results disclosed that the AUC of 1-year, 2-year and 3-year ROC curves was 0.70, 0.66, and 0.74, respectively ( ). Meanwhile, the survival curve showed that high risk score was significantly correlated with poorPrognosis in LIRI-JP ( ).
Figure 6

Construction of hepatocellular carcinoma prognostic model. (A) Univariate Cox analysis suggested that ABCC5 and TUBG1 were correlated with the prognosis of HCC. Multivariate Cox analysis suggested that ABCC5 and TUBG1 were independent risk factors (red represents HR>1; dark blue represents HR<1, P<0.05 is statistically significant). (B) Both ABCC5 and TUBG were upregulated in the high-risk group (red represents high-risk group; dark blue represents low-risk group). (C) The risk value was calculated by multivariate Cox regression. With the increase of the risk value, the survival time decreased, and the number of patients who died increased. (D) With risk score as the observed value, the time ROC curve confirmed that it had good diagnostic value at 1-year, 2-year, and 3-year survival nodes (the AUC of a 1-year ROC curve was 0.8, the 2-year ROC curve was 0.68, and the 3-year ROC curve was 0.68). (E) The box plot shows that the death group has higher risk scores (0 represents alive group; 1 represents dead group).

Figure 7

Risk model testing. (A) Nomogram recommends that risk score would contribute the most than other clinical characteristics (age, gender, stage, and the depth of invasion). (B) Multivariate Cox regression analysis of the risk score combined with clinical characteristics (age, gender, stage, and the depth of invasion) suggested that the risk score was still an independent risk factor. (C) The results of 1-year, 2-year, and 3-year calibration curves show that the prediction accuracy of risk models is high. (D) The survival curve suggested that patients in both ABCC5 and TUBG1 high-expression groups had a poor prognosis (red represents a high-expression group; yellow represents low-expression group). (E) The survival curve suggested that patients in the high-risk group had a poor prognosis (red represents a high-risk group; yellow represents low-risk group, P<0.05 is statistically significant).

Figure 8

Risk model was tested with the ICGC-HCC dataset. (A) The box plot shows that the expression of ABCC5 and TUBG in the ICGC-HCC dataset weas significantly higher in the HCC group. (B) The box plot shows that in the ICGC-HCC dataset, the risk score in the HCC group was significantly higher than that in the normal group. (C) With the risk score as the observed value, the timeROC curve confirmed that our prognostic model still had a high diagnostic value (the AUC of the 1-year ROC curve was 0.7, the 2-year ROC curve was 0.66, and the 3-year ROC curve was 0.74). (D) The survival curve suggested that patients in the high-risk group had a poor prognosis (red represents high-risk group; yellow represents low-risk group, P<0.05 is statistically significant).

Construction of hepatocellular carcinoma prognostic model. (A) Univariate Cox analysis suggested that ABCC5 and TUBG1 were correlated with the prognosis of HCC. Multivariate Cox analysis suggested that ABCC5 and TUBG1 were independent risk factors (red represents HR>1; dark blue represents HR<1, P<0.05 is statistically significant). (B) Both ABCC5 and TUBG were upregulated in the high-risk group (red represents high-risk group; dark blue represents low-risk group). (C) The risk value was calculated by multivariate Cox regression. With the increase of the risk value, the survival time decreased, and the number of patients who died increased. (D) With risk score as the observed value, the time ROC curve confirmed that it had good diagnostic value at 1-year, 2-year, and 3-year survival nodes (the AUC of a 1-year ROC curve was 0.8, the 2-year ROC curve was 0.68, and the 3-year ROC curve was 0.68). (E) The box plot shows that the death group has higher risk scores (0 represents alive group; 1 represents dead group). Risk model testing. (A) Nomogram recommends that risk score would contribute the most than other clinical characteristics (age, gender, stage, and the depth of invasion). (B) Multivariate Cox regression analysis of the risk score combined with clinical characteristics (age, gender, stage, and the depth of invasion) suggested that the risk score was still an independent risk factor. (C) The results of 1-year, 2-year, and 3-year calibration curves show that the prediction accuracy of risk models is high. (D) The survival curve suggested that patients in both ABCC5 and TUBG1 high-expression groups had a poor prognosis (red represents a high-expression group; yellow represents low-expression group). (E) The survival curve suggested that patients in the high-risk group had a poor prognosis (red represents a high-risk group; yellow represents low-risk group, P<0.05 is statistically significant). Risk model was tested with the ICGC-HCC dataset. (A) The box plot shows that the expression of ABCC5 and TUBG in the ICGC-HCC dataset weas significantly higher in the HCC group. (B) The box plot shows that in the ICGC-HCC dataset, the risk score in the HCC group was significantly higher than that in the normal group. (C) With the risk score as the observed value, the timeROC curve confirmed that our prognostic model still had a high diagnostic value (the AUC of the 1-year ROC curve was 0.7, the 2-year ROC curve was 0.66, and the 3-year ROC curve was 0.74). (D) The survival curve suggested that patients in the high-risk group had a poor prognosis (red represents high-risk group; yellow represents low-risk group, P<0.05 is statistically significant).

Changes of immune microenvironment and construction of ceRNA network

We used the GSVA algorithm to analyze pathway-related datasets in MSigDb in the two diseases and the limma package for difference analysis. LogFC = 0.5, ADj. P<0.05 were used as thresholds to screen related pathways with significant differences between the disease and normal groups. The results showed 46 significantly upregulated and 122 significantly downregulated pathways in NAFLD and 1,418 significantly upregulated and 1,669 significantly downregulated pathways in HCC. We visualized the top 10 pathways with significant changes using heat maps ( ) and then obtained both upregulated and downregulated pathways in the two diseases ( , ). The outcomes showed that NAFLD and HCC had different fat metabolism, immunity, tumor, and other related pathways as compared to the normal group. These pathways may disclose the principal mechanisms by which NAFLD progresses to HCC. We conducted enrichment analysis of 28 immune cell components in the two datasets ssGSEA to understand the immune infiltration of the two diseases and found that the immune microenvironment of the two diseases had significantly changed. Furthermore, effector memory CD4 T cells, eosinophils, mast cells, neutrophils, and type 1 T helper cells showed significant differences and the same trend in the high- and low-risk groups of the two diseases ( ). CeRNA discloses the interaction mechanism among lncRNA, microRNA, and mRNA. The upstream miRNAs and lncRNAs of ABCC5 and TUBG1 were predicted through the ceRNA network, and visualization analysis was executed using Cytoscape ( ). According to the ceRNA mechanism, lncRNAs can upregulate the expression of their target mRNA. Hence, we used the upregulated lncRNAs expressed in HCC data based on TCGA and intersected with the predicted lncRNAs to find four lncRNAs with high confidence and consistent expression.
Figure 9

Immune infiltration and ceRNA. (A) Heat map showing pathways that differ significantly between the normal and NAFLD groups (red for NAFLD group and dark blue for normal group). (B) Heat map shows pathways with significant differences between the normal group and the HCC group (red represents the cancer group; dark blue represents the normal group). (C) Venn diagram shows coupregulated and codownregulated pathways in both diseases. (D) The upper box plot shows the immune infiltration of 28 cells in the high- and low-risk group of NAFLD, and the lower box diagram shows the immune infiltration of 28 cells in in the high- and low-risk group of HCC. (E) ABCC5 and TUBG1-related ceRNA networks were constructed. Red represents mRNA, purple represents microRNA, and yellow represents lncRNA.

Table 4

GSVA: both upregulated and downregulated pathways occur in both diseases.

GSVA
UPGO_LONG_CHAIN_FATTY_ACYL_COA_BINDING
UPBIOCARTA_TCAPOPTOSIS_PATHWAY
UPGO_WATER_SOLUBLE_VITAMIN_BIOSYNTHETIC_PROCESS
DOWNBIOCARTA_GHRELIN_PATHWAY
DOWNTIAN_TNF_SIGNALING_NOT_VIA_NFKB
DOWNAMIT_SERUM_RESPONSE_40_MCF10A
DOWNPHONG_TNF_TARGETS_UP
DOWNGRAHAM_CML_QUIESCENT_VS_CML_DIVIDING_UP
DOWNAMIT_EGF_RESPONSE_20_HELA
DOWNGO_POSITIVE_REGULATION_OF_CARDIOBLAST_DIFFERENTIATION
DOWNGO_NEGATIVE_REGULATION_OF_PLASMINOGEN_ACTIVATION
DOWNFUNG_IL2_TARGETS_WITH_STAT5_BINDING_SITES_T1
DOWNGO_POSITIVE_REGULATION_OF_SARCOMERE_ORGANIZATION
DOWNREACTOME_RUNX1_REGULATES_TRANSCRIPTION_OF_GENES_INVOLVED_IN_INTERLEUKIN_SIGNALING
DOWNGO_SNORNA_LOCALIZATION
Immune infiltration and ceRNA. (A) Heat map showing pathways that differ significantly between the normal and NAFLD groups (red for NAFLD group and dark blue for normal group). (B) Heat map shows pathways with significant differences between the normal group and the HCC group (red represents the cancer group; dark blue represents the normal group). (C) Venn diagram shows coupregulated and codownregulated pathways in both diseases. (D) The upper box plot shows the immune infiltration of 28 cells in the high- and low-risk group of NAFLD, and the lower box diagram shows the immune infiltration of 28 cells in in the high- and low-risk group of HCC. (E) ABCC5 and TUBG1-related ceRNA networks were constructed. Red represents mRNA, purple represents microRNA, and yellow represents lncRNA. GSVA: both upregulated and downregulated pathways occur in both diseases.

Expression of ABCC5 and TUBG1 at tissue and cell levels

Subsequently, the expression of ABCC5 and TUBG1 in normal liver tissue, NAFLD tissue, and HCC tissue was detected by immunohistochemistry. The results showed that compared with normal liver tissue, the protein expression levels of ABCC5 and TUBG1 in NAFLD and HCC tissues were significantly increased (P <0.05) ( ).
Figure 10

Lipidization increased the expression of ABCC5 in LO2 and HepG2. (A) The expression of ABCC5 in human normal liver tissue, NAFLD tissues and HCC tissues (magnification ×200, scale bars 100 μm; magnification ×400, scale bars 50 μm). (B) The expression of TUBG1 in human normal liver tissue, NAFLD tissues, and HCC tissues (magnification ×200, scale bars 100 μm; magnification ×400; scale bars 50 μm). (C) The Oil Red O staining of LO2 cells treated with or without sodium palmitate/sodium oleate (PA/OA) for 24 h (magnification ×200, scale bars 100 μm). (D) The Oil Red O staining of HepG2 cells treated with or without PA/OA for 48 h (magnification ×200, scale bars 100 μm). (E) TG content in HepG2 cells and LO2 cells treated with PA/OA for 24 and 48 h. (F) The mRNA expression of ABCC5 examined by RT-qPCR. ([HF] represents high-fat group) (P < 0.05 *, P <0.01 **, P <0.001 ***, P <0.0001 ****).

Lipidization increased the expression of ABCC5 in LO2 and HepG2. (A) The expression of ABCC5 in human normal liver tissue, NAFLD tissues and HCC tissues (magnification ×200, scale bars 100 μm; magnification ×400, scale bars 50 μm). (B) The expression of TUBG1 in human normal liver tissue, NAFLD tissues, and HCC tissues (magnification ×200, scale bars 100 μm; magnification ×400; scale bars 50 μm). (C) The Oil Red O staining of LO2 cells treated with or without sodium palmitate/sodium oleate (PA/OA) for 24 h (magnification ×200, scale bars 100 μm). (D) The Oil Red O staining of HepG2 cells treated with or without PA/OA for 48 h (magnification ×200, scale bars 100 μm). (E) TG content in HepG2 cells and LO2 cells treated with PA/OA for 24 and 48 h. (F) The mRNA expression of ABCC5 examined by RT-qPCR. ([HF] represents high-fat group) (P < 0.05 *, P <0.01 **, P <0.001 ***, P <0.0001 ****). In order to further explore the expression of ABCC5 and TUBG1 in NAFLD and HCC. We simulated a high-fatty-acid environment by mixing PA and OA (33). HF-LO2 group and HF-HepG2 group models were constructed (34–36). Oil red O staining results showed that OA significantly increased lipid accumulation in HepG2 cells and LO2 cells compared with the control group ( ). The results of the quantitative analysis of triglyceride were consistent with the above results, and the triglyceride content of cells in HF-LO2 and HF-HepG2 groups was significantly higher than that in the corresponding control group ( ). The above results indicated that the adipose cell model was successfully constructed. The transcriptional levels of ABCC5 and TUBG1 in HepG2 cells and LO2 cells were detected by RT-QPCR. The results showed that ABCC5 expression was significantly increased in HepG2 cells compared with LO2 cells (P <0.05). Similarly, RT-QPCR results showed that ABCC5 expression was significantly increased in both the HF-LO2 group and the HF-HepG2 group after aliphication, and the increased level increased with the extension of aliphication time (P <0.05) ( ). However, the transcription level of TUBG1 was not significantly changed in HepG2 and LO2 cell lines and before and after adipose treatment.

Discussion

The annual global increase in the HCC incidence is inseparable from NAFLD, an independent risk factor. In the wake of increasing living standards, the incidence of NAFLD is increasing at an alarming speed. The prevalence of NAFLD is increasing at an alarming rate and has been considered an inducible cause of HCC (37), and non-alcoholic steatohepatitis (NASH), as a manifestation of NAFLD, can progress to cirrhosis or even HCC (38). Elucidating the mechanism of NAFLD to HCC will enable us to have the opportunity to take measures at the early stage, so as to achieve early prevention and reduce the incidence of HCC, which will be of great significance. Currently, some excellent studies have tried to explore the underlying mechanism of the progression of NAFLD to HCC. Dabin Liu (39) et al. found that SQLE (squalene epoxidase) may play an important role in the progression of NAFLD to HCC. The authors found that SQLE was abnormally upregulated in both NAFLD and HCC. In vitro and in vivo experiments have confirmed that abnormal expression of SQLE leads to a significant increase in intracellular cholesterol esters and induces oxidative stress. Increased intracellular cholesterol esters are associated with the growth and invasion of a variety of tumors, while oxidative stress can activate multiple downstream tumor-related pathways, such as the PI3K/AKT/mTOR signal pathway. Meanwhile, the authors also demonstrated that terbinafine may exert its antitumor effect by targeting SQLE. However, the pathogenesis of HCC induced by NAFLD still stays uncleared. Therefore, we explored the potential mechanism of NAFLD to HCC and found two key genes. In this study, 26 upregulated and 87 downregulated genes were obtained by limma package analysis using the NAFLD dataset (GSE48452, GSE89632, GSE37031) and HCC dataset TCGA-LIHC (n = 445 cases). By using the limma package to analyze the differences in the data of the two diseases, 26 coupregulated genes and 87 codownregulated genes were obtained. Then, we used 13 consensus modules screened by WGCNA from the two diseases, among which the darkturquoise module was highly correlated with the diagnosis of both diseases. The intersection of the dark turquoise module and common downregulated genes was obtained, and the coexpressed genes ABCC5, DHODH, and TUBG1, which were expressed uniformly in both diseases and were related to diagnosis, were finally obtained. After the establishment of diagnostic and prognostic models and the verification process of prediction ability, it was concluded that the expression patterns of ABCC5 and TUBG1 in NAFLD and HCC were consistent, which were common diagnostic markers and highly correlated with the prognosis of HCC. Both may be closely related to NAFLD-induced HCC and may become new biomarkers for the biological diagnosis and prognosis of HCC. Current studies indicate that ABCC5 plays an oncogene role in HCC. Borel et al. (40) studied the expression of 15 ABC transporters related to multidrug resistance in 19 samples of HCC patients (16 untreated and 3 treated with chemotherapy); 12 ABC transporters were found to be upregulated in HCC as compared to healthy livers, and one of them is ABCC5. Chen has found that (41)ABCC5 levels are significantly increased in HCC, and it is considered to be an independent risk factor for HCC progression and prognosis mainly because of immune cell infiltration. Due to its drug resistance characteristics, Huang et al. (42) proved that the regulation of the PI3K/AKT/NRF2 pathway activates and upregulates ABCC5 to promote the resistance of sorafenib (a first-line molecular targeted drug for advanced HCC) acquired in human HCC cells. This result suggests that ABCC5 is an important regulator and a promising therapeutic target of sorafenib resistance acquired in human HCC cells. ABCC5 has also been found to be able to regulate and reduce GPx4 consumption in sorafenib-resistant HCC cells to inhibit lipid peroxidation. Additionally, ABC transporters mainly mediate the transmembrane transport of cyclic nucleotides or nucleotide analogs, able to transport organic anionic compounds, including glucoside acids, and the sulfate conjugates of steroids (43). Further studies (44) have found that it actively transports bile acids made by cholesterol in liver cells. Lyu (45), Liu (46), and Akbulut (47) et al. further demonstrated that ABCA1 could mediate the transport of cholesterol and phospholipids from cells to high-density lipoprotein apolipoprotein, thereby affecting the transport of liver cholesterol and inducing fatty liver disease. Hardwick et al. (16) suggested that six ABC transporters, including ABCC5, ABCB1, and ABCG2, showed an increasing trend in the progression of NAFLD. In support of this theory, Cyranka et al. (48) demonstrated that ABCC5 gene knockout mice had reduced white and brown adipose tissue, and the most prominent metabolic phenotype was the reduction of total adipose mass, suggesting a new concept that ABCC5 protein expression plays a key role in mammalian energy metabolism. The results of our bioinformatics analysis are consistent with the above studies. ABCC5 is a diagnostic marker in both diseases and is associated with the prognosis of HCC. In addition, it was found for the first time in vitro that the expression level of ABCC5 was significantly increased in hepatocellular carcinoma after adipose transformation, and the increased level was positively correlated with the degree of adipose transformation. TUBG1 is a member of the human tubulin family and is involved in the development of various tumors. In the study of Hubert (49) and Hsu (50) et al. it is found the breast Cancer 1 protein (BRCA1), an inhibitor of breast and ovarian tumorigenesis, maintained a high fidelity of cell division after centrosome binding to TUBG1 produced a ubiquitination reaction. Furthermore, its association with tumor suppressor P53 and centrosome was explained by Morris (51) and Kanai (52) et al. It was also revealed by Hořejší et al. that high expression of TUBG1 antagonized its inhibitory effect on DNA damage (53). In a hospital-based case–control study (54), variations in TUBG1 were also associated with breast cancer risk. Blanco et al. (55) further explored that TUBG1 interacts with associated proteins in sporadic breast tumors to regulate the mammary epithelial polarization and affect patient survival. However, the relationship between TUBG1 and the progression of HCC and NAFLD patients is still unclear, and few studies have been conducted in this context. Through bioinformatics multiple analysis, we believe that this gene may be a common carcinogenic factor in the pathogenesis of NAFLD and HCC. Our immunohistochemical results were consistent with the results of the bioassay, and the TUBG1 expression level was significantly increased in both NAFLD and HCC tissues. However, there was no significant change in the transcription level after fatty treatment. We speculated that the increase of the TUBG1 protein level may not be caused by its own increase but rather due to changes in the microenvironment of liver cells under a high-lipid environment. We believe that TUBG1 is a very promising research site. Although we verified the above results by using a variety of methods, there are still limitations. The regulatory mechanism of ABCC5 and TUBG1 in the development of HCC in the context of NAFLD is not detailed yet, and whether the differences of the last two genes in different stages of NAFLD-HCC progression also needs further study and confirmation. Our results showed that ABCC5 and TUBG1 were increased in both NAFLD and HCC and were independent risk factors. The diagnostic model of NAFLD and the prognostic model of HCC jointly constructed by ABCC5 and TUBG1 had high predictive ability. Both may be closely related to NAfLD-HCC. We found for the first time that a high-fat environment promotes the expression of ABCC5 in HCC, further elucidating the carcinogenic mechanism of ABCC5 in HCC and its possible important role in the pathogenesis of NAFLD-HCC. Furthermore, we also explored the immune cells with the same changing trend of the immune microenvironment in NAFLD and HCC, as well as the common up–downregulation pathway, providing new clues for exploring the pathogenesis of NAFLD-HCC. Simultaneously, referring to the studies that established that lncRNA- D16366 (56)and circ_0067934 (57) could be used as novel diagnostic and prognostic markers of HCC, the upstream miRNAs and lncRNAs of ABCC5 and TUBG1 were predicted through the ceRNA network. In conclusion, ABCC5 and TUBG1 are likely to become copathogenic factors of NAFLD and HCC, opening a new window of opportunity for individualized and precise diagnosis, prevention, and treatment of NAFLD.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ .

Author contributions

TC, XY, and SZ contributed to the study conception and design. Material preparation, data collection, and analysis were performed by TC, SZ, DZ, PL, XM, and RT. The first draft of the manuscript was written by TC, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript. XY contributed to check this manuscript. All authors contributed to the article and approved the submitted version.

Funding

This work was supported in part by the Science and Technology Plan Project of Qingxiu District, Nanning City [grant no. 2019024], ‘Medical Excellence Award’ Funded by the Creative Research Development Grant from the First Affiliated Hospital of Guangxi Medical University [grant no. 2017025], The basic ability improvement project for young and middle-aged teachers in Guangxi universities [grant no. 2017KY0100], and Medical and Health Appropriate Technology Development, Promotion and Application Project of Guangxi [grant no. 2016018].

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
  56 in total

1.  clusterProfiler: an R package for comparing biological themes among gene clusters.

Authors:  Guangchuang Yu; Li-Gen Wang; Yanyan Han; Qing-Yu He
Journal:  OMICS       Date:  2012-03-28

2.  Cdk1 and BRCA1 target γ-tubulin to microtubule domains.

Authors:  Thomas Hubert; Joël Vandekerckhove; Jan Gettemans
Journal:  Biochem Biophys Res Commun       Date:  2011-09-17       Impact factor: 3.575

Review 3.  Association between Metabolic Syndrome and Cancer.

Authors:  Mehmet Uzunlulu; Ozge Telci Caklili; Aytekin Oguz
Journal:  Ann Nutr Metab       Date:  2016-02-20       Impact factor: 3.374

4.  Hepatocellular Carcinoma in the Absence of Cirrhosis in United States Veterans is Associated With Nonalcoholic Fatty Liver Disease.

Authors:  Sahil Mittal; Hashem B El-Serag; Yvonne H Sada; Fasiha Kanwal; Zhigang Duan; Sarah Temple; Sarah B May; Jennifer R Kramer; Peter A Richardson; Jessica A Davila
Journal:  Clin Gastroenterol Hepatol       Date:  2015-07-18       Impact factor: 11.382

5.  Population-based risk factors and resource utilization for HCC: US perspective.

Authors:  A Sanyal; A Poklepovic; E Moyneur; V Barghout
Journal:  Curr Med Res Opin       Date:  2010-09       Impact factor: 2.580

6.  Involvement of poly(ADP-Ribose) polymerase 1 and poly(ADP-Ribosyl)ation in regulation of centrosome function.

Authors:  Masayuki Kanai; Wei-Min Tong; Eiji Sugihara; Zhao-Qi Wang; Kenji Fukasawa; Masanao Miwa
Journal:  Mol Cell Biol       Date:  2003-04       Impact factor: 4.272

7.  WW domain-binding protein 2 overexpression prevents diet-induced liver steatosis and insulin resistance through AMPKβ1.

Authors:  Zhe Zheng; Yue Li; Siyuan Fan; Jie An; Xi Luo; Minglu Liang; Feng Zhu; Kai Huang
Journal:  Cell Death Dis       Date:  2021-03-03       Impact factor: 8.469

8.  Pan-cancer analysis and single-cell analysis revealed the role of ABCC5 transporter in hepatocellular carcinoma.

Authors:  Liang Chen; Zhonghua Yang; Yuan Cao; Yiming Hu; Wei Bao; Dan Wu; Li Hu; Jiaheng Xie; Hongzhu Yu
Journal:  Channels (Austin)       Date:  2021-12       Impact factor: 2.581

Review 9.  Risk factors and prevention of hepatocellular carcinoma in the era of precision medicine.

Authors:  Naoto Fujiwara; Scott L Friedman; Nicolas Goossens; Yujin Hoshida
Journal:  J Hepatol       Date:  2017-10-06       Impact factor: 30.083

10.  DCK is a promising prognostic biomarker and correlated with immune infiltrates in hepatocellular carcinoma.

Authors:  Danjun Song; Yining Wang; Kai Zhu; Lingyu Tian; Qiang Gao; Jian Zhou; Jia Fan; Xiaoying Wang
Journal:  World J Surg Oncol       Date:  2020-07-20       Impact factor: 2.754

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.