Hua Chai1,2, Long Xia3, Lei Zhang2, Jiarui Yang2, Zhongyue Zhang4, Xiangjun Qian2, Yuedong Yang4, Weidong Pan2.
Abstract
BACKGROUND: Predicting hepatocellular carcinoma (HCC) prognosis is important for treatment selection, and there is growing interest in predicting prognosis from gene expression data. Prediction accuracy currently remains low because liver cancer omics data are high-dimensional but have small sample sizes. Previous studies developed a transfer learning strategy that pre-trains models on similar cancer types and then fine-tunes the pre-trained models on the target dataset. However, transfer learning has limited performance because other cancer types resemble the target to different degrees, and balancing the contributions of the different cancer types is not trivial.
Keywords: bioinformatics; deep learning; hepatocellular carcinoma; prognostic markers; survival analysis
Year: 2021 PMID: 34646759 PMCID: PMC8504135 DOI: 10.3389/fonc.2021.692774
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. Proposed adaptive transfer-learning-based deep Cox neural network (ATRCN) used for hepatocellular carcinoma (HCC) survival analysis. (A) Architecture of the proposed ATRCN. (B) Independent tests for the HCC prognosis prediction model obtained by ATRCN. (C) Identifying HCC prognostic markers. (D) Enriching the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways that are associated with HCC prognosis.
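A deep Cox network is usually trained by minimizing the negative Cox partial log-likelihood of the risk scores it outputs. The sketch below illustrates that standard loss in plain Python. It is not taken from the ATRCN implementation, and the function name and the averaging over observed events are assumptions made for illustration.

```python
import math

def cox_neg_partial_log_likelihood(times, events, scores):
    """Average negative Cox partial log-likelihood (hypothetical helper).

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if censored
    scores -- risk score per patient (higher means higher predicted risk)
    """
    n = len(times)
    loss = 0.0
    n_events = 0
    for i in range(n):
        if not events[i]:
            continue  # censored patients only appear inside risk sets
        # Risk set: everyone still under observation at time times[i].
        risk_set = [scores[j] for j in range(n) if times[j] >= times[i]]
        # Numerically stable log-sum-exp over the risk set.
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(s - m) for s in risk_set))
        loss -= scores[i] - log_sum
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return loss / n_events
```

The quadratic loop keeps the risk-set definition explicit; a training loop would normally use a vectorized version of the same expression.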
The features used for describing the survival situations of cancer patients.
| Phenotype feature | Description | Genotype feature | Description |
|---|---|---|---|
| Y3 | Three-year survival rate | AVE1 | Mean of Fe1 |
| Y5 | Five-year survival rate | MID1 | Median of Fe1 |
| SAVE | Mean of the survival time | STD1 | Standard deviation of Fe1 |
| STD | Standard deviation of survival time | KURT1 | Kurtosis of Fe1 |
| T1 | First quartile of the survival time | SKEW1 | Skewness of Fe1 |
| T2 | Second quartile of the survival time | AVE2 | Mean of Fe2 |
| T3 | Third quartile of the survival time | MID2 | Median of Fe2 |
| T4 | Max value of survival time | STD2 | Standard deviation of Fe2 |
| S1 | % patients in [0, 0.25*T4] | KURT2 | Kurtosis of Fe2 |
| S2 | % patients in [0.25*T4, 0.5*T4] | SKEW2 | Skewness of Fe2 |
| S3 | % patients in [0.5*T4, 0.75*T4] | | |
| S4 | % patients in [0.75*T4, T4] | | |
Fe1 and Fe2 are the two compressed features of the mRNA constructed using kernel principal component analysis (KPCA).
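The phenotype half of the table can be computed directly from a cohort's survival times. The following is a minimal sketch, assuming times are given in years and that censored patients are counted as survivors when estimating Y3 and Y5; the source does not say how censoring was handled, and the function name is hypothetical. The genotype features (statistics of the two KPCA components) are not covered here.

```python
import statistics

def phenotype_features(times, events):
    """Summarise one cohort's survival data (hypothetical helper).

    times  -- follow-up time per patient, assumed to be in years
    events -- 1 if death was observed, 0 if the patient was censored
    """
    n = len(times)
    t4 = max(times)  # T4: maximum survival time
    t1, t2, t3 = statistics.quantiles(times, n=4)  # T1..T3: quartiles

    def survival_rate(horizon):
        # Assumption: censored patients count as survivors (naive estimate).
        deaths = sum(1 for t, e in zip(times, events) if e and t <= horizon)
        return 1 - deaths / n

    # S1..S4: share of patients whose time falls in each quarter of [0, T4].
    counts = [0, 0, 0, 0]
    for t in times:
        counts[min(int(4 * t / t4), 3)] += 1

    features = {
        "Y3": survival_rate(3), "Y5": survival_rate(5),
        "SAVE": statistics.mean(times), "STD": statistics.stdev(times),
        "T1": t1, "T2": t2, "T3": t3, "T4": t4,
    }
    for k, c in enumerate(counts, start=1):
        features[f"S{k}"] = c / n
    return features
```

Stacking one such feature vector per cancer type, together with the ten genotype features, would give a description matrix of the kind shown in Figure 2A.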
Figure 2. Survival characteristics for adaptively matching appropriate pre-training cancer data. (A) Normalized cancer description matrix of 11 different cancers in The Cancer Genome Atlas (TCGA), built from 12 phenotype and 10 genotype characteristics. (B) Corresponding clustering results obtained by k-means (k = 6). (C) Distances between the centers of the different clusters. (D) C-index values obtained using different pre-training data combinations based on cluster distances. The points on the x-axis represent the combinations of pre-training data, arranged by distance from near to far.
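Panels C and D suggest a simple selection procedure: rank the other clusters by the distance of their centers from the target cluster, then try pre-training sets that grow from the nearest cluster outward. The sketch below illustrates that idea under two assumptions not stated in the caption: distances are Euclidean, and the combinations are cumulative. The cluster labels and function names are hypothetical.

```python
import math

def rank_clusters_by_distance(centers, target):
    """Order the non-target clusters from nearest to farthest center."""
    others = [name for name in centers if name != target]
    return sorted(others, key=lambda name: math.dist(centers[name], centers[target]))

def nested_combinations(ranked):
    """Cumulative pre-training sets: nearest one, nearest two, and so on."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]

# Toy 2-D cluster centers; real centers would live in the 22-feature space.
centers = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 4.0), "D": (0.0, 2.0)}
ranked = rank_clusters_by_distance(centers, target="A")
candidates = nested_combinations(ranked)
```

Each candidate set would then be used for pre-training and scored on the target data, keeping the set with the best validation C-index.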
Prediction performance obtained in the different cancer datasets.
| Dataset | Cox_en | RSF | Deep_surv | TRCN* | TRCN | ATRCN |
|---|---|---|---|---|---|---|
| BRCA | 0.553 (±0.081) | 0.571 (±0.075) | 0.588 (±0.103) | 0.596 (±0.091) | 0.617 (±0.082) | 0.652 (±0.078) |
| HNSC | 0.539 (±0.071) | 0.547 (±0.064) | 0.565 (±0.077) | 0.573 (±0.072) | 0.585 (±0.056) | 0.602 (±0.064) |
| LIHC | 0.570 (±0.074) | 0.582 (±0.070) | 0.636 (±0.095) | 0.654 (±0.088) | 0.667 (±0.073) | 0.696 (±0.079) |
| LUAD | 0.552 (±0.067) | 0.555 (±0.062) | 0.572 (±0.083) | 0.580 (±0.074) | 0.590 (±0.068) | 0.605 (±0.070) |
| STAD | 0.542 (±0.054) | 0.541 (±0.048) | 0.560 (±0.058) | 0.555 (±0.060) | 0.564 (±0.055) | 0.583 (±0.051) |
| Average | 0.551 | 0.559 | 0.584 | 0.592 | 0.605 | 0.628 |
BRCA, breast invasive carcinoma; HNSC, head and neck squamous cell carcinoma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; STAD, stomach adenocarcinoma; Cox_en, Cox regression model with elastic net regularization; Deep_surv, deep Cox neural network without transfer learning; RSF, random survival forest; ATRCN, adaptive transfer-learning-based deep Cox neural network.
TRCN* selected the farthest cancer cluster from the target for pre-training.
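Figure 2D reports performance as C-index values; if the scores in this table are also concordance indices, they can be read as the fraction of comparable patient pairs that a model ranks correctly, with 0.5 corresponding to random ordering. Below is a minimal sketch of Harrell's concordance index in plain Python. The function name is hypothetical, and the handling of tied risk scores (half credit) is a common convention rather than something the source specifies.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (hypothetical helper).

    A pair (i, j) is comparable when patient i had an observed event
    and a strictly shorter time than patient j. The pair is concordant
    when the model gave patient i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```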
Figure 3. Survival curves of the different risk groups divided by the adaptive transfer-learning-based deep Cox neural network (ATRCN) prediction model on different liver cancer datasets. Red lines represent the high-risk patients and green lines represent the low-risk patients.
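Survival curves of this kind are typically Kaplan-Meier estimates computed separately for each risk group. The sketch below shows the estimator in plain Python; the function name is hypothetical, and how patients were assigned to the high-risk and low-risk groups is not stated in the caption, so the grouping step is left out.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate (hypothetical helper).

    Returns a list of (time, survival probability) pairs, one per
    distinct time at which at least one death was observed.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve
```

Calling it once on the high-risk patients and once on the low-risk patients gives the two step functions that such a figure plots.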
Correlations between the predicted risk subgroups and clinical covariates.
| Clinical feature | Chi-squared | p-value | Feature types |
|---|---|---|---|
| Tumor stage | 19.02 | 5.0E−4 | Stage I; stage II; stage III; stage IV |
| Treatment or therapy | 9.15 | 0.006 | Yes; No |
| Gender | 4.73 | 0.033 | Male; Female |
| Survival time | 67.42 | 4.9E−4 | Four intervals (<25%; 25%–50%; 50%–75%; >75%) |
| Race | 6.35 | 0.163 | White; Black or African American; Asian |
| Age | 3.47 | 0.336 | Four intervals (<30; 30–50; 50–70; >70) |
| Treatment type | 0.19 | 0.735 | Radiation therapy; Pharmaceutical therapy |
| Prior malignancy | 0.22 | 0.745 | Yes; No |
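
Assuming the statistic in this table is a Pearson chi-squared test of independence between the predicted risk subgroup and each categorical clinical covariate, the computation can be sketched as follows. The function name and the example counts are hypothetical and are not data from the study. The p-value needs the chi-squared distribution, which the standard library lacks; `scipy.stats.chi2_contingency` returns it, though note that it applies Yates' continuity correction to 2x2 tables by default.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom.

    table -- contingency table as a list of rows, e.g. rows are the
             high-risk and low-risk groups and columns are covariate levels
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            # Expected count if risk group and covariate were independent.
            expected = row_totals[r] * col_totals[c] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Made-up counts for illustration only.
stat, dof = chi_squared_statistic([[10, 20], [20, 10]])
```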
Figure 4. Identification of prognostic markers by conducting differential expression analysis and weighted gene co-expression network analysis (WGCNA). (A) The 298 identified differentially expressed genes (DEGs) with |log2 fold change| > 0.7 and corrected p-values < 0.05 in the liver cancer data (LIHC) in TCGA. (B) Heat map using the DEGs with predicted risk subgroups in LIHC. (C) WGCNA for calculating co-expression modules. (D) Computed average gene significance values in the different modules. (E) Three identified modules that are associated with risk phenotypes in hepatocellular carcinoma (HCC). (F) Differences in the identified downregulated risk genes in the HCC risk subgroups by gene expression level. (G) Differences in the identified upregulated risk genes in the HCC risk subgroups.
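The DEG criterion in panel A is a two-sided threshold on effect size combined with a significance cutoff. A minimal sketch of that filter is shown below; the function name, the tuple layout, and the placeholder gene names are assumptions for illustration, and the fold changes and adjusted p-values would come from an upstream differential expression tool.

```python
def select_degs(results, lfc_cutoff=0.7, p_cutoff=0.05):
    """Split genes into up- and downregulated DEGs (hypothetical helper).

    results -- iterable of (gene, log2_fold_change, adjusted_p_value)
    A gene passes when |log2 fold change| > lfc_cutoff and the
    corrected p-value is below p_cutoff.
    """
    up, down = [], []
    for gene, log2fc, padj in results:
        if padj < p_cutoff and abs(log2fc) > lfc_cutoff:
            (up if log2fc > 0 else down).append(gene)
    return up, down

# Placeholder values, not results from the study.
example = [
    ("gene_a", 1.2, 0.001),   # passes, upregulated
    ("gene_b", -0.9, 0.010),  # passes, downregulated
    ("gene_c", 0.5, 0.001),   # fold change too small
    ("gene_d", 2.0, 0.200),   # not significant
]
up_genes, down_genes = select_degs(example)
```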
Figure 5. Validation of the role of TTC36 in human liver cancer. (A) The efficacy of TTC36 ectopic expression was determined in hepatocellular carcinoma (HCC) cells. (B) Cell proliferation was assessed with the CCK-8 assay in Huh7 cells. (C, D) The effect of TTC36 overexpression on colony formation was assessed in Huh7 cells. (E, F) Representative images and histogram analysis of the Transwell migration and invasion assays after TTC36 upregulation in Huh7 cells. *p < 0.05.
Figure 6. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway-gene network enriched by using the connected differentially expressed genes (DEGs). The ellipse nodes represent the genes and the rectangle nodes represent the enriched KEGG pathways. Red represents the upregulated risk group, green represents the downregulated risk group, and gray represents pathways enriched simultaneously by genes from both risk groups.