Hua Ye1, Tiandong Li1,2,3, Hua Wang1,3, Jinyu Wu1,3, Chuncheng Yi1,3, Jianxiang Shi3,4, Peng Wang1,3, Chunhua Song1,3, Liping Dai3,4, Guozhong Jiang5, Yuxin Huang6, Yongwei Yu7, Jitian Li2,3.
Abstract
Pancreatic cancer is a lethal malignancy with a poor prognosis. This study aims to identify pancreatic cancer-related genes and to develop a robust diagnostic model for detecting the disease. Weighted gene co-expression network analysis (WGCNA) was used to determine potential hub genes for pancreatic cancer. Their mRNA and protein expression levels were validated by reverse transcription PCR (RT-PCR) and immunohistochemistry (IHC). Diagnostic models were developed using eight machine learning algorithms with ten-fold cross-validation. Four hub genes (TSPAN1, TMPRSS4, SDR16C5, and CTSE) were identified by bioinformatics analysis. RT-PCR showed that the four hub genes were expressed at medium to high levels, and IHC revealed that their protein expression levels were higher in pancreatic cancer tissues. For the panel of these four genes, the eight models achieved an area under the curve (AUC) of 0.87-0.92, a sensitivity of 0.91-0.94, and a specificity of 0.84-0.86 in the validation cohort. In the external validation set, the models also performed well (AUC 0.86-0.98, sensitivity 0.84-1.00, and specificity 0.86-1.00). In conclusion, this study identified four hub genes that might be closely related to pancreatic cancer: TSPAN1, TMPRSS4, SDR16C5, and CTSE. Panels of these four genes might provide a theoretical basis for the diagnosis of pancreatic cancer.
Keywords: WGCNA; bioinformatics; diagnostic model; machine learning; pancreatic cancer; panel
Year: 2021 PMID: 33815409 PMCID: PMC8015801 DOI: 10.3389/fimmu.2021.649551
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Flow chart of data preparation, analysis, validation, and model development.
Primer sequences of hub genes and internal reference genes.
| Forward 5':TGGGCTGCTATGGTGCTAAG | 154 bp | |
| Reverse 5':GGCACTACCAGCAACGTCAG | ||
| Forward 5':GGGAAGTCACCGAGAAGA | 107 bp | |
| Reverse 5':ATGCCACTGGTCAGATTG | ||
| Forward 5':CTATACCCTCAGCCCAACTG | 169 bp | |
| Reverse 5':GTTATTCCCACGGTCAAAGAC | ||
| Forward 5':AATGGGCTGGCAGATTACTG | 111 bp | |
| Reverse 5':CACAATCGTGGTTTTGATCC | ||
| Forward 5':TGACTTCAACAGCGACACCCA | 121 bp | |
| Reverse 5':CACCCTGTTGCTGTAGCCAAA |
Figure 2. Determination of soft-thresholding power in the weighted gene co-expression network analysis (WGCNA). (A) Analysis of the scale-free fit index for various soft-thresholding powers (β). (B) Analysis of the mean connectivity for various soft-thresholding powers. (C) Histogram of connectivity distribution when β = 8. (D) Checking the scale-free topology when β = 8.
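The role of the soft-thresholding power β can be illustrated with a minimal sketch. It assumes an unsigned network, where the adjacency between two genes is the absolute Pearson correlation raised to the power β; the source does not state which network type was used, and the gene names and expression values below are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(expr, beta):
    """Unsigned soft-threshold adjacency (assumption): a_ij = |cor(x_i, x_j)| ** beta."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes for h in genes if g != h
    }

def connectivity(adj, gene):
    """Connectivity k_i: sum of adjacencies between one gene and all others."""
    return sum(a for (g, _), a in adj.items() if g == gene)

# Hypothetical expression profiles (one list of sample values per gene).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with geneA
    "geneC": [1.0, -1.0, 1.0, -1.0],  # only weakly correlated with geneA
}
adj = adjacency(expr, beta=8)
k_a = connectivity(adj, "geneA")
```

Raising the correlation to a high power such as β = 8 keeps strong correlations near 1 and pushes weak ones toward 0, which is why the connectivity distribution changes with β.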
Figure 3. Identification of modules associated with the clinical traits of pancreatic cancer. (A) Dendrogram of 18,830 genes clustered based on a dissimilarity measure (1-TOM). (B) Heatmap of the correlation between module eigengenes and clinical traits of pancreatic cancer. (C) Module membership vs. gene significance in the “greenyellow,” “blue,” and “red” modules.
Figure 4. Gene-gene interaction network of the top 20 genes. A gene-gene interaction network was constructed from the 171 genes obtained from WGCNA, and the top 20 genes, ranked by degree of interaction, were identified.
Figure 5. Identification of four hub genes by validation across eight datasets. Forty-one differentially expressed genes (DEGs) were identified as the intersection of the DEGs from 8 GEO datasets (GSE15471, GSE28735, GSE62165, GSE32688, GSE71989, GSE62452, GSE62165, and GSE32676), and four hub genes were then identified by intersecting these with the top 20 genes.
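The two-step intersection described in this caption can be sketched with plain set operations. The dataset labels and gene names below are hypothetical placeholders, not the study's actual DEG lists, and the helper name `common_genes` is introduced only for illustration.

```python
def common_genes(deg_sets):
    """Genes present in every DEG set (intersection across datasets)."""
    return set.intersection(*deg_sets)

# Hypothetical DEG lists, one set per dataset (placeholder names only).
degs_by_dataset = {
    "dataset_1": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "dataset_2": {"GENE_A", "GENE_B", "GENE_D", "GENE_E"},
    "dataset_3": {"GENE_A", "GENE_B", "GENE_D", "GENE_F"},
}

# Hypothetical network-derived shortlist (stands in for the top 20 genes).
top_ranked = {"GENE_A", "GENE_D", "GENE_G"}

shared_degs = common_genes(list(degs_by_dataset.values()))
hub_genes = shared_degs & top_ranked  # second intersection yields the hub genes
```

Taking the intersection rather than the union keeps only genes that are differentially expressed in every dataset, which trades recall for consistency across cohorts.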
Figure 6. Validation of the expression of the four hub genes using RNA-Seq data (GEPIA website). *P ≤ 0.05; PAAD, pancreatic cancer.
Summary of four hub genes identified by weighted gene co-expression network analysis.
| Tetraspanin 1 | Cell development, activation, growth, and motility | ( | |
| Transmembrane serine protease 4 | Integral component of membrane; regulation of gene expression; scavenger receptor activity | ( | |
| Cathepsin E | Antigen processing and presentation of exogenous peptide antigen via MHC class II; protein autoprocessing; protein catabolic process | ( | |
| Short chain dehydrogenase/reductase family 16C member 5 | Activating transcription factor binding; keratinocyte proliferation; oxidation-reduction process | NA |
Figure 7. TSPAN1, TMPRSS4, SDR16C5 and CTSE mRNA expression in three pancreatic cancer cells.
Figure 8. Immunohistochemical staining of TSPAN1, TMPRSS4, SDR16C5 and CTSE.
Diagnostic performance of eight machine learning methods for pancreatic cancer.
| Method | Validation cohort AUC | Validation cohort Se | Validation cohort Sp | External validation AUC | External validation Se | External validation Sp |
| --- | --- | --- | --- | --- | --- | --- |
| Support vector machine | 0.87 (0.79–0.95) | 0.92 | 0.84 | 0.90 (0.73–1.00) | 0.96 | 0.86 |
| Random forest | 0.91 (0.86–0.97) | 0.91 | 0.86 | 0.94 (0.83–1.00) | 0.96 | 0.86 |
| Naive Bayes | 0.91 (0.86–0.96) | 0.93 | 0.84 | 0.92 (0.77–1.00) | 0.96 | 0.86 |
| Neural network | 0.91 (0.86–0.97) | 0.94 | 0.84 | 0.97 (0.91–1.00) | 0.84 | 1.00 |
| Linear discriminant analysis | 0.91 (0.86–0.96) | 0.93 | 0.84 | 0.95 (0.86–1.00) | 1.00 | 0.86 |
| Mixture discriminant analysis | 0.91 (0.87–0.96) | 0.91 | 0.84 | 0.98 (0.93–1.00) | 1.00 | 0.86 |
| Flexible discriminant analysis | 0.91 (0.85–0.96) | 0.92 | 0.84 | 0.86 (0.71–1.00) | 0.84 | 0.86 |
| Logistic regression | 0.92 (0.87–0.97) | 0.93 | 0.84 | 0.97 (0.90–1.00) | 0.96 | 0.86 |
AUC, receiver operating characteristic area under the curve value; Se, Sensitivity; Sp, Specificity.
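As a reminder of how the three reported metrics are defined, the following is a minimal sketch that computes them from binary labels and model scores. The labels, scores, and the 0.5 decision threshold are made-up illustration values, not data from the study, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than whatever software the authors used.

```python
def sensitivity_specificity(y_true, y_pred):
    """Se = TP / (TP + FN); Sp = TN / (TN + FP), with 1 = case and 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a random case scores above a random control
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

se, sp = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can show a high AUC alongside a lower sensitivity or specificity at one cut-off.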