Zi-Mei Zhang¹, Jia-Shu Wang¹, Hasan Zulfiqar¹, Hao Lv¹, Fu-Ying Dao¹, Hao Lin¹.
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is an aggressive and lethal cancer that seriously threatens human health. Diagnosis at an early stage is key to the survival of PDAC patients. However, existing biomarkers for diagnosing early PDAC are imprecise in most cases, so an effective PDAC diagnostic biomarker is highly desirable. In this work, we designed a novel computational approach based on within-sample relative expression orderings (REOs). A feature selection technique, minimum redundancy maximum relevance (mRMR), was used to select the optimal REOs. We then compared the performance of different classification algorithms in discriminating PDAC and its adjacent normal tissues from non-PDAC tissues; the support vector machine (SVM) performed best for identifying an early PDAC diagnostic biomarker. First, a signature of nine gene pairs was derived from microarray gene expression data sets. These gene pairs achieved a classification accuracy of up to 97.53% in fivefold cross-validation. The signature was then validated on data from two different platforms, microarray and RNA-Seq. For the microarray data, all 115 PDAC tissues (100.00%) and all 31 PDAC-adjacent normal tissues (100.00%) were correctly recognized as PDAC, and 88.24% of the 17 non-PDAC (normal or pancreatitis) tissues were correctly classified. For the RNA-Seq data, all 177 PDAC tissues (100.00%) and all 4 PDAC-adjacent normal tissues (100.00%) were correctly recognized as PDAC. These validation results demonstrate that the signature performs well across platforms for the early detection of PDAC. This work developed a new, robust signature that may be a promising biomarker for early PDAC diagnosis.
Keywords: biomarker; diagnosis; pancreatic ductal adenocarcinoma; relative expression orderings; support vector machine
Year: 2020 PMID: 33178697 PMCID: PMC7593596 DOI: 10.3389/fcell.2020.582864
Source DB: PubMed Journal: Front Cell Dev Biol ISSN: 2296-634X
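The abstract names minimum redundancy maximum relevance (mRMR) as the selector for REO features. Because each REO is already a binary value, mutual information can be estimated directly from counts. The sketch below implements the common greedy "difference" form of mRMR (relevance to the class label minus mean mutual information with the features already chosen); it is an illustration under that assumption, not the authors' code, and `mutual_information`, `mrmr_rank` and the toy features `A`, `B`, `C` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_rank(features, labels, k):
    """Greedy mRMR (difference form).

    features: dict mapping feature name -> list of discrete values, one per sample
    labels:   list of class labels, one per sample
    Returns the first k selected feature names in selection order.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            # Mean redundancy with every feature chosen so far.
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # Iterate in sorted order so ties resolve deterministically (first name wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: B duplicates A; C is uninformative alone but overlaps little with A.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "A": [1, 1, 1, 1, 0, 0, 0, 1],
    "B": [1, 1, 1, 1, 0, 0, 0, 1],
    "C": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(mrmr_rank(features, labels, k=3))  # ['A', 'C', 'B']
```

Feature `B` is exactly as relevant as `A` but duplicates it, so the redundancy penalty pushes it below `C`, which carries no label information on its own but shares little with `A`.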
Statistics of all data sets.
| Data set | Platform | PDAC | PDAC_adjacent | Pancreatitis | Normal |
| GSE62452 | Affymetrix GPL6244 | 69 | 61 | – | – |
| GSE28735 | Affymetrix GPL6244 | 45 | 45 | – | – |
| GSE22780 | Affymetrix GPL570 | 8 | 8 | – | – |
| GSE15471 | Affymetrix GPL570 | 39 | 39 | – | – |
| GSE50827 | Illumina GPL10558 | 103 | – | – | – |
| GSE106189 | Affymetrix GPL570 | 35 | – | – | – |
| GSE84219 | Illumina GPL14951 | 30 | – | – | – |
| GSE98399 | Affymetrix GPL570 | 43 | – | – | – |
| GSE62165 | Affymetrix GPL13667 | 118 | – | – | 13 |
| GSE32676 | Affymetrix GPL570 | 25 | – | – | 7 |
| GSE101462 | Illumina GPL10558 | 6 | – | 10 | 4 |
| GSE101448 | Illumina GPL10558 | 24 | – | – | 19 |
| GSE41368 | Affymetrix GPL6244 | 6 | – | – | 6 |
| GSE60601 | Affymetrix GPL570 | 9 | – | – | 3 |
| GSE71989 | Affymetrix GPL570 | 13 | – | – | 8 |
| GSE89120 | Affymetrix GPL1352 | – | – | – | 14 |
| Total | – | 573 | 153 | 10 | 74 |
| Samples for assessing the efficiency of the signature | | | | | |
| TCGA | RNA-Seq | 177 | 4 | – | – |
| Total | – | 177 | 4 | – | – |
FIGURE 1 Schematic workflow of the analyses.
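Comparison of different methods for identifying an early PDAC diagnostic biomarker. Training-set values are from fivefold cross-validation. ACR, accuracy; SES, sensitivity; SPF, specificity; MCC, Matthews correlation coefficient.
| Method | Training ACR (%) | Training SES (%) | Training SPF (%) | Training MCC | Testing ACR (%) | Testing SES (%) | Testing SPF (%) | Testing MCC |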
| SVM | 97.53 | 97.96 | 93.22 | 0.8615 | 98.77 | 98.65 | 100.00 | 0.9330 |
| Decision tree | 96.91 | 97.78 | 88.52 | 0.8278 | 95.09 | 97.92 | 73.68 | 0.7518 |
| Logistic regression | 96.91 | 98.11 | 86.15 | 0.8314 | 96.93 | 99.30 | 80.00 | 0.8513 |
| Random forest | 96.60 | 97.61 | 86.89 | 0.8104 | 96.93 | 99.30 | 80.00 | 0.8513 |
| Naïve Bayes | 96.14 | 98.94 | 76.25 | 0.8124 | 96.32 | 99.30 | 76.19 | 0.8274 |
| Bayes net | 95.83 | 98.59 | 75.64 | 0.7933 | 95.70 | 99.29 | 72.73 | 0.8051 |
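The training-set accuracy of 97.53% for SVM is the fivefold cross-validation figure quoted in the abstract. A minimal sketch of such a protocol in plain Python follows; stratifying folds by class and pooling correct predictions across folds are assumptions (the text does not say how folds were formed), and `stratified_folds`, `cross_validated_accuracy`, the `fit_predict` callback and the majority-class stand-in are hypothetical. Any of the six classifiers in the table could be supplied as `fit_predict`.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        # Continue the round-robin so small classes do not all start at fold 0.
        offset += len(idx)
    return folds


def cross_validated_accuracy(X, y, fit_predict, k=5, seed=0):
    """Pooled k-fold accuracy: correct predictions over all held-out samples.

    fit_predict(X_train, y_train, X_test) -> list of predicted labels.
    """
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)


def majority_class(X_train, y_train, X_test):
    """Trivial stand-in classifier: always predict the most common training label."""
    most_common = max(set(y_train), key=y_train.count)
    return [most_common] * len(X_test)


toy_y = [1] * 8 + [0] * 2
toy_X = [[i] for i in range(len(toy_y))]
print(cross_validated_accuracy(toy_X, toy_y, majority_class))  # 0.8
```

Passing the same `seed` for every classifier gives each one identical folds, which keeps a side-by-side comparison like the table above like-for-like.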
FIGURE 2 The incremental feature selection (IFS) curve. The black dotted line shows that nine gene pairs reached the highest accuracy of 97.53%.
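An IFS curve of this kind is typically produced by adding features one at a time in mRMR order, re-evaluating the classifier after each addition, and keeping the feature count at the peak. A minimal sketch, assuming the caller supplies an `evaluate` function (for example, fivefold cross-validated SVM accuracy); `incremental_feature_selection` and the accuracy values below are hypothetical, not the paper's curve.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score the top-1, top-2, ..., top-n features of a ranking.

    ranked_features: feature names, best first (e.g. mRMR order)
    evaluate: callable taking a list of feature names and returning a score
    Returns (best_k, best_score, curve), where curve[k - 1] is the score of the top k.
    """
    curve = [evaluate(ranked_features[:k]) for k in range(1, len(ranked_features) + 1)]
    best_score = max(curve)
    # Assumption: on ties, keep the smallest feature set reaching the maximum.
    best_k = curve.index(best_score) + 1
    return best_k, best_score, curve


# Hypothetical accuracies for the top-1 ... top-4 features of a toy ranking.
toy_accuracy = {1: 0.80, 2: 0.88, 3: 0.91, 4: 0.90}
ranked = ["pair_1", "pair_2", "pair_3", "pair_4"]
best_k, best_score, curve = incremental_feature_selection(
    ranked, lambda feats: toy_accuracy[len(feats)]
)
print(best_k, best_score)  # 3 0.91
```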
The nine gene pairs of the signature, ranked by mRMR.
| Order | Gene i | Gene j |
| 1 | UBE2C | FITM1 |
| 2 | SERPINB5 | ZNF100 |
| 3 | NUSAP1 | ONECUT1 |
| 4 | LAMC2 | RBM33 |
| 5 | BCAR3 | FBXO42 |
| 6 | CTSE | PRRC2C |
| 7 | HOXB7 | MYO19 |
| 8 | NUSAP1 | TNKS |
| 9 | RRM2 | ONECUT1 |
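Each signature feature is a within-sample relative expression ordering: for one sample, it records whether Gene i is expressed above Gene j. A minimal sketch of that encoding for the nine pairs listed in the table, assuming a feature value of 1 when Gene i > Gene j and 0 otherwise (ties count as 0); `reo_features` and the toy expression values are hypothetical, and the trained SVM that consumes these features is not shown.

```python
# The nine (Gene i, Gene j) pairs in mRMR order, as listed in the table.
SIGNATURE = [
    ("UBE2C", "FITM1"),
    ("SERPINB5", "ZNF100"),
    ("NUSAP1", "ONECUT1"),
    ("LAMC2", "RBM33"),
    ("BCAR3", "FBXO42"),
    ("CTSE", "PRRC2C"),
    ("HOXB7", "MYO19"),
    ("NUSAP1", "TNKS"),
    ("RRM2", "ONECUT1"),
]


def reo_features(expression, pairs=SIGNATURE):
    """Encode one sample as binary REO features.

    expression: dict mapping gene symbol -> expression value for a single sample
    Returns one 0/1 value per pair: 1 if Gene i is expressed above Gene j.
    """
    return [1 if expression[gi] > expression[gj] else 0 for gi, gj in pairs]


# Hypothetical toy sample: the 16 genes get values 1..16 in alphabetical order.
genes = sorted({g for pair in SIGNATURE for g in pair})
sample = {g: float(i + 1) for i, g in enumerate(genes)}
print(reo_features(sample))  # [1, 0, 0, 0, 0, 0, 0, 0, 1]
```

Because every feature compares two genes inside the same sample, any per-sample transformation that preserves the order of expression values (log scaling, multiplying by a constant) leaves the encoding unchanged, which is why REO-based signatures lend themselves to cross-platform use.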
Classification performance of the nine-gene-pair signature on the independent test data sets.
| Data set | PDAC | PDAC_adjacent | Pancreatitis | Normal | ACR | SES | SPF | MCC |
| Testing set | 115 | 31 | 2 | 15 | 98.77% | 98.65% | 100.00% | 0.9330 |
| TCGA | 177 | 4 | – | – | – | 100.00% | – | – |
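A minimal sketch of the standard definitions of ACR, SES, SPF and MCC, applied to the microarray test-set counts stated in the abstract (all 146 PDAC and PDAC-adjacent samples called PDAC; 15 of 17 non-PDAC samples correctly classified), with PDAC as the positive class. These counts reproduce the reported ACR (98.77%) and MCC (0.9330), and the standard sensitivity and specificity come out at the 100.00% and 88.24% quoted in the abstract. `classification_metrics` is a hypothetical helper name.

```python
from math import sqrt


def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity, Matthews correlation coefficient)."""
    acr = (tp + tn) / (tp + fn + tn + fp)
    ses = tp / (tp + fn)  # fraction of positive (PDAC) samples recovered
    spf = tn / (tn + fp)  # fraction of negative (non-PDAC) samples recovered
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acr, ses, spf, mcc


# Test-set counts taken from the abstract; PDAC-adjacent tissue counts as positive.
acr, ses, spf, mcc = classification_metrics(tp=146, fn=0, tn=15, fp=2)
print(f"ACR {acr:.2%}  SES {ses:.2%}  SPF {spf:.2%}  MCC {mcc:.4f}")
# ACR 98.77%  SES 100.00%  SPF 88.24%  MCC 0.9330
```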
FIGURE 3 ROC curve for the independent test data sets.
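The caption does not say how the per-sample scores behind the curve were obtained. A minimal sketch, assuming each test sample has a continuous score such as an SVM decision value (label 1 = PDAC); `roc_points`, `roc_auc` and the toy scores are hypothetical. The area is computed with the Mann-Whitney formulation (ties count as one half), which equals the trapezoidal area under the ROC curve traced through every distinct threshold.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold through each distinct score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for five test samples.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
toy_labels = [1, 1, 0, 1, 0]
print(round(roc_auc(toy_scores, toy_labels), 4))  # 0.8333
```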
Reported associations between seven of the signature genes and PDAC.
| Gene symbol | Association with PDAC |
| UBE2C | Silencing UBE2C can inhibit proliferation and epithelial–mesenchymal transition in PDAC ( |
| SERPINB5 | SERPINB5 is linked to the prognosis of PDAC ( |
| LAMC2 | LAMC2 is associated with PDAC occurrence and progression (54–56). High LAMC2 expression could facilitate PDAC cell invasion and thus increase the risk of tumor recurrence ( |
| CTSE | Because patients with pancreatic diseases such as chronic pancreatitis are at high risk of developing PDAC, CTSE expression in these diseases might be key to detecting early PDAC and its progression |
| HOXB7 | HOXB7 is overexpressed in PDAC and is closely associated with lymph node metastasis ( |
| RRM2 | RRM2 expression was significantly higher in PDAC tissues than in normal pancreatic tissues ( |
| ONECUT1 | Loss of ONECUT1 expression in PDAC cells implies a tumor-suppressor function in this malignancy ( |