Jin-Hui Zhu1, Qiu-Liang Yan2, Jian-Wei Wang3, Yan Chen1, Qing-Huang Ye1, Zhi-Jiang Wang1, Tao Huang4.
Abstract
BACKGROUND: Pancreatic ductal adenocarcinoma (PDAC) is the most aggressive form of pancreatic cancer, with a 5-year survival rate of only 3-5%. Perineural invasion (PNI) is the process by which cancer cells invade the surrounding nerves and perineural spaces, and it is considered to be associated with the poor prognosis of PDAC. About 90% of pancreatic cancer patients have PNI. This high incidence limits radical resection and promotes local recurrence, which reduces the quality of life and survival time of patients with pancreatic cancer.
Keywords: Monte-Carlo feature selection; incremental feature selection; pancreatic ductal adenocarcinoma; perineural invasion; support vector machine
Year: 2020 PMID: 33193628 PMCID: PMC7593847 DOI: 10.3389/fgene.2020.554502
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. The incremental feature selection (IFS) curve for final key feature selection. The X-axis is the number of features and the Y-axis is the prediction accuracy evaluated with leave-one-out cross-validation (LOOCV). The accuracy was highest, at 0.96, when 175 genes were used, but it was still 0.94 when only 26 genes were used. Balancing model complexity and performance, we chose the 26 genes as the final key features and their SVM predictor as the optimized predictor for perineural invasion.
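The trade-off described in this caption can be read as a simple rule: take the smallest feature count whose LOOCV accuracy stays within an accepted margin of the peak. The sketch below is only an illustration of that reading. The function name `pick_feature_count` and the 0.02 tolerance are assumptions, not something the paper states, and only the two curve points reported in the caption are used.

```python
def pick_feature_count(ifs_curve, tolerance):
    """Return the smallest feature count whose accuracy is within
    `tolerance` of the best accuracy on the IFS curve.

    ifs_curve: dict mapping number of features -> LOOCV accuracy.
    tolerance: accepted accuracy loss (an assumption, not from the paper).
    """
    peak = max(ifs_curve.values())
    # round() guards against floating-point noise such as 0.96 - 0.94.
    candidates = [
        n for n, acc in ifs_curve.items()
        if round(peak - acc, 10) <= tolerance
    ]
    return min(candidates)


# Only the two points reported in the Figure 1 caption are used here.
reported_points = {175: 0.96, 26: 0.94}

print(pick_feature_count(reported_points, tolerance=0.02))  # 26
print(pick_feature_count(reported_points, tolerance=0.0))   # 175
```

With a zero tolerance the rule falls back to the peak itself (175 genes); allowing a loss of 0.02 selects the 26-gene signature reported in the caption.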
The 26 key discriminative features between patients with perineural invasion and without perineural invasion.
| Rank | Tissue | Probe | RI | Probe sequence | blastn Score | blastn Identities | Chromosome position | Gene symbol |
| 1 | Adjacent | p12601 | 0.0518 | GGCTGAAGGAGAACTTCAATATCATATATTTTAAA GGTTGACTCACAGTTTGGAACAAGA | 111bits (60) | 60/60 (100%) | chr4:3545147-3545206 | AL590235.1 |
| 2 | Tumor | p18602 | 0.0395 | CTTGCCATTTAGCCACTGTTACTGATAATTTGGAT GGAAAAGAAAAATAAATTATATCAC | 111bits (60) | 60/60 (100%) | chr12:25944921-25944862 | RASSF8-AS1 |
| 3 | Tumor | A_24_P88801 | 0.0370 | CAGACGGAGTTCATGAACCTTTTGACCTTTCAG AGCAGACCTATGACTTCTTGGGTGAAA | 111bits (60) | 60/60 (100%) | chr2:110123870-110123811 | NPHP1 |
| 4 | Tumor | A_33_P3329128 | 0.0352 | GGTGGCTGGACAATCATTTGCTAAAGATATT AGGCATGTGACTCATGAATCCAATCAATC | No significant similarity found | | | |
| 5 | Adjacent | A_21_P0010506 | 0.0271 | TTAGGCCAAGTGTGGAGAAATCAATGATGT TGACGATGAGGCTCCCTGAGAGAAATCACA | 111bits (60) | 60/60 (100%) | chr1:2565049-2565108 | TNFRSF14 |
| 6 | Adjacent | A_24_P941759 | 0.0243 | ATGCTTCAAGTAATGCAATACAAAACATA ACCCATTTAATGATGAATTACTTGAATGTAT | 111bits (60) | 60/60 (100%) | chr14:30619533-30619592 | G2E3 |
| 7 | Adjacent | p14843 | 0.0238 | AGTGACTGATTTGAAACCAGTTGTACCGTA TTATTAGGAAAGGCGCCTTGATGAAAAGAT | 80.5 bits (43) | 43/43 (100%) | chr7:19813727-19813685 | AC004543.1 |
| 8 | Adjacent | p28694 | 0.0238 | CATTTACTGGGCCTGAAATGGGAAAATGAA AGATGTGGCAAGAAACTGACAAGGGCCCAA | 111bits (60) | 60/60 (100%) | chr1:1660100-1660159 | FO704657.1/SLC35E2B |
| 9 | Tumor | p4684 | 0.0236 | AAGGTGTTGAAGCATAGACGCTGGAACATAAAATG ACTCATGATCTCACTGGGAGAAGGG | 111bits (60) | 60/60 (100%) | chr14:101635450-101635391 | LINC02320 |
| 10 | Adjacent | A_23_P170088 | 0.0214 | ATCCTTTCTGTGCTGCTTTAGGCATCTGCC CTTACGTGGTTCGTGTCCAGCTCTGTCAAC | 106 bits (57) | 59/60 (98%) | chr9:137392642-137392583 | EXD3 |
| 11 | Tumor | A_23_P398372 | 0.0198 | TCAGGTAGAGAAAGCAAAAAATCTCTGGCCGTAA ACCGTGCTCTCTAATTTATCGGCAGC | 111bits (60) | 60/60 (100%) | chr9:136115009-136114950 | TMEM250 |
| 12 | Tumor | A_24_P277673 | 0.0191 | CAAGGTACTGAGCGATAATATTCAGGGCATTACC AAGTGCACTATCCGGCGCTTGGCCCG | 111bits (60) | 60/60 (100%) | chr6:26246918-26246859 | HIST1H4G |
| 13 | Tumor | A_19_P00803575 | 0.0190 | AAGCAGCTTGTATAATTCCAACTGGTGTTT CATTTCTGTTCTAATGCTAAGTGGTAACGC | 111bits (60) | 60/60 (100%) | chr17:17192194-17192253 | MPRIP |
| 14 | Tumor | A_33_P3802966 | 0.0190 | GAGGGGGTTAACATAACGCGGACCGATCCCA AATGGCATTGATGAGTGTACCTCCCACGA | No significant similarity found | | | |
| 15 | Adjacent | A_32_P207124 | 0.0166 | CCCTGCCCTTCCCATCTTAGGGTGTCGTCT GAGACAGACTCTTATTCCCTCAATAAAGAG | 111bits (60) | 60/60 (100%) | chrX:120932228-120932169 | CT47A12 |
| 16 | Tumor | A_23_P410128 | 0.0165 | AAACCAACAAATAAAAGCATGATAAATTGACT ATATCAAAATTTAAAACTTCTCTATGAC | 111bits (60) | 60/60 (100%) | chr22:42027707-42027766 | WBP2NL |
| 17 | Adjacent | p28485 | 0.0157 | AGTCTCAGGGCTAGACGTATTCCAAATATTTGGAT AATTCAAAGTAATTTGCACAGACAT | 111bits (60) | 60/60 (100%) | chr7:28738207-28738148 | CREB5 |
| 18 | Tumor | p13289 | 0.0155 | CTAGGGTGCTCTATGCTGTGATGCTATCAAATC TTCATGGATTTTTCCAGGATCCTCAAA | 111bits (60) | 60/60 (100%) | chr5:92432262-92432203 | AC114316.1/AC124854.1 |
| 19 | Tumor | A_33_P3349334 | 0.0149 | CAGGCTTGTATGATCTATTCCTTACCACAAAAG AAGTAGACAATTGCCACTTTTATTTCT | 111bits (60) | 60/60 (100%) | chr14:55049328-55049387 | SOCS4 |
| 20 | Adjacent | A_23_P40078 | 0.0146 | GGGTATTTGTCGACCAAAATGATGCCAATTTG TAAATTAAAATGTCACCTAGTGGCCCTT | 111bits (60) | 60/60 (100%) | chr2:61478760-61478701 | AC016727.1/XPO1 |
| 21 | Adjacent | RNA143544\|tRNA_461_68 | 0.0145 | AGAAAAACCATTTCATAACTTTGTCAAAGTTAAA TTATAGGCTAAATCCTATATACCTTA | 106 bits (57) | 59/60 (98%) | chrM:7526-7585 | MT-TD |
| 22 | Tumor | A_33_P3259817 | 0.0142 | AGAAATGAAAGCCAACTACAGGGAAATGGC GAAGGAGCTTTCTGAAATCATGCATGAGCA | 111bits (60) | 60/60 (100%) | chr13:98797175-98797116 | DOCK9 |
| 23 | Tumor | RNA95815\|RNS_897_109 | 0.0141 | TAAAAGGAGAAAGGGAGGGGCCTTGTGAGGTG AAGGGTGTCCTTATACAGGTGTGACAGC | 111bits (60) | 60/60 (100%) | chr20:56660085-56660144 | |
| 24 | Adjacent | A_24_P33895 | 0.0139 | TCCAGAAGATGAGAGAAACCTCTTTATCCAACA GATAAAAGAAGGAACATTGCAGAGCTA | 111bits (60) | 60/60 (100%) | chr1:212619495-212619554 | ATF3 |
| 25 | Adjacent | p29186 | 0.0138 | AGGCGGGGGATGCTGTGTGGCACCTCCTATTG TCTCTTTTTGCGTTTTCTCCCATTCTCG | 111bits (60) | 60/60 (100%) | chr1:19188401-19188460 | UBR4 |
| 26 | Tumor | A_33_P3351785 | 0.0137 | GCATCAAAATCAACAAAAAACCAGAATATAGTC CCAAAAGAGAAATCCACCAAGTACCAT | 111bits (60) | 60/60 (100%) | chr20:8871665-8871724 | PLCB1 |
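The "Chromosome position" column can be cross-checked against the "blastn Identities" column. The sketch below parses a position string and reports the span length and an inferred orientation. It assumes that positions are 1-based inclusive and that a start larger than the end marks a minus-strand hit, as in standard BLAST output; the table itself does not state this. The helper name `parse_hit` is hypothetical.

```python
import re

# Matches strings such as "chr12:25944921-25944862" or "chrM:7526-7585".
POSITION = re.compile(r"^(chr[0-9A-Za-z]+):(\d+)-(\d+)$")


def parse_hit(position):
    """Split a blastn hit position into (chrom, start, end, strand, length).

    Assumption: coordinates are 1-based and inclusive, and first > second
    means the probe aligned to the minus strand.
    """
    match = POSITION.match(position)
    if match is None:
        raise ValueError(f"unrecognised position: {position!r}")
    chrom = match.group(1)
    first, second = int(match.group(2)), int(match.group(3))
    strand = "+" if first <= second else "-"
    length = abs(second - first) + 1
    return chrom, min(first, second), max(first, second), strand, length


# Rank 1 (AL590235.1): full 60-nt match in forward orientation.
print(parse_hit("chr4:3545147-3545206"))
# Rank 7 (AC004543.1): 43-nt match, consistent with identities 43/43.
print(parse_hit("chr7:19813727-19813685"))
```

Under these assumptions the computed lengths agree with the table: 60 nt for the full-length hits and 43 nt for probe p14843.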
Ranking of the 26 key features selected by the Monte-Carlo method in the other seven feature selection methods.
| Monte Carlo | Best Rank in other seven methods | ChiSquared | Correlation | GainRatio | InfoGain | OneR | ReliefF | SymmetricalUncert |
| 1 | OneR 4 | 6 | 12 | 51 | 12 | 4 | 21 | 15 |
| 2 | ChiSquared 14 | 14 | 49 | 52 | 24 | 94 | 306 | 30 |
| 3 | GainRatio 2 | 18 | 1121 | 2 | 16 | 267 | 11 | 9 |
| 4 | GainRatio 4 | 20 | 501 | 4 | 15 | 227 | 220 | 8 |
| 5 | SymmetricalUncert 5 | 13 | 38 | 13 | 10 | 527 | 1602 | 5 |
| 6 | ReliefF 4 | 26 | 43 | 168 | 39 | 613 | 4 | 67 |
| 7 | Correlation 8 | 10 | 8 | 135 | 19 | 1839 | 80 | 29 |
| 8 | SymmetricalUncert 3 | 11 | 74 | 12 | 9 | 6306 | 121 | 3 |
| 9 | ChiSquared 25 | 25 | 47 | 170 | 38 | 5466 | 194 | 69 |
| 10 | Correlation 5 | 40 | 5 | 140 | 48 | 760 | 545 | 72 |
| 11 | GainRatio 1 | 19 | 1205 | 1 | 18 | 10 | 67 | 11 |
| 12 | GainRatio 29 | 123 | 1697 | 29 | 114 | 11583 | 8543 | 46 |
| 13 | Correlation 1 | 31 | 1 | 17 | 13 | 73 | 3 | 14 |
| 14 | Correlation 6 | 45 | 6 | 9 | 31 | 1037 | 160 | 22 |
| 15 | ChiSquared 1, InfoGain 1, SymmetricalUncert 1 | 1 | 89 | 11 | 1 | 13 | 7 | 1 |
| 16 | SymmetricalUncert 4 | 12 | 239 | 14 | 11 | 4030 | 128 | 4 |
| 17 | ReliefF 38 | 100 | 2787 | 238 | 155 | 1528 | 38 | 269 |
| 18 | ChiSquared 53 | 53 | 106 | 417 | 57 | 366 | 572 | 192 |
| 19 | GainRatio 3 | 17 | 1935 | 3 | 17 | 609 | 115 | 10 |
| 20 | ChiSquared 22 | 22 | 581 | 406 | 34 | 4094 | 335 | 97 |
| 21 | ChiSquared 15 | 15 | 3593 | 251 | 26 | 2325 | 489 | 66 |
| 22 | ChiSquared 16 | 16 | 21 | 250 | 25 | 45 | 56 | 65 |
| 23 | ChiSquared 41 | 41 | 303 | 139 | 50 | 3739 | 329 | 70 |
| 24 | ChiSquared 62 | 62 | 2882 | 404 | 120 | 197 | 2038 | 259 |
| 25 | ReliefF 429 | 10141 | 12902 | 10141 | 10141 | 6549 | 429 | 10141 |
| 26 | ReliefF 52 | 71 | 1801 | 438 | 142 | 11874 | 52 | 288 |
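The "Best Rank in other seven methods" column can be recomputed from the seven per-method ranks in each row. The sketch below shows one way to do that, keeping ties. The function name `best_rank` is hypothetical, and the three example rows are copied from the table above.

```python
# Column order of the seven Weka-based methods in the table.
METHODS = [
    "ChiSquared", "Correlation", "GainRatio", "InfoGain",
    "OneR", "ReliefF", "SymmetricalUncert",
]


def best_rank(ranks):
    """Return (methods, rank) for the best (lowest) rank in one table row.

    ranks: seven integers in the same order as METHODS.
    Ties are kept, so several method names may be returned.
    """
    if len(ranks) != len(METHODS):
        raise ValueError("expected one rank per method")
    best = min(ranks)
    winners = [m for m, r in zip(METHODS, ranks) if r == best]
    return winners, best


# Rows 1, 15 and 25 of the table (Monte Carlo ranks 1, 15 and 25).
print(best_rank([6, 12, 51, 12, 4, 21, 15]))
print(best_rank([1, 89, 11, 1, 13, 7, 1]))
print(best_rank([10141, 12902, 10141, 10141, 6549, 429, 10141]))
```

For these rows the function reproduces the table entries: OneR 4 for the first feature, a three-way tie at rank 1 for the fifteenth, and ReliefF 429 for the twenty-fifth, the feature that the other methods rank worst.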
FIGURE 2. The IFS curves of seven other feature selection methods from Weka. (A) The IFS curve of ChiSquaredAttributeEval; (B) The IFS curve of CorrelationAttributeEval; (C) The IFS curve of GainRatioAttributeEval; (D) The IFS curve of InfoGainAttributeEval; (E) The IFS curve of OneRAttributeEval; (F) The IFS curve of ReliefFAttributeEval; (G) The IFS curve of SymmetricalUncertAttributeEval. Their peak accuracies were 0.88, 0.94, 0.90, 0.88, 0.76, 0.92, and 0.88, all lower than the highest accuracy of Monte-Carlo feature selection, which was 0.96.
The enriched functions of the 15 genes from tumor tissues, the 11 genes from adjacent tissues, and the combined 26 genes using GATHER.
| Gene set | Database | Function | # Genes | Genes | P value | Bayes factor |
| The 15 genes from tumor | Gene Ontology | GO:0016043: cell organization and biogenesis | 3 | HIST1H4G NPHP1 SOCS4 | 0.0004 | 4 |
| The 15 genes from tumor | Gene Ontology | GO:0006996: organelle organization and biogenesis | 2 | HIST1H4G NPHP1 | 0.006 | 2 |
| The 15 genes from tumor | TRANSFAC | V$HEN1_02: HEN1 | 2 | NPHP1 SOCS4 | 0.0003 | 4 |
| The 15 genes from tumor | TRANSFAC | V$NFKB_Q6: NF-kappaB | 3 | DOCK9 NPHP1 SOCS4 | 0.002 | 3 |
| The 15 genes from tumor | TRANSFAC | V$HNF1_Q6 | 2 | NPHP1 SOCS4 | 0.004 | 2 |
| The 11 genes from adjacent | KEGG Pathway | hsa04060: Cytokine-cytokine receptor interaction | 1 | TNFRSF14 | 0.002 | 2 |
| The 11 genes from adjacent | TRANSFAC | V$USF_01: upstream stimulating factor | 4 | ATF3 CREB5 TNFRSF14 XPO1 | <0.0001 | 7 |
| The 11 genes from adjacent | TRANSFAC | V$MYC_Q2 | 4 | ATF3 CREB5 TNFRSF14 XPO1 | 0.0002 | 5 |
| The 11 genes from adjacent | TRANSFAC | V$NMYC_01: N-Myc | 4 | ATF3 CREB5 TNFRSF14 XPO1 | 0.0004 | 4 |
| The 11 genes from adjacent | TRANSFAC | V$MYCMAX_02: c-Myc:Max heterodimer | 4 | ATF3 CREB5 TNFRSF14 XPO1 | 0.001 | 3 |
| The 11 genes from adjacent | TRANSFAC | V$AML_Q6 | 3 | ATF3 CREB5 TNFRSF14 | 0.002 | 2 |
| The 11 genes from adjacent | TRANSFAC | V$SRF_Q6: serum response factor | 2 | ATF3 CREB5 | 0.003 | 2 |
| Combined 26 genes | TRANSFAC | V$NFKB_Q6: NF-kappaB | 5 | CREB5 DOCK9 NPHP1 SOCS4 XPO1 | 0.0008 | 3 |
FIGURE 3. The STRING network of the 26 genes. The 26 genes were mapped onto the STRING network (https://string-db.org/). The light-yellow nodes were genes from adjacent tissues while the light-blue nodes were genes from tumor tissues. The genes from tumors can be grouped into three clusters and the hub genes of these three clusters were ATF3, XPO1, and TNFRSF14, which were highlighted in red.