Lei Chen, XiaoYong Pan, Tao Zeng, Yu-Hang Zhang, YunHua Zhang, Tao Huang, Yu-Dong Cai.
Abstract
Liquid biopsy (i.e., fluid biopsy) encompasses a series of clinical examination approaches. Monitoring a patient's cancer immunological status through the "immunosignature" offers a novel method for tumor-associated liquid biopsy. The main task, and the core technical difficulty, in monitoring the cancer immunosignature is recognizing cancer-related immune-activating antigens with high-throughput screening approaches. Currently, one key task of immunosignature-based liquid biopsy is the qualitative and quantitative identification of typical tumor-specific antigens. In this study, we reused two sets of peptide microarray data that measured the expression levels of potential antigenic peptides derived from tumor tissues; the two sets were analyzed separately to avoid detection differences induced by the chip platforms. Several machine learning algorithms were applied to these two sets. First, the Monte Carlo Feature Selection (MCFS) method was used to analyze the features in each set, and a ranked feature list was obtained from the MCFS results on each set. Second, the incremental feature selection (IFS) method, incorporating one classification algorithm (support vector machine or random forest), was used to extract optimal features and construct optimal classifiers. In addition, repeated incremental pruning to produce error reduction (RIPPER), a rule learning algorithm, was applied to the key features yielded by MCFS to extract quantitative rules for accurate cancer immune monitoring and pathologic diagnosis. Finally, the key features and quantitative rules obtained were analyzed extensively.
Keywords: cancer subtype; expression rule; feature selection; immunosignature; multi-class classification
Year: 2019 PMID: 31850330 PMCID: PMC6901955 DOI: 10.3389/fbioe.2019.00370
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
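The MCFS step mentioned in the abstract ranks features by a relative importance (RI) score. This entry does not state the formula or the parameter values used; the form below is the one commonly quoted for MCFS as introduced by Dramiński et al., included only as a reference sketch. MCFS draws s random subsets of m features, grows t classification trees on each subset (each on a random training/test split), and accumulates RI over all st trees:

```latex
\mathrm{RI}_{g_k} = \sum_{\tau=1}^{st} \left(\mathrm{wAcc}_{\tau}\right)^{u}
  \sum_{n_{g_k}(\tau)} \mathrm{IG}\!\left(n_{g_k}(\tau)\right)
  \left(\frac{\text{no. in } n_{g_k}(\tau)}{\text{no. in } \tau}\right)^{v}

\mathrm{wAcc}_{\tau} = \frac{1}{c} \sum_{i=1}^{c} \frac{n_{ii}}{n_{i1} + n_{i2} + \cdots + n_{ic}}
```

Here $n_{g_k}(\tau)$ ranges over the nodes of tree $\tau$ that split on feature $g_k$, IG is the information gain at that node, "no. in" counts the training samples reaching the node or the tree, $c$ is the number of classes, $n_{ij}$ is the number of class-$i$ test samples that tree $\tau$ assigns to class $j$, and $u$, $v$ are weighting exponents (both commonly set to 1). Sorting features by RI in descending order gives the ranked list that the IFS step then scans.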
Figure 1. Overall workflow for analyzing the peptide microarray data from Gene Expression Omnibus with advanced machine learning algorithms.
Figure 2. Performance of random forest (RF) on different feature subsets. (A) Performance of RF on feature subsets built with a step of 10; (B) performance of RF on feature subsets containing the top 40–60 features.
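The two panels of Figure 2 suggest a two-stage IFS search: a coarse scan over the top 10, 20, 30, ... ranked features, then a finer scan inside a narrower range. The sketch below shows that pattern under stated assumptions: `evaluate` is a hypothetical stand-in for whatever cross-validated RF or SVM score the authors computed, and the fine stage is assumed to use a step of 1 within a window (here 10) around the coarse optimum.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer: takes a feature subset and returns a performance value
# (for example, a cross-validated score of an RF or SVM trained on those features).
Scorer = Callable[[Sequence[str]], float]


def scan_sizes(ranked: Sequence[str], evaluate: Scorer, sizes: Sequence[int]) -> Tuple[int, float]:
    """Score the top-k ranked features for every k in `sizes`; return the best (k, score)."""
    best_k, best_score = sizes[0], float("-inf")
    for k in sizes:
        score = evaluate(ranked[:k])
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_k, best_score = k, score
    return best_k, best_score


def two_stage_ifs(ranked: Sequence[str], evaluate: Scorer, step: int = 10, window: int = 10) -> Tuple[int, float]:
    """Coarse IFS scan with a fixed step, then a step-1 scan around the coarse optimum."""
    n = len(ranked)
    coarse: List[int] = list(range(step, n + 1, step))
    if not coarse or coarse[-1] != n:
        coarse.append(n)  # always include the full feature list
    k0, _ = scan_sizes(ranked, evaluate, coarse)
    lo, hi = max(1, k0 - window), min(n, k0 + window)
    return scan_sizes(ranked, evaluate, list(range(lo, hi + 1)))
```

With `step=10` and `window=10`, a coarse optimum at 50 features leads to a fine scan over 40–60, which matches the range plotted in panel (B). The real scoring metric and cross-validation scheme are not stated in this entry.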
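Classification performance of the optimal classifiers on the two datasets.
| Dataset | Classifier | Number of features | Performance |
| --- | --- | --- | --- |
| Dataset 1 | RF | 46 | 0.985 |
| Dataset 2 | SVM | 2,846 | 0.952 |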
Figure 3. Performance of three classifiers on the six groups in dataset-1, and their overall accuracy.
Figure 4. Confusion map of the seven classification rules on dataset-1.
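Figures 3, 4, 6 and 7 report per-group accuracy, overall accuracy, and confusion maps. The entry does not say how the predictions behind them were produced (for example, which cross-validation scheme was used), so the sketch below only shows the standard definitions, assuming predicted labels are already available. Rows of the matrix are true groups and columns are predicted groups; the function names are illustrative, not taken from the authors' code.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     classes: Sequence[Hashable]) -> List[List[int]]:
    """Count (true, predicted) pairs; rows and columns follow the order of `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of each true class predicted correctly (diagonal entry over row total)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / totals[c] for c in totals}


def overall_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Share of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

A "confusion map" is then simply this matrix drawn as a heat map, often after dividing each row by its total so that the diagonal shows the per-class accuracy.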
Seven rules detected for classifying the different diseases in dataset-1.
| Rule | Condition (peptide expression level) | Predicted class |
| --- | --- | --- |
| Rule1 | CSGAGFEGTGLRCSLLCLDR <= 0.795 | Esophageal cancer |
| Rule2 | CSGFQPMRYPFQDPYHGYGW <= 1.056 | Pancreatic cancer |
| Rule3 | CSGFLMEHQNLLERSEDAKA <= 0.569 | Healthy control |
| Rule4 | CSGTYEPHLVYLATFTDGIP <= 0.870 | Healthy control |
| Rule5 | CSGEKIGMEQHYNQWIELMR >= 1.036 | Multiple myeloma |
| Rule6 | CSGADFVTYATRRVQFMMHK >= 1.282 | Brain cancer |
| Rule7 | Others | Breast cancer |
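The rules in the dataset-1 table read as an ordered decision list of the kind RIPPER produces: a sample is checked against Rule1, Rule2, and so on, takes the class of the first rule it satisfies, and falls back to the "Others" class if none match. The sketch below assumes that the printed order is the firing order and treats each row as the single threshold condition shown; `classify` and the constant names are hypothetical, and the input values are assumed to be the same normalized peptide intensities the rules were learned on.

```python
from typing import List, Mapping, Tuple

# (peptide feature, comparison, threshold, predicted class), copied from the table above.
Rule = Tuple[str, str, float, str]

RULES_DATASET1: List[Rule] = [
    ("CSGAGFEGTGLRCSLLCLDR", "<=", 0.795, "Esophageal cancer"),
    ("CSGFQPMRYPFQDPYHGYGW", "<=", 1.056, "Pancreatic cancer"),
    ("CSGFLMEHQNLLERSEDAKA", "<=", 0.569, "Healthy control"),
    ("CSGTYEPHLVYLATFTDGIP", "<=", 0.870, "Healthy control"),
    ("CSGEKIGMEQHYNQWIELMR", ">=", 1.036, "Multiple myeloma"),
    ("CSGADFVTYATRRVQFMMHK", ">=", 1.282, "Brain cancer"),
]
DEFAULT_DATASET1 = "Breast cancer"  # the "Others" row


def classify(sample: Mapping[str, float], rules: List[Rule], default: str) -> str:
    """Return the class of the first rule the sample satisfies, else the default class."""
    for peptide, op, threshold, label in rules:
        value = sample[peptide]
        if (op == "<=" and value <= threshold) or (op == ">=" and value >= threshold):
            return label
    return default
```

For example, a sample whose six listed peptides all have an intensity of 1.0 fails Rule1 (1.0 > 0.795) but satisfies Rule2 (1.0 <= 1.056), so it is labeled "Pancreatic cancer" and the later rules are never checked.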
Figure 5. Performance of support vector machine (SVM) on different feature subsets. (A) Performance of SVM on feature subsets built with a step of 10; (B) performance of SVM on feature subsets containing the top 2,800–2,900 features.
Figure 6. Performance of three classifiers on the 15 groups in dataset-2, and their overall accuracy.
Figure 7. Confusion map of the 42 classification rules on dataset-2.
Forty-two rules detected for classifying the different diseases in dataset-2.
| Rule | Condition (peptide expression level) | Predicted class |
| --- | --- | --- |
| Rule1 | HQKNDSANTVITTWLTRGSC>=5.265 | Sarcoma |
| Rule2 | MNVHYAAQDVINFGAHQGSC>=1.497 | Glioblastoma multiforme |
| Rule3 | ELIAFRDFNWRGGVVAGGSC>=2.837 | Glioblastoma multiforme |
| Rule4 | VWGKGGMYEAHYRRNGEGSC>=2.360 | Glioblastoma multiforme |
| Rule5 | HDWNVAWELRRWKALIYGSC>=1.791 | Breast cancer stage IVa |
| Rule6 | KFPNEFRYRYNWRMQNPGSC>=7.729 | Breast cancer stage IVa |
| Rule7 | FHWNMYKNSESLFEEKQGSC>=2.110 | Oligodendroglioma |
| Rule8 | PGLTHNTLQYMATVLSVGSC>=1.876 | Oligodendroglioma |
| Rule9 | PGLTHNTLQYMATVLSVGSC>=1.876 | Oligodendroglioma |
| Rule10 | QVNKAVSWYLVWHLWHQGSC>=1.183 | Recurrent breast cancer |
| Rule11 | HYNRYMVIIGNWGKQPIGSC<=0.509 | Recurrent breast cancer |
| Rule12 | GDQHQLEPPYKKNQYMIGSC>=1.857 | Recurrent breast cancer |
| Rule13 | RQNTIRSRQKINLGGGDGSC>=1.853 | Recurrent breast cancer |
| Rule14 | PVGEVSSDYNRGPWRGTGSC>=1.977 | Recurrent breast cancer |
| Rule15 | SWIHGWLTITIYGFKERGSC>=1.631 | Recurrent breast cancer |
| Rule16 | DLVMPTNHESLSQLTGDGSC>=1.004 | Pancreatitis |
| Rule17 | LERGHRADMAYRDTFPMGSC>=2.128 | Pancreatitis |
| Rule18 | IKSRTGAEEIQIQMLLRGSC>=2.858 | Pancreatitis |
| Rule19 | LSERWAMGAHRDTASQTGSC>=1.540 | Ovarian cancer |
| Rule20 | ADVKMLWEWNDVKVLIIGSC>=4.318 | Ovarian cancer |
| Rule21 | VNFESFREPTFGSDGYSGSC>=2.353 | Mixed Oligo/Astrocytoma |
| Rule22 | LIVFTKGHRMYNDIPTNGSC<=0.434 | Mixed Oligo/Astrocytoma |
| Rule23 | YLSTSMEQEQEQVHGNWGSC>=2.247 | Mixed Oligo/Astrocytoma |
| Rule24 | TVKKMYNGGLASKNALYGSC<=0.171 | Mixed Oligo/Astrocytoma |
| Rule25 | TQGVAHFGQTHYPYQLEGSC>=1.942 | Lung cancer |
| Rule26 | YVQEHAQWKNMWELANGGSC>=2.325 | Lung cancer |
| Rule27 | FLKFMQKMSTVHIIWLNGSC<=0.118 | Lung cancer |
| Rule28 | TAKWYGIRNSQDEKVEAGSC>=1.756 | Lung cancer |
| Rule29 | YINSYPIAKPHGEEMQMGSC<=0.461 | Multiple myeloma |
| Rule30 | ERIYRDHFIHEHKANIIGSC<=0.545 | Multiple myeloma |
| Rule31 | TAHGKARDFDPAKNRYLGSC<=0.398 | Multiple myeloma |
| Rule32 | HFGIVISVMNEKEGALRGSC>=7.715 | Multiple myeloma |
| Rule33 | YFMWPFWWYSHVWGRDWGSC>=1.001 | Pancreatic cancer |
| Rule34 | WWWFHSLGLLAHIKIALGSC>=1.122 | Pancreatic cancer |
| Rule35 | IISNTTMAVLWMLQSSRGSC>=1.429 | Pancreatic cancer |
| Rule36 | TYQRRMGGVRGQQPYNKGSC>=2.089 | Breast cancer |
| Rule37 | PKQHGRQQNQGIFKPMLGSC>=2.538 | Breast cancer |
| Rule38 | FKETAMPVLNYPVGVNEGSC>=1.959 | Healthy normal donor |
| Rule39 | GEASDNYKWWWDHVVYPGSC>=1.854 | Astrocytoma |
| Rule40 | FFYKKDFTPRHTFQNRRGSC<=0.529 | Astrocytoma |
| Rule41 | APMKNIVSAKTKDFAYMGSC<=0.324 | Astrocytoma |
| Rule42 | Others | Healthy normal donor |
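To reuse a rule table like this one programmatically, its rows can be parsed straight from the text into (peptide, comparison, threshold, class) tuples plus a default class. This is a minimal sketch assuming every row holds exactly one threshold condition in the `PEPTIDE>=value` or `PEPTIDE <= value` form printed here; the pattern and function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# A single-condition row, e.g. "| Rule1 | HQKNDSANTVITTWLTRGSC>=5.265 | Sarcoma |".
RULE_ROW = re.compile(
    r"^\|\s*Rule\d+\s*\|\s*([A-Z]+)\s*(<=|>=)\s*(\d+(?:\.\d+)?)\s*\|\s*([^|]+?)\s*\|\s*$"
)
# The closing catch-all row, e.g. "| Rule42 | Others | Healthy normal donor |".
DEFAULT_ROW = re.compile(r"^\|\s*Rule\d+\s*\|\s*Others\s*\|\s*([^|]+?)\s*\|\s*$")


def parse_rule_table(lines: Iterable[str]) -> Tuple[List[Tuple[str, str, float, str]], Optional[str]]:
    """Return the rules in table order and the default class; other lines are ignored."""
    rules: List[Tuple[str, str, float, str]] = []
    default: Optional[str] = None
    for line in lines:
        line = line.strip()
        if m := RULE_ROW.match(line):
            peptide, op, threshold, label = m.groups()
            rules.append((peptide, op, float(threshold), label))
        elif m := DEFAULT_ROW.match(line):
            default = m.group(1)
    return rules, default
```

Because the parsed list keeps the table order, Rule8 and Rule9, which are printed identically, both appear in it; under first-match evaluation the second copy could never fire.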