Ao Zhang, Chi Wang, Shiji Wang, Liang Li, Zhongmin Liu, Suyan Tian.
Abstract
INTRODUCTION: Microarray experiments are now widely applied in cancer research, including research on lung cancer, one of the most common fatal human tumors. Non-small cell lung carcinoma (NSCLC) has two major histological types, adenocarcinoma (AC) and squamous cell carcinoma (SCC).
Year: 2014 PMID: 25335090 PMCID: PMC4198193 DOI: 10.1371/journal.pone.0110052
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Results of simulation studies.
| Max # of features | Simulation 1 (selected features/VizRank Score) | Simulation 2 (selected features/VizRank Score) | Simulation 3 (selected features/VizRank Score) |
| 3 | X4, X18, X3 (89.12%) | X10, X38, X5 (68.67%) | X1, X2, X75 (73.35%) |
| 4 | X4, X3, X18, X9 (90.44%) | X5, X10, X3, X38 (75.35%) | X1, X173, X2, X7 (75.78%) |
| 5 | X9, X2, X12, X3, X5 (93.06%) | X5, X10, X3, X38 (75.35%) | |
| 6 | X5, X4, X2, X12, X11, X3 (94.22%) | X6, X11, X2, X1, X10, X5 (78.04%) | X1, X28, X3, X32, X2, X83 (78.09%) |
| 7 | | X5, X4, X11, X38, X21, X3, X10 (79.75%) | X170, X1, X7, X2, X173 (78.34%) |
| 8 | X3, X5, X4, X6, X9, X2, X11, X12 (94.15%) | | X170, X1, X7, X2, X173 (78.34%) |
| Frequent ones | X3, X4, X9, X12, X5, X11, X2, X16, X6, X131, X18, X1 | X11, X16, X10, X6, X1, X3, X131, X2, X4, X7, X38, X5, X8, X9, X328, X12 | X1, X4, X72, X338, X3, X173, X2 |
Figure 1. Study flowchart.
Performance metrics of classifiers on the lung cancer test set (AC and SCC subtype classification).
| The data used (Total # of samples) | # of Genes | Error (%) | GBS (0) | BCM (1) | AUPR (1) |
| GSE10245, GSE18842, GSE31799 (151, 81 AC, 70 SCC) | 1 | 15.3 | NA | NA | NA |
| GSE10245, GSE18842, GSE2109, GSE31908 (175, 100 AC, 75 SCC) | 20 | 16 | 0.1153 | 0.8325 | 0.9416 |
| GSE10245, GSE18842, GSE2109 (only stage I & II, 145, 71 AC, 74 SCC) | 3 | 16.67 | – | – | – |
| GSE10245, GSE18842, GSE2109 (145) | 3 | 14.67 | 0.2360 | 0.5144 | 0.8917 |
| GSE10245, GSE18842, GSE2109 (145) | 3 | 13.33 | 0.1260 | 0.8447 | 0.8908 |
| GSE10245, GSE18842, GSE2109 (145) | 3 | 13.33 | 0.1208 | 0.6974 | 0.8978 |
| GSE10245, GSE18842, GSE2109 (145) | 8 | 14 | – | – | – |
| GSE10245, GSE18842, GSE2109 (145) | 8 | 13.33 | 0.1061 | 0.8271 | 0.8935 |
| GSE10245, GSE18842, GSE2109 (145) | 8 | 12.67 | 0.1191 | 0.8719 | 0.9067 |
| GSE10245, GSE18842, GSE2109 (145) | 8 | 14 | 0.1029 | 0.7983 | 0.9211 |
NA: not available. –: not computable because no posterior probabilities were provided.
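The footnote's point that some metrics need posterior probabilities can be illustrated with a Brier-type score. The sketch below assumes that GBS stands for a generalized (multi-class) Brier score normalized by 2n, so that 0 is the best value, which would match the "(0)" in the column header. Both the expansion of the abbreviation and the normalization are assumptions, not taken from this page, and the function name and toy probabilities are hypothetical.

```python
def generalized_brier_score(probs, labels):
    """Multi-class Brier-type score.

    probs  : list of per-sample posterior probability lists (one entry per class)
    labels : list of true class indices
    ASSUMPTION: the sum of squared errors is divided by 2n, giving a range of [0, 1].
    """
    n = len(probs)
    if n == 0 or n != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        for k, p_k in enumerate(p):
            target = 1.0 if k == y else 0.0  # one-hot encoding of the true class
            total += (p_k - target) ** 2
    return total / (2 * n)


# Toy two-class posteriors (hypothetical, not data from the study).
perfect = generalized_brier_score([[1.0, 0.0], [0.0, 1.0]], [0, 1])
uninformative = generalized_brier_score([[0.5, 0.5], [0.5, 0.5]], [0, 1])
```

A classifier that returns only hard class labels supplies no `probs`, which is why such a score cannot be computed for it while the error rate still can.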
Figure 2. ROC curves for 3-gene signature combinations.
The signature consisting of KRT5 alone has the highest AUC on both the training and test sets.
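As a rough illustration of how an AUC for a single-gene signature can be obtained, the sketch below computes the AUC as the fraction of (SCC, AC) sample pairs in which the SCC sample has the higher score, counting ties as one half. The expression values are made up for illustration and are not data from the study. Treating higher KRT5 expression as evidence for SCC is likewise an assumption of the example.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical KRT5 expression values; 1 = SCC, 0 = AC (assumed coding).
krt5 = [9.1, 8.7, 10.2, 7.9, 4.2, 5.0, 8.1, 3.8]
is_scc = [1, 1, 1, 1, 0, 0, 0, 0]
auc = auc_from_scores(krt5, is_scc)
```

The pairwise form is quadratic in the number of samples, which is acceptable for data sets of a few hundred samples; a rank-based formula gives the same value more efficiently.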
Figure 3. Scatterplots of the training data and test data.
A. 3D scatterplots with KRT5 on the x-axis, MAGEA4 on the y-axis, and RORC on the z-axis. B. 2D scatterplots with KRT5 on the x-axis and MAGEA4 on the y-axis. These scatterplots show that KRT5 alone separates AC from SCC samples on both the training and test sets.
Possibly mislabeled samples identified by Ben-Hamo's study.
| ID | Label | Overall SBV misclassification rate | Methods indicating opposite labels |
| 115 | AC | 84% | All eight methods |
| 19 | SCC | 86% | All except Radviz alone on 3 gene signature |
| 100 | SCC | 88% | All except Radviz alone on 3 gene signature |
| 3 | SCC | 90% | All eight methods |
| 70 | SCC | 76% | All except Radviz alone on 3 gene signature |
| 9 | SCC | 88% | All eight methods |
Performance metrics of classifiers on the lung cancer test set (subtype and stage classification).
| Method | # of Genes | Error (%) | GBS (0) | BCM (1) | AUPR (1) |
| Ben-Hamo's study | 23 | 49.3 | NA | 0.48 | 0.46 |
| Hierarchical-TGDR | 66 | 53.3 | 0.3736 | 0.4401 | 0.4709 |
| Pairwise Coupling | 158 | 54 | 0.3794 | 0.4371 | 0.4010 |
| Multi-TGDR local | 83 | 54 | 0.3579 | 0.4210 | 0.4681 |
| Multi-TGDR global | 60 | 54 | 0.3524 | 0.4164 | 0.4685 |
| Radviz + multi-TGDR | 8 | 54.7 | 0.3423 | 0.4137 | 0.4557 |
| Radviz + naïve Bayes | 8 | 54.7 | 0.4104 | 0.4437 | 0.4494 |
| Radviz + SVM | 8 | 54 | 0.3654 | 0.4137 | 0.4562 |
| Radviz + multi-TGDR | 10 | 53.3 | 0.3215 | 0.4269 | 0.4710 |
| Radviz + naïve Bayes | 10 | 55.3 | 0.4256 | 0.4503 | 0.4573 |
| Radviz + SVM | 10 | 54.7 | 0.3516 | 0.3612 | 0.4815 |
NA: not available.
Figure 4. RadViz plots using the 8-gene signature on the training data and test data.
These plots show that AC and SCC samples can be discriminated with a reasonably low misclassification rate on both the training and test sets. However, the different stages within each subtype are not separated on either dataset.
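For readers unfamiliar with RadViz, the following is a minimal sketch of the standard projection, not the authors' implementation. Each gene is assigned an anchor evenly spaced on the unit circle, each gene is min-max scaled to [0, 1], and a sample is placed at the weighted average of the anchors, with its scaled expression values as weights. The function name `radviz_project` and the toy input are hypothetical.

```python
import math


def radviz_project(rows):
    """Map each row of m feature values to a 2D RadViz point inside the unit circle."""
    m = len(rows[0])
    # Per-feature minimum and maximum for min-max scaling.
    lo = [min(r[j] for r in rows) for j in range(m)]
    hi = [max(r[j] for r in rows) for j in range(m)]
    # One anchor per feature, evenly spaced on the unit circle.
    anchors = [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
               for j in range(m)]
    points = []
    for r in rows:
        # Scaled values act as spring weights; constant features get weight 0.
        w = [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(m)]
        total = sum(w)
        if total == 0:
            points.append((0.0, 0.0))  # no pull from any anchor: place at the center
            continue
        x = sum(w[j] * anchors[j][0] for j in range(m)) / total
        y = sum(w[j] * anchors[j][1] for j in range(m)) / total
        points.append((x, y))
    return points


# Toy input with three features (hypothetical values, not expression data).
toy = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
points = radviz_project(toy)
```

A sample dominated by one gene lands near that gene's anchor, while a sample with equal scaled values on all genes lands at the center. Classes separate in the plot only when their expression profiles pull toward different anchors, which is consistent with subtypes separating while stages overlap.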