Hui-Ling Huang1,2, Yu-Chung Wu3, Li-Jen Su4, Yun-Ju Huang5, Phasit Charoenkwan6, Wen-Liang Chen7, Hua-Chin Lee8,9, William Cheng-Chung Chu10, Shinn-Ying Ho11,12.
Abstract
BACKGROUND: Few studies have investigated prognostic biomarkers of distant metastases of lung cancer. One of the central difficulties in identifying biomarkers from microarray data is the availability of only a small number of samples, which results in overtraining. Recent evidence indicates that epithelial-mesenchymal transition (EMT) of tumor cells causes metastasis, which is detrimental to patients' survival.
Year: 2015 PMID: 25881029 PMCID: PMC4349617 DOI: 10.1186/s12859-015-0463-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Disease-free survival times for 78 lung cancer patients with a follow-up time of 120 months. (A) The mean disease-free survival times for non-distant and distant metastasis are 73.08 and 14.02 months, respectively. (B) Box plots of non-distant and distant metastasis. The p-value of the t-test is 7.99E-22 (p < 0.001), suggesting that distant metastasis is strongly associated with patients’ survival.
Selected characteristics of participants according to NSCLC
| Characteristic | No. of patients | % | | | p-value (a) |
|---|---|---|---|---|---|
| | | | | | |
| ≧65 | 49 | 63 | 25 | 24 | 0.64 |
| <65 | 29 | 37 | 17 | 12 | |
| | | | | | |
| Male | 56 | 72 | 28 | 28 | 0.32 |
| Female | 22 | 28 | 14 | 8 | |
| | | | | | |
| ≧20 pack-year | 35 | 45 | 17 | 18 | 0.49 |
| <20 pack-year | 43 | 55 | 25 | 18 | |
| | | | | | |
| Squamous cell | 25 | 32 | 11 | 14 | 0.46 (b) |
| Adenocarcinoma | 46 | 59 | 25 | 21 | |
| Large cell | 5 | 6 | 4 | 1 | |
| Other cell type | 2 | 3 | 2 | 0 | |
| | | | | | |
| Well | 5 | 6 | 1 | 4 | |
| Moderate | 51 | 65 | 28 | 23 | 0.80 (b) |
| Poor | 22 | 28 | 13 | 9 | |
| | | | | | |
| ≧5 cm | 18 | 23 | 11 | 7 | 0.59 |
| <5 cm | 59 | 76 | 31 | 28 | |
| | | | | | |
| Positive | 54 | 69 | 25 | 29 | 0.09 |
| Negative | 25 | 32 | 17 | 7 | |
| | | | | | |
| Positive | 5 | 6 | 5 | 0 | 0.20 |
| Negative | 73 | 94 | 37 | 36 | |
| | | | | | |
| Stage I | 36 | 46 | 10 | 26 | 3.0E-5 (c) |
| Stage II | 11 | 14 | 7 | 4 | |
| Stage III | 26 | 33 | 20 | 6 | |
| Stage IV | 5 | 6 | 5 | 0 | |

a. The two-sided p-values were calculated by Fisher's exact test.
b. The p-values were calculated using the two variables with the largest numbers of patients.
c. The p-value was calculated to measure the association between two variables (Stage I and Stages II, III and IV).
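The two-sided Fisher's exact test named in footnote a can be reproduced from any 2×2 block of the table. What follows is a minimal sketch in pure Python, not the authors' code: the function name `fisher_exact_two_sided` is hypothetical, and it assumes the common convention of summing the probabilities of all tables that are no more likely than the observed one. The counts are copied from the Male/Female rows and, for footnote c, from Stage I against Stages II to IV pooled.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    # Sum every table whose probability does not exceed the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Male 28/28 versus Female 14/8.
p_sex = fisher_exact_two_sided(28, 28, 14, 8)
# Stage I (10/26) versus Stages II-IV pooled (7+20+5 = 32, 4+6+0 = 10).
p_stage = fisher_exact_two_sided(10, 26, 32, 10)
print(f"sex: p = {p_sex:.2f}; stage: p = {p_stage:.1e}")
```

Under these assumptions the sex comparison should land close to the reported 0.32, and the pooled stage comparison should fall far below 0.001.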
Accuracies of the independent test for various weights in the fitness function
| Weight w | Test accuracy (%) |
|---|---|
| 1.0 | 61.62 |
| 0.8 | 74.82 |
| 0.5 | 62.15 |
| 0.2 | 51.69 |
The 30 top-ranked transcripts in terms of selected times from 30 runs
| Rank | Transcript | Selected times | Rank | Transcript | Selected times |
|---|---|---|---|---|---|
| 1 | FOSB | 12 | 16 | CDKL5 | 3 |
| 2 | MAST1 | 12 | 17 | DCK | 3 |
| 3 | CCL15 | 8 | 18 | KLF12 | 3 |
| 4 | MAK | 6 | 19 | ZAP70 | 3 |
| 5 | SF1 | 5 | 20 | BACH2 | 3 |
| 6 | HDAC9 | 5 | 21 | YSK4 | 3 |
| 7 | YSK4 | 4 | 22 | ELF5 | 3 |
| 8 | EDN1 | 4 | 23 | PRKCB1-1 | 2 |
| 9 | FOXE1-1 | 4 | 24 | CEP110 | 2 |
| 10 | GLI3 | 4 | 25 | HS.541237 | 2 |
| 11 | CDKN2AIP | 4 | 26 | KLF6 | 2 |
| 12 | CREG1 | 3 | 27 | PRKCB1-2 | 2 |
| 13 | FOXE1-2 | 3 | 28 | MAK | 2 |
| 14 | CSNK1A1 | 3 | 29 | CCL16 | 2 |
| 15 | TUB | 3 | 30 | IL23A | 2 |
Figure 2. The prediction accuracy and disease-free survival area obtained using the sequential backward selection method. (A) w = 0.8; (B) w = 1.0.
The performance of the IBCGA and sequential backward selection (SBS) methods
| Method | | | | |
|---|---|---|---|---|
| IBCGA ( | 95.95 | 70.01 | 97.43 | 69.21 |
| IBCGA ( | 92.02 | 63.81 | 94.73 | 66.91 |
| SBS | 93.59 | 74.81 | 91.03 | 70.92 |
The 17 transcripts obtained by the sequential backward selection method with the disease-free survival area
| Transcript | Training accuracy (%) (rank) | Asurv (%) (rank) | |
|---|---|---|---|
| CCL16 | 74.36 (1) | 43.58 (2) | 5 |
| GLI3 | 71.79 (2) | 34.43 (6) | 15 |
| TUB | 71.79 (3) | 35.71 (4) | 13 |
| PRKCB1-1 | 70.51 (4) | 34.51 (5) | 3 |
| ZAP70 | 67.95 (5) | 29.53 (8) | 14 |
| ELF5 | 67.95 (6) | 26.70 (10) | 17 |
| EDN1 | 66.67 (7) | 31.09 (7) | 9 |
| SF1 | 66.67 (8) | 25.87 (11) | 8 |
| CREG1 | 66.67 (9) | 23.40 (14) | 2 |
| MAST1 | 65.38 (10) | 23.71 (13) | 12 |
| CSNK1A1 | 64.10 (11) | 24.95 (12) | 11 |
| HDAC9 | 64.10 (12) | 27.78 (9) | 10 |
| MAK | 62.82 (13) | 38.76 (3) | 4 |
| CCL15 | 61.54 (14) | 22.94 (15) | 6 |
| PRKCB1-2 | 61.54 (15) | 16.73 (17) | 1 |
| CDKN2AIP | 61.54 (16) | 46.00 (1) | 16 |
| FOXE1-1 | 60.26 (17) | 19.60 (16) | 7 |
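Sequential backward selection, the method behind the table above, starts from the full feature set and repeatedly drops the feature whose removal leaves the best-scoring subset. A generic sketch follows. The scoring function is a toy stand-in, not the classifier accuracy or survival-area criterion used in the study, and every name in it is hypothetical.

```python
def sequential_backward_selection(features, score):
    """Greedy SBS: return the best-scoring subset seen while shrinking."""
    current = list(features)
    best_subset, best_score = tuple(current), score(current)
    while len(current) > 1:
        # Try dropping each remaining feature and keep the best reduced subset.
        candidates = [[f for f in current if f != drop] for drop in current]
        current = max(candidates, key=score)
        current_score = score(current)
        if current_score > best_score:
            best_subset, best_score = tuple(current), current_score
    return best_subset, best_score


# Toy criterion (assumption): features 0 and 2 are informative,
# and every feature kept costs a small penalty.
INFORMATIVE = {0, 2}


def toy_score(subset):
    return sum(1 for f in subset if f in INFORMATIVE) - 0.1 * len(subset)


best_subset, best_score = sequential_backward_selection([0, 1, 2, 3], toy_score)
print(best_subset, round(best_score, 2))
```

In practice `score` would wrap a cross-validated classifier; the greedy loop calls it once per candidate subset, so its cost grows quadratically with the number of starting features.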
The test accuracies of the IBCGA and sequential backward selection (SBS) methods from 30 runs
| Method | No. of transcripts | Training accuracy (%) | Test accuracy (%) |
|---|---|---|---|
| IBCGA ( | 10 | 88.80 ± 3.92 | 50.41 ± 6.56 |
| IBCGA ( | 10 | 88.88 ± 3.27 | 61.08 ± 7.35 |
| SBS ( | 15 | 91.03 | 53.84 (14/26) |
| SBS ( | 17 | 93.59 | 65.38 (17/26) |
| SVM ensemble | 17 | 93.59 | 76.92 (20/26) |
The type of regulator and location of its protein product, as well as the related cancer of the genes in the 16-gene set
| No. | Gene | Type | Location | Related cancer |
|---|---|---|---|---|
| 1 | | Cytokine | Extracellular space | Lung cancer, breast cancer |
| 2 | | Kinase | Cytoplasm | NSCLC |
| 3 | | Cytokine | Extracellular space | NSCLC |
| 4 | | Transcription regulator | Nucleus | Lung cancer |
| 5 | | Kinase | Cytoplasm | Lung cancer |
| 6 | | Kinase | Cytoplasm | Breast cancer |
| 7 | | Kinase | Plasma membrane | Colorectal cancer |
| 8 | | Cytokine | Extracellular space | Mammary adenocarcinoma |
| 9 | | Transcription regulator | Nucleus | Cancer |
| 10 | | Transcription regulator | Nucleus | Cancer |
| 11 | | Transcription regulator | Nucleus | Medulloblastoma |
| 12 | | Transcription regulator | Nucleus | Cancer |
| 13 | | Transcription regulator | Nucleus | Thyroid |
| 14 | | Kinase | Cytoplasm | Prostate cancer |
| 15 | | Transcription regulator | Cytoplasm | Relation to ocular diseases |
| 16 | | Transcription regulator | Nucleus | Inhibitor of apoptosis |
The 32 lung cancer-related genes and their relevant papers
| No. | Gene | Reference |
|---|---|---|
| 1 | | Ohta |
| 2 | | Gemmill |
| 3 | | Feng |
| 4 | | Yoshikawa |
| 5 | | Ward |
| 6 | | Qi |
| 7 | | Singh |
| 8 | | Thomson |
| 9 | | Takeyama |
| 10 | | Xiang |
| 11 | | Matsuno |
| 12 | | Pallier |
Performance comparison among various gene sets
| Gene set | Training accuracy (%) | Asurv (%) | Test accuracy (%) |
|---|---|---|---|
| 32-gene (lung cancer related) | 65.38 | 34.47 | 50.00 (13/26) |
| 9-gene (SBS from the 32-gene set) | 73.07 | 55.70 | 80.76 (21/26) |
| 16-gene (EMT related, SVM ensemble) | 93.59 | 74.81 | 76.92 (20/26) |
| 11-gene (EMT and cancer related) | 87.18 | 53.37 | 69.23 (18/26) |
| 5-gene (EMT and lung cancer related) | 78.25 | 52.33 | 76.92 (20/26) |
| 1-gene ( | 74.36 | 43.58 | 57.69 (15/26) |
| 1-gene ( | 61.54 | 46.00 | 76.92 (20/26) |
The value of Asurv is 73.21% for real classes of samples in the training dataset.
Figure 3. The disease-free survival areas of various gene sets using Kaplan-Meier survival curves. There are 37 non-distant-metastasis samples (in blue) and 41 distant-metastasis samples (in red). (A) Real class (73.21%), (B) 16-gene set (74.81%), (C) 11-gene set (53.37%), (D) 5-gene set (52.33%), (E) CCL16 (43.58%) and (F) CDKN2AIP (46.00%).
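The Kaplan-Meier curves referred to in Figure 3 are product-limit estimates of the survival function. The sketch below shows the estimator on made-up follow-up data. The function name and the data are hypothetical, and it does not compute the survival area Asurv, whose definition is not given in this excerpt.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each event time.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if the event (relapse) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events occurring exactly at time t
        leaving = 0    # events plus censored cases leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data (months); not taken from the study.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t = {t:>2} months, S(t) = {s:.2f}")
```

Censored patients reduce the number at risk without lowering the curve, which is why the step at month 8 uses four patients at risk but counts only one event.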
Figure 4. The predicted disease-free survival time using support vector regression with the 16-gene set.