Ken Asada, Kazuma Kobayashi, Samuel Joutard, Masashi Tubaki, Satoshi Takahashi, Ken Takasawa, Masaaki Komatsu, Syuzo Kaneko, Jun Sese, Ryuji Hamamoto.
Abstract
Lung cancer is one of the leading causes of death worldwide, so understanding the factors linked to patient survival is essential. Multi-omics analysis has recently emerged as a way to classify patient groups according to prognosis, and at a more individual level, in support of precision medicine. Here, we combined RNA expression and miRNA expression with clinical information to conduct a multi-omics analysis using publicly available data from The Cancer Genome Atlas (TCGA), focusing on lung adenocarcinoma (LUAD). We were able to subclassify patients according to survival. Among the classifiers we developed using the inferred labels obtained from the patient subtypes, a support vector machine (SVM) gave the best classification results, with an accuracy of 0.82 on the test dataset. Using these subtypes, we ranked genes based on RNA expression levels. The top 25 genes were investigated to elucidate the mechanisms that underlie patient prognosis. Bioinformatics analyses showed that the expression levels of six of the 25 genes (ERO1B, DPY19L1, NCAM1, RET, MARCH1, and SLC7A8) were associated with LUAD patient survival (p < 0.05), and pathway analyses indicated that major cancer signaling was altered in the subtypes.
Keywords: lung cancer; multi-omics analysis; survival-associated genes
Year: 2020 PMID: 32235589 PMCID: PMC7225957 DOI: 10.3390/biom10040524
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. Overall workflow for classification of lung-cancer subtypes. (A) Multi-omics analysis pipeline. (B) Clustering result of the elbow method. (C) Clustering results of the Silhouette index and Calinski–Harabasz criterion. (D) Result of K-means clustering. Red dots represent the S1 subtype and blue dots represent the S2 subtype in Figure 1E. (E) Kaplan–Meier plot using the patient labels obtained from Figure 1D.
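The elbow method named in Figure 1B can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy 2-D points, the function names `sq_dist`, `centroid`, and `kmeans_inertia`, and the plain Lloyd-style K-means loop are all assumptions made for illustration. The idea is to run K-means for several values of k and look for the k after which the within-cluster sum of squares (inertia) stops dropping sharply.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_inertia(points, k, n_iter=100, seed=0):
    """Run a basic Lloyd-style K-means and return the final inertia."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            centroid(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

# Inertia for k = 1..4; the sharp drop from k = 1 to k = 2 is the "elbow".
inertia_by_k = {k: kmeans_inertia(points, k) for k in range(1, 5)}
```

On real expression data one would normally rely on an established library implementation and confirm the chosen k with a second criterion, as Figure 1C does with the Silhouette index and the Calinski–Harabasz criterion.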
Table 1. Evaluation of SVM model performance.
| Number of Features | Train Score Accuracy | Test Score Accuracy |
|---|---|---|
| 10 (5 + 5) | 0.61 | 0.57 |
| 20 (10 + 10) | 0.81 | 0.66 |
| 30 (15 + 15) | 0.89 | 0.71 |
| 40 (20 + 20) | 0.94 | 0.82 |
| 45 (20 + 25) | 0.95 | 0.82 |
| 50 (20 + 30) | 0.97 | 0.80 |
Table 2. Confusion matrix of SVM.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 62 | 10 |
| Actual Negative | 17 | 57 |
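As a quick consistency check, the reported test accuracy can be recomputed from the four confusion-matrix counts. The sketch below assumes that the rows of the matrix are the actual classes (first row positive, second row negative); the function name `binary_metrics` is hypothetical. Accuracy does not depend on that assumption, but sensitivity, specificity, and precision do.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard summary metrics for a 2 x 2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,     # correct predictions / all samples
        "sensitivity": tp / (tp + fn),     # recall on the positive class
        "specificity": tn / (tn + fp),     # recall on the negative class
        "precision": tp / (tp + fp),       # correct among predicted positives
    }


# Counts taken from the confusion matrix above (row orientation is assumed).
metrics = binary_metrics(tp=62, fn=10, fp=17, tn=57)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The accuracy works out to 119/146, which rounds to the 0.82 reported for the SVM on the test dataset.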
Table 3. Evaluation of KNN, RF, and LR performance.
| KNN | | | RF | | | LR | | |
|---|---|---|---|---|---|---|---|---|
| Class | Manhattan | Euclidean | Tree | Entropy | Gini | C | L1 | L2 |
| 1 | 0.72 | 0.70 | 1 | 0.54 | 0.54 | 1 | 0.75 | 0.74 |
| 2 | 0.71 | 0.68 | 2 | 0.64 | 0.64 | 5 | 0.72 | 0.75 |
| 3 | 0.76 | 0.73 | 3 | 0.64 | 0.66 | 10 | 0.71 | 0.74 |
| 4 | 0.73 | 0.74 | 4 | 0.64 | 0.67 | 50 | 0.70 | 0.71 |
| 5 | 0.74 | 0.75 | 5 | 0.66 | 0.67 | 100 | 0.70 | 0.71 |
| 6 | 0.71 | 0.72 | 6 | 0.66 | 0.66 | 500 | 0.70 | 0.69 |
| 7 | 0.73 | 0.75 | 7 | 0.67 | 0.65 | 1000 | 0.70 | 0.70 |
| 8 | 0.71 | 0.75 | 8 | 0.67 | 0.65 | | | |
| 9 | 0.73 | 0.75 | 9 | 0.67 | 0.65 | | | |
| 10 | 0.73 | 0.75 | 10 | 0.67 | 0.65 | | | |
Table 4. Clinical characterization in LUAD low-risk and high-risk subtypes.
| Characteristic | Low-Risk (n = 166) | High-Risk (n = 198) |
|---|---|---|
| Age at initial pathologic diagnosis (years) | 65.5 ± 9.8 | 65.6 ± 10.2 |
| Tumor stage * | (No.) | (No.) |
| Discrepancy | 1 | 2 |
| Stage I | 2 | 2 |
| Stage IA | 47 | 50 |
| Stage IB | 48 | 53 |
| Stage II | 0 | 1 |
| Stage IIA | 20 | 19 |
| Stage IIB | 14 | 29 |
| Stage IIIA | 21 | 31 |
| Stage IIIB | 3 | 4 |
| Stage IV | 10 | 7 |
| Gender | (No.) | (No.) |
| Male | 82 | 79 |
| Female | 84 | 119 |
| Vital state | (No.) | (No.) |
| Alive | 116 | 117 |
| Dead | 50 | 81 |
| Overall survival time (days) | 996.0 ± 967.2 | 730.5 ± 560.0 |
| New tumor event | (No.) | (No.) |
| Yes | 48 | 83 |
| No | 118 | 115 |
| Days to event | 588.9 ± 539.0 | 503.7 ± 444.4 |
| Progression-free interval | (No.) | (No.) |
| Available | 59 | 88 |
| Progression-free interval time (days) | 836.9 ± 874.2 | 605.8 ± 518.3 |
| Smoking history indicator | (No.) | (No.) |
| 1 | 22 | 29 |
| 2 | 39 | 45 |
| 3 | 46 | 48 |
| 4 | 54 | 69 |
| 5 | 0 | 4 |
| NA or unknown | 5 | 3 |
* American Joint Committee on Cancer (AJCC) pathology states.
Table 5. Gene mutation analysis of the 18 genes reported as having a statistically significant mutation in the LUAD dataset. Each cell gives the number of mutations, with the number of patients in parentheses.
| Genes | Low-Risk | High-Risk |
|---|---|---|
| | 64 (63) | 83 (80) |
| | 42 (41) | 40 (38) |
| | 26 (26) | 27 (26) |
| | 20 (18) | 17 (16) |
| | 17 (13) | 17 (13) |
| | 12 (10) | 29 (28) |
| | 12 (10) | 8 (8) |
| | 11 (10) | 12 (10) |
| | 10 (9) | 11 (10) |
| | 10 (9) | 17 (14) |
| | 4 (4) | 5 (5) |
| | 9 (8) | 10 (7) |
| | 8 (8) | 9 (8) |
| | 12 (11) | 18 (18) |
| | 7 (6) | 6 (6) |
| | 7 (5) | 6 (6) |
| | 0 (0) | 0 (0) |
| | 4 (3) | 2 (2) |
Figure 2. Copy number variation analysis in the two subtypes. The red box represents the low-risk subtype, and the blue box represents the high-risk subtype.
Figure 3. Subtype-specific signaling pathways obtained from GSEA. Pathway names are shown on the left, and gene ranks on the right.
Table 6. Summary of KEGG pathway, miRNA, and GO analysis.
| KEGG pathway | | |
|---|---|---|
| Fatty acid metabolism | 1.64 × 10⁻³ | 2.00 × 10⁻³ |
| Oxidative phosphorylation | 1.49 × 10⁻³ | 2.00 × 10⁻³ |
| Valine, leucine and isoleucine degradation | 1.61 × 10⁻³ | 2.00 × 10⁻³ |
| Arachidonic acid metabolism | 1.69 × 10⁻³ | 2.00 × 10⁻³ |
| Pyruvate metabolism | 1.64 × 10⁻³ | 2.00 × 10⁻³ |
| miRNA | | |
| miR-501_AAAGGAT | 1.45 × 10⁻³ | 0.319 |
| miR-26a/miR-26b_TACTTGA | 8.44 × 10⁻³ | 0.481 |
| miR-507_GTGCAAA | 8.71 × 10⁻³ | 0.481 |
| miR-33_CAATGCA | 6.05 × 10⁻³ | 0.481 |
| miR-200b/miR-200c/miR-429_CAGTATT | 2.03 × 10⁻² | 0.660 |
| GO | | |
| Spinal cord development | 1.61 × 10⁻³ | 5.97 × 10⁻² |
| Neuromuscular junction development | 1.64 × 10⁻³ | 5.97 × 10⁻² |
| Cytoplasmic translation | 3.08 × 10⁻³ | 5.97 × 10⁻² |
| Positive regulation of calcium ion transport | 2.87 × 10⁻³ | 5.97 × 10⁻² |
| Regulation of antigen receptor mediated signaling pathway | 2.62 × 10⁻³ | 5.97 × 10⁻² |
Table 7. Top 25 RNAs with statistical significance between the two subtypes.
| Rank | Gene Name | Log2 Fold Change | p-Value | Adjusted p-Value |
|---|---|---|---|---|
| 1 | | 3.45 | 3.01 × 10⁻⁴⁰ | 4.08 × 10⁻³⁶ |
| 2 | | 1.71 | 3.98 × 10⁻²⁷ | 2.69 × 10⁻²³ |
| 3 | | 1.93 | 1.47 × 10⁻²⁰ | 6.64 × 10⁻¹⁷ |
| 4 | | 0.89 | 1.22 × 10⁻¹⁸ | 4.12 × 10⁻¹⁵ |
| 5 | | 1.49 | 2.39 × 10⁻¹⁸ | 6.47 × 10⁻¹⁵ |
| 6 | | 1.18 | 2.20 × 10⁻¹⁶ | 4.97 × 10⁻¹³ |
| 7 | | 1.40 | 2.81 × 10⁻¹⁵ | 5.44 × 10⁻¹² |
| 8 | | 1.32 | 5.12 × 10⁻¹⁵ | 8.68 × 10⁻¹² |
| 9 | | −0.72 | 2.38 × 10⁻¹⁴ | 3.58 × 10⁻¹¹ |
| 10 | | 1.22 | 5.19 × 10⁻¹⁴ | 7.03 × 10⁻¹¹ |
| 11 | | 1.37 | 1.48 × 10⁻¹³ | 1.82 × 10⁻¹⁰ |
| 12 | | 1.82 | 3.39 × 10⁻¹³ | 3.83 × 10⁻¹⁰ |
| 13 | | −1.17 | 1.93 × 10⁻¹² | 2.02 × 10⁻⁹ |
| 14 | | 0.94 | 2.54 × 10⁻¹² | 2.45 × 10⁻⁹ |
| 15 | | 1.28 | 2.98 × 10⁻¹² | 2.70 × 10⁻⁹ |
| 16 | | 0.82 | 4.35 × 10⁻¹² | 3.68 × 10⁻⁹ |
| 17 | | 0.38 | 7.25 × 10⁻¹² | 5.78 × 10⁻⁹ |
| 18 | | 1.06 | 9.22 × 10⁻¹² | 6.94 × 10⁻⁹ |
| 19 | | −1.37 | 1.07 × 10⁻¹¹ | 7.66 × 10⁻⁹ |
| 20 | | 0.72 | 1.15 × 10⁻¹¹ | 7.80 × 10⁻⁹ |
| 21 | | −1.30 | 1.33 × 10⁻¹¹ | 8.61 × 10⁻⁹ |
| 22 | | 0.98 | 2.33 × 10⁻¹¹ | 1.43 × 10⁻⁸ |
| 23 | | 0.88 | 3.19 × 10⁻¹¹ | 1.88 × 10⁻⁸ |
| 24 | | −0.68 | 4.27 × 10⁻¹¹ | 2.41 × 10⁻⁸ |
| 25 | | 0.53 | 4.99 × 10⁻¹¹ | 2.71 × 10⁻⁸ |
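To read the log2 fold change column on a linear scale, the value is exponentiated with base 2. The short sketch below is only an illustration: the helper name `fold_change` is hypothetical, and which subtype serves as numerator or denominator is not stated in the table, so only the magnitude is interpreted here.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change into a linear expression ratio."""
    return 2.0 ** log2_fc


# Values taken from ranks 1 and 9 of the table above.
rank1_ratio = fold_change(3.45)    # roughly an 11-fold difference
rank9_ratio = fold_change(-0.72)   # a ratio below 1, i.e. lower expression

print(f"rank 1: {rank1_ratio:.2f}x, rank 9: {rank9_ratio:.2f}x")
```

A positive log2 fold change therefore corresponds to a ratio above 1, and a negative value to a ratio below 1.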
Figure 4. Newly identified survival-associated genes. The red line represents the high-expression subtype, and the blue line represents the low-expression subtype.
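The survival curves in Figures 1E and 4 are Kaplan–Meier estimates. A minimal sketch of the estimator follows; the function name `kaplan_meier` and the four-patient toy data are assumptions for illustration, not the study data or the authors' code. At each distinct event time the running survival probability is multiplied by one minus the fraction of at-risk patients who died; censored patients only leave the risk set.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times  : follow-up time for each patient (e.g. days)
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up data: the second patient is censored at day 2.
km_curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
print(km_curve)  # [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Comparing two such curves, for example high-expression against low-expression patients, would additionally require a test such as the log-rank test, which is not sketched here.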
Figure 5. Co-expression analysis of ERO1B. Correlation with ERO1B expression was assessed using the Pearson correlation test.
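The Pearson correlation coefficient used for the co-expression analysis can be computed directly from its definition. The sketch below uses hypothetical expression vectors and a hypothetical helper name, `pearson_r`; it returns only the coefficient, not the accompanying p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values across five samples.
reference_gene = [1.0, 2.0, 3.0, 4.0, 5.0]
co_expressed = [2.0, 4.0, 6.0, 8.0, 10.0]     # rises with the reference gene
anti_correlated = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as the reference rises

r_pos = pearson_r(reference_gene, co_expressed)
r_neg = pearson_r(reference_gene, anti_correlated)
```

Values near +1 or -1 indicate strong positive or negative co-expression, respectively; in a genome-wide screen each gene is compared against the reference gene in this way, and the resulting p-values are then corrected for multiple testing.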
Table 8. Summary of co-expression genes.
| Rank | Target Gene | Pearson Correlation | p-Value | FDR * | |
|---|---|---|---|---|---|
| 1 | | 0.679 | 6.11 × 10⁻⁷¹ | 6.12 × 10⁻⁶⁷ | |
| 2 | | 0.615 | 5.63 × 10⁻⁵⁵ | 3.75 × 10⁻⁵¹ | |
| 3 | | 0.605 | 1.07 × 10⁻⁵² | 5.36 × 10⁻⁵⁹ | |
| 4 | | 0.581 | 9.89 × 10⁻⁴⁸ | 3.95 × 10⁻⁴⁴ | |
| 5 | | 0.578 | 2.53 × 10⁻⁴⁷ | 7.47 × 10⁻⁴⁴ | |
| 6 | | 0.578 | 2.67 × 10⁻⁴⁷ | 7.47 × 10⁻⁴⁴ | |
| 7 | | 0.578 | 3.55 × 10⁻⁴⁷ | 8.86 × 10⁻⁴⁴ | |
| 8 | | −0.559 | 9.74 × 10⁻⁴⁴ | 2.16 × 10⁻⁴⁰ | |
| 9 | | −0.558 | 2.06 × 10⁻⁴³ | 4.11 × 10⁻⁴⁰ | |
| 10 | | 0.553 | 1.28 × 10⁻⁴² | 2.33 × 10⁻³⁹ | |
| 46 | | 0.482 | 2.91 × 10⁻³¹ | 1.24 × 10⁻²⁸ | |
| 77 | | 0.462 | 1.50 × 10⁻²⁸ | 3.85 × 10⁻²⁶ | |
* False discovery rate (Benjamini–Hochberg procedure). Red arrows indicate genes that are among the top 25 RNAs shown in Table 7.
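The Benjamini–Hochberg procedure named in the footnote can be sketched as follows. This is an illustrative implementation with a hypothetical function name, `benjamini_hochberg`, and made-up p-values; it is not the authors' code. Each p-value is scaled by m / rank, where m is the number of tests and rank is its position in ascending order, and the scaled values are then made monotone from the largest rank downward.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value never exceeds the one for the next-larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
```

Because of the m / rank scaling and the cap at 1, an adjusted value produced this way is never smaller than the corresponding raw p-value.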