Jingming Zhao, Wei Cheng, Xigang He, Yanli Liu, Ji Li, Jiaxing Sun, Jinfeng Li, Fangfang Wang, Yufang Gao.
Abstract
BACKGROUND: Novel diagnostic predictors and drug targets are needed for lung adenocarcinoma (LUAD). We aimed to build a LUAD-specific support vector machine (SVM) classifier for diagnosis and to identify molecular markers with prognostic value for LUAD.
Keywords: SVM classifier; lncRNA-miRNA-mRNA network; lung adenocarcinoma; molecular marker; prognosis
Year: 2018 PMID: 29872324 PMCID: PMC5975616 DOI: 10.2147/OTT.S151121
Source DB: PubMed Journal: Onco Targets Ther ISSN: 1178-6930 Impact factor: 4.147
The parameters for calculating Fisher’s score
| | DEGs | Non-DEGs | Total |
|---|---|---|---|
| Pathway genes | | | |
| Non-pathway genes | | | |
| Total | | | |
Abbreviations: DEGs, differentially expressed genes; N, total number of genes; M, number of pathway genes; K, number of differentially expressed genes.
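The cell entries of the table above did not survive, but N, M and K are enough to illustrate the test once the overlap is known. The sketch below computes a one-sided Fisher's exact (hypergeometric upper-tail) p-value. The symbol `x` for the number of DEGs that are also pathway genes, the function name and the example counts are assumptions for illustration, not values from the paper, and the paper's exact scoring formula may differ.

```python
from math import comb

def enrichment_p_value(N, M, K, x):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    N: total number of genes
    M: number of pathway genes
    K: number of differentially expressed genes (DEGs)
    x: number of DEGs that are also pathway genes (assumed symbol)
    """
    if not (0 <= x <= min(M, K) and max(M, K) <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, K)  # ways to choose K DEGs out of N genes
    # Probability of seeing x or more pathway genes among the K DEGs.
    tail = sum(comb(M, i) * comb(N - M, K - i)
               for i in range(x, min(M, K) + 1))
    return tail / total

# Illustrative counts only (hypothetical, not taken from the paper).
p = enrichment_p_value(N=20000, M=100, K=500, x=10)
```

A small p-value suggests that the pathway contains more DEGs than expected by chance under the hypergeometric null.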
Figure 1. Hierarchical clustering analysis of TCGA samples using differentially expressed lncRNAs (A), miRNAs (B) and mRNAs (C).
Abbreviation: TCGA, The Cancer Genome Atlas.
Clinical feature-related differentially expressed lncRNAs, miRNAs and mRNAs
| Comparisons | Upregulated lncRNA | Upregulated miRNA | Upregulated mRNA | Downregulated lncRNA | Downregulated miRNA | Downregulated mRNA |
|---|---|---|---|---|---|---|
| Age (≥60 versus <60) | – | hsa-mir-133a-1 | | FLJ41941, DGCR9, TERC | – | |
| Gender (M/F) | DKFZp779M0652, KIAA0087 | hsa-mir-1247, hsa-mir-133a-1 | | C15orf54, C2orf48, CECR7, DGCR9, EGOT | hsa-mir-503 | |
| New tumor (yes/no) | FAM138F, KIAA0087, SFTA1P | | | EGOT, DGCR9 | hsa-mir-196a-1, hsa-mir-1269 | |
| Pathologic M (M1/M0) | SFTA1P | – | | C1orf220, C2orf48, KIAA0087, C15orf54 | hsa-mir-323b, hsa-mir-31, hsa-mir-1269, hsa-mir-539 | |
| Pathologic N (N2 + N3/N0 + N1) | DKFZp779M0652, KIAA0087 | hsa-mir-184 | | C20orf197, CECR7 | hsa-mir-1269, hsa-mir-31, hsa-mir-577 | |
| Pathologic T (T3 + T4/T1 + T2) | C22orf34, DIO3OS, KIAA0087 | hsa-mir-133a-1 | | C15orf54, C10orf91, C20orf197, CECR7 | hsa-mir-1269, hsa-mir-31, hsa-mir-323b, hsa-mir-450b | |
| Pathologic stage (III + IV/I + II) | DIO3OS, DKFZp779M0652, KIAA0087 | hsa-mir-1247 | | C20orf197, CECR7, EGOT, HAR1B | hsa-mir-1269, hsa-mir-577, hsa-mir-9-2 | |
| Cancer status (with/without) | KIAA0087, SFTA1P | – | | EGOT, HAR1B, C2orf48 | hsa-mir-539, hsa-mir-1269 | |
| Smoking history (yes/no) | DIO3OS, DKFZp779M0652, KIAA0087 | hsa-mir-184 | | C15orf54, C20orf197, C2orf48, EGOT, FLJ12825 | hsa-mir-1269, hsa-mir-31 | |
Figure 2. LUAD-specific lncRNA-miRNA-mRNA ceRNA network. LUAD-specific lncRNA-miRNA regulatory network (A), miRNA-mRNA regulatory network (B) and ceRNA network (C). The ceRNA network is obtained by integrating the lncRNA-miRNA and miRNA-mRNA regulatory networks. Squares, triangles and circles indicate lncRNAs, miRNAs and mRNAs, respectively. lncRNAs, miRNAs and mRNAs upregulated in LUAD are shown in red and downregulated ones in green. Red and blue lines indicate lncRNA-miRNA and miRNA-mRNA regulatory relationships, respectively, whereas gray lines indicate protein–protein interactions of the corresponding mRNAs.
Abbreviations: ceRNA, competing endogenous RNA; LUAD, lung adenocarcinoma.
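The caption states that the ceRNA network comes from integrating the lncRNA-miRNA and miRNA-mRNA networks. One plausible reading, sketched below, is a join on the shared miRNA. The function name and all identifiers (`lncRNA_A`, `miR_1`, and so on) are placeholders, not interactions reported in the paper, and the authors' actual integration procedure may include further filtering.

```python
from collections import defaultdict

def build_cerna_triples(lnc_mir_edges, mir_mrna_edges):
    """Join lncRNA-miRNA and miRNA-mRNA edges on the shared miRNA.

    Returns sorted (lncRNA, miRNA, mRNA) triples.
    """
    targets = defaultdict(set)
    for mirna, mrna in mir_mrna_edges:
        targets[mirna].add(mrna)
    triples = set()
    for lnc, mirna in lnc_mir_edges:
        # Only miRNAs present in both edge lists produce a triple.
        for mrna in targets.get(mirna, ()):
            triples.add((lnc, mirna, mrna))
    return sorted(triples)

# Placeholder identifiers, not interactions reported in the paper.
lnc_mir = [("lncRNA_A", "miR_1"), ("lncRNA_B", "miR_2")]
mir_mrna = [("miR_1", "mRNA_X"), ("miR_1", "mRNA_Y"), ("miR_3", "mRNA_Z")]
triples = build_cerna_triples(lnc_mir, mir_mrna)
```

In this toy input only `miR_1` appears in both edge lists, so `lncRNA_B` and `mRNA_Z` drop out of the integrated network.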
Functional annotation of mRNAs in the ceRNA network
| Term | Count | P-value |
|---|---|---|
| GO:0000278~mitotic cell cycle | 31 | 5.16E–19 |
| GO:0022403~cell cycle phase | 32 | 9.85E–19 |
| GO:0007067~mitosis | 25 | 1.25E–17 |
| GO:0000280~nuclear division | 25 | 1.25E–17 |
| GO:0000087~M-phase of mitotic cell cycle | 25 | 1.92E–17 |
| GO:0048285~organelle fission | 25 | 3.24E–17 |
| GO:0000279~M-phase | 28 | 6.32E–17 |
| GO:0022402~cell cycle process | 33 | 8.46E–16 |
| GO:0007049~cell cycle | 37 | 2.12E–15 |
| GO:0051301~cell division | 22 | 2.78E–11 |
| GO:0007059~chromosome segregation | 11 | 1.43E–06 |
| GO:0007017~microtubule-based process | 13 | 0.001431 |
| GO:0007346~regulation of mitotic cell cycle | 10 | 0.005949 |
| GO:0010564~regulation of cell cycle process | 9 | 0.005999 |
| GO:0000070~mitotic sister chromatid segregation | 6 | 0.01543 |
| GO:0000819~sister chromatid segregation | 6 | 0.017727 |
| GO:0006259~DNA metabolic process | 16 | 0.02048 |
| GO:0006260~DNA replication | 10 | 0.036122 |
| hsa04012:ErbB signaling pathway | 6 | 0.001138 |
| hsa04110:Cell cycle | 6 | 0.005563 |
| hsa03440:Homologous recombination | 3 | 0.027134 |
| hsa04080:Neuroactive ligand-receptor interaction | 7 | 0.029121 |
| hsa05200:Pathways in cancer | 7 | 0.079398 |
Abbreviations: GO-BPs, gene ontology-biological processes; KEGG, Kyoto Encyclopedia of Genes and Genomes.
Figure 3. Construction and validation of the LUAD-specific SVM classifier. (A) Feature gene selection based on recursive feature elimination. The prediction accuracy versus the number of selected feature genes is plotted as a blue line. The red dashed line marks the best prediction accuracy (95.3%, 442 out of 464 TCGA samples), reached with 44 selected feature genes. (B) Scatter plot of TCGA samples based on the LUAD-specific SVM classifier. (C) ROC curves of the TCGA (black), GSE10072 (blue) and GSE43458 (orange) datasets generated using the LUAD-specific SVM classifier. The AUCs are 0.996, 0.963 and 0.985, respectively.
Abbreviations: LUAD, lung adenocarcinoma; SVM, support vector machine; TCGA, The Cancer Genome Atlas; ROC, receiver operating characteristic; AUC, area under ROC curve.
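The AUC values in panel C summarize how well classifier scores separate the two classes. As a minimal sketch, the AUC can be computed as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. The function name and the toy labels and scores are illustrative assumptions and are unrelated to the TCGA or GEO data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outranks a negative one.

    labels: 1 for positive (e.g. tumor), 0 for negative (e.g. normal).
    scores: classifier decision values, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): one positive scores below one negative.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

This pairwise form is quadratic in the number of samples, which is acceptable for datasets of a few hundred samples such as those listed in the caption.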
Selected feature genes from the ceRNA network
| Gene | LogFC | P-value | FDR | Gene | LogFC | P-value | FDR |
|---|---|---|---|---|---|---|---|
| | −3.10475 | 2.24E–15 | 3.57E–13 | | −0.65072 | 0.000921 | 0.009757 |
| | −2.72135 | 3.37E–19 | 1.07E–16 | | −0.61702 | 0.001212 | 0.012241 |
| | −1.67587 | 6.18E–16 | 1.11E–13 | | 0.604752 | 0.000166 | 0.00223 |
| | −1.64033 | 3.87E–09 | 1.72E–07 | | 0.606032 | 0.000124 | 0.001713 |
| | −1.62809 | 3.02E–10 | 1.83E–08 | | 0.651515 | 6.89E–05 | 0.001019 |
| | −1.4877 | 7.22E–09 | 2.97E–07 | | 0.660387 | 1.92E–06 | 4.34E–05 |
| | −1.45014 | 1.73E–08 | 6.45E–07 | | 0.66095 | 0.000318 | 0.003954 |
| | −1.35673 | 4.59E–11 | 3.42E–09 | | 0.679193 | 1.20E–05 | 0.000223 |
| | −1.33536 | 5.52E–07 | 1.44E–05 | | 0.702108 | 3.72E–07 | 1.01E–05 |
| | −1.14566 | 1.92E–07 | 5.59E–06 | | 0.706481 | 1.46E–06 | 3.38E–05 |
| | −1.14019 | 5.99E–07 | 1.55E–05 | | 0.720253 | 1.62E–05 | 0.000289 |
| | −1.13563 | 2.26E–06 | 4.99E–05 | | 0.726607 | 2.48E–05 | 0.000421 |
| | −1.0867 | 5.42E–06 | 0.000109 | | 0.729584 | 2.47E–07 | 7.05E–06 |
| | −0.95974 | 1.39E–05 | 0.000252 | | 0.75071 | 3.69E–08 | 1.29E–06 |
| | −0.9316 | 1.33E–06 | 3.11E–05 | | 0.839385 | 1.81E–08 | 6.69E–07 |
| | −0.92189 | 6.43E–05 | 0.00096 | | 0.946016 | 4.26E–10 | 2.42E–08 |
| | −0.76123 | 0.000196 | 0.002587 | | 0.950831 | 1.54E–08 | 5.79E–07 |
| | −0.7507 | 9.53E–05 | 0.001363 | | 1.08277 | 3.86E–11 | 2.92E–09 |
| | −0.72205 | 0.000455 | 0.005339 | | 1.158801 | 1.99E–15 | 3.21E–13 |
| | −0.71549 | 0.000379 | 0.004581 | | 1.189844 | 1.04E–14 | 1.43E–12 |
| | −0.71264 | 0.006238 | 0.048666 | | 1.275478 | 8.00E–22 | 4.87E–19 |
| | −0.68691 | 0.001514 | 0.014765 | | 1.801564 | 9.18E–46 | 1.29E–41 |
Abbreviations: FC, fold change; FDR, false discovery rate.
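The record does not say which procedure produced the FDR values. A common choice is the Benjamini-Hochberg adjustment, sketched below purely as an assumption. The function name and the toy p-values are illustrative. An FDR adjustment depends on the full set of tested genes, so the FDR values in the table cannot be reproduced from the rows shown.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values (hypothetical), not taken from the table.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each adjusted value is the raw p-value scaled by m / rank, then capped by the adjusted value of the next larger p-value.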
Performance of support vector machine classifier in training and validation datasets
| Datasets | No. of samples | Correct rate | Se | Sp | PPV | NPV | AUC |
|---|---|---|---|---|---|---|---|
| TCGA | 464 | 95.3% | 0.889 | 0.957 | 0.886 | 0.995 | 0.996 |
| GSE10072 | 107 | 90.7% | 0.918 | 0.897 | 0.882 | 0.929 | 0.963 |
| GSE43458 | 110 | 97.3% | 0.967 | 0.975 | 0.935 | 0.987 | 0.985 |
Abbreviations: TCGA, The Cancer Genome Atlas; Se, sensitivity; Sp, specificity; PPV, positive predictive value; NPV, negative predictive value; AUC, area under ROC curve.
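The columns of the table above all follow from a 2×2 confusion matrix. The sketch below uses the standard definitions. The function name and the example counts are hypothetical, because the per-dataset confusion matrices are not given in this record. The only figure checked against the source is the TCGA correct rate, which the Figure 3 caption reports as 442 correct out of 464 samples.

```python
def classifier_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "correct_rate": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # Se: true positives among actual positives
        "specificity": tn / (tn + fp),  # Sp: true negatives among actual negatives
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts for illustration only.
metrics = classifier_metrics(tp=40, fp=5, tn=50, fn=5)

# Correct rate for TCGA from the counts in the Figure 3 caption.
tcga_correct_rate = round(100 * 442 / 464, 1)
```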
Prognosis-related lncRNAs, miRNAs and mRNAs
| RNA | Upregulated | Downregulated |
|---|---|---|
| lncRNA | KIAA0087, PGM5P2, SFTA1P | C15orf54, C20orf197 |
| miRNA | hsa-miR-184, hsa-miR-204 | hsa-miR-651, hsa-miR-188, hsa-miR-96, hsa-miR-708 |
| mRNA | | |
Figure 4. Kaplan–Meier analysis of prognosis-related lncRNAs, miRNAs and mRNAs. (A, B) Kaplan–Meier curves of two lncRNAs, PGM5P2 and SFTA1P. (C, D) Kaplan–Meier curves of two miRNAs, hsa-miR-96 and hsa-miR-204. (E–H) Kaplan–Meier curves of four mRNAs, RGS20, RGS9BP, FGB and INA. Red and blue lines indicate patient groups with expression levels above and below the median value, respectively. The P-value indicates the significance of the difference between the two groups.
Abbreviations: PGM5P2, phosphoglucomutase 5 pseudogene 2; SFTA1P, surfactant associated 1; RGS20, regulator of G protein signaling 20; RGS9BP, RGS9-binding protein; FGB, fibrinogen beta chain; INA, alpha-internexin.
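The caption describes splitting patients at the median expression level and comparing Kaplan–Meier curves. The sketch below shows the median split and the product-limit survival estimate for one group. Function names and the toy follow-up data are assumptions for illustration, and the significance test behind the reported P-values is not specified in this record, so it is not reproduced here.

```python
from statistics import median

def split_by_median(expression):
    """Label each sample 'high' if above the median expression, else 'low'."""
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]

def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all samples sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored samples leave the risk set
    return curve

# Toy follow-up data (hypothetical): the sample at time 2 is censored.
groups = split_by_median([1, 5, 3, 7])
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored samples lower the number at risk without lowering the survival estimate, which is why the curve has no step at time 2.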