
Construction of a specific SVM classifier and identification of molecular markers for lung adenocarcinoma based on lncRNA-miRNA-mRNA network.

Jingming Zhao1, Wei Cheng1, Xigang He2, Yanli Liu1, Ji Li3, Jiaxing Sun1, Jinfeng Li1, Fangfang Wang1, Yufang Gao4.   

Abstract

BACKGROUND: Novel diagnostic predictors and drug targets are needed for LUAD (lung adenocarcinoma). We aimed to build a specific SVM (support vector machine) classifier for diagnosis of LUAD and identify molecular markers with prognostic value for LUAD.
METHODS: Expression differences of miRNAs, lncRNAs and mRNAs between LUAD and normal samples were analyzed using data from the TCGA (The Cancer Genome Atlas) database. A LUAD-related miRNA-lncRNA-mRNA network was constructed, from which feature genes were selected to build a LUAD-specific SVM classifier. The robustness and transferability of the SVM classifier were validated using the gene expression profile datasets GSE43458 and GSE10072. Prognostic markers were identified from the network.
RESULTS: A set of LUAD-related differentially expressed miRNAs, lncRNAs and mRNAs was identified and a LUAD-related miRNA-lncRNA-mRNA network was obtained. The LUAD-specific SVM classifier constructed on the basis of the network was robust and efficient for classification of samples from the TCGA dataset and two independent validation datasets. Eight RNAs with prognostic value were identified, including hsa-miR-96, hsa-miR-204, PGM5P2 (phosphoglucomutase 5 pseudogene 2), SFTA1P (surfactant associated 1), RGS20 (regulator of G protein signaling 20), RGS9BP (RGS9-binding protein), FGB (fibrinogen beta chain) and INA (alpha-internexin). Among them, RGS20 and INA were regulated by hsa-miR-96. RGS20 was also regulated by hsa-miR-204, which was a potential target of SFTA1P.
CONCLUSION: The LUAD specific SVM classifier may serve as a novel diagnostic predictor. hsa-miR-96, hsa-miR-204, PGM5P2, SFTA1P, RGS20, RGS9BP, FGB and INA may serve as prognostic markers in clinical practice.

Keywords:  SVM classifier; lncRNA-miRNA-mRNA network; lung adenocarcinoma; molecular marker; prognosis

Year:  2018        PMID: 29872324      PMCID: PMC5975616          DOI: 10.2147/OTT.S151121

Source DB:  PubMed          Journal:  Onco Targets Ther        ISSN: 1178-6930            Impact factor:   4.147


Introduction

LUAD (lung adenocarcinoma) is the most common subtype of non-small cell lung cancer, accounting for about 40% of lung cancer worldwide.1,2 Molecularly targeted therapies using TKIs (tyrosine kinase inhibitors) are standard treatments for LUAD patients with mutations in EGFR (epidermal growth factor receptor) and fusions of ALK (anaplastic lymphoma kinase), ROS1 (ROS proto-oncogene 1), and RET (rearranged during transfection).3,4 Acquired resistance, however, often occurs approximately 1–2 years after TKI treatment.4 Moreover, few effective therapies have been developed to target alterations in other genes, such as TP53 (tumor protein p53),5 KEAP1 (kelch-like ECH associated protein 1)6 and STK11 (serine/threonine kinase 11).7 Therefore, there is still an urgent need to develop new markers and drug targets for the diagnosis and treatment of LUAD. Increasing evidence has highlighted the involvement of ncRNAs (non-coding RNAs) in tumorigenesis.8 Two typical subtypes of ncRNAs are miRNAs (microRNAs) and lncRNAs (long non-coding RNAs).9–11 miRNAs are small ncRNAs of about 22 nucleotides that interact with target mRNAs to degrade them or inhibit their translation.9,10 In comparison, lncRNAs are much longer ncRNAs of more than 200 nucleotides and function through more diverse mechanisms.9,11 In addition to directly targeting mRNAs, lncRNAs have also been shown to function as ceRNAs (competing endogenous RNAs), interacting with miRNAs to indirectly regulate mRNAs.11,12 It is thus believed that the interplay between lncRNAs and miRNAs may play an important role in tumorigenesis.12 Recent investigations of lncRNA-miRNA-mRNA ceRNA networks have provided a better understanding of the roles of lncRNA-miRNA interactions in mRNA regulation and LUAD development.13,14 Important regulatory pathways, as well as therapeutic targets, could be revealed based on lncRNA-miRNA-mRNA networks.
For example, MEG3 (maternally expressed 3), MIAT (myocardial infarction associated transcript) and LINC00115 may serve as prognostic lncRNAs and may be involved in regulatory pathways in LUAD.14 According to the lncRNA-miRNA-mRNA network, MEG3 and MIAT regulate MAPK9 (mitogen-activated protein kinase 9) by interacting with miR-106, whereas LINC00115 regulates FGF2 (fibroblast growth factor 2) by interacting with miR-7.14 Two gene expression profile datasets, GSE43458 and GSE10072, have been used to reveal genes related to LUAD.15,16 Using the GSE43458 dataset, ETS2 (V-ets erythroblastosis virus E26 oncogene homolog 2) has been shown to be downregulated in LUAD.15 ETS2 may inhibit cancer cell invasion, migration and growth by suppressing MET activation.15 Cigarette smoking related signature genes in LUAD patients have been identified using the GSE10072 dataset.16 Notably, most of these signature genes are involved in the cell cycle, such as NEK2, TTK, and PRC1.16 Although advances have been made in identifying LUAD-related signatures, efficient diagnostic predictors and potential drug targets for LUAD are still needed. To identify novel diagnostic predictors and molecular markers, we first constructed a LUAD-specific lncRNA-miRNA-mRNA ceRNA network using data from TCGA (The Cancer Genome Atlas). A LUAD-specific SVM (support vector machine) classifier was built and prognosis-related nodes were identified based on the ceRNA network. The GSE43458 and GSE10072 datasets were then used to validate the efficiency and robustness of the SVM classifier in predicting LUAD. The SVM classifier and the prognosis-related nodes may contribute to the diagnosis and treatment of LUAD in clinical practice.

Materials and methods

Data source and data preprocessing

The mRNA and miRNA expression data of LUAD-related samples were downloaded from TCGA (https://gdc-portal.nci.nih.gov/). After checking the barcode information of the samples, a total of 464 samples with both mRNA and miRNA data were retained for subsequent analysis, including 445 LUAD and 19 normal samples. All the clinical information related to these samples was also obtained. Two independent validation datasets, GSE10072 (contributed by Landi et al)16 and GSE43458 (contributed by Kabbout et al),15 were downloaded from the GEO (Gene Expression Omnibus) database (https://www.ncbi.nlm.nih.gov/geo/). In total, 107 lung samples (58 LUAD versus 49 normal samples, GPL96 [HG-U133A] platform) were included in the GSE10072 dataset, and 110 lung samples (80 LUAD versus 30 normal samples, GPL6244 [HuGene-1_0-st] platform) were included in the GSE43458 dataset. The oligo package17 under R was used for background adjustment of expression values and normalization of the expression profile data, including conversion of the original data format, imputation of missing values and data standardization.

Identification of LUAD related lncRNAs, miRNAs and mRNAs

According to annotation information from HGNC (HUGO Gene Nomenclature Committee, http://www.genenames.org/), the lncRNA data of LUAD-related samples downloaded from TCGA were extracted based on the gene ID. Expression differences between LUAD and normal samples in the mRNA-seq and miRNA-seq data were analyzed using the edgeR package18 under R 3.0.1, and the FDR (false discovery rate) was calculated using the multtest package.19 LncRNAs, miRNAs and mRNAs with FDR <0.05 and FC (fold change) >1.5 or <0.67 (|logFC| >0.585 on a log2 scale) were considered to be significantly differentially expressed between LUAD and normal samples.
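The selection rule above can be illustrated with a minimal Python sketch. It is not the authors' edgeR/multtest pipeline: the Benjamini-Hochberg adjustment is an assumption (the text does not state which multtest procedure was used), and the function names and toy records are hypothetical.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here for illustration; the source only says
    that FDR was computed with the multtest package.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_differential(records, fdr_cutoff=0.05, fold_change=1.5):
    """records: list of (name, log2_fold_change, p_value) tuples."""
    log_cutoff = math.log2(fold_change)  # log2(1.5) is about 0.585
    fdr = benjamini_hochberg([p for _, _, p in records])
    return [name for (name, lfc, _), q in zip(records, fdr)
            if q < fdr_cutoff and abs(lfc) > log_cutoff]

# Hypothetical toy input, not data from the study.
toy = [("geneA", 1.20, 0.001), ("geneB", -0.90, 0.010),
       ("geneC", 0.30, 0.002), ("geneD", 0.80, 0.400)]
selected = select_differential(toy)
print(selected)
```

In this toy input, geneC fails the fold-change threshold and geneD fails the FDR threshold, so only geneA and geneB are kept.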

Identification of lncRNAs, miRNAs and mRNAs related to clinical features

LUAD samples downloaded from TCGA were binary classified according to clinical information. Classifications included age (≥60 versus <60), gender (female versus male), pathologic M (M1 versus M0), pathologic N (N3 + N2 versus N0 + N1), pathologic T (T3 + T4 versus T1 + T2), pathologic stage (I + II versus III + IV), cancer status (with versus without), smoking history (yes versus no) and vital status (living versus deceased). The mRNAs, miRNAs and lncRNAs related to clinical features were then screened from differentially expressed RNAs between LUAD and normal samples, using edgeR package and multtest package. lncRNAs, miRNAs and mRNAs with FDR <0.05 and |logFC|>0.585 were considered to be related to clinical features.

Construction of LUAD-related lncRNA-miRNA-mRNA ceRNA network

The miRNAs targeted by differentially expressed lncRNAs were predicted using the miRcode (version 11, http://www.mircode.org/)20 and starBase (version 2.0)21 databases. Results from these two databases were combined and intersected with the differentially expressed miRNAs, so that the intersection contained differentially expressed miRNAs targeted by differentially expressed lncRNAs. A LUAD-related lncRNA-miRNA regulation network was thus obtained. Similarly, differentially expressed mRNAs targeted by differentially expressed miRNAs were obtained from miRTarBase (version 6.0, http://mirtarbase.mbc.nctu.edu.tw).22,23 Next, PPIs (protein–protein interactions) present in all three of the BioGRID (http://thebiogrid.org/),24 HPRD (Human Protein Reference Database, http://www.hprd.org/)25 and DIP (Database of Interacting Proteins, http://dip.doe-mbi.ucla.edu/)26 databases were identified. PPIs among the differentially expressed mRNAs targeted by differentially expressed miRNAs were extracted and integrated with the miRNA-mRNA regulatory relationships, generating a LUAD-related miRNA-mRNA regulation network. The lncRNA-miRNA and miRNA-mRNA regulatory networks were then combined to obtain a comprehensive lncRNA-miRNA-mRNA ceRNA regulatory network.
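The network assembly described above is essentially a sequence of set unions and intersections. A minimal Python sketch follows, under the assumption that each database has already been parsed into plain sets of pairs; the function name, argument names and toy identifiers are all hypothetical.

```python
def build_cerna_edges(mircode_pairs, starbase_pairs, mirna_targets,
                      ppi_sources, de_lncrnas, de_mirnas, de_mrnas):
    """Assemble lncRNA-miRNA, miRNA-mRNA and PPI edges among DE RNAs."""
    # Union of the two lncRNA-miRNA sources, restricted to DE nodes.
    lnc_mir = {(lnc, mir) for (lnc, mir) in mircode_pairs | starbase_pairs
               if lnc in de_lncrnas and mir in de_mirnas}
    # miRNA-mRNA pairs where both ends are differentially expressed.
    mir_mrna = {(mir, gene) for (mir, gene) in mirna_targets
                if mir in de_mirnas and gene in de_mrnas}
    target_genes = {gene for _, gene in mir_mrna}
    # A PPI is kept only if it is present in every source database.
    common_ppi = set.intersection(*ppi_sources)
    ppi = {pair for pair in common_ppi if set(pair) <= target_genes}
    return lnc_mir, mir_mrna, ppi

# Hypothetical toy inputs; PPIs are undirected, so frozensets are used.
mircode = {("LNC1", "miR-a"), ("LNC2", "miR-b")}
starbase = {("LNC1", "miR-a"), ("LNC1", "miR-c")}
mirtarbase = {("miR-a", "G1"), ("miR-a", "G2"),
              ("miR-c", "G3"), ("miR-z", "G1")}
biogrid = {frozenset({"G1", "G2"}), frozenset({"G2", "G3"})}
hprd = {frozenset({"G1", "G2"}), frozenset({"G1", "G9"})}
dip = {frozenset({"G1", "G2"})}

lnc_mir, mir_mrna, ppi = build_cerna_edges(
    mircode, starbase, mirtarbase, [biogrid, hprd, dip],
    de_lncrnas={"LNC1"}, de_mirnas={"miR-a", "miR-c"},
    de_mrnas={"G1", "G2", "G3"})
print(len(lnc_mir), len(mir_mrna), len(ppi))
```

The ceRNA network is then the union of the three edge sets, with node types kept separate so that lncRNA-miRNA, miRNA-mRNA and PPI edges can be counted individually.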

Functional and pathway annotation of mRNAs in the ceRNA network

In order to reveal LUAD-related biological functions and pathways, GO (gene ontology) biological process27 analysis and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis28 were performed for mRNAs in the ceRNA regulatory network. Fisher’s exact test was used for the enrichment analysis. Fisher’s score was calculated from the quantities in Table 1 using the following equation:

$$p = 1 - \sum_{i=0}^{x-1} \frac{\binom{M}{i}\binom{N-M}{K-i}}{\binom{N}{K}}$$

where N indicates the total number of genes, M indicates the number of pathway genes, K indicates the number of differentially expressed genes, and the Fisher’s score p indicates the probability that at least x of the K differentially expressed genes are pathway genes.
Table 1

The parameters for calculating Fisher’s score

                    DEGs    Non-DEGs    Total
Pathway genes       n11     n12         M
Non-pathway genes   n21     n22         N − M
Total               K       N − K       N

Abbreviations: DEG, differentially expressed genes; N, total number of genes; M, number of pathway genes; K, number of differently expressed genes.
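As an illustration of the Fisher’s score defined with the Table 1 quantities, the following Python sketch computes the hypergeometric upper-tail probability that at least x of the K differentially expressed genes are pathway genes. The function name and the toy numbers are hypothetical; the authors' actual software for this step is not stated in the text.

```python
from math import comb

def enrichment_p(N, M, K, x):
    """P(at least x of K DEGs are pathway genes).

    N: total genes, M: pathway genes, K: DEGs, x: DEGs in the pathway (n11).
    """
    total = comb(N, K)
    # Probability mass for fewer than x pathway genes among the K DEGs.
    below = sum(comb(M, i) * comb(N - M, K - i) for i in range(x))
    return 1 - below / total

# Hypothetical toy numbers: 10 genes, 4 in the pathway, 3 DEGs, 2 overlap.
p_value = enrichment_p(N=10, M=4, K=3, x=2)
print(round(p_value, 4))
```

With these toy numbers, 80 of the 120 possible DEG sets contain fewer than two pathway genes, so the score is 1 − 80/120 = 1/3.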

Construction of SVM classification model

The optimal subset of feature genes for the SVM classification model was selected from differentially expressed mRNAs in the LUAD-specific ceRNA network using recursive feature elimination (RFE),29 a machine learning algorithm. Specifically, the optimal subset was selected through a leave-one-out cross-validation approach. Expression values of the feature genes in each candidate combination were used as feature values to estimate the probability that a sample belonged to a given class. Based on this probability, a sample was classified as LUAD or normal. The optimal subset was the combination giving the best SVM classification accuracy for TCGA samples. The LUAD-specific SVM classifier was built on the optimal subset of feature genes. The GSE10072 and GSE43458 datasets were used to validate the robustness and transferability of the SVM classifier. The SVM classifier was trained with a fivefold cross-validation strategy and its performance was assessed with a receiver operating characteristic (ROC) curve, followed by calculation of prediction accuracy, sensitivity, specificity, positive prediction value, negative prediction value and AUC (area under ROC curve).
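To make the AUC used in the performance assessment concrete, here is a minimal Python sketch that computes it directly from classifier scores as the probability that a randomly chosen positive sample outscores a randomly chosen negative one. It does not reproduce the RFE or SVM training steps; the function name, labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class, 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical decision scores for three positive and two negative samples.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Five of the six positive-negative pairs are ranked correctly here, giving an AUC of 5/6. The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples such as those described above.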

Identification of prognosis related mRNAs, miRNAs and lncRNAs

The expression value of each differentially expressed mRNA, miRNA and lncRNA and the survival information of each sample were extracted from the TCGA dataset. Prognosis-associated lncRNAs, miRNAs and mRNAs were identified by univariate Cox regression using the survfit function of the survival package (version 2.40-1, https://cran.r-project.org/package=survival)30 under R. Cancerous samples were divided into two groups at the median expression value, followed by Kaplan–Meier curve analysis.
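The median split and Kaplan–Meier step can be sketched in plain Python as follows. This is only an illustration of the estimator, not the R survival workflow used in the study; the function names and toy data are hypothetical, and assigning samples exactly at the median to the low group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 for an observed death and 0 for censoring.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored samples leave the risk set
    return curve

def split_by_median(expression):
    """Indices of samples above the median and at or below it (assumption)."""
    cutoff = median(expression)
    high = [i for i, v in enumerate(expression) if v > cutoff]
    low = [i for i, v in enumerate(expression) if v <= cutoff]
    return high, low

# Hypothetical follow-up times, event flags and expression values.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
expression = [2.0, 7.5, 3.1, 9.0, 1.2, 6.4]

high, low = split_by_median(expression)
curve_high = kaplan_meier([times[i] for i in high], [events[i] for i in high])
curve_low = kaplan_meier([times[i] for i in low], [events[i] for i in low])
print(curve_high)
print(curve_low)
```

Comparing the two curves statistically would need an additional test, which is left out of this sketch.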

Results

Differentially expressed lncRNAs, miRNAs and mRNAs in LUAD samples

A total of 811 lncRNAs, 1,047 miRNAs and 18,013 mRNAs were obtained from mRNA-seq data. RNAs with low expression (expression value less than 1.0) were removed, leaving 396 lncRNAs, 517 miRNAs and 14,012 mRNAs. Significantly differentially expressed lncRNAs, miRNAs and mRNAs were obtained by comparing LUAD and normal samples. In total, 21 lncRNAs, 53 miRNAs and 925 mRNAs were differentially expressed in LUAD samples. Hierarchical cluster analysis of the samples was then performed based on the expression values of these differentially expressed RNAs. The heatmaps (Figure 1A–C) showed that LUAD samples clustered together and were separated from normal samples.
Figure 1

Hierarchical clustering analysis of TCGA samples using differentially expressed lncRNA (A), miRNA (B) and mRNA (C).

Abbreviation: TCGA, The Cancer Genome Atlas.

Key lncRNAs, miRNAs and mRNAs related to clinical features

In order to screen lncRNAs, miRNAs and mRNAs related to clinical features, LUAD samples were binary classified according to age (≥60 versus <60), gender (female versus male), pathologic M (M1 versus M0), pathologic N (N3 + N2 versus N0 + N1), pathologic T (T3 + T4 versus T1 + T2), pathologic stage (I + II versus III + IV), cancer status (with versus without), smoking history (yes versus no) and vital status (living versus deceased). The differentially expressed lncRNAs, miRNAs and mRNAs were further compared and identified between each two groups according to different clinical features, which were summarized in Table 2.
Table 2

Clinical features related differentially expressed lncRNAs, miRNAs and mRNAs

Comparison: Age (≥60 versus <60)
  Upregulated lncRNA: –
  Upregulated miRNA: hsa-mir-133a-1
  Upregulated mRNA: ADH1B, CLEC9A, TMEM132D, C6, CHST9, DNAH9, MT1A, RGS6, C2orf40, PZP, RSPO2, SOSTDC1, CPB2, TMEM132C, ADH1A
  Downregulated lncRNA: FLJ41941, DGCR9, TERC
  Downregulated miRNA: –
  Downregulated mRNA: DLL3, HAVCR1, FEZF1, GAL, KCNK12, CDK5R2, ZNF695, CRYGN, VGF, B4GALNT4, MUC13, INHA, NKAIN4, PTPRN

Comparison: Gender (M/F)
  Upregulated lncRNA: DKFZp779M0652, KIAA0087
  Upregulated miRNA: hsa-mir-1247, hsa-mir-133a-1
  Upregulated mRNA: TEPP, SLC5A8, PCSK2, LHX9, MEGF11
  Downregulated lncRNA: C15orf54, C2orf48, CECR7, DGCR9, EGOT
  Downregulated miRNA: hsa-mir-503
  Downregulated mRNA: TFF1, VIL1, RAB3B, FGL1, ANKRD34B, DUSP9, AKR1B10, FGB, AKR7A3, INHA, GLB1L3, ABCC2, SERPINB5, PCSK1, SLC6A3, IVL, HGD

Comparison: New tumor (yes/no)
  Upregulated lncRNA: FAM138F, KIAA0087, SFTA1P
  Upregulated miRNA: –
  Upregulated mRNA: TMEM213, TMEM132D, C6, KIAA0408, CST5
  Downregulated lncRNA: EGOT, DGCR9
  Downregulated miRNA: hsa-mir-196a-1, hsa-mir-1269
  Downregulated mRNA: HOXC13, PRAME, HOXC10, GAL, IGF2BP1, HHIPL2, KRT6C, HIST1H2BG, INHA, GLB1L3, KRT6B, KRT6A, CLDN6, PPP2R2C, SERPINB5

Comparison: Pathologic M (M1/M0)
  Upregulated lncRNA: SFTA1P
  Upregulated miRNA: –
  Upregulated mRNA: IRX6, CAPSL, KIAA0408, C8B
  Downregulated lncRNA: C1orf220, C2orf48, KIAA0087, C15orf54
  Downregulated miRNA: hsa-mir-323b, hsa-mir-31, hsa-mir-1269, hsa-mir-539
  Downregulated mRNA: DUSP13, HAVCR1, BARX1, FEZF1, C1orf61, GAL, CASKIN1, RAB3B, NKAIN1, CRYGN, AKR1B10, VGF, C1orf220, INHA, GLB1L3, PTPRN, INA, BCAN, ABCC2, C1QL1, GPX2

Comparison: Pathologic N (N2 + N3/N0 + N1)
  Upregulated lncRNA: DKFZp779M0652, KIAA0087
  Upregulated miRNA: hsa-mir-184
  Upregulated mRNA: PENK, C20orf85, MT1A, TEKT1, S100A12, C6orf118, DNAI2, HAS1, PPBP, ANKRD1
  Downregulated lncRNA: C20orf197, CECR7
  Downregulated miRNA: hsa-mir-1269, hsa-mir-31, hsa-mir-577
  Downregulated mRNA: HOXB9, GAL, KLK6, IGF2BP1, FCRL4, SLC7A10, FGB, PAEP, CCDC154, PCP4, FOXE1, TNFRSF13C, STRC, SPINK2

Comparison: Pathologic T (T3 + T4/T1 + T2)
  Upregulated lncRNA: C22orf34, DIO3OS, KIAA0087
  Upregulated miRNA: hsa-mir-133a-1
  Upregulated mRNA: FBN3, KLK7
  Downregulated lncRNA: C15orf54, C10orf91, C20orf197, CECR7
  Downregulated miRNA: hsa-mir-1269, hsa-mir-31, hsa-mir-323b, hsa-mir-450b
  Downregulated mRNA: C11orf86, TFF1, HAVCR1, DMRTA2, KLK6, CRABP1, FCRL4, ARL14, MUC13, PRSS3, TRIM31, CD19, CR2, IL20RB

Comparison: Pathologic stage (III + IV/I + II)
  Upregulated lncRNA: DIO3OS, DKFZp779M0652, KIAA0087
  Upregulated miRNA: hsa-mir-1247
  Upregulated mRNA: DNAH12, RSPH10B2, VWA3A, IL5RA, C20orf85, MS4A15, MT1A, DNAI2
  Downregulated lncRNA: C20orf197, CECR7, EGOT, HAR1B
  Downregulated miRNA: hsa-mir-1269, hsa-mir-577, hsa-mir-9-2
  Downregulated mRNA: BARX1, GAL, CDK5R2, CRYGN, KLK6, IGF2BP1, FCRL4, FGB, CCDC154, PCP4, IL17REL, TNFRSF13C, MIA, RUFY4, CD19, PNOC

Comparison: Cancer status (with/without)
  Upregulated lncRNA: KIAA0087, SFTA1P
  Upregulated miRNA: –
  Upregulated mRNA: TMEM132D, SCN2B, C6, SCGB1A1, CLDN18, KIAA0408, CST5, GRIA1, C8B, TMEM132C, F11
  Downregulated lncRNA: EGOT, HAR1B, C2orf48
  Downregulated miRNA: hsa-mir-539, hsa-mir-1269
  Downregulated mRNA: PRAME, GAL, IGF2BP1, KRT6C, CNGA3, HMGA2, KRT6B, UGT2B15, KRT6A, ABCC2, SERPINB5

Comparison: Smoking history (yes/no)
  Upregulated lncRNA: DIO3OS, DKFZp779M0652, KIAA0087
  Upregulated miRNA: hsa-mir-184
  Upregulated mRNA: TEPP, REN, PCSK2, PLA2G3, CST5, RETN, C8B
  Downregulated lncRNA: C15orf54, C20orf197, C2orf48, EGOT, FLJ12825
  Downregulated miRNA: hsa-mir-1269, hsa-mir-31
  Downregulated mRNA: DLL3, PADI1, BARX1, C6orf222, YBX2, CA9, FGB, CCDC129, GLB1L3, UPK3A, UGT2B15, MSMB

The miRNA-lncRNA and miRNA-mRNA regulatory relationships

Elucidation of the physiological roles of lncRNAs is challenging as complex and diverse mechanisms are involved.11 We used bioinformatics methods to predict the roles of lncRNAs in regulating miRNAs in LUAD. The regulatory relationships between significantly differentially expressed miRNAs and differentially expressed lncRNAs were predicted using the miRcode20 and starBase21 databases. We first acquired 264 lncRNA-miRNA regulation pairs from miRcode and 217 regulation pairs from starBase in which the lncRNAs were differentially expressed between LUAD and normal samples. Combining these two sets gave a total of 291 lncRNA-miRNA pairs, 41 of which involved LUAD-related differentially expressed miRNAs. The 41 lncRNA-miRNA pairs were integrated to build a miRNA-lncRNA regulatory network consisting of 31 nodes, including 6 lncRNAs (3 upregulated versus 3 downregulated) and 25 miRNAs (6 upregulated versus 19 downregulated) (Figure 2A).
Figure 2

LUAD specific lncRNA-miRNA-mRNA ceRNA network. LUAD specific lncRNA-miRNA regulatory network (A), miRNA-mRNA regulatory network (B) and ceRNA network (C). The ceRNA network is acquired by integrating lncRNA-miRNA and miRNA-mRNA regulatory network. Squares, triangles and circles indicate lncRNAs, miRNAs and mRNAs, respectively. Upregulated lncRNAs, miRNAs and mRNAs in LUAD are shown as red and downregulated ones shown as green. Red lines and blue lines indicate lncRNA-miRNA and miRNA-mRNA regulatory relationships, whereas gray lines indicate protein–protein interactions of corresponding mRNAs.

Abbreviation: LUAD, lung adenocarcinoma.

The regulatory relationships between significantly differentially expressed miRNAs and significantly differentially expressed mRNAs were obtained using the miRTarBase database, which provides the latest and broadest collection of experimentally validated miRNA-mRNA interactions.22,23 Most miRNAs in Figure 2A were predicted to target differentially expressed mRNAs, except hsa-miR-139 and hsa-miR-590. A total of 126 differentially expressed mRNAs were found to be targets of these miRNAs. Based on the information in the BioGRID, HPRD and DIP databases, PPIs corresponding to these target mRNAs were predicted. A miRNA-mRNA network was constructed by integrating the miRNA-mRNA regulatory relationships and the PPIs of target mRNAs. As shown in Figure 2B, the miRNA-mRNA regulatory network contained 25 miRNAs (including hsa-miR-139 and hsa-miR-590) and 126 mRNAs, which formed a total of 549 edges, 115 of which were mRNA-mRNA interactions and 434 of which were miRNA-mRNA regulation relationships.

Construction of lncRNA-miRNA-mRNA ceRNA network

To provide an insight about how lncRNAs and miRNAs cooperate to regulate mRNAs in LUAD, a ceRNA network (Figure 2C) was constructed, through the integration of lncRNA-miRNA network and miRNA-mRNA network. All nodes in the ceRNA network were LUAD related differentially expressed lncRNAs, miRNAs or mRNAs. A total of 157 nodes were included in the ceRNA network, including 6 lncRNAs, 25 miRNAs (including hsa-miR-139 and hsa-miR-590) and 126 mRNAs. In total, 588 edges were formed, including 39 lncRNA-miRNA regulation relationships, 434 miRNA-mRNA regulation relationships and 115 PPIs of corresponding mRNAs. In order to reveal the functional processes involved in LUAD development and progression, mRNAs in the ceRNA network (Figure 2C) were subjected to Fisher’s exact test-based GO biological process analysis. We acquired 18 significantly related GO biological processes, most of which were associated with cell cycle (Table 3). We also performed KEGG pathway analysis for mRNAs in the ceRNA network, and 5 significant KEGG pathways were identified, including ErbB signaling pathway, cell cycle, homologous recombination, neuroactive ligand-receptor interaction and pathways in cancer (Table 3).
Table 3

Functional annotation of mRNAs in the ceRNA network

Term                                              Count   P-value
GO-BPs
GO:0000278~mitotic cell cycle                     31      5.16E–19
GO:0022403~cell cycle phase                       32      9.85E–19
GO:0007067~mitosis                                25      1.25E–17
GO:0000280~nuclear division                       25      1.25E–17
GO:0000087~M-phase of mitotic cell cycle          25      1.92E–17
GO:0048285~organelle fission                      25      3.24E–17
GO:0000279~M-phase                                28      6.32E–17
GO:0022402~cell cycle process                     33      8.46E–16
GO:0007049~cell cycle                             37      2.12E–15
GO:0051301~cell division                          22      2.78E–11
GO:0007059~chromosome segregation                 11      1.43E–06
GO:0007017~microtubule-based process              13      0.001431
GO:0007346~regulation of mitotic cell cycle       10      0.005949
GO:0010564~regulation of cell cycle process       9       0.005999
GO:0000070~mitotic sister chromatid segregation   6       0.01543
GO:0000819~sister chromatid segregation           6       0.017727
GO:0006259~DNA metabolic process                  16      0.02048
GO:0006260~DNA replication                        10      0.036122
KEGG pathways
hsa04012:ErbB signaling pathway                   6       0.001138
hsa04110:Cell cycle                               6       0.005563
hsa03440:Homologous recombination                 3       0.027134
hsa04080:Neuroactive ligand-receptor interaction  7       0.029121
hsa05200:Pathways in cancer                       7       0.079398

Abbreviations: GO-BPs, gene ontology-biological processes; KEGG, Kyoto Encyclopedia of Genes and Genomes.

SVM classification model of cancerous samples

In order to provide an efficient and reliable molecular tool for LUAD diagnosis, we built a LUAD-specific SVM classifier based on the feature genes associated with LUAD. The optimal subset of feature genes was selected from differentially expressed mRNAs in the ceRNA network (Figure 2C) using RFE.29 The accuracy peaked at 95.3% when the optimal subset contained 44 feature genes (Figure 3A). The 44 selected feature genes are summarized in Table 4 and were used to construct the LUAD-specific SVM classifier. A scatter plot of TCGA samples based on the SVM classifier is shown in Figure 3B.
Figure 3

Construction and validation of the LUAD specific SVM classifier. (A) Feature gene selection based on recursive feature elimination. The prediction accuracy versus the number of selected feature genes is plotted as a blue line. The red dashed line marks the best prediction accuracy (95.3%, 442 out of 464 TCGA samples), with the corresponding number of selected feature genes being 44. (B) Scatter plot of TCGA samples based on the LUAD specific SVM classifier. (C) ROC curves of the TCGA (black), GSE10072 (blue) and GSE43458 (orange) datasets generated using the LUAD specific SVM classifier. The AUCs are 0.996, 0.963 and 0.985, respectively.

Abbreviations: LUAD, lung adenocarcinoma; SVM, support vector machine; TCGA, The Cancer Genome Atlas; ROC, receiver operating characteristic; AUC, area under ROC curve.

Table 4

Selected feature genes from the ceRNA network

Gene        LogFC      P-value     FDR         Gene        LogFC      P-value     FDR
TERT        −3.10475   2.24E–15    3.57E–13    ALS2CR11    −0.65072   0.000921    0.009757
HOXB9       −2.72135   3.37E–19    1.07E–16    SLC6A3      −0.61702   0.001212    0.012241
FGB         −1.67587   6.18E–16    1.11E–13    PRMT8       0.604752   0.000166    0.00223
KRT6C       −1.64033   3.87E–09    1.72E–07    SH2D1B      0.606032   0.000124    0.001713
MMP3        −1.62809   3.02E–10    1.83E–08    CAMK2A      0.651515   6.89E–05    0.001019
HIST1H2BG   −1.4877    7.22E–09    2.97E–07    NRG1        0.660387   1.92E–06    4.34E–05
TRIM54      −1.45014   1.73E–08    6.45E–07    NRG2        0.66095    0.000318    0.003954
HMGA2       −1.35673   4.59E–11    3.42E–09    KCNA5       0.679193   1.20E–05    0.000223
GFAP        −1.33536   5.52E–07    1.44E–05    SCN2B       0.702108   3.72E–07    1.01E–05
KRT6B       −1.14566   1.92E–07    5.59E–06    SOX5        0.706481   1.46E–06    3.38E–05
RGS20       −1.14019   5.99E–07    1.55E–05    CCDC33      0.720253   1.62E–05    0.000289
CKM         −1.13563   2.26E–06    4.99E–05    RGS9BP      0.726607   2.48E–05    0.000421
INA         −1.0867    5.42E–06    0.000109    RGS9        0.729584   2.47E–07    7.05E–06
GRIK2       −0.95974   1.39E–05    0.000252    KCNE1       0.75071    3.69E–08    1.29E–06
MMP10       −0.9316    1.33E–06    3.11E–05    CXCR1       0.839385   1.81E–08    6.69E–07
BCAN        −0.92189   6.43E–05    0.00096     KHDRBS2     0.946016   4.26E–10    2.42E–08
CNGB1       −0.76123   0.000196    0.002587    RGS6        0.950831   1.54E–08    5.79E–07
TWIST1      −0.7507    9.53E–05    0.001363    ASPA        1.08277    3.86E–11    2.92E–09
TMEM171     −0.72205   0.000455    0.005339    RXRG        1.158801   1.99E–15    3.21E–13
GNB3        −0.71549   0.000379    0.004581    ACTN2       1.189844   1.04E–14    1.43E–12
TSHR        −0.71264   0.006238    0.048666    GRIA1       1.275478   8.00E–22    4.87E–19
CAMK2B      −0.68691   0.001514    0.014765    SLC6A4      1.801564   9.18E–46    1.29E–41

Abbreviations: FC, fold change; FDR, false discovery rate.

To validate the robustness and transferability of the SVM classifier, two independent datasets, GSE10072 and GSE43458, were downloaded from GEO.15,16 After normalization, samples in the validation datasets were classified using the SVM classifier. Samples in the GSE10072 dataset were correctly classified with an accuracy of 90.7% (97 out of 107 samples), and samples in the GSE43458 dataset were correctly classified with an accuracy of 97.3% (107 out of 110 samples) (Table 5). Besides prediction accuracy, the performance of our SVM classification model was also assessed using sensitivity, specificity, positive prediction value, negative prediction value and AUC (area under ROC curve) (Figure 3C, Table 5).
Table 5

Performance of support vector machine classifier in training and validation datasets

Datasets   No of samples   Correct rate   Se      Sp      PPV     NPV     AUC
TCGA       464             95.3%          0.889   0.957   0.886   0.995   0.996
GSE10072   107             90.7%          0.918   0.897   0.882   0.929   0.963
GSE43458   110             97.3%          0.967   0.975   0.935   0.987   0.985

Abbreviations: TCGA, The Cancer Genome Atlas; Se, sensitivity; Sp, specificity; PPV, positive prediction value; NPV, negative prediction value; AUC, area under ROC curve.
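The measures reported in Table 5 follow from a confusion matrix. A minimal Python sketch is given below; the function name and the confusion-matrix counts are hypothetical, since the paper does not report per-class counts, while the correct/total pairs in the last step are the ones quoted in the text and in the Figure 3 legend.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV and NPV from raw counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive prediction value
        "npv": tn / (tn + fn),           # negative prediction value
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)

# Correct/total sample counts as quoted in the text for each dataset.
reported = {"TCGA": (442, 464), "GSE10072": (97, 107), "GSE43458": (107, 110)}
correct_rates = {name: round(100 * correct / total, 1)
                 for name, (correct, total) in reported.items()}
print(metrics)
print(correct_rates)
```

The recomputed correct rates round to 95.3%, 90.7% and 97.3%, matching the values in Table 5.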

The lncRNAs, miRNAs and mRNAs related to prognosis

Prognosis-related RNAs for LUAD were identified from differentially expressed lncRNAs, miRNAs and mRNAs using univariate Cox analysis. In total, 5 lncRNAs, 6 miRNAs and 44 mRNAs were identified as prognosis related (Table 6). Among them, PGM5P2 (phosphoglucomutase 5 pseudogene 2) and SFTA1P (surfactant associated 1) were lncRNAs and hsa-miR-96 and hsa-miR-204 were miRNAs in the ceRNA network. RGS20 (regulator of G protein signaling 20), RGS9BP (RGS9-binding protein), FGB (fibrinogen beta chain) and INA (alpha-internexin) were mRNAs in the feature subset of the SVM classifier. According to the ceRNA network (Figure 2C), two miRNA-mRNA pairs and one lncRNA-miRNA-mRNA triplet were formed among these prognosis-related RNAs, specifically hsa-miR-96-INA, hsa-miR-96-RGS20 and SFTA1P-hsa-miR-204-RGS20.
Table 6

Prognosis related lncRNAs, miRNAs and mRNAs

RNA      Direction       Names
lncRNA   Upregulated     KIAA0087, PGM5P2, SFTA1P
lncRNA   Downregulated   C15orf54, C20orf197
miRNA    Upregulated     hsa-miR-184, hsa-miR-204
miRNA    Downregulated   hsa-miR-651, hsa-miR-188, hsa-miR-96, hsa-miR-708
mRNA     Upregulated     CCDC81, LDLRAD1, ACSM5, WDR63, CAMK2A, DTHD1, PNMT, C1orf194, RGS9BP, SLC8A3, TTLL10, PCSK2, ENPP6, TSLP, KCNMB2, LHX9, PRIMA1, C11orf88, S100A12, FAM189A1, GPC5, IHH
mRNA     Downregulated   KISS1R, DMBX1, INA, CST4, RGS20, DUSP13, ECEL1, IGF2BP1, HPCA, KRT6C, NPW, FGB, NOX1, SOHLH2, INHA, NKAIN4, PFN4, CLDN6, KIF4B, NAT8L, FLRT1, GDA

We further performed Kaplan–Meier curve analyses for these prognosis-related RNAs (Figure 4). Our results showed that LUAD patients with higher expression levels of PGM5P2, SFTA1P, RGS9BP and INA had a better prognosis, and patients with higher expression levels of hsa-miR-96, hsa-miR-204, RGS20 and FGB had a worse prognosis (Figure 4). Meanwhile, the expression levels of PGM5P2, SFTA1P, hsa-miR-204 and RGS9BP were downregulated in LUAD samples whereas those of hsa-miR-96, RGS20, FGB and INA were upregulated.
Figure 4

Kaplan–Meier analysis of prognosis related lncRNAs, miRNAs and mRNAs. (A, B) Kaplan–Meier curves of two lncRNAs PGM5P2 and SFTA1P. (C, D) Kaplan–Meier curves of two miRNAs hsa-miR-96 and hsa-miR-204. (E–H) Kaplan–Meier curves of four mRNAs RGS20, RGS9BP, FGB and INA. Red and blue lines indicate patient groups with expression level above and below median value, respectively. P-value indicates the significance of difference.

Abbreviations: PGM5P2, phosphoglucomutase 5 pseudogene 2; SFTA1P, surfactant associated 1; RGS20, regulator of G protein signaling 20; RGS9BP, RGS9-binding protein; FGB, fibrinogen beta chain; INA, alpha-internexin.

Discussion

In the present study, we constructed a ceRNA network delineating the interplay among lncRNAs, miRNAs and mRNAs differentially expressed between LUAD and normal samples. An optimal subset of 44 feature genes was identified in the network, and the SVM classifier constructed with these 44 feature genes accurately classified samples in both the TCGA training data and the GSE10072 and GSE43458 validation data. We also identified key prognosis-related RNAs in the ceRNA network, including 2 miRNAs (hsa-miR-96, hsa-miR-204), 2 lncRNAs (PGM5P2, SFTA1P) and 4 selected feature mRNAs (RGS20, RGS9BP, FGB, INA). Among the 8 prognostic RNAs, higher expression levels of PGM5P2, SFTA1P, RGS9BP and INA correlated with better prognosis, indicating tumor-suppressive roles of these RNAs. Meanwhile, higher expression levels of hsa-miR-96, hsa-miR-204, RGS20 and FGB correlated with worse prognosis, indicating tumor-promoting roles of these RNAs. Most of these RNAs have previously been shown to be involved in certain types of cancer. INA is a neuronal intermediate filament protein31 that is correlated with better prognosis in glioblastoma.32,33 RGS20 is a negative regulator of heterotrimeric G proteins and may promote cancer cell metastasis by upregulating vimentin and downregulating E-cadherin.34,35 FGB is one component of fibrinogen, which is critical for tumor cell proliferation, angiogenesis and cancer metastasis.36,37 An elevated plasma level of fibrinogen is a strong indicator of poor prognosis in various tumors, such as breast tumor,38 prostate cancer,39 and lung cancer.40 SFTA1P is a lncRNA tumor suppressor that functions by inhibiting LUAD cell migration, invasion and metastasis.41–43 RGS9BP, an anchor protein of RGS9, has also been identified as being involved in bladder cancer,44 although its role remains elusive.
The function of PGM5P2 is also unclear; however, it has been suggested that PGM5P2 may be involved in pro-apoptotic and antiangiogenic processes,45 which are essential for the development and progression of cancer. Considering their roles in different cancer types, it is reasonable that these genes may play a role in the development and progression of LUAD. However, further studies are still needed to gain insight into the roles of these molecules in LUAD. The remaining two RNAs, however, were found to play controversial roles in different cancer types. hsa-miR-96 is involved in various cancers, but divergent roles have been reported for different cancer types.46,47 hsa-miR-96 can suppress tumor invasion in renal cell carcinoma47 and colorectal cancer,48 but it can promote cancer cell proliferation and invasion in breast cancer,49,50 bladder cancer46 and lung cancer.51 hsa-miR-204 has been reported to be a tumor suppressor in clear cell renal cell carcinoma, where it is induced by VHL and functions by inhibiting macroautophagy through targeting LC3B.52 In addition, its variant hsa-miR-204-5p is involved in endometrial carcinoma and has been shown to suppress the clonogenic growth, migration and invasion of endometrial carcinoma cells.53 However, we found that it played a tumor-promoting role in LUAD. Therefore, we speculate that hsa-miR-96 and hsa-miR-204 may also play divergent roles in different cancer types, which should be addressed in future experimental research. Further, two miRNA-mRNA regulation pairs and one lncRNA-miRNA-mRNA regulation triplet were formed among these prognosis-related RNAs according to the ceRNA network. Specifically, hsa-miR-96 formed two miRNA-mRNA regulation pairs with INA and RGS20, whereas hsa-miR-204 formed an lncRNA-miRNA-mRNA regulation triplet with SFTA1P and RGS20. We speculate that hsa-miR-96 may target INA and RGS20 in LUAD, whereas hsa-miR-204 may target RGS20 and be regulated by SFTA1P.
However, further experimental and functional studies are needed to identify and confirm the pathways in which these RNAs are involved.

A limitation of the SVM classification model in evaluating the selected feature genes is the lack of experimental validation. Further experiments, such as quantitative reverse-transcription PCR and/or western blotting, are still required to confirm our results. Moreover, the Kaplan–Meier curve analysis for these 8 prognosis-related RNAs was performed individually. Validating the prognostic value of these RNAs in various combinations could yield more informative results for predicting the prognosis of LUAD.

In summary, we constructed a LUAD-specific SVM classification model based on the LUAD-related ceRNA network. The SVM classifier may serve as a novel diagnostic predictor of LUAD. We also identified 8 key molecular markers with prognostic value from the ceRNA network: PGM5P2, SFTA1P, hsa-miR-204, hsa-miR-96, RGS20, RGS9BP, FGB and INA. These molecular markers may be promising prognostic markers and drug targets in future clinical practice.
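As a rough illustration of the combination analysis proposed among the limitations above, the sketch below groups patients by how many risk markers exceed a cutoff and then computes a Kaplan–Meier (product-limit) curve per group. Everything here is an assumption for illustration: the expression values, cutoffs, follow-up times and the `min_hits` rule are synthetic, the function names are hypothetical, and RGS20 and FGB are used only because higher expression of both was associated with worse prognosis in this study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: (time, S(time)) at each distinct event time.

    events[i] is 1 if the event (death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def combined_risk_groups(expression, cutoffs, min_hits=2):
    """Label each patient 'high' or 'low' by the number of markers above cutoff."""
    labels = []
    for profile in expression:
        hits = sum(profile[marker] > cutoffs[marker] for marker in cutoffs)
        labels.append("high" if hits >= min_hits else "low")
    return labels


if __name__ == "__main__":
    # Synthetic toy cohort, not data from the study.
    expression = [
        {"RGS20": 5.0, "FGB": 7.0},
        {"RGS20": 4.5, "FGB": 6.0},
        {"RGS20": 1.0, "FGB": 6.5},
        {"RGS20": 0.5, "FGB": 1.0},
    ]
    times = [6, 10, 30, 42]   # months of follow-up (synthetic)
    events = [1, 1, 0, 1]     # 1 = death observed, 0 = censored
    cutoffs = {"RGS20": 3.0, "FGB": 4.0}  # assumed cutoffs, e.g. medians

    labels = combined_risk_groups(expression, cutoffs)
    for group in ("high", "low"):
        idx = [i for i, label in enumerate(labels) if label == group]
        curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
        print(group, curve)
```

In practice the group curves would be compared with a log-rank test or a multivariable Cox model; those steps are omitted here to keep the sketch dependency-free.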
  51 in total

1.  KEGG: kyoto encyclopedia of genes and genomes.

Authors:  M Kanehisa; S Goto
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

Review 2.  lncRNAs and microRNAs with a role in cancer development.

Authors:  Julia Liz; Manel Esteller
Journal:  Biochim Biophys Acta       Date:  2015-07-04

Review 3.  The multilayered complexity of ceRNA crosstalk and competition.

Authors:  Yvonne Tay; John Rinn; Pier Paolo Pandolfi
Journal:  Nature       Date:  2014-01-16       Impact factor: 49.962

4.  Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing.

Authors:  Marcin Imielinski; Alice H Berger; Peter S Hammerman; Bryan Hernandez; Trevor J Pugh; Eran Hodis; Jeonghee Cho; James Suh; Marzia Capelletti; Andrey Sivachenko; Carrie Sougnez; Daniel Auclair; Michael S Lawrence; Petar Stojanov; Kristian Cibulskis; Kyusam Choi; Luc de Waal; Tanaz Sharifnia; Angela Brooks; Heidi Greulich; Shantanu Banerji; Thomas Zander; Danila Seidel; Frauke Leenders; Sascha Ansén; Corinna Ludwig; Walburga Engel-Riedel; Erich Stoelben; Jürgen Wolf; Chandra Goparju; Kristin Thompson; Wendy Winckler; David Kwiatkowski; Bruce E Johnson; Pasi A Jänne; Vincent A Miller; William Pao; William D Travis; Harvey I Pass; Stacey B Gabriel; Eric S Lander; Roman K Thomas; Levi A Garraway; Gad Getz; Matthew Meyerson
Journal:  Cell       Date:  2012-09-14       Impact factor: 41.582

5.  MiR-96-5p influences cellular growth and is associated with poor survival in colorectal cancer patients.

Authors:  Anna Lena Ress; Verena Stiegelbauer; Elke Winter; Daniela Schwarzenbacher; Tobias Kiesslich; Sigurd Lax; Stefan Jahn; Alexander Deutsch; Thomas Bauernhofer; Hui Ling; Hellmut Samonigg; Armin Gerger; Gerald Hoefler; Martin Pichler
Journal:  Mol Carcinog       Date:  2014-09-25       Impact factor: 4.784

6.  Survival analysis in clinical trials: Basics and must know areas.

Authors:  Ritesh Singh; Keshab Mukhopadhyay
Journal:  Perspect Clin Res       Date:  2011-10

7.  A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.

Authors:  Brittany Baur; Serdar Bozdag
Journal:  PLoS One       Date:  2016-02-12       Impact factor: 3.240

8.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors:  Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal:  Bioinformatics       Date:  2009-11-11       Impact factor: 6.937

9.  starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data.

Authors:  Jun-Hao Li; Shun Liu; Hui Zhou; Liang-Hu Qu; Jian-Hua Yang
Journal:  Nucleic Acids Res       Date:  2013-12-01       Impact factor: 16.971

10.  Comprehensive characterization of cancer subtype associated long non-coding RNAs and their clinical implications.

Authors:  Weihong Zhao; Jiancheng Luo; Shunchang Jiao
Journal:  Sci Rep       Date:  2014-10-13       Impact factor: 4.379

