Literature DB >> 35430805

Predicting lymph node metastasis and prognosis of individual cancer patients based on miRNA-mediated RNA interactions.

Shulei Ren1, Wook Lee1, Kyungsook Han2.   

Abstract

BACKGROUND: Lymph node metastasis is usually detected based on the images obtained from clinical examinations. Detecting lymph node metastasis from clinical examinations is a direct way of diagnosing metastasis, but the diagnosis is done after lymph node metastasis occurs.
RESULTS: We developed a new method for predicting lymph node metastasis based on differential correlations of miRNA-mediated RNA interactions in cancer. The types of RNAs considered in this study include mRNAs, lncRNAs, miRNAs, and pseudogenes. We constructed cancer patient-specific networks of miRNA mediated RNA interactions and identified key miRNA-RNA pairs from the network. A prediction model using differential correlations of the miRNA-RNA pairs of a patient as features showed a much higher performance than other methods which use gene expression data. The key miRNA-RNA pairs were also powerful in predicting prognosis of an individual patient in several types of cancer.
CONCLUSIONS: Differential correlations of miRNA-RNA pairs identified from patient-specific networks of miRNA mediated RNA interactions are powerful in predicting lymph node metastasis in cancer patients. The key miRNA-RNA pairs were also powerful in predicting prognosis of an individual patient of solid cancer.
© 2022. The Author(s).

Entities:  

Keywords:  Competitive endogenous RNA; Lymph node metastasis; Prognosis; miRNA–mediated RNA interaction

Mesh:

Substances:

Year:  2022        PMID: 35430805      PMCID: PMC9014599          DOI: 10.1186/s12920-022-01231-x

Source DB:  PubMed          Journal:  BMC Med Genomics        ISSN: 1755-8794            Impact factor:   3.622


Background

The spread of cancer cells from the original (primary) tumor to another part of the body is called metastasis. During metastasis, cancer cells travel to other areas through either the bloodstream or the lymph system. As one of the steps of tumor metastasis, lymph node metastasis is commonly observed in cancer patients. Lymph node metastasis itself does not directly endanger the life of patients, but malignant tumors can metastasize to other parts of the body through lymph node metastasis [1]. Many studies have reported that the prognosis of patients with lymph node metastasis is worse than that of patients without lymph node metastasis [2]. Lymph node metastasis is also an important factor in determining effective treatment options for cancer patients. Lymph node metastasis is usually detected based on the images obtained from clinical examinations. Recently deep learning methods such as convolutional neural networks (CNN) have been used to help clinicians detect lymph node metastasis in ultrasound images [3-5]. Detecting lymph node metastasis from ultrasound images is a direct and accurate way of diagnosing metastasis, but the diagnosis is often done after metastasis occurs. Several studies have reported abnormal gene expression in the process of lymph node metastasis [6]. For example, the study of Okugawa et al. [7] suggested that the expression of KiSS1 is closely related to lymph node metastasis in colorectal cancer. Zhang et al. [8] predicted lymph node metastasis using differentially expressed mRNAs and noncoding RNAs. Dihge et al. predicted lymph node metastasis using gene expressions combined with clinicopathological characteristics [9]. Expression data of mRNAs and noncoding RNAs are valuable resources for studying and predicting lymph node metastasis. But, cancer is a complex and heterogeneous disease, so abnormal expression of individual genes cannot fully explain the development and metastasis of cancer. The development and metastasis of cancer is better explained by the dysregulation of gene interactions rather than by individual genes alone. For example, AKT1 is abnormally expressed in many types of cancer and the up-regulation of AKT1 has been known to be related to lymph node metastasis. But, recent studies found that miR-138 binding to AKT1 regulates the expression of AKT1 in tongue squamous cell carcinoma [10]. miR-519d inhibits lymph node metastasis by regulating MMP3 in oral squamous cell carcinoma and breast cancer [11, 12]. Salmena et al. [13] proposed a new gene regulation known as competitive endogenous RNA (ceRNA) hypothesis. The ceRNA hypothesis suggests that RNAs with similar miRNA response elements compete to bind to the same miRNA, thereby regulate each other indirectly. Motivated by the increasing evidence supporting the hypothesis, several computational methods have been developed to construct a network of ceRNAs [14, 15]. Most of the methods focused on mRNAs or lncRNAs only as ceRNAs and did not consider pseudogenes when constructing ceRNA networks. In this study, we propose a new method for predicting lymph node metastasis based on differential correlations of miRNA-mediated RNA interactions in cancer. The types of RNAs considered in this study include mRNAs, lncRNAs, miRNAs, and pseudogenes. We constructed cancer patient-specific networks of miRNA mediated RNA interactions, and identified key miRNA–RNA interactions from the networks. We built a model using the correlations of the miRNA–RNA pairs as features for predicting lymph node metastasis. The model showed a much higher performance than other methods which use gene expressions alone. The key miRNA–RNA pairs were also powerful in predicting prognosis of individual patients in several types of cancer. The rest of this paper presents the method and the experimental results in several types of cancer.

Results

Prediction of lymph node metastasis

Using the PCCs of miRNA–RNA pairs obtained in our study, we predicted lymph node metastasis using the stacking model and base models (SVM and logistic regression) in seven types of cancer. As expected, the stacking model showed the better performance than the other models both in cross-validation and in independent testing (Additional file 1). We compared the performance of stacking models using two different types of features: PCC of miRNA–RNA pairs and RNA expression. PCC of miRNA–RNA pairs was computed by equation 4 in the Methods section. For RNA expression feature, we used the RNAs with a p-value  both in differential analysis between normal samples and tumor samples and in additional differential analysis between lymph node metastasis samples and non-metastatic samples. The performance of the stacking models was evaluated by fivefold cross-validation and independent testing using several measures: sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV) and area under curve (AUC). Tables 1 and 2 show the performance of two stacking models in the fivefold cross validation and in the independent testing, respectively. The stacking models with PCCs as features showed a better performance than those with RNA expressions both in the fivefold cross validation and in independent testing, except for thyroid cancer (THCA) in independent testing. These results indicate that PCC of miRNA–RNA pairs is a more powerful feature than the gene expression level in predicting lymph node metastasis, which in turn supports that lymph node metastasis is associated with dysregulation of gene interactions rather than individual genes, as mentioned in the Background section.
Table 1

Performance of the prediction model with different types of features in the fivefold cross validation

CancerFeature#Features#PCsSNSPACCPPVNPVAUC
BRCAEXP51194300.6740.7090.6920.6940.6890.691
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC15634800.7730.8060.7900.7960.7840.789
COADEXP8351000.3600.9350.7580.7110.7670.647
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC1969800.7600.9650.9020.9050.9010.862
HNSCEXP292100.7500.6840.7200.7390.6960.717
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC8001000.9560.8770.9200.9030.9430.917
LUADEXP61931100.4770.8820.7410.6830.7590.679
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC12,9812000.5930.9440.8220.8500.8130.769
LUSCEXP13711900.6440.8670.7860.7360.8090.756
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC24362000.8750.9340.9120.8840.9290.904
STADEXP4761200.9050.4720.7630.7780.7080.688
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC17,445600.9730.9030.9500.9530.9420.938
THCAEXP4205300.6630.6630.6630.6340.6910.663
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC33971500.6740.7230.7000.6820.7160.698

In comparison of two types of features (RNA expression vs. deltaPCC), the better performances are shown in bold

In all cancer types, prediction with PCCs showed a better performance than that with RNA expression levels

PC, principal component; SN, sensitivity; SP, specificity; ACC, accuracy; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; EXP, RNA expression level

Table 2

Performance of the prediction model with different types of features in an independent testing

CancerFeature#Features#PCsSNSPACCPPVNPVAUC
BRCAEXP51194300.6640.7100.6880.6900.6850.687
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC15634800.7760.8260.8020.8130.7920.801
COADEXP8351000.5630.9320.8190.7830.8290.747
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC1969800.9060.9860.9620.9670.9600.946
HNSCEXP292100.8670.7920.8330.8390.8260.829
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC8001000.9670.7920.8890.8530.9500.879
LUADEXP61931100.6220.9430.8320.8520.8250.782
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC12,9812000.7840.9710.9070.9360.8950.878
LUSCEXP13711900.5330.8080.7070.6150.7500.671
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC24362000.8890.9620.9350.9300.9380.925
STADEXP4761200.9370.4520.7770.7760.7780.694
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC17,445600.9050.9680.9260.9830.8330.936
THCAEXP4205300.7370.7960.7680.7570.7780.766
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC33971500.6580.8640.7680.8070.7450.761

In comparison of two types of features (RNA expression vs. deltaPCC), the better performances are shown in bold

In all cancer types except thyroid cancer (THCA), prediction with PCCs showed a better performance than that with RNA expression levels

PC, principal component; SN, sensitivity; SP, specificity; ACC, accuracy; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; EXP, RNA expression level

Performance of the prediction model with different types of features in the fivefold cross validation In comparison of two types of features (RNA expression vs. deltaPCC), the better performances are shown in bold In all cancer types, prediction with PCCs showed a better performance than that with RNA expression levels PC, principal component; SN, sensitivity; SP, specificity; ACC, accuracy; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; EXP, RNA expression level Performance of the prediction model with different types of features in an independent testing In comparison of two types of features (RNA expression vs. deltaPCC), the better performances are shown in bold In all cancer types except thyroid cancer (THCA), prediction with PCCs showed a better performance than that with RNA expression levels PC, principal component; SN, sensitivity; SP, specificity; ACC, accuracy; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; EXP, RNA expression level We also compared the performance of our method with that of Zhang’s method [8] using the same data sets and the same SVM model. Among the seven types of cancer used in our study, comparison was made for four types of cancer because the four cancer types are common to both studies. The train_score and test_score in Table 3 were obtained using the scikit-learn package, which was used by Zhang’s study. In all cancer types used in comparison, our model which used PCCs as features was better than the four SVM models of Zhang’s method, which used the expression levels of mRNAs, miRNAs and lncRNAs separately. These results also demonstrate that PCCs of miRNA–RNA pairs are much more powerful features than expression data of RNAs when predicting lymph node metastasis.
Table 3

Comparison of the performance of our SVM model with that of Zhang’s SVM model [8]

CancerMethod_featureTrain_scoreTest_score
BRCAOur model_\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC0.9720.787
Zhang_mRNA0.7980.680
Zhang_miRNA0.7640.737
Zhang_lncRNA0.7930.696
COADOur model_\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC0.9840.905
Zhang_mRNA0.8490.871
Zhang_miRNA0.9020.886
Zhang_lncRNA0.8690.871
LUADOur model_\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC0.9960.850
Zhang_mRNA0.8080.849
Zhang_miRNA0.8850.795
Zhang_lncRNA0.7980.849
LUSCOur model_\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document}ΔPCC0.9990.904
Zhang_mRNA0.8710.900
Zhang_miRNA0.9390.847
Zhang_lncRNA0.8610.900

In comparison of two types of features (RNA expression vs. deltaPCC), the better performances are shown in bold

Among the seven types of cancer used in our study, comparison was made in four types of cancer because they are the only common cancer types in both studies. The train_score and test_score were obtained using the scikit-learn package, which was used by Zhang’s study. In all four caner types, our model showed the better performance in both training and testing. our model_PCC: SVM model using PCCs as features. Zhang_X: SVM model using the expression levels of RNA type X as features

Comparison of the performance of our SVM model with that of Zhang’s SVM model [8] In comparison of two types of features (RNA expression vs. deltaPCC), the better performances are shown in bold Among the seven types of cancer used in our study, comparison was made in four types of cancer because they are the only common cancer types in both studies. The train_score and test_score were obtained using the scikit-learn package, which was used by Zhang’s study. In all four caner types, our model showed the better performance in both training and testing. our model_PCC: SVM model using PCCs as features. Zhang_X: SVM model using the expression levels of RNA type X as features

Overall survival of cancer patients

We analyzed the overall survival of cancer patients by performing a log-rank test with respect to PCCs of miRNA–RNA pairs obtained in this study. Table 4 shows top three miRNA–RNA pairs with the smallest p-value from the log-rank test in each type of cancer. The remaining miRNA–RNA pairs with p-value less than 0.01 are available in Additional file 2.
Table 4

Comparison of p-values from the log-rank test with miRNA–RNA pair, and individual RNA and miRNA involved in the pair

CancermiRNA–RNA pairType of RNA in the pairP-value of miRNA–RNA pairP-value of miRNAP-value of RNA
BRCAmiR-26b_AC079414.1lncRNA2.270E−059.203E−015.896E−01
miR-3192_PPDPFLmRNA6.320E−051.351E−031.346E−02
miR-3192_AC013549.3lncRNA.260E−045.028E−011.346E−02
COADmiR-604_AL162426.1lncRNA1.869E−044.365E−016.730E−01
miR-3679_RPL26P29Pseudogene3.122E−041.315E−028.171E−01
miR-6835_AC037459.2lncRNA7.746E−049.815E−012.938E−02
HNSCmiR-4539_KRTAP10-2mRNA1.849E−043.033E−011.629E−03
miR-6730_LINC01435lncRNA9.783E−041.038E−023.211E−03
miR-5195_AL390067.1lncRNA1.070E−038.716E−023.435E−02
LUADmiR-581_LINC00628lncRNA4.719E−071.925E−028.736E−01
miR-7848_AC087588.2lncRNA2.220E−061.750E−057.506E−01
miR-3680-1_AL138789.1lncRNA1.300E−052.386E−025.371E−01
LUSCmiR-548z_PNLIPRP2Pseudogene1.175E−043.178E−016.640E−04
miR-3972_CSAG4Pseudogene1.485E−045.168E−014.740E−01
miR-146b_PHETA2mRNA1.488E−044.779E−022.760E−01
STADmiR-604_OLFML3mRNA1.000E−054.787E−024.921E−01
miR-554_OR10A5mRNA4.040E−054.727E−035.852E−02
miR-149_OR10A5mRNA1.689E−044.727E−038.850E−01
THCAmiR-5685_GADD45AmRNA3.489E−037.915E−012.587E−01
miR-6784_AC093281.2lncRNA3.762E−035.934E−015.559E−02
miR-8071-2_CFBmRNA3.991E−031.392E−029.494E−01
Comparison of p-values from the log-rank test with miRNA–RNA pair, and individual RNA and miRNA involved in the pair As shown in Table 4, the p-values from the log-rank test with PCC are much smaller than those with individual RNAs involved in the miRNA–RNA pairs. Three pseudogenes (RPL26P29, PNLIPRP2, and CSAG4) are included in the top three miRNA–RNA pairs with the smallest p-value (Table 4), and several miRNA-pseudogene pairs were found as potential prognostic pairs for all 7 types of cancer (Additional file 2). Overall survival rates of patients with respect to PCCs of miRNA–RNA pairs in 7 cancer types. PCCs of miRNA–RNA pairs are predictive of the survival rates of patients in all 7 types of cancer Figure 1 compares the overall survival rates of two groups of patients with respect to PCC of miRNA–RNA pairs in 7 types of cancer. In all 7 types of cancer, PCCs of miRNA–RNA pairs were powerful in predicting the survival rates of patients. For comparative purposes, Fig. 2 shows the overall survival rates of patients of BRCA, COAD and LUAD with respect to RNA expressions instead of PCC of miRNA–RNA pairs. The RNAs involved in the miRNA–RNA pairs of Fig. 1 (miR-26b_AC079414.1 pair for BRCA, miR-604_AL162426.1 pair for COAD, and miR-581_LINC00628 for LUAD) were selected for the comparison. None of the individual RNAs involved in the pairs showed predictive power of the survival rates of cancer patients, whereas the miRNA–RNA pairs were very powerful in predicting the survival rates of patients as demonstrated in Fig. 1.
Fig. 1

Overall survival rates of patients with respect to PCCs of miRNA–RNA pairs in 7 cancer types. PCCs of miRNA–RNA pairs are predictive of the survival rates of patients in all 7 types of cancer

Fig. 2

Overall survival rates of patients with respect to expressions of individual RNAs in Fig.  1. In contrast to the miRNA–RNA pairs, none of the individual RNAs showed predictive power of the survival rates of cancer patients

Overall survival rates of patients with respect to expressions of individual RNAs in Fig.  1. In contrast to the miRNA–RNA pairs, none of the individual RNAs showed predictive power of the survival rates of cancer patients

ceRNA networks

For every tumor sample in Table 5, we constructed a ceRNA network and derived PCC of miRNA–RNA pairs from the network. We then constructed ceRNA networks with the miRNA–RNA pairs. Figure 3 shows a ceRNA network composed of all miRNA–RNA pairs for breast invasive carcinoma (BRCA). The network includes 1563 miRNA–RNA interactions among 119 miRNAs, 423 lncRNAs, 380 mRNAs and 252 pseudogenes. The small network centered at miR-149 is a blowup of the subnetwork enclosed by a red box.
Table 5

The number of normal samples, tumor samples, tumor samples with lymph node metastasis, and tumor samples without lymph node metastasis in seven types of cancer

Cancer#Normal samples#Tumor samples#Lymph node metastasis#Non-metastasis
BRCA1131102447457
COAD41478107242
HNSC445009881
LUAD59533123231
LUSC49502149259
STAD32375210103
THCA58502127145
Fig. 3

ceRNA network for breast invasive carcinoma (BRCA). The network is composed of 1563 miRNA–RNA interactions among 119 miRNAs, 423 lncRNAs, 380 mRNAs and 252 pseudogenes. The small network centered at miR-149 is a blowup of the subnetwork enclosed by a red box

The number of normal samples, tumor samples, tumor samples with lymph node metastasis, and tumor samples without lymph node metastasis in seven types of cancer ceRNA network for breast invasive carcinoma (BRCA). The network is composed of 1563 miRNA–RNA interactions among 119 miRNAs, 423 lncRNAs, 380 mRNAs and 252 pseudogenes. The small network centered at miR-149 is a blowup of the subnetwork enclosed by a red box miR-149 is a miRNA that interacts with ceRNAs most frequently in the ceRNA network. miR-149 is known to promote metastasis in breast cancer when it is down regulated [16]. The ceRNA network also contains several genes associated with breast cancer. For instance, mutations in ERBB4 have been known to be associated with breast cancer [17]. Overexpression of YWHAE increases the proliferation, migration and invasion ability of breast cancer cells [18]. KAT6A promotes SMAD3 binding to oncogenic chromatin modifier TRIM24 and disrupts its interaction with the tumor suppressor TRIM33, which lead to the tumor cell metastasis in breast cancer [19]. As an example of patient-specific networks, Fig. 4 shows the ceRNA networks specific to two LUAD patients with different PCCs of the miR-581_LINC00628 pair. Figure 4A is a ceRNA network for a LUAD patient (sample ID: TCGA-44-7670) with a high PCC group of the pair, whereas Fig. 4B is a ceRNA network for a LUAD patient (TCGA-NJ-A55O) with a low PCC group of the same pair. The network in Fig. 4A is composed of 210 miRNA–RNA pairs among 29 miRNAs, 77 lncRNAs, 47 mRNAs and 38 pseudogenes, and the network in Fig. 4B is composed of 111 miRNA–RNA pairs among 5 miRNAs, 53 lncRNAs, 30 mRNAs and 19 pseudogenes.
Fig. 4

Subnetworks of patient-specific ceRNA networks for two LUAD patients. A LUAD patient (TCGA-44-7670) with a high PCC of the miR-581_LINC00628 pair. B LUAD patient (TCGA-NJ-A55O) with a low PCC of the miR-581_LINC00628 pair. The RNAs involved in the three miRNA–RNA pairs of Table 4 are marked by red boxes. For clarity, subnetworks of the three miRNA–RNA pairs are displayed

Subnetworks of patient-specific ceRNA networks for two LUAD patients. A LUAD patient (TCGA-44-7670) with a high PCC of the miR-581_LINC00628 pair. B LUAD patient (TCGA-NJ-A55O) with a low PCC of the miR-581_LINC00628 pair. The RNAs involved in the three miRNA–RNA pairs of Table 4 are marked by red boxes. For clarity, subnetworks of the three miRNA–RNA pairs are displayed Apparently, the network in Fig. 4A includes more RNAs and interactions among them than that in Fig. 4B. As shown earlier in Fig. 1, patients with a high PCC of the miR-581_LINC00628 pair have a much lower survival rate than those with a low PCC of the pair. Similar observations were made in the other types of cancer.

Discussion

The result of our work showed that PCCs of miRNA–RNA pairs derived from patient-specific ceRNA networks are more powerful than the expression levels of individual RNAs in predicting lymph node metastasis. This is related with dysregulated ceRNA interactions in cancer [20]. For instance, miR-125b may induce breast cancer metastasis by binding to STARD13 [21]. HOXD-AS1 prevents miR-130a-3p mediated degradation of SOX4 through competitive binding to miR-130a-3p, thereby promoting hepatocellular carcinoma transfer [22]. MT1JP regulates gastric cancer progression by binding to miR-92a-3p competitively with FBXW7 [23]. Unlike other studies on ceRNA interactions, our study considered pseudogenes as well as mRNAs and lncRNAs as ceRNAs. Pseudogenes were previously considered as genomic junk and neglected in the studies on ceRNA interactions as well. However, several experimental evidences suggested that pseudogenes can act as ceRNAs in the development of disease [24-26]. For instance, Karreth et al. [27] demonstrated that the pseudogene BRAFP1 functions as a ceRNA and induces lymphoma in vivo. Overexpression of the oncogenic pseudogene BRAFP1 promotes the formation of human B-cell lymphomas through serving as a ceRNA of the parental gene BRAF [28]. In prostate cancer, the pseudogene PTENP1 functions as a ceRNA to regulate PTEN expression by sponging miR-499-5p [29]. Straniero et al. [30] demonstrated that the pseudogene GBAP1 can function as a ceRNA for the glucocerebrosidase gene GBA by sponging miR-22-3p, thus revealing a new regulatory network in the pathogenesis of Parkinson’s Disease. There are limitations in our current work. A patient-specific ceRNA network consists of miRNA–RNA pairs with significant changes from other patients by including miRNA–RNA pairs whose PCC| is larger than the median PCC| of all tumor samples of the same type. Since we used PCC| instead of PCC, a patient-specific network does not show the direction of change (i.e., increase or decrease) in PCC. In the future, we plan to come up with a better way of presenting such information in a patient-specific network. Another direction of future work is to improve the performance of the prediction model further, in particular for thyroid carcinoma.

Conclusion

The spread of tumors has always been a difficulty in tumor treatment, especially large-scale spread, which greatly reduces the survival rate of patients. Lymph node metastasis is the first step in the spread of many tumors. Therefore, predicting lymph node metastasis can make medical interventions in advance and reduce the risk of large-scale spread. In this study, we constructed ceRNA networks for 7 types of solid cancer. Unlike other ceRNA networks, our ceRNA networks include pseudogenes as well as mRNA and lncRNAs. From the miRNA–RNA pairs in the ceRNA networks, we built a prediction model of lymph node metastasis in tumor samples. Experimental results of the prediction model showed that PCCs of miRNA–RNA pairs from ceRNA networks are powerful for predicting lymph node metastasis in tumor samples. Comparison of our method with the features of other methods using the same data sets showed that PCCs of miRNA–RNA pairs are much more powerful than gene expression levels in predicting lymph node metastasis of cancer patients. Some miRNA–RNA pairs were also powerful in predicting prognosis of individual patients. Our work is preliminary and requires further investigation for clinical use. However, this approach will help characterize individual cancer patients and predict the occurrence of lymph node metastasis in advance.

Methods

The overall workflow of our method is shown in Fig. 5. It shows data collection, data filtering, data processing, generation of miRNA–RNA gene pairs, and construction of a prediction model and patient-specific ceRNA network.
Fig. 5

The overview of the overall workflow. There are three types of samples: normal samples (gray), tumor samples without lymph node metastasis (sky blue) and tumor samples with lymph node metastasis (pink). In our prediction model, tumor samples with lymph node metastasis (pink) and tumor samples without lymph node metastasis (sky blue) are treated as positive and negative instances, respectively

The overview of the overall workflow. There are three types of samples: normal samples (gray), tumor samples without lymph node metastasis (sky blue) and tumor samples with lymph node metastasis (pink). In our prediction model, tumor samples with lymph node metastasis (pink) and tumor samples without lymph node metastasis (sky blue) are treated as positive and negative instances, respectively

Data collection

We collected gene expression profiles of lncRNAs, mRNAs, pseudogenes, and miRNAs and clinical profiles from The Cancer Genome Atlas (TCGA) data portal [31] for primary tumor samples of all solid cancer types. Normal samples of each type of cancer were also obtained from the TCGA data portal. All the gene expression profiles used in this study were obtained by RNA-sequencing (RNA-seq). In TCGA, there were 18 types of solid cancer which have at least 200 samples. Among the 18 types, 6 types were excluded due to insufficient data on lymph node metastasis in their tumor samples. In the remaining 12 types of solid cancer, we selected the types which have at least 30 normal samples and 50 tumor samples with lymph node metastasis. Only 7 types of solid cancer satisfied such criteria: breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), head and neck squamous cell carcinoma (HNSC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), stomach adenocarcinoma (STAD) and thyroid carcinoma (THCA). The clinical profiles of the TCGA data includes the Tumor, Node, Metastasis (TNM) stage of samples. Samples with an M stage of 1 were excluded because distant organ metastasis often coexists with lymph node metastasis and makes the evaluation of prediction difficult. Based on the TNM staging system, we clustered the tumor samples into those with lymph node metastasis and and those without lymph node metastasis.Table 5 shows the number of normal samples, tumor samples, tumor samples with lymph node metastasis, and tumor samples without lymph node metastasis in 7 types of cancer. The TCGA barcodes of all normal samples and tumor samples of Table 5 are provided as Additional file 3. The TCGA barcode is the primary identifier of biospecimen data in the TCGA project. Samples with lymph node metastasis: tumor samples with T stage of 1–4, N stage of 1–3, and M stage of 0 Samples without lymph node metastasis: tumor samples with T stage of 1–4, N stage of 0, and M stage of 0

Gene filtering

The gene names of the TCGA data are represented by Ensembl ID. Thus, we obtained the annotation files from the Ensembl project [32] and determined the names and biotypes of the genes (mRNAs, lncRNAs, pseudogenes and miRNAs). Table 6 shows the number of genes and their types.
Table 6

The number of RNAs of four biotypes in each cancer type studied in this study

Cancer#miRNAs#mRNAs#lncRNAs#pseudogenes
BRCA16518,08485535528
COAD15717,57372845304
HNSC9518,01874274643
LUAD19718,05487555954
LUSC16118,22787065680
STAD37918,61710,3549039
THCA15317,56873424753
The number of RNAs of four biotypes in each cancer type studied in this study We filtered out genes with an average count below 1. In RNA-seq data, counts are non-negative integer values. The count of unexpressed genes is 0, so the count of expressed genes is at least 1. Since the genes with the average count are unexpressed genes in most samples, we removed them. We then normalized the RNA-seq data of the genes using the trimmed mean of M values (TMM) method [33].

Deriving miRNA–RNA pairs and feature selection

As mentioned earlier, any of lncRNAs, mRNAs, and pseudogenes with common miRNA response elements compete to bind to the same miRNA, so can act as competitive endogenous RNAs (ceRNAs). To obtain initial miRNA–RNA pairs we computed the maximal information coefficient (MIC) [34] of each miRNA with ceRNA candidates, which include mRNAs, lncRNAs, and pseudogenes. The overall workflow of our method for deriving miRNA–RNA pairs, selecting features and building a model can be summarized as follows: Our approach to predicting lymph node metastasis is based on the differential correlations of miRNA–RNA interactions of a sample from normal samples. To obtain the differential correlations of miRNA–RNA interactions of a sample, we first selected miRNA–RNA interactions with the maximal information coefficient (MIC). Pearson correlation coefficient (PCC) is the most commonly used for gene association. However, we used MIC instead of PCC to select potential miRNA–RNA pairs for a few reasons: (1) PCC can measure linear association only, but MIC measures linear or non-linear association between two variables. (2) MIC is less susceptible to outliers than PCC. Given RNA-seq gene expression data of miRNAs and ceRNAs (mRNAs, lncRNAs and pseudogenes), compute MIC of miRNA–RNA pairs in tumor samples and normal samples. Select those miRNA–RNA pairs with MIC 0.5 in tumor samples or normal samples, and remove the remaining miRNA–RNA pairs. Compute the Pearson correlation coefficient (PCC) of each miRNA–RNA pair in normal samples. Recompute PCC in normal samples perturbed by a single tumor sample. Compute the difference in PCC (PCC) between the normal samples and perturbed samples. Select miRNA–RNA pairs with a p-value  in the Wilcox test based on PCC, and remove the remaining pairs. Reduce the dimension of feature vectors by the principal component analysis (PCA) of PCCs. RNAs of the miRNA–RNA pairs are scattered into the two-dimensional space, which is divided into bins in the X and Y axes, Here X denotes the expression level of miRNA and Y denotes the expression level of any one of mRNA, lncRNA, or pseudogene in the pairs. Based on the number of scattered points in each bin, we calculate the mutual information I(X, Y) by Eq. (1). This process is repeated until the largest mutual information is obtained as the MIC (Eq. 2).where X: miRNA; Y: mRNA, lncRNA, or pseudogeneThe parameter B of MIC controls how much of the characteristic matrix we search over. Setting B too high can lead to non-zero scores even for random data, while setting B too low results in searching only for simple patterns [34]. we used the default setting for B, the 0.6th power of the number of samples, because the default setting is known to work well in practice [34]. Unlike the parameter B, there is no default setting for MIC. When selecting miRNA–RNA pairs for analysis, the threshold for MIC was set to 0.5, which is the median of its range [0, 1]. Setting the threshold of MIC smaller than 0.5 results in more miRNA–RNA pairs, which will contain a large number of spurious pairs. In contrast, with a larger threshold, we may miss potential prognostic gene pairs. MICs of miRNA–RNA pairs were computed separately in tumor samples and normal samples because the association strength of miRNAs and ceRNAs are different between tumor and normal samples. Those miRNA–RNA pairs MIC  in normal samples and tumor samples were removed because their association is not strong enough to be included in a ceRNA network. We constructed a ceRNA network by subtracting a reference network for a group of normal samples from a perturbed network with a single tumor sample. Thus, each edge in the patient-specific network represents a differential PCC (PCC) of miRNA–RNA pair between a single tumor sample and a group of normal samples. MIC was not used at this stage because MIC does not make sense by its definition and PCC is more suitable for quantifying the perturbation by a single sample.where n: number of samples; X: miRNA; Y: mRNA, lncRNA, or pseudogeneEvery edge of a ceRNA network represents PCC of a miRNA–RNA pair, which is obtained by the following procedure: We divided the PCCs of miRNA–RNA pairs into 2 groups, lymph node metastasis and non-metastasis, and performed the Wilcox test [35] in the two groups. We selected miRNA–RNA pairs with a p-value less than 0.01 in the Wilcox test. We reduced the number of miRNA–RNA pairs further by PCA. Table 7 shows the number of miRNA–RNA pairs left after each filtering process.
Table 7

The number of features left after each filtering process. miRNA–RNA pairs with MIC  both in normal samples and tumor samples were removed by MIC filtering

Cancer#Features after#Features after#Features after
MIC filteringWilcox testPCA
BRCA90,8371563480
COAD178,973196980
HNSC67,020800100
LUAD341,14612,981200
LUSC165,7652436200
STAD976,76317,44560
THCA38,0773397150

The miRNA–RNA pairs with a p-value 0.01 were removed by the Wilcox test. The number of features was further reduced after dimension reduction by PCA of PCCs. In both MIC filtering and the Wilcox test, each feature represents a miRNA–RNA pair, In PCA, the number of features indicates the dimension of a feature vector

Compute PCC of every miRNA–RNA pair in n normal samples. Recompute PCC in samples which include n normal samples and a single tumor sample. Compute differential PCCs (PCCs) between normal samples and the tumor sample. The number of features left after each filtering process. miRNA–RNA pairs with MIC  both in normal samples and tumor samples were removed by MIC filtering The miRNA–RNA pairs with a p-value 0.01 were removed by the Wilcox test. The number of features was further reduced after dimension reduction by PCA of PCCs. In both MIC filtering and the Wilcox test, each feature represents a miRNA–RNA pair, In PCA, the number of features indicates the dimension of a feature vector

Construction of a prediction model

A model for predicting lymph node metastasis in tumor samples was built using an ensemble learning method. There are several ensemble learning methods such as bagging, boosting and stacking [36, 37]. Stacking is known to have higher prediction accuracy, yet lower risk of overfitting than bagging and boosting [38-40]. We selected support vector machine (SVM) and logistic regression (LR) as base models and combined them using stacking ensemble learning in the scikit-learn library [41]. We first trained the SVM model and LR model (base learners) with the original training set. We then used their prediction results as features to train a secondary learner. We used LR as the secondary classifier, which is the default classifier in the library. Stacking integrates the prediction results of the base learners in the best way through the secondary learner. The tumor samples obtained from TCGA were divided into training and test sets with the ratio of 7:3. The parameters of the prediction model were determined by the grid search in the training set. When training and validating the prediction model, tumor samples with lymph node metastasis were considered as positive instances, and tumor samples without lymph node metastasis were considered as negative instances.

Construction of a ceRNA network

For each type of cancer, we constructed a ceRNA network with the miRNA–RNA pairs obtained by the Wilcox test. A node of the ceRNA network represents one of miRNA, mRNA, lncRNA or pseudogene, and an edge represents the interaction of miRNA with other RNAs. The patient-specific ceRNA network is a sub-network of the ceRNA network. For each miRNA–RNA pair, we computed the median of the absolute value of PCC (i.e., PCC|) of the pair in all tumor samples of the same cancer type. A patient-specific ceRNA network was constructed by selecting the miRNA–RNA pairs whose PCC| is larger than the median PCC|. Thus, the edges in a patient-specific ceRNA network represent the miRNA–RNA interactions which show a significant change from other patients of the same cancer type. All additional files are available at http://bclab.inha.ac.kr/LNM/. Additional file 1. Performance of two base models (logistic regression and SVM) and the ensemble model by stacking the base models in predicting lymph node metastasis. Additional file 2. Potential prognostic miRNA–RNA pairs in seven types of cancer. Additional file 3. TCGA barcodes of all normal samples and tumor samples studied in our work.
  1 in total

1.  YWHAE promotes proliferation, metastasis, and chemoresistance in breast cancer cells.

Authors:  Yi-Fang Yang; Yi-Chen Lee; Yen-Yun Wang; Chie-Hong Wang; Ming-Feng Hou; Shyng-Shiou F Yuan
Journal:  Kaohsiung J Med Sci       Date:  2019-04-18       Impact factor: 2.744

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.