Establishing reliable miRNA-cancer association network based on text-mining method.

Lun Li1, Xingchi Hu1, Zhaowan Yang1, Zhenyu Jia2, Ming Fang1, Libin Zhang1, Yanhong Zhou1.   

Abstract

Associating microRNAs (miRNAs) with cancers is an important step toward understanding the mechanisms of cancer pathogenesis and finding novel biomarkers for cancer therapies. In this study, we constructed a miRNA-cancer association network (miCancerna) based on more than 1,000 miRNA-cancer associations detected from millions of abstracts with a text-mining method, covering 226 miRNA families and 20 common cancers. We further prioritized cancer-related miRNAs at the network level with the random-walk algorithm, achieving higher performance than previous miRNA-disease networks. Finally, we examined the top 5 candidate miRNAs for each kind of cancer and found that 71% of them are confirmed experimentally. miCancerna provides an alternative resource for identifying cancer-related miRNAs.

Year:  2014        PMID: 24895499      PMCID: PMC4016856          DOI: 10.1155/2014/746979

Source DB:  PubMed          Journal:  Comput Math Methods Med        ISSN: 1748-670X            Impact factor:   2.238


1. Introduction

MicroRNAs (miRNAs) are a large class of small noncoding RNAs [1] known to be functionally involved in a wide range of biological processes, including embryo development, cell growth, differentiation, apoptosis, and proliferation [2-5]. Recently, miRNAs have been found to play important roles in human tumorigenesis, and many of them have been applied as novel biomarkers for cancer therapies [6-11], which has attracted growing efforts to reveal the complex associations between miRNAs and cancers. However, the existing literature usually focuses on the relationship between a few miRNAs and a specific cancer, leaving the comprehensive miRNA-cancer network unrevealed. Fully uncovering the associations between miRNAs and cancers would therefore be valuable for identifying cancer-related miRNAs and understanding the underlying mechanisms. To this end, the manually curated miRNA-disease association databases HMDD [12] and miR2Disease [13] have been established. These manually created miRNA-disease networks have been used to predict disease-related miRNAs [14-16] with relatively high accuracy, opening the opportunity to prioritize miRNAs with bioinformatics methods. However, thousands of papers on miRNAs and cancer are published each year, making it difficult to check papers manually. Automatic text-mining methods are therefore needed to extract reliable miRNA-disease associations [17] from the growing literature. In this paper, we collected 1,018 associations between 226 miRNA families and 20 common cancers by mining more than 7.2 million publications with an automatic text-mining method. All these relationships are recorded in a database named miCancerna, which can be freely accessed at http://micancerna.appspot.com/.
We further constructed a general view of the miRNA-cancer network from the top 5% most significant associations to visualize the roles of miRNAs in different cancers, and prioritized the cancer-related miRNAs using the random walk with restart algorithm (RWRA) [14] on the miRNA-cancer network built from the data in miCancerna. By analyzing the top 5 associated miRNAs of 20 cancers according to Fisher's exact tests, we found experimental evidence for 71% of these miRNA-cancer relationships; the rest might be candidate cancer-related miRNAs for further experimental validation. The constructed miRNA-cancer network should be valuable for comprehensively understanding the mechanisms of cancers and identifying cancer-related miRNA genes.

2. Materials and Methods

2.1. Collecting Resource Literature

We collected abstracts from NCBI's MEDLINE database as our target literature resource. MEDLINE is a comprehensive database containing the abstracts of millions of articles in the biomedical field. Since the full text of a large number of papers is not accessible through PubMed, we considered only the abstracts, which are always available. In 2000, Reinhart et al. [18] identified the second miRNA, and thereafter researchers began to pay attention to the importance of miRNAs. Therefore, we focused mainly on papers published in 2000 and after. In total, 7,207,066 abstracts were retrieved and then screened within the PubMed search using keywords such as “Humans” or “Animals” to eliminate plant and virus miRNAs from the subsequent text-mining analysis. This filtration yielded 5,606,308 abstracts. The 20 most common cancers reported by the National Cancer Institute (http://www.cancer.gov/) are considered in our study: leukemia, lung cancer, bladder cancer, brain cancer, breast cancer, cervix cancer, colorectal cancer, esophageal cancer, kidney cancer, liver cancer, melanoma, myeloma, non-Hodgkin lymphoma, oral cancer, ovarian cancer, pancreatic cancer, prostate cancer, stomach cancer, thyroid cancer, and uterine cancer. The abstracts were individually marked with cancer types by the following steps. First, we mapped each cancer type to its corresponding MeSH (Medical Subject Headings) term(s) and compiled a list of standard names for each type of cancer; MeSH is the U.S. National Library of Medicine's controlled vocabulary, whose terms are manually assigned to articles archived in MEDLINE to describe their subject matter. Subsequently, we searched the MeSH annotations of each abstract. Abstracts with MeSH terms in our cancer name list were marked with the corresponding cancer and selected for the subsequent text-mining processing.
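The marking step can be illustrated with a minimal sketch. The mapping below lists only three cancers, and the MeSH headings chosen for them are assumptions made for illustration, since the exact term list used for miCancerna is not given here. The names `CANCER_TO_MESH`, `mark_abstract`, and `select_abstracts` are hypothetical.

```python
# Illustrative mapping from cancer type to MeSH headings.
# These headings are assumptions for this sketch, not the study's actual list.
CANCER_TO_MESH = {
    "leukemia": {"Leukemia"},
    "lung cancer": {"Lung Neoplasms"},
    "breast cancer": {"Breast Neoplasms"},
}


def mark_abstract(mesh_annotations):
    """Return the sorted cancer types whose MeSH headings annotate one abstract."""
    annotations = {term.strip().lower() for term in mesh_annotations}
    marked = []
    for cancer, headings in CANCER_TO_MESH.items():
        if any(heading.lower() in annotations for heading in headings):
            marked.append(cancer)
    return sorted(marked)


def select_abstracts(records):
    """Keep abstracts marked with at least one cancer.

    records is an iterable of (pmid, mesh_annotations) pairs.
    """
    selected = {}
    for pmid, mesh_annotations in records:
        cancers = mark_abstract(mesh_annotations)
        if cancers:
            selected[pmid] = cancers
    return selected
```

Abstracts without any listed heading are dropped, which mirrors the selection described in the text; an abstract annotated with several cancer headings is marked with each of them.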

2.2. Establishing miRNA-Cancer Networks by Text-Mining Method

With the selected abstracts, we first established relationships between miRNAs and cancers by a text-mining method. The associations between miRNAs and cancers were estimated based on the co-occurrence assumption, a fundamental assumption in the field of text mining that can be used to infer whether two terms are associated. In our case, if a particular miRNA frequently appears in the abstracts marked with a specific cancer, we can reasonably assume that the two co-occur and tend to be related. To establish the associations between miRNAs and cancers, we detected the appearance of miRNAs in the abstracts marked with cancer types. In this study, a regular expression was applied to match miRNA names against the texts as follows. (1) miRNAs (such as “miR-1” and “miR-2”) were first extracted from the abstracts based on the nomenclature of a “miR” prefix accompanied by a unique identifying number [19]. (2) Following the conventions, a prefixed species/state identifier can be added (e.g., “hsa-miR-1” in Homo sapiens and “pre-miR-1” for a precursor) and additional suffixes can be given to indicate loci or variants (e.g., “miR-1a-1”) [20]. (3) The regular expression was also designed for the variant names of some miRNAs, such as “lin-4” and “let-7.” (4) Abbreviations for more than one miRNA are also recognized by the regular expression, for example, “miR-221/222” and “miR-15 & -16.” The significance levels of the associations between the miRNAs and the cancers extracted from the marked abstracts were estimated by one-sided Fisher's exact tests [21].
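The four matching rules can be approximated with a single pattern. The sketch below is not the regular expression used to build miCancerna, which is not reproduced in the text. It is one possible pattern covering the quoted examples, and collapsing every match to a family name such as “miR-1” is an assumption of this sketch. The names `MIRNA_PATTERN` and `find_mirna_families` are hypothetical.

```python
import re

MIRNA_PATTERN = re.compile(
    r"""
    \b
    (?:[a-z]{3}-)?                        # optional species/state prefix: hsa-, pre-
    (?P<stem>miR|let|lin)                 # "miR" prefix, or the let/lin special cases
    -(?P<first>\d+)                       # unique identifying number
    [a-z]?(?:-\d+)?                       # optional variant letter and locus: a, a-1
    (?P<more>(?:\s*[/&]\s*-?\d+[a-z]?)*)  # abbreviated lists: /222, & -16
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_mirna_families(text):
    """Return the set of miRNA family names mentioned in text."""
    families = set()
    for match in MIRNA_PATTERN.finditer(text):
        stem = match.group("stem").lower()
        if stem == "mir":
            stem = "miR"  # normalize capitalization of the prefix
        numbers = [match.group("first")]
        # Expand abbreviations such as "miR-221/222" into both numbers.
        numbers += re.findall(r"\d+", match.group("more"))
        for number in numbers:
            families.add(f"{stem}-{number}")
    return families
```

For example, `find_mirna_families("miR-221/222 and miR-15 & -16")` yields the four families miR-221, miR-222, miR-15, and miR-16. Restricting the list separators to “/” and “&” keeps phrases such as “miR-21, 5 patients” from being read as two miRNAs, at the cost of missing lists written with commas.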
For a pair of miRNA M and cancer C, the P value of the one-sided Fisher's exact test is calculated based on the hypergeometric distribution, as follows:

$$P = \sum_{k=0}^{\min(b,c)} \frac{\binom{a+b}{a+k}\binom{c+d}{c-k}}{\binom{n}{a+c}},$$

where n denotes the total number of papers included in the text-mining analysis, a stands for the number of papers with both the miRNA M and the cancer C in the abstracts, b and c represent, respectively, the numbers of abstracts containing one term and excluding the other, and d is the number of papers with neither of the terms. The top 5% of miRNA-cancer associations with the smallest P values are considered significant and were used to generate the general view of the miRNA-cancer network. The miRNA-cancer network is a bipartite network composed of miRNA nodes and cancer nodes. Each edge in miCancerna connects a miRNA and one of its corresponding cancers.
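As a worked illustration of the test, the sketch below computes a one-sided P value from the four counts a, b, c, and d by summing hypergeometric probabilities over all tables with at least a co-occurrences. It assumes the standard definition of the one-sided Fisher's exact test and uses exact integer binomials from the Python standard library; the function names are hypothetical. With margins in the millions, a log-space implementation would likely be preferable.

```python
from math import comb


def hypergeom_pmf(a, b, c, d):
    """Probability of one 2x2 table (a, b; c, d) with all margins fixed."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_one_sided(a, b, c, d):
    """P(co-occurrence count >= a) under independence of miRNA and cancer."""
    p_value = 0.0
    # Moving k papers from the off-diagonal cells to the diagonal cells
    # enumerates every table that is at least as extreme as the observed one.
    for k in range(min(b, c) + 1):
        p_value += hypergeom_pmf(a + k, b - k, c - k, d + k)
    return p_value
```

Because the test is symmetric in b and c, it does not matter which of the two counts refers to the miRNA-only abstracts and which to the cancer-only abstracts.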

2.3. Text-Mining Quality Check

We first queried PubMed with “MIR or MIRN or MIRNA or MICRORNA” and randomly picked 100 MEDLINE abstracts with at least one miRNA identifier from the query results as our evaluation data. We then investigated the reliability of detecting miRNAs in texts using the F-measure, which is the harmonic mean of two other measures, recall and precision, as follows:

$$\text{recall} = \frac{TP}{TP + FN}, \qquad \text{precision} = \frac{TP}{TP + FP}, \qquad F = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}},$$

where TP, FP, and FN are the numbers of true positives, false positives, and false negatives, respectively.

2.4. Random Walk with Restart Method

Based on the network constructed from the data in miCancerna, a random walk with restart (RWRA) method is applied to prioritize cancer-related miRNAs. RWRA is one of the random walk models widely used in disease gene discovery [22]. It simulates a random walker's moves in a given network: at each step the walker either moves from the current node to a direct neighboring node or restarts at a training node with probability α. The movement in RWRA is defined as follows:

$$P^{t+1} = (1 - \alpha) M P^{t} + \alpha P^{0},$$

where M is a column-normalized adjacency matrix representing the given network. In this case, each nonzero entry in M stands for a certain association between a miRNA and a cancer, and the corresponding nodes are taken as seeds. $P^{t}$ is a vector representing the probabilities of the walker being at each node at time t, and $P^{0}$ is the initial probability vector in which training nodes are equally assigned 1/N (N is the number of seeds) while all others are 0. The process is iterated until P reaches a stable state, when the difference between $P^{t}$ and $P^{t+1}$ (measured by the L1 norm) is less than a threshold value ($10^{-6}$ in this study). The stable probability is defined as $P^{\infty}$. The candidate nodes are then ranked in descending order according to $P^{\infty}$.
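The iteration can be sketched in a few lines of dependency-free Python. The sketch assumes the commonly used update rule $P^{t+1} = (1 - \alpha) M P^{t} + \alpha P^{0}$, with α as the restart probability, and stops when the L1 difference between successive vectors falls below the threshold. The function names and the small example graph in the usage note are hypothetical.

```python
def column_normalize(adj):
    """Scale each column of a square adjacency matrix so that it sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seeds, alpha=0.9, tol=1e-6, max_iter=10_000):
    """Return the stable visiting probabilities for walks restarting at seeds.

    adj is a square 0/1 adjacency matrix (list of lists), seeds a set of
    node indices, alpha the restart probability.
    """
    n = len(adj)
    m = column_normalize(adj)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - alpha) * sum(m[i][j] * p[j] for j in range(n)) + alpha * p0[i]
            for i in range(n)
        ]
        # L1 distance between successive probability vectors.
        if sum(abs(x - y) for x, y in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

On a three-node path 0-1-2 with node 0 as the only seed and α = 0.5, the stable probabilities are 7/12, 1/3, and 1/12, so nodes closer to the seed are ranked higher.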

2.5. Leave-One-Out Cross-Validation

The performance of cancer-related miRNA prioritization by the random walk with restart algorithm on miCancerna can be evaluated by calculating the area under the ROC curve (AUC) through leave-one-out cross-validation. Each training node was in turn taken as a candidate node, and 20 miRNAs not belonging to the same cancer were randomly picked as testing nodes; all of them were then prioritized as above. For each threshold, the sensitivity (SN) and specificity (SP) are defined as follows:

$$SN = \frac{TP}{TP + FN}, \qquad SP = \frac{TN}{TN + FP},$$

where TP (true positive) is the number of training nodes with rank above the threshold, FN (false negative) is the number of training nodes with rank under the threshold, TN (true negative) is the number of testing nodes with rank under the threshold, and FP (false positive) is the number of testing nodes with rank above the threshold. The ROC curve shows the relationship between SN and 1-SP, and the AUC is the area under the ROC curve.
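The evaluation can be sketched as follows, assuming the usual definitions SN = TP/(TP+FN) and SP = TN/(TN+FP) and reading “rank above the threshold” as a rank number less than or equal to the threshold (rank 1 is best). Pooling the ranks from all leave-one-out rounds into one list and using the trapezoidal rule for the area are further assumptions of this sketch; the function names are hypothetical.

```python
def roc_points(labeled_ranks):
    """Build ROC points from pooled leave-one-out results.

    labeled_ranks is a list of (rank, is_training_node) pairs, rank 1 = best.
    Returns a list of (1 - SP, SN) points, one per rank threshold.
    """
    positives = sum(1 for _, is_pos in labeled_ranks if is_pos)
    negatives = len(labeled_ranks) - positives
    max_rank = max(rank for rank, _ in labeled_ranks)
    points = [(0.0, 0.0)]
    for threshold in range(1, max_rank + 1):
        tp = sum(1 for r, is_pos in labeled_ranks if is_pos and r <= threshold)
        fp = sum(1 for r, is_pos in labeled_ranks if not is_pos and r <= threshold)
        fn = positives - tp
        tn = negatives - fp
        sn = tp / (tp + fn)
        sp = tn / (tn + fp)
        points.append((1 - sp, sn))
    return points


def auc(points):
    """Trapezoidal area under a list of (1 - SP, SN) points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

With two rounds of one held-out training node and two testing nodes each, where the held-out node ranks first in one round and second in the other, this gives an AUC of 0.75; a held-out node that always ranks first gives 1.0.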

3. Results and Discussion

3.1. Online Resource for miRNA-Cancer Network

In the first release, miCancerna records 1,018 associations between 226 miRNA families and 20 common cancers extracted from 7.2 million papers. All the data in miCancerna can be freely accessed at http://micancerna.appspot.com/, including the associations, the supporting papers, and the significance level of each association. miCancerna will be updated periodically. To check the text-mining quality, we randomly picked 100 MEDLINE abstracts that contained at least one miRNA identifier from the search results obtained by querying MEDLINE with “MIR or MIRN or MIRNA or MICRORNA.” A total of 739 miRNA identifiers were manually recognized in the texts of the evaluation data, while our regular expression correctly matched 735 of them (true positive, TP), miscalled 2 (false positive, FP), and missed 4 (false negative, FN). The miRNA annotation thus achieved a recall of 0.9946, a precision of 0.9973, and an F-measure of 0.9959, which demonstrates a fairly high reliability of our regular expression. Based on these results, we concluded that miCancerna is a high-quality resource of miRNA-cancer associations.
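The reported quality figures can be reproduced from the three counts given above, assuming the standard definitions of recall, precision, and F-measure. The function name is hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from raw match counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


precision, recall, f_measure = precision_recall_f(tp=735, fp=2, fn=4)
print(f"recall={recall:.4f} precision={precision:.4f} F={f_measure:.4f}")
# prints: recall=0.9946 precision=0.9973 F=0.9959
```

The 735 true positives and 4 false negatives add up to the 739 identifiers recognized manually, which is consistent with the counts in the text.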

3.2. miRNA-Cancer Network Visualization

To reveal the roles of miRNAs in different cancers, we constructed a bipartite network from the top 5% of associations ranked by Fisher's exact test P values in miCancerna, consisting of 40 miRNA families and 13 types of cancer (Figure 1). In this bipartite network, miRNAs are only connected to cancers and cancers are only connected to miRNAs. The miRNA-cancer network was visualized with Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/). Interestingly, almost all of these cancers (except stomach cancer) can be connected via miRNAs, which indicates that different cancers might share common pathogenic components regulated by these interconnected miRNAs, while stomach cancer may differ from the others.

Figure 1

Network illustrated significant associations of miRNAs and cancers. Red circles and green squares represent cancers and miRNAs, respectively, with different sizes according to the number of corresponding annotated papers (logarithmic). Each link represents a miRNA-cancer association with colour and width according to the strength of relationship.

As shown in Figure 1, miRNAs may have different involvements in cancers. Some miRNAs are associated with one specific cancer. For example, miR-15 and miR-16 tend to be related to leukemia, and miR-122 is almost exclusively associated with liver cancer. These miRNAs may be used as biomarker candidates for the diagnosis of the corresponding cancers and for assessing the efficacy of their therapies. By contrast, some miRNAs tend to be associated with various cancers. One example is miR-21, which is significantly associated with breast cancer, colorectal cancer, liver cancer, and pancreatic cancer, indicating that target genes of miR-21 might play critical roles in tumor formation. Interestingly, four of the top 10 miRNA-cancer associations (Table 1) are miRNA-leukemia associations, and 12 (28.6%) of the significant associations are related to leukemia, which makes leukemia the most miRNA-related cancer. Similarly, 8 (19.0%) miRNA families are related to breast cancer among the significant miRNA-cancer associations. Furthermore, we found that miR-21 is the most cancer-related miRNA: it is associated with 4 (30.77%) different cancers in the significant associations (breast cancer, pancreatic cancer, liver cancer, and colorectal cancer), indicating that miR-21 may be involved in an important pathway in cancer formation.

Table 1

Top 10 associations between miRNAs and cancers.

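| miRNA | Cancer | Papers | P value |
|---|---|---|---|
| miR-15 | Leukaemia | 35 | $6.804 \times 10^{-43}$ |
| miR-16 | Leukaemia | 33 | $5.028 \times 10^{-36}$ |
| miR-122 | Liver cancer | 22 | $9.742 \times 10^{-26}$ |
| miR-181 | Leukaemia | 23 | $3.142 \times 10^{-25}$ |
| miR-155 | Non-Hodgkin lymphoma | 22 | $7.393 \times 10^{-22}$ |
| Let-7 | Lung cancer | 34 | $1.110 \times 10^{-19}$ |
| miR-223 | Leukaemia | 16 | $1.987 \times 10^{-18}$ |
| miR-17 | Non-Hodgkin lymphoma | 19 | $3.772 \times 10^{-18}$ |
| miR-21 | Breast cancer | 31 | $1.659 \times 10^{-16}$ |
| miR-221 | Thyroid cancer | 11 | $1.607 \times 10^{-14}$ |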

3.3. Prioritization of Cancer-Related miRNAs

We applied RWRA to the network established from miCancerna to prioritize candidate cancer-related miRNAs, and the performance was evaluated by leave-one-out cross-validation. With a restart probability alpha of 0.9, the AUC of the ROC curve reaches 0.798 (Figure 2), where an AUC of 1 stands for perfect performance and an AUC of 0.5 indicates random performance. The performances with different restart probabilities are shown in Table 2. The AUC improves as alpha increases, but the variation is small. To rule out the possibility that the performance of miCancerna is achieved by chance, a permutation test with 300 runs was performed. For each run, the seeds were randomly selected from the candidate nodes. The average AUC of the random permutations obtained by leave-one-out cross-validation is 0.513, and the distribution of the random permutation AUCs is shown in Figure 3. The AUC achieved by miCancerna differs substantially from those of the random permutations, which supports the view that miCancerna reflects the real involvement of miRNAs in cancer biology.

Figure 2

ROC curves for RWRA on miCancerna and previous miRNA-cancer network.

Table 2

AUC value under different alpha.

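| Alpha | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| AUC | 0.7952 | 0.7973 | 0.7974 | 0.7978 | 0.7981 | 0.7981 | 0.7983 | 0.7983 | 0.7984 |
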
Figure 3

Distribution of random AUC for miCancerna.

The top 5 potential miRNAs of each cancer are presented in Table 3, among which 71% are supported by experimental evidence in dbDEMC [23] or in literature published after miCancerna was built. The performance of cancer-related miRNA prioritization demonstrates the reliability of miCancerna. Moreover, the top predicted miRNAs without such evidence may be potential cancer-related miRNAs for further study.

Table 3

Top 5 potential miRNAs of 20 cancers.

| Bladder cancer | Confirm | Brain cancer | Confirm | Breast cancer | Confirm | Cervix cancer | Confirm |
|---|---|---|---|---|---|---|---|
| miR-15 | Null | let-7 | Ref. [25] | miR-143 | dbDEMC | let-7 | Null |
| miR-34 | Ref. [26] | miR-145 | Ref. [27] | miR-223 | dbDEMC | miR-221 | Null |
| miR-16 | Ref. [26] | miR-16 | Ref. [28] | miR-203 | dbDEMC | miR-17 | Ref. [29] |
| miR-146 | Ref. [30] | miR-155 | Ref. [31] | miR-194 | dbDEMC | miR-125 | Null |
| miR-155 | Ref. [30] | miR-143 | Ref. [28] | miR-100 | dbDEMC | miR-222 | Null |

| Colorectal cancer | Confirm | Esophageal cancer | Confirm | Kidney cancer | Confirm | Leukemia | Confirm |
|---|---|---|---|---|---|---|---|
| miR-221 | dbDEMC | miR-17 | dbDEMC | miR-125 | dbDEMC | miR-200 | Ref. [32] |
| miR-146 | dbDEMC | miR-222 | dbDEMC | miR-222 | dbDEMC | miR-205 | Null |
| miR-29 | dbDEMC | miR-15 | dbDEMC | miR-146 | dbDEMC | miR-193 | Null |
| miR-199 | dbDEMC | miR-125 | dbDEMC | miR-16 | dbDEMC | miR-9 | Ref. [33] |
| miR-193 | Null | miR-200 | dbDEMC | miR-143 | dbDEMC | miR-31 | Ref. [34] |

| Liver cancer | Confirm | Lung cancer | Confirm | Melanoma | Confirm | Myeloma | Confirm |
|---|---|---|---|---|---|---|---|
| miR-205 | Null | miR-23 | dbDEMC | miR-21 | Ref. [35] | miR-145 | Null |
| miR-27 | dbDEMC | miR-148 | dbDEMC | miR-145 | Ref. [36] | miR-200 | Null |
| miR-124 | Ref. [37] | miR-27 | dbDEMC | miR-26 | Null | miR-221 | Ref. [38] |
| miR-520 | dbDEMC | miR-203 | dbDEMC | miR-143 | Ref. [36] | miR-34 | Null |
| miR-203 | Ref. [39] | miR-520 | dbDEMC | miR-126 | Ref. [35] | miR-205 | Null |

| Non-Hodgkin lymphoma | Confirm | Oral cancer | Confirm | Ovarian cancer | Confirm | Pancreatic cancer | Confirm |
|---|---|---|---|---|---|---|---|
| miR-200 | dbDEMC | miR-15 | Null | miR-26 | Null | miR-16 | Ref. [40] |
| miR-205 | dbDEMC | miR-205 | Ref. [41] | miR-181 | Null | miR-125 | Ref. [42] |
| miR-126 | dbDEMC | miR-10 | Ref. [43] | miR-143 | Ref. [44] | miR-26 | Null |
| miR-224 | dbDEMC | miR-182 | Null | miR-10 | Null | miR-126 | Ref. [45] |
| miR-23 | dbDEMC | miR-20 | Null | miR-23 | Null | miR-181 | Ref. [40] |

| Prostate cancer | Confirm | Stomach cancer | Confirm | Thyroid cancer | Confirm | Uterine cancer | Confirm |
|---|---|---|---|---|---|---|---|
| miR-155 | dbDEMC | miR-155 | Ref. [46] | miR-15 | Null | miR-17 | dbDEMC |
| miR-29 | Null | miR-29 | Null | miR-34 | Null | miR-222 | dbDEMC |
| miR-30 | dbDEMC | miR-30 | Null | miR-145 | Ref. [47] | miR-224 | dbDEMC |
| miR-10 | dbDEMC | miR-10 | Ref. [48] | miR-16 | Null | miR-30 | dbDEMC |
| miR-199 | dbDEMC | miR-199 | Null | miR-205 | Ref. [49] | miR-106 | dbDEMC |

“Null” means we did not find experimental evidence.

3.4. Comparison with Similar Databases

We compared miCancerna with similar databases and networks. First, we compared the data in miCancerna with the manually curated database miR2Disease in terms of the number of evidence papers. For most cancers, miCancerna provides many more evidence papers than miR2Disease (Table 4). Second, we compared the prediction performance of RWRA on miCancerna with that on the miRNA-cancer network used in RWRMDA [14], which was built from HMDD, a manually curated database. The ROC curves for both networks are shown in Figure 2. According to the results of leave-one-out cross-validation, the network used in RWRMDA achieved an AUC of 0.763, which is lower than the 0.797 achieved by miCancerna.

Table 4

Number of evidence papers compared with miR2Disease.

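| Cancer types | miCancerna | miR2Disease | Increase |
|---|---|---|---|
| Bladder cancer | 14 | 11 | 27.27% |
| Brain cancer | 35 | 3 | 1067% |
| Breast cancer | 137 | 58 | 136.2% |
| Cervix cancer | 11 | 4 | 175% |
| Colorectal cancer | 81 | 39 | 107.7% |
| Esophageal cancer | 16 | 7 | 128.6% |
| Kidney cancer | 14 | 4 | 250.0% |
| Leukemia | 146 | 45 | 224.4% |
| Liver cancer | 99 | 39 | 153.8% |
| Lung cancer | 112 | 37 | 202.7% |
| Melanoma | 21 | 9 | 133.3% |
| Myeloma | 9 | 3 | 200.0% |
| Non-Hodgkin lymphoma | 62 | 13 | 376.9% |
| Oral cancer | 19 | 0 | |
| Ovarian cancer | 47 | 18 | 161.1% |
| Pancreatic cancer | 47 | 16 | 193.8% |
| Prostate cancer | 61 | 19 | 221.1% |
| Stomach cancer | 48 | 16 | 200.0% |
| Thyroid cancer | 21 | 9 | 133.3% |
| Uterine cancer | 28 | 5 | 460.0% |
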
These results indicate that miCancerna provides an alternative resource of miRNA-cancer associations.
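The Increase column of Table 4 is not defined in the text, but its values are consistent with the relative growth in evidence papers, (miCancerna − miR2Disease) / miR2Disease. A minimal sketch under that assumption follows; the function name is hypothetical.

```python
def percent_increase(micancerna_papers, mir2disease_papers):
    """Relative increase in evidence papers, in percent.

    Returns None when the baseline count is zero, since the ratio is undefined.
    """
    if mir2disease_papers == 0:
        return None
    return 100.0 * (micancerna_papers - mir2disease_papers) / mir2disease_papers


print(round(percent_increase(14, 11), 2))   # bladder cancer: 27.27
print(round(percent_increase(146, 45), 1))  # leukemia: 224.4
print(percent_increase(19, 0))              # oral cancer: None
```

The undefined ratio for a zero baseline would explain why the table gives no increase for oral cancer.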

4. Conclusion

In this study, we constructed a reliable miRNA-cancer network based on a text-mining method, which is stored in the database miCancerna. In the current release, there are 1,018 associations between 226 miRNA families and 20 common cancers. According to our test results, miCancerna provides a reliable and comprehensive resource of miRNA-cancer associations, which can be further used in the identification of cancer-related miRNAs. For future development, we plan to consider more types of cancer, add regulation information to the miRNA-cancer associations, and integrate miCancerna with other related databases, such as MISIM [24], the human miRNA functional similarity and functional network.
  47 in total

Review 1.  Oncogenic microRNAs (OncomiRs) as a new class of cancer biomarkers.

Authors:  Vladimir A Krutovskikh; Zdenko Herceg
Journal:  Bioessays       Date:  2010-10       Impact factor: 4.345

2.  MicroRNA expression in zebrafish embryonic development.

Authors:  Erno Wienholds; Wigard P Kloosterman; Eric Miska; Ezequiel Alvarez-Saavedra; Eugene Berezikov; Ewart de Bruijn; H Robert Horvitz; Sakari Kauppinen; Ronald H A Plasterk
Journal:  Science       Date:  2005-05-26       Impact factor: 47.728

Review 3.  MicroRNAs in vertebrate development.

Authors:  Brian D Harfe
Journal:  Curr Opin Genet Dev       Date:  2005-08       Impact factor: 5.578

4.  Association between miR-200c and the survival of patients with stage I epithelial ovarian cancer: a retrospective study of two independent tumour tissue collections.

Authors:  Sergio Marchini; Duccio Cavalieri; Robert Fruscio; Enrica Calura; Daniela Garavaglia; Ilaria Fuso Nerini; Costantino Mangioni; Giorgio Cattoretti; Luca Clivio; Luca Beltrame; Dionyssios Katsaros; Luca Scarampi; Guido Menato; Patrizia Perego; Giovanna Chiorino; Alessandro Buda; Chiara Romualdi; Maurizio D'Incalci
Journal:  Lancet Oncol       Date:  2011-02-21       Impact factor: 41.316

5.  miR-10b promotes cell invasion through RhoC-AKT signaling pathway by targeting HOXD10 in gastric cancer.

Authors:  Zhuo Liu; Jiaming Zhu; Hong Cao; Hui Ren; Xuedong Fang
Journal:  Int J Oncol       Date:  2012-01-24       Impact factor: 5.650

6.  Differential signature of fecal microRNAs in patients with pancreatic cancer.

Authors:  Yan Ren; Jun Gao; Jian-Qiang Liu; Xiao-Wei Wang; Jun-Jun Gu; Hao-Jie Huang; Yan-Fang Gong; Zhao-Shen Li
Journal:  Mol Med Rep       Date:  2012-04-10       Impact factor: 2.952

7.  Prioritization of disease microRNAs through a human phenome-microRNAome network.

Authors:  Qinghua Jiang; Yangyang Hao; Guohua Wang; Liran Juan; Tianjiao Zhang; Mingxiang Teng; Yunlong Liu; Yadong Wang
Journal:  BMC Syst Biol       Date:  2010-05-28

Review 8.  Potential of anti-cancer therapy based on anti-miR-155 oligonucleotides in glioma and brain tumours.

Authors:  Palmiro Poltronieri; Pietro I D'Urso; Valeria Mezzolla; Oscar F D'Urso
Journal:  Chem Biol Drug Des       Date:  2013-01       Impact factor: 2.817

9.  Human lung cancer cell line SPC-A1 contains cells with characteristics of cancer stem cells.

Authors:  C H Zhou; S F Yang; P Q Li
Journal:  Neoplasma       Date:  2012       Impact factor: 2.575

10.  NEDD9, a novel target of miR-145, increases the invasiveness of glioblastoma.

Authors:  Maria Carmela Speranza; Véronique Frattini; Federica Pisati; Dimos Kapetis; Paola Porrati; Marica Eoli; Serena Pellegatta; Gaetano Finocchiaro
Journal:  Oncotarget       Date:  2012-07