
PMAMCA: prediction of microRNA-disease association utilizing a matrix completion approach.

Jihwan Ha1, Chihyun Park2, Sanghyun Park3.   

Abstract

BACKGROUND: Numerous experimental results have indicated that microRNAs (miRNAs) play a vital role in biological processes, as well as outbreaks of diseases at the molecular level. Despite their important role in biological processes, knowledge regarding specific functions of miRNAs in the development of human diseases is very limited. While attempting to solve this problem, many computational approaches have been proposed and attracted significant attention. However, most previous approaches suffer from the common problem of being inapplicable to new diseases without any known miRNA-disease associations.
RESULTS: This paper proposes a novel method for inferring disease-miRNA associations utilizing a machine learning technique called matrix factorization, which is widely used in recommendation systems. In recommendation systems, the goal is to predict rating scores that a user might assign to specific items. By replacing users with miRNAs and items with diseases, we can efficiently predict miRNA-disease associations without seed miRNAs. As a result, our proposed model, called prediction of microRNA-disease association utilizing a matrix completion approach, achieves excellent performance compared to previous approaches with a reliable AUC value of 0.882 by implementing five-fold cross validation.
CONCLUSIONS: To the best of our knowledge, the proposed method applies the matrix completion technique to infer miRNA-disease associations and overcomes the seed-miRNA problem that negatively affects existing computational models.

Keywords:  Disease; Matrix completion approach; miRNA

Year:  2019        PMID: 30894171      PMCID: PMC6425656          DOI: 10.1186/s12918-019-0700-4

Source DB:  PubMed          Journal:  BMC Syst Biol        ISSN: 1752-0509


Background

MicroRNAs (miRNAs) are small non-coding RNAs with lengths of 19~25 nucleotides that play significant roles in inhibiting gene expression by binding to the 3′ untranslated regions of mRNAs at the post-transcriptional level [1-4]. Numerous studies have demonstrated that miRNAs play important roles in multiple biological processes, including aging [5, 6], apoptosis [7], cell proliferation [8], development [9], and differentiation metabolism [10], as well as the progression of human diseases. Additionally, over the past few decades, there have been numerous studies supporting the idea that miRNA is a key factor in cancer-related processes. For example, mir-31 and mir-335 have been shown to be involved in suppressing breast cancer [11-13]. Mir-101 and mir-185 are vital components associated with breast cancer that affect Vegfa and Stathmin1, respectively [14, 15]. Calin et al. proved that mir-15 and mir-16 are key components of cancer formation based on the evidence that they were found in B-cell chronic lymphocytic leukemia patients in over 50% of cases [16]. Despite their significant role in various biological processes, inferring interactions between miRNAs and diseases utilizing experimental methods has critical disadvantages in terms of expense and time. With the emergence of miRNA-related databases from various studies, numerous computational methods have been proposed. Their common goal is to predict true miRNA-disease associations. Most previous computational methods are based on the basic assumption that functionally related miRNAs have a high chance of relating to phenotypically similar diseases [17-19]. Jiang et al. proposed a hypergeometric-distribution-based method to prioritize disease-related miRNAs by constructing a human phenome-miRNAome network, miRNAs functional interactions network, and disease similarity network [20]. 
However, this method only considers the information of neighboring nodes, meaning that performance could still be improved by utilizing the full global network. Jiang et al. further investigated inferring miRNA-disease associations by integrating multiple sources of data through a naïve Bayes model [21]. Zou and Zeng et al. predicted potential miRNA-disease associations through network-based analyses. Their studies are based on the assumption that miRNAs with similar functions have a higher possibility of causing phenotypically similar diseases [22, 23]. Furthermore, based on this assumption, Tang et al. inferred candidate disease-related miRNAs [24]. Liu et al. integrated multiple data sources to measure miRNA and disease similarities. Using these similarities, they constructed a heterogeneous network from true miRNA-disease relationships and applied random walk algorithms on it to predict miRNA-disease associations [25]. However, the performance of this method is strongly affected by the miRNA-target interaction and disease-gene association datasets. Because the authors focused only on these specific sources of information, the method yields high false-positive and false-negative rates. There have been continuous efforts to improve the prediction of potential miRNA-disease associations by utilizing various types of emerging datasets. Accumulated evidence indicates that the functions of miRNAs can be affected by environmental factors (EFs), such as alcohol, cigarettes, diet, drugs, stress, radiation, and viruses. Ha et al. constructed a miRNA functional-similarity-based network by integrating miRNA expression profiles and environmental factor data, where nodes represent miRNAs and edges represent the functional similarities between miRNAs [26]. In this method, the similarity between two different miRNAs is calculated based on the common assumption that similar miRNAs tend to share larger numbers of EFs. 
However, this method does not consider the chemical structure similarity between EFs, so performance could still be improved by calculating more accurate similarity scores. Despite continuous efforts to infer the functions of miRNAs in biological processes, the known functions of miRNAs are very limited. Because of insufficient information, previous methods heavily rely on seed genes. In other words, previous methods are not applicable to new diseases for which no miRNA information has been revealed. These models rely on seed miRNAs that are known to be related to a given query disease. Therefore, they fail to make accurate predictions for new miRNA nodes that are not linked to neighboring miRNAs. To solve this insufficient information problem, we propose a novel computational method called prediction of microRNA-disease association utilizing a matrix completion approach (PMAMCA) to predict potential disease-related miRNAs. Our goal is to find how each miRNA is related to a specific disease. By utilizing a machine learning technique called matrix factorization (MF), we infer potential new miRNA-disease associations in a systematic manner without relying on known miRNA-disease associations. MF is a machine learning technique that has shown excellent performance in recommendation systems. It has significant advantages in terms of model expandability and accuracy. For these reasons, most major companies involved in selling products to users have adopted matrix factorization to achieve significant profits. The problem of predicting the most likely candidate disease-related miRNAs can be represented as the same problem faced by recommendation systems. In recommendation systems, the goal is to predict the rating score that each user might assign to a given item. By replacing users with miRNAs and items with diseases, we can effectively identify disease-related miRNAs. This paper is organized into four main sections. 
Section 1 reviewed previous computational methods that focus on inferring miRNA-disease associations and discussed their limitations. Section 2 consists of two subsections. The first enumerates the databases utilized in this paper and the second describes the proposed method. Section 3 presents the results of various experiments that verify the performance of our method. In section 4, we summarize the proposed method and results of our experiments.

Method and materials

In this section, we describe a method for extracting miRNA-disease associations utilizing a matrix completion approach. Figure 1 illustrates the workflow of the PMAMCA model. First, we gathered miRNA-disease association data from the Human microRNA Disease Database (HMDD), miR2Disease, and Database of Differentially Expressed MiRNAs in Human Cancers (dbDEMC), and preprocessed the data into a uniform format to construct a binary miRNA-disease matrix R. Additionally, we downloaded miRNA expression data from The Cancer Genome Atlas (TCGA) and utilized it to weight our proposed cost function. Second, we divided the original matrix R into a miRNA latent space M and a disease latent space D. Finally, by utilizing an MF technique, we trained the matrices M and D simultaneously according to the seed miRNAs in matrix R. Following the training process, prediction is performed by reconstructing the miRNA-disease matrix as the inner product of M and D (i.e., $\hat{R} = M D^{T}$). We can therefore derive the score of each candidate miRNA from the reconstructed matrix, where miRNAs with high scores are expected to have a high probability of being involved in disease pathogenesis. For evaluation, the validation datasets were randomly divided into training and test datasets with a ratio of 80/20.
Fig. 1

The workflow for prioritizing candidate miRNAs


Datasets

Human miRNA-disease association data

We downloaded miRNA-disease association data from HMDD, dbDEMC, and miR2Disease. HMDD v2.0 is a database of curated, experimentally supported human miRNA-disease associations. HMDD contains 10,368 entries with information regarding 572 miRNAs and 378 diseases from 3511 papers. Yang et al. constructed dbDEMC, which includes information regarding cancer-related miRNAs from in silico computing. A recently updated version of dbDEMC contains information regarding 2224 miRNAs and 36 diseases. miR2Disease is a manually curated database that provides a comprehensive list of miRNA functions in various human diseases. Currently, miR2Disease contains information regarding 3273 miRNA-disease associations for approximately 349 miRNAs and 163 diseases. By combining and preprocessing the miRNA-disease associations from the three databases, we extracted common information regarding 1879 miRNAs and 536 diseases.
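The merging step described above can be sketched as follows. This is a minimal illustration, assuming each database has already been reduced to (miRNA, disease) name pairs in a uniform format; the function name `build_association_matrix` and the toy records are hypothetical and are not taken from the paper.

```python
import numpy as np

def build_association_matrix(pairs):
    """Build a binary miRNA-disease matrix R from (miRNA, disease) pairs.

    Rows index miRNAs and columns index diseases; duplicates across
    databases collapse into a single entry equal to one.
    """
    mirnas = sorted({m for m, _ in pairs})
    diseases = sorted({d for _, d in pairs})
    m_index = {m: i for i, m in enumerate(mirnas)}
    d_index = {d: j for j, d in enumerate(diseases)}
    R = np.zeros((len(mirnas), len(diseases)), dtype=np.int8)
    for m, d in pairs:
        R[m_index[m], d_index[d]] = 1
    return R, mirnas, diseases

# Toy records standing in for preprocessed entries (hypothetical values).
hmdd = [("hsa-mir-155", "breast cancer"), ("hsa-mir-16", "lung cancer")]
mir2disease = [("hsa-mir-155", "breast cancer"), ("hsa-mir-126", "lung cancer")]
dbdemc = [("hsa-mir-126", "breast cancer")]

R, mirnas, diseases = build_association_matrix(hmdd + mir2disease + dbdemc)
```

Name harmonization across databases (for example, differing miRNA or disease spellings) is not handled here and would need to happen before this step.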

miRNA expression data

We manually downloaded miRNA expression data from TCGA and the Gene Expression Omnibus databases for each disease d. Then, for preprocessing, we performed min-max normalization on each expression value and utilized the values as weights (w) for our cost function. We utilized the miRNA expression value only when there was no miRNA-disease association in the original matrix R. The main effect of applying miRNA expression data is that we can efficiently train the latent spaces M and D without knowing the true miRNA-disease associations in the original matrix R, which makes our model more robust.
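A minimal sketch of how such a weight matrix might be assembled is shown below. It assumes that expression values are arranged in an array of the same shape as R with NaN marking missing values, and that normalization is done per disease (column); both are assumptions for illustration. The rule that known associations receive weight one and missing expression receives weight zero follows the description of the weights given with the objective function. The names `min_max_normalize` and `build_weights` are hypothetical.

```python
import numpy as np

def min_max_normalize(x):
    """Scale values to [0, 1], ignoring NaN entries."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)
    if hi == lo:
        # Constant column: no spread to normalize, use zero weight.
        return np.where(np.isnan(x), np.nan, 0.0)
    return (x - lo) / (hi - lo)

def build_weights(R, expression):
    """Weight matrix W: 1 for known associations, normalized expression
    otherwise, and 0 where no expression value is available (NaN)."""
    W = np.zeros(R.shape, dtype=float)
    for j in range(R.shape[1]):          # one disease (column) at a time
        col = expression[:, j]
        if np.all(np.isnan(col)):
            continue                     # no expression data for this disease
        W[:, j] = np.nan_to_num(min_max_normalize(col), nan=0.0)
    W[R == 1] = 1.0                      # known associations override expression
    return W

R_toy = np.array([[1, 0], [0, 0], [0, 1]])
expr_toy = np.array([[10.0, np.nan], [5.0, np.nan], [0.0, np.nan]])
W_toy = build_weights(R_toy, expr_toy)
```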

PMAMCA

The common drawback of most previous methods is that they rely on specific seed genes. For miRNAs that have no associations with seed miRNAs, the aforementioned methods cannot be applied. In other words, previous methods are not applicable to new diseases that do not have any true miRNA-disease associations. However, by applying a machine learning technique called MF, we can solve this problem in an analytical manner. PMAMCA works well for query diseases with no previously known miRNA associations and for inferring potential miRNAs (i.e., miRNAs that are not linked to diseases). Another advantage of utilizing MF is its applicability to various domains. For these reasons, we applied MF to predict novel miRNA-disease associations based on various biological data. Predicting miRNA-disease relationships can be regarded as the same problem solved by recommendation systems, where the goal is to recommend the most plausible product (disease) that the user (miRNA) might like. Most major companies that sell products to users, including Netflix, have adopted MF and gained significant profits. In recommendation systems, the goal is to find the rating score that a user might assign to an item. By replacing each item with a disease and each user with a miRNA, we can infer whether each miRNA is related to a specific disease. Recommendation systems rely on several types of input data, including explicit feedback and implicit feedback. Explicit feedback is direct input from users regarding items of interest, such as a movie rating score. Because explicit feedback is difficult to collect, recommendation systems also infer the preferences of each user indirectly by observing their behavior. This type of input data is called implicit feedback and consists of search patterns, purchase history, and social network information. 
In our study, we replaced explicit feedback with known disease-miRNA associations, which we utilized as entries in the original matrix R, and implicit feedback with miRNA expression data for the weights w in our objective function. In recommendation systems, input data are typically placed in a matrix with one dimension indicating users and the other dimension indicating items of interest. Our goal is to predict the most plausible miRNAs for a given disease of interest. We constructed a miRNA-disease association matrix $R \in \{0, 1\}^{N_m \times N_d}$, where each row refers to one of the $N_m$ miRNAs and each column refers to one of the $N_d$ diseases. This original matrix R is binary: the entry $R_{ij}$ equals one if there exists a true association between miRNA i and disease j and zero if no association exists. We then applied the MF technique, which is the most common and successful approach for recommendation systems, as illustrated in Fig. 2. MF maps both miRNAs and diseases into two latent spaces of dimension k. In our method, we set the value of k to 100.
Fig. 2

Applying matrix factorization into miRNA-disease association extraction. miRNA-disease association original matrix R can be divided into latent spaces M and D. Our goal is to learn the latent spaces M and D based on the original matrix R

MiRNA-disease associations in the original matrix R are modeled as the inner product of the two latent spaces. Given the underlying original matrix R, our goal is to learn latent spaces M and D whose inner product is close to the observed entries in matrix R, so that predicted values can be obtained from the inner product of the latent spaces. Training was performed after each latent space was randomly initialized. Each entry in the latent spaces was initialized with a value drawn from a Gaussian distribution with mean zero and variance one. We then applied the MF technique to train the latent spaces. The resulting dot product $M_i D_j^{T}$ denotes the relationship between miRNA i and disease j. In our proposed objective function, Eq. (1), λ1 and λ2 are regularization coefficients that control over-fitting, and $w_{ij}$ is the weight for approximating the value of the corresponding entry in R. $w_{ij}$ equals one if there already exists a known relationship between miRNA i and disease j. Otherwise, we utilize a miRNA expression value for the weight. However, in cases where a miRNA expression value does not exist, we set the weight to zero. By applying miRNA expression values as weights, we can estimate the value of the corresponding entry in the original matrix R. This approximation aids in determining whether miRNA i is related to disease j even if there is no information in entry $R_{ij}$.
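A weighted matrix-factorization objective consistent with the description above can be written as follows. This form is an assumption inferred from the surrounding definitions (per-entry weights, two regularization coefficients, and the error between R and the inner product of the latent spaces); the squared Frobenius-norm penalties in particular are a standard choice rather than something stated in the text.

```latex
\mathcal{L} \;=\; \sum_{i=1}^{N_m} \sum_{j=1}^{N_d}
    w_{ij}\,\bigl(R_{ij} - M_i D_j^{T}\bigr)^{2}
    \;+\; \lambda_1 \lVert M \rVert_F^{2}
    \;+\; \lambda_2 \lVert D \rVert_F^{2}
```

Here $M_i$ and $D_j$ denote the i-th row of M and the j-th row of D. Entries with $w_{ij} = 0$ drop out of the sum, so only known associations and entries with expression evidence contribute to training.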

Optimization

The objective function in Eq. (1) is non-convex. To optimize the cost function, we adopted stochastic gradient descent. We computed the gradients with respect to the latent vectors of M and D and updated them through stochastic gradient descent. The gradients are described below. The detailed steps of PMAMCA are illustrated in Algorithm 1 and the notation is explained in Table 1.
Table 1

Notation

| Symbol | Description |
|---|---|
| $N_m$, $N_d$, $K$ | number of miRNAs, diseases, and latent dimensionality, respectively |
| $\mathcal{L}$ | cost function |
| $\mathrm{M} \in R^{N_m \times K}$, $\mathrm{D} \in R^{N_d \times K}$ | miRNA and disease latent space, respectively |
| $e_{ui}$ | error between original matrix and inner product of latent spaces |
| $\eta$ | learning rate |
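The optimization procedure can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: it assumes a weighted squared-error objective with L2 penalties on M and D, visits only entries with non-zero weight, and absorbs the constant factor of 2 from the gradient into the learning rate. The function name `train_mf` and all default hyperparameters are hypothetical.

```python
import numpy as np

def train_mf(R, W, k=10, lr=0.01, lam1=0.01, lam2=0.01, epochs=200, seed=0):
    """Weighted matrix factorization trained with stochastic gradient descent.

    R: (N_m, N_d) association matrix; W: same-shape weight matrix.
    Returns latent matrices M (N_m, k) and D (N_d, k).
    """
    rng = np.random.default_rng(seed)
    n_m, n_d = R.shape
    # Gaussian initialization with mean zero and variance one.
    M = rng.normal(0.0, 1.0, size=(n_m, k))
    D = rng.normal(0.0, 1.0, size=(n_d, k))
    idx = np.argwhere(W > 0)             # entries that contribute to the loss
    for _ in range(epochs):
        rng.shuffle(idx)                 # visit entries in random order
        for i, j in idx:
            e = R[i, j] - M[i] @ D[j]    # error between R and inner product
            m_i = M[i].copy()            # keep old M_i for the D_j update
            M[i] += lr * (W[i, j] * e * D[j] - lam1 * M[i])
            D[j] += lr * (W[i, j] * e * m_i - lam2 * D[j])
    return M, D

# Toy example: a fully weighted 3x2 matrix (hypothetical data).
R = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
W = np.ones_like(R)
M, D = train_mf(R, W, k=4, lr=0.02, epochs=1500)
R_hat = M @ D.T                          # predicted association scores
```

With unit-variance initialization, larger latent dimensions such as k = 100 produce larger initial inner products, so the learning rate would likely need to be reduced accordingly.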

Experimental results

Validation by area under the curve (AUC)

To evaluate the performance of our method, we performed 5-fold cross validation utilizing our original miRNA-disease association matrix, which was aggregated from various databases (HMDD, miR2Disease, and dbDEMC). The miRNA-disease association data were divided into training and test data. Because randomness was involved in the choice of subsets, we performed cross validation 100 times and evaluated the average AUC value. For the test set, we prioritized candidate miRNAs with higher scores as predicted by our model. To validate our model performance intuitively, we first plotted the receiver operating characteristic (ROC) curve by plotting the true positive rate (TPR) against the false positive rate (FPR) at various thresholds. We then calculated the area under the ROC curve for our model. Theoretically, AUC = 1 indicates perfect prediction by a model and AUC = 0.5 indicates the results of random selection. Our model achieved an AUC of 0.882.
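The AUC used above can be computed directly from predicted scores and true labels. The sketch below uses the pairwise-ranking formulation, in which the AUC equals the fraction of (positive, negative) pairs where the positive receives the higher score, with ties counted as one half. The helper names `roc_auc` and `five_fold_indices` are hypothetical, and the toy labels and scores are for illustration only.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative label")
    greater = (pos[:, None] > neg[None, :]).sum()   # correctly ordered pairs
    ties = (pos[:, None] == neg[None, :]).sum()     # tied pairs count half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def five_fold_indices(n, seed=0):
    """Randomly partition indices 0..n-1 into five near-equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), 5)

auc_toy = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
folds = five_fold_indices(10)
```

Each fold would serve once as the test set while the remaining four are used for training; repeating the partition with different seeds and averaging gives the repeated cross-validation estimate described above.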

Comparison with other methods

To further validate the predictive ability of PMAMCA, we compared it experimentally with five existing state-of-the-art methods that have shown excellent prediction accuracy. The ROC curves of all models are presented in Fig. 3 for easy comparison. To compare model performance more precisely, the AUC for each model was calculated. WBSMDA [27], Liu et al. [25], RWRMDA [28], RLSMDA [29], and HDMP [30] achieved values of 0.832, 0.816, 0.802, 0.782, and 0.702, respectively. These values were obtained by five-fold cross validation, which randomly partitions the miRNA-disease association data into five equal parts and uses one part as a test set and the other four parts as a training set. With an AUC of 0.882, PMAMCA outperformed all five existing methods.
Fig. 3

Performance comparison between PMAMCA and five state-of-the-art methods. These results demonstrate that PMAMCA is superior to the existing computational methods


Effect of k

The dimension of the latent spaces is a key factor that directly influences model performance. By varying the dimension k, we were able to compare performance based on AUC values. The effect of k on model performance is presented in Fig. 4. A higher k value typically yields more precise results. However, beyond a certain point, complexity begins to increase and efficiency begins to decrease. Most importantly, even a small value of k = 10 results in competitive performance compared to HDMP, as shown in Fig. 3. Performance tends to increase with k, but it stabilizes beyond k = 100. Because of the complexity and efficiency issues mentioned above, we utilized k = 100 for our experiments.
Fig. 4

Performance of PMAMCA with different values of k. Performance tends to increase as latent dimension k increases. However, even with a low value of k = 10, PMAMCA achieved competitive performance compared to previous computational methods


Case studies (breast cancer, lung cancer)

Many studies have shown that half of all miRNAs are located in cancer-related genomic regions and that their common functions are related to the development of multiple human malignancies [31]. To validate the performance of PMAMCA, we applied our algorithm to various cancers (breast cancer, lung cancer, and colon cancer) to determine how successful the proposed method is at extracting potential candidates. Validation was performed based on answer-set data (HMDD, miR2Disease, and dbDEMC) and literature analysis. Breast cancer is one of the most common malignant neoplasms in women and accounts for 22% of all cancers in women [32]. For our evaluation, we ran PMAMCA and prioritized the top-50 breast-cancer-related miRNA candidates. As shown in Table 2, 48 miRNAs were confirmed to be related to breast cancer based on our answer-set data. Furthermore, we checked the remaining two miRNAs (miR-140 and miR-142) through literature analysis to determine whether these candidates are likely to be related to breast cancer. We were able to confirm that these miRNAs are directly or indirectly related to breast cancer. miR-140 is a known tumor-suppressive miRNA in breast cancer. Recently, it was shown that miR-140 expression is considerably reduced in breast cancer tissue compared to normal breast tissue [37, 38]. This means that down-regulated miR-140 can lead to a loss of function of tumor suppressor genes and eventually cause breast cancer. miR-142 (miR-142-3p) has also been reported to be dysregulated in several breast cancer subtypes. It has been shown that overexpression of miR-142 can lead to downregulation of certain genes that are known to be related to cytoskeletal regulation and cell motility, such as WASL and RAC1 [39]. Additionally, it has been shown that miR-142 can inhibit breast cancer cell invasiveness. 
By combining these results, we have demonstrated that our top-50 miRNAs were all proved to be breast-cancer-related miRNAs with an accuracy of 100%.
Table 2

Top-50 candidate miRNAs for breast cancer predicted by PMAMCA. Validation was performed utilizing HMDD, miR2Disease, dbDEMC, and literature analysis. All 50 miRNAs were confirmed to be related to breast cancer

| Rank | Name | Evidence | Rank | Name | Evidence |
|---|---|---|---|---|---|
| 1 | hsa-mir-155 | miR2Disease, dbDEMC | 26 | hsa-let-7i | miR2Disease, dbDEMC |
| 2 | hsa-mir-126 | miR2Disease, dbDEMC | 27 | hsa-mir-185 | dbDEMC |
| 3 | hsa-mir-16 | dbDEMC | 28 | hsa-mir-191 | miR2Disease, dbDEMC |
| 4 | hsa-let-7b | dbDEMC | 29 | hsa-mir-143 | miR2Disease, dbDEMC |
| 5 | hsa-let-7d | miR2Disease, dbDEMC | 30 | hsa-mir-182 | miR2Disease, dbDEMC |
| 6 | hsa-mir-145 | miR2Disease, dbDEMC | 31 | hsa-mir-15b | dbDEMC |
| 7 | hsa-let-7a | miR2Disease, dbDEMC | 32 | hsa-mir-150 | dbDEMC |
| 8 | hsa-let-7f | miR2Disease, dbDEMC | 33 | hsa-mir-130b | dbDEMC |
| 9 | hsa-mir-146a | miR2Disease, dbDEMC | 34 | hsa-let-7e | dbDEMC |
| 10 | hsa-mir-100 | dbDEMC | 35 | hsa-mir-138 | dbDEMC |
| 11 | hsa-mir-181a | miR2Disease, dbDEMC | 36 | hsa-mir-130a | dbDEMC |
| 12 | hsa-mir-148a | miR2Disease, dbDEMC | 37 | hsa-mir-142 | Literature [34] [39] |
| 13 | hsa-let-7g | dbDEMC | 38 | hsa-mir-133b | dbDEMC |
| 14 | hsa-mir-101 | dbDEMC | 39 | hsa-mir-18a | miR2Disease, dbDEMC |
| 15 | hsa-mir-125b | miR2Disease, dbDEMC | 40 | hsa-mir-141 | miR2Disease, dbDEMC |
| 16 | hsa-mir-17 | dbDEMC | 41 | hsa-mir-127 | miR2Disease, dbDEMC |
| 17 | hsa-let-7c | dbDEMC | 42 | hsa-mir-135b | dbDEMC |
| 18 | hsa-mir-139 | dbDEMC | 43 | hsa-mir-107 | dbDEMC |
| 19 | hsa-mir-15a | dbDEMC | 44 | hsa-mir-140 | Literature [35] [37] [38] |
| 20 | hsa-mir-146b | miR2Disease | 45 | hsa-mir-106b | dbDEMC |
| 21 | hsa-mir-1 | dbDEMC | 46 | hsa-mir-154 | dbDEMC |
| 22 | hsa-mir-10b | miR2Disease, dbDEMC | 47 | hsa-mir-181c | dbDEMC |
| 23 | hsa-mir-125a | miR2Disease, dbDEMC | 48 | hsa-mir-181d | miR2Disease, dbDEMC |
| 24 | hsa-mir-181b | miR2Disease, dbDEMC | 49 | hsa-mir-132 | dbDEMC |
| 25 | hsa-mir-183 | dbDEMC | 50 | hsa-mir-186 | dbDEMC |
Furthermore, we performed functional enrichment analysis on the two aforementioned miRNAs utilizing TAM (http://www.cuilab.cn/tam), an online miRNA functional enrichment tool developed by Lu et al. that provides the biological significance and common functions of given query miRNAs. The two aforementioned miRNAs were found to be related to lung cancer. Lung cancer is well known as a disease that is phenotypically similar to breast cancer. We downloaded a list of phenotypically similar diseases from MimMiner [33], which provides information regarding diseases that are phenotypically similar to a given input disease. From these results, we were able to validate the biological assumption that phenotypically similar diseases tend to have relationships with functionally related miRNAs. Lung cancer is one of the main causes of cancer-related deaths worldwide and is the second leading cause of cancer death in the United States [36]. For further evaluation of PMAMCA, we analyzed the top-50 candidates with the highest chances of being related to lung cancer as identified by PMAMCA. Validation was again performed based on our integrated miRNA-disease answer-set data, and 48 candidates were found to be true lung-cancer-related miRNAs. The list of the top-50 lung-cancer-related candidates is provided in Table 3. To verify the potential biological functions of the remaining two miRNAs (hsa-mir-142 and hsa-mir-127), we performed functional enrichment analysis on them.
Table 3

Top-50 candidate miRNAs for lung cancer predicted by PMAMCA. Validation was performed utilizing HMDD, miR2Disease, dbDEMC, and literature analysis. All 50 miRNAs were confirmed to be related to lung cancer

| Rank | Name | Evidence | Rank | Name | Evidence |
|---|---|---|---|---|---|
| 1 | hsa-let-7a | miR2Disease, dbDEMC | 26 | hsa-let-7e | miR2Disease, dbDEMC |
| 2 | hsa-mir-145 | miR2Disease, dbDEMC | 27 | hsa-mir-1 | miR2Disease, dbDEMC |
| 3 | hsa-mir-17 | dbDEMC | 28 | hsa-mir-101 | miR2Disease, dbDEMC |
| 4 | hsa-let-7b | miR2Disease, dbDEMC | 29 | hsa-let-7i | dbDEMC |
| 5 | hsa-mir-15a | dbDEMC | 30 | hsa-mir-182 | miR2Disease, dbDEMC |
| 6 | hsa-mir-155 | miR2Disease, dbDEMC | 31 | hsa-mir-181a | dbDEMC |
| 7 | hsa-mir-16 | miR2Disease, dbDEMC | 32 | hsa-mir-191 | miR2Disease, dbDEMC |
| 8 | hsa-mir-125b | dbDEMC | 33 | hsa-mir-141 | miR2Disease, dbDEMC |
| 9 | hsa-mir-126 | miR2Disease, dbDEMC | 34 | hsa-mir-150 | miR2Disease, dbDEMC |
| 10 | hsa-mir-148a | dbDEMC | 35 | hsa-mir-139 | miR2Disease, dbDEMC |
| 11 | hsa-mir-183 | miR2Disease, dbDEMC | 36 | hsa-mir-138 | dbDEMC |
| 12 | hsa-let-7g | miR2Disease, dbDEMC | 37 | hsa-mir-107 | dbDEMC |
| 13 | hsa-let-7c | miR2Disease, dbDEMC | 38 | hsa-mir-127 | Literature [42] |
| 14 | hsa-mir-146a | miR2Disease, dbDEMC | 39 | hsa-mir-140 | miR2Disease, dbDEMC |
| 15 | hsa-mir-100 | dbDEMC | 40 | hsa-mir-133b | miR2Disease, dbDEMC |
| 16 | hsa-mir-146b | miR2Disease, dbDEMC | 41 | hsa-mir-18b | dbDEMC |
| 17 | hsa-mir-125a | miR2Disease, dbDEMC | 42 | hsa-mir-130b | dbDEMC |
| 18 | hsa-mir-15b | dbDEMC | 43 | hsa-mir-130a | miR2Disease, dbDEMC |
| 19 | hsa-let-7d | miR2Disease, dbDEMC | 44 | hsa-mir-132 | dbDEMC |
| 20 | hsa-let-7f | miR2Disease, dbDEMC | 45 | hsa-mir-133a | dbDEMC |
| 21 | hsa-mir-10b | dbDEMC | 46 | hsa-mir-185 | dbDEMC |
| 22 | hsa-mir-143 | miR2Disease, dbDEMC | 47 | hsa-mir-106b | dbDEMC |
| 23 | hsa-mir-142 | Unconfirmed [41] | 48 | hsa-mir-135b | dbDEMC |
| 24 | hsa-mir-18a | miR2Disease, dbDEMC | 49 | hsa-mir-149 | dbDEMC |
| 25 | hsa-mir-181b | dbDEMC | 50 | hsa-mir-106a | miR2Disease, dbDEMC |
These two miRNAs were found to be related to lung neoplasms, breast neoplasms, and colonic neoplasms, which directly or indirectly influence the biological mechanisms of lung cancer. In addition to its role in breast cancer development, miR-142 has been reported to play an important role in modulating non-small-cell lung carcinoma cell tumorigenesis by targeting HMGB1 [40]. miR-142 has also been shown to inhibit the expression of CD133, ABCG2, and LGR5 by binding to both the 3′ untranslated regions and coding sequences of these three genes, which are related to poor prognoses in colon cancer patients [41]. It has been reported that miR-127 is associated with poor prognoses in lung adenocarcinoma [42]. The authors of [42] demonstrated that high levels of miR-127 can drive and promote stem-like transitions, meaning this miRNA plays a central role in forming aggressive phenotypes of lung cancer. It has also been shown that the up-regulation of miR-127 can affect epigenetic silencing and BCL6, which is a well-known oncogene in colorectal cancer [43]. Taken together, these results indicate that an MF-based prediction method is suitable for finding disease-related miRNAs and that the proposed PMAMCA model successfully identifies potential miRNAs with a high probability of being related to disease incidence.

Various ranking thresholds

To validate the performance of our proposed model at various ranking thresholds, we counted the number of true disease-related miRNAs retrieved within each threshold. By varying the ranking threshold, we analyzed how our proposed model performs at inferring miRNA-disease associations compared to previous state-of-the-art methods. Fig. 5 shows that PMAMCA achieved the best performance at all ranking thresholds across the diseases tested.
Fig. 5

Numbers of correctly retrieved known disease-related miRNAs for various rank thresholds
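Counting retrieved known associations at several rank thresholds can be sketched as follows. The function name `hits_at_thresholds` and the toy scores are hypothetical; the sketch assumes one score per candidate miRNA for a single disease and a boolean flag marking which candidates are known to be disease-related.

```python
import numpy as np

def hits_at_thresholds(scores, known, thresholds):
    """Number of known disease-related miRNAs among the top-t candidates.

    scores: predicted score per miRNA; known: truthy where the miRNA is a
    known association; thresholds: iterable of positive rank cut-offs.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    known_ranked = np.asarray(known, dtype=bool)[order]
    cumulative = np.cumsum(known_ranked)     # hits within the top 1, 2, ...
    n = len(cumulative)
    return {t: int(cumulative[min(t, n) - 1]) for t in thresholds}

hits = hits_at_thresholds(
    scores=[0.9, 0.2, 0.8, 0.4, 0.7],
    known=[1, 0, 0, 1, 1],
    thresholds=[1, 3, 5],
)
```

Running the same function on the scores from each compared method would give the per-threshold counts needed for a comparison of this kind.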


Discussion

miRNA functionality analysis

miRNAs act through diverse mechanisms, including translational repression and miRNA-guided rapid deadenylation. Moreover, several studies have shown that miRNAs may function as oncogenes or tumor suppressor genes. Because of the high mutational burden of cancer genomes, distinguishing passenger and driver genes has become a vital task [44]. Passenger mutations were known to affect cell growth and accumulate during tumor progression. However, existing studies have shown that accumulation of deleterious passengers may be associated with carcinogenesis that leads to cellular stress, immune response, and therapy resistance [45]. Therefore, we performed a functional analysis to verify whether the extracted miRNAs can regulate driver or passenger genes. Marchi et al. suggested 47 potential driver and 342 passenger candidate genes using a module-based analysis [46]. We downloaded the list of driver and passenger candidates from Additional file 1 of [46]. Of our candidate target genes, 33 matched the driver genes and 184 matched the passenger genes. Our confirmed driver and passenger genes are described in Additional file 1: Table S5. We further performed literature-based analyses through a text-mining technique to validate the study. The following evidence was extracted from existing papers on PubMed. Marchi et al. suggested that overexpression of miR-130b could affect the potential driver candidates (AR, BIRC5, DNMT3B, ERBB4, FGFR1, PML, PPARG, RB1, and STAT1). Loss of miR-101 frequently occurs in NSCLC and could be an early event in lung tumorigenesis. Furthermore, miR-101 could be a therapeutic agent to target oncogenes such as EZH2. The difference in miR-101 copy number loss between SCLCs and NSCLCs, which indicates a difference in miR-101 expression, may point to different mechanisms of EZH2 activation in different lung cancer types [47]. 
Overall, miRNA-101 has shown under-expression in various malignancies such as prostate, lung, live, and bladder. Akao et al. proved that ERK5, which is the target of miR-143, could regulate cell growth. This indicates that the anti-oncogenic role of miR-143 affects gastrointestinal cancers [48]. According to previous studies, among the five targets of miR-150, ITGA3, ITGA6, and TNC were found to be involved in integrin-mediated signaling that promotes cancer cell aggressiveness. Moreover, the remaining two targets, CAV and XIAP, have been found to be involved in cancer pathogenesis [49].
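The matching step described above (comparing candidate target genes against published driver and passenger lists) amounts to a set intersection. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `match_targets`, the placeholder symbols `GENE_P1`, `GENE_P2`, and `GENE_X`, and the toy target list are hypothetical. Only the nine driver symbols are taken from the text.

```python
def match_targets(targets, drivers, passengers):
    """Classify candidate target genes by membership in driver/passenger lists.

    Gene symbols are upper-cased first so that 'rb1' and 'RB1' compare equal.
    """
    targets = {g.upper() for g in targets}
    drivers = {g.upper() for g in drivers}
    passengers = {g.upper() for g in passengers}
    return {
        "driver": sorted(targets & drivers),
        "passenger": sorted(targets & passengers),
        "unmatched": sorted(targets - drivers - passengers),
    }


# Driver candidates named in the text in connection with miR-130b.
drivers = ["AR", "BIRC5", "DNMT3B", "ERBB4", "FGFR1", "PML", "PPARG", "RB1", "STAT1"]
# Hypothetical placeholders: the real passenger list is in Additional file 1 of [46].
passengers = ["GENE_P1", "GENE_P2"]
# Hypothetical candidate targets, chosen only to exercise every branch.
targets = ["rb1", "STAT1", "GENE_P2", "GENE_X"]

result = match_targets(targets, drivers, passengers)
print(result)
```

With the real lists, the lengths of `result["driver"]` and `result["passenger"]` would correspond to the counts reported in the text.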

Relationship between target genes and cancer hallmarks

Because cancer research has progressed considerably in recent years, further advances in this area depend on a broad understanding of cancer hallmarks and the molecular pathways underpinning the mechanisms involved. These hallmarks describe the changes in cell behavior that characterize cancer cells. To identify the relationship between cancer hallmarks and our candidate miRNAs, we checked whether the targets of our candidate miRNAs correspond to cancer hallmarks [50]. To incorporate target-gene information, we downloaded open data from miRTarBase [53] and miRecords [54]. For the evaluation, we downloaded the list of 163 cancer hallmarks and their signatures from Additional file 1 of [50]. In total, 86 of our candidate targets matched cancer hallmarks. The confirmed cancer hallmarks and their signatures are listed in Table 4.
Table 4

List of validated cancer hallmark-based signatures and their genes

Apoptosis: COL4A3, CTNNB1, ELMO2, FAF1, FAIM, FOXL2, GRIK2, JUN, MCF2, PPP3R1, PSEN1, SIRT1, TNFSF13
Cell Cycle: CCNE1, CUL3, EGFR, NPAT, PCNP, RASSF4, RBBP4, SKP1, TNFSF13, TUBB1, ZMYND11
Cell Death: ATM, CIAPIN1, ELMO2, FAIM, FOXL2, GRIK2, JUN, KCNC3, MAP3K11, MCF2, MYC, PAX3, PKM2, PPP3R1, PSEN1, TGM2, XIAP, ZMAT3
Cell Motility: ASTN1, B4GALT1, HMGCR, PAFAH1B1, PEX5, RPS6KB1, SCARB1, SCYL3, SHH, SIRT1, SMCP, SMO, TGFBR1, TNFSF13, VAV3, YWHAE
DNA Repair: ANKRD17, APTX, ATXN3, DCLRE1C, DDB2, EYA4, RAD23B, SFPQ, TNFSF13, UPF1, XPC
Immune Response: CPLX2, CRISP3, FCGRT, IL2, PSEN1, TNFSF13, VTCN1
Phosphorylation 1: BCKDK, CAMK4, ERC1, LMTK2, MAPK7, RPS6KB1, SCYL3, SMAD7, TGFB2, TNFSF13, TNIK, TOP1, TRIM24, TWF1, TYRO3
Phosphorylation 2: ADRA2B, CDK17, DAPK1, EGFR, LPAR2, NPR1, PIK3CB, PIK3R1, PRKCA, PSEN1, PSKH1, PTPN11, SRC, STK38L, TNFSF13
We further checked the relationship between the targets and cancer using text-mining techniques through PubMed. Notably, our candidate miRNA mir-15a was found to target CDCA4, BCL2L2, YAP1, AKT-3, and Cyclin E1, which are known oncogenic mRNAs. Alderman et al. validated that miR-15a plays a significant role in reducing cancer cell survival and aggressiveness through various mechanisms. Moreover, miR-15a was found to decrease the invasiveness of melanoma cells. Consequently, verified targets of miR-15a were found to be oncogenic mRNAs [51]. These validations support the idea that our model not only efficiently finds disease-related miRNAs but also helps reveal mechanisms linking target genes to cancer incidence.
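Because one gene can belong to several hallmark signatures (TNFSF13, for instance, is listed under most signatures in Table 4), the hallmark check is naturally expressed as a lookup from gene to signatures. The sketch below is illustrative only: the signature sets are a small subset of Table 4, while the function names, the toy target list, and the placeholder `GENE_X` are assumptions.

```python
from collections import defaultdict


def signatures_per_gene(signatures):
    """Invert {signature: genes} into {gene: set of signatures}."""
    index = defaultdict(set)
    for name, genes in signatures.items():
        for gene in genes:
            index[gene].add(name)
    return index


def match_hallmarks(targets, signatures):
    """Return the hallmark signatures hit by each target gene (unmatched genes omitted)."""
    index = signatures_per_gene(signatures)
    return {g: sorted(index[g]) for g in sorted(set(targets)) if g in index}


# A small subset of Table 4, for illustration only.
signatures = {
    "Apoptosis": {"COL4A3", "CTNNB1", "JUN", "TNFSF13"},
    "Cell Cycle": {"CCNE1", "EGFR", "TNFSF13"},
    "Cell Death": {"ATM", "JUN", "MYC", "XIAP"},
    "Phosphorylation 2": {"EGFR", "SRC", "TNFSF13"},
}
# Hypothetical target list; GENE_X stands for a target with no hallmark match.
targets = ["CCNE1", "JUN", "TNFSF13", "GENE_X"]

matches = match_hallmarks(targets, signatures)
for gene, hits in matches.items():
    print(gene, "->", ", ".join(hits))
```

Counting the keys of `matches` gives the number of distinct matched targets, which can be smaller than the number of table entries when genes recur across signatures.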

Conclusion

Recent studies have shown that inferring new miRNA-disease associations with computational methods plays an important role in bioinformatics because it reduces the time and resources required for biological experiments. In this paper, we proposed a novel method called PMAMCA that utilizes matrix factorization (MF) to predict novel miRNA-disease associations. PMAMCA achieved a reliable AUC value of 0.882 in five-fold cross validation, in which the miRNA-disease association data were randomly partitioned into five equal groups, with four groups used as the training set and the remaining group as the test set. We further validated the performance of the proposed model through case studies on breast cancer, lung cancer, and colon cancer, in which the top-50 candidates were prioritized with accuracies of 96, 96, and 92%, respectively. Because of space limitations, the result table for colon cancer is provided in Additional file 1. The reliable performance of PMAMCA can be attributed to several advantages. First, we applied MF, which has already shown excellent performance in recommendation systems. Most major companies that deal with selling products to users, including Netflix, have adopted MF and gained significant profits. The major advantages of MF are its domain expandability and model expandability. In a recommendation system, the goal is to predict as accurately as possible the rating score that a user might assign to an item. By replacing users with miRNAs and items with diseases, we can infer how each miRNA is related to specific diseases. By applying MF to predict new miRNA-disease associations, we not only achieve improved prediction accuracy but also solve the problem of relying on limited sources of miRNA information. Previous methods relied completely on specific seed genes and could not be applied to miRNAs having no association with those seed genes.
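The recommendation-system analogy can be made concrete with a toy example. The following is a minimal sketch of generic regularized matrix factorization trained by stochastic gradient descent, not the PMAMCA objective itself, which is not restated in this section. The toy association matrix, the hyperparameters (`k`, `epochs`, `lr`, `reg`), and the choice to treat unknown pairs as zeros are all assumptions made for illustration.

```python
import random


def factorize(R, k=2, epochs=2000, lr=0.05, reg=0.02, seed=0):
    """Fit R ~ P Q^T by SGD on squared error with L2 regularization.

    R is a miRNA x disease matrix (1 = known association, 0 = unknown).
    Returns latent factors P (one row per miRNA) and Q (one row per disease).
    """
    rng = random.Random(seed)
    n_mirna, n_disease = len(R), len(R[0])
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_mirna)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_disease)]
    for _ in range(epochs):
        for i in range(n_mirna):
            for j in range(n_disease):
                err = R[i][j] - sum(P[i][f] * Q[j][f] for f in range(k))
                for f in range(k):
                    p, q = P[i][f], Q[j][f]
                    P[i][f] += lr * (err * q - reg * p)
                    Q[j][f] += lr * (err * p - reg * q)
    return P, Q


def score(P, Q, i, j):
    """Predicted association score between miRNA i and disease j."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


# Toy data (assumption): miRNAs 0-1 relate to diseases 0-1, miRNAs 2-3 to diseases 2-3.
# The association (miRNA 0, disease 1) is deliberately hidden by setting it to 0.
R = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
P, Q = factorize(R)
hidden = score(P, Q, 0, 1)      # hidden pair inside the same block
unrelated = score(P, Q, 0, 2)   # pair across blocks
print(f"hidden pair: {hidden:.3f}, unrelated pair: {unrelated:.3f}")
```

Because the factorization is low rank, the hidden pair shares latent structure with its neighbors and should score higher than a pair from an unrelated block, which is the behavior that lets an MF model rank candidate miRNAs for a disease without seed genes.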
To overcome this dependence on seed genes, PMAMCA applies MF and achieves excellent performance, as demonstrated through various experiments. Furthermore, PMAMCA revealed mechanisms of disease pathogenesis and expanded our knowledge of miRNA interactions. PMAMCA still has room for improvement in prediction accuracy. In future work, the performance of our proposed method could be improved by utilizing additional biological datasets as implicit feedback. For example, using information on each cancer hallmark or target gene as implicit feedback may enhance performance [52]. Applying meaningful biological data involved in cancer incidence is likely to improve prediction performance as well as increase understanding of the genetic mechanisms of miRNAs. Additionally, extracting meaningful features of miRNAs with other machine learning techniques and incorporating information on target genes should make the predictions of PMAMCA more robust in the future.

Additional file 1 (ZIP 2223 kb):
Table S1. Notation.
Table S2. Top-50 candidate miRNAs for breast cancer predicted by PMAMCA.
Table S3. Top-50 candidate miRNAs for lung cancer predicted by PMAMCA.
Table S4. List of validated cancer hallmark-based signature and their genes.
Table S5. List of confirmed driver and passenger genes (additional experimental result).
Table S6. Top-50 candidate miRNAs for colon cancer predicted by PMAMCA (additional experimental result).
Figure S1. The workflow for prioritizing candidate miRNAs.
Figure S2. Applying matrix factorization into miRNA-disease association extraction.
Figure S3. Performance comparisons between PMAMCA and four state-of-the-art methods.
Figure S4. Performance of PMAMCA with different values of k.
Figure S5. Numbers of correctly retrieved known disease-related miRNAs for various rank thresholds.
  51 in total

Review 1.  Mechanisms of gene silencing by double-stranded RNA.

Authors:  Gunter Meister; Thomas Tuschl
Journal:  Nature       Date:  2004-09-16       Impact factor: 49.962

2.  miR-127 promotes EMT and stem-like traits in lung cancer through a feed-forward regulatory loop.

Authors:  L Shi; Y Wang; Z Lu; H Zhang; N Zhuang; B Wang; Z Song; G Chen; C Huang; D Xu; Y Zhang; W Zhang; Y Gao
Journal:  Oncogene       Date:  2016-11-21       Impact factor: 9.867

3.  miR-101 DNA copy loss is a prominent subtype specific event in lung cancer.

Authors:  Kelsie L Thu; Raj Chari; William W Lockwood; Stephen Lam; Wan L Lam
Journal:  J Thorac Oncol       Date:  2011-09       Impact factor: 15.609

4.  Impact of deleterious passenger mutations on cancer progression.

Authors:  Christopher D McFarland; Kirill S Korolev; Gregory V Kryukov; Shamil R Sunyaev; Leonid A Mirny
Journal:  Proc Natl Acad Sci U S A       Date:  2013-02-06       Impact factor: 11.205

Review 5.  Predictive genomics: a cancer hallmark network framework for predicting tumor clinical phenotypes using genome sequencing data.

Authors:  Edwin Wang; Naif Zaman; Shauna Mcgee; Jean-Sébastien Milanese; Ali Masoudi-Nejad; Maureen O'Connor-McCourt
Journal:  Semin Cancer Biol       Date:  2014-04-18       Impact factor: 15.707

6.  Prediction of MicroRNA-Disease Associations Based on Social Network Analysis Methods.

Authors:  Quan Zou; Jinjin Li; Qingqi Hong; Ziyu Lin; Yun Wu; Hua Shi; Ying Ju
Journal:  Biomed Res Int       Date:  2015-07-26       Impact factor: 3.411

7.  Overexpression of suppressive microRNAs, miR-30a and miR-200c are associated with improved survival of breast cancer patients.

Authors:  Tsutomu Kawaguchi; Li Yan; Qianya Qi; Xuan Peng; Emmanuel M Gabriel; Jessica Young; Song Liu; Kazuaki Takabe
Journal:  Sci Rep       Date:  2017-11-21       Impact factor: 4.379

8.  Deep sequencing-based microRNA expression signatures in head and neck squamous cell carcinoma: dual strands of pre-miR-150 as antitumor miRNAs.

Authors:  Keiichi Koshizuka; Nijiro Nohata; Toyoyuki Hanazawa; Naoko Kikkawa; Takayuki Arai; Atsushi Okato; Ichiro Fukumoto; Koji Katada; Yoshitaka Okamoto; Naohiko Seki
Journal:  Oncotarget       Date:  2017-05-02

9.  Aberrant allele frequencies of the SNPs located in microRNA target sites are potentially associated with human cancers.

Authors:  Zhenbao Yu; Zhen Li; Normand Jolicoeur; Linhua Zhang; Yves Fortin; Edwin Wang; Meiqun Wu; Shi-Hsiang Shen
Journal:  Nucleic Acids Res       Date:  2007-06-21       Impact factor: 16.971

10.  Semi-supervised learning for potential human microRNA-disease associations inference.

Authors:  Xing Chen; Gui-Ying Yan
Journal:  Sci Rep       Date:  2014-06-30       Impact factor: 4.379

