mintRULS: Prediction of miRNA-mRNA Target Site Interactions Using Regularized Least Square Method.

Sushil Shakyawar¹, Siddesh Southekal¹, Chittibabu Guda¹,².

Abstract

Identification of miRNA-mRNA interactions is critical to understand the new paradigms in gene regulation. Existing methods show suboptimal performance owing to inappropriate feature selection and limited integration of intuitive biological features of both miRNAs and mRNAs. The present regularized least square-based method, mintRULS, employs features of miRNAs and their target sites using pairwise similarity metrics based on free energy, sequence and repeat identities, and target site accessibility to predict miRNA-target site interactions. We hypothesized that miRNAs sharing similar structural and functional features are more likely to target the same mRNA, and conversely, mRNAs with similar features can be targeted by the same miRNA. Our prediction model achieved an impressive AUC of 0.93 and 0.92 in LOOCV and LmiTOCV settings, respectively. In comparison, other popular tools such as miRDB, TargetScan, MBSTAR, RPmirDIP, and STarMir scored AUCs at 0.73, 0.77, 0.55, 0.84, and 0.67, respectively, in LOOCV setting. Similarly, mintRULS outperformed other methods using metrics such as accuracy, sensitivity, specificity, and MCC. Our method also demonstrated high accuracy when validated against experimentally derived data from condition- and cell-specific studies and expression studies of miRNAs and target genes, both in human and mouse.

Keywords: least square regression; miRNA–target site interaction; nucleotide sequence feature; pairwise feature scoring

Substances:
MicroRNAs
RNA, Messenger

Year: 2022 PMID: 36140696 PMCID: PMC9498445 DOI: 10.3390/genes13091528

Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141

1. Introduction

The process of microRNA (miRNA)-directed silencing of messenger RNA (mRNA) has been described as another layer of gene regulation in many organisms, including animals and plants. By regulating gene expression at the post-transcriptional level, miRNAs are involved in a wide range of biological processes such as cell development and maintenance [1], cell-to-cell interactions [2], and cancer growth and progression [3]. Around 90% of human genes are regulated by one or more miRNAs at the post-transcriptional level [4]. A single miRNA can interact with multiple mRNAs, and an individual mRNA can be targeted by several miRNAs, forming a complex network of gene regulation [5,6] that is challenging to study and understand. The interaction between a miRNA (~22 nt on average) and its target mRNA involves a seed region (nucleotide positions ~2–8) on the miRNA, which seeks a complementary site, mostly in the 3′ untranslated region (UTR) of the mRNA, to bind; however, perfect seed pairing (canonical interaction) is not required to form a miRNA–mRNA complex, as seen in so-called non-canonical interactions [7,8]. In previous studies, miRNA binding sites have also been identified in the 5′ UTR and coding regions [9,10]. These interactions have shown silencing effects on gene expression [11]. Recent studies also suggest that flanking regions (other than seed binding regions) at both ends of the mRNA contribute to miRNA–mRNA interactions [12,13]. These studies reveal that the mechanisms involved in miRNA-based gene silencing are very complex and that predicting miRNA–mRNA interactions requires multi-level characteristics of miRNAs and their target sites. Several bioinformatics-based approaches have been developed to understand miRNA–mRNA interactions.
These tools mainly adopted features such as Watson–Crick pairing [14], the thermodynamic stability of miRNA–mRNA complexes [15], and binding site abundance, availability, and accessibility [15] to predict interactions. Predictive methods such as TargetScan [16], miRWalk [17], MBSTAR [18], DeepMirTar [19], miRAW [20], and RPmirDIP [21] were developed to identify associations between miRNAs and mRNAs. MBSTAR uses multiple instance learning on validated miRNA binding sites to calculate interaction scores. The miRDB database [22,23] includes a large collection of miRNA–mRNA interactions predicted by the MirTarget tool (a built-in component of miRDB), which was developed based on common features of miRNA binding sites extracted from high-throughput sequencing experiments. STarMir [24] adopts a logistic modeling framework with crosslinking immunoprecipitation (CLIP) studies to predict miRNA binding sites. The model uses sequence-based features and target secondary structures to predict the binding sites. Recently, miRAW was developed to predict non-canonical interactions between miRNAs and target mRNAs [20]. Similarly, TargetScan uses 14 different sequence features to predict miRNA–mRNA interactions. Subsequently, various databases were developed based on these algorithms to provide predicted and experimentally verified miRNA–mRNA interaction pairs. The most common databases that provide miRNA–mRNA interactions include miRDB, TarBase [25], and miRTarBase [26]. Previous reviews have also described the working strategies, data integration, feature extraction, and limitations of the existing methods [27,28,29]. Early prediction tools such as GUUGle [30] used a single feature based on ‘seed base pairing’ for prediction.
However, most of the methods mentioned above eventually adopted multiple features, including seed pairing, free energy, sequence conservation, and target site accessibility, derived from known miRNA–mRNA interaction pairs. These tools showed inconsistencies in their predictions because inadequate emphasis was given to the selection of context-specific features and their weights to reflect the characteristic environment of miRNA–target interactions. For example, algorithms that rely on sequence conservation work well only for phylogenetically close species. One such method is the miRanda algorithm [31], which considers the conservation of miRNA binding sites and their positions in the 3′ UTR to identify potential miRNA–target interactions only in closely related species. Furthermore, strategies for extracting and integrating the structural and functional features shared between multiple miRNAs that could be responsible for targeting the same mRNAs have received less emphasis in previous approaches [14,32]. In other words, similarity-based feature integration strategies have not been much explored in this context. A recent tool, miRTMC [33], was nevertheless developed by adopting similarity networks of miRNAs and mRNAs together with miRNA–mRNA interaction networks. Apart from this, the datasets used to train and test these models are inconsistent, leading to a small overlap between the targets predicted by different methods, as highlighted in previous articles and reviews [14,27,28,34]. Consequently, most tools suffer from poor sensitivity and accuracy when compared against experimental data [29,35], raising the need for more sophisticated computational methods.
Here, we develop a new approach, called mintRULS (microRNA–Target Interaction Prediction Using Kronecker-Regularized Least Square classification), which incorporates sensitive features from miRNAs and target sites on mRNAs in a pairwise manner and uses least-square regression-based classification to predict interactions between them. We hypothesized that miRNAs with shared features are more likely to interact with the same mRNA, while mRNAs with similar features tend to be targeted by the same miRNA. Under this hypothesis, our strategy of using similarity features within the miRNA and mRNA species helps overcome the limitations of current prediction methods. We demonstrate that our model outperforms existing tools in prediction accuracy and validate the method using experimental gene expression data from human and mouse, which will help improve our understanding of miRNA-associated gene regulation at the post-transcriptional level.

2. Materials and Methods

2.1. miRNA–Target Site Associations in Human and Mouse

A subset of the dataset from a previous study [36] was used in the present analysis. The data include miRNA and miRNA target site (miTS) associations (MTAs) from (i) a study of the miRNA interactome by CLASH (crosslinking, ligation, and sequencing of hybrids) in HEK293 cells [8] and (ii) miRNA–target site interaction data in miRTarBase 8.0 with experimental evidence (immunoblot, luciferase reporter assay, qRT-PCR). The combined data were preprocessed to remove pairs with incomplete information: all miRNAs with one or more “N” letters in their nucleotide sequences were removed, and any target sites with >50% “N” letters were filtered out. The final human dataset contains 34,413 MTAs between 845 miRNAs and 32,709 miTS (from 17,625 human mRNA transcripts), while the mouse dataset includes 2829 experimentally verified interactions between 327 miRNAs and 2675 miTS (from 2424 mRNA transcripts; unannotated: 1925, annotated genes: 499). A separate adjacency matrix was generated for each of the human and mouse datasets. The experimentally verified pairs in each matrix represent the positive dataset, whereas the remaining pairs were considered the negative dataset.

2.2. Kernel Similarity Scores for miRNA

We developed a comprehensive scoring scheme by using relevant features that are more likely to discriminate between the binding and non-binding MTAs. The rationale for including each feature is provided below.

2.2.1. Free Energy (FE)-Based Similarity

The free energy of RNA molecules (miRNAs and mRNAs) is an important property that facilitates their interactions, because energy is involved in unfolding the interaction sites to allow pairing of nucleotides between miRNAs and mRNAs. A lower overall free energy means higher stability of the miRNA–mRNA complex, which can be interpreted as a higher likelihood of a real interaction. Long et al. (2007) also found a correlation between the folded structure of mRNA and the efficacy of miRNA-driven repression [37]. This concept has previously been used in various miRNA–mRNA interaction prediction tools such as MiRNATIP [38], Avishkar [39], RNAhybrid [40], and other algorithms [41]. In the current work, the Python package seqfold (https://pypi.org/project/seqfold/, accessed on 28 March 2022) was used to calculate the minimum free energy of each miRNA. This program takes the nucleotide sequence of a given miRNA as input and calculates its free energy (also referred to as folding energy) based on thermodynamic principles. The FE-based pairwise similarity between two miRNAs, $m_i$ and $m_j$, is calculated based on Euclidean distance (Appendix A, Equation (A1)) and is denoted as $FE_m(m_i, m_j)$. The pairwise matrix representing FE-based similarity between all miRNAs is denoted as $FE_m$.

2.2.2. Gaussian Interaction Profile (GP) Kernel Similarity (Based on Known Associations)

GP-based similarity has been successfully applied to predicting drug–target interactions [42,43], drug–drug interactions [44], and miRNA–disease associations [45]. Here, the GP kernel similarity between two miRNAs, $m_i$ and $m_j$, is defined as

$$GP_m(m_i, m_j) = \exp\left(-\gamma_m \lVert IP(m_i) - IP(m_j) \rVert^2\right),$$

where $IP(m_i)$ is the binary vector representing the interaction profile of miRNA $m_i$. The parameter $\gamma_m$ adjusts the kernel width and can be calculated as

$$\gamma_m = \gamma'_m \Big/ \left(\frac{1}{n_m} \sum_{i=1}^{n_m} \lVert IP(m_i) \rVert^2\right),$$

where $n_m$ equals the total number of selected miRNAs. Based on previous studies [46], $\gamma'_m$ is set to 1. As defined above, the pairwise matrix of GP-based similarities of the selected miRNAs is denoted as $GP_m$.
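The Gaussian interaction profile kernel described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `gip_kernel`, the toy adjacency rows, and the choice to raise an error when no interactions are known are all assumptions made for the example.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a binary matrix.

    profiles: list of equal-length 0/1 lists, one interaction profile per miRNA.
    gamma_prime: bandwidth scaling (set to 1 in the text).
    """
    n = len(profiles)
    # The mean squared norm of the profiles normalizes the kernel width.
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("at least one profile must contain a known interaction")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy data (hypothetical): three miRNAs, four target sites.
toy_profiles = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
gp = gip_kernel(toy_profiles)
```

In this toy case the first two miRNAs share an identical profile and therefore receive a similarity of 1, while the third, which shares no targets with them, receives a much smaller value.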

2.2.3. Needleman’s Sequence Similarity

As evident from experimentally verified miRNA–target pairs, miRNAs with similar seed sequences are more likely to regulate a similar set of genes [47]. Following this reasoning, the sequence-based pairwise similarity score was calculated using the Needleman–Wunsch algorithm [48]. The similarity score between two miRNAs, $m_i$ and $m_j$, is denoted as $NS_m(m_i, m_j)$, and the whole pairwise matrix is represented by $NS_m$.
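A Needleman–Wunsch global alignment score can be computed with a short dynamic program, sketched below. The scoring scheme (match = 1, mismatch = −1, gap = −1) and the function names are assumptions chosen for illustration; the text does not state which scoring parameters were used.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    # prev holds the previous row of the dynamic programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def pairwise_alignment_matrix(sequences):
    """Symmetric matrix of alignment scores between all sequence pairs."""
    n = len(sequences)
    return [[needleman_wunsch_score(sequences[i], sequences[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical short RNA sequences used only to exercise the functions.
example = ["ACGU", "ACU", "GGGA"]
ns = pairwise_alignment_matrix(example)
```

Raw alignment scores depend on sequence length, so a real pipeline would likely rescale them before combining them with other kernels; that step is left out here because the source does not describe it.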

2.2.4. Simple Sequence Repeats (SSRs)-Based Similarity

SSRs are repetitive nucleotide sequences and are considered important binding signatures embedded at the genetic level. A previous study found that miRNAs binding to complementary regions with SSRs showed perturbation in RNA cross-talk in myotonic dystrophy type 1 (DM1) and type 2 (DM2) [49]. Considering the significance of SSRs in mRNA binding, we extracted repeat motifs (RF) from each miRNA using ssrtool (https://archive.gramene.org/db/markers/ssrtool, accessed on 20 November 2021). With a filtering criterion of a minimum of 3 repeats, we found 12 di-, 51 tri-, and 32 tetramers across all miRNAs. Considering the repeat counts in each miRNA, the Gaussian profile-based pairwise similarity between miRNAs $m_i$ and $m_j$ is calculated as follows:

$$SR_m(m_i, m_j) = \exp\left(-\gamma_r \lVert RF(m_i) - RF(m_j) \rVert^2\right),$$

where $RF(m_i)$ and $RF(m_j)$ are binary vectors representing all RFs in miRNAs $m_i$ and $m_j$. Again, $\gamma_r$ adjusts the kernel width and can be calculated as

$$\gamma_r = \gamma'_r \Big/ \left(\frac{1}{n_m} \sum_{i=1}^{n_m} \lVert RF(m_i) \rVert^2\right).$$

As explained above, $\gamma'_r$ is set to 1 in this case, $n_m$ is the total number of selected miRNAs, and $SR_m$ represents the corresponding pairwise matrix of SR-based similarities.
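As a rough sketch of the repeat-extraction step, the snippet below finds di-, tri-, and tetranucleotide motifs that occur at least three times in tandem and turns them into a binary presence vector. It is a stand-in written with the standard `re` module, not the ssrtool program used in the study; the function names, the T-to-U conversion, and the rule that skips motifs made of a shorter repeated unit are assumptions.

```python
import re
from collections import Counter


def find_ssr_motifs(seq, motif_lengths=(2, 3, 4), min_repeats=3):
    """Count tandem-repeat motifs of the given lengths in an RNA sequence."""
    seq = seq.upper().replace("T", "U")
    counts = Counter()
    for k in motif_lengths:
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGU]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AAAA" or "ACAC".
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            counts[motif] += 1
    return counts


def ssr_profile(seq, motif_vocabulary):
    """Binary vector marking which vocabulary motifs occur as SSRs in seq."""
    found = find_ssr_motifs(seq)
    return [1 if motif in found else 0 for motif in motif_vocabulary]


# Hypothetical sequences and vocabulary for illustration only.
vocabulary = ["AC", "CAG"]
profile = ssr_profile("CAGCAGCAG", vocabulary)
```

The resulting binary vectors can be passed to the same Gaussian profile kernel used for the interaction profiles.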

2.2.5. Integration of miRNA Similarity Scores

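All four types of feature scores were combined using a weighted combination approach to obtain an integrated similarity matrix, $K_m$, as defined below:

$$K_m = w_1\, FE_m + w_2\, GP_m + w_3\, NS_m + w_4\, SR_m,$$

where $FE_m$, $GP_m$, $NS_m$, and $SR_m$ are the FE-, GP-, sequence-, and SSR-based similarity matrices of Sections 2.2.1–2.2.4, and $w_1, \ldots, w_4$ represent the weights given to the different similarities.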
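The weighted combination can be illustrated with a small helper that forms an element-wise weighted sum of similarity matrices. The function name, the plain weighted-sum form, and the example weights are assumptions; the weights actually used by the authors are not given in this section.

```python
def combine_similarities(matrices, weights):
    """Element-wise weighted sum of equally sized square similarity matrices."""
    if len(matrices) != len(weights):
        raise ValueError("one weight is required per similarity matrix")
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2 x 2 similarity matrices and weights that sum to 1.
identity_like = [[1.0, 0.0], [0.0, 1.0]]
sequence_like = [[1.0, 0.5], [0.5, 1.0]]
k_m = combine_similarities([identity_like, sequence_like], [0.25, 0.75])
```

Choosing weights that sum to 1 keeps the diagonal of the combined matrix at 1 whenever every input matrix has a unit diagonal, which makes the result easier to compare across weight settings.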

2.3. Kernel Similarity Scores for miTS

Similar to the scores for miRNAs, we employed a set of discriminatory features for miTS as follows.

2.3.1. FE-Based Similarity between miTS

The seqfold tool was used in a similar manner to calculate the minimum free energy of each miTS, followed by calculation of the FE-based similarity between two miRNA binding sites, $t_i$ and $t_j$, denoted by $FE_t(t_i, t_j)$. The final symmetric matrix of pairwise FE-based similarities is termed $FE_t$.

2.3.2. Target Site Accessibility (TA)-Based Similarity

Accessibility of the miRNA target site eases miRNA binding and the subsequent miRNA-driven regulation [6,15]. We calculated the accessibility of each miTS using the RNAplfold module of the ViennaRNA package (http://www.tbi.univie.ac.at/RNA/, accessed on 20 November 2021). The pairwise similarity between the TAs of two miTS, $t_i$ and $t_j$, is calculated based on Euclidean distance and is denoted as $TA_t(t_i, t_j)$. The matrix representing this score for the chosen miTS is termed $TA_t$.

2.3.3. AU Content (AU)-Based Similarity

An mRNA can fold into a secondary structure, which might hinder the repression potency of a miRNA by lowering site accessibility [50]. A previous study suggested that a lower GC content (or high local AU content) near the target sites and in the 3′ UTR of the mRNA increases accessibility for interaction with miRNAs [6,51]. Therefore, the GC content of each miTS was calculated separately, followed by calculation of the pairwise AU-based similarity between two miTS, $t_i$ and $t_j$, based on Euclidean distance (Appendix A, Equation (A2)), denoted by $AU_t(t_i, t_j)$. The final matrix of AU-based similarities between different miTS is represented by $AU_t$.
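A distance-based similarity of this kind can be sketched as follows. For a scalar feature such as AU content, the Euclidean distance reduces to an absolute difference. The conversion from distance to similarity used here (one minus the distance scaled by the largest observed distance) is an assumption made for the example; the exact form is given in Appendix A, Equation (A2), which is not reproduced in this section.

```python
def au_content(seq):
    """Fraction of A and U bases in a sequence (T is treated as U)."""
    seq = seq.upper().replace("T", "U")
    return sum(base in "AU" for base in seq) / len(seq)


def distance_similarity(values):
    """Pairwise similarity from scalar features: 1 - d / max(d) (assumed form)."""
    n = len(values)
    dist = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    max_dist = max(max(row) for row in dist)
    if max_dist == 0:
        # All features identical: every pair is maximally similar.
        return [[1.0] * n for _ in range(n)]
    return [[1.0 - d / max_dist for d in row] for row in dist]


# Hypothetical target-site fragments.
sites = ["AAUU", "GGCC", "AUGC"]
au_values = [au_content(s) for s in sites]
au_similarity = distance_similarity(au_values)
```

The same `distance_similarity` helper would apply unchanged to other scalar features such as minimum free energy.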

2.3.4. Simple Sequence Repeats (SSRs)-Based Similarity

As for miRNAs, SSR motifs were extracted from each miTS with the same filtering criteria, and the Gaussian profile-based pairwise similarity $SR_t(t_i, t_j)$ between miTS $t_i$ and $t_j$ was calculated. Here, we denote the whole pairwise matrix of all miTS as $SR_t$.

2.3.5. Integration of miTS’s Pairwise Similarities

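As in the miRNA analysis, the different similarity matrices were combined, with a specific weight given to each one, to obtain the final matrix $K_t$:

$$K_t = v_1\, FE_t + v_2\, TA_t + v_3\, AU_t + v_4\, SR_t,$$

where $FE_t$, $TA_t$, $AU_t$, and $SR_t$ are the FE-, TA-, AU-, and SSR-based similarity matrices of Sections 2.3.1–2.3.4, and $v_1, \ldots, v_4$ are the weights given to each feature.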

2.4. mintRULS

We developed a computational model, mintRULS, which uses known MTAs to predict possible interactions while incorporating multiple similarity-based kernels of miRNAs and miTS. The relevance score is calculated based on the Kronecker product and the regularized least square (RLS) method. The adjacency matrix $A$ was generated to describe the known and unknown associations between $n_m$ miRNAs and $n_t$ miTS. For a known association between miRNA $m_i$ and miTS $t_j$, the association value $A_{ij}$ was assigned 1, else 0. As illustrated in Figure 1, out of the whole interaction data, a random dataset with $k$ miRNAs, $\{m_1, m_2, \ldots, m_k\}$, and $l$ target sites, $\{t_1, t_2, \ldots, t_l\}$, is selected to form a random adjacency matrix $A_{k \times l}$. The training samples can be prepared as $S = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ and $y_i$ represent a miRNA–miTS pair and the corresponding binary label in the adjacency matrix, respectively, with $y_i \in \{0, 1\}$.

Figure 1

Schematic representation of the workflow for feature integration, cross-validation, and performance evaluation of the mintRULS model. miRNA: microRNA; miTS: miRNA target site; CV: cross-validation; LOOCV: Leave-One-Out-CV; LmiTOCV: Leave-miTS-Out-CV. In the adjacency matrix, 1 represents a positive interaction, while 0 represents no interaction between a miRNA and a target site.

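Further, as explained in [52], using the labeled training samples $S = \{(x_i, y_i)\}_{i=1}^{n}$, the following objective function is minimized with the goal of learning a function $f$ that generalizes to new miRNA–miTS samples:

$$J(f) = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 + \lambda \lVert f \rVert_K^2.$$

Here $\lVert f \rVert_K$ is the norm of the function $f$ measured in the Hilbert space associated with the kernel function $k$. The regularization parameter $\lambda > 0$ balances prediction error against model complexity. According to the Representer Theorem [53], and as calculated in [54], the minimizer of the objective function $J(f)$ can be expressed in the following form:

$$f(x) = \sum_{i=1}^{n} a_i\, k(x, x_i).$$

As previously mentioned in [55], the coefficient vector $a$ in the above equation can be calculated by solving the following linear equation:

$$(K + \lambda I)\, a = y,$$

where $K$ is the Kronecker product of the two kernel similarity functions, $K = K_m \otimes K_t$, with $K_m$ and $K_t$ the integrated similarity matrices of the chosen miRNAs and miTS, $y$ is the vector of labels $y_i$, and $I$ is the identity matrix. As in previous studies [56,57], the eigen decomposition of the kernel matrices $K_m$ and $K_t$ is performed as follows:

$$K_m = V_m \Lambda_m V_m^{T}, \qquad K_t = V_t \Lambda_t V_t^{T}.$$

In the above eigen decomposition, $V_m$ and $V_m^{T}$ represent the eigenvector matrix and its transpose, respectively, for miRNAs; similar notation applies to miTS. $\Lambda_m$ and $\Lambda_t$ are the diagonal matrices of eigenvalues. The vector $a$ can then be calculated as follows:

$$a = (V_m \otimes V_t)\left(\Lambda_m \otimes \Lambda_t + \lambda I\right)^{-1} (V_m \otimes V_t)^{T}\, y,$$

where the entries of $y$ are ordered consistently with the rows of $K_m \otimes K_t$.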
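To make the regularized least squares step concrete, the sketch below builds the Kronecker product kernel for a tiny example and solves the linear system $(K + \lambda I)a = y$ directly by Gaussian elimination. It is a didactic illustration only: it deliberately skips the eigen decomposition shortcut described above (which matters for large matrices), and the function names, toy kernels, and λ value are assumptions.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a[i][j] * b[p][q] for j in range(len(a)) for q in range(len(b))]
            for i in range(len(a)) for p in range(len(b))]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def kron_rls(k_m, k_t, adjacency, lam):
    """Return (coefficients a, score matrix) for the Kronecker RLS model."""
    n_m, n_t = len(k_m), len(k_t)
    kernel = kron(k_m, k_t)
    # Row-major flattening matches the row ordering of kron(k_m, k_t).
    y = [float(adjacency[i][p]) for i in range(n_m) for p in range(n_t)]
    size = len(y)
    system = [[kernel[r][c] + (lam if r == c else 0.0) for c in range(size)]
              for r in range(size)]
    a = solve(system, y)
    flat = [sum(kernel[r][c] * a[c] for c in range(size)) for r in range(size)]
    scores = [flat[i * n_t:(i + 1) * n_t] for i in range(n_m)]
    return a, scores


# Toy kernels (hypothetical): miRNA 0 and miRNA 1 are similar to each other.
K_M = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
K_T = [[1.0, 0.5], [0.5, 1.0]]
A = [[1, 0], [0, 0], [0, 1]]
coeffs, scores = kron_rls(K_M, K_T, A, lam=1.0)
```

In this toy setup miRNA 1 has no known targets, yet it receives a higher score for target site 0 than for site 1 because it resembles miRNA 0, which is known to bind site 0. This is the similarity hypothesis of the method in miniature.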

2.5. Cross-Validations and Performance Testing

2.5.1. Cross-Validations

The performance of the mintRULS model was evaluated by cross-validation (CV) in two ways, (1) Leave-One-Out-CV (LOOCV) and (2) Leave-miTS-Out-CV (LmiTOCV), using the human and mouse datasets separately. In LOOCV, one MTA is treated as the test sample while the remaining ones in the adjacency matrix are treated as training samples. In LmiTOCV, 10% of all miTS and their associations with miRNAs are treated as test data, while the remaining MTAs in the adjacency matrix are kept for training the model. To keep the simulations computationally inexpensive, random miRNAs and miTS are chosen from the original adjacency matrix to form a smaller sample adjacency matrix. This randomization is repeated 100 times to reduce the impact of overfitting, and the model is simulated each time in both environments, LOOCV and LmiTOCV.
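The Leave-miTS-Out split can be sketched as a column-wise hold-out of the adjacency matrix. The function name, the fixed random seed, and the choice to mask held-out columns with zeros in the training matrix are assumptions for illustration; the source only states that 10% of miTS and their associations form the test data.

```python
import random


def leave_mits_out_split(adjacency, test_fraction=0.10, seed=0):
    """Hold out a fraction of target-site columns as test data.

    Returns the masked training matrix, the held-out column indices, and a
    dict mapping (miRNA index, miTS index) to the true label for test pairs.
    """
    n_sites = len(adjacency[0])
    n_test = max(1, round(test_fraction * n_sites))
    rng = random.Random(seed)
    test_cols = sorted(rng.sample(range(n_sites), n_test))
    held_out = set(test_cols)
    # Known associations of held-out sites are hidden from training.
    train = [[0 if j in held_out else v for j, v in enumerate(row)]
             for row in adjacency]
    test_labels = {(i, j): adjacency[i][j]
                   for i in range(len(adjacency)) for j in test_cols}
    return train, test_cols, test_labels


# Hypothetical 3 x 20 adjacency matrix.
toy_adjacency = [[(i + j) % 2 for j in range(20)] for i in range(3)]
train_matrix, held_out_cols, held_out_labels = leave_mits_out_split(toy_adjacency)
```

Repeating the split with different seeds and averaging the resulting AUC values mirrors the repeated randomization described above.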

2.5.2. Score Normalization and Performance Evaluation

Actual and predicted miRNA–miTS interactions were used to calculate the true positive rate (TPR) and false positive rate (FPR). The receiver operating characteristic (ROC) curve was drawn to determine the area under the ROC curve (AUC) for estimating the performance of the models. Additionally, accuracy, sensitivity, specificity, and MCC were calculated for the human and mouse datasets separately. Minimum miTS sequence lengths of 40 and 30 nucleotides were used for the human and mouse simulations, respectively. In the present analysis, AUC = 0.5 means that the model predicts at random, while AUC = 1 indicates the best possible performance. Further, mintRULS-predicted scores were normalized using a unity-based method to classify the miRNA–miTS pairs, as explained below:

$$S' = a + \frac{(S - S_{\min})(b - a)}{S_{\max} - S_{\min}},$$

where $a = 0$ and $b = 1$ were set in the current model. $S'$ is the normalized score derived from the predicted score $S$ of an interacting miRNA–miTS pair, and $S_{\min}$ and $S_{\max}$ are the minimum and maximum mintRULS scores obtained for that miRNA across all miTS. The normalized score makes it possible to define the strength of a predicted interaction rather than classifying it as a binary (on/off) relationship. All pairs were divided into three categories based on the quantiles of the normalized score, with the lower and upper quartiles as boundaries between the categories: Weak Targets: <lower quartile (25th percentile). Moderate Targets: between the lower quartile (25th percentile) and the upper quartile (75th percentile). Strong Targets: >upper quartile (75th percentile).
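The unity-based normalization and the quartile-based categories can be sketched as below. The helper names, the handling of a miRNA whose scores are all identical, and the use of the "inclusive" quartile method from the standard `statistics` module are assumptions; the source does not specify how the quartiles were computed.

```python
import statistics


def unity_normalize(scores, a=0.0, b=1.0):
    """Rescale one miRNA's scores across all miTS into the range [a, b]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: no spread to rescale, so map everything to a.
        return [a for _ in scores]
    return [a + (s - lo) * (b - a) / (hi - lo) for s in scores]


def categorize(normalized_scores):
    """Label each score as weak, moderate, or strong using quartile cut-offs."""
    q1, _median, q3 = statistics.quantiles(normalized_scores, n=4, method="inclusive")
    labels = []
    for s in normalized_scores:
        if s < q1:
            labels.append("weak")
        elif s > q3:
            labels.append("strong")
        else:
            labels.append("moderate")
    return labels


# Hypothetical raw scores of one miRNA against five target sites.
raw_scores = [2.0, 4.0, 6.0, 8.0, 10.0]
normalized = unity_normalize(raw_scores)
categories = categorize(normalized)
```

For these evenly spaced scores the lowest value is labelled weak, the highest strong, and the three in between moderate.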

2.5.3. Comparison with Previous Methods

We also compared mintRULS predictions with popular existing tools and databases, namely miRDB, TargetScan, MBSTAR, RPmirDIP, and STarMir [24]. To make the comparison methodologically relevant, we included tools whose working strategies directly or indirectly focus on features of miRNAs and their target sites. More specifically, the objective is to compare the prediction power of mintRULS with that of other tools, which in turn helps to understand the importance of including multiple features (in a pairwise manner) over single features. The interacting pairs predicted by these resources were obtained as of 20 March 2021. MBSTAR is a machine learning program that extracts features from validated potential binding sites in the mRNA and uses them to train a classifier that predicts target and non-target mRNAs. Further, using a random forest classifier, the algorithm predicts functional binding sites in the mRNA. To choose a dataset of highly interacting miRNA–mRNA target pairs, all human sequence pairs with scores higher than 0.5 were considered positive pairs and included in the present comparative analysis. The miRDB database contains miRNA–target pairs predicted by MirTarget, an algorithm trained on crosslinking immunoprecipitation (CLIP)-based binding data and miRNA expression data using the SVM machine learning framework. The algorithm looks for common features associated with both miRNA binding and downregulation of the target. As a prediction score, the algorithm generates a probability score between 0 and 100 for each target site. In the case of multiple target sites on an mRNA, the individual scores are combined to calculate the final score. miRDB provides only interacting pairs with a score > 50. Here, we downloaded all human interacting pairs and compared them with mintRULS’s predictions. STarMir, a web server, was developed on a logistic modeling framework and trained using CLIP data.
The method incorporates a variety of thermodynamic, structural, and sequence-based features for seed and non-seed regions as well as for different regions of the mRNA (e.g., 3′ UTR, CDS, and 5′ UTR). As its prediction score, the model outputs a probability representing miRNA–target site interaction. As discussed in the STarMir article, predictions with a probability score of 0.75 or higher indicate highly likely interacting pairs. Therefore, only these highly likely pairs were considered in this analysis. TargetScan predicts miRNA–target interactions by matching conserved 8-mer, 7-mer, and 6-mer sites in the seed region. TargetScanHuman (v 7.2) (https://www.targetscan.org/vert_80/, accessed on 20 March 2021) uses various binding site-related characteristics and 14 features to predict interactions between a miRNA and its targets. From the database, interacting pairs with a weighted context++ score percentile higher than 50 were considered positive pairs in the comparative analysis. RPmirDIP provides interacting pairs predicted by mirDIP (microRNA Data Integration Portal) [58], which uses a semi-supervised machine learning method, “Reciprocal Perspective (RP)”. In the present analysis, all pairs with the recommended Difference of Scores (DoS) higher than 0.5 were considered. A separate data matrix representing the interactions between miRNAs and targets was prepared for each database discussed above. The interacting and non-interacting pairs in the test dataset were searched in each data matrix, and a confusion matrix was built to calculate the AUC value in each case.
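The evaluation metrics named in this section can be computed from true labels and predictions with a few lines of standard-library Python. The sketch below is an illustration under assumptions: the function names are hypothetical, the AUC is computed with the rank-based (Mann–Whitney) formulation, and the toy labels and scores are invented for the example.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and MCC from binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # MCC is defined as 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    positives = [s for s, t in zip(scores, y_true) if t == 1]
    negatives = [s for s, t in zip(scores, y_true) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical labels, continuous scores, and thresholded predictions.
truth = [1, 1, 0, 0]
predicted_scores = [0.9, 0.4, 0.6, 0.1]
predicted_labels = [1, 0, 1, 0]
metrics = confusion_metrics(truth, predicted_labels)
auc = roc_auc(truth, predicted_scores)
```

For a database that reports only binary interacting/non-interacting calls, the same `roc_auc` function can be applied by passing the 0/1 predictions as scores, in which case the ROC curve has a single operating point.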

2.6. Model Code Implementation and Software Availability

Python 3.7 (https://www.python.org), PyCharm Community version 2019.3 (https://www.jetbrains.com/pycharm/), and R 4.0.5 (https://www.r-project.org/) (all accessed on 20 November 2021) were used to develop the scripts and run all the simulations. All the core scripts and related data can be accessed from https://doi.org/10.5281/zenodo.6360587.

2.7. Validation of Predictions

2.7.1. Using Condition- and Cell-Specific Studies

Experimental data that identified interactions between hsa-miR-548ba and four genes (LIFR, PTEN, NEO1, and SP110) in human ovarian granulosa cells [59] were used to validate the mintRULS predictions. Similarly, experimentally verified interactions of the miRNA hsa-miR-34a-5p with genes including JNK3, SMAD7, SMAD2, CREB1, TH, CLOCK, GRIA4, and PARK2 in the human neuroblastoma cell line SH-SY5Y, obtained using a high-throughput miRNA interaction reporter assay (HiTmIR), were also considered [60].

2.7.2. Using Literature-Based Data

The top predictions by mintRULS were compared with the information in literature and databases including miRDB and TargetScan.

2.7.3. Using Expression Data of miRNA and mRNA in Gastrointestinal (GI) Cancer

TCGA level 3 gene/mature miRNA expression data for pan-GI cancers (stomach adenocarcinoma, STAD; cholangiocarcinoma, CHOL; pancreatic adenocarcinoma, PAAD; esophageal carcinoma, ESCA; and liver hepatocellular carcinoma, LIHC) were collected and analyzed using QIAGEN Ingenuity Pathway Analysis (IPA) (please refer to Supplementary Document for the methodology of IPA) to identify negative expression correlations of top predicted miRNA–mRNA pairs from mintRULS.

2.7.4. Using Expression Data of miRNA and mRNA in Normal and Septic Mice

The expression data of miRNAs (GSE74952 study) and genes (GSE55238 study) in control and septic mice were downloaded from the Gene Expression Omnibus (GEO) database and analyzed using GEO2R. The mintRULS-predicted pairs that showed negative expression correlations were identified. Further methodological details for Sections 2.7.3 and 2.7.4 are provided in Appendix A (method section).
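One simple way to screen predicted pairs for negative expression relationships is to keep those whose miRNA and gene change in opposite directions between conditions. The sketch below assumes log fold-change tables such as those exported from a differential expression tool; the function name, the opposite-sign criterion, and all identifiers and values are hypothetical and are not taken from the study.

```python
def anticorrelated_pairs(predicted_pairs, mirna_logfc, gene_logfc):
    """Keep predicted (miRNA, gene) pairs with opposite-sign log fold changes."""
    hits = []
    for mirna, gene in predicted_pairs:
        # Pairs lacking expression data for either partner cannot be assessed.
        if mirna not in mirna_logfc or gene not in gene_logfc:
            continue
        if mirna_logfc[mirna] * gene_logfc[gene] < 0:
            hits.append((mirna, gene))
    return hits


# Hypothetical identifiers and log fold changes (septic vs. control).
pairs = [("miR-A", "Gene1"), ("miR-A", "Gene2"), ("miR-B", "Gene3"), ("miR-C", "Gene1")]
mirna_changes = {"miR-A": 1.8, "miR-B": -0.9}
gene_changes = {"Gene1": -1.2, "Gene2": 0.7, "Gene3": 1.1}
supported = anticorrelated_pairs(pairs, mirna_changes, gene_changes)
```

A stricter screen would also require each change to pass a significance threshold, which is omitted here for brevity.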

3. Results

3.1. Performance Evaluation of mintRULS

mintRULS achieved an average AUC of 0.93 and 0.92 on the human dataset, and an AUC of 0.861 and 0.863 on the mouse dataset, in the LOOCV and LmiTOCV simulation environments, respectively (Table 1). The ROC profiles with the AUC measurements for both datasets are shown in Figure 2A,B. The model also recorded high accuracy, 90.8% and 91% in the LOOCV and LmiTOCV simulations, respectively, on human data, supporting its strong prediction ability. For mouse, the accuracies were 84.6% and 84.4% in the LOOCV and LmiTOCV settings (Table 1). The other parameters, including MCC, specificity, and sensitivity (Table 1), were also high, indicating good performance of the model on both the human and mouse datasets. For mouse, the prediction performance of the model was similar in both simulation environments. In addition, the high specificity indicates a good ability to identify specific interactions between miRNAs and miTS in mouse. We therefore concluded that the model is able to predict miRNA–target site interactions.

Table 1

Performance measurements of mintRULS by different evaluation parameters using human and mouse datasets. LOOCV: Leave-One-Out-Cross-Validation, LmiTOCV: Leave-miTS-Out-Cross-Validation, ROC: Receiver Operating Characteristics, AUC: Area Under Curve, MCC: Matthews correlation coefficient.

Dataset	CV setting	Accuracy	Sensitivity	Specificity	MCC	AUC (ROC Curve)
Human	LOOCV	0.908	0.847	0.909	0.67	0.931
Human	LmiTOCV	0.91	0.829	0.909	0.652	0.925
Mouse	LOOCV	0.846	0.783	0.846	0.59	0.861
Mouse	LmiTOCV	0.844	0.767	0.839	0.564	0.863

Figure 2

Performance of the mintRULS model using ROC profiling for (A) human and (B) mouse datasets. miTS: miRNA target site, LOOCV: Leave-One-Out-Cross-Validation, LmiTOCV: Leave-miTS-Out-Cross-Validation.

Further, mintRULS predictions were compared with those of other methods using the human dataset. The methods miRDB, TargetScan, MBSTAR, RPmirDIP, and STarMir achieved AUCs of 0.73, 0.77, 0.55, 0.84, and 0.67, respectively; in comparison, mintRULS achieved a better AUC of 0.93 in the LOOCV setting, showing the superior performance of the current method (Figure 3).

Figure 3

Performance comparison between predictions made by the mintRULS model and previous methods, including miRDB, TargetScan, MBSTAR, RPmirDIP, and STarMir, using the receiver operating characteristic (ROC) curve and Area Under Curve (AUC) determination. The dark red dashed diagonal line stands for a non-discriminatory test.

3.2. Evaluation of Regularization Parameter (λ)

As defined in the Methods section, tuning the regularization parameter (λ) is important to reduce overfitting: it can decrease the variance of the estimated regression parameters by adjusting the bias. Herein, we evaluated λ over different datasets in both LOOCV and LmiTOCV settings. Using the full adjacency matrix, five different random data matrices were prepared, each comprising all 845 miRNAs and a different number of random miTS. Figure A1 (Appendix A) indicates that a higher miTS number tends to provide better AUC in both LOOCV and LmiTOCV. However, it is not advisable to choose a larger number of miTS, as it creates a very high number of empty cells in the adjacency matrix, which could eventually lead to underperformance of the model. Based on these results, we selected the dataset A_(845 × 3000) as optimal for further analyses.

Figure A1

AUC measurement after simulating the mintRULS model on different values of the regularization parameter (λ). The randomization for each dataset was repeated 100 times and the average AUC was calculated in the (A) LOOCV and (B) LmiTOCV environments. Definitions: Subset 1 (845 × 1000) contains a matrix representing interactions among 845 miRNAs and 1000 miTS; the other subsets are defined similarly.

Next, using the data matrix A_(845 × 3000), AUC was measured for different values of the regularization parameter λ. As shown in Figure 4A, λ > 35 obtained the highest AUC values, 0.931 and 0.925 for LOOCV and LmiTOCV, respectively, which we interpreted as optimal in our case. With the chosen λ = 35, the model extracts favorable features from miRNA and miTS sequences while adding some bias to predict miRNA–miTS interactions.

Figure 4

(A) Performance evaluation of the regularization parameter (λ) in the LOOCV and LmiTOCV simulation environments. The model simulation was repeated 100 times on the randomized data matrix A_(845 × 3000) (miRNA: 845 and miTS: 3000). (B) Effect of varying the length of miTS sequences on the prediction performance of the model. As in (A), the randomized data matrix A_(845 × 3000) was used to perform the cross-validations in the LOOCV and LmiTOCV environments. LOOCV: Leave-One-Out-Cross-Validation; LmiTOCV: Leave-miTS-Out-Cross-Validation; miRNA: microRNA; miTS: miRNA target site; AUC: Area Under the Receiver Operating Characteristic Curve.

3.3. Evaluation of miTS Sequence Length and Features

3.3.1. Effect of Longer Sequence Length

Computational models have fully or partially used features associated with miTS sequences to predict interactions with miRNAs. As introduced earlier, GC content, accessibility, seed pairing, and flanking sequences are some of the widely used features in these models [15]; however, most models give little consideration to the length of the binding sites. This matters because an optimized miTS length (including the seed region and flanking regions on both sides) can provide the most effective features for predicting interactions with miRNAs. We therefore performed systematic comparisons between different miTS sequence lengths (10, 20, 30, 40, and 50 nucleotides) to observe their impact on the model’s performance. As shown in Figure 4B, longer sequences correspond to better AUC, suggesting more powerful and effective features. Shorter miTS may cause high noise in the simulation, as also stated in [61]. However, overly long sequences might bypass any mutational effect on the miTS and are thus not recommended. Therefore, a sequence length of 40 nucleotides was considered optimal in the current analysis.

3.3.2. Feature Selection and Feature Contribution

The model was evaluated over different weight combinations used to prioritize the features of miRNA and miTS separately. In this simulation process, the weights associated with mRNA features were held constant to determine the individual effect of each miRNA feature on model performance, as shown in Figure 5. In this case, Needleman sequence similarity and GP-based similarity contributed most to model performance. The effect of mRNA features was likewise examined individually, with no significant differences in the measured AUC values (Figure 5). Considering these findings, we simulated feature formulations that give more weight to the features with larger individual contributions and achieved significant improvements in AUC, up to 0.93 (Figure 5). Among the miRNA features, the model achieved higher AUCs of 0.81 and 0.80 for the Needleman sequence-based and Gaussian profile-based similarities, respectively, than for the other two features, free energy and SSRs Gaussian-based similarity. The GP-based calculations intrinsically assume that similar miRNAs can interact with the same targets, and vice versa, which is the base hypothesis of this study. They can also capture nonlinear relationships among known miRNA–target interactions. Previous successful applications of GP kernels include the development of feature-based models for predicting drug–target interactions, miRNA–disease associations, circRNA–disease associations, drug–disease associations, and drug–drug interactions [42,43,44,45,62]. Likewise, we interpret that similarity-based models, including the current mintRULS, have the potential to predict miRNA–target interactions. On the other hand, SSR-based features, from either miRNA or mRNA, were not as predictive, perhaps because of the non-specificity of the SSRs (i.e., n = 3, 4, or 5) considered in the present study. 
As a handful of studies show the significance of SSRs in miRNA–target binding [49,63,64], further investigation of feature manipulation is required to better incorporate these features into similarity-based modeling. Among the different features considered for mRNA, free energy, AU content, and accessibility were among the top predictors in mintRULS. These features and their roles in miRNA binding have been discussed previously in the literature [14,32,65], which also raises questions about their systematic integration into predictive modeling; this remains a challenge for model developers.
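The two ideas in this subsection, a Gaussian profile similarity and a weighted combination of feature similarities, can be sketched as follows. This is an illustration under stated assumptions, not the mintRULS implementation: the kernel uses the commonly used Gaussian interaction profile form with a bandwidth normalized by the mean squared profile norm, and the function names, the `gamma_prime` parameter, and the example weights are all hypothetical.

```python
import math


def gaussian_profile_kernel(profiles, gamma_prime=1.0):
    """Similarity between interaction profiles (rows of a 0/1 matrix).

    K[i][j] = exp(-gamma * ||p_i - p_j||^2), where gamma is gamma_prime
    divided by the mean squared norm of the profiles (an assumed, commonly
    used normalization).
    """
    n = len(profiles)
    mean_sq = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    kernel = []
    for i in range(n):
        row = []
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            row.append(math.exp(-gamma * dist_sq))
        kernel.append(row)
    return kernel


def combine_kernels(kernels, weights):
    """Weighted sum of same-sized similarity matrices; weights are rescaled to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(kernels[0])
    return [
        [sum(w / total * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Toy example: three miRNAs with known interactions against three target sites.
toy_profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
gp = gaussian_profile_kernel(toy_profiles)
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
combined = combine_kernels([gp, identity], weights=[3, 1])
print(combined[0][1])
```

Giving a larger weight to a kernel, as with the GP kernel here, mirrors the strategy described above of weighting features by their individual contribution.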

Figure 5

The model performance using different weights combinations of miRNA and mRNA target site features. SSR: Simple sequence repeats, miRNA: microRNA, miTS: miRNA Target Sites.

3.4. Validation

Interacting pairs between hsa-miR-548ba and three genes, LIFR, PTEN, and NEO1, were classified as “Strong Target”, consistent with the results in [59] (Table 2). Similarly, for the study in [60], interacting pairs between hsa-miR-34a-5p and the genes SMAD7, SMAD2, CREB1, and CLOCK were predicted as “Strong Target”, while binding of hsa-miR-34a-5p to GRIA4 was predicted as “Weak Target”. Most predicted results are consistent with the outcomes of the experimental studies (Table 2). The interactions between these pairs were also confirmed by protein-level analysis in SH-SY5Y cells in the same study. Other interactions from different studies, such as hsa-miR-22 with BMP-7/6, hsa-miR-146a-3p with TRAF6 and RIPK2, and hsa-miR-125b with PARP1, p53, Beta-actin, and CPSF6, were also checked and found consistent with the experimental outcomes (Table 2). The experimentally validated negative interactions, hsa-miR-125b with Beta-actin and 18S RNA with gld-1:gfp mRNA, were also predicted correctly as “Weak Targets” (below the 25th percentile) by mintRULS (Table 2).

Table 2

Predicted miRNA-miTS interactions using mintRULS and validation using experimental data in human. Strong Target: Upper quartile (>75th percentile), Moderate Target: Middle quartile (in between 25th and 75th percentile), and Weak Target: Lower quartile (<25th percentile).

miRNA	Target Gene	Results in Reference	mintRULS Prediction (Quartile)	mintRULS Classification	Experimental Evidence (Cells/Tissues)	Reference
hsa-miR-548ba	LIFR	Target	Upper	Strong Target	ovarian granulosa cells	[59]
	PTEN	Target	Upper	Strong Target
	NEO1	Target	Upper	Strong Target
hsa-miR-34a-5p	CLOCK	Target	Upper	Strong Target	SH-SY5Y cells	[60]
	CREB1	Target	Upper	Strong Target
	GRIA4	Target	Lower	Weak Target
	SMAD2	Target	Upper	Strong Target
	SMAD7	Target	Upper	Strong Target
hsa-miR-22	BMP-7/6	Target	Upper	Strong Target	Mouse primary kidney fibroblasts	[67]
hsa-miR-146a-3p	TRAF6	Target	Upper	Strong Target	Mouse Myeloid cells	[68]
hsa-miR-146a-3p	RIPK2	Target	Upper	Strong Target	Mouse Myeloid cells	[68]
hsa-miR-125b	CPSF6	Target	Upper	Strong Target	HEK-293T	[69]
	PARP1	Target	Middle	Moderate Target	HEK-293T cells	[70,71]
	p53	Target	Upper	Strong Target
	Beta-actin	Non-Target	Lower	Weak Target
18S RNA	gld-1:gfp	Non-Target	Lower	Weak Target	Caenorhabditis elegans	[72]
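The quartile rule in the Table 2 caption can be written out in a few lines. The sketch below is illustrative only: the function name `classify_by_quartile`, the use of the inclusive quantile method, and the toy scores are assumptions, and the actual mintRULS score distribution is not reproduced here.

```python
import statistics


def classify_by_quartile(scores):
    """Map each pair's score to Strong/Moderate/Weak Target.

    Strong Target: above the 75th percentile; Weak Target: below the 25th
    percentile; Moderate Target: everything in between. Percentiles are
    computed over all scores passed in (assumed to be the full prediction set).
    """
    q1, _, q3 = statistics.quantiles(list(scores.values()), n=4, method="inclusive")
    classes = {}
    for pair, score in scores.items():
        if score > q3:
            classes[pair] = "Strong Target"
        elif score < q1:
            classes[pair] = "Weak Target"
        else:
            classes[pair] = "Moderate Target"
    return classes


# Toy scores for five hypothetical miRNA-target pairs.
toy_scores = {"pair_a": 0.05, "pair_b": 0.2, "pair_c": 0.4, "pair_d": 0.6, "pair_e": 0.95}
for pair, label in classify_by_quartile(toy_scores).items():
    print(pair, label)
```

Because the thresholds are relative to the score distribution, the same raw score can fall into a different class when the set of scored pairs changes.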

We also checked the performance of mintRULS in predicting interactions when one or more mutations occur in the seed region of the miRNA. For this experiment, mutation information for a few randomly selected miRNAs in human (e.g., hsa-miR-124-3p, hsa-miR-662, hsa-miR-125a-5p, etc.) and mouse (e.g., mmu-miR-342-5p, mmu-miR-690, and mmu-miR-743a-3p), along with the effects on the interactions with their target genes, was downloaded from the PolymiRTS database [66]. In total, 40 pairs comprising 20 wild-type (WT) and 20 mutated (mut) miRNAs with target genes were included in this experiment. The mutation-driven changes in the interactions are described by the context+ score difference (∆S), as given in Table 3. Interestingly, all the WT pairs (WT miRNAs and their target genes) were predicted as “Strong Targets”, while 16 (out of 20) of their mutated counterparts were predicted as “Weak Targets”, showing good consistency with the information (∆S, representing disruption of the interaction) in the PolymiRTS database. It is noteworthy that the other four pairs involving mutated miRNAs (i.e., hsa-miR-125a-5p with ZMYM3, hsa-miR-645 with COL4A4, mmu-miR-342-5p with RASL10B, and mmu-miR-690 with RBBP5) were predicted as “Moderate Targets” rather than “Strong Targets”, showing that these predictions are also somewhat consistent with the ∆S (Table 3). We also considered a special case study by Dash et al., 2020, in which interactions of hsa-miR-124-3p with WT PARP-1 and its mutants were observed. In this case, mintRULS performed very well by correctly classifying interactions of the miRNA with WT PARP-1 and with four of its variants (Mut1, Mut2, Mut3, and Mut4) (Table 3).

Table 3

Validation of mintRULS predictions in case of mutations in the seed region of miRNAs or in the target gene itself. Strong Target: Upper quartile (>75th percentile), Moderate Target: Middle quartile (in between 25th and 75th percentile), and Weak Target: Lower quartile (<25th percentile).

miRNA	miRNA/Seed Mutation	Target Gene/Mutation	Result in Reference	mintRULS Prediction (Quartile)	mintRULS Prediction (Class)	Reference
hsa-miR-124-3p	UAAGGCACGCGGUGAAUGCCAA	Parp-1 (WT)	Target	Upper	Strong Target	[73]
		Mut1: PARP-1 (CC > GG)	No target	Lower	Weak Target
		Mut2: PARP-1 (TG > CA)	No target	Lower	Weak Target
		Mut3: PARP-1 (GC > AA)	No target	Lower	Weak Target
		Mut4: deletion (ΔGC)	No target	Middle	Moderate Target
cel-let-7-3p	AU[G/A]CAA	LIN-41	WT: Target	Upper	Strong Target	[74]
cel-let-7-3p	AU[G/A]CAA	LIN-41	Mutation: No Target	Lower *	Weak Target *	[74]
hsa-miR-662	CCCAC[G/A]U	KLLN	Disrupted(∆S = −0.51)	Upper	Strong Target	PolymiRTS database
		KLLN	Disrupted(∆S = −0.51)	Lower *	Weak Target *	PolymiRTS database
		PATE4	Disrupted(∆S = −0.45)	Upper	Strong Target	PolymiRTS database
		PATE4	Disrupted(∆S = −0.45)	Lower *	Weak Target *	PolymiRTS database
hsa-miR-125a-5p	CCCUGA[G/U]	ZMYM3	Disrupted(∆S = −0.31)	Upper	Strong Target	PolymiRTS database
		ZMYM3	Disrupted(∆S = −0.31)	Middle *	Moderate Target *	PolymiRTS database
		PRRC1	Disrupted(∆S = −0.45)	Upper	Strong Target	PolymiRTS database
		PRRC1	Disrupted(∆S = −0.45)	Lower *	Weak Target *	PolymiRTS database
		AQPEP	Disrupted(∆S = −0.42)	Upper	Strong Target	PolymiRTS database
		AQPEP	Disrupted(∆S = −0.42)	Lower *	Weak Target *	PolymiRTS database
hsa-miR-645	[C/G]UAGGCU	COL4A4	Disrupted(∆S = −0.38)	Upper	Strong Target	PolymiRTS database
		COL4A4	Disrupted(∆S = −0.38)	Middle *	Moderate Target *	PolymiRTS database
		MAOA	Disrupted(∆S = −0.4)	Upper	Strong Target	PolymiRTS database
		MAOA	Disrupted(∆S = −0.4)	Lower *	Weak Target *	PolymiRTS database
		IL4R	Disrupted(∆S = −0.42)	Upper	Strong Target	PolymiRTS database
		IL4R	Disrupted(∆S = −0.42)	Lower *	Weak Target *	PolymiRTS database
hsa-miR-146a-3p		CP	Disrupted(∆S = −0.57)	Upper	Strong Target	PolymiRTS database
		CP	Disrupted(∆S = −0.57)	Lower *	Weak Target *	PolymiRTS database
		ABCB1	Disrupted(∆S = −0.35)	Upper	Strong Target	PolymiRTS database
		ABCB1	Disrupted(∆S = −0.35)	Lower *	Weak Target *	PolymiRTS database
mmu-miR-342-5p	[G/-]GGGUGC	PIGU	Disrupted(∆S = −0.46)	Upper	Strong Target	PolymiRTS database
		PIGU	Disrupted(∆S = −0.46)	Lower *	Weak Target *	PolymiRTS database
		RASL10B	Disrupted(∆S = −0.5)	Middle	Moderate Target	PolymiRTS database
		RASL10B	Disrupted(∆S = −0.5)	Lower *	Weak Target *	PolymiRTS database
		MCU	Disrupted(∆S = −0.54)	Upper	Strong Target	PolymiRTS database
		MCU	Disrupted(∆S = −0.54)	Lower *	Weak Target *	PolymiRTS database
mmu-miR-690	AAGGCU[A/G]	CNOT6	Disrupted(∆S = −0.3)	Upper	Strong Target	PolymiRTS database
		CNOT6	Disrupted(∆S = −0.3)	Lower *	Weak Target *	PolymiRTS database
		ELOVL4	Disrupted(∆S = −0.35)	Upper	Strong Target	PolymiRTS database
		ELOVL4	Disrupted(∆S = −0.35)	Lower *	Weak Target *	PolymiRTS database
		RBBP5	Disrupted(∆S = −0.34)	Upper	Strong Target	PolymiRTS database
		RBBP5	Disrupted(∆S = −0.34)	Middle *	Moderate Target *	PolymiRTS database
mmu-miR-743a-3p	AAAGAC[A/G]	MXI1	Disrupted(∆S = −0.33)	Upper	Strong Target	PolymiRTS database
		MXI1	Disrupted(∆S = −0.33)	Lower *	Weak Target *	PolymiRTS database
		PRRG3	Disrupted(∆S = −0.51)	Upper	Strong Target	PolymiRTS database
		PRRG3	Disrupted(∆S = −0.51)	Lower *	Weak Target *	PolymiRTS database
		MBNL3	Disrupted(∆S = −0.43)	Upper	Strong Target	PolymiRTS database
		MBNL3	Disrupted(∆S = −0.43)	Lower *	Weak Target *	PolymiRTS database

A higher value of the context+ score difference (∆S) indicates an increased likelihood of disruption of the interaction between the miRNA and the target gene. * Entries for mutated miRNAs. Values without * represent WT cases.

In addition to the aforementioned case-specific validation, we compared mintRULS predictions with information in the literature and databases. Table 4 lists a few such miRNAs and their target genes that are also reported in the literature and databases, along with the mintRULS classifications. In most cases, the model’s classifications agree with the information in the literature and databases, while also identifying a few novel interactions.

Table 4

miRNA–mRNA interactions predicted by mintRULS and supporting data in literature and databases.

miRNA	Target Gene	mintRULS Prediction Class (Quartile)	mintRULS Classification	Evidence (Literature/Databases)
hsa-miR-3941	TNPO1	Upper	Strong Target	miRDB
hsa-let-7d-5p	BACH1	Upper	Strong Target	TargetScan
hsa-let-7d-5p	BCL2L1	Upper	Strong Target	TargetScan
hsa-let-7d-5p	NCAM1	Upper	Strong Target	New
hsa-let-7d-5p	TIMP3	Upper	Strong Target	New
hsa-let-7d-5p	IL6R	Upper	Strong Target	TargetScan, miRDB
hsa-let-7d-5p	CD44	Upper	Strong Target	New
hsa-let-7d-5p	ITGB3	Upper	Strong Target	TargetScan, miRDB
hsa-let-7d-5p	CCNE1	Upper	Strong Target	miRDB
hsa-let-7d-5p	MAP4K3	Upper	Strong Target	TargetScan
hsa-let-7d-5p	PTEN	Upper	Strong Target	New
hsa-let-7e-5p	TRIM71	Upper	Strong Target	TargetScan, [75]
hsa-let-7e-5p	ZBTB7A	Upper	Strong Target	New
hsa-let-7e-5p	KLF9	Upper	Strong Target	TargetScan
hsa-let-7e-5p	IGFBP5	Upper	Strong Target	New
hsa-let-7e-5p	ALDH5A1	Upper	Strong Target	New
hsa-let-7e-5p	CDK4	Upper	Strong Target	New
hsa-let-7e-5p	BCL2L1	Upper	Strong Target	miRDB
hsa-let-7e-5p	MDM4	Upper	Strong Target	TargetScan
hsa-let-7e-5p	TIMP3	Upper	Strong Target	[76]
hsa-let-7e-5p	PAPPA	Middle	Moderate Target	TargetScan
hsa-let-7e-5p	MYC	Upper	Strong Target	[76]
hsa-miR-106b-5p	NLN	Upper	Strong Target	TargetScan
hsa-miR-106b-5p	SLC6A4	Upper	Strong Target	TargetScan
hsa-miR-106b-5p	GPD2	Upper	Strong Target	TargetScan
hsa-miR-106b-5p	RASA1	Upper	Strong Target	TargetScan
hsa-miR-106b-5p	EGLN1	Upper	Strong Target	TargetScan
hsa-miR-106b-5p	ATAT1	Upper	Strong Target	New
hsa-miR-106b-5p	PAX6	Upper	Strong Target	miRDB
hsa-miR-106b-5p	PBX3	Upper	Strong Target	TargetScan
hsa-miR-106b-5p	MCL1	Upper	Strong Target	TargetScan
hsa-miR-106b-5p	FLT1	Middle	Moderate Target	TargetScan, miRDB
hsa-miR-106b-5p	FXN	Middle	Moderate Target	miRDB

Supporting Predictions by Expression of miRNA and mRNA in Human and Mouse

Comparison of the differentially expressed miRNAs and genes with the IPA results (“High predicted” or “Experimentally observed” pairs only) showed that most of the IPA-filtered pairs were predicted as either “Strong Target” or “Moderate Target” by our model, with only a few predicted as “Weak Target” (Table 5). In ESCA, 7 downregulated miRNAs were associated with 26 upregulated target genes, while 10 upregulated miRNAs showed opposite expression correlation with 13 target genes (Figure 6A). Similarly, in LIHC, 3 upregulated miRNAs were associated with 2 downregulated genes, and conversely, 7 downregulated miRNAs were associated with 20 upregulated target genes. We also identified 28 miRNA–gene pairs involving 18 upregulated miRNAs and 24 downregulated genes in STAD. In CHOL, associations were identified between 27 downregulated miRNAs and 97 upregulated target genes, and between 17 upregulated miRNAs and 58 downregulated target genes (Figure A2, Appendix A). Too few interacting pairs were identified in PAAD to carry forward into further analysis. Interestingly, the interacting pairs that showed experimental evidence in the IPA analysis were all predicted as “Strong Target” by our method, indicating the strong predictive ability of the model. Details of the interacting pairs, with the FC values, IPA results, and mintRULS classifications, are provided in Table S1 (Supplementary Material).

Table 5

Summary of miRNA–target gene pairs with opposite expression correlation between the miRNA and its target gene. Only pairs that showed “Experimental evidence” or “High prediction” in the IPA analysis were selected. The corresponding columns also list the number of pairs predicted as “Strong Target”, “Moderate Target”, and “Weak Target”. * All the miRNA–gene pairs that showed “Experimental evidence” in IPA were predicted as “Strong Target” by mintRULS. For detailed information, see Supplementary Table S1. IPA: Ingenuity Pathway Analysis, mintRULS predictions (Strong Target: upper quartile, >75th percentile; Moderate Target: middle quartile, >25th percentile and <75th percentile; Weak Target: lower quartile, <25th percentile), STAD: stomach adenocarcinoma, CHOL: cholangiocarcinoma, ESCA: esophageal carcinoma, LIHC: liver hepatocellular carcinoma. Upward red arrow: upregulation, downward green arrow: downregulation.

Cancer Type	Expression: miRNA	Expression: Target Gene	IPA: Exp. Observed *	IPA: High Predicted	IPA: Total	mintRULS: Strong-Target	mintRULS: Moderate-Target	mintRULS: Weak-Target	mintRULS: Total
STAD			13	77	90	28	46	16	90
STAD			15	11	26	16	9	1	26
CHOL			21	134	155	71	64	20	155
CHOL			80	169	249	125	101	23	249
ESCA			36	20	56	29	21	6	56
ESCA			4	20	24	14	8	2	24
LIHC			3	4	7	7	0	0	7
LIHC			23	19	42	42	0	0	42

Figure 6

Interacting pairs predicted by mintRULS in the upper quartile (>75th percentile) that have a negative correlation between miRNA and target gene expression, compared in (A) normal vs. esophageal carcinoma human cells, and (B) normal vs. septic mice. Only pairs classified as “Experimental evidence” or “High prediction” in the IPA analysis were considered. All observations are significant with adj p value < 0.05. FC: fold change, miRNA: microRNA. The criteria were Log2FC > 1 for upregulation and Log2FC < −1 for downregulation.

Figure A2

Interacting pairs predicted by mintRULS in the upper quartile (>75th percentile) that have a negative correlation between miRNA and target gene expression in different gastrointestinal (GI) cancer types: (A) cholangiocarcinoma (CHOL), (B) stomach adenocarcinoma (STAD), and (C) liver hepatocellular carcinoma (LIHC). Only pairs classified as “Experimental evidence” or “High prediction” in the IPA analysis were considered. All observations are significant with adj p value < 0.05. FC: fold change, miRNA: microRNA. The criteria were Log2FC > 1 for upregulation and Log2FC < −1 for downregulation.

In mouse, analysis with GEO2R identified 11 differentially expressed miRNAs between normal and septic mice, while 5715 mRNA transcripts were differentially expressed. Integrating the mintRULS predictions for all 11 miRNAs with the differentially expressed mRNAs identified 15 miRNA–mRNA pairs, between 4 miRNAs and 10 mRNAs, that also show a negative expression correlation (Figure 6B). The normalized predicted mintRULS score, classification, and other related information for each pair are provided in Table S2 (Supplementary Material).
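The expression-based filtering used in this subsection can be summarized as a short sketch. It applies the thresholds stated in the figure captions (Log2FC > 1 for upregulation, Log2FC < −1 for downregulation, adj p value < 0.05) and keeps predicted pairs whose miRNA and gene move in opposite directions. The function names, data layout, and identifiers such as `miR-A` and `GENE1` are placeholders invented for illustration, not data from this study.

```python
def regulation(log2fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Return 'up', 'down', or None for one differential-expression record."""
    if adj_p >= p_cutoff:
        return None  # not significant
    if log2fc > fc_cutoff:
        return "up"
    if log2fc < -fc_cutoff:
        return "down"
    return None  # significant but below the fold-change cutoff


def anticorrelated_pairs(predicted_pairs, mirna_de, gene_de):
    """Keep predicted (miRNA, gene) pairs with opposite regulation directions.

    `mirna_de` and `gene_de` map an identifier to (log2FC, adjusted p value).
    """
    kept = []
    for mirna, gene in predicted_pairs:
        if mirna not in mirna_de or gene not in gene_de:
            continue
        m_dir = regulation(*mirna_de[mirna])
        g_dir = regulation(*gene_de[gene])
        if m_dir and g_dir and m_dir != g_dir:
            kept.append((mirna, gene, m_dir, g_dir))
    return kept


# Placeholder inputs (hypothetical identifiers and values).
toy_mirna_de = {"miR-A": (2.1, 0.01), "miR-B": (-1.8, 0.02), "miR-C": (3.0, 0.20)}
toy_gene_de = {"GENE1": (-1.5, 0.001), "GENE2": (2.4, 0.03), "GENE3": (1.2, 0.04)}
toy_pairs = [("miR-A", "GENE1"), ("miR-A", "GENE3"), ("miR-B", "GENE2"), ("miR-C", "GENE1")]
for row in anticorrelated_pairs(toy_pairs, toy_mirna_de, toy_gene_de):
    print(row)
```

As the Discussion notes, opposite expression is supporting evidence rather than proof of a direct interaction, so a filter of this kind narrows candidates for follow-up rather than confirming them.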

4. Discussion

The increasing importance of miRNAs in regulating many biological processes in cells and in overall human physiology is evident from several studies. One of the major challenges in this field is the identification of functional interactions between miRNAs and target genes. Advances in sequencing technologies and the growing volume of reliable data on miRNAs and their target sites on genes have greatly facilitated studies to predict unknown and biologically relevant interactions. Bioinformatics solutions in this realm are diverse and inconsistent, in the sense that they incorporate unique characteristics in their algorithms and provide contradictory results [77]. Several machine learning models have utilized learning features for predicting miRNA–miTS interactions but could not achieve optimal performance owing to limitations in feature selection and a lack of systematic integration of multiple features. To address some of these limitations, we employed a comprehensive list of learning features and trained the model on a large experimental dataset to predict target sites with high accuracy. A special aspect of the current method is the incorporation of pairwise similarities between various features of miRNA and miTS to improve the performance of the prediction model. The strategy of integrating pairwise correlation between miRNAs and miTS is useful for supporting our hypothesis that similar miRNAs are more likely to target the same target site, and that similar miTS tend to be targeted by the same miRNA. In practice, miRNA–miTS interactions depend on several factors such as target site accessibility [78] and complex stability [79]. mintRULS employed several such features, including binding free energy, the abundance of SSRs, and target site accessibility, in the training process to develop an integrated objective scoring system. 
The working postulate of our method differs from those of existing methods, as evidenced by its superior prediction performance (with an AUC of 0.93) over miRDB, TargetScan, MBSTAR, RPmirDIP, and STarMir on the human dataset. We attribute the performance advantage of mintRULS to its discrete feature selection and its integrated scoring function. As shown in Figure 5, the kernels built from individual features of miRNAs and miTS performed fairly well, with a highest AUC of 0.82, but the integrated kernel achieved a higher AUC of 0.93, showing the successful integration of different sequence-derived features of miRNAs and mRNAs in a similarity-based fashion to train the model for predicting interaction pairs. The 100-fold randomization of the training dataset helps guard against overfitting. Further, validation of the predicted interacting pairs using different datasets, i.e., previous gene expression studies, literature-based findings, the IPA knowledgebase with experimental and predicted interactions, and the expression data of miRNAs and target genes in four types of GI cancer (Table 5 and Table S1), showed the potential of the current model to make biologically relevant predictions. Moreover, the capability of mintRULS to predict interactions between genes and miRNAs in WT as well as mutated cases is promising (Table 3). We also demonstrated that the mintRULS program can be used to predict miRNA–miTS interactions in mouse with a reasonable AUC of 0.86. The interacting miRNA–mRNA pairs that show opposite expression correlation between normal and septic mice support the predictions. A negative expression correlation between a miRNA and a target mRNA is not a clear indication of an interaction between them, but it does suggest a high possibility of one, which can be confirmed by further experiments. 
Overall, validation of our top predictions in human and mouse shows the robustness and superior ability of mintRULS to predict interactions between miRNAs and their target sites. Despite its high-performing and reliable predictions, mintRULS has notable limitations, mainly the lack of an experimentally validated negative dataset and the exclusion of miRNA or target abundance information. miRNA–gene interactions are embedded in complex networks such as protein–protein and gene–gene interactions, which, along with other reliable biological information, could be incorporated in the future to further improve prediction accuracy and to extend this method to predict miRNA–gene interactions in other species as well.

5. Conclusions

We developed a regularized least square (RLS)-based method, mintRULS, which uniquely utilizes multiple feature similarity-based metrics of miRNAs and target sites to predict their interactions in human and mouse. mintRULS achieved its highest AUCs of 0.93 and 0.86 in human and mouse, respectively. The multiple iteration and randomization strategy helped reduce overfitting while improving generalization and prediction performance. In comparison with other methods, including miRDB, TargetScan, MBSTAR, RPmirDIP, and STarMir, mintRULS demonstrated superior prediction ability. The model successfully utilized the existing knowledgebase as well as its unique design for pairwise incorporation of different features of miRNAs and mRNAs to predict interactions between them. Further, rigorous validation of the top predictions using multiple data sources showed the capability and reliability of the model. Our method also identified new miRNA–mRNA interacting pairs, such as hsa-let-7d-5p and TIMP3, hsa-let-7e-5p and ZBTB7A, and hsa-miR-106b-5p and ATAT1, which need to be validated by further experimental studies. We anticipate that the current method could be readily adapted to predict miRNA–gene interactions in other species, improving our knowledge of miRNA-regulated gene expression at the post-transcriptional level.

76 in total

1. Potent effect of target structure on microRNA function.

Authors: Dang Long; Rosalind Lee; Peter Williams; Chi Yu Chan; Victor Ambros; Ye Ding
Journal: Nat Struct Mol Biol Date: 2007-04-01 Impact factor: 15.369

2. Large-scale prediction of protein-protein interactions from structures.

Authors: Martial Hue; Michael Riffle; Jean-Philippe Vert; William S Noble
Journal: BMC Bioinformatics Date: 2010-03-18 Impact factor: 3.169

3. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding.

Authors: Aleksandra Helwak; Grzegorz Kudla; Tatiana Dudnakova; David Tollervey
Journal: Cell Date: 2013-04-25 Impact factor: 41.582

4. Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs.

Authors: David M Garcia; Daehyun Baek; Chanseok Shin; George W Bell; Andrew Grimson; David P Bartel
Journal: Nat Struct Mol Biol Date: 2011-09-11 Impact factor: 15.369

5. Validation of human microRNA target pathways enables evaluation of target prediction tools.

Authors: Fabian Kern; Lena Krammes; Karin Danz; Caroline Diener; Tim Kehl; Oliver Küchler; Tobias Fehlmann; Mustafa Kahraman; Stefanie Rheinheimer; Ernesto Aparicio-Puerta; Sylvia Wagner; Nicole Ludwig; Christina Backes; Hans-Peter Lenhof; Hagen von Briesen; Martin Hart; Andreas Keller; Eckart Meese
Journal: Nucleic Acids Res Date: 2021-01-11 Impact factor: 16.971

6. MicroRNA target prediction using thermodynamic and sequence curves.

Authors: Asish Ghoshal; Raghavendran Shankar; Saurabh Bagchi; Ananth Grama; Somali Chaterji
Journal: BMC Genomics Date: 2015-11-25 Impact factor: 3.969

7. Effects of genetic variations on microRNA: target interactions.

Authors: Chaochun Liu; William A Rennie; C Steven Carmack; Shaveta Kanoria; Jijun Cheng; Jun Lu; Ye Ding
Journal: Nucleic Acids Res Date: 2014-07-31 Impact factor: 16.971

8. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data.

Authors: Antonio Colaprico; Tiago C Silva; Catharina Olsen; Luciano Garofano; Claudia Cava; Davide Garolini; Thais S Sabedot; Tathiane M Malta; Stefano M Pagnotta; Isabella Castiglioni; Michele Ceccarelli; Gianluca Bontempi; Houtan Noushmehr
Journal: Nucleic Acids Res Date: 2015-12-23 Impact factor: 16.971

9. miRDB: an online database for prediction of functional microRNA targets.

Authors: Yuhao Chen; Xiaowei Wang
Journal: Nucleic Acids Res Date: 2020-01-08 Impact factor: 16.971

10. Therapeutic Significance of microRNA-Mediated Regulation of PARP-1 in SARS-CoV-2 Infection.

Authors: Sabyasachi Dash; Chandravanu Dash; Jui Pandhare
Journal: Noncoding RNA Date: 2021-09-22