A novel lncRNA-protein interaction prediction method based on deep forest with cascade forest structure.

Xiongfei Tian1, Ling Shen1, Zhenwu Wang1, Liqian Zhou2, Lihong Peng3.   

Abstract

Long noncoding RNAs (lncRNAs) regulate many biological processes by interacting with corresponding RNA-binding proteins. Identifying lncRNA–protein interactions (LPIs) is important for characterizing the biological functions and mechanisms of lncRNAs. Existing computational methods have been effectively applied to LPI prediction. However, the majority were evaluated on only one LPI dataset, resulting in prediction bias. More importantly, some models cannot discover possible LPIs for new lncRNAs (or proteins). In addition, prediction performance remains limited. To address these problems, in this study, we develop a Deep Forest-based LPI prediction method (LPIDF). First, five LPI datasets are obtained and the corresponding sequences of lncRNAs and proteins are collected. Second, features of lncRNAs and proteins are constructed based on four-nucleotide composition and BioSeq2vec with an encoder-decoder structure, respectively. Finally, a deep forest model with a cascade forest structure is developed to find new LPIs. We compare LPIDF with four classical association prediction models under three fivefold cross validations on lncRNAs, proteins, and LPIs. LPIDF obtains better average AUCs of 0.9012, 0.6937, and 0.9457 and the best average AUPRs of 0.9022, 0.6860, and 0.9382 for the three CVs, respectively, significantly outperforming the other methods. The results show that the lncRNA FTX may interact with the protein P35637; this interaction needs further validation.
© 2021. The Author(s).

Year:  2021        PMID: 34556758      PMCID: PMC8460650          DOI: 10.1038/s41598-021-98277-1

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Noncoding RNAs regulate the majority of biological processes associated with development, differentiation, and metabolism in organisms[1]. In contrast to small noncoding RNAs (e.g., miRNAs), which are highly conserved and regulate transcriptional and posttranscriptional gene silencing[2,3], long noncoding RNAs (lncRNAs), a type of transcribed RNA molecule, are poorly conserved and control gene expression through various mechanisms[4-6]. lncRNAs are closely linked to posttranscriptional gene regulation through biological processes including protein synthesis, RNA maturation and transport, and transcriptional gene silencing[7,8]. Although a few lncRNAs have been well studied, the biological functions of most lncRNAs remain enigmatic[9]. Recent studies demonstrate that most lncRNAs regulate various biological activities through specific associations with chromatin, for example, by interacting with corresponding RNA-binding proteins[10-12]. Therefore, identifying potential lncRNA–protein interactions (LPIs) is vital to understanding lncRNAs' biological functions and mechanisms. Many experimental methods have been designed to find new LPIs[13,14]. However, wet experiments for finding possible LPIs are costly and time-consuming, so computational methods have been developed as an efficient complement for LPI prediction. These methods fall into two main categories: network-based methods and machine learning-based methods[15,16]. Network-based LPI prediction methods, for example, a random walk with restart-based model[17], a linear neighborhood propagation algorithm[18], bipartite network projection-based recommendation methods[19-21], and the HeteSim algorithm[22], first computed lncRNA and protein similarities from related biological data, then integrated the similarity matrices into a heterogeneous lncRNA–protein network, and finally designed network propagation algorithms to score unknown lncRNA–protein pairs.
Network-based LPI prediction methods successfully found some LPIs; however, they cannot predict interactions for an orphan lncRNA or protein. Machine learning-based LPI identification methods first extract features of lncRNAs and proteins and then design a machine learning model to compute interaction probabilities for lncRNA–protein pairs. Classical machine learning-based LPI prediction models include matrix factorization-based methods and ensemble learning-based methods. Matrix factorization-based methods cast LPI prediction as a recommendation task and used diverse matrix factorization models to discover unobserved LPIs, for example, gradient boosted regression trees[23], graph regularized nonnegative matrix factorization[24], and neighborhood regularized logistic matrix factorization[25,26]. Ensemble learning-based methods used ensemble techniques to construct models for identifying new LPIs[27,28], for example, a random forest-based ensemble framework[29], a sequence feature projection-based ensemble algorithm[30], a broad learning system-based stacked ensemble classifier[31], and a graph attention-based deep learning model[32]. Although computational methods have effectively identified potential linkages between lncRNAs and proteins, most of the above models have the following limitations. First, their performance was evaluated on only one dataset, producing prediction bias. Second, the vast majority cannot find possible interacting proteins (or lncRNAs) for lncRNAs (or proteins) without any known interaction information. Third, the performance needs further improvement. To address these three problems, in this study, known LPI data are first integrated and five different LPI datasets are collected. Second, the features of lncRNAs and proteins are extracted based on four-nucleotide composition and BioSeq2vec, respectively.
Finally, a deep forest model with a cascade forest structure (LPIDF) is designed to find LPI candidates. We compare LPIDF with four classical LPI prediction models under three different cross validations. The results show that LPIDF obtains better average AUCs and the best average AUPRs on the five datasets under the three cross validations. More importantly, case studies demonstrate that most of the predicted lncRNA–protein pairs with higher interaction probabilities are true LPIs, and the remainder need further experimental validation.

Results

We perform a series of experiments to investigate the prediction performance of our proposed LPIDF method.

Evaluation metrics

In this study, precision, recall, accuracy, F1-score, AUC, and AUPR are used to evaluate the performance of LPIDF. Precision, recall, accuracy, and F1-score are defined as follows:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1-score = 2 × Precision × Recall / (Precision + Recall)

where TP, FP, TN, and FN denote the numbers of correctly predicted LPIs, non-LPIs wrongly predicted as LPIs, correctly predicted non-LPIs, and LPIs wrongly predicted as non-LPIs, respectively. AUC and AUPR denote the average areas under the ROC curve and the precision–recall curve, respectively. The experiments are repeated 20 times, and the average performance over the 20 rounds is reported as the final performance.
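As a concrete illustration, the four threshold metrics above can be computed directly from the confusion counts; this is a minimal sketch (the function name and the example counts are ours, not the paper's):

```python
def lpi_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy, and F1-score from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Hypothetical example: 90 correctly predicted LPIs, 10 false alarms,
# 85 correctly predicted non-LPIs, 15 missed LPIs.
p, r, a, f = lpi_metrics(tp=90, fp=10, tn=85, fn=15)
```

Note that the F1-score simplifies to 2TP / (2TP + FP + FN), which makes it insensitive to the number of true negatives.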

Experimental settings

In this study, we adopt three experimental settings. Fivefold cross validation 1 (CV1): cross validation on lncRNAs, that is, random rows (lncRNAs) in the LPI matrix are masked for testing. Fivefold cross validation 2 (CV2): cross validation on proteins, that is, random columns (proteins) in the LPI matrix are masked for testing. Fivefold cross validation 3 (CV3): cross validation on lncRNA–protein pairs, that is, random lncRNA–protein pairs in the LPI matrix are masked for testing. Under CV1, in each round, 80% of the lncRNAs in an LPI network are selected as the training set and the remainder as the test set. Under CV2, in each round, 80% of the proteins are selected for training and the remainder for testing. Under CV3, in each round, 80% of the lncRNA–protein pairs are used for training and the remainder for testing. The three cross validations correspond to LPI identification for (1) new lncRNAs (lncRNAs with no known interaction information), (2) new proteins, and (3) lncRNA–protein pairs, respectively.
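The three masking schemes can be sketched as index splits over the LPI matrix; a minimal illustration assuming a NumPy 0/1 interaction matrix (the function and variable names are ours, and the paper's exact sampling may differ):

```python
import numpy as np

def cv_folds(interaction_matrix, mode, n_folds=5, seed=0):
    """Yield the test indices of each fold under the three CV settings:
    'cv1' masks random rows (lncRNAs), 'cv2' masks random columns
    (proteins), and 'cv3' masks random lncRNA-protein pairs."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = interaction_matrix.shape
    if mode == "cv1":
        idx = rng.permutation(n_rows)
    elif mode == "cv2":
        idx = rng.permutation(n_cols)
    else:  # "cv3": flat indices over all lncRNA-protein pairs
        idx = rng.permutation(n_rows * n_cols)
    # Each fold holds out ~20% for testing; the remaining 80% trains.
    yield from np.array_split(idx, n_folds)

Y = np.zeros((935, 59))  # e.g. dataset 1: 935 lncRNAs x 59 proteins
cv1_folds = list(cv_folds(Y, "cv1"))
```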

Comparison with four state-of-the-art methods

We compare LPIDF with four state-of-the-art association identification methods, XGBoost[33,34], Categorical Boosting (CatBoost)[35], random forest[36,37], and DRPLPI[38], to evaluate its prediction ability and robustness. These classical machine learning models have been widely applied in various areas. XGBoost[33,34] is a scalable, end-to-end tree boosting model. CatBoost[35] is a gradient boosting technique that effectively combines ordered boosting with categorical feature processing. Random forest[36,37] is composed of multiple decision trees, each trained independently on a random subset. DRPLPI[38] exploits a multi-head self-attention model to extract high-quality LPI features based on a long short-term memory encoder-decoder mechanism. In the experiments, we randomly select the same number of negative samples as positive LPIs from the unknown lncRNA–protein pairs to reduce the overfitting caused by class imbalance. In random forest, the number of trees is set to 70 and the minimum number of samples required to split a node to 5. In CatBoost, the maximum number of trees is set to 150, the maximum depth to 15, and the learning rate to 0.5. Other parameters are set to the values reported in the corresponding publications. XGBoost is run based on the scikit-learn package[39]. Table 1 shows the precision, recall, accuracy, F1-score, AUC, and AUPR values obtained by LPIDF and the four other methods under CV1. As shown in Table 1, LPIDF achieves the highest average precision, accuracy, F1-score, and AUPR over all datasets, remarkably outperforming the four competing LPI prediction methods. Although the average recall and AUC of LPIDF are slightly lower than those of random forest and DRPLPI, LPIDF obtains the best average AUPR.
The average AUPR obtained by LPIDF is 0.9022, which is 0.96%, 2.10%, 0.02%, and 0.63% higher than XGBoost, CatBoost, random forest, and DRPLPI, respectively. Compared with AUC, AUPR is a more informative metric under class imbalance. Therefore, LPIDF can effectively find potential proteins interacting with a new lncRNA.
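For reference, the random forest baseline with the hyperparameters stated above can be configured as follows; this is a hedged sketch using scikit-learn on stand-in features (the synthetic data and variable names are ours, and the CatBoost settings are shown only as a comment to avoid the extra dependency):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Random forest baseline with the reported settings: 70 trees,
# minimum of 5 samples required to split a node.
rf = RandomForestClassifier(n_estimators=70, min_samples_split=5, random_state=0)

# The reported CatBoost settings would be, roughly:
#   CatBoostClassifier(iterations=150, depth=15, learning_rate=0.5)

# Stand-in for the feature vectors of balanced positive/negative pairs.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf.fit(X_tr, y_tr)
probas = rf.predict_proba(X_te)[:, 1]  # interaction probabilities
```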
Table 1

The performance of five LPI prediction methods on CV1.

Metric | Dataset | XGBoost | CatBoost | Random forest | DRPLPI | LPIDF
Precision | Dataset 1 | 0.8585 ± 0.0199 | 0.8424 ± 0.0120 | 0.8357 ± 0.0067 | 0.8361 ± 0.0086 | 0.8621 ± 0.0208
Precision | Dataset 2 | 0.8608 ± 0.0120 | 0.8677 ± 0.0171 | 0.8529 ± 0.0157 | 0.8518 ± 0.0167 | 0.8716 ± 0.0086
Precision | Dataset 3 | 0.7126 ± 0.0210 | 0.7158 ± 0.0225 | 0.7236 ± 0.0170 | 0.7174 ± 0.0195 | 0.7285 ± 0.0102
Precision | Dataset 4 | 0.8879 ± 0.0495 | 0.9066 ± 0.0385 | 0.9248 ± 0.0518 | 0.9286 ± 0.0335 | 0.9374 ± 0.0353
Precision | Dataset 5 | 0.8826 ± 0.0124 | 0.8662 ± 0.0125 | 0.8882 ± 0.0027 | 0.8732 ± 0.0133 | 0.9000 ± 0.0073
Precision | Ave | 0.8405 | 0.8397 | 0.8450 | 0.8414 | 0.8599
Recall | Dataset 1 | 0.9179 ± 0.0167 | 0.9245 ± 0.0041 | 0.9593 ± 0.0130 | 0.9505 ± 0.0098 | 0.9170 ± 0.0124
Recall | Dataset 2 | 0.9289 ± 0.0281 | 0.9298 ± 0.0159 | 0.9740 ± 0.0123 | 0.9533 ± 0.0248 | 0.9183 ± 0.0174
Recall | Dataset 3 | 0.6979 ± 0.0191 | 0.7398 ± 0.0205 | 0.7278 ± 0.0083 | 0.7166 ± 0.0267 | 0.7199 ± 0.0249
Recall | Dataset 4 | 0.6891 ± 0.0571 | 0.6879 ± 0.0577 | 0.6748 ± 0.0408 | 0.6888 ± 0.0623 | 0.6722 ± 0.0487
Recall | Dataset 5 | 0.8531 ± 0.0169 | 0.8502 ± 0.0110 | 0.8484 ± 0.0091 | 0.8531 ± 0.0124 | 0.8476 ± 0.0170
Recall | Ave | 0.8174 | 0.8264 | 0.8369 | 0.8325 | 0.8150
Accuracy | Dataset 1 | 0.8890 ± 0.0127 | 0.8756 ± 0.0067 | 0.8852 ± 0.0102 | 0.8821 ± 0.0086 | 0.8850 ± 0.0090
Accuracy | Dataset 2 | 0.8481 ± 0.0109 | 0.8938 ± 0.0083 | 0.9029 ± 0.0085 | 0.8934 ± 0.0140 | 0.8916 ± 0.0083
Accuracy | Dataset 3 | 0.7079 ± 0.0095 | 0.7225 ± 0.0028 | 0.7226 ± 0.0092 | 0.7169 ± 0.0104 | 0.7254 ± 0.0146
Accuracy | Dataset 4 | 0.8033 ± 0.0383 | 0.8089 ± 0.0537 | 0.8049 ± 0.0253 | 0.8183 ± 0.0530 | 0.8132 ± 0.0284
Accuracy | Dataset 5 | 0.8697 ± 0.0098 | 0.8594 ± 0.0057 | 0.8708 ± 0.0041 | 0.8646 ± 0.0068 | 0.8767 ± 0.0072
Accuracy | Ave | 0.8236 | 0.8320 | 0.8373 | 0.8351 | 0.8384
F1-score | Dataset 1 | 0.8870 ± 0.0111 | 0.8814 ± 0.0051 | 0.8932 ± 0.0080 | 0.8876 ± 0.0080 | 0.8885 ± 0.0091
F1-score | Dataset 2 | 0.8932 ± 0.0118 | 0.8974 ± 0.0082 | 0.9093 ± 0.0084 | 0.8993 ± 0.0125 | 0.8943 ± 0.0088
F1-score | Dataset 3 | 0.7047 ± 0.0094 | 0.7270 ± 0.0021 | 0.7256 ± 0.0110 | 0.7165 ± 0.0138 | 0.7238 ± 0.0085
F1-score | Dataset 4 | 0.7730 ± 0.0290 | 0.7798 ± 0.0343 | 0.7702 ± 0.0193 | 0.7884 ± 0.0376 | 0.7807 ± 0.0186
F1-score | Dataset 5 | 0.8674 ± 0.0071 | 0.8580 ± 0.0036 | 0.8608 ± 0.0052 | 0.8629 ± 0.0036 | 0.8729 ± 0.0064
F1-score | Ave | 0.8251 | 0.8287 | 0.8318 | 0.8309 | 0.8320
AUC | Dataset 1 | 0.9387 ± 0.0095 | 0.9294 ± 0.0057 | 0.9377 ± 0.0065 | 0.9333 ± 0.0056 | 0.9426 ± 0.0088
AUC | Dataset 2 | 0.9403 ± 0.0075 | 0.9458 ± 0.0070 | 0.9476 ± 0.0072 | 0.9408 ± 0.0064 | 0.9506 ± 0.0063
AUC | Dataset 3 | 0.7975 ± 0.0088 | 0.8169 ± 0.0075 | 0.8045 ± 0.0153 | 0.8096 ± 0.0088 | 0.8108 ± 0.0131
AUC | Dataset 4 | 0.8677 ± 0.0271 | 0.8110 ± 0.0291 | 0.8776 ± 0.0193 | 0.8857 ± 0.0251 | 0.8480 ± 0.0340
AUC | Dataset 5 | 0.9518 ± 0.0060 | 0.8597 ± 0.0054 | 0.9397 ± 0.0090 | 0.9472 ± 0.0041 | 0.9542 ± 0.0045
AUC | Ave | 0.8992 | 0.8726 | 0.9014 | 0.9033 | 0.9012
AUPR | Dataset 1 | 0.9196 ± 0.0079 | 0.9061 ± 0.0052 | 0.9212 ± 0.0066 | 0.9106 ± 0.0100 | 0.9250 ± 0.0144
AUPR | Dataset 2 | 0.9214 ± 0.0053 | 0.9280 ± 0.0087 | 0.9336 ± 0.0081 | 0.9222 ± 0.0052 | 0.9375 ± 0.0134
AUPR | Dataset 3 | 0.7663 ± 0.0133 | 0.8005 ± 0.0099 | 0.7949 ± 0.0162 | 0.7839 ± 0.0154 | 0.7964 ± 0.0029
AUPR | Dataset 4 | 0.8995 ± 0.0222 | 0.8759 ± 0.0260 | 0.9063 ± 0.0327 | 0.9116 ± 0.0179 | 0.8937 ± 0.0131
AUPR | Dataset 5 | 0.9564 ± 0.0033 | 0.8957 ± 0.0056 | 0.9539 ± 0.0030 | 0.9510 ± 0.0031 | 0.9584 ± 0.0040
AUPR | Ave | 0.8926 | 0.8812 | 0.9020 | 0.8959 | 0.9022

The best performance is represented in boldface in each row in each table.

Table 2 gives the comparison results under CV2. LPIDF achieves the best average precision, recall, accuracy, F1-score, AUC, and AUPR over all datasets. In particular, LPIDF obtains the best average AUC of 0.6937, which is 4.80%, 10.81%, 1.17%, and 0.91% higher than XGBoost, CatBoost, random forest, and DRPLPI, respectively. More importantly, LPIDF achieves the highest average AUPR of 0.6860, which is 2.17% and 2.65% higher than the second-best and third-best methods, respectively. In summary, under CV2, LPIDF remarkably improves LPI prediction performance compared with the four other methods and shows a clear advantage in identifying possible lncRNAs for a new protein.
Table 2

The performance of five LPI prediction methods on CV2.

Metric | Dataset | XGBoost | CatBoost | Random forest | DRPLPI | LPIDF
Precision | Dataset 1 | 0.5630 ± 0.2187 | 0.2339 ± 0.1389 | 0.3181 ± 0.2432 | 0.3426 ± 0.2355 | 0.5673 ± 0.2705
Precision | Dataset 2 | 0.5214 ± 0.1701 | 0.4117 ± 0.2269 | 0.6310 ± 0.1672 | 0.6634 ± 0.2152 | 0.6374 ± 0.1278
Precision | Dataset 3 | 0.6444 ± 0.0759 | 0.5885 ± 0.1198 | 0.6873 ± 0.2617 | 0.7173 ± 0.0554 | 0.6248 ± 0.1310
Precision | Dataset 4 | 0.4502 ± 0.1057 | 0.5185 ± 0.1633 | 0.5597 ± 0.2284 | 0.4951 ± 0.1616 | 0.5100 ± 0.0385
Precision | Dataset 5 | 0.6798 ± 0.1338 | 0.7454 ± 0.1015 | 0.7516 ± 0.0375 | 0.7562 ± 0.1097 | 0.6976 ± 0.0768
Precision | Ave | 0.5718 | 0.4996 | 0.5895 | 0.5949 | 0.6074
Recall | Dataset 1 | 0.1205 ± 0.0735 | 0.0898 ± 0.0569 | 0.0056 ± 0.0086 | 0.0056 ± 0.0041 | 0.0996 ± 0.1279
Recall | Dataset 2 | 0.0458 ± 0.0278 | 0.1162 ± 0.1111 | 0.0136 ± 0.0087 | 0.0159 ± 0.0094 | 0.1418 ± 0.1116
Recall | Dataset 3 | 0.3651 ± 0.1738 | 0.5795 ± 0.1973 | 0.2578 ± 0.1301 | 0.3695 ± 0.1541 | 0.6318 ± 0.2191
Recall | Dataset 4 | 0.9087 ± 0.0993 | 0.7777 ± 0.1343 | 0.9899 ± 0.0123 | 0.8619 ± 0.1062 | 0.9284 ± 0.0398
Recall | Dataset 5 | 0.9654 ± 0.0244 | 0.9096 ± 0.0543 | 0.9545 ± 0.0287 | 0.9219 ± 0.0516 | 0.9762 ± 0.0189
Recall | Ave | 0.4811 | 0.4946 | 0.4443 | 0.4350 | 0.5556
Accuracy | Dataset 1 | 0.5499 ± 0.1385 | 0.4727 ± 0.1757 | 0.5398 ± 0.1417 | 0.5383 ± 0.1533 | 0.5631 ± 0.1793
Accuracy | Dataset 2 | 0.5386 ± 0.1497 | 0.5237 ± 0.1041 | 0.5125 ± 0.0845 | 0.5422 ± 0.1518 | 0.5596 ± 0.1162
Accuracy | Dataset 3 | 0.5822 ± 0.0747 | 0.5972 ± 0.1037 | 0.5901 ± 0.1071 | 0.6159 ± 0.0809 | 0.6187 ± 0.0812
Accuracy | Dataset 4 | 0.4516 ± 0.1335 | 0.5286 ± 0.1404 | 0.5571 ± 0.2191 | 0.4972 ± 0.1564 | 0.5147 ± 0.0313
Accuracy | Dataset 5 | 0.7353 ± 0.1020 | 0.7909 ± 0.0546 | 0.8197 ± 0.0311 | 0.8029 ± 0.0608 | 0.7736 ± 0.0546
Accuracy | Ave | 0.5715 | 0.5826 | 0.6038 | 0.5993 | 0.6059
F1-score | Dataset 1 | 0.1803 ± 0.1002 | 0.1181 ± 0.0905 | 0.0109 ± 0.0166 | 0.0107 ± 0.0076 | 0.1461 ± 0.1693
F1-score | Dataset 2 | 0.0819 ± 0.0468 | 0.1680 ± 0.1460 | 0.0261 ± 0.0160 | 0.0308 ± 0.0179 | 0.2146 ± 0.1560
F1-score | Dataset 3 | 0.4425 ± 0.1306 | 0.5465 ± 0.1386 | 0.3349 ± 0.1578 | 0.4708 ± 0.1367 | 0.5954 ± 0.1125
F1-score | Dataset 4 | 0.5954 ± 0.0962 | 0.5970 ± 0.0707 | 0.6901 ± 0.1579 | 0.6085 ± 0.0890 | 0.6565 ± 0.0276
F1-score | Dataset 5 | 0.7889 ± 0.0941 | 0.8146 ± 0.0656 | 0.8407 ± 0.0321 | 0.8253 ± 0.0691 | 0.8110 ± 0.0518
F1-score | Ave | 0.4178 | 0.4488 | 0.3805 | 0.3892 | 0.4847
AUC | Dataset 1 | 0.6116 ± 0.1384 | 0.4431 ± 0.0607 | 0.6034 ± 0.1648 | 0.5407 ± 0.1431 | 0.6549 ± 0.1973
AUC | Dataset 2 | 0.5819 ± 0.0788 | 0.5090 ± 0.0427 | 0.6079 ± 0.0490 | 0.5938 ± 0.1076 | 0.5956 ± 0.1482
AUC | Dataset 3 | 0.6239 ± 0.0781 | 0.6236 ± 0.0846 | 0.6402 ± 0.0683 | 0.6698 ± 0.0811 | 0.7224 ± 0.1072
AUC | Dataset 4 | 0.5515 ± 0.1363 | 0.5634 ± 0.1026 | 0.6414 ± 0.1523 | 0.7190 ± 0.0665 | 0.5794 ± 0.1465
AUC | Dataset 5 | 0.8554 ± 0.0936 | 0.7889 ± 0.0411 | 0.9169 ± 0.0401 | 0.8998 ± 0.0563 | 0.9161 ± 0.0397
AUC | Ave | 0.6457 | 0.5856 | 0.6820 | 0.6846 | 0.6937
AUPR | Dataset 1 | 0.5460 ± 0.1510 | 0.3720 ± 0.1218 | 0.5629 ± 0.1278 | 0.4744 ± 0.1726 | 0.5937 ± 0.1696
AUPR | Dataset 2 | 0.5099 ± 0.1366 | 0.4746 ± 0.1614 | 0.5409 ± 0.0952 | 0.5240 ± 0.1519 | 0.5611 ± 0.1563
AUPR | Dataset 3 | 0.6241 ± 0.0709 | 0.6925 ± 0.0945 | 0.6061 ± 0.2265 | 0.6801 ± 0.0722 | 0.7041 ± 0.1431
AUPR | Dataset 4 | 0.5640 ± 0.1383 | 0.6999 ± 0.0586 | 0.6900 ± 0.2169 | 0.7518 ± 0.0611 | 0.6750 ± 0.0770
AUPR | Dataset 5 | 0.8267 ± 0.1761 | 0.8522 ± 0.0559 | 0.8978 ± 0.0564 | 0.8912 ± 0.0672 | 0.8962 ± 0.0614
AUPR | Ave | 0.6141 | 0.6182 | 0.6595 | 0.6643 | 0.6860

The best performance is represented in boldface in each row in each table.

The prediction results under CV3 are shown in Table 3. LPIDF outperforms the other LPI prediction methods over all datasets in terms of all six metrics. For example, LPIDF achieves the best average AUC of 0.9457, which is 1.72%, 6.39%, 0.87%, and 0.97% higher than XGBoost, CatBoost, random forest, and DRPLPI, respectively. In addition, LPIDF obtains the best average AUPR of 0.9382, which is 0.88% and 1.20% higher than the second-best and third-best methods, respectively. These results show that LPIDF can effectively predict potential LPIs.
Table 3

The performance of five LPI prediction methods on CV3.

Metric | Dataset | XGBoost | CatBoost | Random forest | DRPLPI | LPIDF
Precision | Dataset 1 | 0.8508 ± 0.0115 | 0.8457 ± 0.0142 | 0.8466 ± 0.0056 | 0.8401 ± 0.0131 | 0.8589 ± 0.0115
Precision | Dataset 2 | 0.8682 ± 0.0102 | 0.8604 ± 0.0121 | 0.8549 ± 0.0065 | 0.8574 ± 0.0112 | 0.8645 ± 0.0119
Precision | Dataset 3 | 0.7455 ± 0.0213 | 0.7401 ± 0.0156 | 0.7438 ± 0.0171 | 0.7503 ± 0.0198 | 0.7549 ± 0.0120
Precision | Dataset 4 | 0.9117 ± 0.0051 | 0.9340 ± 0.0134 | 0.9261 ± 0.0171 | 0.9381 ± 0.0142 | 0.9390 ± 0.0227
Precision | Dataset 5 | 0.8899 ± 0.0057 | 0.9250 ± 0.0047 | 0.9223 ± 0.0017 | 0.9224 ± 0.0040 | 0.9297 ± 0.0039
Precision | Ave | 0.8532 | 0.8610 | 0.8587 | 0.8617 | 0.8694
Recall | Dataset 1 | 0.9293 ± 0.0079 | 0.9676 ± 0.0065 | 0.9707 ± 0.0016 | 0.9630 ± 0.0092 | 0.9634 ± 0.0104
Recall | Dataset 2 | 0.9486 ± 0.0083 | 0.9666 ± 0.0088 | 0.9745 ± 0.0041 | 0.9722 ± 0.0048 | 0.9752 ± 0.0079
Recall | Dataset 3 | 0.7863 ± 0.0157 | 0.8031 ± 0.0240 | 0.8061 ± 0.0134 | 0.7976 ± 0.0128 | 0.8257 ± 0.0149
Recall | Dataset 4 | 0.8394 ± 0.0305 | 0.8734 ± 0.0479 | 0.8803 ± 0.0235 | 0.8711 ± 0.0479 | 0.8908 ± 0.0225
Recall | Dataset 5 | 0.9048 ± 0.0051 | 0.9304 ± 0.0064 | 0.9282 ± 0.0034 | 0.9345 ± 0.0047 | 0.9307 ± 0.0047
Recall | Ave | 0.8817 | 0.9082 | 0.9120 | 0.9077 | 0.9172
Accuracy | Dataset 1 | 0.8832 ± 0.0063 | 0.8957 ± 0.0083 | 0.8974 ± 0.0051 | 0.8899 ± 0.0080 | 0.9025 ± 0.0065
Accuracy | Dataset 2 | 0.9022 ± 0.0072 | 0.9049 ± 0.0090 | 0.9046 ± 0.0039 | 0.9051 ± 0.0064 | 0.9111 ± 0.0075
Accuracy | Dataset 3 | 0.7591 ± 0.0151 | 0.7608 ± 0.0129 | 0.7640 ± 0.0114 | 0.7660 ± 0.0136 | 0.7786 ± 0.0112
Accuracy | Dataset 4 | 0.8792 ± 0.0143 | 0.9056 ± 0.0234 | 0.9056 ± 0.0176 | 0.9066 ± 0.0214 | 0.9161 ± 0.0066
Accuracy | Dataset 5 | 0.8964 ± 0.0027 | 0.9279 ± 0.0049 | 0.9250 ± 0.0023 | 0.9283 ± 0.0037 | 0.9302 ± 0.0027
Accuracy | Ave | 0.8640 | 0.8790 | 0.8793 | 0.8792 | 0.8877
F1-score | Dataset 1 | 0.8883 ± 0.0070 | 0.9025 ± 0.0091 | 0.9044 ± 0.0038 | 0.8973 ± 0.0085 | 0.9080 ± 0.0062
F1-score | Dataset 2 | 0.9066 ± 0.0062 | 0.9103 ± 0.0094 | 0.9108 ± 0.0042 | 0.9111 ± 0.0054 | 0.9164 ± 0.0074
F1-score | Dataset 3 | 0.7653 ± 0.0173 | 0.7702 ± 0.0170 | 0.7735 ± 0.0104 | 0.7731 ± 0.0212 | 0.7886 ± 0.0096
F1-score | Dataset 4 | 0.8738 ± 0.0173 | 0.9019 ± 0.0260 | 0.9025 ± 0.0197 | 0.9025 ± 0.0248 | 0.9138 ± 0.0071
F1-score | Dataset 5 | 0.8973 ± 0.0047 | 0.9276 ± 0.0049 | 0.9252 ± 0.0018 | 0.9284 ± 0.0033 | 0.9302 ± 0.0030
F1-score | Ave | 0.8663 | 0.8825 | 0.8833 | 0.8825 | 0.8914
AUC | Dataset 1 | 0.9376 ± 0.0054 | 0.8955 ± 0.0046 | 0.9484 ± 0.0031 | 0.9413 ± 0.0038 | 0.9521 ± 0.0053
AUC | Dataset 2 | 0.9507 ± 0.0040 | 0.9049 ± 0.0083 | 0.9537 ± 0.0051 | 0.9510 ± 0.0064 | 0.9545 ± 0.0064
AUC | Dataset 3 | 0.8452 ± 0.0133 | 0.7755 ± 0.0099 | 0.8531 ± 0.0096 | 0.8517 ± 0.0106 | 0.8739 ± 0.0111
AUC | Dataset 4 | 0.9407 ± 0.0098 | 0.9054 ± 0.0228 | 0.9483 ± 0.0164 | 0.9526 ± 0.0169 | 0.9634 ± 0.0091
AUC | Dataset 5 | 0.9681 ± 0.0008 | 0.9279 ± 0.0050 | 0.9815 ± 0.0006 | 0.9834 ± 0.0005 | 0.9848 ± 0.0008
AUC | Ave | 0.9285 | 0.8818 | 0.9370 | 0.9360 | 0.9457
AUPR | Dataset 1 | 0.9155 ± 0.0074 | 0.9259 ± 0.0070 | 0.9279 ± 0.0101 | 0.9212 ± 0.0085 | 0.9381 ± 0.0095
AUPR | Dataset 2 | 0.9314 ± 0.0086 | 0.9218 ± 0.0075 | 0.9403 ± 0.0088 | 0.9341 ± 0.0113 | 0.9384 ± 0.0101
AUPR | Dataset 3 | 0.8213 ± 0.0204 | 0.8235 ± 0.0153 | 0.8350 ± 0.0110 | 0.8271 ± 0.0186 | 0.8562 ± 0.0135
AUPR | Dataset 4 | 0.9532 ± 0.0083 | 0.9354 ± 0.0127 | 0.9615 ± 0.0129 | 0.9648 ± 0.0122 | 0.9726 ± 0.0043
AUPR | Dataset 5 | 0.9709 ± 0.0012 | 0.9450 ± 0.0037 | 0.9824 ± 0.0005 | 0.9839 ± 0.0005 | 0.9857 ± 0.0008
AUPR | Ave | 0.9185 | 0.9103 | 0.9294 | 0.9262 | 0.9382

The best performance is represented in boldface in each row in each table.


Case study

After confirming the performance of our proposed LPIDF method, we further identify possible LPIs, in particular interaction information for new lncRNAs and proteins.

Finding possible proteins interacting with new lncRNAs

In this section, we aim to find potential proteins interacting with new lncRNAs. All association information of Small Nucleolar RNA Host Gene 3 (SNHG3) and Growth Arrest-Specific transcript 5 (GAS5) is masked, and the two lncRNAs are treated as new lncRNAs. LPIDF is then applied to identify possible proteins interacting with them. SNHG3 is an RNA gene affiliated with the lncRNA class. It is closely associated with various diseases, for example, hepatocellular carcinoma[40], non-small-cell lung cancer[41], clear cell renal cell carcinoma[42], gastric cancer[43], hypoxic-ischemic brain damage[44], papillary thyroid carcinoma[45], ovarian cancer[46,47], bladder cancer[48], and acute myeloid leukemia[49]. Table 4 shows the predicted top 5 proteins with the highest interaction probabilities with SNHG3 on the three human datasets.
Table 4

The predicted top 5 proteins interacting with SNHG3.

Dataset | Protein | Confirmed | LPIDF | XGBoost | Random forest | CatBoost | DRPLPI
Dataset 1 | Q15717 | Yes | 1 | 1 | 2 | 1 | 2
Dataset 1 | P35637 | Yes | 2 | 5 | 4 | 7 | 3
Dataset 1 | O00425 | Yes | 3 | 2 | 6 | 5 | 1
Dataset 1 | Q9UKV8 | Yes | 4 | 6 | 1 | 8 | 6
Dataset 1 | Q9NZI8 | Yes | 5 | 3 | 8 | 3 | 5
Dataset 2 | Q15717 | Yes | 1 | 1 | 2 | 1 | 1
Dataset 2 | Q9NZI8 | Yes | 2 | 3 | 7 | 4 | 8
Dataset 2 | Q9Y6M1 | Yes | 3 | 2 | 5 | 3 | 2
Dataset 2 | P35637 | Yes | 4 | 4 | 1 | 5 | 4
Dataset 2 | Q96PU8 | Yes | 5 | 18 | 16 | 15 | 17
Dataset 3 | Q9NUL5 | No | 1 | 1 | 1 | 1 | 1
Dataset 3 | Q9Y6M1 | Yes | 2 | 3 | 2 | 20 | 4
Dataset 3 | Q9NZI8 | Yes | 3 | 4 | 5 | 7 | 2
Dataset 3 | O00425 | No | 4 | 2 | 3 | 3 | 3
Dataset 3 | Q13148 | No | 5 | 7 | 8 | 5 | 4
The results in Table 4 show that the SNHG3–protein pairs predicted by LPIDF also rank highly under the other four methods. We predict that O00425 may interact with SNHG3 (ranked 4) in dataset 3; this interaction has been validated in dataset 1. In addition, we observe that Q9NUL5 and Q13148 may interact with SNHG3. Among all 27 candidate proteins, the interaction between Q9NUL5 and SNHG3 is ranked 1 by all five LPI prediction methods. The association between Q13148 and SNHG3 is ranked 5, 7, 8, 5, and 4 by LPIDF, XGBoost, random forest, CatBoost, and DRPLPI, respectively. These facts demonstrate the strong prediction performance of LPIDF. GAS5 can prevent glucocorticoid receptors from being activated and thus control the transcriptional activity of their target genes. It is inferred to be a potential tumor suppressor and is closely associated with coronary artery disease[50,52,53], cirrhotic livers[51], rheumatoid arthritis[54], Parkinson's disease[55], and primary glioblastoma[56]. Table 5 lists the predicted top 5 proteins interacting with GAS5 with the highest association scores on the three human datasets. In dataset 3, although the interactions between GAS5 and Q9NZI8 and Q9Y6M1 are unknown, the two LPIs are ranked 5 and 4 by LPIDF, respectively. More importantly, in datasets 1 and 2, Q9NZI8 and Q9Y6M1 show higher interaction probabilities with GAS5, and the two LPIs have been reported. In addition, O00425 is inferred to interact with GAS5 with a rank of 2 in dataset 3 and has been validated in dataset 1. These facts again suggest that LPIDF can effectively find possible proteins associated with a new lncRNA.
Table 5

The predicted top 5 proteins interacting with GAS5.

Dataset | Protein | Confirmed | LPIDF | XGBoost | Random forest | CatBoost | DRPLPI
Dataset 1 | O00425 | Yes | 1 | 2 | 1 | 6 | 1
Dataset 1 | Q15717 | No | 2 | 1 | 2 | 1 | 2
Dataset 1 | P35637 | No | 3 | 4 | 3 | 5 | 3
Dataset 1 | Q9NZI8 | Yes | 4 | 3 | 4 | 2 | 5
Dataset 1 | Q9Y6M1 | Yes | 5 | 5 | 5 | 3 | 4
Dataset 2 | Q15717 | No | 1 | 1 | 3 | 8 | 1
Dataset 2 | Q9NZI8 | Yes | 2 | 3 | 6 | 2 | 5
Dataset 2 | Q9Y6M1 | Yes | 3 | 2 | 4 | 5 | 3
Dataset 2 | P35637 | No | 4 | 4 | 2 | 1 | 2
Dataset 2 | P31483 | Yes | 5 | 5 | 1 | 3 | 4
Dataset 3 | Q9NUL5 | Yes | 1 | 1 | 1 | 1 | 1
Dataset 3 | O00425 | No | 2 | 2 | 2 | 3 | 10
Dataset 3 | Q07955 | Yes | 3 | 9 | 6 | 12 | 2
Dataset 3 | Q9Y6M1 | No | 4 | 3 | 3 | 5 | 6
Dataset 3 | Q9NZI8 | No | 5 | 4 | 4 | 2 | 5

Finding potential lncRNAs interacting with new proteins

We continue to uncover lncRNAs interacting with new proteins on the three human datasets. All associated lncRNAs of Q13148 and Q9HCK5 are masked, and the two proteins are treated as new proteins. LPIDF is then used to find possible associated lncRNAs for them. Q13148 is an RNA-binding protein involved in RNA biogenesis and processing and in various neurodegenerative diseases[57-60]. It also participates in the formation and regeneration of normal skeletal muscle and plays an important role in maintaining circadian clock periodicity[59,60]. Its second RNA recognition motif has been reported as a major promoter of aggregation and the resultant toxicity[61]. Frontotemporal lobar degeneration associated with Q13148 aggregation manifests as progressive neuronal atrophy in the cerebral cortex[62]. Table 6 illustrates the predicted top 5 lncRNAs associated with Q13148 on the three human datasets. From Table 6, we can see that all predicted top 5 lncRNAs interacting with Q13148 are known interactions in the three datasets.
Table 6

The predicted top 5 lncRNAs interacting with Q13148.

DatasetlncRNAsConfirmedLPIDFXGBoostCatBoostRandom forestDRPLPI
Dataset1SNHG1Yes153601713
NEAT1Yes21135230199
7SLYes3784234472264
RP11-439E19.10Yes43761441555
SFPQYes52825187
Dataset2SNHG1Yes1142275
NEAT1Yes25128139
7SLYes361144150
RP11-439E19.10Yes4274103106559
SFPQYes54813866
Dataset3RPI001_124073Yes141659
LINC00638Yes2170722
LINC00338Yes32863747
RP11-38P22.2Yes4294619947
GAS5Yes5110813110
Table 7 lists the identified top 5 lncRNAs associated with Q9HCK5 on the three human datasets. Q9HCK5 is required for RNA-mediated gene silencing, RNA-directed transcription, and human hepatitis delta virus replication[63]. Table 7 shows that all of the predicted top 5 LPIs for Q9HCK5 are known in the three datasets. In summary, LPIDF may be appropriate for LPI identification for a new protein.
Table 7

The predicted top 5 lncRNAs interacting with Q9HCK5.

DatasetlncRNAsConfirmedLPIDFXGBoostCatBoostRandom forestDRPLPI
Dataset1RPI001_233996Yes1468733171
RPI001_122583Yes2164113935
RPI001_1006381Yes3442338177
RPI001_1000866Yes45805115189
RP5-1057J7.6Yes52632956116
Dataset2SFPQYes1451223
RPI001_1015379Yes25591752
RPI001_247329Yes3126512224
RPI001_1000866Yes4185364
NEAT1Yes5154949
Dataset3RP11-357C3.3Yes1764136
RP1-140A9.1Yes263901511
RPI001_124073Yes3525932
RPI001_1001088Yes41141010
AC010890.1Yes51742628

Finding new LPIs based on known LPIs

The numbers of lncRNA–protein pairs with unknown interaction information are 51,686, 71,075, 22,572, 2,867, and 49,435 on the five datasets, respectively. We rank these unknown lncRNA–protein pairs by the interaction probabilities computed by LPIDF and list the predicted top 100 pairs. The results are shown in Fig. 1, where black dotted lines and sky blue solid lines represent unknown and known LPIs predicted by LPIDF, respectively; tan hexagons and light sky blue circles denote lncRNAs whose interactions with proteins are unknown and known, respectively; and yellow diamonds denote proteins.
Figure 1

The predicted top 100 LPIs on the five datasets (a) Dataset 1, (b) Dataset 2, (c) Dataset 3, (d) Dataset 4, (e) Dataset 5.

We observe that some identified lncRNA–protein pairs have higher interaction probabilities. For example, the interactions between NONHSAT137627 and P35637, n344749 and Q15717, NONHSAT119864 and Q15717, AthIncRNA18 and Q9LES2, and ZmaLncRNA38 and C4J594 are ranked 33, 97, 85, 161, and 215, respectively. These highly ranked lncRNA–protein pairs need further experimental validation. The lncRNA FTX (NONHSAT137627) positively regulates the expression and function of ALG3 in AML cells, especially cell growth and apoptosis related to ADR resistance; FTX could thus probably be applied to reduce therapeutic resistance in AML[64]. P35637 plays a key role in RNA transport, mRNA stability, and synaptic homeostasis in neuronal cells[63]. The protein has been validated as a target for the treatment of cancers, amyotrophic lateral sclerosis, and Alzheimer's disease[65]. In dataset 2, FTX is observed to interact with Q15717, Q9NZI8, and P26599. Q15717 helps increase the stability of leptin mRNA. Q9NZI8 can regulate neurite outgrowth and neuronal cell migration, promote the adhesion and movement of tumor-derived cells, and prevent the formation of infectious HIV-1 particles[64]. P26599 can bind to the viral internal ribosome entry site and stimulate translation mediated by picornavirus infection. P35637 has functions similar to those of Q15717, Q9NZI8, and P26599. Based on the guilt-by-association principle, we infer that FTX may interact with P35637.

Fractions of true LPIs among the predicted top N LPIs

In addition, we consider the fractions of true LPIs among the predicted top N LPIs, with N set to 10, 30, and 50. The results are shown in Table 8. From Table 8, all of the top 10 LPIs predicted by LPIDF are labeled as 1 (known interactions) on the five datasets, and the same holds for the top 30. For the top 50 LPIs predicted by LPIDF, 94% are labeled as 1 in dataset 1, while all top 50 LPIs are known on the other four datasets. In summary, LPIDF obtains the best prediction performance in terms of the fractions of true LPIs among the top 10, 30, and 50 predictions.
Table 8

The fractions of true LPIs among the top interactions under CV3.

Dataset | Top N | XGBoost (%) | CatBoost (%) | Random forest (%) | DRPLPI (%) | LPIDF (%)
Dataset 1 | Top 10 | 60 | 100 | 90 | 100 | 100
Dataset 1 | Top 30 | 70 | 90 | 90 | 100 | 100
Dataset 1 | Top 50 | 74 | 88 | 88 | 94 | 94
Dataset 2 | Top 10 | 70 | 70 | 90 | 90 | 100
Dataset 2 | Top 30 | 77 | 87 | 87 | 93 | 100
Dataset 2 | Top 50 | 80 | 86 | 86 | 92 | 100
Dataset 3 | Top 10 | 80 | 90 | 90 | 100 | 100
Dataset 3 | Top 30 | 93 | 93 | 96 | 50 | 100
Dataset 3 | Top 50 | 88 | 96 | 98 | 96 | 100
Dataset 4 | Top 10 | 100 | 100 | 100 | 100 | 100
Dataset 4 | Top 30 | 97 | 100 | 100 | 100 | 100
Dataset 4 | Top 50 | 96 | 100 | 94 | 100 | 100
Dataset 5 | Top 10 | 100 | 100 | 90 | 100 | 100
Dataset 5 | Top 30 | 100 | 100 | 86 | 100 | 100
Dataset 5 | Top 50 | 100 | 100 | 88 | 100 | 100

Discussion and conclusion

lncRNAs are widely distributed in various organisms and regulate gene expression at the transcriptional and posttranscriptional levels. However, lncRNAs are difficult to crystallize, and only a few have been investigated in depth. Since lncRNAs exert important regulatory roles through protein molecules, discovering the proteins that bind specific lncRNAs is a key step toward identifying lncRNAs' functions and mechanisms. In this study, first, we integrate five LPI datasets, three from human and the remaining two from plants. Second, features of lncRNAs and proteins are extracted from their sequences by four-nucleotide composition and BioSeq2vec, respectively. Finally, a deep forest model with a cascade forest structure, LPIDF, is developed to predict LPI candidates.

To evaluate the performance of LPIDF, we compare it with four other LPI prediction models on the five datasets under three cross validations. The results suggest that LPIDF remarkably outperforms the four competing LPI identification methods. We further conduct a series of case studies to find possible associated proteins (or lncRNAs) for new lncRNAs (or proteins) and potential LPIs. The case analyses again demonstrate that LPIDF is a powerful LPI identification method, achieving the best overall precision, recall, accuracy, F1-score, AUC, and AUPR. We attribute this to the following advantages. First, LPIDF selects high-quality features of lncRNAs and proteins based on four-nucleotide composition and BioSeq2vec, respectively. Second, the deep forest with cascade forest structure automatically determines the depth of the cascade, thereby reducing the prediction bias produced by parameter tuning. Finally, each layer in the cascade forest receives LPI features from the previous layer and passes its result to the next layer; since all layers are generated automatically, LPIDF does not need to set many hyperparameters.
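The cascade structure described above can be sketched as follows; this is a minimal illustration of the deep forest idea (the forest sizes, stopping tolerance, and use of two forest types per layer are our assumptions, not the authors' exact configuration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def cascade_forest(X, y, max_layers=5, tol=1e-3):
    """Grow cascade layers: each layer's class-probability outputs are
    appended to the original features for the next layer, and growth
    stops automatically when accuracy no longer improves."""
    features, best, depth = X, 0.0, 0
    for _ in range(max_layers):
        layer = [RandomForestClassifier(n_estimators=50, random_state=0),
                 ExtraTreesClassifier(n_estimators=50, random_state=0)]
        probas = [cross_val_predict(f, features, y, cv=3, method="predict_proba")
                  for f in layer]
        score = np.mean(np.mean(probas, axis=0).argmax(axis=1) == y)
        if score <= best + tol:  # depth is determined automatically
            break
        best, depth = score, depth + 1
        features = np.hstack([X] + probas)  # augment with layer outputs
    return best, depth

X, y = make_classification(n_samples=120, n_features=10, random_state=0)
acc, depth = cascade_forest(X, y)
```

Because the depth is chosen by this stopping rule rather than fixed in advance, the only hyperparameters left are the per-layer forest settings, which matches the low-tuning property claimed above.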
These experimental results indicate that LPIDF has a powerful ability to discover new LPIs. In addition, we investigate the time required by the proposed LPIDF model and the other methods. The details are shown in Table 9. The time required by LPIDF is much lower than that of CatBoost and DRPLPI.
Table 9

The time required for all LPI prediction methods.

Dataset | XGBoost | CatBoost | Random forest | DRPLPI | LPIDF
Dataset 1 | 168 s | 74,120 s | 20 s | 74,292 s | 28,174 s
Dataset 2 | 688 s | 70,042 s | 22 s | 68,469 s | 25,348 s
Dataset 3 | 527 s | 63,627 s | 37 s | 63,711 s | 26,342 s
Dataset 4 | 400 s | 25,971 s | 6 s | 25,994 s | 7,216 s
Dataset 5 | 1,852 s | 61,998 s | 197 s | 62,258 s | 82,022 s

s denotes seconds.

However, our work has a few limitations. We only consider LPI prediction on human and plant LPI-related datasets; other species evolutionarily closer to humans than plants should also be investigated. In addition, the predicted LPIs with the highest interaction probabilities should be experimentally validated. In the future, first, we will integrate more biological information, for example, disease symptom information, drug chemical structures, and miRNA-lncRNA interactions. Second, we will evaluate the prediction performance of the proposed model on other species evolutionarily closer to humans than plants. Third, CD-Hit[66] is a broadly used tool for reducing sequence redundancy. To improve the performance of sequence analysis algorithms, we will further remove proteins with high sequence similarity from larger datasets using CD-Hit. Finally, we will experimentally validate the predicted RNA-binding proteins.
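To illustrate the redundancy-reduction idea behind CD-Hit, the sketch below performs a naive greedy clustering by pairwise sequence identity. It is only illustrative: CD-Hit itself uses fast k-mer filtering and alignment-based identity, and the `seq_identity` function, the toy sequences, and the 0.8 threshold are all assumptions made for this example.

```python
def seq_identity(a, b):
    # Naive identity: fraction of matching positions over the shorter length.
    # (CD-Hit uses k-mer filtering and alignment; this is only illustrative.)
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def remove_redundant(seqs, threshold=0.8):
    # Greedy clustering in the spirit of CD-Hit: process sequences from
    # longest to shortest and keep one only if it falls below the identity
    # threshold against every representative kept so far.
    kept = []
    for s in sorted(seqs, key=len, reverse=True):
        if all(seq_identity(s, r) < threshold for r in kept):
            kept.append(s)
    return kept

# Toy protein sequences; "MKVLAS" is redundant with "MKVLAT" at 5/6 identity.
reps = remove_redundant(["MKVLAT", "MKVLAS", "GGGGG"], threshold=0.8)
print(reps)  # ['MKVLAT', 'GGGGG']
```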

Materials and methods

Data preparation

In this study, we integrate five different LPI datasets.

Dataset 1 was provided by Li et al.[17]. Noncoding RNA–protein interaction data were first downloaded from the NPInter 2.0 database[67]. lncRNA and protein sequences were extracted from the NONCODE 4.0 database[68] and the UniProt database[65], respectively, yielding 3,487 LPIs from 938 lncRNAs and 59 proteins. We then remove lncRNAs and proteins whose sequences are unknown in the UniProt[65], NPInter[67] and NONCODE[68] databases, obtaining 3,479 LPIs from 935 lncRNAs and 59 proteins.

Dataset 2 was provided by Zheng et al.[22]. Noncoding RNA–protein interactions, lncRNA sequences, and protein sequences were downloaded from NPInter 2.0[67], NONCODE 4.0[68], and UniProt[65], respectively, yielding 4,467 LPIs between 1,050 lncRNAs and 84 proteins. As for dataset 1, we remove the lncRNAs and proteins whose sequences are unknown in the NONCODE[68], UniProt[65], and NPInter[67] databases, obtaining 3,265 LPIs from 885 lncRNAs and 84 proteins.

Dataset 3 was provided by Zhang et al.[18]. Experimentally validated LPIs between 1,114 lncRNAs and 96 proteins were extracted from the data resources compiled by Ge et al.[69]. The sequence and expression data of lncRNAs in 24 human tissues or cell types were downloaded from the NONCODE 4.0 database[68], and the protein sequence data were obtained from the SUPERFAMILY database[70]. lncRNAs without sequence or expression information and proteins without sequence information were removed. lncRNAs (or proteins) with only one associated protein (or lncRNA) were also removed. Finally, 4,158 LPIs from 990 lncRNAs and 27 proteins were selected.

Dataset 4 contains sequence information of lncRNAs and proteins for Arabidopsis thaliana from the plant lncRNA database (PlncRNADB[71]). LPI data can be obtained from http://bis.zju.edu.cn/plncRNADB. The dataset contains 948 LPIs from 109 lncRNAs and 35 proteins.
Dataset 5 contains sequence data of lncRNAs and proteins for Zea mays from the PlncRNADB database[71]. LPI data can be downloaded from http://bis.zju.edu.cn/plncRNADB. The dataset contains 22,133 LPIs from 1,704 lncRNAs and 42 proteins. Table 10 describes the details of the five datasets.
Table 10

The details of LPI data.

Dataset | lncRNAs | Proteins | LPIs
Dataset 1 | 935 | 59 | 3,479
Dataset 2 | 885 | 84 | 3,265
Dataset 3 | 990 | 27 | 4,158
Dataset 4 | 109 | 35 | 948
Dataset 5 | 1,704 | 42 | 22,133
We describe an LPI network as a binary matrix Y with m rows (lncRNAs) and n columns (proteins), where the entry y_ij = 1 if lncRNA l_i is known to interact with protein p_j, and y_ij = 0 otherwise.
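The interaction-matrix construction can be sketched as follows; the lncRNA/protein names and pairs are hypothetical toy data chosen only to illustrate the encoding.

```python
import numpy as np

# Hypothetical toy data: known (lncRNA, protein) interaction pairs.
lncrnas = ["FTX", "SNHG3", "GAS5"]
proteins = ["P35637", "Q9NUL9"]
pairs = [("FTX", "P35637"), ("GAS5", "Q9NUL9")]

# Y[i, j] = 1 if lncRNA i is known to interact with protein j, else 0.
Y = np.zeros((len(lncrnas), len(proteins)), dtype=int)
for l, p in pairs:
    Y[lncrnas.index(l), proteins.index(p)] = 1

print(Y)
```

Unobserved pairs are encoded as 0, i.e. "no known interaction" rather than a confirmed negative.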

Feature selection

Feature selection of lncRNAs

Tri-nucleotide composition has been effectively applied to characterize lncRNA sequences[72]. In this section, we use four-nucleotide composition to extract lncRNA features. Given an lncRNA sequence S = s1 s2 … sL of length L, where each si ∈ {A, C, G, T}, we count the occurrences of every four-tuple letter arrangement, for example, (A, A, A, A), (A, A, A, C), (A, A, A, G), …, (T, T, T, T), and compute a numeric feature vector of dimension 4^4 = 256 from S.
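A minimal sketch of four-nucleotide (4-mer) composition: slide a window of length 4 over the sequence, count each of the 256 possible 4-mers, and normalize by the number of windows. The normalization choice is an assumption; a raw count vector would work equally well as a feature.

```python
from itertools import product

def four_nucleotide_composition(seq):
    # All 256 possible 4-mers over the alphabet {A, C, G, T}, in fixed order.
    kmers = ["".join(t) for t in product("ACGT", repeat=4)]
    counts = dict.fromkeys(kmers, 0)
    # Count every length-4 window of the sequence.
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        if window in counts:  # skips windows containing ambiguous bases
            counts[window] += 1
    total = max(len(seq) - 3, 1)
    # Normalized 256-dimensional frequency vector.
    return [counts[k] / total for k in kmers]

vec = four_nucleotide_composition("ACGTACGTACGT")
print(len(vec))  # 256
```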

Feature selection of proteins

The encoder-decoder structure can better describe sequence-to-sequence features[73,74]. Inspired by the sequence representation techniques of Sutskever et al.[74] and Yi et al.[75], we use the Biological Sequence-to-vector (BioSeq2vec) representation learning method[75] with encoder-decoder structure to characterize the amino acids of a protein. For a protein sequence of length L, first, a sliding window of size k is used to divide the sequence into overlapping segments. Second, the segments are converted into hash values. Finally, the hash values are used as the input of an autoencoder. As shown in Fig. 2, the input vector composed of the hash values is first mapped into a low-dimensional feature vector by an encoder. Second, the decoder reconstructs the input vector from the low-dimensional feature vector. Finally, the low-dimensional feature vector in the intermediate layer is used as the feature representation of the protein.
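The sliding-window hashing step can be sketched as below. The window size k = 3, the polynomial hash, and the modulus 10,000 are assumptions for illustration; BioSeq2vec's actual hashing scheme may differ.

```python
def kmer_hash(kmer, base=31, mod=10_000):
    # Deterministic polynomial rolling hash of one k-mer segment.
    h = 0
    for ch in kmer:
        h = (h * base + ord(ch)) % mod
    return h

def protein_to_hashes(seq, k=3):
    # Slide a window of size k over the amino-acid sequence; the resulting
    # sequence of L - k + 1 hash values is what the autoencoder consumes.
    return [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]

hashes = protein_to_hashes("MKVLAT", k=3)
print(len(hashes))  # 4
```

The autoencoder then compresses this integer sequence to a fixed-length vector and is trained to reconstruct its input, so the bottleneck layer serves as the protein feature.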
Figure 2

Protein feature selection based on the encoder–decoder.


Deep forest with cascade forest structure

In this study, we utilize a Deep Forest with cascade forest structure (LPIDF) to find new LPIs. The model integrates deep forest and ensemble learning in an ensemble-of-ensembles architecture. Deep forest[76] conducts layer-by-layer propagation and feature transformation. An ensemble model composed of multiple single classifiers improves LPI prediction more effectively than one single classifier[77]. For ensemble learning, greater diversity among the single classifiers yields greater improvement. To ensure diversity, four different types of classifiers, logistic regression, XGBoost classifier, random forest, and extra trees, are used to learn the model. In the model, class vectors denoting the class distribution are obtained from the four base classifiers. For a given LPI feature, each classifier estimates the proportions with which the lncRNA–protein pair belongs to the two classes (positive and negative). Suppose a random forest contains three trees. As shown in Fig. 3, for an LPI feature x, the three trees output class distributions (p1, 1 − p1), (p2, 1 − p2), and (p3, 1 − p3), respectively. These probabilities are summed and averaged to obtain the final class distribution for x: the probability that the lncRNA–protein pair is a positive example is (p1 + p2 + p3)/3, and the probability that it is a negative example is one minus that value.
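The averaging step can be sketched numerically; the three per-tree distributions below are hypothetical values standing in for the (p, 1 − p) outputs of a three-tree forest.

```python
import numpy as np

# Hypothetical per-tree class distributions (positive, negative) for one
# LPI feature vector in a three-tree random forest.
tree_probs = np.array([
    [0.9, 0.1],
    [0.6, 0.4],
    [0.3, 0.7],
])

# The forest's class vector is the element-wise average over the trees.
class_vector = tree_probs.mean(axis=0)
print(class_vector)  # [0.6 0.4]
```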
Figure 3

Computing the probability that a feature is classified as positive (or negative) sample.

Similarly, at each layer, logistic regression, XGBoost classifier, random forest, and extra trees are trained for each LPI feature. An 8-dimensional class vector is generated from the two classes and four types of classifiers. Figure 4 shows a deep forest with cascade structure. As illustrated in Fig. 4, an 800-dimensional feature vector is used as the initial input to the cascade forest. After each layer, the generated eight-dimensional class vector, which carries the most important information, is combined with the original 800-dimensional features and used as the input to the next layer. The details are as follows. First, the four types of classifiers, logistic regression, XGBoost classifier, random forest, and extra trees, are trained. Second, an eight-dimensional class vector is produced and concatenated with the original 800-dimensional feature vector to generate an 808-dimensional vector. Third, this 808-dimensional vector is used as the input to the second layer. Similarly, the second layer produces an eight-dimensional class vector, which is concatenated with the 800-dimensional feature vector, and the resulting 808-dimensional vector is used as the input to the third layer. Finally, when training a new layer, a training set is used to tune the parameters and a validation set is used to evaluate the performance. Feature importance is evaluated through the prediction difference between the original LPI features and the learned ones in the four types of classifiers. Training terminates when the performance is no longer significantly improved. After training, LPI features with zero importance values are removed and the features with valid importance values are kept. A test example (an LPI feature) is propagated through each layer until the final layer.
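One cascade layer can be sketched with scikit-learn as below. This is a simplified illustration, not the authors' implementation: `GradientBoostingClassifier` stands in for the XGBoost classifier, the data are random, and the class probabilities are computed in-sample for brevity, whereas deep forest uses out-of-fold estimates to avoid overfitting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (
    RandomForestClassifier, ExtraTreesClassifier, GradientBoostingClassifier,
)

# Toy data: 800-dimensional LPI feature vectors with binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 800))
y = rng.integers(0, 2, size=200)

def cascade_layer(X_in, y_in):
    # Four classifier types; each contributes a 2-dimensional class vector,
    # giving the 8-dimensional augmentation described in the text.
    clfs = [
        LogisticRegression(max_iter=1000),
        GradientBoostingClassifier(n_estimators=10),  # stand-in for XGBoost
        RandomForestClassifier(n_estimators=10),
        ExtraTreesClassifier(n_estimators=10),
    ]
    vecs = []
    for clf in clfs:
        clf.fit(X_in, y_in)
        vecs.append(clf.predict_proba(X_in))  # in practice, out-of-fold
    return np.hstack(vecs)  # shape (n_samples, 8)

# One layer: concatenate the class vector with the original features,
# producing the 808-dimensional input to the next layer.
class_vec = cascade_layer(X, y)
X_next = np.hstack([X, class_vec])
print(X_next.shape)  # (200, 808)
```

Layers are stacked by repeating this step on `X_next` until validation performance stops improving.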
Figure 4

Deep forest with cascade forest structure.

Figure 5 demonstrates the pipeline of LPIDF. First, five LPI datasets are obtained from the existing resources. Second, for an lncRNA–protein pair, the lncRNA and protein sequences are characterized by four-nucleotide composition and by BioSeq2vec with encoder–decoder structure, respectively, and concatenated into one vector. Third, the concatenated vector is used as the input to the cascade forest. Finally, the most important features are selected through layer-by-layer propagation and the label of each lncRNA–protein pair is computed.
Figure 5

Flowchart of the LPI prediction framework based on deep forest with cascade forest structure.
