Literature DB >> 28029645

An ensemble approach for large-scale identification of protein- protein interactions using the alignments of multiple sequences.

Lei Wang^1,2, Zhu-Hong You³, Xing Chen⁴, Jian-Qiang Li⁵, Xin Yan⁶, Wei Zhang², Yu-An Huang⁵.

Abstract

Protein-Protein Interactions (PPI) is not only the critical component of various biological processes in cells, but also the key to understand the mechanisms leading to healthy and diseased states in organisms. However, it is time-consuming and cost-intensive to identify the interactions among proteins using biological experiments. Hence, how to develop a more efficient computational method rapidly became an attractive topic in the post-genomic era. In this paper, we propose a novel method for inference of protein-protein interactions from protein amino acids sequences only. Specifically, protein amino acids sequence is firstly transformed into Position-Specific Scoring Matrix (PSSM) generated by multiple sequences alignments; then the Pseudo PSSM is used to extract feature descriptors. Finally, ensemble Rotation Forest (RF) learning system is trained to predict and recognize PPIs based solely on protein sequence feature. When performed the proposed method on the three benchmark data sets (Yeast, H. pylori, and independent dataset) for predicting PPIs, our method can achieve good average accuracies of 98.38%, 89.75%, and 96.25%, respectively. In order to further evaluate the prediction performance, we also compare the proposed method with other methods using same benchmark data sets. The experiment results demonstrate that the proposed method consistently outperforms other state-of-the-art method. Therefore, our method is effective and robust and can be taken as a useful tool in exploring and discovering new relationships between proteins. A web server is made publicly available at the URL http://202.119.201.126:8888/PsePSSM/ for academic use.

Entities: Chemical Disease Gene Species

Keywords: cancer; disease; multiple sequences alignments; position-specific scoring matrix

Mesh：

Substances：

Year: 2017 PMID： 28029645 PMCID： PMC5354898 DOI： 10.18632/oncotarget.14103

Source DB: PubMed Journal: Oncotarget ISSN： 1949-2553

INTRODUCTION

Protein–Protein Interactions (PPIs) play an important role in almost every cellular process [1, 2]. A variety of biochemical activities performed by PPIs are the foundation of life, such as immune response, regulation of transcription and translation, DNA replication, and endocrine function [3]. In recent decades, in order to understand the mechanisms of all kinds of biochemical activities, a variety of biological experimental methods have been designed to detect the interactions between proteins, for example, two-hybrid systems [4, 5], mass spectrometry [6, 7], immunoprecipitation [8], protein chip technology [9], etc. However, it is time-consuming, cost-intensive and small-scale to identify the interactions among proteins using biological experiments only. Therefore, there is an urgent need to use computational methods to predict protein-protein interactions efficiently and massively. So far, a number of computational methods have been proposed to predict protein-protein interactions. These methods can be roughly divided into three types: structure-based methods [10-13], sequence-based methods [14-25] and function-annotation-based methods [26-29]. Among them, there is no need to know protein structure information and a pre-knowledge using the sequence-based approaches, which has aroused more and more interests in researchers. For example, Martin et al. developed a computational model to identify the interactions among proteins by using the signature descriptor [30]. This model achieved an accuracy of 70% and 80% when testing on the H. pylori and Yeast data sets by 10-fold cross-validation. Shen et al. proposed the conjoint triad approach to predict human PPIs considering the local environments of residues [16]. In the experiment, the accuracy of this model reached 83.9%. Ahmad et al. proposed an algorithm to predict the DNA-binding sites based on the neural network, which adopted amino acid sequences evolutionary information in terms of their position specific-scoring matrices [31]. In this paper, we propose a novel sequence-based computational method for predicting potential protein-protein interactions. Specifically, we first convert the protein amino acids sequence into the Position Specific Scoring Matrix (PSSM) [32] that contains the information of evolution; Then use the Pseudo Position-Specific Score Matrix (PsePSSM) [33-35] algorithm to extract features expecting more information. Finally, the Rotation Forest (RF) [36, 37] classifier is applied to determine whether the proteins are related or not. In the experiment, the proposed method is implemented on the Yeast data set, and the accuracy of five-fold cross-validation is 98%. At the same time, we also verified on the Helicobacter. pylori, C.elegans, E.coli, H.sapiens and M.musculus data sets, and yielded the accuracy of 89.75%, 98.50%, 91.00%, 97.45% and 98.08%, respectively. In order to further evaluate the prediction performance, we also compare the proposed method with other excellent methods. Comparison results show that the proposed method consistently outperforms other state-of-the-art methods.

RESULTS AND DISCUSSIONS

Evaluation measures

Four standard criteria are used to evaluate the performance of our approach, including accuracy (Accu.), sensitivity (Sen.), precision (Prec.) and Matthews correlation coefficient (MCC). MCC represents the correlation coefficient between the observed and the predicted class. It ranges from -1 (the best predictive model) to 1 (the worst predictive model). These measures are defined as follows: where TP denotes the number of positive samples to be correctly predicted; FP denotes the number of negative samples to be incorrectly predicted; TN denotes the number of negative samples to be correctly predicted; FN denotes the number of positive samples to be incorrectly predicted, respectively. In addition, the receiver operating characteristic (ROC) [38] curve is used to access the performance of classifier. In the ROC curve, the default threshold for the classifier is 0.5. The threshold will be changed with the true positive rate versus the false positive rate when a new set of prediction result is accepted; this change will be expressed through graphics.

Assessment of prediction ability

In order to achieve the best performance of the rotation forest, we use the grid search method to adjust the corresponding parameters. In this study, PCA [36] was chosen as rotation forest transformation method and the J48 decision tree [39] derived from the WEKA machine learning workbench was selected as the base classifier. Figure 1 shows the accuracy of the classifier under different parameter values. From the Figure 1 we can see that our method performs well, the average prediction accuracy is rapidly increasing with the increase of the value of L at the beginning and increase rate becomes slow when the value of L is greater than 5. However, the accuracy always presents a fluctuation state with the increase of the value of the parameter K. After a comprehensive assessment, we choose the optimal parameters of K=8 and L=5 ultimately.

Figure 1

Accuracy surface obtained of rotation forest for optimizing regularization parameters K and L

In this paper, 5-fold cross-validation technique is used as a means to evaluate our model. More specifically, the entire feature data set is randomly divided into five approximately equal subsets. Four of these subsets are used for training and the rest of the subset for testing. The cross-validation process is repeated 5 times so that each data set can be used for testing once. Table 1 lists the results of our predictions on Yeast data set, the value of average accuracy, precision, sensitivity, and MCC are 98.38%, 99.92%, 96.84%, and 96.82%, respectively. The prediction accuracy of the five models are all greater than 98.17%, the precisions are greater than 99.62%, the sensitivities are greater than 96.32%, and the MCC are greater than 96.40%. The ROC curves performed on Yeast data set is shown in Figure 2. In this figure, X-ray depicts false positive rate (FPR) while y-ray depicts true positive rate (TPR).

Table 1

5-fold cross-validation results obtained by using proposed method on Yeast data set

Testing set	Accu.(%)	Prec.(%)	Sen.(%)	MCC(%)
1	98.17	100.00	96.32	96.40
2	98.30	100.00	96.69	96.66
3	98.17	100.00	96.37	96.40
4	98.30	99.62	96.88	96.65
5	98.97	100.00	97.93	97.97
Average	98.38±0.34	99.92±0.17	96.84±0.65	96.82±0.66

Figure 2

ROC curves performed by proposed method on Yeast data set

The performance of the proposed method on the H. pylori data set

To better evaluate the performance of the proposed model in PPIs prediction, we focused on the testing of H. pylori data set. We use the same feature extraction method and the same RF parameters to verify its effect, the results achieved as shown in Table 2. On the H. pylori data set we obtain the accuracy of the 5 models are 92.45%, 88.16%, 90.05%, 89.37%, and 88.70%, respectively. We can see from Table 2 that the excellent prediction performance of our model with an average precision value of 89.75%, precision value of 90.18%, sensitivity value of 89.12%, and MCC value of 81.62%. Additionally, it can also be seen from Table 2 that the standard deviation of accuracy, precision, sensitivity and MCC is as low as 0.0167, 0.0274, 0.0183 and 0.0269. The ROC curves are shown in Figure 3.

Table 2

5-fold cross-validation results obtained by using proposed method on H. pylori data set

Testing set	Accu.(%)	Prec.(%)	Sen.(%)	MCC(%)
1	92.45	93.44	92.23	86.00
2	88.16	86.93	88.49	79.10
3	90.05	92.06	87.63	82.06
4	89.37	90.56	88.10	80.99
5	88.70	87.93	89.16	79.95
Average	89.75±1.67	90.18±2.74	89.12±1.83	81.62±2.69

Figure 3

ROC curves performed by the proposed method on H. pylori data set

Comparison with previous method

In recent years, many researchers have proposed various models to predict the PPIs and achieved good results. In order to further evaluate the prediction performance, we compare the proposed method with these excellent methods in the same benchmark data sets. In addition, as the state-of-the-art classification algorithm, SVM has been successfully used to predict PPIs. In this experiment, we also compare the classification performance between Rotation Forest classifier and SVM classifier on the Yeast data set. The corresponding parameters of the SVM were selected by the grid search method, and finally we set c=0.1 and g=0.2, respectively. The LIBSVM tools we adopted are downloaded at www.csie.ntu.edu.tw/~cjlin/libsvm. Table 3 and Table 4 summarize the results of these comparisons.

Table 3

Performance comparison of different models on Yeast data set

Model	Test set	Accu.(%)	Prec.(%)	Sen.(%)	MCC(%)
Guos’ work [17]	ACC	89.33±2.67	88.87±6.16	89.93±3.68	N/A
	AC	87.36±1.38	87.82±4.33	87.30±4.68	N/A
Zhous’ work [40]	SVM + LD	88.56±0.33	89.50±0.60	87.37±0.22	77.15±0.68
Yangs’ work [41]	Cod1	75.08±1.13	74.75±1.23	75.81±1.20	N/A
	Cod2	80.04±1.06	82.17±1.35	76.77±0.69	N/A
	Cod3	80.41±0.47	81.86±0.99	78.14±0.90	N/A
	Cod4	86.15±1.17	90.24±0.45	81.03±1.74	N/A
Yous’ work [42]	PCA-EELM	87.00±0.29	87.59±0.32	86.15±0.43	77.36±0.44
Our method	SVM+PSSM	95.19±0.42	94.72±0.68	95.72±0.53	90.84±0.75
	RF + PSSM	98.38±0.34	99.92±0.17	96.84±0.65	96.82±0.66

Table 4

Performance comparison of different models on H. pylori data set

Model	Accu.(%)	Prec.(%)	Sen.(%)	MCC(%)
Phylogentic bootstrap [43]	75.80	80.20	69.80	N/A
HKNN [44]	84.00	84.00	86.00	N/A
Signature products [30]	83.40	85.70	79.90	N/A
Ensemble of HKNN [45]	86.60	85.00	86.70	N/A
Boosting [46]	79.52	81.69	80.37	70.64
Ensemble ELM [42]	87.50	86.15	88.95	78.13
Our method	89.75	90.18	89.12	81.62

Table 3 shows the average prediction results of the different models on the Yeast data set, we can see that the accuracy obtained by other methods are between 75.08% and 89.33%, the average accuracy obtained by our method is 98.38%. In the comparison of classifiers, the accuracy obtained on the rotation forest classifier is higher than those obtained on the support vector machine classifier. Table 4 shows the performance of different methods on the H. pylori data sets. We can see from the Table 4 that the accuracies of the other six methods are 75.80%, 84.00%, 83.40%, 86.60%, 79.52% and 87.50%, while our method is 89.75%; the precisions of the other six methods are 80.20%, 84.00%, 85.70%, 85.00%, 81.69% and 86.15%, while our method is 90.18%; the sensitivity of the other six methods are 69.80%, 86.00%, 79.90%, 86.70%, 80.37% and 88.95%, while our method is 89.12%. The results obtained by these methods are significantly lower than ours.

Performance on independent data sets

After completing the experiment on the Yeast and H. pylori data sets, we continue to test the performance of the proposed method on the independent data sets (C.elegans, E.coli, H. sapiens and M.musculus). In the experiment, we take all the Yeast data set as training set, independent data sets as the test set to predict protein-protein interactions. Table 5 lists the accuracy of our method on four data sets. It can be seen from the table that the highest accuracy of the proposed method is 98.50% on the C.elegans data set, and even the lowest accuracy achieved on the E.coli data set reached 91.00%. It is demonstrates that our method has good accuracy in predicting the interaction of other species.

Table 5

Prediction results on four species based on our model

Species	Test pairs	Accu.(%)
C.elegans	4013	98.50
E.coli	6954	91.00
H.sapiens	1412	97.45
M.musculus	313	98.08

Validate potential protein-protein interactions from the PPIs database

After evaluating the effectiveness of the proposed model by using the 5-fold cross validation method, we here calculate the interaction probability for all potential protein-protein pairs in the datasets of Yeast. Specifically, the whole negative and positive data explored in 5-fold cross validation experiments are used for training and all the unknown protein-protein pairs are used as testing set. The predicted protein pairs with top-100 ranks in the potential PPI lists are considered as highly potential protein-protein interactions and further verified by three public databases (i.e. DIP [47], MINT [48] and IntAct [49]). These databases have been supplemented by some newly detected protein-protein interactions since the gold standard data explored in this study were collected in 2007. All the predicted possibilities for top 100 potential PPIs in Yeast can be obtained in Supplementary Table S1. As shown in Table 6, 15 new protein-protein interactions are finally confirmed. Note that the high-ranked interactions that are not reported yet may also exist in reality. Based on these results, we anticipate that the proposed model is feasible to predict new protein-protein interactions.

Table 6

The newly confirmed PPIs with high possibility in the Yeast data set

Protein ID	Protein ID	The probability of protein-protein interactions	Evidence
DIP:1113N	DIP:655N	0.9917	DIP
sw:P29295	sw:P20604	0.9912	MINT
sw:P47054	sw:P49687	0.9908	IntAct
DIP:1040N	DIP:2463N	0.9891	DIP
sw:P04050	sw:P16370	0.9869	MINT
DIP:2808N	DIP:6282N	0.9854	DIP
DIP:1408N	DIP:6416N	0.9848	DIP
DIP:1558N	DIP:2370N	0.9846	DIP
DIP:5037N	DIP:799N	0.9840	DIP
sw:Q12176	sw:Q03532	0.9839	MINT, IntAct
DIP:1364N	DIP:2483N	0.9836	DIP
DIP:1726N	DIP:834N	0.9833	DIP
DIP:2417N	DIP:5630N	0.9831	DIP
sw:P18888	sw:P32591	0.9826	MINT, IntAct
sw:Q04067	sw:P40217	0.9812	MINT, IntAct

MATERIALS AND METHODS

Data sources

We evaluate our model focus on publicly available Saccharomyces cerevisiae data set introduced by Guo et al. [17]. The PPIs data were extracted from Saccharomyces cerevisiae core subset of database of interacting proteins (DIP) [47], version DIP_20070219. Through the two algorithms, paralogous verification method (PVM) and expression profile reliability (EPR) [50], the core subset of reliability is tested. And less than 50 residues of the protein of protein pairs are removed. In order to reduce pairwise sequence redundancy, multiple sequence alignment tool, CD-Hit [51, 52], was adopted with a threshold of 40% identity. Eventually the 5594 proteins are left to form the positive data set. The negative dataset consists of 5594 additional protein pairs, which are selected at different subcellular localization. Therefore, the positive and negative data set each accounted for half of the 11188 protein pairs constitute the final data set. As a comparison, we further assess the capabilities of our model in the H. pylori data set, which was described by Rain et al. [53]. It can be downloaded at http://www.cs.sandia.gov/~smartin/software.html. This data set contains 2916 protein pairs which include half interacting pairs and half non-interacting pairs. It provides a platform for comparing different methods [30, 42, 43, 45, 46].

Position-specific scoring matrix

Position-Specific Scoring Matrix (PSSM) is used to detect the distantly related proteins, and initially introduced by Gribskov et al. [32]. It has made outstanding achievements in these areas: protein secondary structure prediction [54], prediction of disordered regions [55], and protein binding site prediction [56]. A PSSM is an L × 20 matrix, which can be denoted as , where L denotes protein sequence length and the number of 20 is due to 20 amino acids. Each element PSSM (i, j) of the matrix is defined as follows: where α in the i row of PSSM means that the probability of the ith residue being mutated into type j of 20 native amino acids during the procession of evolutionary in the protein from multiple sequence alignments. In order to extract the evolutionary information, each protein sequence in the data set is used to align and search homogenous sequences from SwissProt database by the Position Specific Iterated BLAST (PSI-BLAST) [57] tool. PSI-BLAST will return a 20-dimensional vector which indicates the probabilities of conservation against mutations to 20 different amino acids including its own. To get broad and high homologous sequences, we select in this study the value of e-value is 0.001 and the value of iterations is 3, respectively. Applications of PSI-BLAST and SwissProt database can be downloaded at http://blast.ncbi.nlm.nih.gov/Blast.cgi.

Pseudo position-specific score matrix

In order to reduce the probability of missing sequence-order information, we introduced the concept of pseudo amino acid composition by Chou et al. [58]. In this article the sample of a protein sequence PSSM is represented by Equation 5 and PsePSSM obtained from the following Equation: where represents the original scores directly generated by the PSI-BLAST, and its value is typically positive integers or negative integers. This is not what we want standardized scores, which may have zero means if more than 20 amino acids and may remain unchanged if it continues through the same conversion program. The positive score implies that the corresponding mutation appears more frequently in the alignment than expected by chance, and the negative score, on the contrary, implies that the corresponding mutation appears less frequently in the alignment than expected by chance. However, according to the definition of PSSM, different lengths of proteins will correspond to different rows number in matrices. Equation 7 is employed to express the protein sample PSSM, so that the PSSM descriptor can be represented as a uniform pattern. and where denotes the average score when the amino acid residues in protein P in the process of running the algorithm was evolved into amino acid type j. However, if only is used to represent the protein P, all the sequence information will be lost during evolution. In order to prevent the occurrence of missing all information of sequence-order, the thought of pseudo amino acid was introduced to improve the Equation 7. Hence, based on the Equation 9 segmented PsePSSM features can be obtained: where is the correlation factors of amino acid type j. Although the value allowed for ε can be 0, 1, 2, …, or 49, considering the time costs and efficiency factors, we took ε to 0,1,2,3,4, so a total of 200-dimensional vectors are eventually used in this study.

Rotation forest

Rotation Forest (RF) is a novel proposed ensemble classifier that uses independently trained decision trees. The main idea of the Rotation Forest simultaneously encourages individual accuracy and diversity within the ensemble. In order to generate the training samples of the base classifier, the feature set is randomly divided into K subsets. The linear transformation method is applied to each subset, and retains all the principal components to maintain the precision of data. The rotation formed the training sample of new features to ensure the diversity of data. Hence the rotation forest can enhance the accuracy of individual classifier and the diversity in the ensemble at the same time. Suppose that contains N training samples, where be a D-dimensional feature vector, be the training sample set (n×D matrix), which is composed of n observation feature vector composition, be the corresponding labels, and S be the feature set. Assuming that the number of decision trees in the rotation forest is L, expressed as , respectively, and the feature set is randomly divided into K subsets of equal size. The preprocessing steps for an individual classifier is: the first select the appropriate parameters K which is a factor of n, and S randomly divided into K disjoint subsets, so the number of features contained in each feature subset is ; the second from the training dataset X to select the corresponding column of the feature in the subset R, form a new matrix X. Then the bootstrap subset of objects extracts three-quarters the size of the data set from X to construct a new training subset X; The third matrix X is used as the feature transform for producing the coefficients in a matrix M, which jth column coefficient as the characteristic component jth; and the final a sparse rotation matrix Mat is formed, and its coefficients in matrix M is expressed as Equation 10: In the prediction phase, given a test sample x, let be the probability produced by the classifier R to the hypothesis in which x belongs to class Y. Then the confidence for a class can be computed according to the average combined method shown in Equation 11: Therefore, the test sample x easily assigned to the classes with the greatest possible. The schematic diagram of the prediction model is shown in Figure 4.

Figure 4

The schematic diagram of the prediction model

CONCLUSIONS

In this paper, we proposed a novel method to predict protein-protein interactions using the Rotation Forest combine with Pseudo Position-Specific Score Matrix. In order to preserve as much information as possible, we first convert the protein amino acids sequences into the PSSM matrix, and then extract the features using the PsePSSM algorithm, finally determine whether there is an interaction between protein pairs through the RF classifier. To evaluate the performance of the proposed method, we implement it on the Yeast, H. pylori and independent data sets. In addition, we also compare the proposed method with other excellent methods. Excellent experimental results demonstrate that the proposed method is feasible and effective in the prediction of protein interactions. The low standard deviation of these criterion values indicates that our method is stable and robust. In future studies, we will focus on improving the classification algorithm to expect higher predictive accuracy and less time consumption in predicting protein-protein interactions.

WEBSERVER

In order to facilitate the use of researchers, we have built a web server to implement the proposed prediction model. The web server provides the source code and the Yeast data sets used in this article for users to download. It can be accessed to at http://202.119.201.126:8888/PsePSSM/. Users can query the predicted results of the Yeast data sets through the webpage and receive the predict results by e-mail.

52 in total

1. Protein secondary structure prediction based on position-specific scoring matrices.

Authors: D T Jones
Journal: J Mol Biol Date: 1999-09-17 Impact factor: 5.469

2. Clustering of highly homologous sequences to reduce the size of large protein databases.

Authors: W Li; L Jaroszewski; A Godzik
Journal: Bioinformatics Date: 2001-03 Impact factor: 6.937

3. Biological networks: the tinkerer as an engineer.

Authors: U Alon
Journal: Science Date: 2003-09-26 Impact factor: 47.728

4. Prediction of disordered regions in proteins from position specific score matrices.

Authors: David T Jones; Jonathan J Ward
Journal: Proteins Date: 2003

5. POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome.

Authors: Tao-Wei Huang; An-Chi Tien; Wen-Shien Huang; Yuan-Chii G Lee; Chin-Lin Peng; Huei-Hun Tseng; Cheng-Yan Kao; Chi-Ying F Huang
Journal: Bioinformatics Date: 2004-06-24 Impact factor: 6.937

6. Nuc-PLoc: a new web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM.

Authors: Hong-Bin Shen; Kuo-Chen Chou
Journal: Protein Eng Des Sel Date: 2007-11-10 Impact factor: 1.650

7. An empirical study on the matrix-based protein representations and their combination with sequence-based approaches.

Authors: Loris Nanni; Alessandra Lumini; Sheryl Brahnam
Journal: Amino Acids Date: 2012-10-30 Impact factor: 3.520

8. The protein-protein interaction map of Helicobacter pylori.

Authors: J C Rain; L Selig; H De Reuse; V Battaglia; C Reverdy; S Simon; G Lenzen; F Petel; J Wojcik; V Schächter; Y Chemama; A Labigne; P Legrain
Journal: Nature Date: 2001-01-11 Impact factor: 49.962

9. Predicting protein-protein interactions using signature products.

Authors: Shawn Martin; Diana Roe; Jean-Loup Faulon
Journal: Bioinformatics Date: 2004-08-19 Impact factor: 6.937

10. MINT: the Molecular INTeraction database.

Authors: Andrew Chatr-aryamontri; Arnaud Ceol; Luisa Montecchi Palazzi; Giuliano Nardelli; Maria Victoria Schneider; Luisa Castagnoli; Gianni Cesareni
Journal: Nucleic Acids Res Date: 2006-11-29 Impact factor: 16.971

13 in total

1. Prediction of cassava protein interactome based on interolog method.

Authors: Ratana Thanasomboon; Saowalak Kalapanulak; Supatcharee Netrphan; Treenut Saithong
Journal: Sci Rep Date: 2017-12-08 Impact factor: 4.379

2. The PPI network analysis of mRNA expression profile of uterus from primary dysmenorrheal rats.

Authors: Pei Fan; Qiao-Hui Lin; Ying Guo; Lan-Ling Zhao; He Ning; Meng-Ying Liu; Dong-Qing Wei
Journal: Sci Rep Date: 2018-01-10 Impact factor: 4.379

3. PCVMZM: Using the Probabilistic Classification Vector Machines Model Combined with a Zernike Moments Descriptor to Predict Protein-Protein Interactions from Protein Sequences.

Authors: Yanbin Wang; Zhuhong You; Xiao Li; Xing Chen; Tonghai Jiang; Jingting Zhang
Journal: Int J Mol Sci Date: 2017-05-11 Impact factor: 5.923

4. PPInS: a repository of protein-protein interaction sitesbase.

Authors: Vicky Kumar; Suchismita Mahato; Anjana Munshi; Mahesh Kulharia
Journal: Sci Rep Date: 2018-08-20 Impact factor: 4.379

5. Using Two-dimensional Principal Component Analysis and Rotation Forest for Prediction of Protein-Protein Interactions.

Authors: Lei Wang; Zhu-Hong You; Xin Yan; Shi-Xiong Xia; Feng Liu; Li-Ping Li; Wei Zhang; Yong Zhou
Journal: Sci Rep Date: 2018-08-27 Impact factor: 4.379

10. In silico-prediction of protein-protein interactions network about MAPKs and PP2Cs reveals a novel docking site variants in Brachypodium distachyon.

Authors: Min Jiang; Chao Niu; Jianmei Cao; Di-An Ni; Zhaoqing Chu
Journal: Sci Rep Date: 2018-10-10 Impact factor: 4.379