
Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences.

Ji-Yong An1, Fan-Rong Meng1, Zhu-Hong You2, Yu-Hong Fang1, Yu-Jun Zhao1, Ming Zhang1.   

Abstract

We propose a novel computational method, RVM-LPQ, that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements come from representing protein sequences using the LPQ feature representation of a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using an RVM-based classifier. We performed 5-fold cross-validation experiments on Yeast and Human datasets and achieved very high accuracies of 92.65% and 97.62%, respectively, significantly better than in previous works. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is clearly better than the SVM-based method. These promising results show the efficiency and simplicity of the proposed method, which can serve as an automatic decision support tool for future proteomics research.

Year:  2016        PMID: 27314023      PMCID: PMC4893571          DOI: 10.1155/2016/4783801

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.411


1. Introduction

Proteins are crucial molecules that participate in many cellular functions in an organism. Typically, proteins do not perform their roles individually, so the detection of PPIs is increasingly important. Knowledge of PPIs can provide insight into the molecular mechanisms of biological processes and lead to a better understanding of practical medical applications. In recent years, various high-throughput technologies, such as yeast two-hybrid screening [1, 2], immunoprecipitation [3], and protein chips [4], have been developed to detect interactions between proteins. A large quantity of PPI data for different organisms has now been generated, and many databases, such as MINT [5], BIND [6], and DIP [7], have been built to store protein interaction data. However, these experimental methods have shortcomings: they are time-intensive and costly, and they suffer from high rates of false positives and false negatives. For these reasons, identifying unknown PPIs using biological experiments alone is a difficult task. As a result, a number of computational methods have been proposed to infer PPIs from different sources of information, including phylogenetic profiles, tertiary structures, protein domains, and secondary structures [8-16]. However, these approaches cannot be employed when prior knowledge about a protein of interest is unavailable. With the rapid growth of protein sequence data, sequence-based methods are becoming the most widely used tools for predicting PPIs, and a number of them have been developed. For example, Bock and Gough [17] used a support vector machine (SVM) combined with several structural and physiochemical descriptors to predict PPIs. Shen et al. [18] developed a conjoint triad method to infer human PPIs. Martin et al. [19] used a descriptor called the signature product of subsequences and an expansion of the signature descriptor based on the available chemical information. Guo et al. [20] used an SVM combined with an autocorrelation descriptor to predict Yeast PPIs. Nanni and Lumini [21] proposed a method based on an ensemble of K-local hyperplane distances. Several other methods based on protein amino acid sequences have been proposed in previous work [22, 23]. In spite of this, there is still room to improve the accuracy and efficiency of the existing methods.

In this paper, we propose a novel computational method that predicts PPIs using only protein sequence data. The main improvements come from representing protein sequences using the LPQ feature representation of a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. More specifically, we first represent each protein by its PSSM. Then, an LPQ descriptor is employed to capture useful information from each PSSM and generate a 256-dimensional feature vector. Next, the dimensionality-reduction method PCA is used to reduce the dimensions of the LPQ vector and the influence of noise. Finally, the RVM model is employed as the machine learning approach to carry out classification. The proposed method was evaluated on two different PPI datasets, Yeast and Human. The experimental results are superior to those of the SVM and other previous methods, which shows that the proposed method performs very well in predicting PPIs.

2. Materials and Methodology

2.1. Dataset

To verify the proposed method, two publicly available datasets, Yeast and Human, were used in our study; both were obtained from the publicly available Database of Interacting Proteins (DIP) [24]. From the Yeast dataset we selected 5594 positive protein pairs to build the positive set and 5594 negative protein pairs to build the negative set. Similarly, from the Human dataset we selected 3899 positive protein pairs and 4262 negative protein pairs. Consequently, the Yeast dataset contains 11188 protein pairs and the Human dataset contains 8161 protein pairs.
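The pair selection described above can be sketched as follows. This is a hypothetical helper (`build_pair_dataset` and the placeholder protein IDs are not from the paper), since the paper does not specify how the DIP pairs were sampled:

```python
import random

def build_pair_dataset(positive_pairs, negative_pairs, n_pos, n_neg, seed=0):
    """Assemble a labeled PPI dataset from positive and negative protein pairs.

    Hypothetical sketch: the paper selects 5594/5594 pairs for Yeast and
    3899/4262 pairs for Human from DIP; the sampling details are assumed.
    """
    rng = random.Random(seed)
    pos = rng.sample(positive_pairs, n_pos)
    neg = rng.sample(negative_pairs, n_neg)
    pairs = [(a, b, 1) for a, b in pos] + [(a, b, 0) for a, b in neg]
    rng.shuffle(pairs)               # mix positive and negative examples
    return pairs

# Toy usage with placeholder protein IDs
pos = [(f"P{i}", f"Q{i}") for i in range(10)]
neg = [(f"R{i}", f"S{i}") for i in range(10)]
data = build_pair_dataset(pos, neg, 5, 5)
```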

2.2. Position Specific Scoring Matrix

A Position Specific Scoring Matrix (PSSM) is an M × 20 matrix X = {X_ij : i = 1 ⋯ M, j = 1 ⋯ 20} for a given protein, where M is the length of the protein sequence and the 20 columns correspond to the 20 amino acids [28-33]. The score X_ij is allocated to the jth amino acid at the ith position of the given protein sequence and is expressed as

X_ij = Σ_{k=1}^{20} p(i, k) × q(j, k),

where p(i, k) is the frequency of the kth amino acid at position i of the probes relative to the total number of probes and q(j, k) is the value of Dayhoff's mutation matrix [34] between the jth and kth amino acids [35-37]. As a result, a high score represents a strongly conserved position and a low score represents a weakly conserved position [38-40]. PSSMs have been used to predict protein folding patterns, protein quaternary structural attributes, and disulfide connectivity [41, 42]; here, we use them to predict PPIs. In this paper, we used Position Specific Iterated BLAST (PSI-BLAST) [43] to create the PSSM of each protein sequence. The e-value parameter was set to 0.001, and three iterations were selected to obtain broadly and highly homologous sequences. Each resulting PSSM is an L × 20 matrix, where L is the total number of residues in the protein; the rows of the matrix represent the protein residues, and the columns represent the 20 amino acids.
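The scoring formula above is just a matrix product. A minimal sketch, assuming `p` is a precomputed position-frequency profile and `q` a 20 × 20 substitution matrix (Dayhoff's mutation matrix in the paper; random values here for illustration):

```python
import numpy as np

def pssm_from_profile(p, q):
    """Score a PSSM from a position-frequency profile.

    p: (L, 20) array, p[i, k] = frequency of amino acid k at position i
       (relative to the total number of aligned probes).
    q: (20, 20) substitution matrix.
    Returns X with X[i, j] = sum_k p[i, k] * q[j, k].
    """
    return p @ q.T

L = 4
rng = np.random.default_rng(0)
p = rng.random((L, 20))
p /= p.sum(axis=1, keepdims=True)   # rows are frequency distributions
q = rng.standard_normal((20, 20))   # stand-in for Dayhoff's mutation matrix
X = pssm_from_profile(p, q)
```

In practice the PSSM is produced directly by PSI-BLAST (three iterations, e-value 0.001, as above) rather than computed by hand.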

2.3. Local Phase Quantization

Local Phase Quantization (LPQ) has been described in detail in the literature [44]. The LPQ method is based on the blur-invariance property of the Fourier phase spectrum [45-47]. It is an operator originally used to describe spatially blurred textures in images. Spatially invariant blurring of an original image f(x), observed as g(x), can be expressed as a convolution,

g(x) = (f ∗ h)(x),

where h(x) is the point spread function of the blur, ∗ denotes two-dimensional convolution, and x is the coordinate vector [x, y]^T. In the Fourier domain, this amounts to

G(u) = F(u) · H(u),

where G(u), F(u), and H(u) are the discrete Fourier transforms (DFT) of the blurred image g(x), the original image f(x), and h(x), respectively, and u is the frequency vector [u, v]^T. Separating magnitude and phase, the phase relation is

∠G(u) = ∠F(u) + ∠H(u).

When the point spread function h(x) is centrally symmetric, meaning h(x) = h(−x), its Fourier transform H(u) is always real valued. As a result, its phase is a two-valued function,

∠H(u) = 0 if H(u) ≥ 0, and ∠H(u) = π otherwise,

which means that

∠G(u) = ∠F(u) whenever H(u) ≥ 0.

The shape of the point spread function h(x) is typically similar to a Gaussian or a sinc function, which ensures that H(u) ≥ 0 at low frequencies; at those frequencies the phase ∠G(u) = ∠F(u) is blur invariant. In LPQ, the local phase information is extracted using the two-dimensional DFT: a short-term Fourier transform (STFT) is computed over a rectangular M × M neighborhood N_x at each pixel position x of the image f(x),

F(u, x) = Σ_{y ∈ N_x} f(x − y) e^{−j2π u^T y} = w_u^T f_x,

where w_u is the basis vector of the two-dimensional DFT at frequency u and f_x is a vector containing all M² image samples from N_x. The STFT coefficients are computed at four frequencies: u_1 = [a, 0]^T, u_2 = [0, a]^T, u_3 = [a, a]^T, and u_4 = [a, −a]^T, where a is a small enough scalar to satisfy H(u_i) ≥ 0.
As a result, each pixel position can be expressed as a vector

F_x = [Re{F(u_1, x)}, Re{F(u_2, x)}, Re{F(u_3, x)}, Re{F(u_4, x)}, Im{F(u_1, x)}, Im{F(u_2, x)}, Im{F(u_3, x)}, Im{F(u_4, x)}]^T.

The resulting vector is then quantized with a simple scalar quantizer,

q_j(x) = 1 if g_j(x) ≥ 0, and 0 otherwise,

where g_j(x) is the jth component of F_x. After quantization, F_x becomes an eight-bit binary vector; the jth component is assigned the weight 2^{j−1}, so the quantized coefficients are represented as integer values between 0 and 255 using binary coding:

b(x) = Σ_{j=1}^{8} q_j(x) 2^{j−1}.

Finally, a histogram of these integer values over all image positions is composed and used as a 256-dimensional feature vector for classification. In this paper, the PSSM of each protein from the Yeast and Human datasets was converted to a 256-dimensional feature vector using this LPQ method.
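The LPQ pipeline above (STFT at four low frequencies, sign quantization, 256-bin histogram) can be sketched in NumPy. The window size, the choice a = 1/M, and the use of separable 'valid' convolutions are assumptions of this sketch, not details fixed by the paper:

```python
import numpy as np

def lpq_descriptor(img, win=3, a=None):
    """256-bin LPQ histogram of a 2-D array (e.g. a PSSM treated as an image)."""
    if a is None:
        a = 1.0 / win
    x = np.arange(win) - win // 2            # neighborhood coordinates
    # 1-D basis vectors of the 2-D DFT at frequencies 0 and a (separable filters)
    w0 = np.ones(win, dtype=complex)
    w1 = np.exp(-2j * np.pi * a * x)

    def conv_sep(image, wc, wr):
        # filter columns with wc, then rows with wr (both 'valid')
        tmp = np.apply_along_axis(lambda c: np.convolve(c, wc, mode='valid'), 0, image)
        return np.apply_along_axis(lambda r: np.convolve(r, wr, mode='valid'), 1, tmp)

    F = [conv_sep(img, w1, w0),              # u1 = [a, 0]
         conv_sep(img, w0, w1),              # u2 = [0, a]
         conv_sep(img, w1, w1),              # u3 = [a, a]
         conv_sep(img, w1, np.conj(w1))]     # u4 = [a, -a]
    # 8 components: real and imaginary parts of the four STFT coefficients
    comps = [f.real for f in F] + [f.imag for f in F]
    code = np.zeros(comps[0].shape, dtype=int)
    for j, g in enumerate(comps):
        code += (g >= 0).astype(int) << j    # sign quantizer, weight 2^j
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist

rng = np.random.default_rng(1)
h = lpq_descriptor(rng.standard_normal((40, 20)))   # toy 40-residue "PSSM"
```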

2.4. Principal Component Analysis

Principal Component Analysis (PCA) is widely used to process data and reduce the dimensionality of datasets: high-dimensional data are projected onto a low-dimensional subspace while the main information is retained. The basic principle of PCA is as follows. A multivariate dataset can be expressed as an N × s matrix X, where s is the number of variables and N is the number of samples of each variable. PCA is closely related to the singular value decomposition (SVD) of the matrix, which can be written as

X = Σ_i a_i b_i c_i^T,

where the c_i are the eigenvectors of X^T X, the b_i are the eigenvectors of X X^T, and the a_i are the singular values. If there are m linear relationships among the s variables, then m singular values are zero. Any row x(t) of X can be expressed in terms of the eigenvectors (q_1, q_2, …, q_s):

x(t) = Σ_{i=1}^{s} r_i(t) q_i^T,

where r_i(t) = x(t) q_i is the projection of x(t) onto q_i; the eigenvectors (q_1, q_2, …, q_s) are the loading vectors, and the r_i(t) are the scores. When there is a certain degree of linear correlation among the variables of the matrix, the projections onto the last several loading vectors of X are small enough to result mainly from measurement noise. As a result, the principal decomposition of X keeps only the first k terms,

X = Σ_{i=1}^{k} r_i q_i^T + E,

where E is the error matrix and can be ignored without an obvious loss of useful information. In this paper, to reduce the influence of noise and improve the prediction accuracy, we used PCA to reduce the dimensionality of the Yeast feature vectors from 256 to 180 and of the Human feature vectors from 256 to 172.
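The truncated decomposition above amounts to projecting the centered data onto the leading right singular vectors. A minimal sketch (the sample count and random data are illustrative):

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components via SVD.

    X: (N, d) data matrix, e.g. N samples of 256-dimensional LPQ vectors;
    k = 180 (Yeast) or 172 (Human) in the paper.
    """
    Xc = X - X.mean(axis=0)                        # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # scores on the top-k loading vectors

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 256))                # illustrative random data
Z = pca_reduce(X, 180)
```

Because the singular values are sorted in decreasing order, the discarded components are those with the smallest variance, which is what justifies ignoring the error matrix E.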

2.5. Relevance Vector Machine

The characteristics of the Relevance Vector Machine (RVM) have been described in detail in the literature [48]. For a binary classification problem, assume that the training set is {x_n, t_n}, n = 1 ⋯ N, where x_n ∈ R^d is a training sample and t_n ∈ {0, 1} its label, and let t_* denote the label of a test sample. The model predicts t_n = y(x_n) + ε_n, where

y(x_n) = w^T φ(x_n) = Σ_{i=1}^{N} w_i K(x_n, x_i) + w_0

is the classification prediction model and ε_n is additive noise with zero mean and variance σ², so that ε_n ~ N(0, σ²) and t_n ~ N(y(x_n), σ²). Assuming the training samples are independent and identically distributed, the observation vector t obeys the distribution [49-51]

p(t | w, σ²) = (2πσ²)^{−N/2} exp(−‖t − Φw‖² / (2σ²)),

where the design matrix is Φ = [φ(x_1), φ(x_2), …, φ(x_N)]^T with φ(x_n) = [1, K(x_n, x_1), …, K(x_n, x_N)]^T. The RVM uses the training labels t to predict the test label t_*:

p(t_* | t) = ∫ p(t_* | w, σ²) p(w, σ² | t) dw dσ².

To drive most components of the weight vector w to zero and reduce the computational work of the kernel function, w is subjected to an additional constraint: each weight w_i is assumed to obey a zero-mean Gaussian prior with variance α_i^{−1}, that is, w_i ~ N(0, α_i^{−1}) and

p(w | α) = Π_i p(w_i | α_i),

where α is a vector of hyperparameters governing the prior distribution of w. Because p(w, α, σ² | t) cannot be obtained by direct integration, it is decomposed using Bayes' formula:

p(w, α, σ² | t) = p(w | t, α, σ²) p(α, σ² | t).

The integral of the product of p(t | w, σ²) and p(w | α) yields the weight posterior p(w | t, α, σ²) = N(μ, Σ) with

Σ = (σ^{−2} Φ^T Φ + A)^{−1},  μ = σ^{−2} Σ Φ^T t,  A = diag(α_0, …, α_N).

Because p(α, σ² | t) ∝ p(t | α, σ²) p(α) p(σ²) cannot be solved by integration either, the solution is approximated using the maximum likelihood method, maximizing the marginal likelihood p(t | α, σ²). The iterative re-estimation of α and σ² is

α_i^{new} = γ_i / μ_i²,  γ_i = 1 − α_i Σ_{ii},  (σ²)^{new} = ‖t − Φμ‖² / (N − Σ_i γ_i),

where Σ_{ii} is the ith diagonal element of Σ. Starting from initial values of α and σ², these formulas are applied repeatedly; after enough iterations, most α_i approach infinity, the corresponding weights w_i become zero, and the remaining α_i converge to finite values.
The training samples x_i associated with the remaining nonzero weights are called the relevance vectors.
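The re-estimation loop above can be sketched for the regression form of the RVM (the paper's classification variant adds a sigmoid link, omitted here for brevity). A toy example in which only two of ten basis functions are relevant, so most α_i diverge and their weights are pruned:

```python
import numpy as np

def rvm_fit(Phi, t, n_iter=50, prune=1e6):
    """Sparse Bayesian (RVM) hyperparameter re-estimation, regression form.

    Phi: (N, M) design matrix, t: (N,) targets. Returns the posterior mean
    weights (zeros for pruned basis functions), the alphas, and the noise
    variance. Initialization and the pruning threshold are assumptions.
    """
    N, M = Phi.shape
    a = np.ones(M)                               # hyperparameters alpha_i
    s2 = 0.1 * float(np.var(t)) + 1e-6           # initial noise variance
    keep = a < prune
    for _ in range(n_iter):
        keep = a < prune                         # prune weights driven to zero
        A = np.diag(a[keep])
        P = Phi[:, keep]
        Sigma = np.linalg.inv(P.T @ P / s2 + A)  # posterior covariance
        mu = Sigma @ P.T @ t / s2                # posterior mean
        gamma = 1.0 - a[keep] * np.diag(Sigma)   # well-determinedness factors
        a[keep] = gamma / (mu ** 2 + 1e-12)      # alpha_i <- gamma_i / mu_i^2
        s2 = float(np.sum((t - P @ mu) ** 2)) / max(N - gamma.sum(), 1e-6)
    mu_full = np.zeros(M)
    mu_full[keep] = mu
    return mu_full, a, s2

# Toy problem: t depends on only 2 of 10 candidate basis functions
rng = np.random.default_rng(3)
Phi = rng.standard_normal((200, 10))
t = 2.0 * Phi[:, 1] - 1.5 * Phi[:, 7] + 0.05 * rng.standard_normal(200)
mu, a, s2 = rvm_fit(Phi, t)
```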

2.6. Procedure of the Proposed Method

In this paper, the proposed method consists of three steps: feature extraction, dimensionality reduction using PCA, and sample classification. Feature extraction itself has two steps: (1) each protein in the datasets is represented as a PSSM and (2) the PSSM of each protein is converted into a 256-dimensional vector using the LPQ method. Dimensionality reduction of the original feature vectors is then achieved using the PCA method. Finally, sample classification is carried out in two ways: (1) the RVM model is used to classify the feature-extracted Yeast and Human datasets and (2) the SVM model is employed on the Yeast dataset for comparison. The flow chart of the proposed method is displayed in Figure 1.
Figure 1

The flow chart of the proposed method.

2.7. Performance Evaluation

To evaluate the feasibility and efficiency of the proposed method, five parameters were computed: the prediction accuracy (Ac), sensitivity (Sn), specificity (Sp), precision (Pe), and Matthews correlation coefficient (MCC):

Ac = (TP + TN) / (TP + TN + FP + FN),
Sn = TP / (TP + FN),
Sp = TN / (TN + FP),
Pe = TP / (TP + FP),
MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)),

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. True positives are interacting pairs correctly predicted as interacting; true negatives are noninteracting pairs correctly predicted as noninteracting; false positives are noninteracting pairs falsely predicted to be interacting; and false negatives are interacting pairs falsely predicted to be noninteracting. Moreover, a Receiver Operating Characteristic (ROC) curve was created to evaluate the performance of the proposed method.
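The five evaluation measures can be computed directly from the confusion-matrix counts; the counts in the usage line are illustrative:

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Ac, Sn, Sp, Pe, and MCC from the confusion-matrix counts."""
    ac = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    pe = tp / (tp + fp)
    mcc = (tp * tn - fp * fn) / (
        ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5)
    return ac, sn, sp, pe, mcc

ac, sn, sp, pe, mcc = evaluation_metrics(tp=90, tn=85, fp=15, fn=10)
```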

3. Results and Discussion

3.1. Performance of the Proposed Method

To avoid overfitting in the prediction model and to test the reliability of our proposed method, we used 5-fold cross-validation in our experiment. More specifically, the whole dataset was divided into five parts; four parts were employed for training the model, and one part was used for testing. Five models were obtained from each of the Yeast and Human datasets using this scheme, and each model was evaluated independently. To ensure fairness, the related parameters of the RVM model were set identically for the two datasets. The Gaussian function was selected as the kernel function with the following parameters: width = 0.6, initAlpha = 1/N², and beta = 0, where width is the width of the kernel function, N is the number of training samples, and beta = 0 indicates classification. The experimental results of the prediction models (the RVM classifier combined with Local Phase Quantization, the Position Specific Scoring Matrix, and Principal Component Analysis) on the two datasets are listed in Tables 1 and 2.
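The 5-fold split described above can be sketched as follows; the shuffling seed and the use of index arrays are assumptions of this sketch:

```python
import numpy as np

def five_fold_indices(n, seed=0):
    """Yield five (train, test) index splits of n samples, as in Section 3.1."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, 5)
    for i in range(5):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(5) if j != i])
        yield train, test

splits = list(five_fold_indices(100))   # toy dataset of 100 samples
```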
Table 1

5-fold cross-validation results obtained using the proposed method on the Yeast dataset.

Testing set | Ac (%) | Sn (%) | Pe (%) | MCC (%)
1 | 92.76 | 92.73 | 92.79 | 86.56
2 | 93.79 | 93.27 | 93.41 | 88.34
3 | 91.28 | 92.12 | 90.43 | 84.08
4 | 92.27 | 92.02 | 92.50 | 85.72
5 | 93.17 | 93.02 | 93.32 | 87.27
Average | 92.65 ± 0.95 | 92.63 ± 0.55 | 92.67 ± 1.40 | 86.40 ± 1.61
Table 2

5-fold cross-validation results obtained using the proposed method on the Human dataset.

Testing set | Ac (%) | Sn (%) | Pe (%) | MCC (%)
1 | 98.10 | 98.99 | 97.25 | 96.27
2 | 97.67 | 99.49 | 96.02 | 95.45
3 | 97.37 | 99.25 | 95.55 | 94.87
4 | 97.24 | 98.96 | 95.72 | 94.63
5 | 99.26 | 99.22 | 99.31 | 98.54
Average | 97.92 ± 0.81 | 99.18 ± 0.21 | 96.77 ± 1.57 | 95.95 ± 1.58
Using the proposed method on the Yeast dataset, we achieved average accuracy, sensitivity, precision, and MCC of 92.65%, 92.63%, 92.67%, and 86.40%, respectively; the standard deviations of these criteria were 0.95%, 0.55%, 1.40%, and 1.61%. Similarly, we obtained good results of average accuracy, sensitivity, precision, and MCC of 97.92%, 99.18%, 96.77%, and 95.95% on the Human dataset, with standard deviations of 0.81%, 0.21%, 1.57%, and 1.58%, respectively. It can be seen from Tables 1 and 2 that the proposed method is accurate, robust, and effective for predicting PPIs. The good performance can be attributed mainly to the feature extraction scheme, which is novel and effective, and to the choice of classifier. The proposed feature extraction method contains three data processing steps. First, the PSSM not only describes the order information of the protein sequence but also retains sufficient prior information, which is why it is widely used in other proteomics research; we therefore converted each protein sequence to a PSSM that contains the useful information of the sequence. Second, because Local Phase Quantization has the advantage of blur invariance in image feature extraction, information can be effectively captured from the PSSMs using the LPQ method. Finally, while maintaining the integrity of the information in the PSSM, we reduced the dimensions of each LPQ vector and the influence of noise using Principal Component Analysis. Consequently, the sample information extracted by the proposed feature extraction method is well suited to predicting PPIs.

3.2. Comparison with the SVM-Based Method

Although our proposed method achieved reasonably good results on the Yeast and Human datasets, its performance must be further validated against the state-of-the-art support vector machine (SVM) classifier. More specifically, we compared the classification performance of the SVM and RVM models on the Yeast dataset using the same feature extraction method. The LIBSVM tool (available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/) was employed to carry out the SVM classification. The two corresponding parameters of the SVM, c and g, were optimized using a grid search method; in the experiment, we set c = 0.7 and g = 0.6 and used a radial basis function as the kernel. The prediction results of the SVM and RVM methods on the Yeast dataset are shown in Table 3, and the ROC curves are displayed in Figure 2. From Table 3, the SVM method achieved 85.34% average accuracy, 84.40% average sensitivity, 86.89% average specificity, and 74.97% average MCC, while the RVM method achieved 92.65% average accuracy, 92.63% average sensitivity, 92.67% average specificity, and 86.40% average MCC. From these results, we can see that the RVM classifier performs significantly better than the SVM classifier. In addition, the ROC curves in Figure 2 show that the ROC curve of the RVM classifier is clearly better than that of the SVM classifier. This demonstrates that the RVM classifier is an accurate and robust classifier for predicting PPIs. The improved classification performance of the RVM classifier over the SVM classifier can be explained by two reasons: (1) the computational work of the kernel function is greatly reduced in the RVM and (2) the RVM does not require the kernel function to satisfy Mercer's condition. At the same time, these results confirm that the proposed method can yield highly accurate PPI predictions.
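The grid search over (c, g) mentioned above can be sketched generically; `train_and_score` is a hypothetical stand-in for training an RBF-kernel SVM and returning its cross-validated accuracy, and the toy score surface is constructed so that its optimum falls at the paper's chosen c = 0.7, g = 0.6:

```python
import numpy as np

def grid_search(train_and_score, c_grid, g_grid):
    """Pick (c, g) maximizing accuracy, as done for LIBSVM in the paper.

    train_and_score(c, g) -> accuracy is assumed; here it would wrap
    training an RBF-kernel SVM and 5-fold cross-validating it.
    """
    best = (None, None, -1.0)
    for c in c_grid:
        for g in g_grid:
            acc = train_and_score(c, g)
            if acc > best[2]:
                best = (c, g, acc)
    return best

# Toy score surface peaking near c = 0.7, g = 0.6 (illustrative only)
score = lambda c, g: 1.0 - (c - 0.7) ** 2 - (g - 0.6) ** 2
c, g, acc = grid_search(score, np.arange(0.1, 1.01, 0.1), np.arange(0.1, 1.01, 0.1))
```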
Table 3

5-fold cross-validation results of the SVM- and RVM-based methods on the Yeast dataset.

Testing set | Ac (%) | Sn (%) | Sp (%) | MCC (%)

SVM + PSSM + LPQ
1 | 85.96 | 84.77 | 87.13 | 75.86
2 | 84.18 | 82.86 | 85.43 | 73.33
3 | 85.52 | 84.10 | 86.97 | 75.22
4 | 85.29 | 84.12 | 86.47 | 74.91
5 | 85.76 | 86.16 | 88.45 | 75.55
Average | 85.34 ± 0.69 | 84.40 ± 1.20 | 86.89 ± 1.09 | 74.97 ± 0.98

RVM + PSSM + LPQ
1 | 92.76 | 92.73 | 92.79 | 86.56
2 | 93.79 | 93.27 | 93.41 | 88.34
3 | 91.28 | 92.12 | 90.43 | 84.08
4 | 92.27 | 92.02 | 92.50 | 85.72
5 | 93.17 | 93.02 | 93.32 | 87.27
Average | 92.65 ± 0.95 | 92.63 ± 0.55 | 92.67 ± 1.40 | 86.40 ± 1.61
Figure 2

Comparison of ROC curves performed between RVM and SVM on the Yeast dataset.

3.3. Comparison with Other Methods

In addition, a number of PPI prediction methods based on protein sequences have been proposed. To demonstrate the effectiveness of our proposed method, which uses an RVM model combined with a Position Specific Scoring Matrix, Local Phase Quantization, and Principal Component Analysis, we compared its prediction ability with existing methods on the Yeast and Human datasets. It can be seen from Table 4 that the average prediction accuracy of the five different methods lies between 75.08% and 89.33% for the Yeast dataset, all lower than the 92.65% of the proposed method. Similarly, the precision and sensitivity of our proposed method are also superior to those of the other methods. Table 5 compares the average prediction accuracy of six different methods with that of the proposed method on the Human dataset: the accuracies yielded by the other methods lie between 89.3% and 96.4%, and none of them is higher than ours. From Tables 4 and 5, it can be observed that the proposed method yields clearly better prediction results than the other existing methods, including those based on ensemble classifiers. All of these results show that the RVM classifier combined with Local Phase Quantization, the Position Specific Scoring Matrix, and Principal Component Analysis improves the prediction accuracy relative to current state-of-the-art methods. The improvement comes from using an appropriate classifier and a feature extraction method that captures useful evolutionary information.
Table 4

Predicting ability of different methods on the Yeast dataset.

Model | Testing set | Ac (%) | Sn (%) | Pe (%) | MCC (%)
Guo et al.'s work [20] | ACC | 89.33 ± 2.67 | 89.93 ± 3.60 | 88.77 ± 6.16 | N/A
Guo et al.'s work [20] | AC | 87.36 ± 1.38 | 87.30 ± 4.68 | 87.82 ± 4.33 | N/A
Zhou et al.'s work [25] | SVM + LD | 88.56 ± 0.33 | 87.37 ± 0.22 | 89.50 ± 0.60 | 77.15 ± 0.68
Yang et al.'s work [26] | Cod1 | 75.08 ± 1.13 | 75.81 ± 1.20 | 74.75 ± 1.23 | N/A
Yang et al.'s work [26] | Cod2 | 80.04 ± 1.06 | 76.77 ± 0.69 | 82.17 ± 1.35 | N/A
Yang et al.'s work [26] | Cod3 | 80.41 ± 0.47 | 78.14 ± 0.90 | 81.66 ± 0.99 | N/A
Yang et al.'s work [26] | Cod4 | 86.15 ± 1.17 | 81.03 ± 1.74 | 90.24 ± 1.34 | N/A
You et al.'s work [27] | PCA-EELM | 87.00 ± 0.29 | 86.15 ± 0.43 | 87.59 ± 0.32 | 77.36 ± 0.44
The proposed method | RVM | 92.65 ± 0.95 | 92.63 ± 0.55 | 92.67 ± 1.40 | 86.40 ± 1.61
Table 5

Predicting ability of different methods on the Human dataset.

Model | Ac (%) | Sn (%) | Pe (%) | MCC (%)
LDA + RF [28] | 96.4 | 94.2 | N/A | 92.8
LDA + RoF [28] | 95.7 | 97.6 | N/A | 91.8
LDA + SVM [28] | 90.7 | 89.7 | N/A | 81.3
AC + RF [28] | 95.5 | 94.0 | N/A | 91.4
AC + RoF [28] | 95.1 | 93.3 | N/A | 91.0
AC + SVM [28] | 89.3 | 94.0 | N/A | 79.2
The proposed method | 97.92 | 99.18 | 96.77 | 95.95

4. Conclusion

Knowledge of PPIs is becoming increasingly important, which has prompted the development of computational prediction methods. Although many approaches have been developed to solve this problem, the effectiveness and robustness of previous prediction models can still be improved. In this study, we explored a novel method using an RVM classifier combined with Local Phase Quantization and a Position Specific Scoring Matrix. The experimental results show that the prediction accuracy of the proposed method is clearly higher than that of previous methods, making it a promising support tool for future proteomics research. The main improvement comes from adopting an effective feature extraction method that captures useful evolutionary information. Moreover, the results show that PCA significantly improves the prediction accuracy by integrating the useful information and reducing the influence of noise, and that the RVM model is well suited to predicting PPIs. In conclusion, the proposed method is an efficient, reliable, and powerful prediction model and can be a useful tool for future proteomics research.