
ACP-DL: A Deep Learning Long Short-Term Memory Model to Predict Anticancer Peptides Using High-Efficiency Feature Representation.

Hai-Cheng Yi¹, Zhu-Hong You², Xi Zhou³, Li Cheng³, Xiao Li³, Tong-Hai Jiang³, Zhan-Heng Chen³.

Abstract

Cancer is a well-known killer of human beings, which has led to countless deaths and misery. Anticancer peptides open a promising perspective for cancer treatment, and they have various attractive advantages. Conventional wet experiments are expensive and inefficient for finding and identifying novel anticancer peptides. There is an urgent need to develop a novel computational method to predict novel anticancer peptides. In this study, we propose a deep learning long short-term memory (LSTM) neural network model, ACP-DL, to effectively predict novel anticancer peptides. More specifically, to fully exploit peptide sequence information, we developed an efficient feature representation approach by integrating the binary profile feature and the k-mer sparse matrix of the reduced amino acid alphabet. Then we implemented a deep LSTM model to automatically learn how to distinguish anticancer peptides from non-anticancer peptides. To our knowledge, this is the first time that the deep LSTM model has been applied to predict anticancer peptides. Cross-validation experiments demonstrated that the proposed ACP-DL remarkably outperformed other comparison methods, with high accuracy and satisfactory specificity on benchmark datasets. In addition, we also contributed two new anticancer peptide benchmark datasets, ACP740 and ACP240, in this work. The source code and datasets are available at https://github.com/haichengyi/ACP-DL.


Keywords: anticancer peptides; binary profile feature; deep learning; k-mer sparse matrix; long short-term memory

Year: 2019 PMID: 31173946 PMCID: PMC6554234 DOI: 10.1016/j.omtn.2019.04.025

Source DB: PubMed Journal: Mol Ther Nucleic Acids ISSN: 2162-2531 Impact factor: 8.886

Introduction

Cancer is one of the most devastating killers of human beings, accounting for millions of deaths around the world each year.1, 2 Conventional physical and chemical methods, including targeted therapy, chemotherapy, and radiation therapy, remain the principal modes of cancer treatment; they focus on killing the diseased cells, but normal cells are also adversely affected.3, 4 Moreover, these treatments are expensive and inefficient, which means there is an urgent need to develop novel, efficient measures against this deadly disease. The discovery of anticancer peptides (ACPs), short peptides that are generally fewer than 50 amino acids long, are mostly derived from antimicrobial peptides (AMPs), and are often cationic in nature, has led to the emergence of a novel alternative therapy to treat cancer. ACPs open a promising perspective for cancer treatment, and they have various attractive advantages,6, 7 including high specificity, ease of synthesis and modification, low production cost, and so on. ACPs can interact with the anionic cell membrane components of cancer cells only, and, for this reason, they can selectively kill cancer cells with almost no harmful effect on normal cells.4, 9 In addition, a few ACPs, e.g., cell-penetrating peptides or peptide drugs, inhibit the cell cycle or other cellular functions. Thus, they are safer than traditional broad-spectrum drugs and have become the most competitive choice as therapeutics compared to small molecules and antibodies. In recent years, ACP therapeutics have been extensively explored and used to fight various tumor types across different phases of preclinical and clinical trials.10, 11, 12, 13, 14 However, only a few of them can eventually be employed for clinical treatment. Furthermore, identifying potential new ACPs by experiment is time-consuming, expensive, and limited by laboratory capacity. Given the huge therapeutic importance of ACPs, there is an urgent need to develop highly efficient prediction techniques.
Some notable research has been reported in the prediction of ACPs. Tyagi et al. developed a support vector machine (SVM) model using amino acid composition (AAC) and dipeptide composition as input features on experimentally confirmed anticancer peptides and random peptides derived from the Swiss-Prot database. Hajisharifi et al. also reported an SVM model using Chou’s18, 19 pseudo AAC (PseAAC) and the local alignment kernel-based method. Vijayakumar and Ptv reported that no significant difference in AAC was observed between ACPs and non-ACPs. They also presented a novel encoding measure that considers both compositional information and centroidal, distributional measures of amino acids, and that achieved better predictive performance than AAC-based features. Shortly afterward, Chen et al. described iACP, which is based on the optimal g-gap dipeptide components and explores the correlation between long-range residues and sequence-order effects; it exhibited the best predictive performance at that time. More recently, Wei et al. developed a sequence-based predictor called ACPred-FL, which uses two-step feature selection and seven different feature representation methods. Because ACPs are short, it is difficult to extract effective features with many of the mature feature representation methods that are widely used on protein sequences. Despite the rapid growth in the number of ACPs identified experimentally, by machine learning, and by bioinformatics research,24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 the computational prediction methods for ACPs still need further development. In this study, we proposed a deep learning long short-term memory (LSTM) neural network model to predict anticancer peptides, which we named ACP-DL. The efficient features extracted from peptide sequences are fed as input to train the LSTM model.
More specifically, peptide sequences are transformed by the k-mer sparse matrix of the reduced amino acid alphabet,41, 42 which is a 2D matrix that retains almost complete sequence order and amino acid composition details. Meanwhile, peptide sequences are also converted by a binary profile feature, which can be regarded as one-hot encoding of categorical variables and has been suggested to be an efficient feature extraction technique.16, 22 Finally, these features are fed into our LSTM model to predict new anticancer peptides. To further assess the performance of our model, we evaluated ACP-DL on two novel benchmark datasets. We also compared the proposed ACP-DL with existing state-of-the-art machine-learning models, e.g., SVM,44, 45 Random Forest (RF), and Naive Bayes (NB). The 5-fold cross-validation experimental results showed that our method is suitable for the anticancer peptide prediction task, with notable prediction performance. The workflow of ACP-DL is shown in Figure 1.

Figure 1

The Flowchart of Our ACP-DL Method

We used the k-mer sparse matrix and binary profile feature to represent peptide sequences, and the deep LSTM model is trained to predict anticancer peptides.


Results and Discussion

First of all, we compared the distributions of amino acids in anticancer peptides, non-anticancer peptides, and all peptides in datasets ACP740 and ACP240. The results for ACP740 are shown in Figure 2; the composition of all 20 amino acids in these peptides was counted and compared. Certain residues, including Cys, Phe, Gly, His, Ile, Asn, Ser, and Tyr, were found to be abundant in anticancer peptides compared to non-anticancer peptides, while Glu, Leu, Met, Gln, Arg, and Trp were abundant in non-anticancer peptides compared to anticancer peptides. Similarly, as shown in Figure 3, in dataset ACP240, Phe, His, Ile, and Lys were abundant in anticancer peptides. Terminal residues play essential roles in the biological functions of peptides.

Figure 2

Comparison of Amino Acid Composition of Anticancer, Non-anticancer, and Total Peptides in Dataset ACP740

Figure 3

Comparison of Amino Acid Composition of Anticancer, Non-anticancer, and Total Peptides in Dataset ACP240


Evaluation of ACP-DL’s Capability to Predict Anticancer Peptides

First, we ran our model ACP-DL on the ACP740 and ACP240 datasets to evaluate its ability to predict anticancer peptides. The 5-fold cross-validation details are given in Tables 1 and 2.

Table 1

The 5-Fold Cross-Validation Details in the ACP740 Dataset

Fold Set	Acc (%)	Sens (%)	Spec (%)	Prec (%)	MCC (%)
1	79.73	81.94	77.63	81.94	59.58
2	83.11	85.71	80.00	86.30	66.39
3	81.08	79.75	84.00	78.08	62.22
4	85.81	86.49	85.33	86.30	71.63
5	77.70	79.17	76.00	79.45	55.47
Average	81.48 ± 3.12	82.61 ± 3.36	80.59 ± 4.01	82.41 ± 3.81	63.05 ± 6.23

Table 2

The 5-Fold Cross-Validation Details in the ACP240 Dataset

Fold Set	Acc (%)	Sens (%)	Spec (%)	Prec (%)	MCC (%)
1	93.75	89.66	99.99	86.36	87.99
2	81.25	77.42	92.31	68.18	63.02
3	87.50	88.46	88.46	86.36	74.83
4	83.33	90.91	76.92	90.91	67.83
5	81.25	76.67	92.00	69.57	63.53
Average	85.42	84.62	89.94	80.28	71.44

The average accuracy of 5-fold cross-validation on ACP740 was 81.48% with 3.12% SD, the average sensitivity (Sens) was 82.61% with 3.36% SD, the average specificity (Spec) was 80.59% with 4.01% SD, the mean precision (Prec) was 82.41% with 3.81% SD, and the Matthews correlation coefficient (MCC) was 63.05% with 6.23% SD. ACP-DL showed an outstanding capability to identify anticancer peptides, achieving an area under the receiver operating characteristic (ROC) curve (AUC) of 0.894, as shown in Figure 4A, which was the best performance on the ACP740 dataset among all compared methods.
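As a rough check of how the averages and SDs above relate to the per-fold values, the following sketch recomputes the ACP740 accuracy summary from Table 1. It assumes the SD is the sample standard deviation (n − 1 in the denominator), which is what reproduces the reported 3.12; the helper name `summarize` is hypothetical. Because the fold values in the table are already rounded, the recomputed mean can differ from the reported one in the last digit.

```python
from statistics import mean, stdev

# Per-fold accuracy (%) on ACP740, copied from Table 1.
acc_acp740 = [79.73, 83.11, 81.08, 85.81, 77.70]


def summarize(values):
    """Return (mean, sample standard deviation) of the per-fold values."""
    return mean(values), stdev(values)


avg, sd = summarize(acc_acp740)
print(f"Acc = {avg:.2f} +/- {sd:.2f}")
```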

Figure 4

Performance of the Proposed Model ACP-DL and Comparison Model on Datasets ACP740 and ACP240

(A) The performance of the proposed model ACP-DL in dataset ACP740. (B) The performance of the comparison models in dataset ACP740, including SVM, RF, and NB. (C) The performance of the proposed model ACP-DL in dataset ACP240. (D) The performance of the comparison models in dataset ACP240, including SVM, RF, and NB.

The mean accuracy of 5-fold cross-validation on ACP240 was 85.42%, the average Sens was 84.62%, the average Spec was 89.94%, the mean Prec was 80.28%, and the MCC was 71.44%; the AUC of ACP-DL was 0.906, as shown in Figure 4C. In general, the performance of a deep learning model improves as the scale of the data increases, and a model that achieves good results on smaller datasets can be expected to achieve good results on larger ones. The data scale of anticancer peptides is not very large, so we did not implement a neural network model with a very complex architecture; and, to a certain extent, 5-fold cross-validation is not favorable to a neural network model, because it further reduces the amount of training data. It is noteworthy that, although the dataset ACP240 was smaller than ACP740, our model ACP-DL still performed very well. The experimental results of rigorous cross-validation on benchmark dataset ACP740 and dataset ACP240 confirmed that our model has a good capability to predict anticancer peptides.

Comparison with Three Widely Used Machine-Learning Models

To evaluate the ability of the proposed method, we further compared ACP-DL with other widely used machine-learning models on the same benchmark datasets, ACP740 and ACP240. Here we selected the SVM, RF, and NB models, and we built them using the same cross-validation datasets. The implementations of these three machine-learning models come from Scikit-learn, and they were tested with default parameters. All methods were evaluated using the same evaluation criteria; the results of the comparison models and the deep learning model ACP-DL are shown in Table 3 and Figures 4 and 5. ACP-DL obtained the best overall performance among the compared methods.

Table 3

Actual Performance of Comparison Models and ACP-DL in the Same Dataset

Dataset	Model	Acc (%)	Sens (%)	Spec (%)	Prec (%)	MCC (%)	AUC
ACP740	SVM	64.59	62.43	90.68a	37.57	33.57	0.829
	RF	76.35	75.10	80.34	72.27	53.06	0.842
	NB	69.73	84.70a	49.21	90.94a	43.98	0.825
	ACP-DL	81.48a	82.61	80.59	82.41	63.05a	0.894a
ACP240	SVM	77.50	85.89a	70.68	85.65a	57.31	0.855
	RF	72.08	73.53	76.09	67.63	44.38	0.793
	NB	72.50	72.26	79.94	63.95	45.44	0.719
	ACP-DL	85.42a	84.62	89.94a	80.28	71.44a	0.906a

a: This measure of performance is the best among the compared methods.

Figure 5

Comparison of SVM, Random Forest, Naive Bayes, and ACP-DL in Benchmark Datasets ACP740 and ACP240

Table 3 shows the details of the comparison. In the ACP740 dataset, our method ACP-DL outperformed the other methods overall, with an accuracy of 81.48%, a Sens of 82.61%, a Spec of 80.59%, a Prec of 82.41%, an MCC of 63.05%, and an AUC of 0.894. Relative to the best comparison model on each measure, ACP-DL increased the accuracy by about 5 percentage points, the MCC by about 10 percentage points, and the AUC by about 0.05. In the dataset ACP240, ACP-DL also performed remarkably well, with an accuracy of 85.42%, a Sens of 84.62%, a Spec of 89.94%, a Prec of 80.28%, an MCC of 71.44%, and an AUC of 0.906. Here ACP-DL improved the accuracy by about 8 percentage points, the Spec by 10 percentage points, the MCC by over 14 percentage points, and the AUC by about 0.05. These results show the power of the deep learning model and indicate that our model is suitable for anticancer peptide identification and prediction. ACP-DL is a competitive model that can be used to predict anticancer peptides and accelerate related research. The comparison experiment results supported our assumption.
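The margins quoted in this comparison can be recomputed directly from Table 3. The sketch below is only an arithmetic illustration: it copies the Acc, MCC, and AUC columns from the table and, for each dataset, subtracts the best comparison-model value from the ACP-DL value. The names `TABLE3`, `METRICS`, and `margins` are hypothetical.

```python
# (Acc %, MCC %, AUC) per model, copied from Table 3.
TABLE3 = {
    "ACP740": {
        "SVM": (64.59, 33.57, 0.829),
        "RF": (76.35, 53.06, 0.842),
        "NB": (69.73, 43.98, 0.825),
        "ACP-DL": (81.48, 63.05, 0.894),
    },
    "ACP240": {
        "SVM": (77.50, 57.31, 0.855),
        "RF": (72.08, 44.38, 0.793),
        "NB": (72.50, 45.44, 0.719),
        "ACP-DL": (85.42, 71.44, 0.906),
    },
}
METRICS = ("Acc", "MCC", "AUC")


def margins(dataset):
    """ACP-DL value minus the best comparison-model value, per metric."""
    rows = TABLE3[dataset]
    ours = rows["ACP-DL"]
    others = [values for name, values in rows.items() if name != "ACP-DL"]
    return {
        metric: round(ours[i] - max(o[i] for o in others), 3)
        for i, metric in enumerate(METRICS)
    }


for dataset in TABLE3:
    print(dataset, margins(dataset))
```

Acc and MCC margins are in percentage points, while the AUC margin is on the 0 to 1 scale.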

Conclusions

In this study, we proposed a deep learning LSTM model to predict potential anticancer peptides using high-efficiency feature representation. More specifically, we developed an efficient feature representation approach by integrating the binary profile feature and the k-mer sparse matrix of the reduced amino acid alphabet to fully exploit peptide sequence information. Then we implemented a deep LSTM model to automatically learn how to distinguish anticancer peptides from non-anticancer peptides. To the best of our knowledge, this is the first time that the deep LSTM model has been applied to predict anticancer peptides. Meanwhile, to evaluate the capability of the proposed method, we further compared ACP-DL with widely used machine-learning models on the same benchmark datasets, ACP740 and ACP240; experimental results of 5-fold cross-validation show that the proposed method achieved outstanding performance compared with existing methods, with an accuracy of 81.48% and an AUC of 0.894 on benchmark dataset ACP740, and an accuracy of 85.42%, a Spec of 89.94%, and an AUC of 0.906 on dataset ACP240. The improvement comes mainly from the parameter optimization of the deep LSTM model and the effective feature representation of the original peptide sequences. In addition, we have contributed two novel anticancer peptide benchmark datasets, ACP740 and ACP240, in this work. It is anticipated that ACP-DL will become a very useful high-throughput and cost-effective tool that is widely used in anticancer peptide prediction as well as cancer research. Further, as demonstrated in a series of recent publications on developing new prediction methods,49, 50, 51 user-friendly and publicly accessible web servers significantly enhance the impact of such methods. We hope to provide a web server for the prediction method presented in this paper in the future.

Materials and Methods

In this study, we proposed a novel deep learning LSTM model to predict anticancer peptides, named ACP-DL, using high-efficiency features provided by the k-mer sparse matrix and the binary profile feature. Furthermore, we evaluated ACP-DL’s predictive performance on benchmark datasets ACP740 and ACP240. Moreover, we compared ACP-DL with three widely used machine-learning models on the same datasets, namely SVM, RF, and NB, to assess the robustness and effectiveness of the proposed method. Finally, we summarized, analyzed, and discussed ACP-DL.

Construction of Datasets

We constructed two novel benchmark datasets in this work for ACP identification, named ACP740 and ACP240. As previous studies suggested, each new dataset comprised both a positive and a negative subset: experimentally validated ACPs were taken as positive samples, and AMPs without anticancer function were collected as negative samples. The positive anticancer peptide samples can be represented as $P^{+}$, and the negative non-anticancer peptides can be represented as $P^{-}$. So, the whole dataset can be represented as $P = P^{+} \cup P^{-}$. Moreover, there is no overlap between the positive and negative datasets, that is, $P^{+} \cap P^{-} = \varnothing$.

Dataset ACP740

We selected 388 samples as the initial positive data on the basis of Chen et al.’s and Wei et al.’s studies, of which 138 were from Chen et al.’s work and 250 were from Wei et al.’s work. Correspondingly, the initial negative data were 456 samples, of which 206 were from Chen et al.’s work and 250 were from Wei et al.’s work. To avoid dataset bias, the widely used tool CD-HIT was further used to remove peptide sequences with a similarity of more than 90%. As a result, we finally obtained a dataset containing 740 samples, of which 376 were positive samples and 364 were negative samples.

Dataset ACP240

Following the same procedure, to validate the generalization ability of the predictive model, we further constructed an additional dataset, named ACP240, which initially included 129 experimentally validated anticancer peptide samples as the positive dataset and 111 AMPs without anticancer functions as the negative dataset. Moreover, sequences with a similarity of more than 90% were removed using the popular tool CD-HIT. The similarity setting was consistent with previous studies.21, 22 CD-HIT is available at http://weizhong-lab.ucsd.edu/cdhit-web-server. There was no overlap between dataset ACP740 and dataset ACP240, and both datasets are non-redundant. The two benchmark datasets are publicly available at https://github.com/haichengyi/ACP-DL.

Representation of the Peptide Sequences

A peptide sequence can be represented as follows:

$$P = p_1 p_2 p_3 \cdots p_l$$

where $p_1$ represents the first residue in the peptide P, $p_2$ denotes the second residue in the peptide P, and so on; l represents the length of P. Note that each residue $p_i$ is an element of the standard amino acid alphabet. To train a machine-learning model, the first step is to convert diverse-length peptides into fixed-length feature vectors. In this study, we exploited two feature representation methods, as described below.

Binary Profile Feature (BPF)

As mentioned above, there are 20 different amino acids in the standard amino acid alphabet (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y). Each amino acid type is encoded as a 20-dimensional feature vector composed of 0/1. More specifically, the first amino acid type A in the alphabet is encoded as f(A) = (1,0,…,0), the second amino acid type C is encoded as f(C) = (0,1,…,0), and so on. Subsequently, for a given peptide sequence P, its N terminus with the length of k amino acids was encoded as the following feature vector:

$$\mathrm{BPF}(P) = [f(p_1), f(p_2), \ldots, f(p_k)]$$

where k represents the length of the N terminus of the peptide P. Thus, the dimension of BPF(P) is 20 × k. Experiments show that the best results can be achieved when k is set to 7, which gives 140 values per peptide. So, one given peptide sequence is encoded to a feature vector by the binary profile.
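The binary profile encoding described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the example peptide string is made up, and the zero-padding of peptides shorter than k is an assumption that the text does not specify.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # alphabet order given in the text


def one_hot(residue):
    """Encode one residue as a 20-dimensional 0/1 vector, e.g. f(A) = (1,0,...,0).

    Raises ValueError for symbols outside the standard alphabet.
    """
    vec = [0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1
    return vec


def binary_profile(peptide, k=7):
    """Concatenate the one-hot vectors of the first k (N-terminal) residues.

    Assumption: peptides shorter than k are zero-padded to 20 * k values.
    """
    feature = []
    for residue in peptide[:k]:
        feature.extend(one_hot(residue))
    feature.extend([0] * (len(AMINO_ACIDS) * k - len(feature)))
    return feature


# Hypothetical peptide, used only to show the output shape.
example = binary_profile("FLPIVGKLLSGLL")
print(len(example), sum(example))
```

With k = 7 every peptide maps to a vector of 140 values, of which at most 7 are 1.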

K-mer Sparse Matrix

We also encoded the peptide sequence by using the k-mer sparse matrix previously proposed. In detail, k consecutive residues are regarded as a unit (a k-mer); for example, a 3-mer of a peptide is composed of 3 consecutive amino acids. First, the 20 amino acids were reduced into 7 groups based on their dipole moments and side chain volume: Ala, Gly, and Val; Ile, Leu, Phe, and Pro; Tyr, Met, Thr, and Ser; His, Asn, Gln, and Trp; Arg and Lys; Asp and Glu; and Cys.16, 54, 55 So, the peptide sequence was reduced into a 7-letter alphabet. Then we scanned each peptide sequence from left to right, stepping one amino acid at a time, so that every residue position is taken into account. Suppose the length of an above-mentioned peptide sequence is L; there would be $7^k$ different possible k-mers and $L - k + 1$ steps along the peptide sequence. One peptide sequence is thus transformed into a $7^k \times (L - k + 1)$ k-mer sparse matrix M, with all elements initialized to 0. When the residues $p_j p_{j+1} \cdots p_{j+k-1}$ are equal to the ith k-mer among the $7^k$ different k-mers, the element $a_{ij}$ is set to 1. The rest can be handled in the same way. In this study, the value of k is set to 3 to process the peptide sequence. The k-mer sparse matrix M can be defined as follows:

$$M = (a_{ij})_{7^k \times (L-k+1)}, \qquad a_{ij} = \begin{cases} 1, & \text{if } p_j p_{j+1} \cdots p_{j+k-1} = k\text{-mer}(i) \\ 0, & \text{otherwise} \end{cases}$$

The k-mer sparse matrix M is a low-rank matrix that retains almost all the raw information, including k-mer frequency, position, and order. Then, singular value decomposition (SVD) is used to reduce the two-dimensional matrix M into a feature vector. Finally, we concatenated the outputs of the two feature representation methods, so that each peptide sequence is represented by one combined feature vector. The whole dataset was thereby transformed into a 2D feature matrix. The feature matrix was reshaped into a 3D tensor for training the LSTM model, while the feature matrix without reshaping was used to train the comparison models.
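To make the construction of the k-mer sparse matrix concrete, here is a small sketch in plain Python. It is an illustration under stated assumptions, not the authors' code: the digit labels 0 to 6 for the seven groups, the lexicographic ordering of k-mers along the rows, and all function names are hypothetical choices, and the subsequent SVD step is left out.

```python
from itertools import product

# Seven-group reduced alphabet as listed in the text (one-letter codes);
# the digit label assigned to each group is an arbitrary choice.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_GROUP = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def reduce_sequence(peptide):
    """Rewrite a peptide in the 7-letter reduced alphabet."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in peptide)


def kmer_sparse_matrix(peptide, k=3):
    """Build the 7^k x (L - k + 1) 0/1 matrix described in the text.

    Row i corresponds to the i-th k-mer (lexicographic order, an assumption);
    column j corresponds to the window starting at position j.
    """
    reduced = reduce_sequence(peptide)
    kmers = ["".join(p) for p in product("0123456", repeat=k)]
    row_of = {kmer: i for i, kmer in enumerate(kmers)}
    n_cols = max(len(reduced) - k + 1, 0)
    matrix = [[0] * n_cols for _ in range(len(kmers))]
    for j in range(n_cols):
        matrix[row_of[reduced[j:j + k]]][j] = 1
    return matrix


# Hypothetical 4-residue peptide: two 3-mer windows, so two columns.
M = kmer_sparse_matrix("AGVC")
print(len(M), len(M[0]))
```

Each column contains exactly one 1, which is why the matrix is sparse: for k = 3 there are 343 rows but only one nonzero entry per window.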

Deep LSTM Model Architecture

LSTM is an improvement of a recurrent neural network (RNN), which is mainly used in the natural language processing (NLP) and speech recognition fields.57, 58, 59 Different from a traditional neural network, an RNN can take advantage of sequence information. Theoretically, it can utilize the information of an arbitrary-length sequence; but, because of the vanishing gradient problem in the network structure, in practical applications it can only utilize information from time steps that are close to the current one. To solve this problem, LSTM was presented with a specially designed network architecture, which can learn long-term dependency information naturally. A general architecture of LSTM is composed of an input gate, a forget gate, an output gate, and a memory block. The improvement of LSTM comes mainly from incorporating a memory cell that allows the network to learn when to forget previous hidden states and when to update hidden states, according to the input information through time. It uses dedicated storage units to store information. To our knowledge, the deep LSTM model was first applied to predict novel anticancer peptides in this work. LSTM selectively passes information through a gate unit, which consists mainly of a sigmoid neural layer and a pointwise multiplication operation. Each element of the sigmoid layer output (a vector) is a real number between 0 and 1, representing the weight (or percentage) of the corresponding information that passes through. For example, 0 means no information is allowed through, and 1 means all information passes.

Forget Gate

In the information flow processing of LSTM, the first step is to decide what information will be discarded from the cell state. This decision is made by a component known as the forget gate. The forget gate reads $h_{t-1}$ and $x_t$, then outputs a value between 0 and 1 for each element of the cell state $C_{t-1}$; 1 means retain completely and 0 means discard completely.

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

Here, $h_{t-1}$ represents the output of the previous cell, $x_t$ represents the current cell input, $\sigma$ means the sigmoid function, and $W_f$ and $b_f$ are the weight matrix and bias of the forget gate.

Input Gate

The next step is to decide how much new information will be added to the cell state. To do this, there are two steps: first, a sigmoid layer called the input gate layer determines which information needs to be updated; and then, a tanh layer generates a vector of candidate values to be used in the update. We combine the two parts to update the state of the cell.

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

We multiply the old state $C_{t-1}$ by $f_t$ to discard the information we need to discard. Then we add $i_t * \tilde{C}_t$. This is the new candidate value, scaled according to how much we decide to update each state.

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

Output Gate

Ultimately, we need to determine what to output. This output will be based on our cell state, but it is a filtered version. First, we run a sigmoid layer to determine which part of the cell state will be exported. Then, we process the cell state through a tanh function (to get a value between −1 and 1) and multiply it by the output of the sigmoid gate, so that we output only the portion we decided on.

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t * \tanh(C_t)$$

In this experiment, considering the limited scale of anticancer peptide samples, we set the parameter of the LSTM input layer to 128, and the output of the LSTM layers was fed into a dense layer (a fully connected neural network layer) as input to obtain a final prediction result. We also used a sigmoid function as an activation function in the proposed model. The mathematical behavior of a sigmoid function can be expressed as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Between the LSTM layer and the dense layer, a dropout layer was applied to reduce over-fitting and enhance the robustness of the neural network model, and the dropout parameter was set to 0.25. Moreover, a loss function measures the performance of machine-learning models. We chose the log loss function (binary cross-entropy), which corresponds to the sigmoid function, as the loss function; it can be defined as:

$$\mathrm{loss}(t, p) = -\left(t \log(p) + (1 - t) \log(1 - p)\right)$$

where p and t represent the prediction output of the model and the true target value, respectively. Finally, the Adam optimizer was used to update the weights of the network iteratively; it is popular in the deep learning field and combines the advantages of the root-mean-square propagation (RMSProp) and adaptive gradient (AdaGrad) algorithms. The implementation of the deep learning model is based on the Keras framework, which is capable of running on top of TensorFlow, Theano, or CNTK and is supported on both GPUs and CPUs. It was developed with a focus on enabling fast experimentation.
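The forget, input, and output gates described in the preceding subsections can be traced through a single time step with a short, dependency-free sketch. This is not the Keras model used in the paper: it is a plain-Python illustration of the standard LSTM update with a binary cross-entropy helper, and the parameter names (`W_f`, `b_f`, and so on), the tiny layer sizes, and the all-zero demo weights are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(W, b, v):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: forget gate, input gate, cell update, output gate."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]
    c_hat = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]
    o_t = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def binary_cross_entropy(t, p, eps=1e-12):
    """Log loss for one sample; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))


# Demo with hidden size 2 and input size 1; all-zero weights make every gate 0.5.
demo = {f"W_{g}": [[0.0] * 3 for _ in range(2)] for g in "fico"}
demo.update({f"b_{g}": [0.0] * 2 for g in "fico"})
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], demo)
print(h_t, c_t)
```

With zero weights the forget gate halves the previous cell state and the candidate contributes nothing, which makes the role of each gate easy to see before substituting trained parameters.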

Performance Evaluation Criteria

In this study, we proposed a novel deep learning LSTM model, ACP-DL, using efficient features to predict potential anticancer peptides. We used 5-fold cross-validation to evaluate the performance of ACP-DL and the comparison models. In each validation, all data are randomly divided into five equal parts: four parts are taken as training data, and the remaining part is taken as test data. To guarantee an unbiased comparison, it was confirmed that there was no overlap between training data and test data. The final validation result was the average over the 5 folds, reported with SDs. We followed the widely used evaluation criteria,62, 63 including accuracy (Acc), Sens or recall, Spec, Prec, and MCC, defined as follows:

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Sens} = \frac{TP}{TP + FN}$$

$$\mathrm{Spec} = \frac{TN}{TN + FP}$$

$$\mathrm{Prec} = \frac{TP}{TP + FP}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TN indicates the true negative number, TP denotes the true positive number, FN represents the false negative number, and FP stands for the false positive number. The ROC curve and the AUC were also adopted to evaluate the performance.
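The five criteria named above follow directly from the confusion-matrix counts, using their standard definitions. The sketch below computes them for one fold; the function name `evaluate` and the example counts are hypothetical and are not taken from the paper's experiments. Returning 0 when a denominator is zero is a convention chosen here, not something the text specifies.

```python
import math


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero (a convention)."""
    return num / den if den else 0.0


def evaluate(tp, tn, fp, fn):
    """Compute Acc, Sens, Spec, Prec, and MCC from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": safe_div(tp + tn, tp + tn + fp + fn),
        "Sens": safe_div(tp, tp + fn),
        "Spec": safe_div(tn, tn + fp),
        "Prec": safe_div(tp, tp + fp),
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for a single test fold of 100 peptides.
scores = evaluate(tp=40, tn=45, fp=5, fn=10)
for name, value in scores.items():
    print(f"{name}: {100 * value:.2f}%")
```

Values are returned as fractions and multiplied by 100 only for display, matching the percentage convention of Tables 1 to 3.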

Author Contributions

H.-C.Y. and Z.-H.Y. conceived the algorithm, carried out analyses, prepared the datasets, carried out experiments, and wrote the manuscript. Other authors designed, performed, and analyzed experiments and wrote the manuscript. All authors read and approved the final manuscript.

Conflicts of Interest

The authors declare no competing interests.

49 in total

1. Learning to forget: continual prediction with LSTM.

Authors: F A Gers; J Schmidhuber; F Cummins
Journal: Neural Comput Date: 2000-10 Impact factor: 2.026

2. Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data.

Authors: Zhu-Hong You; Ying-Ke Lei; Jie Gui; De-Shuang Huang; Xiaobo Zhou
Journal: Bioinformatics Date: 2010-09-03 Impact factor: 6.937

3. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors: Weizhong Li; Adam Godzik
Journal: Bioinformatics Date: 2006-05-26 Impact factor: 6.937

4. Using ensemble classifier to identify membrane protein types.

Authors: H-B Shen; K-C Chou
Journal: Amino Acids Date: 2006-10-12 Impact factor: 3.520

Review 5. Studies on anticancer activities of antimicrobial peptides.

Authors: David W Hoskin; Ayyalusamy Ramamoorthy
Journal: Biochim Biophys Acta Date: 2007-11-22

6. Peptide-based drug design: here and now.

Authors: Laszlo Otvos
Journal: Methods Mol Biol Date: 2008

Review 7. Cationic antimicrobial peptides as novel cytotoxic agents for cancer treatment.

Authors: Jamie S Mader; David W Hoskin
Journal: Expert Opin Investig Drugs Date: 2006-08 Impact factor: 6.206

8. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes.

Authors: Kuo-Chen Chou
Journal: Bioinformatics Date: 2004-08-12 Impact factor: 6.937

9. Assessment of the biological and pharmacological effects of the alpha nu beta3 and alpha nu beta5 integrin receptor antagonist, cilengitide (EMD 121974), in patients with advanced solid tumors.

Authors: S Hariharan; D Gustafson; S Holden; D McConkey; D Davis; M Morrow; M Basche; L Gore; C Zang; C L O'Bryant; A Baron; D Gallemann; D Colevas; S G Eckhardt
Journal: Ann Oncol Date: 2007-08 Impact factor: 32.976

10. A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network.

Authors: Zhu-Hong You; Zheng Yin; Kyungsook Han; De-Shuang Huang; Xiaobo Zhou
Journal: BMC Bioinformatics Date: 2010-06-24 Impact factor: 3.169
