
iAPSL-IF: Identification of Apoptosis Protein Subcellular Location Using Integrative Features Captured from Amino Acid Sequences.

Yadong Tang, Lu Xie, Lanming Chen.

Abstract

Apoptosis proteins (APs) control normal tissue homeostasis by regulating the balance between cell proliferation and death. The function of APs is strongly related to their subcellular location. Computational methods that reliably identify the subcellular location of APs have been reported; however, there is still room to improve prediction accuracy. In this study, we developed a novel method named iAPSL-IF (identification of apoptosis protein subcellular location-integrative features), which is based on integrative features captured from Markov chains, physicochemical property matrices, and position-specific score matrices (PSSMs) of amino acid sequences. Matrices of different lengths were transformed into fixed-length feature vectors using an auto cross-covariance (ACC) method. An optimal subset of the features was chosen using a recursive feature elimination (RFE) algorithm, and a support vector machine (SVM) classifier was trained on the sequences encoded with these features. The iAPSL-IF was evaluated on three datasets, ZD98, CL317, and ZW225, using a jackknife cross-validation test. The results showed that the iAPSL-IF outperformed the known predictors reported in the literature: its overall accuracy was 98.98% (ZD98), 94.95% (CL317), and 97.33% (ZW225); the Matthews correlation coefficient, sensitivity, and specificity for several classes of subcellular location proteins (e.g., membrane proteins, cytoplasmic proteins, endoplasmic reticulum proteins, nuclear proteins, and secreted proteins) in the datasets were 0.92-1.0, 94.23-100%, and 97.07-100%, respectively. Overall, this study provides a high-throughput, sequence-based method for better identification of the subcellular location of APs and facilitates further understanding of programmed cell death in organisms.

Keywords:  Markov chains; apoptosis proteins; physicochemical properties; position specific scoring matrix; recursive feature elimination; support vector machine

Year:  2018        PMID: 29652843      PMCID: PMC5979326          DOI: 10.3390/ijms19041190

Source DB:  PubMed          Journal:  Int J Mol Sci        ISSN: 1422-0067            Impact factor:   5.923


1. Introduction

Apoptosis, or programmed cell death, is a fundamental process controlling normal tissue homeostasis by regulating the balance between cell proliferation and death [1]. Blocking apoptosis is associated with cancer and autoimmune diseases (e.g., autoimmune lymphoproliferative syndrome (types I and II) and systemic lupus erythematosus), whereas unwanted and increased apoptosis can lead to ischemic damage or neurodegenerative diseases (e.g., Alzheimer’s disease, Parkinson’s disease, amyotrophic lateral sclerosis, and Creutzfeldt-Jakob disease) [2,3,4]. The subcellular location (e.g., membrane, cytoplasm, nucleus, endoplasmic reticulum, and mitochondria) of APs is strongly related to their function [2]. Subcellular location can be identified using conventional experimental methods, such as electron microscopy, cell separation, and fluorescence microscopy [5]. Nevertheless, these experimental methods are time-consuming and expensive [5]. Given the explosion of new protein sequences generated in the post-genomic and big data age [6], there is a clear need for high-throughput, sequence-based methods to identify the subcellular location of APs. To date, computational methods have been reported to efficiently identify the subcellular location of APs [7]. These methods rest on two tasks: (1) the design of the protein encoding scheme for feature extraction; and (2) the selection of the classifier [7]. Various sequence features have been used for the first task, e.g., amino acid composition [8], dipeptide composition, which represents the composition of amino acid pairs and gapped amino acid pairs [9], pseudo amino acid composition [10,11,12,13,14], Markov chains [15], wavelet coefficients [3], distance frequency [16], grouped weight encoding [2], PSSMs [7,17,18], and gene ontology [19,20].
For example, Markov chains, a discrete stochastic model [21], capture the frequencies of the 20 native amino acids and of amino acid pairs in protein sequences, which reflect the composition and local amino acid order of the sequences. They have been used for the identification of interaction sites between proteins and nucleic acids [21,22]. The PSSM reflects the evolutionary information of a protein sequence and has been used for the prediction of protein function [23], subcellular location [5], and structural class [24,25]. For the second task, several machine learning algorithms have been applied, including the fuzzy k-nearest neighbor algorithm [12], SVM [3,7,16,17,18], the covariant discrimination algorithm [9], and ensemble classifiers [26,27]. Among these, the SVM proposed by Vapnik [28] exhibited the most promising results [7]. It is a supervised machine learning algorithm based on the structural risk minimization principle of statistical learning theory [26]. Samples labeled positive or negative are projected into a high-dimensional feature space using a kernel, in which the hyperplane is optimized to maximize the margin between positive and negative samples [29]. For SVM-based methods, it is crucial to convert protein sequences of different lengths into fixed-length vectors [18]. The ACC transformation method was developed by Wold et al. [30] and has been widely used in protein family classification and protein interaction prediction [31,32]. Although computational methods such as PSSM-trigram [7] and FKNN (fuzzy k-nearest neighbor algorithm) [12] have been reported to reliably identify the subcellular location of APs, there is still room to improve prediction accuracy. In our previous research, we established highly accurate protein structural class prediction methods based on PSSMs using the SVM classifier [25,32].
In this study, we developed a novel method named iAPSL-IF using integrative features captured from amino acid sequences (Figure 1), and examined it based on three datasets ZD98, CL317, and ZW225 using the jackknife cross-validation test, as it is an objective and rigorous statistical test [22]. In jackknifing, a part of the sample is systematically omitted, for example, by removing one data point at a time, and the analysis is then carried out for each newly constructed subset [33]. Our data indicated that the iAPSL-IF achieved better results than the known predictors reported in the literature.
Figure 1

The flowchart of the iAPSL-IF method. NR: non-redundant (NR) database of the National Center for Biotechnology Information (NCBI) (available online: https://www.ncbi.nlm.nih.gov/).

2. Results and Discussion

2.1. Feature Extraction

To capture the feature information embedded in amino acid sequences, we analyzed the amino acid composition and Markov chains of each protein sequence in the three datasets ZD98, CL317, and ZW225, and encoded each sequence as a 420 (20 × 20 + 20) dimensional feature vector (see the Materials and Methods section). In addition, each residue was mapped to the values of 10 physicochemical properties of amino acids [34], so that each sequence was represented by a numerical physicochemical property matrix. Furthermore, the evolutionary information of each protein sequence was extracted by PSI-BLAST analysis (Section 3.4), and each sequence was represented by a PSSM. Because the protein sequences differ in length, the resulting physicochemical property matrices and PSSMs also differ in length.

2.2. Parameter Selection

To transform the physicochemical property matrices and PSSMs of different lengths into fixed-length feature vectors using the ACC method, we tuned the key parameter g, the gap length. The g values were set in the range 4 ≤ g ≤ L/4 [17], where L is the length of the shortest protein sequence in a dataset. For the three datasets ZD98, CL317, and ZW225, L was 130, 87, and 76, respectively. We used the jackknife cross-validation test to measure the overall accuracy on the datasets for different g values. The results are shown in Figure 2 and Figure 3. For the physicochemical property matrices, the highest overall accuracies on the datasets ZD98 and CL317 were 90.82% and 90.22%, at g = 12 and g = 13, respectively (Figure 2). To keep the dimensions of the vectors consistent across datasets, we set g = 12 for the ACC transformation of the physicochemical property matrices. Hence, each protein sequence was encoded by a 1100 (10 × 10 × (12 − 1)) dimensional vector. Likewise, for the PSSMs, the highest overall accuracies on the datasets ZD98 and CL317 were observed at g = 8 (94.90% and 93.69%, respectively) (Figure 3). Therefore, each protein sequence was also represented by a 2800 (20 × 20 × (8 − 1)) dimensional vector through the ACC transformation. Together with the 420-dimensional Markov chain vector, this gave a 4320 (1100 + 2800 + 420) dimensional vector for each protein sequence, integrating the three different types of sequence features. The parameter for the dataset ZW225, which is similar in size to CL317, was determined in the same way.
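The arithmetic in this section can be checked with a few lines of Python. This is only a sketch: the helper names `g_range` and `acc_dimension` are hypothetical, g is assumed to take integer values, and the dimension count follows the n × n × (g − 1) form quoted in the text above.

```python
# Length L of the shortest sequence per dataset, as reported in Section 2.2.
SHORTEST_LENGTH = {"ZD98": 130, "CL317": 87, "ZW225": 76}


def g_range(shortest_length, lower=4):
    """Candidate g values with lower <= g <= L/4 (integer g is an assumption)."""
    return list(range(lower, shortest_length // 4 + 1))


def acc_dimension(n_columns, g):
    """Vector length using the count quoted in Section 2.2: n * n * (g - 1)."""
    return n_columns * n_columns * (g - 1)


physchem_dim = acc_dimension(10, 12)  # 10 physicochemical properties, g = 12
pssm_dim = acc_dimension(20, 8)       # 20 PSSM columns, g = 8
markov_dim = 20 * 20 + 20             # transition matrix plus composition
total_dim = physchem_dim + pssm_dim + markov_dim

for name, length in SHORTEST_LENGTH.items():
    candidates = g_range(length)
    print(name, "g from", candidates[0], "to", candidates[-1])
print(physchem_dim, pssm_dim, markov_dim, total_dim)
```

Under these assumptions the three blocks add up to the 4320 dimensions stated in the text, and the shortest dataset sequence bounds the largest g that can be tested.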
Figure 2

The effects of g on overall accuracy based on the datasets ZD98 and CL317 after ACC transformation of the physicochemical property matrices.

Figure 3

The effect of g on overall accuracy based on the datasets ZD98 and CL317 after ACC transformation of the PSSMs.

2.3. Optimal Feature Selection

Although the integrated features captured sequence information from multiple aspects, the number of candidate features was large, and the original feature space may have contained noisy and redundant features. Therefore, we reduced the dimensions using the SVM-RFE method [35]. Feature selection of this kind improves performance in three ways: the model is (1) less prone to overfitting; (2) able to make full use of the training data; and (3) much faster to train [29]. The features of a dataset were ranked according to their importance, and the top-K features (K = 10, 20, 30, …, 380, 390, 400) [32] were examined by the jackknife cross-validation test. The results are shown in Figure 4. On the dataset ZD98, the highest overall accuracy (OA) (100%) was observed at K = 50, whereas on the datasets CL317 and ZW225, the highest OA (94.95% and 97.78%, respectively) was observed at K = 90. To avoid losing important information at a lower dimension, we chose the top-90 ranked features for further analyses.
Figure 4

The effect of the Top-K features on overall accuracy based on datasets ZD98, CL317, and ZW225.

2.4. Performance of the iAPSL-IF

In this study, each protein sequence was encoded by a 90-dimensional vector after feature integration and optimal feature selection. We trained an SVM on these features to build the iAPSL-IF. The performance of the iAPSL-IF was examined by the jackknife cross-validation test on the three datasets, and the results are presented in Table 1. On the datasets ZD98, CL317, and ZW225, the OA was 98.98%, 94.95%, and 97.33%, respectively; the sensitivity (Sn) for different classes of subcellular location proteins was 97.67–100%, 88.24–100%, and 88.00–100%, respectively; the specificity (Sp) was 98.53–100%, 97.07–100%, and 98.71–100%, respectively; and the Matthews correlation coefficient (MCC) was 0.98–1.00, 0.88–0.99, and 0.90–0.98, respectively. Notably, among the seven classes of subcellular location proteins tested in this study, only the Sn of the mitochondrial proteins (Mito) in the datasets CL317 and ZW225 was slightly lower (88.24% and 88.00%, respectively) than that of the other classes. Their corresponding MCC values were also lower (0.88 and 0.90, respectively). Some previous predictors have reported similar results [1,10,23]. This may result from differences in dataset traits, such as the size, sequence homology, and imbalance of the subsets [16].
Table 1

Performance of the iAPSL-IF on the three datasets.

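| Dataset | Location | Sens (%) | Spec (%) | MCC | OA (%) |
|---------|----------|----------|----------|------|--------|
| ZD98 | Cyto | 97.67 | 100 | 0.98 | 98.98 |
| | Memb | 100 | 98.53 | 0.98 | |
| | Mito | 100 | 100 | 1.0 | |
| | other | 100 | 100 | 1.0 | |
| CL317 | Cyto | 95.54 | 97.07 | 0.92 | 94.95 |
| | Memb | 94.55 | 98.85 | 0.93 | |
| | Mito | 88.24 | 98.94 | 0.88 | |
| | Secr | 100 | 99.67 | 0.97 | |
| | Nucl | 94.23 | 98.87 | 0.93 | |
| | Endo | 97.87 | 100 | 0.99 | |
| ZW225 | Cyto | 100 | 98.71 | 0.98 | 97.33 |
| | Memb | 98.88 | 99.26 | 0.98 | |
| | Mito | 88.00 | 99.50 | 0.90 | |
| | Nucl | 95.12 | 98.98 | 0.94 | |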

2.5. Performance Comparison with Other Known Methods

To evaluate the reliability of the iAPSL-IF, we compared it with the known methods that have been evaluated on the same datasets in the literature. The OA and the sensitivity (Sn) for each class of subcellular location proteins under the jackknife cross-validation test were chosen as the evaluation indexes. On dataset ZD98, the OA of the twelve previous predictors ranged from 72.5% to 96.9%, with the PSSM-trigram performing best (96.9%). The iAPSL-IF developed in this study increased the OA by a further 2.1% over the PSSM-trigram (Table 2). Moreover, the iAPSL-IF also achieved the highest Sn values for the cytoplasmic proteins (Cyto) (97.7%), membrane proteins (Memb) (100%), Mito (100%), and other proteins (100%) (Table 2).
Table 2

Performance comparison of different methods on the ZD98 dataset.

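| Method | Sens, Cyto (%) | Sens, Memb (%) | Sens, Mito (%) | Sens, Other (%) | OA (%) | Reference |
|--------|------|------|------|------|------|------|
| Covariant | 97.7 | 73.3 | 30.8 | 25.0 | 72.5 | [1] |
| ID_SVM | 95.3 | 93.3 | 84.6 | 58.3 | 88.8 | [10] |
| DWT_SVM | 95.4 | 93.3 | 53.9 | 91.7 | 88.8 | [26] |
| ID | 90.7 | 90.0 | 92.3 | 91.7 | 90.8 | [11] |
| EBGW_SVM | 97.7 | 90.0 | 92.3 | 83.3 | 92.9 | [2] |
| PseAAC_SVM | 95.3 | 93.3 | 92.3 | 83.3 | 92.9 | [36] |
| DF_SVM | 97.7 | 96.7 | 92.3 | 75.0 | 93.9 | [16] |
| Dual_layer SVM | 95.4 | 96.7 | 92.3 | 91.7 | 94.9 | [37] |
| APSLAP | 95.3 | 90.0 | 100 | 91.7 | 94.9 | [27] |
| FKNN | 95.3 | 96.7 | 100 | 91.7 | 95.9 | [12] |
| PSSM-AC | 97.7 | 96.7 | 100 | 83.3 | 95.9 | [38] |
| PSSM-trigram | 95.3 | 100 | 100 | 91.7 | 96.9 | [7] |
| iAPSL-IF | 97.7 | 100 | 100 | 100 | 99.0 | This study |
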
On dataset CL317, the performance of the iAPSL-IF was compared with ten known predictors by the jackknife cross-validation test. The OA of all the predictors ranged between 82.7% and 95.0%, and the iAPSL-IF achieved the best performance (95.0%), an increase of 2.6% over the best previous predictor. Although the sensitivity values for the Cyto, Memb, and Mito proteins identified by the iAPSL-IF (95.5%, 94.5%, and 88.2%, respectively) were slightly lower than the best values among the previous predictors (ranges: 81.3–99.1%, 81.8–95.7%, and 76.5–93.8%, respectively), the sensitivity values for the endoplasmic reticulum proteins (Endo), nuclear proteins (Nucl), and secreted proteins (Secr) were the highest among all the methods analyzed in this study (Table 3).
Table 3

Performance comparison of different methods on the CL317 dataset.

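| Method | Sens, Cyto (%) | Sens, Memb (%) | Sens, Mito (%) | Sens, Secr (%) | Sens, Nucl (%) | Sens, Endo (%) | OA (%) | Reference |
|--------|------|------|------|------|------|------|------|------|
| ID | 81.3 | 81.8 | 85.3 | 88.2 | 82.7 | 83.0 | 82.7 | [10] |
| ID_SVM | 91.1 | 89.1 | 79.4 | 58.8 | 73.1 | 87.2 | 84.2 | [11] |
| DF_SVM | 92.9 | 85.5 | 76.5 | 76.5 | 93.6 | 86.5 | 88.0 | [16] |
| Auto_Cova | 86.4 | 90.7 | 93.8 | 85.7 | 92.1 | 93.8 | 90.0 | [14] |
| FKNN | 93.8 | 92.7 | 82.4 | 76.5 | 90.4 | 93.6 | 90.9 | [12] |
| PseAAC_SVM | 93.8 | 90.9 | 85.3 | 76.5 | 90.4 | 95.7 | 91.1 | [36] |
| EN_FKNN | 98.2 | 83.6 | 79.4 | 82.4 | 90.4 | 97.9 | 91.5 | [26] |
| PSSM-AC | 93.8 | 90.9 | 91.2 | 82.4 | 86.5 | 95.7 | 91.5 | [38] |
| APSLAP | 99.1 | 89.1 | 85.3 | 88.2 | 84.3 | 95.8 | 92.4 | [27] |
| EI_SVM | 94.6 | 95.7 | 92.7 | 82.4 | 90.4 | 70.6 | 91.1 | [18] |
| iAPSL-IF | 95.5 | 94.5 | 88.2 | 100 | 94.2 | 97.9 | 95.0 | This study |
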
Similarly, on dataset ZW225, the iAPSL-IF was compared with seven known predictors available in the literature, and the results are presented in Table 4. The OA (97.3%) of the iAPSL-IF was higher than that of most of the methods compared, but was 0.5% lower than that of the PSSM-trigram. The sensitivity values for the Cyto and Memb proteins identified by the iAPSL-IF were the highest (100% and 98.9%, respectively) among all these methods, but the sensitivity values for the Mito and Nucl proteins were slightly lower than those of the PSSM-trigram. Given that the PSSM-trigram also extracts its sequence features from the PSSM [7], we concluded that PSSMs contain important evolutionary information about protein sequences and are very useful for identifying the subcellular location of APs.
Table 4

Performance comparison of different methods on the ZW225 dataset.

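| Method | Sens, Cyto (%) | Sens, Memb (%) | Sens, Mito (%) | Sens, Nucl (%) | OA (%) | Reference |
|--------|------|------|------|------|------|------|
| EBGW_SVM | 90.0 | 93.3 | 60.0 | 63.4 | 83.1 | [2] |
| DF_SVM | 87.1 | 92.1 | 64.0 | 73.2 | 84.0 | [16] |
| PSSM-AC | 82.9 | 92.1 | 68.0 | 78.0 | 84.0 | [38] |
| ID_SVM | 92.9 | 91.0 | 68.0 | 73.2 | 85.8 | [11] |
| Auto_Cova | 81.3 | 93.3 | 85.7 | 84.6 | 87.1 | [14] |
| EN_FKNN | 94.3 | 94.4 | 60.0 | 80.5 | 88.0 | [26] |
| PSSM-trigram | 97.1 | 98.9 | 96.0 | 97.6 | 97.8 | [7] |
| iAPSL-IF | 100 | 98.9 | 88.0 | 95.1 | 97.3 | This study |
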
Taken together, the overall performance of the iAPSL-IF developed in this study was better than the previous methods reported in the literature. It is known that increased or decreased apoptosis is associated with human diseases, and the function of APs is strongly related to their subcellular location. Therefore, the high-throughput and sequence-based iAPSL-IF will benefit researchers by allowing fast and efficient identification of the subcellular location of APs, which could be candidate targets for the development of novel diagnostics, vaccines, and therapeutics for human diseases.

3. Materials and Methods

3.1. Datasets

In this study, three widely used datasets, ZD98, CL317, and ZW225, were used to test the performance of the proposed method for identifying the subcellular location of APs. The protein sequences were retrieved from the SWISS-PROT database (available online: https://www.uniprot.org/uniprot/), which includes protein sequences for humans and other organisms (e.g., pig, bovine, rat, chicken, African clawed frog, and fruit fly). The dataset ZD98 contained 98 proteins classified into 4 classes: 43 Cyto, 13 Mito, 30 Memb, and 12 other proteins [1]. The dataset CL317 consisted of 317 proteins classified into 6 classes: 112 Cyto, 55 Memb, 52 Nucl, 47 Endo, 34 Mito, and 17 Secr [10,11]. The dataset ZW225 contained 225 proteins classified into 4 classes: 89 Memb, 70 Cyto, 41 Nucl, and 25 Mito [2].

3.2. Markov Chains

Markov chains have a solid mathematical foundation [21]. Suppose $S = \{S_1, S_2, \ldots, S_N\}$ is a finite set, where $S$ is called the state set and each element $S_n$ ($1 \le n \le N$, with $N$ a positive integer) is called a state. For a random sequence $\{X_t\}_{t=0}^{\infty}$, $X_t$ refers to the state in $S$ at time $t$. Let $q_t$ denote the state of the chain at time $t$. If the state $q_{t+1}$ at time $t + 1$ depends only on $q_t$, i.e.,

$$P(X_{t+1} = q_{t+1} \mid X_t = q_t, X_{t-1} = q_{t-1}, \ldots, X_0 = q_0) = P(X_{t+1} = q_{t+1} \mid X_t = q_t),$$

where $q_0, q_1, \ldots, q_{t+1} \in S$, then $\{X_t\}_{t=0}^{\infty}$ is called a Markov chain [22]. The matrix $M = \{P_{ij}\}$ ($i, j \in S$) is the transition matrix of the Markov chain, and $P_{ij}$ is the transition probability from state $i$ to state $j$. $M$ can be expressed as:

$$M = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1N} \\ P_{21} & P_{22} & \cdots & P_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ P_{N1} & P_{N2} & \cdots & P_{NN} \end{pmatrix}$$

In this matrix, $0 \le P_{ij} \le 1$ for all states $i, j$. Suppose the length of a protein sequence is $L$ and $A_i$ is its $i$th amino acid. The protein sequence can then be represented as $\mathrm{Pro} = A_1 A_2 A_3 \ldots A_i A_{i+1} \ldots A_L$. In this study, we analyzed the probability of each amino acid residue given the previous amino acid residue, which is expressed as:

$$P(A_i \mid A_{i-1}) = \frac{F(A_{i-1}A_i)}{F(A_{i-1})},$$

where $F(A_{i-1}A_i)$ and $F(A_{i-1})$ are the frequencies of the amino acid pair $A_{i-1}A_i$ and of the amino acid $A_{i-1}$, respectively. Every protein sequence consists of 20 native amino acids; thus, the combinations of amino acid pairs generate a 20 × 20 matrix. In addition, amino acid composition is a basic feature of every protein sequence and consists of 20 discrete numbers, each of which represents the frequency of one native amino acid residue in the protein sequence [39]. In this study, every protein sequence was represented by a 420 (20 × 20 + 20) dimensional vector by combining amino acid composition and Markov chains.
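As an illustration of the encoding described above, the following Python sketch builds the 420-dimensional vector for one sequence. It is not the authors' code. The function name `markov_features`, the ordering of the vector (400 transition probabilities followed by 20 composition values), the skipping of non-standard residues, and the choice to count F(A_{i-1}) over positions that have a successor (so that each non-empty row of the transition matrix sums to 1) are all assumptions made for the example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 native amino acids


def markov_features(sequence):
    """Return 400 first-order transition probabilities plus 20 composition values."""
    # Assumption: residues outside the 20 native amino acids are skipped.
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    length = len(seq)

    # Amino acid composition: frequency of each residue in the sequence.
    single = Counter(seq)
    composition = [single[a] / length if length else 0.0 for a in AMINO_ACIDS]

    # Pair counts F(A_{i-1} A_i) and predecessor counts F(A_{i-1}).
    pairs = Counter(zip(seq, seq[1:]))
    predecessors = Counter(seq[:-1])

    transitions = []
    for prev in AMINO_ACIDS:
        for curr in AMINO_ACIDS:
            if predecessors[prev]:
                transitions.append(pairs[(prev, curr)] / predecessors[prev])
            else:
                transitions.append(0.0)  # residue never occurs as a predecessor
    return transitions + composition


features = markov_features("MKTAYIAKQRQISFVKSHFSRQ")
print(len(features))
```

Because the transition block has a fixed 20 × 20 layout, sequences of any length map to vectors of the same size, which is what the downstream classifier requires.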

3.3. Physicochemical Properties of Amino Acids

In this study, 10 physicochemical properties were adopted: polarity, secondary structure, molecular volume, codon diversity, electrostatic charge, hydrophobicity, hydrophilicity, side-chain volume, polarizability, and solvent-accessible surface area, which were represented as P(1), P(2), P(3), P(4), P(5), P(6), P(7), P(8), P(9), and P(10), respectively [35]. The original values of these physicochemical properties (Table 5) were normalized by the following formula before use, as described previously [40]:

$$\hat{P}_{m,n} = \frac{P_{m,n} - \bar{P}_n}{\sigma_n},$$

where $P_{m,n}$ is the value of the $n$th physicochemical property of the $m$th type of amino acid, and $\bar{P}_n$ and $\sigma_n$ are the mean and standard deviation of the $n$th physicochemical property over the 20 native amino acids. Therefore, a protein sequence of length $L$ can be encoded into ten different numerical series as follows:

$$\begin{aligned} &P^{(1)}_1, P^{(1)}_2, \ldots, P^{(1)}_L \\ &P^{(2)}_1, P^{(2)}_2, \ldots, P^{(2)}_L \\ &\qquad \vdots \\ &P^{(10)}_1, P^{(10)}_2, \ldots, P^{(10)}_L \end{aligned}$$

where $P^{(1)}_1$ is the polarity value of the first amino acid in the protein sequence, $P^{(2)}_2$ is the secondary structure value of the second amino acid in the sequence, and so forth.
Table 5

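The original values of the ten physicochemical properties for all amino acids [41].

| AA | P(1) | P(2) | P(3) | P(4) | P(5) | P(6) | P(7) | P(8) | P(9) | P(10) |
|----|------|------|------|------|------|------|------|------|------|-------|
| A | 8.100 | −1.302 | −0.733 | 1.57 | −0.146 | 0.620 | −0.500 | 27.500 | 0.046 | 1.181 |
| C | 5.500 | 0.465 | −0.862 | −1.02 | −0.255 | 0.290 | −1.000 | 44.600 | 0.128 | 1.461 |
| D | 13.000 | 0.302 | −3.656 | −0.259 | −3.242 | −0.900 | 3.000 | 40.000 | 0.105 | 1.587 |
| E | 12.300 | −1.453 | 1.477 | 0.113 | −0.837 | −0.740 | 3.000 | 62.000 | 0.151 | 1.862 |
| F | 5.200 | −0.59 | 1.891 | −0.397 | 0.412 | 1.190 | −2.500 | 115.500 | 0.290 | 2.228 |
| G | 9.000 | 1.652 | 1.33 | 1.045 | 2.064 | 0.480 | 0.000 | 0.000 | 0.000 | 0.881 |
| H | 10.400 | −0.417 | −1.673 | −1.474 | −0.078 | −0.400 | −0.500 | 79.000 | 0.230 | 2.025 |
| I | 5.200 | −0.547 | 2.131 | 0.393 | 0.816 | 1.380 | −1.800 | 93.500 | 0.186 | 1.810 |
| K | 11.300 | −0.561 | 0.533 | −0.277 | 1.648 | −1.500 | 3.000 | 100.000 | 0.219 | 2.258 |
| L | 4.900 | −0.987 | −1.505 | 1.266 | −0.912 | 1.060 | −1.800 | 93.500 | 0.186 | 1.931 |
| M | 5.700 | −1.524 | 2.219 | −1.005 | 1.212 | 0.640 | −1.300 | 94.100 | 0.221 | 2.034 |
| N | 11.600 | 0.828 | 1.299 | −0.169 | 0.933 | −0.780 | 2.000 | 58.700 | 0.134 | 1.655 |
| P | 8.000 | 2.081 | −1.628 | 0.421 | −1.392 | 0.120 | 0.000 | 41.900 | 0.131 | 1.468 |
| Q | 10.500 | −0.179 | −3.005 | −0.503 | −1.853 | −0.850 | 0.200 | 80.700 | 0.180 | 1.932 |
| R | 10.500 | −0.055 | 1.502 | 0.44 | 2.897 | −2.530 | 3.000 | 105.000 | 0.291 | 2.560 |
| S | 9.200 | 1.399 | −4.76 | 0.67 | −2.647 | −0.180 | 0.300 | 29.300 | 0.062 | 1.298 |
| T | 8.000 | 0.326 | 2.213 | 0.908 | 1.313 | −0.050 | −0.400 | 51.300 | 0.108 | 1.525 |
| V | 5.900 | −0.279 | −0.544 | 1.242 | −1.262 | 1.080 | −1.500 | 71.500 | 0.140 | 1.645 |
| W | 5.400 | 0.009 | 0.672 | −2.128 | −0.184 | 0.810 | −3.400 | 145.500 | 0.409 | 2.663 |
| Y | 6.200 | 0.83 | 3.097 | −0.838 | 1.512 | 0.260 | −2.300 | 117.300 | 0.298 | 2.368 |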
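The normalization described in Section 3.3 amounts to a z-score over the 20 amino acids, followed by a lookup for each residue. A minimal Python sketch for the polarity column, P(1), of Table 5 is given here. The function names are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev

# Polarity, P(1), copied from Table 5 (one value per native amino acid).
POLARITY = {
    "A": 8.1, "C": 5.5, "D": 13.0, "E": 12.3, "F": 5.2,
    "G": 9.0, "H": 10.4, "I": 5.2, "K": 11.3, "L": 4.9,
    "M": 5.7, "N": 11.6, "P": 8.0, "Q": 10.5, "R": 10.5,
    "S": 9.2, "T": 8.0, "V": 5.9, "W": 5.4, "Y": 6.2,
}


def normalize_property(raw):
    """Z-score one property over the 20 amino acids."""
    values = list(raw.values())
    mu = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    return {aa: (value - mu) / sd for aa, value in raw.items()}


def encode_sequence(sequence, normalized):
    """Replace each residue by its normalized property value (one numerical series)."""
    return [normalized[aa] for aa in sequence]


NORMALIZED_POLARITY = normalize_property(POLARITY)
series = encode_sequence("MKTAY", NORMALIZED_POLARITY)
print([round(value, 3) for value in series])
```

Repeating the same two steps for the other nine columns would give the ten numerical series, that is, an L × 10 matrix for a sequence of length L.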

3.4. PSSM

In this study, we analyzed all protein sequences using the PSI-BLAST program [26] against the NR database of the NCBI (available online: https://www.ncbi.nlm.nih.gov/) with default parameters, except that the e-value threshold and the maximum number of iterations were set to 0.001 and 3, respectively, as described previously [35]. Each protein sequence generated a corresponding L × 20 PSSM as follows:

$$\mathrm{PSSM} = \begin{pmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,20} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,20} \\ \vdots & \vdots & \ddots & \vdots \\ p_{L,1} & p_{L,2} & \cdots & p_{L,20} \end{pmatrix}$$

where $L$ is the length of the protein sequence and 20 is the number of native amino acids. The element $p_{i,j}$ represents the occurrence probability of amino acid $j$ at position $i$ of the protein sequence. The rows of the matrix represent the positions of the sequence, while the columns represent the 20 amino acids [42]. The original values of the PSSM were normalized to reduce noise and bias using the sigmoid function $f(x) = 1/(1 + e^{-x})$ [16], where $x$ is the original value of the PSSM.
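The sigmoid normalization described above can be sketched in a few lines of Python. The toy matrix below is invented purely for illustration and is not PSI-BLAST output; `normalize_pssm` is a hypothetical helper name.

```python
import math


def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x)), mapping any score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def normalize_pssm(pssm):
    """Apply the sigmoid to every element of an L x 20 score matrix."""
    return [[sigmoid(score) for score in row] for row in pssm]


# Toy 3 x 20 matrix with made-up integer scores (not real PSI-BLAST output).
toy_pssm = [
    [-1, 5, 0, -3] + [0] * 16,
    [2, -2, 4, -4] + [-1] * 16,
    [0, 0, -6, 7] + [1] * 16,
]
normalized = normalize_pssm(toy_pssm)
print([round(value, 3) for value in normalized[0][:4]])
```

The transformation is applied element-wise, so the matrix keeps its L × 20 shape and only the scale of the values changes.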

3.5. ACC Transformation

In this study, the matrices of protein sequences with different lengths were transformed into fixed-length vectors using the ACC method, as described previously [35]. The method has two kinds of variables: the auto-covariance $A(j, g)$ measures the correlation of the same property between two amino acids separated by a distance of $g$ along the sequence, and the cross-covariance $C(j, k, g)$ measures the correlation of two different properties [43]. Both variables can be computed using the following formulae:

$$A(j, g) = \frac{1}{L - g} \sum_{i=1}^{L-g} \left(S_{i,j} - \bar{S}_j\right)\left(S_{i+g,j} - \bar{S}_j\right)$$

$$C(j, k, g) = \frac{1}{L - g} \sum_{i=1}^{L-g} \left(S_{i,j} - \bar{S}_j\right)\left(S_{i+g,k} - \bar{S}_k\right), \quad j \ne k$$

where $S_{i,j}$ is the element in row $i$ and column $j$ of the matrix, and $\bar{S}_j$ and $\bar{S}_k$ represent the average scores of amino acids $j$ and $k$ along the sequence, respectively. $L$ is the length of the protein sequence, $j$ and $k$ represent different amino acids (matrix columns), and $g$ is the gap between two amino acids [29]. In this study, for the physicochemical property matrices of protein sequences, the number of auto-covariance variables is 10 × G, while the number of cross-covariance variables is 10 × 9 × G. Hence, each protein sequence can be encoded by a 100 × G dimensional feature vector. Likewise, for the PSSMs of protein sequences, the numbers of auto-covariance variables and cross-covariance variables are 20 × G and 20 × 19 × G, respectively. Therefore, every protein sequence can be represented by a 400 × G dimensional feature vector, where G is the maximal value of g.
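To make the transformation concrete, here is a small pure-Python sketch of an ACC transform. It assumes the commonly used form in which each product of mean-centred values is averaged over the L − g available positions, with gaps running from 1 to G; the function names and the ordering of the output (all auto-covariance terms first, then all cross-covariance terms) are choices made for the example, not details taken from the paper.

```python
def column_means(matrix):
    """Average of each column over all sequence positions."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]


def acc_transform(matrix, max_gap):
    """Turn an L x n matrix into a fixed-length vector of n * n * max_gap values."""
    length = len(matrix)
    n_cols = len(matrix[0])
    if max_gap >= length:
        raise ValueError("max_gap must be smaller than the sequence length")
    means = column_means(matrix)

    auto, cross = [], []
    for g in range(1, max_gap + 1):
        for j in range(n_cols):
            for k in range(n_cols):
                total = sum(
                    (matrix[i][j] - means[j]) * (matrix[i + g][k] - means[k])
                    for i in range(length - g)
                )
                value = total / (length - g)
                # j == k gives an auto-covariance term, j != k a cross-covariance term.
                (auto if j == k else cross).append(value)
    return auto + cross


# Two toy matrices of different lengths (3 columns each) give equal-length vectors.
short = [[0.1 * (i + j) for j in range(3)] for i in range(6)]
longer = [[0.05 * (i * j) for j in range(3)] for i in range(11)]
print(len(acc_transform(short, 2)), len(acc_transform(longer, 2)))
```

The output length depends only on the number of columns and on G, which is why the shortest sequence in a dataset limits how large g can be.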

3.6. SVM and SVM-RFE

In this study, SVM was adopted as the classifier using the LIBSVM package [44], which provides four basic kernel functions: linear, polynomial, radial basis function (RBF, i.e., Gaussian), and sigmoid. We chose the RBF as the kernel function because it had a better boundary response and could reflect the distribution of the dataset more accurately [34]. The two parameters C and γ were optimized by the jackknife cross-validation test on the datasets. To decrease feature redundancy and computational complexity, we reduced the feature dimensions using the SVM-RFE method [35]. Briefly, a matrix was constructed from all the feature vectors of the protein sequences in a dataset, where each row represented a protein sequence and each column a feature [7]. Then, a ranked feature list was obtained by running the SVM-RFE algorithm, which ranks features by importance. Subsequently, each protein sequence was encoded by an optimal subset of the top-K ranked features.
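The workflow above (rank features with SVM-RFE, keep the top K, train an RBF-kernel SVM) can be sketched with scikit-learn, whose `SVC` class is built on LIBSVM. This is an illustrative stand-in rather than the authors' pipeline: the data are synthetic, `TOP_K`, `C`, and `gamma` are placeholder values that were not tuned, and a linear-kernel SVM is assumed as the ranking estimator because RFE needs per-feature weights.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the feature matrix: rows are proteins, columns are features.
X, y = make_classification(
    n_samples=60, n_features=40, n_informative=6, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

TOP_K = 10  # placeholder; the paper scans K = 10, 20, ..., 400 and keeps 90

# SVM-RFE: repeatedly fit a linear SVM and drop the lowest-weight feature.
selector = RFE(estimator=SVC(kernel="linear", C=1.0),
               n_features_to_select=TOP_K, step=1)
selector.fit(X, y)
X_top = X[:, selector.support_]

# RBF-kernel SVM evaluated by leave-one-out (jackknife) cross-validation.
# Note: ranking features on the full dataset before cross-validation mirrors the
# description in the text but lets the ranking see the held-out samples.
classifier = SVC(kernel="rbf", C=1.0, gamma="scale")
scores = cross_val_score(classifier, X_top, y, cv=LeaveOneOut())
overall_accuracy = scores.mean()
print(X_top.shape, round(overall_accuracy, 3))
```

In practice C and γ would be searched over a grid, and K scanned as in Section 2.3, with the jackknife accuracy used as the selection criterion.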

3.7. Performance Measurement

In this study, the jackknife cross-validation test was chosen to measure the performance of predictors, because it is recognized as an objective and rigorous statistical test [45]. In each round, one protein sequence of a dataset was held out as the test set, and the remaining sequences were used as the training set. This was repeated in turn until every sequence had been tested once. Four widely used measurements, the OA, sensitivity (Sn), specificity (Sp), and MCC, were used to measure the predictive capability of the classification, as described previously [45]. They were calculated using the following formulae: OA = (TP + TN)/(TP + TN + FP + FN); Sn = TP/(TP + FN); Sp = TN/(TN + FP); MCC = [(TP × TN) − (FP × FN)]/√[(TP + FP)(TP + FN)(TN + FP)(TN + FN)], where TP, TN, FP, and FN represent the numbers of true positive, true negative, false positive, and false negative results, respectively [42].
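The jackknife procedure and the per-class measures can be sketched in plain Python as follows. The nearest-neighbour `fit_predict` function is a hypothetical stand-in for the SVM so that the example stays self-contained, the toy data are invented, and the OA is computed here as the fraction of correctly predicted sequences, which is an assumption about how the formula is applied in the multi-class setting.

```python
import math


def binary_counts(y_true, y_pred, positive):
    """One-vs-rest TP, TN, FP, FN for the class named `positive`."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def class_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and Matthews correlation coefficient."""
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc


def overall_accuracy(y_true, y_pred):
    """Fraction of sequences whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def jackknife(samples, labels, fit_predict):
    """Hold out each sample once, train on the rest, and predict the held-out one."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(fit_predict(train_x, train_y, samples[i]))
    return predictions


def nearest_neighbour(train_x, train_y, query):
    """Hypothetical stand-in classifier: label of the closest training point."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[distances.index(min(distances))]


toy_x = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
toy_y = ["Cyto", "Cyto", "Cyto", "Memb", "Memb", "Memb"]
preds = jackknife(toy_x, toy_y, nearest_neighbour)
print(overall_accuracy(toy_y, preds))
print(class_metrics(*binary_counts(toy_y, preds, "Cyto")))
```

Swapping `nearest_neighbour` for a function that trains an SVM on `train_x` and `train_y` would reproduce the evaluation loop described in this section.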

4. Conclusions

In this study, we developed a novel sequence-based method, iAPSL-IF, for the identification of the subcellular location of APs using integrative features captured from Markov chains, physicochemical property matrices, and PSSMs of amino acid sequences. On the three datasets ZD98, CL317, and ZW225, the iAPSL-IF outperformed the known predictors reported in the literature for several classes of subcellular location of APs, including the membrane proteins, cytoplasmic proteins, endoplasmic reticulum proteins, nuclear proteins, and secreted proteins. The source code was written in Python 3 and is available from the authors on request. In our future research, a web-based platform will be constructed for further application of the iAPSL-IF.
  38 in total

1.  Subcellular location prediction of apoptosis proteins.

Authors:  Guo-Ping Zhou; Kutbuddin Doctor
Journal:  Proteins       Date:  2003-01-01

2.  A novel representation for apoptosis protein subcellular localization prediction using support vector machine.

Authors:  Li Zhang; Bo Liao; Dachao Li; Wen Zhu
Journal:  J Theor Biol       Date:  2009-03-27       Impact factor: 2.691

3.  Predicting subcellular location of apoptosis proteins with pseudo amino acid composition: approach from amino acid substitution matrix and auto covariance transformation.

Authors:  Xiaoqing Yu; Xiaoqi Zheng; Taigang Liu; Yongchao Dou; Jun Wang
Journal:  Amino Acids       Date:  2011-02-23       Impact factor: 3.520

4.  Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning.

Authors:  Bin Liu; Shanyi Wang; Qiwen Dong; Shumin Li; Xuan Liu
Journal:  IEEE Trans Nanobioscience       Date:  2016-04-20       Impact factor: 2.935

5.  Human Protein Subcellular Localization with Integrated Source and Multi-label Ensemble Classifier.

Authors:  Xiaotong Guo; Fulin Liu; Ying Ju; Zhen Wang; Chunyu Wang
Journal:  Sci Rep       Date:  2016-06-21       Impact factor: 4.379

6.  Predicting apoptosis protein subcellular location with PseAAC by incorporating tripeptide composition.

Authors:  Bo Liao; Jun-Bao Jiang; Qing-Guang Zeng; Wen Zhu
Journal:  Protein Pept Lett       Date:  2011-11       Impact factor: 1.890

7.  Finding RNA-Protein Interaction Sites Using HMMs.

Authors:  Tao Wang; Jonghyun Yun; Yang Xie; Guanghua Xiao
Journal:  Methods Mol Biol       Date:  2017

8.  Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition.

Authors:  Ying-Li Chen; Qian-Zhong Li
Journal:  J Theor Biol       Date:  2007-05-18       Impact factor: 2.691

9.  Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach.

Authors:  Taigang Liu; Yufang Qin; Yongjie Wang; Chunhua Wang
Journal:  Int J Mol Sci       Date:  2015-12-24       Impact factor: 5.923

10.  MultiP-Apo: A Multilabel Predictor for Identifying Subcellular Locations of Apoptosis Proteins.

Authors:  Xiao Wang; Hui Li; Rong Wang; Qiuwen Zhang; Weiwei Zhang; Yong Gan
Journal:  Comput Intell Neurosci       Date:  2017-07-04