
Alignment-Free Method to Predict Enzyme Classes and Subclasses.

Riccardo Concu, M. Natália D. S. Cordeiro.

Abstract

The Enzyme Classification (EC) number is a numerical classification scheme for enzymes, established using the chemical reactions they catalyze. This classification is based on the recommendation of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. Six enzyme classes were recognised in the first Enzyme Classification and Nomenclature List, reported by the International Union of Biochemistry in 1961. However, a new enzyme group was recently added as the six existing EC classes could not describe enzymes involved in the movement of ions or molecules across membranes. Such enzymes are now classified in the new EC class of translocases (EC 7). Several computational methods have been developed in order to predict the EC number. However, due to this new change, all such methods are now outdated and need updating. In this work, we developed a new multi-task quantitative structure-activity relationship (QSAR) method aimed at predicting all 7 EC classes and subclasses. In so doing, we developed an alignment-free model based on artificial neural networks that proved to be very successful.

Keywords:  QSAR; alignment-free; artificial neural network; enzyme; enzyme classification; machine learning


Year:  2019        PMID: 31671806      PMCID: PMC6862210          DOI: 10.3390/ijms20215389

Source DB:  PubMed          Journal:  Int J Mol Sci        ISSN: 1422-0067            Impact factor:   5.923


1. Introduction

By the late 1950s, the International Union of Biochemistry and Molecular Biology foresaw the need for a uniform nomenclature for enzymes. In those years, the number of known enzymes had grown very rapidly and, in the absence of general guidelines, enzyme nomenclature was getting out of hand. In some cases, enzymes with similar names catalyzed different reactions, while, conversely, different names were given to the same or similar enzymes. For this reason, during the third International Congress of Biochemistry in Brussels in August 1955, the General Assembly of the International Union of Biochemistry (IUB) decided to establish an International Commission in charge of developing a nomenclature for enzymes. In 1961, the IUB finally released the first version of the Enzyme Classification (EC) and Nomenclature List. This nomenclature assigns a four-number code to each enzyme with the following meaning: (i) the first number identifies the main enzyme class; (ii) the second digit indicates the subclass; (iii) the third number denotes the sub-subclass; and (iv) the fourth digit is the serial number of the enzyme in its sub-subclass. Six enzyme classes were identified, with the classification based on the type of reaction catalyzed: oxidoreductases (EC 1), transferases (EC 2), hydrolases (EC 3), lyases (EC 4), isomerases (EC 5) and ligases (EC 6) [1]. Although several revisions have been made to the 1961 version, the six classes identified have remained unchanged. However, in August 2018, a new class was added. This new class contains the translocases (EC 7), and was added to describe those enzymes catalyzing the movement of ions or molecules across membranes or their separation within membranes. For this reason, some enzymes which had previously been classified in other classes (EC 3.6.3, for example) are now included in the EC 7 class.
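The four-field code described above can be made concrete with a small parser. The helper below, its field names, and the class table are illustrative only, not part of any official EC tooling:

```python
# Illustrative parser for the four-field EC code described above.
# The helper and its field names are ours, not an official tool.

CLASS_NAMES = {
    1: "oxidoreductases", 2: "transferases", 3: "hydrolases",
    4: "lyases", 5: "isomerases", 6: "ligases", 7: "translocases",
}

def parse_ec(ec: str) -> dict:
    """Split an EC number such as 'EC 3.6.3.14' into its four fields."""
    fields = ec.removeprefix("EC").strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four fields, got {ec!r}")
    names = ("class", "subclass", "sub_subclass", "serial")
    return dict(zip(names, map(int, fields)))

code = parse_ec("EC 7.1.1.1")
print(CLASS_NAMES[code["class"]])  # translocases
```

The first two fields are exactly what the models in this work predict; the last two identify an individual enzyme within its sub-subclass.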
Predicting enzyme classes or protein function with bioinformatic tools remains a key goal in bioinformatics and computational biology, owing to both the prohibitive costs and the time-consuming nature of wet-lab-based functional identification procedures. In fact, there are more than four thousand sequences whose function remains unknown, and this number is still growing [2]. The problem is that our ability to assign a specific function to a sequence lags far behind our ability to isolate and identify sequences. For this reason, significant efforts have been devoted to developing reliable methods for predicting protein function, and several methodological strategies and tools have been proposed to classify enzymes based on different approaches [3,4,5,6,7,8,9,10]. The Basic Local Alignment Search Tool (BLAST) [11] is probably the most powerful and widely used tool for finding regions of similarity between biological sequences. The program compares nucleotide or protein sequences against sequence databases and calculates the statistical significance of the matches. However, as with all methods, these procedures may fail under certain conditions. In some cases, enzymes with a sequence similarity higher than 90% may belong to different enzyme families and, thus, have different EC annotations [12,13,14]. On the other hand, some enzymes which share the same first EC number may have a sequence similarity below 30%. Some authors have described this situation well and highlighted the need to develop alignment-free methods, which may be used in a complementary way [15,16]. Other relevant tools based on sequence similarity are the UniProtKB database [17], the Kyoto Encyclopedia of Genes and Genomes (KEGG) [18], the PEDANT protein database [19], DEEPre [20], ECPred [21] and EzyPred [22]. DEEPre is a three-level EC number predictor, which predicts whether an input protein sequence is an enzyme and, if so, its main class and subclass.
DEEPre was trained on a dataset of 22,198 sequences and achieves an overall accuracy of more than 90%. ECPred is another enzymatic function prediction tool, based on an ensemble of machine learning classifiers. Its creators developed it using a dataset of approximately 245,000 proteins, achieving classification scores for the six EC classes and subclasses comparable to those reported for DEEPre. EzyPred is a top-down approach for predicting enzyme classes and subclasses. This model was developed as a three-layer predictor using the ENZYME [23] dataset (approximately 9800 enzymes at the time the model was developed), and achieved an overall accuracy above 86%. Other relevant methods with similar classification scores have also been reported [10,15,20,24,25]. All these methods have proved to be robust; however, they are now all outdated, since they cannot predict the EC 7 class, and should therefore be updated in accordance with the new classification. In light of the above, the major aim of this work was to develop an alignment-free strategy using machine learning (ML) methods to predict the first two digits of all seven EC classes. Previous ML methods have used alignment-free numerical parameters to quantify information about the 2D or 3D structure of proteins [26,27,28,29]. Specifically, Graham, Bonchev, Marrero-Ponce, and others [30,31,32,33,34] used Shannon's entropy measures to quantify relevant structural information about molecular systems. In addition, González-Díaz et al. [35,36,37] introduced the so-called Markov–Shannon entropies (θ) to codify the structural information of large bio-molecules and complex bio-systems or networks. For comparative purposes, we developed different linear and non-linear models, including a linear discriminant analysis (LDA) and various types of artificial neural networks (ANNs). In addition, we focused our work on performing an efficient feature selection (FS).
Nowadays, there are several software packages or tools that may be used to calculate thousands of molecular descriptors (MDs). As a result, a proper FS method is essential to develop robust and reliable quantitative structure–activity relationship (QSAR) models. This is particularly the case when using ANNs, since QSAR models developed with a large set of MDs are really complex, vulnerable to overfitting and difficult to obtain a mechanistic interpretation from [38,39].

2. Results

2.1. LDA Model

As a first step, we used the LDA algorithm implemented in the software STATISTICA® [40] to derive a linear model able to discriminate all of the subclasses of enzymes using a multi-task model, meaning that a single model was developed to assign each enzyme to a specific class. From the first pool of more than 200 variables, we selected four that clearly had an influence on the model using a supervised forward stepwise analysis. In order to validate the model, we split our dataset, assigning 70% of the entries to the training set and the remaining 30% to the validation set; the latter was used to validate the model through a cross-validation procedure. The LDA model had the following overall values: specificity Sp = 99.71%, sensitivity Sn = 98.16% and accuracy Acc = 98.66%. In the training series, the model displayed Sp = 99.71%, Sn = 98.13% and Acc = 98.63%, while in the validation series it displayed Sp = 99.71%, Sn = 98.27% and Acc = 98.73%. All of these statistics are reported in Table 1.
Table 1

Accuracy for the linear discriminant analysis (LDA) model.

                      Training                        Validation                      Overall
Observed       All (%)   Pred. −1   Pred. 1    All (%)   Pred. −1   Pred. 1    All (%)   Pred. −1   Pred. 1
−1 (Sn)         98.13      40,781       778     98.27      13,613       240     98.16      54,394     1,018
1 (Sp)          99.70          57    19,498     99.71          19     6,498     99.71          76    25,996
Total           98.63      40,838    20,276     98.73      13,632     6,738     98.66      54,470    27,014
The linear equation (Equation (1)) for this model is shown below, and information regarding its variables is given in Table 6. Other relevant statistics for the LDA model (both training and validation), such as Wilks' lambda and the Matthews correlation coefficient (MCC), are reported in Table 2.
Table 2

Relevant statistics for the LDA model.

Eigenvalue   Canonical R   Wilks' Lambda   Chi-Sqr.    df   p-value   MCC
1.241879     0.744275      0.446054        49,334.99    4   0.00      0.97
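The Sn, Sp, Acc and MCC values reported in Tables 1 and 2 can be reproduced from the overall confusion-matrix counts. The sketch below (the function name is ours) takes the −1 class as positive, following the tables' convention:

```python
from math import sqrt

def binary_stats(tp, tn, fp, fn):
    """Sn, Sp, Acc and the Matthews correlation coefficient (MCC)
    from the four confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sn, sp, acc, mcc

# Overall counts from Table 1 (class -1 taken as the positive class).
sn, sp, acc, mcc = binary_stats(tp=54_394, tn=25_996, fp=76, fn=1_018)
print(f"Sn={sn:.2%} Sp={sp:.2%} Acc={acc:.2%} MCC={mcc:.2f}")
# Sn=98.16% Sp=99.71% Acc=98.66% MCC=0.97
```

Note that the MCC of 0.97 agrees with the value reported in Table 2.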

2.2. ANN Models

We then decided to move a step forward and try to develop non-linear models using various neural network architectures. We first investigated ANN models using either the multi-layer perceptron (MLP) algorithm or the radial basis function (RBF) [41,42,43,44,45,46]. To do so, we ran a set of 50 ANN-MLP models in order to identify the best topology and architecture. The best model found had an MLP 4-9-2 topology and was developed using the same four variables used for the LDA model. It was able to correctly classify 100% of the cases in both the training and validation series. Table 3 shows the statistical parameters obtained for this model. As can be seen, the MCC value was, as expected, 1.
Table 3

Performance of the best multi-layer perceptron (MLP) model found.

Obs. set a   Stat. param. a   Pred. stat. a (%)   Predicted 1   Predicted −1       nj
Training series
1            Sp a                  100                 17,500              0    57,039
−1           Sn a                  100                      0         39,539
Total        Ac a                  100                 17,500         39,539    57,039
Validation series
1            Sp a                  100                  8,572              0    24,445
−1           Sn a                  100                      0         15,873
Total        Ac a                  100                  8,572         15,873    24,445
Overall
1            Sp a                  100                 26,072              0    81,484
−1           Sn a                  100                      0         55,412
Total        Ac a                  100                 26,072         55,412    81,484

a Obs. set = Observed set, Stat. param. = Statistical parameter, Pred. stat. = Predicted statistics, Sp = Specificity, Sn = Sensitivity, Ac = Accuracy.
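The 4-9-2 topology (four descriptor inputs, one hidden layer of nine neurons, two output classes) can be sketched with scikit-learn. The authors used STATISTICA, so the library choice and the synthetic data below are our assumptions; also, scikit-learn uses a single output unit for binary problems, so this only approximates the two-unit output layer:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the four molecular descriptors: two separable groups.
X = np.vstack([rng.normal(0.0, 1.0, (200, 4)),   # stand-in non-enzymes
               rng.normal(4.0, 1.0, (200, 4))])  # stand-in enzymes
y = np.array([-1] * 200 + [1] * 200)

# One hidden layer of nine neurons mirrors the MLP 4-9-2 topology.
clf = MLPClassifier(hidden_layer_sizes=(9,), max_iter=2000,
                    random_state=0).fit(X, y)
print(clf.score(X, y))
```

On such well-separated data a nine-neuron hidden layer is ample, which echoes the paper's point that the winning model is compact.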

For comparative purposes, Table 4 reports the statistics of the 10 best MLP and RBF models found.
Table 4

Summary of the 10 best MLP and radial basis function (RBF) models.

                         Training                          Validation                         Overall
Model / Row        −1 (Sn)    1 (Sp)       All      −1 (Sn)    1 (Sp)       All      −1 (Sn)    1 (Sp)       All
BEST MLP 4-9-2
  Total             55,412    26,072    81,484      55,412    26,072    81,484      55,412    26,072    81,484
  Correct           55,412    26,072    81,484      55,412    26,072    81,484      55,412    26,072    81,484
  Incorrect              0         0         0           0         0         0           0         0         0
  Correct (%)          100       100       100         100       100       100         100       100       100
  Incorrect (%)          0         0         0           0         0         0           0         0         0
1. MLP 4-7-2
  Total             39,448    17,591    57,039      15,873     8,572    24,445      55,412    26,072    81,484
  Correct           39,448    17,567    57,015      15,873     8,562    24,435      55,412    26,034    81,446
  Incorrect              0        24        24           0        10        10           0        38        38
  Correct (%)          100     99.86     99.96      100.00     99.88     99.96      100.00     99.85     99.95
  Incorrect (%)          0      0.14      0.04        0.00      0.12      0.04        0.00      0.15      0.05
2. MLP 4-8-2
  Total             39,448    17,591    57,039      15,873     8,572    24,445      55,412    26,072    81,484
  Correct           39,448    17,565    57,013      15,873     8,563    24,436      55,412    26,037    81,449
  Incorrect              0        26        26           0         9         9           0        35        35
  Correct (%)          100     99.85     99.95      100.00     99.90     99.96      100.00     99.87     99.96
  Incorrect (%)          0      0.15      0.05        0.00      0.10      0.04        0.00      0.13      0.04
3. MLP 4-10-2
  Total             39,448    17,591    57,039      15,873     8,572    24,445      55,412    26,072    81,484
  Correct           39,448    17,565    57,013      15,873     8,563    24,436      55,412    26,037    81,449
  Incorrect              0        26        26           0         9         9           0        35        35
  Correct (%)          100     99.85     99.95      100.00     99.90     99.96      100.00     99.87     99.96
  Incorrect (%)          0      0.15      0.05        0.00      0.10      0.04        0.00      0.13      0.04
4. MLP 4-11-2
  Total             39,448    17,591    57,039      15,873     8,572    24,445      55,412    26,072    81,484
  Correct           39,448    17,566    57,014      15,873     8,563    24,436      55,412    26,037    81,449
  Incorrect              0        25        25           0         9         9           0        35        35
  Correct (%)          100     99.86     99.96      100.00     99.90     99.96      100.00     99.87     99.96
  Incorrect (%)          0      0.14      0.04        0.00      0.10      0.04        0.00      0.13      0.04
5. MLP 4-16-2
  Total             39,448    17,591    57,039      15,873     8,572    24,445      55,321    26,163    81,484
  Correct           39,448    17,567    57,015      15,873     8,572    24,445      55,321    26,139    81,460
  Incorrect              0        24        24           0         0         0           0        24        24
  Correct (%)          100     99.86     99.96      100.00    100.00    100.00      100.00     99.91     99.97
  Incorrect (%)          0      0.14      0.04        0.00      0.00      0.00        0.00      0.09      0.03
6. RBF 4-21-2
  Total             39,539    17,500    57,039      15,873     8,572    24,445      55,412    26,072    81,484
  Correct           39,520    16,426    55,946      15,855     8,059    23,914      55,375    24,485    79,860
  Incorrect             19     1,074     1,093          18       513       531          37     1,587     1,624
  Correct (%)        99.95     93.86     98.08       99.89     94.02     97.83       99.93     93.91     98.01
  Incorrect (%)       0.05      6.14      1.92        0.11      5.98      2.17        0.07      6.09      1.99
7. RBF 4-29-2
  Total             39,539    17,500    57,039      15,873     8,572    24,445      55,412    26,072    81,484
  Correct           39,165    17,475    56,640      15,714     8,561    24,275      54,879    26,036    80,915
  Incorrect            374        25       399         159        11       170         533        36       569
  Correct (%)        99.05     99.86     99.30       99.00     99.87     99.30       99.04     99.86     99.30
  Incorrect (%)       0.95      0.14      0.70        1.00      0.13      0.70        0.96      0.14      0.70
8. RBF 4-21-2
  Total             39,539    17,500    57,039      15,873     8,572    24,445      55,412    26,072    81,484
  Correct           39,526    16,138    55,664      15,868     7,873    23,741      55,394    24,011    79,405
  Incorrect             13     1,362     1,375           5       699       704          18     2,061     2,079
  Correct (%)        99.97     92.22     97.59       99.97     91.85     97.12       99.97     92.09     97.45
  Incorrect (%)       0.03      7.78      2.41        0.03      8.15      2.88        0.03      7.91      2.55
9. RBF 4-28-2
  Total             39,539    17,500    57,039      15,197     8,571    23,768      53,008    26,060    81,484
  Correct           39,489    16,000    55,489      15,197     8,448    23,645      53,008    25,674    78,682
  Incorrect             50     1,500     1,550           0       123       123           0       386       386
  Correct (%)        99.87     91.43     95.65      100.00     98.56     99.48      100.00     98.52     99.51
  Incorrect (%)       0.03      7.78      4.35        0.00      1.44      0.52        0.00      1.48      0.49
10. RBF 4-26-2
  Total             39,539    17,500    57,039      15,873     8,572    24,445      55,412    26,072    81,484
  Correct           11,880     6,629    18,509       4,748     3,170     7,918      16,628     9,799    26,427
  Incorrect         27,659    10,871    38,530      11,125     5,402    16,527      38,784    16,273    55,057
  Correct (%)        30.05     37.88     32.45       29.91     36.98     32.39       30.01     37.58     32.43
  Incorrect (%)      69.95     62.12     67.55       70.09     63.02     67.61       69.99     62.42     67.57
The results reported in Table 4 clearly indicate that MLP models perform better than RBF ones. Even though the best MLP model was able to achieve 100% overall accuracy, we decided to perform a quantitative analysis to infer where the non-optimal MLP models were failing. As can be seen in Table 5, these models were particularly problematic in discriminating the EC 6.5 subclass.
Table 5

Quantitative analysis of the non-optimal MLP models.

Model           Class   Fail   Total in Class
1. MLP 4-7-2    6.4        1       104
                6.5       34        36
2. MLP 4-8-2    1.6        3         4
                6.4        1       104
                6.5       34        36
3. MLP 4-10-2   1.6        3         4
                6.4        1       104
                6.5       33        36
4. MLP 4-11-2   1.6        3         4
                6.4        1       104
                6.5       32        36
5. MLP 4-16-2   6.4        1       104
                6.5       33        36
Finally, a sensitivity analysis was also performed to assess the influence of the MDs in the model. The results of this analysis are shown in Table 6.
Table 6

Sensitivity analysis for the artificial neural network (ANN) model.

Input Variable   Variable Sensitivity   Variable Name/Details
<Tr5(srn)>       15,896,991             Expected value of the trace of order 5 of the srn for the sequence
D Tr5(srn)       1,288,626              Deviation of the trace of order 5 of the srn with respect to the mean value of the class
<Tr3(srn)>       591,331.9              Expected value of the trace of order 3 of the srn for the sequence
D Tr3(srn)       108.7591               Deviation of the trace of order 3 of the srn with respect to the mean value of the class
Sensitivity analysis refers to the assessment of the importance of predictors in a developed model, with higher values of sensitivity being assigned to the most important predictors. As seen, the high sensitivity values found for some of the parameters suggest that the model’s performance can drastically fall if the parameters used in the model are removed. On the other hand, parameters with lower values of sensitivity may be discarded since they are not relevant to the performance of the model and may lead to an overfitted model. Regarding the variables presented in Table 6, they are traces of the n connectivity matrices of the amino acid sequences. The terms 3 and 5 represent the order of the matrix used in the calculation. The terms within brackets (“< >”) represent the mean value of each subclass, while “D” stands for the difference (or distance) between each amino acid sequence and the mean value of its subclass. This basically means that the model, in order to correctly predict each sequence as an enzyme and then input it into the specific subclass, is calculating the distance between each input and the mean of its subclass. This is in fact how a multi-target model works.
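A model-agnostic way to approximate this kind of sensitivity analysis outside STATISTICA is permutation importance: shuffle one descriptor at a time and record the resulting drop in accuracy. The sketch below is our assumption of a comparable procedure, not the tool's own algorithm:

```python
import numpy as np

def sensitivity(model, X, y, rng=None):
    """Permutation-style sensitivity: the drop in accuracy when one
    input column is shuffled; a higher drop = a more important descriptor."""
    if rng is None:
        rng = np.random.default_rng(0)
    base = (model.predict(X) == y).mean()
    drops = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])          # destroy column j's information
        drops.append(base - (model.predict(Xp) == y).mean())
    return np.array(drops)
```

As in the analysis above, descriptors whose removal barely moves the score are candidates for elimination, while those with large drops carry the model.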

3. Discussion

The main aim of this study was to develop a new QSAR-ML model able to predict enzyme subclasses considering the new, recently introduced EC class 7. We retrieved from the Protein Data Bank (PDB) more than 26,000 enzyme and 55,000 non-enzyme sequences in order to build up our dataset. All of the enzyme sequences belonged to one of the 7 main classes and 65 subclasses. The EC 7 class was introduced just a few months ago and, because of this, none of the current models include this new enzyme class. As a result, the classification or prediction such models perform may be misleading. Hence, the development of new models capable of predicting all enzyme classes and subclasses, including the EC 7 class, is of utmost importance. In view of this, we developed a new machine learning model able to discriminate between enzymes and non-enzymes. In addition, the model was capable of assigning enzymes to a specific enzyme subclass. We generated linear and non-linear models using alignment-free variables to find the best model to predict EC classes and subclasses. The results of the linear model were impressive, since with only four MDs the model could discriminate between enzymes and non-enzymes, as well as assign a specific EC class and subclass to each enzyme sequence. We checked the accuracy and robustness of the model, and the results clearly indicate that the model is reliable. Regarding the validation, we performed a classical cross-validation procedure using 30% of the dataset. This led to almost the same results for the training and validation sets, indicating once more the robustness of the model and approach. Although the accuracy of the derived LDA model was near 100%, we decided to further test our approach by developing some neural network models, which usually improve on LDA results. To the best of our knowledge, the MLP is generally considered the best ANN algorithm and, in this case, had the potential to improve our linear model.
As previously reported, the MLP was able to perfectly discriminate between enzymes and non-enzymes, in addition to assigning each enzyme sequence to a specific subclass. It is also remarkable that the best model only needed nine neurons in the hidden layer. This low number of neurons, considering the number of sequences and variables, suggests that the model is not suffering from an overfitting problem. Mechanistic interpretation of ANN models is always a challenging task, since these models do not lead to simple linear equations. A sensitivity analysis may then be used to analyze the influence of each MD on the model. For the ANN model, we carried out such an analysis to evaluate the weight of each variable in the model. This analysis is also useful for identifying redundant variables in models, assisting in their elimination to avoid overfitting. In the case of the ANN model, we found that the same four variables used in the LDA model were able to perfectly discriminate between enzymes and non-enzymes and assign each enzyme sequence to a specific subclass. Finally, we also tested RBF models, which gave worse results than the MLP models: their general accuracy was lower than that of the MLP models, which usually need fewer neurons to achieve greater accuracy.

4. Materials and Methods

4.1. Dataset

From the PDB, we retrieved a total of 81,486 protein FASTA sequences. Of those sequences, 26,073 were enzymes, while 55,413 were non-enzymes (α-proteins, β-proteins, membrane proteins, and so forth). Each of the 26,073 enzyme sequences belonged to one of the 65 enzyme subclasses. In order to avoid redundant sequences, we selected the enzymes using the specific EC classification query module of the PDB and then double-checked the dataset, eliminating duplicate entries. Regarding the non-enzyme sequences, we randomly downloaded protein sequences belonging to different classes, such as membrane proteins, multi-domain proteins, and α- and β-proteins. The complete list of EC subclasses is reported in Supplemental Material S1, while Table 7 reports the number of entries for each of the subclasses.
Table 7

Number of entries for each subclass.

EC Subclass   Sequences   EC Subclass   Sequences   EC Subclass   Sequences
1.1              555      2.3              722      4.6              120
1.2              250      2.4              424      4.99              95
1.3              172      2.5              291      5.1              176
1.4              108      2.6               19      5.2               74
1.5                5      2.7             3112      5.3              247
1.6                4      2.8               71      5.4              160
1.7               91      2.9               10      5.5              115
1.8              165      3.1             1559      5.6              159
1.9               73      3.11               7      5.99               3
1.10             555      3.13               3      6.1              277
1.11             136      3.2              700      6.2               38
1.12              32      3.3              164      6.3              291
1.13             123      3.4             1481      6.4              104
1.14             244      3.5              561      6.5               36
1.15             162      3.6              417      7.1             8827
1.16             173      3.7               69      7.2              927
1.17             121      3.8               77      7.4              189
1.18              45      3.9                3      7.5              187
1.20             250      4.1              486      7.6              197
1.21              28      4.2              460
1.23               3      4.3               97
2.1              522      4.4               39
2.2              107      4.5               25

4.2. Molecular Descriptor Calculation

The software S2SNet [47] was used to transform each protein sequence into one sequence recurrence network (SRN). The SRN of a protein sequence can be constructed starting from one of two directions: (1) from a sequence graph with linear topology by adding amino acid recurrence information, or (2) from a protein representation graph with star graph (SG) topology by adding sequence information [48,49,50,51,52]. Note that, in both of these SRN representations of a protein sequence, the amino acids are the nodes, and two nodes (na and nb) are paired in the network (connected by a link, αab = 1) if they are adjacent and/or neighbouring recurrent nodes. This means that αab = 1 if the topological distance between na and nb is d = 1 (chemically bonded amino acids), or if they are the nearest neighbouring amino acids of the same type (A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V, X) with minimal topological distance, dab = min(dab), between them. The first node in the sequence (the centre of the star graph) is a bias or dummy non-residue vertex. Secondly, we needed to transform the SRN of each sequence into one stochastic matrix 1Π, whose elements are the probabilities (pab) of reaching an amino acid (node nb) by walking from another amino acid (node na) through a walk of length dab = 1 (Equation (2)):

pab = αab / Σb αab        (2)

Note that the number of amino acids in a sequence equals the number of nodes (n) in the SRN graph, the number of rows and columns in 1Π, the length of the sequence (L), and the maximal topological distance in the sequence, max(dab). In this work, we quantified the information content of a peptide using the Shannon entropy values (θ) of the k-th natural powers of the Markov matrix 1Π. The same procedure was used to quantify the information of the q-seqs and r-seqs.
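Our reading of the SRN construction (backbone links between adjacent residues, plus a link from each residue to its nearest earlier recurrence of the same type) and of the row normalisation that yields the stochastic matrix 1Π can be sketched as follows. This is not the S2SNet code, and the dummy star-graph centre node is omitted for brevity:

```python
import numpy as np

def srn_stochastic(seq: str):
    """Adjacency matrix of a sequence recurrence network and its
    row-stochastic matrix (1-Pi) for a one-letter protein sequence."""
    n = len(seq)
    A = np.zeros((n, n))
    last = {}                      # last position seen for each residue type
    for i, aa in enumerate(seq):
        if i > 0:                  # chemically bonded neighbours, d = 1
            A[i, i - 1] = A[i - 1, i] = 1
        if aa in last:             # nearest earlier recurrence of same type
            j = last[aa]
            A[i, j] = A[j, i] = 1
        last[aa] = i
    # p_ab = alpha_ab / sum_b alpha_ab: each row becomes a probability vector.
    P = A / A.sum(axis=1, keepdims=True)
    return A, P

A, P = srn_stochastic("ACDA")
print(P.sum(axis=1))   # every row of 1-Pi sums to 1
```

In the "ACDA" example, the two A residues are linked as recurrences on top of the backbone chain, which is exactly the extra information the SRN adds over a plain linear sequence graph.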
The formula for the Markov–Shannon entropy is as follows (Equation (3)):

θk = −Σj pk(j) · log pk(j)        (3)

where pk(j) represents the absolute probability of reaching node j through a walk of length k with respect to any node in the spectral graph. Further details of this formula can be found in previous works [35,36,37]. In the Supplemental Material S2, we report the complete list of sequence entries with the respective values of the MDs used to develop the models.
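The entropies θ can then be evaluated from the k-th powers of 1Π. Taking the column means of the k-th matrix power as the absolute node probabilities is our reading of the formalism, so treat this sketch as illustrative:

```python
import numpy as np

def markov_shannon_entropy(P: np.ndarray, k: int) -> float:
    """Shannon entropy of the absolute node probabilities derived
    from the k-th natural power of the stochastic matrix 1-Pi."""
    Pk = np.linalg.matrix_power(P, k)
    p = Pk.mean(axis=0)        # absolute probability of reaching each node
    p = p[p > 0]               # convention: 0 * log 0 = 0
    return float(-(p * np.log(p)).sum())
```

For a two-node network that simply alternates between its nodes, both nodes are equally reachable at every order k, so the entropy is log 2 for all k.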

4.3. Multi-Target Linear Model

The LDA model was developed using the General Discriminant tool implemented in the software STATISTICA [40]. The model is based on a multi-task approach, meaning that it is able to predict whether a sequence belongs to one of the seven EC classes. It starts by identifying the presence of enzyme activity εq(ci) = 1 of subclass ci (or the absence of this activity, εq(ci) = 0) for a query protein with a known amino acid sequence. The linear model is based on a linear equation, which directly correlates the dependent variable (enzyme or not) with the independent variables (the MDs). The multi-target LDA model was developed as follows: once the MDs were calculated, we computed the mean value of each subclass and then the difference between each sequence and the mean value of its subclass. Because the model incorporates both the mean value of each subclass and the difference between each sequence and that mean, it is able to achieve a multi-target prediction. For further information regarding this statistical technique, please refer to the bibliography [53,54,55]. The same procedure was also used for the development of the multi-target ANN model. The validation of the model was performed using the cross-validation module implemented in the software, a procedure aimed at assessing the predictive accuracy of a model. The procedure splits the dataset into a training set and a validation set, ensuring that an entry included in the training set cannot be used for validation. The model was thus developed using the cases in the training (learning) sample, which, in our study, comprised 70% of the dataset, and its predictive accuracy was then assessed using the remaining 30% [56,57]. Standard statistics, such as the specificity (Sp), sensitivity (Sn), probability of error (p), cross-validation, and the Matthews correlation coefficient (MCC) [58], were used to assess the discriminatory power of the model.
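The mean-plus-deviation construction described above, which pairs each descriptor's subclass mean <MD> with the deviation D of every sequence from that mean, can be sketched as follows (array and function names are ours):

```python
import numpy as np

def multitask_features(md: np.ndarray, subclass: np.ndarray):
    """For one molecular descriptor, return the per-subclass mean <MD>
    and each sequence's deviation D = MD - <MD of its subclass>."""
    means = {c: md[subclass == c].mean() for c in np.unique(subclass)}
    mean_col = np.array([means[c] for c in subclass])
    return mean_col, md - mean_col

# Toy example: two sequences in subclass "1.1", two in "7.1".
md = np.array([1.0, 3.0, 10.0, 14.0])
sub = np.array(["1.1", "1.1", "7.1", "7.1"])
mean_col, dev = multitask_features(md, sub)
print(mean_col)   # [ 2.  2. 12. 12.]
print(dev)        # [-1.  1. -2.  2.]
```

The subclass mean encodes which task a row belongs to, while the deviation encodes how typical the sequence is of that subclass; together they let a single model cover all subclasses at once.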

4.4. Non-Linear Models

The non-linear models were developed using the neural network tool implemented in the software STATISTICA. In order to identify the best topology and architecture, we ran a large set of 50 models with various topologies. This step is crucial to avoid an (albeit unlikely) overfitting problem. We examined RBF and MLP networks since these usually perform better than other algorithms. The discriminatory power of the models was assessed using the cross-validation method. The models were validated using the cross-validation tool implemented in the ANN module of the STATISTICA software. In this validation procedure, the software automatically assigns 70% of the dataset to training the model. Once the model is trained, the remaining 30% of the inputs are used for validation. It is important to note that if an entry is used in the training set it cannot be used for the validation series.
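The search over candidate topologies can be sketched as a loop over hidden-layer sizes, scoring each candidate on a held-out 30% split. Here scikit-learn stands in for STATISTICA's ANN module, and the data and names are our assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the four-descriptor dataset.
X = np.vstack([rng.normal(0, 1, (150, 4)), rng.normal(3, 1, (150, 4))])
y = np.array([-1] * 150 + [1] * 150)

# 70/30 split: an entry used for training is never used for validation.
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

best = None
for hidden in range(2, 12):        # candidate hidden-layer sizes
    clf = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=2000,
                        random_state=0).fit(X_tr, y_tr)
    acc = clf.score(X_va, y_va)
    if best is None or acc > best[0]:
        best = (acc, hidden)
print(best)   # (validation accuracy, winning hidden-layer size)
```

Scoring only on held-out data, as here, is what keeps the topology search from quietly selecting an overfitted network.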

5. Conclusions

Developing new, reliable, and robust methods for predicting protein function and enzyme class and subclasses is a key goal for theoreticians, especially in light of the recently introduced EC 7 class. In this work, we developed linear and non-linear models using an alignment-free approach to discriminate between enzymes and non-enzymes, as well as assign each enzyme sequence to a specific EC class. The best LDA model showed an overall accuracy of 98.63%, which is considered a remarkable result. However, we decided to explore further and develop some non-linear models using two different algorithms: MLP and RBF. While the latter was unable to improve the results of the LDA model, the MLP model was able to achieve an overall accuracy of 100%. This means that it was able to perfectly discriminate between enzymes and non-enzymes and identify the EC class of each enzyme.
References (53 in total)

1.  Prediction of human protein function from post-translational modifications and localization features.

Authors:  L J Jensen; R Gupta; N Blom; D Devos; J Tamames; C Kesmir; H Nielsen; H H Staerfeldt; K Rapacki; C Workman; C A F Andersen; S Knudsen; A Krogh; A Valencia; S Brunak
Journal:  J Mol Biol       Date:  2002-06-21       Impact factor: 5.469

Review 2.  Automatic prediction of protein function.

Authors:  B Rost; J Liu; R Nair; K O Wrzeszczynski; Y Ofran
Journal:  Cell Mol Life Sci       Date:  2003-12       Impact factor: 9.261

3.  The PEDANT genome database.

Authors:  Dmitrij Frishman; Martin Mokrejs; Denis Kosykh; Gabi Kastenmüller; Grigory Kolesov; Igor Zubrzycki; Christian Gruber; Birgitta Geier; Andreas Kaps; Kaj Albermann; Andreas Volz; Christian Wagner; Matthias Fellenberg; Klaus Heumann; Hans-Werner Mewes
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

4.  Radial basis function neural networks for the characterization of heart rate variability dynamics.

Authors:  A Bezerianos; S Papadimitriou; D Alexopoulos
Journal:  Artif Intell Med       Date:  1999-03       Impact factor: 5.326

5.  On representation of proteins by star-like graphs.

Authors:  Milan Randić; Jure Zupan; Drazen Vikić-Topić
Journal:  J Mol Graph Model       Date:  2006-12-15       Impact factor: 2.518

6.  Assessing and improving the stability of chemometric models in small sample size situations.

Authors:  Claudia Beleites; Reiner Salzer
Journal:  Anal Bioanal Chem       Date:  2008-01-29       Impact factor: 4.142

Review 7.  A review of protein function prediction under machine learning perspective.

Authors:  Juliana S Bernardes; Carlos E Pedreira
Journal:  Recent Pat Biotechnol       Date:  2013-08

8.  Prediction of enzyme classes from 3D structure: a general model and examples of experimental-theoretic scoring of peptide mass fingerprints of Leishmania proteins.

Authors:  Riccardo Concu; Maria A Dea-Ayuela; Lazaro G Perez-Montoto; Francisco Bolas-Fernández; Francisco J Prado-Prado; Gianni Podda; Eugenio Uriarte; Florencio M Ubeira; Humberto González-Díaz
Journal:  J Proteome Res       Date:  2009-09       Impact factor: 4.466

9.  The ENZYME data bank.

Authors:  A Bairoch
Journal:  Nucleic Acids Res       Date:  1993-07-01       Impact factor: 16.971

10.  Artificial neural networks and the study of the psychoactivity of cannabinoid compounds.

Authors:  Káthia M Honório; Emmanuela F de Lima; Marcos G Quiles; Roseli A F Romero; Fábio A Molfetta; Albérico B F da Silva
Journal:  Chem Biol Drug Des       Date:  2010-06       Impact factor: 2.817
