Clarence White¹, Hamid D Ismail¹, Hiroto Saigo², Dukka B Kc³.
Abstract
BACKGROUND: The β-lactamase (BL) enzyme family is an important class of enzymes that plays a key role in bacterial resistance to antibiotics. As the number of newly identified BL enzymes increases daily, it is imperative to develop a computational tool that assigns newly identified BL enzymes to their classes. BL enzymes are classified in two ways: Molecular Classification and Functional Classification. Existing computational methods address only Molecular Classification, and their performance is unsatisfactory.
Keywords: Beta lactamase protein classification; Convolutional neural network; Deep learning; Feature selection
Year: 2017 PMID: 29297322 PMCID: PMC5751796 DOI: 10.1186/s12859-017-1972-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Venn diagram showing the relationship between the molecular classes and functional groups of β-lactamase
Molecular Class/Functional Group Benchmark Dataset
| # | Class/Group | Sequences Before/After CD-HIT |
|---|---|---|
| 1 | Class A | 11,987/278 |
| 2 | Class B/Group 3 | 120,465/2184 |
| 3 | Class C/Group 1 | 12,350/744 |
| 4 | Class D | 4853/62 |
| 5 | Group 2 | 16,840/340 |
| 6 | Non BL | 497 |
Molecular Class/Functional Group Datasets
| # | Class/Group | Training | Independent 1 | Independent 2 |
|---|---|---|---|---|
| 1 | Class A | 268 | 10 | 4 |
| 2 | Class B/Group 3 | 2069 | 115 | 6 |
| 3 | Class C/Group 1 | 701 | 43 | 6 |
| 4 | Class D | 59 | 3 | 4 |
| 5 | Group 2 | 318 | 22 | 8 |
| 6 | Non BL | 478 | 19 | – |
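The two tables above can be cross-checked: in every row, the Training and Independent 1 counts add up to the post-CD-HIT count from the benchmark table, which suggests that Independent 1 was carved out of the CD-HIT-filtered sequences (our reading; this section does not describe the split procedure). A small check with the counts copied from the tables, where the helper name `split_is_consistent` is our own:

```python
# (after CD-HIT, training, Independent 1), copied from the two tables above
counts = {
    "Class A": (278, 268, 10),
    "Class B/Group 3": (2184, 2069, 115),
    "Class C/Group 1": (744, 701, 43),
    "Class D": (62, 59, 3),
    "Group 2": (340, 318, 22),
    "Non BL": (497, 478, 19),
}


def split_is_consistent(after_cdhit: int, train: int, indep1: int) -> bool:
    """True if Training + Independent 1 accounts for every post-CD-HIT sequence."""
    return train + indep1 == after_cdhit


for name, (total, train, indep1) in counts.items():
    share = 100 * indep1 / total  # fraction of the filtered set held out as Independent 1
    print(f"{name}: consistent={split_is_consistent(total, train, indep1)}, "
          f"Independent 1 share={share:.1f}%")
```

Independent 2 is not part of this sum, so its origin cannot be checked from these numbers.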
Fig. 2 Schematic of our multi-class classification approach for β-lactamase
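Molecular Class/Functional Group Benchmark Dataset after Balancing (ROS: random oversampling; RUS: random undersampling; SMOTE: Synthetic Minority Over-sampling Technique)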
| # | Class | Method | Positive | Negative |
|---|---|---|---|---|
| 1 | Level 1 | ROS | 3268 | 3268 |
| 2 | Class A | ROS | 2990 | 2990 |
| 3 | Class B/Group 3 | RUS | 1084 | 1084 |
| 4 | Class C/Group 1 | ROS | 2524 | 2524 |
| 5 | Class D | SMOTE | 3200 | 3200 |
| 6 | Group 2 | ROS | 2770 | 2770 |
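The three balancing methods named in the table can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names, the Euclidean neighbour search and k = 5 for SMOTE are assumptions, and the table reports only the resulting positive/negative counts, not which side was resampled or how.

```python
import numpy as np


def random_oversample(X_min, n_target, rng):
    """ROS: append randomly chosen minority rows (with replacement) until n_target rows."""
    idx = rng.integers(0, len(X_min), size=n_target - len(X_min))
    return np.vstack([X_min, X_min[idx]])


def random_undersample(X_maj, n_target, rng):
    """RUS: keep a random subset of n_target majority rows (without replacement)."""
    idx = rng.choice(len(X_maj), size=n_target, replace=False)
    return X_maj[idx]


def smote(X_min, n_target, k, rng):
    """SMOTE: synthesise rows by interpolating between a minority sample and one of
    its k nearest minority neighbours (Euclidean distance); requires k < len(X_min)."""
    n_new = n_target - len(X_min)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # interpolation weight in [0, 1)
    synthetic = X_min[base] + gap * (X_min[nb] - X_min[base])
    return np.vstack([X_min, synthetic])


rng = np.random.default_rng(0)
X_d = rng.random((59, 5))                    # stand-in for 59 Class D training vectors
X_bal = smote(X_d, 200, k=5, rng=rng)
print(X_bal.shape)                           # (200, 5)
```

Because every synthetic row lies on a segment between two real minority samples, SMOTE avoids the exact duplicates that ROS produces, which is one plausible reason to prefer it for a class as small as Class D.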
Feature sets and feature selection results. CKSAAP [22] refers to the composition of k-spaced amino acid pairs, CT [20] to the conjoint triad, and TAAC to the tri-peptide amino acid composition. The last six columns give the number of features retained after feature selection for each level, molecular class or functional group.
| Feature Set | Total Features | Level 1 | Class A | Class B/Group 3 | Class C/Group 1 | Class D | Group 2 |
|---|---|---|---|---|---|---|---|
| CKSAAP [22] | 2400 | 367 | 270 | 240 | 230 | 197 | 266 |
| CT [20] | 512 | 208 | 151 | 149 | 145 | 147 | 160 |
| TAAC | 8000 | 325 | 227 | 262 | 249 | 120 | 219 |
| ALL | 10,912 | 363 | 288 | 243 | 257 | 195 | 270 |
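The totals in the table are consistent with CKSAAP computed for gaps k = 0 to 5 (6 × 400 = 2400 pair frequencies) and TAAC over all 20³ = 8000 tripeptides; adding CT's 512 features gives 2400 + 512 + 8000 = 10,912 for ALL. The sketch below assumes k_max = 5 and normalises each count by the number of windows; both choices are our assumptions, and CT is omitted because its residue grouping is not described in this section.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]        # 400 pairs
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]  # 8000 tripeptides


def cksaap(seq: str, k_max: int = 5) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.
    Each block of 400 values holds pair counts divided by the number of
    pairs available at that gap."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(seq) - k - 1
        for i in range(n_pairs):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues such as X
                counts[pair] += 1
        features.extend(counts[p] / max(n_pairs, 1) for p in PAIRS)
    return features


def taac(seq: str) -> list[float]:
    """Tri-peptide amino acid composition: normalised counts of all 8000 tripeptides."""
    counts = dict.fromkeys(TRIPEPTIDES, 0)
    n = len(seq) - 2
    for i in range(n):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
    return [counts[t] / max(n, 1) for t in TRIPEPTIDES]


seq = "MKKLLAVAGALLTAG"  # hypothetical toy sequence
print(len(cksaap(seq)), len(taac(seq)))  # 2400 8000
```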
Fig. 3 Convolutional neural network (CNN) architecture used in our approach
Fig. 4 Top 10 features from CKSAAP (composition of k-spaced amino acid pairs). The features and their relative importance after feature selection with XGBoost for Level 1 and Classes A, B, C and D
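Fig. 4 ranks features by XGBoost importance. A minimal sketch of importance-based selection, assuming the importance vector has already been obtained (for example from the `feature_importances_` attribute of a fitted `xgboost.XGBClassifier`) and assuming a mean-importance cut-off, which this section does not specify. The `select_by_importance` helper and the `pair.gap` feature names in the usage example are hypothetical.

```python
import numpy as np


def select_by_importance(importances, names, threshold=None, top_n=10):
    """Keep features whose importance exceeds `threshold` (default: the mean
    importance) and return their indices plus the top_n ranked (name, score) pairs."""
    importances = np.asarray(importances, dtype=float)
    if threshold is None:
        threshold = importances.mean()
    keep = np.flatnonzero(importances > threshold)
    ranked = keep[np.argsort(importances[keep])[::-1]]  # most important first
    top = [(names[i], float(importances[i])) for i in ranked[:top_n]]
    return keep, top


# Hypothetical importances for five CKSAAP features ("KL.1" = pair KL at gap 1)
names = ["AA.0", "KL.1", "GT.3", "WC.2", "LL.0"]
imp = [0.05, 0.40, 0.30, 0.01, 0.24]
keep, top = select_by_importance(imp, names, top_n=3)
print(top)
```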
Performance of CKSAAP, TAAC, CT and ALL for Level 1 using 10-Fold CV (ALL refers to CKSAAP + CT + TAAC)
| Feature Set | AUC | Sen (%) | Sp (%) | MCC |
|---|---|---|---|---|
| CKSAAP | 1.00 | 99.90 | 95.73 | 0.96 |
| CT | 0.98 | 98.30 | 93.81 | 0.92 |
| TAAC | 0.98 | 97.27 | 92.29 | 0.89 |
| ALL | 1.00 | 99.77 | 96.47 | 0.96 |
Fig. 5 ROC curves for 10-fold cross-validation (CKSAAP). All curves hug the left and top borders, with AUC above 0.90, indicating that the classifiers separate the classes with high accuracy
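The AUC values reported in these tables have a direct probabilistic reading: AUC is the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, so a rounded 1.00 means almost every positive outranks every negative. A minimal pairwise sketch (quadratic in the number of samples, fine for illustration; the function name is ours):

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(score of random positive > score of random negative), with ties
    counted as half; equivalent to the Mann-Whitney U statistic / (n_pos * n_neg)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores: 5 of the 6 positive/negative pairs are ordered correctly
print(roc_auc([0.9, 0.8, 0.4], [0.3, 0.5]))  # 0.8333...
```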
Fig. 6 Comparison of MCC scores based on 10-fold cross-validation
Performance of CKSAAP using 10-Fold Cross Validation
| Class/Group | AUC | Sen (%) | Sp (%) | MCC |
|---|---|---|---|---|
| Level 1 | 1.00 | 99.90 | 95.73 | 0.96 |
| Class A | 1.00 | 98.03 | 100.00 | 0.98 |
| Class B/Group 3 | 1.00 | 97.94 | 97.94 | 0.96 |
| Class C/Group 1 | 1.00 | 98.02 | 99.15 | 0.97 |
| Class D | 1.00 | 99.58 | 99.97 | 1.00 |
| Group 2 | 1.00 | 97.44 | 99.93 | 0.97 |
Independent Test Set Performance of CKSAAP
| Class/Group | AUC | Sen (%) | Sp (%) | MCC |
|---|---|---|---|---|
| Level 1 | 0.96 | 97.60 | 68.18 | 0.70 |
| Class A | 0.99 | 76.92 | 98.68 | 0.78 |
| Class B/Group 3 | 1.00 | 100.00 | 98.48 | 0.99 |
| Class C/Group 1 | 0.99 | 86.49 | 99.21 | 0.89 |
| Class D | 1.00 | 83.33 | 100.00 | 0.91 |
| Group 2 | 0.99 | 89.47 | 96.55 | 0.81 |
Complete Results of CNN-BLPred Independent Testing
| Class | Sensitivity (%) | Specificity (%) | Accuracy (%) | F1 Score | MCC |
|---|---|---|---|---|---|
| Level 1 | 97.60 | 68.18 | 94.18 | 0.97 | 0.70 |
| Class A | 76.92 | 98.68 | 96.95 | 0.80 | 0.78 |
| Class B/Group 3 | 100.00 | 98.48 | 99.39 | 0.99 | 0.99 |
| Class C/Group 1 | 86.49 | 99.21 | 96.34 | 0.91 | 0.89 |
| Class D | 83.33 | 100.00 | 99.39 | 0.91 | 0.91 |
| Group 2 | 77.27 | 98.59 | 95.73 | 0.83 | 0.81 |
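The columns above follow the standard confusion-matrix definitions. The sketch below computes them from hypothetical counts (the per-class confusion matrices are not given in this section). It also shows why MCC can sit well below accuracy when true negatives are few relative to positives, as in the Level 1 row (94.18% accuracy, MCC 0.70).

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy (in %), F1 and MCC from a 2x2 confusion matrix."""
    sen = tp / (tp + fn)                      # recall on the positive class
    sp = tn / (tn + fp)                       # recall on the negative class
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sen / (precision + sen)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sen": 100 * sen, "Sp": 100 * sp, "Acc": 100 * acc, "F1": f1, "MCC": mcc}


# Hypothetical counts: 100 positives, 50 negatives
print(binary_metrics(tp=90, fn=10, tn=40, fp=10))
# Sen 90%, Sp 80%, Acc about 86.7%, F1 0.9, MCC 0.7
```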
Comparative results (MCC) using Benchmark Dataset 1 for RF, RNN, CNN-ext. and CNN. RF refers to random forest, RNN to recurrent neural network, and CNN to the convolutional neural network described in the paper. CNN-ext. refers to an extended CNN that uses our original architecture with an additional convolutional layer and max pooling layer added after the original max pooling layer
| # | Class/Group | Training: RF | Training: RNN | Training: CNN-ext | Training: CNN | Independent Test: RF | Independent Test: RNN | Independent Test: CNN-ext | Independent Test: CNN |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Level 1 | 0.97 | 0.43 | 0.95 | 0.96 | 0.95 | 0.70 | 0.69 | 0.70 |
| 2 | Class A | 0.97 | 0.16 | 0.97 | 0.98 | 0.75 | 0.70 | 0.78 | 0.78 |
| 3 | Class B/Group 3 | 0.94 | −0.04 | 0.96 | 0.96 | 0.94 | 0.34 | 1.00 | 0.99 |
| 4 | Class C/Group 1 | 0.92 | 0.20 | 0.96 | 0.97 | 0.90 | 0.54 | 0.89 | 0.89 |
| 5 | Class D | 1.00 | 0.66 | 0.99 | 1.00 | 0.44 | 0.06 | 1.00 | 0.91 |
| 6 | Group 2 | 0.96 | 0.42 | 0.97 | 0.97 | 0.75 | 0.34 | 0.81 | 0.81 |
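Per the caption, CNN-ext. differs from CNN by one extra convolution + max-pooling block stacked after the original pooling layer. The NumPy forward pass below sketches that stacking. Everything concrete here is an assumption: the filter counts, kernel widths, pool size and ReLU activation are illustrative, treating the 367 selected Level 1 CKSAAP features as a one-channel 1-D signal is our guess at the input layout, and the dense/softmax output layers are omitted.

```python
import numpy as np


def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution over x of shape (length, in_ch) with kernels of shape
    (n_filters, width, in_ch), followed by ReLU. Returns (length - width + 1, n_filters)."""
    n_filters, width, in_ch = kernels.shape
    out_len = x.shape[0] - width + 1
    windows = np.stack([x[i:i + width] for i in range(out_len)])  # (out_len, width, in_ch)
    out = np.einsum("lwc,fwc->lf", windows, kernels) + bias
    return np.maximum(out, 0.0)


def max_pool1d(h, size):
    """Non-overlapping max pooling along the sequence axis (trailing rows dropped)."""
    n = (h.shape[0] // size) * size
    return h[:n].reshape(-1, size, h.shape[1]).max(axis=1)


rng = np.random.default_rng(0)
x = rng.random((367, 1))  # hypothetical: 367 selected features as one input channel

# Base block (conv -> max pool), then the extra block that CNN-ext. appends
h = max_pool1d(conv1d_relu(x, rng.normal(size=(16, 5, 1)), np.zeros(16)), 2)
h_ext = max_pool1d(conv1d_relu(h, rng.normal(size=(32, 3, 16)), np.zeros(32)), 2)
print(h.shape, h_ext.shape)  # (181, 16) (89, 32)
```

Each extra block shrinks the representation further, which is one way to read the table: the additional depth did not improve MCC over the original CNN on most classes.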
Fig. 7 Comparison of PredLactamase and CNN-BLPred on an independent test set using MCC. CNN-BLPred achieved a higher MCC than PredLactamase. CNN-BLPred was tested on Independent Dataset 1 (Additional file 1); CNN-BLPred* was tested on Independent Dataset 2 (Additional file 2)
Comparative Results using Independent Dataset 1 for PredLactamase and CNN-BLPred
| Class | PredLactamase: Correct | PredLactamase: Incorrect | PredLactamase: ACC (%) | CNN-BLPred*: Correct | CNN-BLPred*: Incorrect | CNN-BLPred*: ACC (%) |
|---|---|---|---|---|---|---|
| A | 15 | 5 | 75.00 | 18 | 2 | 90.00 |
| B | 15 | 5 | 75.00 | 19 | 1 | 95.00 |
| C | 15 | 5 | 75.00 | 18 | 2 | 90.00 |
| D | 15 | 5 | 75.00 | 19 | 1 | 95.00 |
| Overall | | | 75.00 | | | 92.50 |
*CNN-BLPred was tested on Independent Dataset 2
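The Overall row of the PredLactamase comparison is the pooled accuracy, total correct over total predictions: 60/80 = 75.00% for PredLactamase and 74/80 = 92.50% for CNN-BLPred*. Because every class has 20 test sequences, this equals the mean of the per-class accuracies. A short check with the counts copied from the table (helper names are ours):

```python
# (correct, incorrect) per class, copied from the table above
pred_lactamase = {"A": (15, 5), "B": (15, 5), "C": (15, 5), "D": (15, 5)}
cnn_blpred_star = {"A": (18, 2), "B": (19, 1), "C": (18, 2), "D": (19, 1)}


def accuracy(correct, incorrect):
    """Per-class accuracy in percent."""
    return 100.0 * correct / (correct + incorrect)


def overall_accuracy(results):
    """Pooled accuracy in percent: total correct over total predictions."""
    correct = sum(c for c, _ in results.values())
    total = sum(c + i for c, i in results.values())
    return 100.0 * correct / total


print(overall_accuracy(pred_lactamase), overall_accuracy(cnn_blpred_star))  # 75.0 92.5
```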