Prabina Kumar Meher, Tanmaya Kumar Sahu, Anjali Banchariya, Atmakuri Ramakrishna Rao.
Abstract
BACKGROUND: Insecticide resistance is a major challenge for programs that control insect pests in crop protection and in human and animal health. Resistance to different insecticides is conferred by proteins encoded by certain classes of insect genes. To date, no computational tool is available to distinguish insecticide-resistant proteins from non-resistant proteins. Such a tool would help predict insecticide-resistant proteins, which can then be targeted when developing appropriate insecticides.
Keywords: Cytochrome P450; Di-peptide composition; GABA; Insecticide resistance; SVM
Year: 2017 PMID: 28340571 PMCID: PMC5364559 DOI: 10.1186/s12859-017-1587-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Composition of amino acids in all four categories of insecticide-resistant proteins. The proportion of leucine is higher, whereas the proportions of cysteine and tryptophan are lower, in all four categories
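The proportions plotted in Fig. 1 are amino acid compositions. A minimal sketch of how such proportions could be computed follows, assuming the usual definition (count of each of the 20 standard residues divided by the number of standard residues in the sequence). The function name `amino_acid_composition` and the toy sequence are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard symbols (e.g. X) are ignored; this is an assumption,
    not necessarily how the paper handles them.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

# Hypothetical toy sequence, used only to exercise the function.
toy = "MLLLKACW"
aac = amino_acid_composition(toy)
```

Averaging these 20-element vectors over all proteins in a category would give one bar group of the kind the caption describes.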
Fig. 2 (a) ROC curves of SVM for different kernels and features; (b) bar plots of the corresponding AUC-ROC values. The AUC-ROC values are higher for the RBF kernel than for the other kernels
Estimates of performance metrics for SVM with RBF kernel in discriminating resistant from non-resistant proteins, for all feature sets and for different percentages of pairwise sequence identity in the positive dataset
| Id(%) | Feature | Sn | Sp | Ac | Pre | MCC | AUC-ROC |
|---|---|---|---|---|---|---|---|
| 40 | AAC | 0.836 ± 0.018 | 0.952 ± 0.014 | 0.894 ± 0.012 | 0.946 ± 0.015 | 0.794 ± 0.024 | 0.924 ± 0.020 |
| | DPC | 0.849 ± 0.013 | 0.983 ± 0.011 | 0.916 ± 0.009 | 0.980 ± 0.012 | 0.839 ± 0.017 | 0.948 ± 0.011 |
| | PAAC | 0.836 ± 0.018 | 0.956 ± 0.014 | 0.896 ± 0.013 | 0.951 ± 0.015 | 0.798 ± 0.026 | 0.922 ± 0.018 |
| | CTD | 0.841 ± 0.015 | 0.981 ± 0.011 | 0.911 ± 0.010 | 0.978 ± 0.013 | 0.831 ± 0.020 | 0.932 ± 0.010 |
| | ACF | 0.836 ± 0.017 | 0.953 ± 0.016 | 0.895 ± 0.012 | 0.947 ± 0.017 | 0.795 ± 0.025 | 0.901 ± 0.017 |
| 60 | AAC | 0.870 ± 0.012 | 0.959 ± 0.008 | 0.914 ± 0.008 | 0.955 ± 0.009 | 0.832 ± 0.016 | 0.946 ± 0.008 |
| | DPC | 0.875 ± 0.008 | 0.986 ± 0.007 | 0.931 ± 0.006 | 0.984 ± 0.007 | 0.866 ± 0.011 | 0.972 ± 0.005 |
| | PAAC | 0.870 ± 0.014 | 0.960 ± 0.010 | 0.915 ± 0.010 | 0.956 ± 0.011 | 0.833 ± 0.020 | 0.947 ± 0.010 |
| | CTD | 0.860 ± 0.011 | 0.985 ± 0.007 | 0.923 ± 0.007 | 0.983 ± 0.008 | 0.852 ± 0.014 | 0.959 ± 0.006 |
| | ACF | 0.869 ± 0.011 | 0.964 ± 0.009 | 0.917 ± 0.007 | 0.960 ± 0.009 | 0.837 ± 0.015 | 0.932 ± 0.009 |
| 70 | AAC | 0.886 ± 0.011 | 0.961 ± 0.008 | 0.924 ± 0.008 | 0.958 ± 0.008 | 0.850 ± 0.015 | 0.953 ± 0.008 |
| | DPC | 0.883 ± 0.008 | 0.987 ± 0.005 | 0.935 ± 0.005 | 0.986 ± 0.005 | 0.875 ± 0.009 | 0.973 ± 0.004 |
| | PAAC | 0.891 ± 0.010 | 0.961 ± 0.008 | 0.926 ± 0.007 | 0.958 ± 0.008 | 0.854 ± 0.013 | 0.955 ± 0.007 |
| | CTD | 0.866 ± 0.010 | 0.987 ± 0.005 | 0.926 ± 0.006 | 0.985 ± 0.006 | 0.859 ± 0.012 | 0.961 ± 0.006 |
| | ACF | 0.888 ± 0.008 | 0.963 ± 0.009 | 0.925 ± 0.006 | 0.960 ± 0.009 | 0.853 ± 0.013 | 0.948 ± 0.007 |
| 90 | AAC | 0.886 ± 0.010 | 0.959 ± 0.006 | 0.923 ± 0.006 | 0.956 ± 0.006 | 0.847 ± 0.012 | 0.955 ± 0.006 |
| | DPC | 0.899 ± 0.009 | 0.989 ± 0.005 | 0.944 ± 0.006 | 0.988 ± 0.005 | 0.892 ± 0.011 | 0.978 ± 0.004 |
| | PAAC | 0.889 ± 0.011 | 0.959 ± 0.007 | 0.924 ± 0.007 | 0.956 ± 0.007 | 0.850 ± 0.014 | 0.956 ± 0.006 |
| | CTD | 0.887 ± 0.008 | 0.987 ± 0.005 | 0.937 ± 0.005 | 0.985 ± 0.006 | 0.878 ± 0.010 | 0.972 ± 0.005 |
| | ACF | 0.894 ± 0.010 | 0.967 ± 0.006 | 0.930 ± 0.006 | 0.964 ± 0.006 | 0.863 ± 0.013 | 0.949 ± 0.006 |
Id(%): maximum percentage of pair-wise sequence identity present in the positive dataset
Sn: sensitivity; Sp: specificity; Ac: accuracy; Pre: precision; MCC: Matthews correlation coefficient; AUC-ROC: area under the ROC curve
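The threshold-based metrics abbreviated above all derive from a confusion matrix. The sketch below uses their standard textbook definitions, with resistant proteins taken as the positive class; the function name and the counts in the usage line are hypothetical and do not come from the paper.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Ac, Pre and MCC from confusion-matrix counts.

    Assumes both classes are non-empty and at least one positive
    prediction was made, so no denominator below is zero except
    possibly the MCC one, which is guarded.
    """
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    ac = (tp + tn) / (tp + fn + tn + fp)     # accuracy
    pre = tp / (tp + fp)                     # precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Ac": ac, "Pre": pre, "MCC": mcc}

# Hypothetical counts: 100 resistant and 100 non-resistant test proteins.
m = classification_metrics(tp=90, fn=10, tn=95, fp=5)
```

AUC-ROC is not included because it needs the ranked prediction scores rather than a single set of counts.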
Fig. 3 Performance metrics of SVM with RBF kernel for different feature sets and different percentages of pairwise sequence identity in the positive set. The performance metrics are higher for the DPC feature set than for the other feature sets, irrespective of the percentage of sequence identity in the positive dataset
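Since DPC (di-peptide composition) is the best-performing feature set here, a short sketch of how such a feature vector could be built may be useful. It assumes the common definition: the frequency of each of the 400 ordered pairs of adjacent standard residues, normalized by the number of counted pairs. The function name and toy sequence are illustrative, and the paper's exact normalization may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered dipeptides, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies.

    Overlapping adjacent pairs are counted; pairs containing a
    non-standard symbol are skipped (an assumption of this sketch).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence has no valid dipeptides")
    return [counts[dp] / total for dp in DIPEPTIDES]

# Hypothetical toy sequence with 7 adjacent pairs, two of them "LL".
vec = dipeptide_composition("MLLLKACW")
```

Each protein maps to a fixed-length vector regardless of sequence length, which is what lets a kernel classifier such as an SVM consume it directly.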
Estimates of performance metrics for classification of detoxification and target-based resistant proteins, under different feature sets
| Feature | Sn | Sp | Ac | Pre | MCC | AUC-ROC |
|---|---|---|---|---|---|---|
| AAC | 0.927 ± 0.020 | 0.966 ± 0.042 | 0.946 ± 0.024 | 0.966 ± 0.041 | 0.894 ± 0.049 | 0.960 ± 0.023 |
| DPC | 0.967 ± 0.067 | 0.985 ± 0.031 | 0.976 ± 0.035 | 0.986 ± 0.029 | 0.955 ± 0.065 | 0.972 ± 0.051 |
| PAAC | 0.929 ± 0.016 | 0.952 ± 0.048 | 0.941 ± 0.027 | 0.953 ± 0.046 | 0.883 ± 0.054 | 0.956 ± 0.028 |
| CTD | 0.895 ± 0.042 | 0.979 ± 0.035 | 0.937 ± 0.024 | 0.979 ± 0.035 | 0.879 ± 0.047 | 0.935 ± 0.036 |
| ACF | 0.912 ± 0.041 | 0.927 ± 0.051 | 0.919 ± 0.037 | 0.927 ± 0.049 | 0.840 ± 0.074 | 0.967 ± 0.021 |
Estimates of performance metrics for discriminating target-based resistant proteins from non-resistant proteins, under different features
| Feature | Sn | Sp | Ac | Pre | MCC | AUC-ROC |
|---|---|---|---|---|---|---|
| AAC | 0.912 ± 0.031 | 0.940 ± 0.055 | 0.926 ± 0.034 | 0.941 ± 0.052 | 0.854 ± 0.068 | 0.879 ± 0.045 |
| DPC | 0.924 ± 0.090 | 0.981 ± 0.041 | 0.952 ± 0.057 | 0.979 ± 0.043 | 0.909 ± 0.111 | 0.924 ± 0.083 |
| PAAC | 0.919 ± 0.029 | 0.947 ± 0.053 | 0.933 ± 0.034 | 0.948 ± 0.051 | 0.868 ± 0.067 | 0.880 ± 0.043 |
| CTD | 0.855 ± 0.037 | 0.945 ± 0.047 | 0.900 ± 0.034 | 0.941 ± 0.049 | 0.804 ± 0.069 | 0.844 ± 0.028 |
| ACF | 0.915 ± 0.037 | 0.927 ± 0.054 | 0.921 ± 0.037 | 0.928 ± 0.051 | 0.844 ± 0.074 | 0.846 ± 0.043 |
Estimates of different performance metrics for discriminating detoxification-based resistant proteins from non-resistant proteins
| Feature | Sn | Sp | Ac | Pre | MCC | AUC-ROC |
|---|---|---|---|---|---|---|
| AAC | 0.898 ± 0.009 | 0.963 ± 0.006 | 0.931 ± 0.006 | 0.960 ± 0.007 | 0.863 ± 0.013 | 0.960 ± 0.007 |
| DPC | 0.911 ± 0.006 | 0.992 ± 0.004 | 0.951 ± 0.004 | 0.991 ± 0.004 | 0.905 ± 0.008 | 0.980 ± 0.004 |
| PAAC | 0.901 ± 0.008 | 0.965 ± 0.006 | 0.933 ± 0.006 | 0.962 ± 0.007 | 0.867 ± 0.012 | 0.960 ± 0.006 |
| CTD | 0.907 ± 0.007 | 0.990 ± 0.004 | 0.948 ± 0.005 | 0.989 ± 0.004 | 0.900 ± 0.009 | 0.974 ± 0.004 |
| ACF | 0.912 ± 0.007 | 0.969 ± 0.006 | 0.941 ± 0.005 | 0.968 ± 0.006 | 0.883 ± 0.010 | 0.959 ± 0.005 |
Performance metrics for the proposed approach, Blast, PSI-Blast and Delta-Blast in discriminating resistant proteins from non-resistant proteins, where the positive dataset has <40% (first) or <90% (second) pairwise sequence identity
| Dataset | Method | Sn | Sp | Ac | Pre | MCC |
|---|---|---|---|---|---|---|
| First | Proposed | 0.897 | 0.934 | 0.916 | 0.933 | 0.836 |
| | Blast | 0.961 | 0.611 | 0.786 | 0.713 | 0.617 |
| | PSI-Blast | 0.959 | 0.602 | 0.780 | 0.707 | 0.607 |
| | Delta-Blast | 0.961 | 0.652 | 0.806 | 0.735 | 0.647 |
| Second | Proposed | 0.875 | 0.891 | 0.883 | 0.901 | 0.784 |
| | Blast | 0.958 | 0.350 | 0.654 | 0.596 | 0.392 |
| | PSI-Blast | 0.958 | 0.358 | 0.658 | 0.601 | 0.400 |
| | Delta-Blast | 0.958 | 0.466 | 0.712 | 0.646 | 0.495 |
AUC-ROC values were not computed here because, for the Blast algorithms, accuracies are computed from the number of hits
Sn: sensitivity; Sp: specificity; Ac: accuracy; Pre: precision; MCC: Matthews correlation coefficient
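One way to read the comparison table is to note that every reported accuracy equals the mean of sensitivity and specificity to within rounding. That is what accuracy reduces to when the positive and negative test sets are the same size, so the figures are consistent with a balanced evaluation, although the table alone does not prove it. The sketch below simply re-checks the arithmetic on the values copied from the table; the 0.001 tolerance is an assumption covering three-decimal rounding.

```python
# (dataset, method): (Sn, Sp, Ac) as reported in the table above.
rows = {
    ("First", "Proposed"): (0.897, 0.934, 0.916),
    ("First", "Blast"): (0.961, 0.611, 0.786),
    ("First", "PSI-Blast"): (0.959, 0.602, 0.780),
    ("First", "Delta-Blast"): (0.961, 0.652, 0.806),
    ("Second", "Proposed"): (0.875, 0.891, 0.883),
    ("Second", "Blast"): (0.958, 0.350, 0.654),
    ("Second", "PSI-Blast"): (0.958, 0.358, 0.658),
    ("Second", "Delta-Blast"): (0.958, 0.466, 0.712),
}

TOLERANCE = 0.001  # allowance for rounding to three decimals

def balanced_accuracy(sn, sp):
    """Accuracy under equal class sizes: the mean of Sn and Sp."""
    return (sn + sp) / 2

# Absolute gap between reported Ac and (Sn + Sp) / 2 for each row.
gaps = {key: abs(balanced_accuracy(sn, sp) - ac)
        for key, (sn, sp, ac) in rows.items()}
all_consistent = all(gap <= TOLERANCE for gap in gaps.values())
```

The same view makes the trade-off explicit: the Blast variants reach higher sensitivity than the proposed approach, but their much lower specificity pulls accuracy down.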
Performance of the proposed approach based on an independent dataset of 75 insecticide resistant proteins
| Resistance family | Observed | Predicted (1st training model) | Predicted (2nd training model) |
|---|---|---|---|
| Cytochrome P450 | 53 | 51 | 53 |
| Kdr | 2 | 2 | 2 |
| Rdl | 3 | 3 | 3 |
| AChE | 17 | 13 | 17 |
| Total | 75 | 69 | 75 |
Fig. 4 Heat map of the probabilities with which the 75 test sequences are predicted under the two training datasets. All 75 sequences are correctly predicted as resistant proteins with the second training dataset, whereas 69 are correctly predicted with the first. Most of the test sequences are correctly predicted with high probability (>0.9)
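The counts in the independent-dataset table translate directly into per-family and overall detection rates. The sketch below redoes that arithmetic with the table's numbers; the dictionary and function names are illustrative only.

```python
# Counts copied from the independent-dataset table.
observed = {"Cytochrome P450": 53, "Kdr": 2, "Rdl": 3, "AChE": 17}
first_model = {"Cytochrome P450": 51, "Kdr": 2, "Rdl": 3, "AChE": 13}
second_model = {"Cytochrome P450": 53, "Kdr": 2, "Rdl": 3, "AChE": 17}

def detection_rates(predicted, observed):
    """Fraction of observed resistant proteins recovered, per family and overall."""
    rates = {family: predicted[family] / n for family, n in observed.items()}
    rates["Total"] = sum(predicted.values()) / sum(observed.values())
    return rates

first_rates = detection_rates(first_model, observed)    # Total: 69 / 75 = 0.92
second_rates = detection_rates(second_model, observed)  # Total: 75 / 75 = 1.0
```

With the first training model, the six missed sequences are two cytochrome P450 and four AChE proteins, so AChE accounts for most of the shortfall.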
Fig. 5 (a) Server page of DIRProt; (b) result page after execution with an example dataset. The result page is displayed in tabular form, where the last column gives the probability with which each sequence is predicted to be an insecticide-resistant protein