Yongchun Zuo1,2, Yu Chang2, Shenghui Huang2, Lei Zheng2, Lei Yang3, Guifang Cao1.
Abstract
Defensins, one of the major classes of host defense peptides, play a significant role in innate immunity and are highly evolved in almost all living organisms. High-throughput computational methods that identify them accurately can help in designing drugs or other medical means of defense against pathogens. To take up this challenge, we designed an up-to-date server, iDEF-PseRAAC, built on a rigorous benchmark dataset, for predicting the defensin family. Using primary sequence compositions extracted with different types of reduced amino acid alphabet, the selected feature subset achieved a best overall accuracy of 92.38%. We therefore conclude that the information provided by the many types of amino acid reduction offers an efficient and rational methodology for defensin identification. An online server is freely available to academic users at http://bioinfor.imu.edu.cn/idpf. We expect that iDEF-PseRAAC may be a promising tool for the functional annotation of defensin proteins.
Keywords: Defensin prediction; reduced amino acid descriptor; sequence composition; web server
Year: 2019 PMID: 31391777 PMCID: PMC6669840 DOI: 10.1177/1176934319867088
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
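The reduced-alphabet dipeptide composition mentioned in the abstract can be illustrated with a minimal sketch. The 5-cluster grouping below is hypothetical and chosen only for illustration; it is not one of the alphabet types evaluated in the study, and the function names are assumptions rather than the server's actual code.

```python
from itertools import product

# Hypothetical 5-cluster reduction of the 20 amino acids (illustration only;
# NOT one of the alphabet types used in the study).
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]


def build_mapping(clusters):
    """Map each amino acid letter to the index of its cluster."""
    return {aa: i for i, cluster in enumerate(clusters) for aa in cluster}


def reduced_dipeptide_composition(seq, clusters):
    """Return the frequency of each ordered pair of adjacent cluster indices.

    Residues that are not in any cluster (e.g. 'X') are skipped.
    The result has len(clusters) ** 2 entries.
    """
    mapping = build_mapping(clusters)
    reduced = [mapping[aa] for aa in seq.upper() if aa in mapping]
    counts = {pair: 0 for pair in product(range(len(clusters)), repeat=2)}
    for a, b in zip(reduced, reduced[1:]):
        counts[(a, b)] += 1
    total = max(len(reduced) - 1, 1)  # avoid division by zero on short input
    return {pair: n / total for pair, n in counts.items()}


features = reduced_dipeptide_composition("ACDKR", CLUSTERS)
```

With C clusters the feature vector has C² entries, so a coarser alphabet gives a much shorter vector than the 400 dipeptides of the full 20-letter alphabet.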
The sequence profile used in this study.
| Subset | Family | Number |
|---|---|---|
| | Insect defensins | 60 |
| | Invertebrate defensins | 31 |
| | Plant defensins | 42 |
| | Unclassified defensins | 38 |
| | Vertebrate defensins | 157 |
| Total | | 328 |
Figure 1. (A) Binary precision density maps illustrating the distribution of different descriptors based on different N-peptide compositions. (B) The predictive accuracy for defensin families based on 2-peptide composition using different types of reduced amino acid alphabet. A bluer box indicates higher accuracy; a lighter box indicates lower accuracy.
Figure 2. (A) Cluster sizes of the reduced amino acid alphabet based on the secondary-structure method. (B) The prediction results of different cluster sizes for alphabet type 5. (C) Comparison of different alphabet types with the same cluster number (C = 19).
Figure 3. The IFS curve shows the feature extraction process using the F-score of different features in dipeptide composition (T = 5, C = 19). An IFS peak of 92.38% was obtained when using the 329 optimal features. IFS indicates incremental feature selection.
Figure 4. Univariate density map of ACC for defensin prediction based on RAAC. ACC indicates accuracy; RAAC, reduced amino acid cluster.
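The incremental feature selection (IFS) procedure named in the Figure 3 caption can be sketched generically: rank the features by a score, add them one at a time, and keep the prefix with the highest accuracy. The sketch below assumes the per-feature scores (for example F-scores) are already computed and that `evaluate` is a user-supplied, hypothetical callback returning the accuracy for a feature subset; the study's actual scoring formula and classifier are not reproduced here.

```python
def incremental_feature_selection(scores, evaluate):
    """Generic IFS sketch.

    scores:   one relevance score per feature (higher = more relevant).
    evaluate: hypothetical callback taking a list of feature indices and
              returning an accuracy, e.g. from cross-validation.
    Returns (best_subset, best_accuracy, curve), where curve[k-1] is the
    accuracy obtained with the top-k features.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best_acc, best_k = float("-inf"), 0
    curve = []
    for k in range(1, len(ranked) + 1):
        acc = evaluate(ranked[:k])
        curve.append(acc)
        if acc > best_acc:  # keep the smallest subset reaching the peak
            best_acc, best_k = acc, k
    return ranked[:best_k], best_acc, curve


# Toy usage with made-up scores and a made-up accuracy table.
toy_scores = [0.1, 0.9, 0.5]
toy_accuracy = {1: 0.7, 2: 0.9, 3: 0.8}
subset, peak, curve = incremental_feature_selection(
    toy_scores, lambda idx: toy_accuracy[len(idx)]
)
```

For scale, a plain dipeptide composition over 19 clusters would have 19² = 361 features, so the 329 optimal features reported in the caption would be a prefix of such a ranked list.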
Comparison of our model with previous methods.
| Method | Family | Sn (%) | Sp (%) | MCC | OA (%) |
|---|---|---|---|---|---|
| iDPF-PseRAAAC | Insect | 90.00 | 97.07 | 0.86 | 85.59 |
| | Invertebrate | 61.76 | 97.32 | 0.64 | |
| | Plant | 90.48 | 98.97 | 0.90 | |
| | Unclassified | 40.00 | 96.63 | 0.46 | |
| | Vertebrate | 99.36 | 88.64 | 0.88 | |
| iDEF-PseRAAC | Insect | 96.67 | 98.13 | 0.93 | 91.16 |
| | Invertebrate | 74.19 | 97.64 | 0.73 | |
| | Plant | 92.86 | 98.60 | 0.91 | |
| | Unclassified | 68.42 | 97.23 | 0.69 | |
| | Vertebrate | 97.45 | 97.08 | 0.95 | |
Abbreviations: Sn, sensitivity; Sp, specificity; MCC, Matthews correlation coefficient; OA, overall accuracy.
The prediction accuracy matrix [M] of 4 defensin families based on dipeptide composition of type 5, cluster 19 (T = 5, C = 19).
| | Insect | Invertebrate | Plant | Unclassified | Vertebrate | Total |
|---|---|---|---|---|---|---|
| Insect | 58 | 1 | 0 | 1 | 0 | 60 |
| Invertebrate | 4 | 23 | 2 | 2 | 0 | 31 |
| Plant | 0 | 2 | 39 | 1 | 0 | 42 |
| Unclassified | 1 | 4 | 2 | 26 | 5 | 38 |
| Vertebrate | 0 | 0 | 0 | 4 | 153 | 157 |
| Total | | | | | | 328 |
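As a worked check, the per-family sensitivity, specificity, MCC, and the overall accuracy can be derived from the matrix above with a one-vs-rest calculation. The sketch assumes that rows are the true families and columns the predicted families (the row totals match the family sizes, which suggests this orientation); the function names are illustrative.

```python
import math

FAMILIES = ["Insect", "Invertebrate", "Plant", "Unclassified", "Vertebrate"]

# Matrix M copied from the table; assumed rows = true family, columns = predicted.
M = [
    [58, 1, 0, 1, 0],
    [4, 23, 2, 2, 0],
    [0, 2, 39, 1, 0],
    [1, 4, 2, 26, 5],
    [0, 0, 0, 4, 153],
]


def per_class_metrics(matrix):
    """Return a list of (Sn, Sp, MCC) tuples, one per class (one-vs-rest)."""
    n = sum(map(sum, matrix))
    results = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # true class k, predicted other
        fp = sum(row[k] for row in matrix) - tp   # other class, predicted k
        tn = n - tp - fn - fp
        sn = tp / (tp + fn)
        sp = tn / (tn + fp)
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0
        results.append((sn, sp, mcc))
    return results


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


metrics = per_class_metrics(M)
oa = overall_accuracy(M)
for name, (sn, sp, mcc) in zip(FAMILIES, metrics):
    print(f"{name}: Sn={sn:.2%} Sp={sp:.2%} MCC={mcc:.2f}")
print(f"OA={oa:.2%}")
```

Under that orientation, 299 of the 328 sequences lie on the diagonal, giving the 91.16% overall accuracy listed for iDEF-PseRAAC; the per-family values should agree with the comparison table up to rounding in the last digit.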
Figure 5. Screenshot of the homepage of the iDEF-PseRAAC web server.