Phasit Charoenkwan, Nalini Schaduangrat, S M Hasan Mahmud, Orawit Thinnukool, Watshara Shoombuatong.
Abstract
Nearly all living species possess host defense peptides called defensins, which are crucial for innate immunity. These peptides activate the immune system, which kills microbes directly or indirectly, thus protecting the host. Numerous peptide-based drugs are currently being evaluated in preclinical and clinical trials. Although experimental methods can precisely identify the defensin peptide family and subfamily, these approaches are often time-consuming and costly. In contrast, machine learning (ML) methods can make effective use of protein sequence information without knowledge of a protein's three-dimensional structure, which makes them well suited to large-scale identification. To date, several ML methods have been developed for the in silico identification of the defensin peptide family and subfamily. A summary of the advantages and disadvantages of the existing methods is therefore needed to guide the development and improvement of new computational models for this task. With this goal in mind, we provide a comprehensive survey of six state-of-the-art computational approaches for predicting the defensin peptide family and subfamily. We cover several important aspects, including dataset quality, feature encoding methods, feature selection schemes, ML algorithms, cross-validation methods and web server availability/usability. Moreover, we discuss the limitations of existing methods and future perspectives for improving prediction performance and model interpretability. The insights and suggestions from this review are anticipated to serve as valuable guidance for researchers developing more robust and useful predictors.
Keywords: bioinformatics; classification; defensins; feature selection; machine learning; sequence analysis
Year: 2022 PMID: 35949489 PMCID: PMC9360473 DOI: 10.17179/excli2022-4913
Source DB: PubMed Journal: EXCLI J ISSN: 1611-2156 Impact factor: 4.022
Figure 1. The general machine learning framework for the prediction of defensins and their family/subfamily.
Table 1. A list of currently available machine learning-based methods for the prediction of defensins and their family/subfamily.
Table 2. Detailed information on the existing datasets analyzed in this review.
Table 3. Performance comparison of Karnik's method and DEFPRED for the prediction of defensins.
Table 4. Performance comparison of ID_RAAA, iDPF-PseRAAAC and iDEF-PseRAAC for the prediction of defensin families.
Table 5. Performance comparison of ID_RAAA and iDPF-PseRAAAC for the prediction of vertebrate defensin subfamilies.
Figure 2. Boxplots of the average amino acid composition of the 20 amino acids for Defensins vs AMPs (A) and Defensins vs non-Defensins (B). The X-axis shows the 20 amino acids along with their p-values, and the Y-axis shows the average amino acid composition.
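The quantity plotted in Figure 2, the average amino acid composition of a group of sequences, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the toy sequences are hypothetical, and the statistical test behind the reported p-values is not stated in this record, so it is left out.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in one sequence."""
    seq = sequence.upper()
    # Non-standard characters (e.g. X, gaps) are ignored in this sketch.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def average_composition(sequences):
    """Average the per-sequence compositions over a group of sequences."""
    comps = [amino_acid_composition(s) for s in sequences]
    return {aa: sum(c[aa] for c in comps) / len(comps) for aa in AMINO_ACIDS}


# Toy sequences invented for illustration only (not from the reviewed datasets).
group_a = ["ACCKCC", "CCRRAC"]
group_b = ["AAGGLL", "LLKKAA"]

avg_a = average_composition(group_a)
avg_b = average_composition(group_b)

# Print residues in order of the largest difference between the two groups.
for aa in sorted(AMINO_ACIDS, key=lambda a: abs(avg_a[a] - avg_b[a]), reverse=True)[:3]:
    print(aa, round(avg_a[aa], 3), round(avg_b[aa], 3))
```

Each sequence yields a 20-dimensional vector, which is also the simplest sequence-based feature encoding an ML classifier could consume. Whether non-standard residues should be ignored or treated as errors is a design choice that depends on the dataset.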