Haitao Han, Chenchen Ding, Xin Cheng, Xiuzhi Sang, Taigang Liu.
Abstract
Many Gram-negative bacteria use type IV secretion systems to deliver effector molecules to a wide range of target cells. These substrate proteins, called type IV secreted effectors (T4SEs), manipulate host cell processes during infection, often resulting in severe disease or even death of the host. Identification of putative T4SEs has therefore become a very active research topic in bioinformatics because of its vital role in understanding host-pathogen interactions. PSI-BLAST profiles have been shown experimentally to provide important and discriminatory evolutionary information for various protein classification tasks. In the present study, an accurate computational predictor termed iT4SE-EP was developed for identifying T4SEs by extracting evolutionary features from the position-specific scoring matrix and position-specific frequency matrix profiles. First, four types of encoding strategies were designed to transform protein sequences into fixed-length feature vectors based on the two profiles. Then, a feature selection technique based on the random forest algorithm was used to remove redundant or irrelevant features without much loss of information. Finally, the optimal features were input into a support vector machine classifier to predict T4SEs. Our experimental results demonstrated that iT4SE-EP outperformed most existing methods on the independent dataset test.
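The abstract does not name the four encoding strategies, so the following is only an illustrative sketch of how a variable-length profile can be turned into a fixed-length feature vector. It assumes a PSSM-composition-style encoding (rows grouped by the residue at each position, then averaged over the sequence length), which is a common choice in this area but is not confirmed here as one of the authors' encodings. The function name `pssm_composition` and the column order in `AMINO_ACIDS` are hypothetical.

```python
from typing import List, Sequence

# Assumed column order of the profile; real PSI-BLAST output may use another order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm_composition(sequence: str, profile: Sequence[Sequence[float]]) -> List[float]:
    """Encode an L x 20 profile as a fixed 400-dimensional vector.

    Rows of the profile are grouped by the residue found at that position,
    summed per group, and divided by the sequence length L. The result has
    the same length (20 * 20) regardless of L.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")

    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    sums = [[0.0] * 20 for _ in range(20)]

    for residue, row in zip(sequence, profile):
        if len(row) != 20:
            raise ValueError("each profile row must have 20 columns")
        i = index.get(residue)
        if i is None:
            continue  # skip non-standard residues such as 'X'
        for j, score in enumerate(row):
            sums[i][j] += score

    length = len(sequence)
    # Flatten the 20 x 20 matrix row by row into one feature vector.
    return [value / length for row in sums for value in row]


# Toy profiles (made-up numbers, not real PSI-BLAST output).
short_vec = pssm_composition("AC", [[1.0] * 20, [2.0] * 20])
long_vec = pssm_composition("ACA", [[1.0] * 20, [2.0] * 20, [3.0] * 20])
```

Both toy sequences yield vectors of length 400 despite their different lengths, which is the property a downstream feature selector and classifier need. The same function could be applied to a position-specific frequency matrix, since it only assumes an L x 20 numeric table.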
Keywords: position-specific frequency matrix; position-specific scoring matrix; random forest; support vector machine; type IV secreted effectors
Year: 2021 PMID: 33923273 DOI: 10.3390/molecules26092487
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411