Chaolu Meng, Leyi Wei, Quan Zou.
Abstract
Secretory proteins of Mycobacterium tuberculosis have attracted growing attention because of their dominant immunogenicity and their role in pathogenesis. Because traditional biochemical experiments are expensive and time-consuming, this study constructs a support vector machine model named SecProMTB to identify these proteins with a bioinformatic approach. First, an improved pseudo-amino acid composition (PseAAC) algorithm is used to extract features from all entities. Second, a novel imbalanced-data strategy is proposed and adopted to divide the original data set into a training set and a test set. Third, to overcome overfitting, feature-ranking algorithms are applied together with incremental feature selection. Finally, the model is trained and optimized. The resulting model achieves an area under the curve of 0.862 and an average accuracy of 86% in the independent test. For the convenience of users, SecProMTB and related data are openly accessible at http://server.malab.cn/SecProMTB/index.jsp.
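The abstract does not describe how the improved PseAAC variant works, so the following is only a minimal sketch of the classic (type 1) pseudo-amino acid composition that such variants build on. It is not the SecProMTB implementation. The function name `pse_aac`, the parameters `lam` and `weight`, and the use of the Kyte-Doolittle hydropathy scale as the single physicochemical property are assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed here as the only property;
# the paper's improved PseAAC may use different or additional properties).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}


H = standardize(KYTE_DOOLITTLE)


def pse_aac(sequence, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional classic PseAAC vector (hypothetical helper)."""
    # Keep only the 20 standard residues; anything else is skipped.
    seq = [c for c in sequence.upper() if c in H]
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")

    # Plain amino acid composition: one frequency per residue type.
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]

    # Sequence-order correlation factors theta_1 .. theta_lam.
    thetas = []
    for k in range(1, lam + 1):
        total = sum((H[seq[i]] - H[seq[i + k]]) ** 2 for i in range(n - k))
        thetas.append(total / (n - k))

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Calling `pse_aac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)` on this made-up peptide yields a 23-element vector: 20 composition terms followed by 3 sequence-order terms. Vectors of this kind could then be ranked, reduced by incremental feature selection, and passed to a support vector machine, as the abstract outlines.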
Keywords: imbalanced-data strategy; improved PseAAC; secretory proteins of Mycobacterium tuberculosis; support vector machine
Year: 2019 PMID: 31348610 DOI: 10.1002/pmic.201900007
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984