Jiawei Wang1, Bingjiao Yang2, Jerico Revote1, André Leier3, Tatiana T Marquez-Lago3, Geoffrey Webb4, Jiangning Song1,4,5, Kuo-Chen Chou6,7,8, Trevor Lithgow1.
1. Biomedicine Discovery Institute, Monash University, VIC 3800, Australia.
2. College of Mechanical Engineering, Yanshan University, Qinhuangdao 066004, China.
3. Informatics Institute and Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA.
4. Monash Centre for Data Science, Faculty of Information Technology.
5. ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, VIC 3800, Australia.
6. Gordon Life Science Institute, Boston, MA 02478, USA.
7. Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
8. Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia.
Abstract
SUMMARY: Evolutionary information in the form of a Position-Specific Scoring Matrix (PSSM) is a widely used and highly informative representation of protein sequences. Accordingly, PSSM-based feature descriptors have been successfully applied to improve the performance of various predictors of protein attributes. Even though a number of algorithms have been proposed in previous studies, there is currently no universal web server or toolkit available for generating this wide variety of descriptors. Here, we present POSSUM (Position-Specific Scoring matrix-based feature generator for machine learning), a versatile toolkit with an online web server that can generate 21 types of PSSM-based feature descriptors, thereby addressing a crucial need for bioinformaticians and computational biologists. We envisage that this comprehensive toolkit will be widely used as a powerful tool to facilitate feature extraction, selection, and benchmarking of machine learning-based models, thereby contributing to a more effective analysis and modeling pipeline for bioinformatics research.
AVAILABILITY AND IMPLEMENTATION: http://possum.erc.monash.edu/.
CONTACT: trevor.lithgow@monash.edu or jiangning.song@monash.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.