Jiawei Wang1, Bingjiao Yang2, André Leier3, Tatiana T Marquez-Lago3, Morihiro Hayashida4, Andrea Rocker1, Yanju Zhang2, Tatsuya Akutsu5, Kuo-Chen Chou6,7,8, Richard A Strugnell9, Jiangning Song10,11,12, Trevor Lithgow1.
1. Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, VIC, Australia.
2. Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China.
3. Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA.
4. National Institute of Technology, Matsue College, Matsue, Shimane, Japan.
5. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan.
6. Gordon Life Science Institute, Boston, MA, USA.
7. Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
8. Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia.
9. Department of Microbiology and Immunology and Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Parkville, VIC, Australia.
10. Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology.
11. Monash Centre for Data Science, Faculty of Information Technology, Monash University, Clayton, VIC, Australia.
12. ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, Clayton, VIC, Australia.
Abstract
Motivation: Many Gram-negative bacteria use type VI secretion systems (T6SS) to export effector proteins into adjacent target cells. These secreted effectors (T6SEs) play vital roles in competitive survival within bacterial populations, as well as in bacterial pathogenesis. Although various computational analyses have previously been applied to identify effectors secreted by certain bacterial species, no universal method is available to accurately predict T6SS effector proteins from the growing tide of bacterial genome sequence data.

Results: We extracted a wide range of features from T6SE protein sequences and comprehensively analyzed the predictive performance of these features through unsupervised and supervised learning. By integrating these features, we then developed a two-layer SVM-based ensemble model with finely optimized parameters to identify potential T6SEs. We further validated the predictive model on an independent dataset, on which it achieved an ACC of 0.943, an F-value of 0.946, an MCC of 0.892 and an AUC of 0.976. To demonstrate applicability, we used this method to correctly identify two very recently validated T6SE proteins, which represent challenging prediction targets because they differ significantly from previously known T6SEs in sequence similarity and cellular function. Furthermore, a genome-wide prediction across 12 bacterial species, covering 54 212 protein sequences in total, identified 94 putative T6SE candidates. We envisage that both this information and our publicly accessible web server will facilitate future discoveries of novel T6SEs.

Availability and implementation: http://bastion6.erc.monash.edu/.

Supplementary information: Supplementary data are available at Bioinformatics online.
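The abstract reports ACC, F-value, MCC and AUC without stating how they are computed. The following is a minimal sketch using the standard textbook definitions for binary classification, assuming the F-value is the F1 score and that AUC is the probability that a positive example outscores a negative one. The function names `binary_metrics` and `pairwise_auc` are hypothetical and are not taken from the paper or its web server.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return ACC, F1 and MCC from confusion-matrix counts.

    Assumption: these are the standard definitions; the paper's exact
    formulas are not given in the abstract.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"ACC": acc, "F1": f1, "MCC": mcc}


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is quadratic in the number of
    examples, which is acceptable for an illustration only.
    """
    pairs = len(pos_scores) * len(neg_scores)
    if pairs == 0:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / pairs
```

All four metrics equal 1 for a perfect classifier. MCC is 0 and AUC is 0.5 for chance-level prediction, which is why MCC and AUC are often reported alongside accuracy.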