Kun Ming Kenneth Lim¹, Chenhao Li², Kern Rei Chng³, Niranjan Nagarajan²
1. Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; Computational Biology Program, Faculty of Science.
2. Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; Department of Computer Science, National University of Singapore, Singapore, Singapore.
3. Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore.
Abstract
MOTIVATION: Microbial consortia are frequently defined by numerous interactions within the community that are key to understanding their function. While microbial interactions have been extensively studied experimentally, information about them is dispersed across the scientific literature. As manual collation is infeasible, automated data processing tools are needed to make this information easily accessible.
RESULTS: We present @MInter, an automated information extraction system based on support vector machines that analyzes paper abstracts to infer microbial interactions. @MInter was trained and tested on a manually curated gold standard dataset of 735 species interactions and 3917 annotated abstracts, constructed as part of this study. Cross-validation analysis showed that @MInter detected abstracts pertaining to one or more microbial interactions with high specificity (specificity = 95%, AUC = 0.97). Despite challenges in identifying specific microbial interactions in an abstract (interaction-level recall = 95%, precision = 25%), @MInter was shown to reduce annotator workload 13-fold compared to alternative approaches. Applying @MInter to 175 bacterial species abundant on human skin, we identified a network of 357 literature-reported microbial interactions, demonstrating its utility for the study of microbial communities.
AVAILABILITY AND IMPLEMENTATION: @MInter is freely available at https://github.com/CSB5/atminter
CONTACT: nagarajann@gis.a-star.edu.sg
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
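The evaluation figures quoted in the abstract (recall, precision, specificity) follow the standard confusion-matrix definitions. The Python below is a minimal sketch, not @MInter's own evaluation code: the `Confusion` class, the helper functions and the toy labels are all hypothetical. It shows how a classifier can recover every true interaction (high recall) while precision stays low, because most of the candidates it flags are false positives that an annotator must then reject.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing predicted labels against a gold standard."""
    tp: int  # predicted positive, gold positive
    fp: int  # predicted positive, gold negative
    tn: int  # predicted negative, gold negative
    fn: int  # predicted negative, gold positive


def confusion(predicted, gold):
    """Tally a confusion matrix from two equal-length lists of booleans."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    c = Confusion(0, 0, 0, 0)
    for p, g in zip(predicted, gold):
        if p and g:
            c.tp += 1
        elif p and not g:
            c.fp += 1
        elif not p and not g:
            c.tn += 1
        else:
            c.fn += 1
    return c


def recall(c):
    # Fraction of gold-standard positives that the classifier recovers.
    return c.tp / (c.tp + c.fn)


def precision(c):
    # Fraction of flagged items that are actually positive.
    return c.tp / (c.tp + c.fp)


def specificity(c):
    # Fraction of gold-standard negatives that are correctly rejected.
    return c.tn / (c.tn + c.fp)


# Hypothetical toy labels for ten candidate species pairs (not study data).
toy_pred = [True, True, True, True, False, False, False, False, False, True]
toy_gold = [True, False, False, False, False, False, False, False, False, False]

c = confusion(toy_pred, toy_gold)
print(recall(c))     # → 1.0
print(precision(c))  # → 0.2
print(specificity(c))
```

In this toy case an annotator reviews only the five flagged pairs instead of all ten, and the single true interaction is among them; that trade-off between recall and review effort is the intuition behind a high-recall, low-precision operating point.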