iProbiotics: a machine learning platform for rapid identification of probiotic properties from whole-genome primary sequences.

Yu Sun, Haicheng Li, Lei Zheng, Jinzhao Li, Yan Hong, Pengfei Liang, Lai-Yu Kwok, Yongchun Zuo, Wenyi Zhang, Heping Zhang.

Abstract

Lactic acid bacteria consortia are commonly present in food, and some of these bacteria possess probiotic properties. However, discovering and experimentally validating probiotics requires extensive time and effort, so effective screening methods for identifying probiotics are of great interest. Advances in sequencing technology have generated massive amounts of genomic data, which enabled us to create a machine learning-based platform for this purpose. This study first selected a comprehensive probiotic genome dataset from the probiotic database (PROBIO) and literature surveys. Then, k-mer (k from 2 to 8) compositional analysis was performed, revealing diverse oligonucleotide composition across strain genomes and apparently more probiotic (P-) features in probiotic genomes than in non-probiotic genomes. To reduce noise and improve computational efficiency, the 87 376 k-mers were refined by an incremental feature selection (IFS) method. The model reached its maximum accuracy with 184 core features, achieving a prediction accuracy of 97.77% and an area under the curve of 98.00%. Functional genomic analysis using annotations from the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Rapid Annotation using Subsystem Technology (RAST) databases, together with analysis of genes associated with host gastrointestinal survival/settlement, carbohydrate utilization, drug resistance and virulence factors, revealed that the distribution of P-features was biased toward genes and pathways related to probiotic function. Our results suggest that the role of probiotics is determined not by a single gene but by a combination of k-mer genomic components, providing new insights into the identification and underlying mechanisms of probiotics. This work created a novel, free online bioinformatic tool, iProbiotics, which should facilitate rapid screening for probiotics.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
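
The k-mer compositional step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the features were computed, so everything here is an assumption: the function names `kmer_frequencies` and `feature_vector` are hypothetical, features are taken as relative frequencies of overlapping k-mers on a single strand, and windows containing non-ACGT characters are ignored. The only point grounded in the abstract is the feature count: summing 4^k for k = 2 to 8 gives 87 376, which is consistent with the number of k-mers reported before feature selection.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: only canonical bases are counted


def kmer_frequencies(sequence, k):
    """Relative frequency of every possible k-mer (overlapping windows, one strand)."""
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all 4**k possible k-mers so absent ones get frequency 0.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Windows containing other characters (e.g. N) are excluded from the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def feature_vector(sequence, k_min=2, k_max=8):
    """Concatenate k-mer frequencies for k = k_min..k_max into one feature dict."""
    features = {}
    for k in range(k_min, k_max + 1):
        features.update(kmer_frequencies(sequence, k))
    return features


# Number of possible k-mers for k = 2..8: 16 + 64 + ... + 65536.
n_features = sum(4 ** k for k in range(2, 9))

if __name__ == "__main__":
    toy_genome = "ACGTACGTTTGACCA"  # toy input, not real data
    vec = feature_vector(toy_genome)
    print(n_features, len(vec))
```

A vector like this, computed per genome, is the kind of input a feature-selection step such as IFS would then rank and prune; the selection procedure and classifier used by iProbiotics are not specified in this record and are not reproduced here.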

Keywords:  feature selection; k-mer composition; machine learning; prediction; probiotic

Year: 2022   PMID: 34849572   DOI: 10.1093/bib/bbab477

Source DB: PubMed   Journal: Brief Bioinform   ISSN: 1467-5463   Impact factor: 11.622


Review 1.  Strategies for the Identification and Assessment of Bacterial Strains with Specific Probiotic Traits.

Authors:  Edgar Torres-Maravilla; Diana Reyes-Pavón; Antonio Benítez-Cabello; Raquel González-Vázquez; Luis M Ramírez-Chamorro; Philippe Langella; Luis G Bermúdez-Humarán
Journal:  Microorganisms       Date:  2022-07-10