Lummy Maria Oliveira Monteiro¹,²,³, João Pedro Saraiva¹, Rodolfo Brizola Toscan¹, Peter F Stadler², Rafael Silva-Rocha³, Ulisses Nunes da Rocha⁴.
Abstract
BACKGROUND: Transcription factors (TFs) are proteins that control the flow of genetic information by regulating cellular gene expression. A better understanding of TFs in a bacterial community context may open novel avenues for exploring gene regulation in ecosystems where bacteria play a key role. Here we describe PredicTF, a platform supporting the prediction and classification of novel bacterial TFs in single species and complex microbial communities. PredicTF is based on a deep learning algorithm.
Keywords: Deep learning; Gene regulation; Microbial communities; Transcription factor database; Transcription factors
Year: 2022 PMID: 35135629 PMCID: PMC8822659 DOI: 10.1186/s40793-021-00394-x
Source DB: PubMed Journal: Environ Microbiome ISSN: 2524-6372
Fig. 1 PredicTF workflow and testing. We collected publicly available data on TFs from two databases: CollecTF and UniProtKB. After removing redundant entries and retaining only well-characterized TFs, we used this data (BacTFDB) to train a deep learning model to predict new TFs and their families. Five model organisms (Escherichia coli, Bacillus subtilis, Pseudomonas fluorescens, Azotobacter vinelandii and Caulobacter crescentus) were used to test the accuracy of PredicTF. We then used the same approach to predict TFs from an isolate (P. aeruginosa) and mapped the predicted TFs onto transcriptomics data (P. aeruginosa and mutants under two experimental conditions). Finally, we used the tool to predict TFs in complex communities (metagenome) and mapped these TFs onto their respective metatranscriptomes.
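The data-preparation steps in this workflow (merging entries from two sources, removing redundancies, keeping only well-characterized TFs), together with the hold-out of the test organisms' genus described for the model-organism evaluation, can be sketched as below. This is a minimal illustration under stated assumptions, not PredicTF's actual pipeline: the `TFRecord` fields, the `well_characterized` flag and `build_training_set` are hypothetical, "redundant" is assumed to mean an identical (case-insensitive) amino-acid sequence, and only a genus-level hold-out is shown (a family-level one would filter on a taxonomic-family field the same way).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFRecord:
    """Hypothetical TF entry pulled from CollecTF or UniProtKB."""
    accession: str
    sequence: str
    family: str                # TF family label used for classification
    genus: str                 # source organism genus
    well_characterized: bool   # assumption: a curation flag set upstream


def build_training_set(records, excluded_genera):
    """Keep one copy of each distinct sequence, drop poorly characterized
    entries, and hold out every record from the excluded genera."""
    seen = set()
    kept = []
    for rec in records:
        seq = rec.sequence.upper()
        if seq in seen:
            continue  # redundant: an identical sequence was already kept
        if not rec.well_characterized:
            continue  # filter out poorly characterized TFs
        if rec.genus in excluded_genera:
            continue  # hold out the test organism's genus
        seen.add(seq)
        kept.append(rec)
    return kept


# Hypothetical records for illustration only.
records = [
    TFRecord("A1", "MKLV", "LysR", "Escherichia", True),
    TFRecord("A2", "mklv", "LysR", "Salmonella", True),  # duplicate of A1
    TFRecord("B1", "MSTQ", "TetR", "Bacillus", True),
    TFRecord("C1", "MAAA", "GntR", "Pseudomonas", False),
]
train = build_training_set(records, excluded_genera={"Bacillus"})
print([r.accession for r in train])  # ['A1']
```

Holding out whole genera rather than individual sequences makes it less likely that a near-identical homolog of a test protein is already in the training data.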
Fig. 2 Database composition: distribution of the Transcription Factor Database (BacTFDB). A Distribution by TF and regulatory-element family; B distribution by organism species. The graphs show only families with up to 50 sequences and only organisms that contributed more than 50 sequences.
PredicTF performance, accuracy for experimentally validated Transcription Factors (Accuracy EV), and accuracy for putative Transcription Factors (Accuracy PU) in genomes of model organisms. We removed the sequences from the genus and/or family of the different model organisms from our TF database (BacTFDB) before model training to reduce the chances of false positives (i.e., the presence of identical sequences in the training dataset)
| Organism | Performanceᵃ (%) | Accuracy EVᵇ (%) | Modelᶜ | Accuracy PUᵈ (%) |
|---|---|---|---|---|
|  | 33.33 | 94.44 | PredicTF-no- | 85.71 |
|  | 31.27 | 95.12 | PredicTF-no- | 85.71 |
|  | 24.26 | 87.76 | PredicTF-no- | 100 |
|  | 34.36 | 96.00 | PredicTF-no- | -ᵉ |
|  | 46.44 | 98.69 | PredicTF-no- | - |
|  | 62.28 | 98.43 | PredicTF-no- | - |
ᵃ Performance was calculated as the ratio of the total number of TFs predicted by PredicTF (Predicted TFs) to the total number of proteins annotated as TFs in NCBI (Annotated TFs), multiplied by 100
ᵇ Accuracy EV was calculated as the ratio of the number of TFs predicted by PredicTF in agreement with the NCBI annotation (TFs predicted correctly) to the total number of TFs predicted by PredicTF (TFs predicted), multiplied by 100
ᶜ PredicTF model used to predict TFs for the specific organism
ᵈ Accuracy PU was calculated as the number of putative TFs predicted correctly divided by the number of putative TFs predicted, multiplied by 100. Putative TFs predicted correctly is the number of putative TFs predicted by PredicTF in agreement with the NCBI annotation; putative TFs predicted is the total number of putative TFs predicted by PredicTF
ᵉ Currently, no putative TFs are annotated in the genomes of C. crescentus, P. fluorescens and A. vinelandii
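The three percentages defined in the table footnotes are simple ratios; the sketch below restates them as code so the numerator and denominator of each are explicit. The function names are hypothetical, and the counts in the usage lines are illustrative values, not the counts behind the table.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator * 100, or NaN if the denominator is zero."""
    if denominator == 0:
        return float("nan")
    return 100.0 * numerator / denominator


def performance(predicted_tfs: int, annotated_tfs: int) -> float:
    # Performance: Predicted TFs / Annotated TFs (NCBI) x 100
    return percent(predicted_tfs, annotated_tfs)


def accuracy_ev(tfs_predicted_correctly: int, tfs_predicted: int) -> float:
    # Accuracy EV: TFs predicted correctly / TFs predicted x 100
    return percent(tfs_predicted_correctly, tfs_predicted)


def accuracy_pu(putative_correct: int, putative_predicted: int) -> float:
    # Accuracy PU: putative TFs predicted correctly / putative TFs predicted x 100
    return percent(putative_correct, putative_predicted)


# Illustrative counts only (not taken from the table).
print(round(performance(30, 120), 2))  # 25.0
print(round(accuracy_ev(45, 50), 2))   # 90.0
print(round(accuracy_pu(4, 5), 2))     # 80.0
print(accuracy_pu(0, 0))               # nan
```

Returning NaN for an empty denominator keeps "no value can be computed" distinct from a genuine 0 %, which is the role the "-" entries play in the table.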
Fig. 3 Prediction of TFs by PredicTF for genomes of model organisms. TFs predicted for five model organisms, sorted by family: A Escherichia coli; B Bacillus subtilis; C Caulobacter crescentus; D Pseudomonas fluorescens; E Azotobacter vinelandii.
Fig. 4 Recovery of novel transcription factors in one metagenome and eleven metatranscriptomes. a PredicTF predicted 792 TFs in one metagenome (LAC_MetaG_1) of an anaerobic ammonium-oxidizing microbial community from an anammox membrane bioreactor; the TFs are grouped by family. b The 792 TFs predicted in the metagenome were mapped to 11 reference metatranscriptomes from the same bioreactor from which the metagenome was recovered.
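The two summaries in this figure, grouping predicted TFs by family and checking which of them appear in each metatranscriptome, can be sketched as plain aggregations. This is a hypothetical illustration, not PredicTF's code: it assumes the predictions are available as `(tf_id, family)` pairs and the mapping step has already produced a table of mapped-read counts per TF and sample; the sample names and the `min_reads` detection threshold are assumptions.

```python
from collections import Counter


def family_counts(predictions):
    """Count predicted TFs per family; predictions are (tf_id, family) pairs."""
    return Counter(family for _, family in predictions)


def detected_per_sample(read_counts, min_reads=1):
    """read_counts maps sample -> {tf_id: mapped_reads}. Return, per sample,
    the set of TF ids with at least `min_reads` mapped reads (assumed threshold)."""
    return {
        sample: {tf for tf, n in counts.items() if n >= min_reads}
        for sample, counts in read_counts.items()
    }


# Hypothetical data for illustration only.
predictions = [("tf1", "LysR"), ("tf2", "TetR"), ("tf3", "LysR")]
print(family_counts(predictions).most_common())  # [('LysR', 2), ('TetR', 1)]

reads = {
    "MT_01": {"tf1": 12, "tf2": 0, "tf3": 3},
    "MT_02": {"tf1": 0, "tf2": 5, "tf3": 0},
}
detected = detected_per_sample(reads)
```

Here `detected` holds `{"tf1", "tf3"}` for `MT_01` and `{"tf2"}` for `MT_02`; raising `min_reads` trades sensitivity for robustness against spurious single-read hits.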