Sung-Yoon Ahn1, Mira Kim2, Ji-Eun Bae2, Iel-Soo Bang2, Sang-Woong Lee1.
Abstract
Several pathogens that spread through the air are highly contagious, and the infectious diseases they cause are transmitted more easily by the airborne route under indoor conditions, as observed during the COVID-19 pandemic. Indoor air contaminated by microorganisms, including viruses, bacteria, and fungi, or by pathogenic substances derived from them, can endanger human health. Thus, identifying and analyzing the potential pathogens residing in the air is crucial to preventing disease and maintaining indoor air quality. Here, we applied deep learning technology to analyze and predict the toxicity of bacteria in indoor air. We trained the ProtBert model on toxic bacterial and virulence factor proteins and applied it to predict the potential toxicity of several bacterial species by analyzing their protein sequences. The predictions reflect the results of the in vitro analysis of the toxicity of these species in human cells. The in silico simulation and the obtained results demonstrate that it is plausible to find possible toxic sequences in unknown protein sequences.
Keywords: BERT; protein; toxin; virulence factors
Year: 2022 PMID: 36081016 PMCID: PMC9459819 DOI: 10.3390/s22176557
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
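The abstract describes passing protein sequences to ProtBert. As a minimal sketch of the input formatting step: the publicly released ProtBert checkpoints are documented as expecting uppercase residues separated by spaces, with the rare residues U, Z, O, and B mapped to X. Whether the authors used exactly this preprocessing is an assumption, and the function name `to_protbert_input` is hypothetical.

```python
import re


def to_protbert_input(sequence: str) -> str:
    """Format a raw amino-acid string as space-separated residues.

    Assumption: follows the convention documented for the public
    ProtBert checkpoints, not a step confirmed by this paper.
    """
    # Remove whitespace (e.g. FASTA line breaks) and normalise case.
    cleaned = re.sub(r"\s+", "", sequence).upper()
    # Map rare or ambiguous residues to the unknown residue X.
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    # Emit one token per residue.
    return " ".join(cleaned)


example = to_protbert_input("mkUla zB")
print(example)
```

The formatted string can then be handed to a tokenizer; the classification head and training procedure are outside the scope of this sketch.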
Overview of the datasets used.
| Dataset | Purpose | Positive | Negative |
|---|---|---|---|
| ToxDL [ | Training set | 4413 | 5671 |
| | Validation set | 59 | 670 |
| BTXpred [ | Training set | 140 | 402 |
| | Validation set | 43 | 92 |
| Combined [ | Training set | 20,229 | 14,258 |
| | Validation set | 5059 | 3563 |
| UniprotKB | Test set | 216 | 1000 |
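The class balance differs considerably between these splits, which matters when comparing threshold-dependent metrics such as F1 across datasets. The following sketch computes the positive fraction of each split from the counts in the table; the dictionary layout and the helper name `positive_fraction` are illustrative choices, not part of the original work.

```python
# (positive, negative) counts copied from the dataset overview table.
datasets = {
    ("ToxDL", "training"): (4413, 5671),
    ("ToxDL", "validation"): (59, 670),
    ("BTXpred", "training"): (140, 402),
    ("BTXpred", "validation"): (43, 92),
    ("Combined", "training"): (20229, 14258),
    ("Combined", "validation"): (5059, 3563),
    ("UniprotKB", "test"): (216, 1000),
}


def positive_fraction(positive: int, negative: int) -> float:
    """Share of positive (toxic) sequences in a split."""
    return positive / (positive + negative)


for (name, split), (pos, neg) in datasets.items():
    frac = positive_fraction(pos, neg)
    print(f"{name:10s} {split:10s} total={pos + neg:6d} positive={frac:.3f}")
```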
Figure 1. Pipeline for unknown protein data collection.
Number of collected protein sequences for each species.
| Bacteria Species | Number of Proteins |
|---|---|
| Staphylococcus aureus | 1497 |
| Micrococcus luteus | 236 |
| Staphylococcus epidermidis | 27 |
| Bacillus subtilis | 344 |
Figure 2. In vitro evaluation of bacterial toxicity in human cells. Effects of four species of bacteria on the viability of MRC5 (A), HeLa (B), and YD38 (C) cells, analyzed by the MTT assay 24 h after the bacterial samples were added to human cells pre-seeded in 96-well plates. Results are expressed as cell viability as a percentage of that of cells incubated without bacteria. Experiments were repeated twice, with each condition assessed in triplicate. Data are shown as the mean ± SD. CFU, colony-forming unit.
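The viability values described in this caption can be illustrated with a short calculation: each treated well is divided by the mean of the untreated control wells and reported as mean ± SD over the triplicate. This is a sketch only. The absorbance readings below are hypothetical, blank subtraction is omitted, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def percent_viability(treated, control):
    """Per-well viability (%) relative to the mean untreated control."""
    control_mean = mean(control)
    return [100.0 * a / control_mean for a in treated]


# Hypothetical MTT absorbance readings for one condition (triplicate wells).
control_wells = [1.00, 1.10, 0.90]  # cells incubated without bacteria
treated_wells = [0.50, 0.55, 0.45]  # cells incubated with a bacterial sample

viability = percent_viability(treated_wells, control_wells)
print(f"{mean(viability):.1f} ± {stdev(viability):.1f} %")
```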
Test results on the toxic animal protein dataset.
| Method | F1-Score | MCC | auROC | auPRC |
|---|---|---|---|---|
| BLAST 1 [ | 0.800 | 0.801 | - | - |
| BLAST-score 1 [ | 0.789 | 0.775 | 0.868 | 0.818 |
| InterProScan 1 [ | 0.347 | 0.402 | - | - |
| Hmmsearch 1 [ | 0.185 | 0.307 | - | - |
| ClanTox 1 [ | 0.620 | 0.604 | 0.903 | 0.612 |
| ToxinPred-RF 1 [ | 0.667 | 0.638 | 0.948 | 0.716 |
| ToxinPred-SVM 1 [ | 0.677 | 0.648 | 0.939 | 0.712 |
| ToxDL 1 [ | 0.809 | 0.793 | 0.989 | 0.913 |
| ToxIBTL 2 [ | 0.830 | 0.816 | 0.953 | 0.847 |
| This study | 0.833 | 0.818 | 0.915 | 0.814 |
1 Results as reported in [13]. 2 Results as reported in [14].
Test results on the toxic bacteria protein dataset.
| Method | Accuracy | F1 | MCC | auROC | auPRC |
|---|---|---|---|---|---|
| BTXpred 1 [ | 96.07% | - | 0.9293 | - | - |
| TOXIFY [ | 83.35% | - | - | - | - |
| This study | 97.98% | 0.9579 | 0.9434 | 0.9671 | 0.9647 |
1 Results as reported in [19].
Test results on the combined VFDB and ToxinPred2 dataset.
| Method | F1 | MCC | auROC | auPRC |
|---|---|---|---|---|
| This study | 0.9527 | 0.8845 | 0.9645 | 0.9408 |
Test data results.
| Dataset Used for Training | F1 | MCC | auPRC | auROC |
|---|---|---|---|---|
| BTXpred [ | 0.7790 | 0.7617 | 0.8401 | 0.8239 |
| Combined [ | 0.7036 | 0.6412 | 0.7388 | 0.8644 |
Figure 3. (a) Confusion matrix on the test data for the model trained on BTXpred data; (b) confusion matrix on the test data for the model trained on the combined VFDB and ToxinPred2 data.
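The F1 and MCC values reported in the result tables are both functions of the four cells of a confusion matrix like those in Figure 3. The sketch below uses the standard definitions; the counts are hypothetical and are not read from the figure.

```python
from math import sqrt


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, fp, tn = 90, 10, 20, 880
print(f"F1  = {f1_score(tp, fp, fn):.4f}")
print(f"MCC = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike F1, MCC also accounts for true negatives, which is one reason the two metrics can rank models differently on imbalanced test sets.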
Test data results for the two models trained on BTXpred data and on the combined VFDB and ToxinPred2 data.
| Dataset Used for Training | Staphylococcus aureus | Micrococcus luteus | Staphylococcus epidermidis | Bacillus subtilis |
|---|---|---|---|---|
| BTXpred [ | 45 | 43 | 5 | 21 |
| Combined [ | 263 | 49 | 5 | 27 |
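To compare species with very different numbers of collected sequences, the counts in this table can be normalised by the per-species totals listed earlier. This sketch assumes that each cell is the number of proteins a model flagged as toxic, which the caption does not state explicitly; the variable names are illustrative.

```python
# Number of collected protein sequences per species (from the earlier table).
collected = {
    "Staphylococcus aureus": 1497,
    "Micrococcus luteus": 236,
    "Staphylococcus epidermidis": 27,
    "Bacillus subtilis": 344,
}

# Counts from the table above, ASSUMED to be proteins predicted as toxic.
flagged = {
    "BTXpred": [45, 43, 5, 21],
    "Combined": [263, 49, 5, 27],
}

# Fraction of each species' collected proteins flagged by each model.
flagged_fraction = {
    model: {sp: n / collected[sp] for sp, n in zip(collected, counts)}
    for model, counts in flagged.items()
}

for model, per_species in flagged_fraction.items():
    for species, frac in per_species.items():
        print(f"{model:9s} {species:27s} {frac:.3f}")
```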