Lihong Peng, Ling Shen, Longjie Liao, Guangyi Liu, Liqian Zhou.
Abstract
Microbes present at abnormal levels have important impacts on the formation and development of various complex diseases. Identifying possible Microbe-Disease Associations (MDAs) helps to elucidate the mechanisms of complex diseases. However, experimental methods for MDA identification are costly and time-consuming. In this study, a new computational model, RNMFMDA, was developed to find possible MDAs. RNMFMDA consists of two main steps. First, reliable negative MDA samples were selected based on Positive-Unlabeled (PU) learning and random walk with restart on the heterogeneous microbe-disease network. Second, Logistic Matrix Factorization with Neighborhood Regularization (LMFNR) was developed to compute the association probabilities for all microbe-disease pairs. To evaluate RNMFMDA, we compared it with five state-of-the-art MDA prediction methods under five-fold cross-validation on microbes, diseases, and MDAs. RNMFMDA obtained the best AUCs of 0.6332, 0.8669, and 0.9081, respectively, under the three five-fold cross-validation settings, significantly outperforming the other models. This prediction performance may be attributed to three features: high-quality negative MDA sample selection, the LMFNR-based MDA prediction model, and the integration of various biological information. In addition, several predicted microbe-disease pairs with high association scores are worthy of further experimental validation.
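The first step of RNMFMDA combines PU learning with random walk with restart (RWR) on a heterogeneous microbe-disease network. This record does not give the network construction, restart probability, or selection rule, so what follows is a minimal NumPy sketch under stated assumptions. The network is assumed to be the block matrix [[Sm, A], [Aᵀ, Sd]], built from a microbe similarity matrix `Sm`, a disease similarity matrix `Sd`, and the known-association matrix `A`. The restart probability of 0.7 is arbitrary. The hypothetical `reliable_negatives` rule simply keeps the unlabeled pairs whose disease has the lowest RWR proximity to the microbe.

```python
import numpy as np


def build_transition(Sm, Sd, A):
    """Column-normalised transition matrix of the network [[Sm, A], [A^T, Sd]]."""
    W = np.block([[Sm, A], [A.T, Sd]]).astype(float)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    return W / col_sums


def rwr(M, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) M p + r p0 until the L1 change drops below tol."""
    p0 = seed / seed.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (M @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def reliable_negatives(Sm, Sd, A, n_neg, restart=0.7):
    """Hypothetical rule: the n_neg unlabeled pairs with the lowest RWR proximity."""
    nm, nd = A.shape
    M = build_transition(Sm, Sd, A)
    proximity = np.zeros((nm, nd))
    for i in range(nm):
        seed = np.zeros(nm + nd)
        seed[i] = 1.0                                    # restart at microbe i
        proximity[i] = rwr(M, seed, restart)[nm:]        # scores of the disease nodes
    unlabeled = np.argwhere(A == 0)                      # row-major order
    order = np.argsort(proximity[A == 0], kind="stable")  # same row-major order
    return unlabeled[order[:n_neg]]
```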
Keywords: logistic matrix factorization with neighborhood regularization; microbe-disease associations; positive-unlabeled learning; random walk with restart; reliable negative samples
Year: 2020 PMID: 33193260 PMCID: PMC7652725 DOI: 10.3389/fmicb.2020.592430
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
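The second step, LMFNR, scores every microbe-disease pair with a logistic function of latent factors. The exact objective is not reproduced in this record. The sketch below is modelled on the neighborhood-regularized logistic matrix factorization (NRLMF) formulation used in drug-target prediction and is an assumption, not the authors' code. It defines p_ij = σ(u_i · v_j) and minimizes a log-loss over labeled pairs (known positives and selected negatives, flagged by `mask`). L2 penalties are added, along with graph-Laplacian terms from k-nearest-neighbor similarity graphs that pull similar microbes (and similar diseases) toward similar latent vectors. All function names and hyperparameter values are hypothetical placeholders.

```python
import numpy as np


def knn_laplacian(S, k):
    """Laplacian of a symmetrised k-nearest-neighbour graph built from similarity S."""
    n = S.shape[0]
    k = min(k, n - 1)
    N = np.zeros_like(S, dtype=float)
    for i in range(n):
        sims = S[i].astype(float).copy()
        sims[i] = -np.inf                      # exclude self-similarity
        nbrs = np.argsort(sims)[::-1][:k]      # k most similar nodes
        N[i, nbrs] = S[i, nbrs]
    N = (N + N.T) / 2                          # symmetric, so the Laplacian is too
    return np.diag(N.sum(axis=1)) - N


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lmfnr(Y, mask, Sm, Sd, rank=10, k=5, lam=0.1, alpha=0.1, beta=0.1,
          lr=0.05, n_iter=500, seed=0):
    """Gradient descent on an NRLMF-style objective; returns predicted probabilities."""
    rng = np.random.default_rng(seed)
    nm, nd = Y.shape
    U = 0.1 * rng.standard_normal((nm, rank))
    V = 0.1 * rng.standard_normal((nd, rank))
    Lm, Ld = knn_laplacian(Sm, k), knn_laplacian(Sd, k)
    for _ in range(n_iter):
        P = sigmoid(U @ V.T)
        E = mask * (P - Y)                     # log-loss gradient on labeled pairs only
        grad_U = E @ V + lam * U + alpha * (Lm @ U)
        grad_V = E.T @ U + lam * V + beta * (Ld @ V)
        U -= lr * grad_U                       # both gradients use the old U and V
        V -= lr * grad_V
    return sigmoid(U @ V.T)
```

Because tr(UᵀLU) = ½ Σ N_ij ‖u_i − u_j‖² for a symmetric neighbor matrix N, the gradient term αLU is what couples neighboring microbes. Setting `alpha = beta = 0` reduces the sketch to plain regularized logistic matrix factorization.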
Confusion matrix of a binary classifier.
| | Actual class = 1 | Actual class = −1 |
| Predicted class = 1 | True positive (TP) | False positive (FP) |
| Predicted class = −1 | False negative (FN) | True negative (TN) |
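The confusion matrix above defines the counts behind the threshold-based metrics. The column headers of the performance tables were lost in extraction, so which metrics they report cannot be recovered here. The sketch below just computes the standard ones (precision, recall, accuracy, F1) from labels in {1, −1}, matching the matrix's class encoding. The function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for labels in {1, -1}, laid out as in the matrix above."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Standard threshold-based metrics; undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}
```

For example, `classification_metrics(*confusion_counts([1, 1, 1, -1, -1, -1], [1, 1, -1, 1, -1, -1]))` gives TP = 2, FP = 1, FN = 1, TN = 2, and therefore 2/3 for all four metrics.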
Performance comparison of RNMFMDA with five other methods under CV1.
| KATZHMDA | 0.2772 | 0.6690 | 0.6653 | 0.3646 |
| LRLSHMDA | 0.3286 | 0.4364 | ||
| NGRHMDA | 0.0777 | 0.3423 | 0.4817 | 0.4156 |
| MDLPHMDA | 0.3273 | 0.6890 | 0.6855 | 0.4022 |
| NTSHMDA | 0.1899 | 0.6177 | 0.6138 | 0.3042 |
| RNMFMDA | 0.6278 | 0.6274 |
Bold values indicate the best result in each column.
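According to the abstract, the three settings are five-fold cross-validations on microbes, diseases, and MDAs. The reported AUCs of 0.8669 and 0.9081 match the ratio 1.0 rows of the CV2 and CV3 negative-sample tables, so CV1, CV2, and CV3 correspond to those three settings in order. The following is a minimal sketch of how such folds can be formed over an n_m × n_d association matrix. It assumes that CV1 holds out whole microbes (rows), CV2 holds out whole diseases (columns), and CV3 holds out individual pairs. The paper's exact protocol (for example, whether CV3 splits only positives and selected negatives) is not stated here, and `cv_folds` is a hypothetical helper.

```python
import numpy as np


def cv_folds(n_microbes, n_diseases, setting, n_folds=5, seed=0):
    """Return one boolean test mask (n_microbes x n_diseases) per fold."""
    rng = np.random.default_rng(seed)
    shape = (n_microbes, n_diseases)
    masks = []
    if setting == "CV1":        # hold out whole microbes (rows)
        for fold in np.array_split(rng.permutation(n_microbes), n_folds):
            m = np.zeros(shape, dtype=bool)
            m[fold, :] = True
            masks.append(m)
    elif setting == "CV2":      # hold out whole diseases (columns)
        for fold in np.array_split(rng.permutation(n_diseases), n_folds):
            m = np.zeros(shape, dtype=bool)
            m[:, fold] = True
            masks.append(m)
    elif setting == "CV3":      # hold out individual microbe-disease pairs
        n_cells = n_microbes * n_diseases
        for fold in np.array_split(rng.permutation(n_cells), n_folds):
            m = np.zeros(n_cells, dtype=bool)
            m[fold] = True
            masks.append(m.reshape(shape))
    else:
        raise ValueError(f"unknown setting: {setting}")
    return masks
```

Training uses the entries outside a mask, and predictions are scored on the entries inside it. Under CV1 and CV2, a held-out microbe or disease has none of its associations visible during training.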
Performance comparison of RNMFMDA with five other methods under CV3.
| KATZHMDA | 0.6503 | 0.6518 | 0.8571 | |
| LRLSHMDA | 0.7971 | 0.7412 | 0.7416 | 0.8794 |
| NGRHMDA | 0.4207 | 0.3308 | 0.7796 | 0.9025 |
| MDLPHMDA | 0.8268 | 0.6729 | 0.6741 | 0.8938 |
| NTSHMDA | 0.8545 | 0.5904 | 0.5926 | 0.8896 |
| RNMFMDA | 0.5810 |
Bold values indicate the best result in each column.
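The abstract summarizes performance by AUC. AUC equals the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair (the Mann-Whitney formulation). The sketch below computes this directly, counting ties as one half. It is quadratic in the number of samples and meant only to illustrate the definition. `roc_auc` is a hypothetical name, not a library function.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.9, 0.8, 0.4, 0.3], [1, -1, 1, -1])` returns 0.75: three of the four positive-negative pairs are ordered correctly.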
Figure 1. Performance comparison of RNMFMDA with five other methods under CV1.
Figure 3. Performance comparison of RNMFMDA with five other methods under CV3.
Performance comparison under different numbers of negative samples (CV1).
| 0.0 | 0.4941 | 0.5696 | 0.5696 | 0.5262 |
| 0.1 | 0.5933 | 0.5931 | 0.5920 | |
| 0.2 | 0.4943 | 0.6131 | 0.6128 | 0.6123 |
| 0.5 | 0.4941 | 0.6279 | ||
| 1.0 | 0.4938 | |||
| 2.0 | 0.4938 | 0.6226 | 0.6223 | 0.6226 |
| 3.0 | 0.4931 | 0.6218 | 0.6216 | 0.6115 |
| 4.0 | 0.4926 | 0.6067 | 0.6066 | 0.6057 |
| 5.0 | 0.4923 | 0.5674 | 0.5676 | 0.5650 |
Bold values indicate the best result in each column.
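The negative-sample tables vary a ratio from 0.0 to 5.0, but this record does not define it. The sketch assumes the ratio is the number of reliable negatives used for training divided by the number of known positive MDAs. It also assumes negatives are taken in order of reliability from a pre-ranked candidate list, such as the output of an RWR-based ranking. `training_set` is a hypothetical helper.

```python
def training_set(pos_pairs, ranked_negatives, ratio):
    """Known positives (label 1) plus the int(ratio * |positives|) most reliable negatives (label 0)."""
    n_neg = min(int(ratio * len(pos_pairs)), len(ranked_negatives))
    pairs = list(pos_pairs) + list(ranked_negatives[:n_neg])
    labels = [1] * len(pos_pairs) + [0] * n_neg
    return pairs, labels
```

Under this reading, a ratio of 1.0 yields a balanced training set. Larger ratios draw progressively less reliable candidates from further down the ranked list.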
Performance comparison under different numbers of negative samples (CV3).
| 0.0 | 0.5437 | 0.8559 | 0.8533 | 0.8662 |
| 0.1 | 0.5668 | 0.8565 | 0.8541 | 0.8827 |
| 0.2 | 0.6012 | 0.8532 | 0.8511 | 0.8886 |
| 0.5 | 0.8560 | 0.8541 | 0.8970 | |
| 1.0 | 0.5810 | 0.8818 | 0.8793 | 0.9081 |
| 2.0 | 0.5612 | 0.8916 | 0.8887 | |
| 3.0 | 0.5559 | 0.9096 | ||
| 4.0 | 0.5527 | 0.8912 | 0.8879 | 0.9099 |
| 5.0 | 0.5459 | 0.8842 | 0.8807 | 0.9026 |
Bold values indicate the best result in each column.
Figure 4. Performance comparison under different negative MDA selection ratios (CV1).
Figure 6. Performance comparison under different negative MDA selection ratios (CV3).
The top 20 microbes associated with asthma.
| Rank | Microbe | Evidence |
| 1 | Clostridium difficile | PMID:21872915 |
| 2 | Firmicutes | PMID:23265859 |
| 3 | Bacteroides | PMID:18822123 |
| 4 | Veillonella | PMID:25329665 |
| 5 | Clostridia | Unconfirmed |
| 6 | Clostridium coccoides | PMID:21477358 |
| 7 | Actinobacteria | PMID:28947029 |
| 8 | Collinsella aerofaciens | Unconfirmed |
| 9 | Lachnospiraceae | PMID:26220531 |
| 10 | Lactobacillus | PMID:20592920 |
| 11 | Enterobacteriaceae | Unconfirmed |
| 12 | Staphylococcus aureus | Unconfirmed |
| 13 | Streptococcus | PMID:17950502 |
| 14 | Fusobacterium | DOI:10.4167/jbv.2013.43.4.270 |
| 15 | Burkholderia | PMID:24451910 |
| 16 | Enterococcus | PMID:29788027 |
| 17 | Bifidobacterium | PMID:24735374 |
| 18 | Klebsiella | PMID:29788027 |
| 19 | Faecalibacterium prausnitzii | Unconfirmed |
| 20 | Pseudomonas | PMID:13268970 |
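The case-study tables list the 20 highest-scoring microbes for a disease, each checked against the literature. The following is a minimal sketch of producing such a ranking from a predicted score matrix. It assumes that microbes already known to be associated with the disease are skipped. `scores`, `known`, and the microbe names are hypothetical inputs, and `top_k_microbes` is a hypothetical helper.

```python
import numpy as np


def top_k_microbes(scores, known, disease, names, k=20):
    """Rank microbes for one disease by predicted score, skipping known associations."""
    col = scores[:, disease]
    order = np.argsort(-col, kind="stable")              # highest score first
    picked = [names[i] for i in order if known[i, disease] == 0]
    return list(enumerate(picked[:k], start=1))          # (rank, microbe) pairs
```

For example, with scores [0.2, 0.9, 0.5, 0.7] for microbes a to d, and microbe b already known, `top_k_microbes(..., k=2)` returns `[(1, 'd'), (2, 'c')]`.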
The top 20 microbes associated with asthma.
| Rank | Microbe | Evidence |
| 1 | Helicobacter pylori | PMID:22221289 |
| 2 | Clostridium difficile | PMID:27698615 |
| 3 | Bacteroidetes | PMID:25307765 |
| 4 | Firmicutes | PMID:25307765 |
| 5 | Prevotella | PMID:25307765 |
| 6 | Clostridium coccoides | PMID:19235886 |
| 7 | Bacteroides | PMID:25307765 |
| 8 | Veillonella | PMID:28842640 |
| 9 | Clostridia | Unconfirmed |
| 10 | Collinsella aerofaciens | PMID:26848182 |
| 11 | Staphylococcus aureus | PMID:24117882 |
| 12 | Enterobacteriaceae | Unconfirmed |
| 13 | Staphylococcus | PMID:30246806 |
| 14 | Haemophilus | PMID:24013298 |
| 15 | Lactobacillus | PMID:26340825 |
| 16 | Bifidobacterium | Unconfirmed |
| 17 | Enterococcus | PMID:24629344 |
| 18 | Burkholderia | PMID:24325678 |
| 19 | Streptococcus | PMID:23679203 |
| 20 | Klebsiella | PMID:29573336 |
Performance comparison of RNMFMDA with five other methods under CV2.
| KATZHMDA | 0.6487 | 0.6501 | 0.8662 | |
| LRLSHMDA | 0.6944 | 0.7333 | 0.7330 | 0.8086 |
| NGRHMDA | 0.3800 | 0.3285 | 0.7403 | 0.8224 |
| MDLPHMDA | 0.7318 | 0.6653 | 0.6658 | 0.8178 |
| NTSHMDA | 0.7913 | 0.5905 | 0.5921 | 0.8292 |
| RNMFMDA | 0.5850 |
Bold values indicate the best result in each column.
Performance comparison under different numbers of negative samples (CV2).
| 0.0 | 0.5439 | 0.7621 | 0.7603 | 0.7673 |
| 0.1 | 0.5707 | 0.7676 | 0.7660 | 0.8088 |
| 0.2 | 0.6040 | 0.7678 | 0.7664 | 0.8220 |
| 0.5 | 0.7830 | 0.7816 | 0.8410 | |
| 1.0 | 0.5850 | 0.8304 | 0.8283 | 0.8669 |
| 2.0 | 0.5581 | 0.8547 | 0.8521 | 0.8791 |
| 3.0 | 0.5560 | |||
| 4.0 | 0.5492 | 0.8563 | 0.8533 | 0.8782 |
| 5.0 | 0.5461 | 0.8515 | 0.8483 | 0.8734 |
Bold values indicate the best result in each column.