Meifang Hua (1), Shengpeng Yu (1), Tianyu Liu (1), Xue Yang (1), Hong Wang (2).
(1) School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China.
(2) School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China. wanghong106@163.com.
Abstract
MOTIVATION: Exploring the relationships between microbes and diseases can help microbiologists make decisions and plan treatments. Predicting new microbe-disease associations currently relies on biological experiments and domain knowledge, which is time-consuming and inefficient. Automated algorithms have been used to uncover the intrinsic links between microbes and diseases. However, because of data noise and an incomplete understanding of the relevant biology, efficient prediction of microbe-disease associations remains a crucial open problem. This study develops a multi-view graph augmentation convolutional network (MVGCNMDA) to predict potential disease-associated microbes.
METHODS: First, we apply two data augmentation methods, edge perturbation and node dropping, to reduce data noise in the preprocessing stage. Second, we calculate Gaussian interaction profile kernel similarity and cosine similarity so that the Graph Convolutional Network (GCN) can make full use of multi-view features. Then, the multi-view features are fed into a multi-attention block that adaptively learns the weights of the different features. Finally, the embeddings are obtained with a Convolutional Neural Network (CNN) combiner, and matrix completion is used to predict potential microbe-disease associations.
RESULTS: We test our model on the Human Microbe-Disease Association Database (HMDAD), Disbiome, and the Combined Dataset (Peryton and MicroPhenoDB). The area under the PR curve (AUPR), the area under the ROC curve (AUC), the F1 score, and the recall are calculated to evaluate the performance of MVGCNMDA. The AUPR is 0.9440, the AUC is 0.9428, the F1 score is 0.9383, and the recall is 0.8858. The experiments show that our model predicts potential microbe-disease associations accurately and compares favorably with state-of-the-art methods under global leave-one-out cross-validation (LOOCV) and fivefold cross-validation (fivefold CV). To further verify the effectiveness of the proposed graph data augmentation, we designed five different settings in the ablation study. Furthermore, we present two case studies that validate the potential microbe-disease associations predicted by MVGCNMDA.
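The preprocessing and similarity steps named in METHODS can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas or hyperparameters, so the function names, the `ratio` and `gamma_prime` parameters, the choice to drop only row (microbe) nodes, and the bandwidth normalization of the Gaussian interaction profile kernel are all assumptions, chosen to follow definitions commonly used for binary association matrices.

```python
import numpy as np


def edge_perturbation(adj, ratio, rng):
    """Drop a fraction of known associations and add as many random ones.

    Assumed variant: the number of edges is kept constant.
    """
    out = adj.copy()
    edges = np.argwhere(adj == 1)
    non_edges = np.argwhere(adj == 0)
    k = min(int(round(ratio * len(edges))), len(non_edges))
    if k == 0:
        return out
    drop = edges[rng.choice(len(edges), size=k, replace=False)]
    add = non_edges[rng.choice(len(non_edges), size=k, replace=False)]
    out[drop[:, 0], drop[:, 1]] = 0
    out[add[:, 0], add[:, 1]] = 1
    return out


def node_dropping(adj, ratio, rng):
    """Remove a fraction of row nodes by zeroing their association rows.

    Assumed simplification: only row (microbe) nodes are dropped.
    """
    out = adj.copy()
    n_rows = adj.shape[0]
    k = int(round(ratio * n_rows))
    dropped = rng.choice(n_rows, size=k, replace=False)
    out[dropped, :] = 0
    return out


def gip_kernel_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows.

    Assumed form: exp(-gamma * ||p_i - p_j||^2), with gamma equal to
    gamma_prime divided by the mean squared norm of the profiles.
    """
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        # All profiles are zero, so every pairwise distance is zero.
        return np.ones((profiles.shape[0], profiles.shape[0]))
    gamma = gamma_prime / mean_sq_norm
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (profiles @ profiles.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


def cosine_similarity(profiles):
    """Cosine similarity between rows; all-zero rows get similarity 0."""
    norms = np.linalg.norm(profiles, axis=1)
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit = profiles / safe_norms[:, None]
    return unit @ unit.T


# Toy microbe-by-disease association matrix (hypothetical data).
associations = np.array([[1, 0, 1],
                         [1, 0, 1],
                         [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
augmented = node_dropping(edge_perturbation(associations, 0.2, rng), 1 / 3, rng)
microbe_gip = gip_kernel_similarity(associations)      # microbe view 1
microbe_cos = cosine_similarity(associations)          # microbe view 2
disease_gip = gip_kernel_similarity(associations.T)    # disease view 1
disease_cos = cosine_similarity(associations.T)        # disease view 2
```

Passing the transposed matrix gives the disease-side similarities, so each node type ends up with two similarity views of the kind a multi-view GCN could consume.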