Marco Cappellato, Giacomo Baruzzo, Ilaria Patuzzi, Barbara Di Camillo.
Abstract
In the current research landscape, microbiota composition studies are of extreme interest, since it has been widely shown that resident microorganisms affect and shape the ecological niche they inhabit. This complex micro-world is characterized by different types of interactions. Understanding these relationships helps to decode the causes and effects of how communities are organized. Next-Generation Sequencing technologies make it possible to reconstruct the internal composition of the whole microbial community present in a sample. Sequencing data can then be investigated through statistical and computational methods from network theory to infer the network of interactions among microbial species. Since there are several network inference approaches in the literature, in this paper we tried to shed light on their main characteristics and challenges, providing a useful resource not only for those interested in using the methods, but also for those who want to develop new ones. In addition, we focused on the frameworks used to produce synthetic data, from the simulation of network structures to their integration with abundance models, with the aim of clarifying the key points of the entire generative process.
Keywords: Microbiota; microbial interactions; microbiota analysis; network inference; relationship models; synthetic count data
Year: 2021 PMID: 35273458 PMCID: PMC8822226 DOI: 10.2174/1389202921999200905133146
Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.689
Fig. (1) Number of articles in the literature relating to the main downstream analyses. Counts are obtained by querying the PubMed database, searching in the title/abstract: (network OR network analysis OR microbial interactions) AND (16S OR microbiota OR microbial communities) for Network Analysis; (Differential OR abundance OR analysis OR statistical) AND (16S OR microbiota OR microbial community) for DA Analysis; (Alpha OR beta OR diversity OR analysis) AND (16S OR microbiota OR microbial community) for Alpha/Beta Diversity.
Fig. (2) Comparison graph. Each node corresponds to a method mentioned in the review. An edge from node A to node B means that the article presenting method A compares method A with method B in a simulated setting. Methods drawn inside a circle deal with cross-sectional data, while those inside a square deal with time series. The Pearson and Spearman nodes have no specific shape, since they are correlation measures that can be applied to any profile vector. The graph was built with the R package network using the circle layout option.
Fig. (3) Scheme of the overall synthetic count data procedure. The two approaches for generating association structures are shown separately. Both alternatives produce the ground truth network, which represents the input to the count data simulation step.
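
The procedure summarized in Fig. (3) can be illustrated with a minimal sketch. It is not the pipeline of any specific method in the review: the random (Erdős–Rényi-like) topology, the diagonally dominant covariance construction, the edge strength and the multinomial sequencing step are all assumptions chosen for illustration, and the function names are hypothetical. It only shows how a ground truth network can feed a log-normal abundance model, ln(y) ~ N(µ, Γ), to produce count data.

```python
import numpy as np


def random_adjacency(n_taxa, edge_prob, rng):
    """Ground truth network: symmetric boolean adjacency, no self-loops."""
    upper = np.triu(rng.random((n_taxa, n_taxa)) < edge_prob, k=1)
    return upper | upper.T


def covariance_from_adjacency(adj, rng, strength=0.3):
    """Assumed construction: signed weights on edges, then a dominant
    diagonal so that the matrix is symmetric positive definite."""
    n = adj.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(n, n))
    weights = np.triu(adj * signs * strength, k=1)
    gamma = weights + weights.T
    # Strict diagonal dominance with a positive diagonal implies
    # positive definiteness for a symmetric matrix.
    np.fill_diagonal(gamma, np.abs(gamma).sum(axis=1) + 1.0)
    return gamma


def simulate_counts(gamma, n_samples, depth, rng):
    """Draw ln(y) ~ N(mu, gamma), close to compositions, then sample counts."""
    mu = np.zeros(gamma.shape[0])  # hypothetical mean log-abundance vector
    log_y = rng.multivariate_normal(mu, gamma, size=n_samples)
    y = np.exp(log_y)
    composition = y / y.sum(axis=1, keepdims=True)
    # Multinomial sampling mimics a fixed sequencing depth per sample.
    return np.vstack([rng.multinomial(depth, p) for p in composition])


rng = np.random.default_rng(0)
adj = random_adjacency(n_taxa=10, edge_prob=0.2, rng=rng)
gamma = covariance_from_adjacency(adj, rng)
counts = simulate_counts(gamma, n_samples=50, depth=10_000, rng=rng)
```

Here `adj` plays the role of the ground truth network against which an inferred network would be scored, and `counts` is the samples-by-taxa table that an inference method would receive as input.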
Some examples of biological networks.
| Network | Nodes | Edges |
|---|---|---|
| Gene Co-Expression Networks | Genes | Co-expression level |
| Gene Regulatory Networks | Transcription factors and binding sites or genes and their regulators | Regulatory relationships |
| Metabolic Networks | Metabolites | Biochemical reactions |
| Microbial Interaction Networks | Taxa (and ecological or physiological variables) | Microbe-microbe (and environment-microbe and host-microbe) interactions |
| Protein-Protein Interaction Networks (PPI) | Proteins | Interactions involving the activation of a molecular and cellular mechanism |
| Sequence Similarity Networks (SSNs) | Protein or gene sequences | Similarity of the amino acid or nucleotide sequence |
Table of methods covered in this review with indications for code availability, base approach and type of data.
| Method | Code availability | Link | Base approach | Type of data |
|---|---|---|---|---|
| Meta-network | Python code | | Association Rule Mining | Cross-sectional |
| MDiNE | R package | | Bayesian Graphical Model | Cross-sectional |
| SPRING | - | - | Graphical Model | Cross-sectional |
| TIME | Web app | | Granger Causality | Time Series |
| MPLasso | R package | | Graphical Model | Cross-sectional |
| gCoda | R code | | Graphical Model | Cross-sectional |
| BAnOCC | R package | | Bayesian Graphical Model | Cross-sectional |
| MTPLasso | - | - | gLV | Time Series |
| Ridenhour | R code | | ARIMA | Time Series |
| cooccur | R package | | Probability Theory | Cross-sectional |
| CoNet | Cytoscape plugin | | Ensemble Pairwise Metrics | Cross-sectional |
| metaMIS | Matlab (stand-alone GUI) | | gLV | Time Series |
| SPIEC-EASI | R package | | Graphical Model | Cross-sectional |
| REBACCA | R code | | Covariance Estimation | Cross-sectional |
| CCLasso | R code | | Covariance Estimation | Cross-sectional |
| RMN | - | - | Rule-based algorithm | Time Series |
| LIMITS | Mathematica | | gLV | Time Series |
| eLSA | Python package | https://bio.tools/elsa | LSA | Time Series |
| SparCC | Python package¹ | | Compositional Correlation | Cross-sectional |
| MENAP | Web app | | Random Matrix Theory | Cross-sectional |
| CCREPE | R package² | | Ensemble Pairwise Metrics | Cross-sectional |
| MIC | MINEv2.jar (Java), minepy (Python-Matlab), minerva (R) | | Pairwise Relationship | Cross-sectional |
¹ There are also R implementations in the SpiecEasi, gCoda and CCLasso packages. ² This information comes from the SpiecEasi paper.
Summary of the generative processes used by the inference methods present in the review to evaluate performance against the ground truth structure.
| Inference method | Association structure | | Count data model | Performance evaluation |
|---|---|---|---|---|
| MDiNE | Ω obtained by Cholesky decomposition | - | NorTA | AUROC and AUROC of the edge difference between two precision matrices ΩA, ΩB |
| SPRING | Band (B2) | method 3 | NorTA | dij |
| MPLasso | Cluster (C2) | method 4 | ln(y) ~ N(µ, Γ) | AUPRC |
| gCoda | Random | method 1 | ln(y) ~ N(µ, Γ) | AUROC |
| BAnOCC | 4 correlation scenarios | - | model of the method | heatmap of the estimates and significance of correlations |
| MTPLasso | Random | method 2 | gLV | AUPRC |
| Ridenhour | Small-world (Watts-Strogatz) | method 2 | $X_i(t+1) = X_i(t)\,e^{C_i X(t)}$ | ROC curve |
| SPIEC-EASI | Band: (B2) | method 3 | NorTA | PR curve |
| REBACCA | 3 fixed structures: | method 1 | y ~ log-ratio normal (LRN) | AUROC |
| CCLasso | Random | method 1 | ln(y) ~ N(µ, Γ) | AUROC |
| RMN | Association structure imposed by the system | - | system of tanh equations with 3 latent factors | TP rate |
| LIMITS | mii sampled from a uniform distribution; mij added iteratively while model stability is maintained | method 2 | gLV | scatter plot (interactions |
| SparCC | Random Γ where each OTU pair has a given probability of being perfectly correlated | method 1 | ln(y) ~ N(µ, Γ) | Visual comparison of networks |
Ω = precision matrix; mij = interaction coefficients of M; Γ = covariance matrix; y = true compositional abundance; µ = mean abundance vector; X_i(t) = abundance of taxon i at time t; Wnc = weighted natural connectivity; dij = pairwise absolute difference; dH = Hamming distance; dF = Frobenius norm distance; Pr = probability of prediction for pairs with a nonlinear correlation coefficient below 0.5.
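
As an illustration of the time-series generative models in the table, the discrete-time relation listed for Ridenhour can be read as X_i(t+1) = X_i(t)·exp(C_i X(t)), where C_i is taken here to be row i of an interaction matrix C. The sketch below follows that literal reading only. The matrix values, the initial abundances and the function name are hypothetical, and no noise or growth term is included because the table does not specify one.

```python
import numpy as np


def simulate_series(C, x0, n_steps):
    """Iterate X(t+1) = X(t) * exp(C @ X(t)) element-wise over taxa."""
    x = np.empty((n_steps + 1, len(x0)))
    x[0] = x0
    for t in range(n_steps):
        # Row i of C weights how every taxon affects taxon i at time t.
        x[t + 1] = x[t] * np.exp(C @ x[t])
    return x


# Hypothetical 3-taxon interaction matrix with negative self-interactions.
C = np.array([
    [-0.5, 0.2, 0.0],
    [0.0, -0.5, -0.3],
    [0.1, 0.0, -0.5],
])
x0 = np.array([1.0, 1.0, 1.0])
series = simulate_series(C, x0, n_steps=20)
```

The nonzero off-diagonal entries of `C` define the ground truth interaction network, and `series` holds one abundance profile per time point, which is the kind of input a time-series inference method would be evaluated on.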