
Learning a mixture of microbial networks using minorization-maximization.

Sahar Tavakoli, Shibu Yooseph.

Abstract

MOTIVATION: The interactions among the constituent members of a microbial community play a major role in determining the overall behavior of the community and the abundance levels of its members. These interactions can be modeled using a network whose nodes represent microbial taxa and edges represent pairwise interactions. A microbial network is typically constructed from a sample-taxa count matrix that is obtained by sequencing multiple biological samples and identifying taxa counts. From large-scale microbiome studies, it is evident that microbial community compositions and interactions are impacted by environmental and/or host factors. Thus, it is not unreasonable to expect that a sample-taxa matrix generated as part of a large study involving multiple environmental or clinical parameters can be associated with more than one microbial network. However, to our knowledge, microbial network inference methods proposed thus far assume that the sample-taxa matrix is associated with a single network.
RESULTS: We present a mixture model framework to address the scenario when the sample-taxa matrix is associated with K microbial networks. This count matrix is modeled using a mixture of K Multivariate Poisson Log-Normal distributions and parameters are estimated using a maximum likelihood framework. Our parameter estimation algorithm is based on the minorization-maximization principle combined with gradient ascent and block updates. Synthetic datasets were generated to assess the performance of our approach on absolute count data, compositional data and normalized data. We also addressed the recovery of sparse networks based on an l1-penalty model.
AVAILABILITY AND IMPLEMENTATION: MixMPLN is implemented in R and is freely available at https://github.com/sahatava/MixMPLN.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press.

Year:  2019        PMID: 31510709      PMCID: PMC6612855          DOI: 10.1093/bioinformatics/btz370

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Microbes are found almost everywhere on earth, including in environments deemed too extreme for other life forms, and they play critical roles in many biogeochemical processes (Falkowski et al., 2008; Whitman et al.). Microbial communities are also found in association with higher life forms, including plants and animals; for instance, trillions of microbes live in or on the human body (roughly as many microbial cells as there are human cells) (Sender et al., 2016), and ongoing research continues to reveal the important roles that many of these microbes play in human health (Huttenhower et al.; Qin et al., 2010). Microbial communities are typically structured and composed of members of different species. The microbes in a community do not exist in isolation, but interact with each other and also compete for the available carbon and energy sources. These interactions, along with resource availability and environmental parameters (like temperature, pH and salinity) (Hibbing et al.; Williamson and Yooseph, 2012), determine the taxonomic composition of the microbial community and the abundance levels of its constituents. Knowledge of these interactions is crucial for understanding the overall behavior of the microbial community, and can be used to elucidate the biological mechanisms underlying microbe-associated disease progression and microbe-mediated processes (like biofilm formation).

The study of microbial communities has been greatly enabled by the advent of high-throughput next-generation DNA sequencing technologies (Bentley, 2006; Margulies et al., 2005; Quail et al.). The taxonomic composition of a microbial community can be obtained by sequencing the DNA extracted from a biological sample collected from the environment of interest. This is achieved either using a targeted approach, involving the sequencing of a taxonomic marker gene [for instance, the 16S ribosomal RNA gene, which is found in all bacteria (Woese and Fox, 1977)], or using a whole-genome shotgun sequencing approach (Venter et al.).
Both approaches generate taxa counts that are compositional in nature, and that enable the estimation of the relative abundances of the constituent members of the community. Microbial interactions can be modeled using a weighted graph (or network), where each node in the graph represents a taxon (or taxonomic group) and an (undirected) edge exists between two nodes if the corresponding taxa interact with, or influence, each other. The edge weight captures the strength of the interaction, with its sign reflecting whether the interaction is positive or negative. This framework can be used to model a variety of microbial interactions, including competition and co-operation. While we do not consider it here, a directed graph can also be used to represent interactions, where the edge direction indicates the direction of influence (or causality). Microbial networks are typically constructed from sample-taxa count matrices. A sample-taxa count matrix is generated by sequencing multiple biological samples (n samples) collected from the environment of interest and identifying the counts of the observed taxa (d taxa) in each sample. As we discuss briefly below, microbial networks can be constructed using a variety of different approaches. To our knowledge, all of these methods assume that the sample-taxa matrix is associated with a single underlying stochastic process (i.e. there is one underlying network topology and set of edge weights). However, this need not always be the case. In this paper, we consider an important extension to the network inference problem, whereby we develop a mixture modeling framework for inferring K microbial networks when the observed sample-taxa matrix is associated with K underlying distributions. We are motivated by large-scale human-associated and other environmental microbial community projects that are now possible due to cost-effective sequencing. 
For instance, human gut microbiome studies now routinely analyze large cohorts of individuals and generate microbial community data from several hundreds (or even thousands) of samples. An important research question in this area involves the definition of a ‘core’ microbiome associated with a particular host phenotype (Huttenhower et al.; Qin et al., 2010). It is well understood that the gut microbiome composition is greatly influenced by many factors, including diet and age, and thus it is not unreasonable to expect the associated microbial network interactions to also differ when these factors vary (e.g. the gut microbial community interaction network in vegetarian hosts can be expected to differ from the network in non-vegetarian hosts). A similar situation occurs in environmental studies, where the microbial interactions are influenced greatly by the physical and chemical gradients of the environment. Often, the metadata collected in these studies may not be comprehensive enough to discern these interactions in a supervised manner. Our proposed mixture framework offers a principled approach to identifying these multiple microbial interaction networks from a sample-taxa matrix.

Several methods have been proposed for constructing a single microbial network from an input sample-taxa matrix (Layeghifard et al.). One approach involves using pairwise correlations (Pearson or Spearman) between taxa to define the edge weights in the graph. However, computing these correlation networks directly from the observed count data can be misleading because of the compositional nature of these data (Gloor et al.). Furthermore, for a microbial network with d nodes, there are d(d − 1)/2 pairwise edge weights that need to be determined, while the number of available samples n is often not large enough, with the result that the system of equations to determine all pairwise correlations is under-determined. This latter issue is typically handled by assuming that the network is sparse [i.e.
the number of edges is O(d)]. Methods based on latent variable modeling have been proposed to infer correlation networks (Fang et al.; Friedman and Alm, 2012). These methods use log-ratio transformations of the original count data (Aitchison, 1982) to deal with its compositional nature, and subsequently infer the correlation matrix (i.e. the edge weights) under the assumption of sparsity. Microbial networks have also been constructed using a probabilistic graphical model framework (Jordan, 1999) that enables the modeling of the conditional dependencies associated with the interactions. For instance, the assumption that the log-ratio transformed count data follow a Gaussian distribution results in a Gaussian graphical model (GGM) framework. In this scenario, the graph structure represents the precision matrix (or inverse covariance matrix) of the underlying multivariate Gaussian distribution. This graph has the property that an edge exists between two nodes if and only if the corresponding entry in the precision matrix is non-zero; a zero entry in the precision matrix indicates conditional independence between the two corresponding random variables. When the graph is assumed to be sparse, the GGM inference problem can be solved using sparse precision matrix estimation algorithms (Friedman et al., 2007). This approach has been used to construct microbial networks from sample-taxa matrices (Kurtz et al.). An alternate approach to constructing a microbial network, and the one we adopt in this paper, is to model the vector of observed taxa counts using a multivariate distribution and to infer the parameters of this distribution from the observed data using a maximum likelihood framework. Any candidate multivariate distribution for this approach has to be flexible enough to capture the underlying covariance structure of the network interactions (i.e.
allow for both positive and negative covariances); this rules out distributions like the multinomial or the Dirichlet-multinomial, which are popular choices for modeling microbial count data in certain situations (Holmes et al.; La Rosa et al., 2012) but which cannot capture both types of interactions. The Multivariate Poisson Log-Normal (MPLN) distribution (Aitchison and Ho, 1989) can be used for modeling multivariate count data, and its covariance structure can capture both positive and negative interactions. This distribution was used recently (Biswas et al., 2015) to model counts in a sample-taxa matrix and infer an underlying microbial network under an assumption of sparsity.

In this paper, we consider the following computational inference problem: for a sample-taxa matrix X (containing absolute counts of taxa) that is generated by a mixture model consisting of K MPLN component distributions, estimate the mixing coefficients and the distribution parameters of the K components. We note that the precision matrices of the K components define the K different microbial networks. We formulate this inference problem using a maximum likelihood framework, and estimate the various parameters of the constructed likelihood function directly using a numerical optimization method based on the minorization–maximization (MM) principle (Lange, 2016; Wu et al.; Zhou and Lange, 2010). We extend this formulation based on an l1-penalty model and provide algorithms to infer K sparse networks. We evaluate the performance of our algorithms using both synthetic and real datasets. We also evaluate the performance of our method on compositional data obtained by subsampling from the true counts of the taxa. This evaluation models the real-world scenario in which the sample-taxa matrices that we have access to contain only relative abundances of the observed taxa.

2 Materials and methods

Prior to describing our mixture modeling framework, we describe a single MPLN distribution. In our discussions, we denote matrices using upper-case letters, column vectors using bold letters (upper- or lower-case) and scalar values using normal lower-case letters.

2.1 The model

Single MPLN distribution:

Consider an MPLN distribution with parameter set Θ = (μ, Σ), where μ represents its d-dimensional mean vector and Σ represents its d × d covariance matrix. Then, a d-dimensional count vector x = (x_1, …, x_d) generated by this distribution has the following property:

x_j | λ_j ~ Poisson(e^{λ_j}) for j = 1, …, d, with λ = (λ_1, …, λ_d) ~ N_d(μ, Σ),

where Poisson(c) denotes a Poisson distribution with mean c and N_d(μ, Σ) denotes a d-dimensional multivariate normal distribution with mean μ and covariance Σ. An MPLN distribution thus has two layers, with the observed counts being generated by conditionally independent Poisson distributions whose (hidden) log-means follow a multivariate normal distribution. If we use λ to denote the latent (or hidden) variable representing the Poisson log-means, then the probability density function of the MPLN distribution is given by

p(x; Θ) = ∫ [ Π_{j=1}^{d} Poisson(x_j; e^{λ_j}) ] N_d(λ; μ, Σ) dλ.

Let x_1, …, x_n denote n independent samples drawn from an MPLN distribution, where each x_i is a d-dimensional count vector. We use X (an n × d matrix) to denote the sample-taxa matrix generated from d taxa and n samples, and x_ij to denote the count of the j-th taxon in sample i. We can estimate the parameter set Θ of this MPLN distribution using a likelihood framework by considering the d-dimensional latent variables λ_1, …, λ_n associated with samples x_1, …, x_n respectively; let matrix Λ = [λ_1, …, λ_n] and λ_ij denote the j-th entry in λ_i. The log-likelihood function can be optimized using a simple iterative stepwise ascent (or conditional maximization) procedure to compute Θ and Λ; the estimated values can then be used to provide an approximation for p(x; Θ). An analogous approach based on optimizing the log-posterior using an iterated conditional modes algorithm has been proposed previously (Biswas et al., 2015).
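The two-layer generative process described above can be sketched directly (an illustrative Python sketch, not the authors' R implementation; the parameter values are hypothetical):

```python
import numpy as np

def sample_mpln(mu, Sigma, n, seed=None):
    """Draw n d-dimensional count vectors from MPLN(mu, Sigma):
    lam ~ N_d(mu, Sigma), then x_j | lam_j ~ Poisson(exp(lam_j))."""
    rng = np.random.default_rng(seed)
    Lam = rng.multivariate_normal(mu, Sigma, size=n)  # hidden layer: log-means
    return rng.poisson(np.exp(Lam))                   # observed layer: counts

mu = np.array([1.0, 2.0, 0.5])
Sigma = np.array([[0.3,  0.1,  0.0],
                  [0.1,  0.4, -0.1],
                  [0.0, -0.1,  0.2]])
X = sample_mpln(mu, Sigma, n=20000, seed=0)
# The marginal mean of taxon j is exp(mu_j + Sigma_jj / 2):
print(X.mean(axis=0))
print(np.exp(mu + np.diag(Sigma) / 2))
```

The off-diagonal entries of Σ induce positive or negative dependence between the counts, which is what makes the MPLN suitable for interaction modeling.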

Mixture of K MPLN distributions (MixMPLN):

In our framework, we consider a mixture model involving K MPLN distributions. Let π_1, …, π_K represent the mixing coefficients of the K components (where π_l ≥ 0 and Σ_{l=1}^{K} π_l = 1), and let p_l and Θ_l = (μ_l, Σ_l) denote the l-th component distribution and its parameter set. A d-dimensional sample vector x generated from this mixture has the following distribution:

p(x) = Σ_{l=1}^{K} π_l p_l(x; Θ_l).

For n independent samples x_1, …, x_n, the probability distribution is given by the product Π_{i=1}^{n} p(x_i). The general log-likelihood function is thus

ℓ = Σ_{i=1}^{n} log Σ_{l=1}^{K} π_l p_l(x_i; Θ_l),

where λ_i^l is the d-dimensional latent variable associated with x_i in component l. We use a maximum log-likelihood framework to estimate the parameters of the MixMPLN model from the observed data X by optimizing this function.
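The mixture's generative process, first pick a component with probability π_l and then draw from that component's MPLN, can be sketched as follows (illustrative Python with hypothetical parameters; the authors' package itself is in R):

```python
import numpy as np

def sample_mixmpln(pis, mus, Sigmas, n, seed=None):
    """Draw n counts from a mixture of K MPLN components: choose a
    component per sample with probabilities pis, then sample from it."""
    rng = np.random.default_rng(seed)
    d = len(mus[0])
    z = rng.choice(len(pis), size=n, p=pis)          # latent component labels
    X = np.empty((n, d), dtype=np.int64)
    for l, (mu, Sig) in enumerate(zip(mus, Sigmas)):
        idx = np.flatnonzero(z == l)
        Lam = rng.multivariate_normal(mu, Sig, size=idx.size)
        X[idx] = rng.poisson(np.exp(Lam))
    return X, z

pis = [0.7, 0.3]
mus = [np.array([1.0, 2.0]), np.array([3.0, 0.5])]
Sigmas = [np.eye(2) * 0.2, np.eye(2) * 0.3]
X, z = sample_mixmpln(pis, mus, Sigmas, n=1000, seed=1)
```

The inference problem is the reverse of this sketch: given only X (not z), recover the π_l's and the component parameter sets.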

2.2 Optimizing the log-likelihood function using the MM principle

The MM principle is a general technique (Hunter and Lange, 2004; Lange, 2016) that has proven useful for tackling function optimization problems (MM stands for minorization–maximization in maximization problems and for majorization–minimization in minimization problems). For our scenario, let f(θ) denote the function to be maximized. The MM principle proposes to iteratively maximize a minorizer function g(θ | θ^(t)) instead of maximizing f(θ) directly; here, θ^(t) denotes a fixed value of the parameter θ (the current iterate). The function g(θ | θ^(t)) is said to be a minorizer of f(θ) if:

g(θ | θ^(t)) ≤ f(θ) for all θ, with equality when θ = θ^(t).

Therefore, the first step in our MM approach is to find a minorizer function that has the required property. For this, we use the following observation, which follows from the concavity of the log function (Lange, 2016; Zhou and Lange, 2010): for m non-negative values f_1, …, f_m and weights w_l = f_l^(t) / Σ_{r=1}^{m} f_r^(t),

log Σ_{l=1}^{m} f_l ≥ Σ_{l=1}^{m} w_l log(f_l / w_l).

The log-likelihood in Equation (6) can thus be lower-bounded term by term; for the i-th sample, the weight w_il [Equation (10)] is defined as follows (with p_l evaluated at the current parameter and latent-variable estimates):

w_il = π_l^(t) p_l(x_i; Θ_l^(t)) / Σ_{r=1}^{K} π_r^(t) p_r(x_i; Θ_r^(t)).

We use the function on the right-hand side of the resulting bound [Equation (9)] as the minorizer function for our problem. Thus, we define the new objective function L [Equation (11)] to be maximized as:

L = Σ_{i=1}^{n} Σ_{l=1}^{K} w_il [ log π_l + log p_l(x_i, λ_i^l; Θ_l) − log w_il ].
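The two defining properties of the minorizer, tangency at the current iterate and a global lower bound, can be checked numerically for the log-sum bound above (an illustrative sketch with arbitrary non-negative values):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
f_t = rng.uniform(0.1, 1.0, size=m)   # component values at the current iterate t
w = f_t / f_t.sum()                   # weights as in Equation (10)

def log_sum(f):
    return np.log(np.sum(f))

def minorizer(f):
    # Jensen-based lower bound on log_sum: sum_l w_l * log(f_l / w_l)
    return np.sum(w * np.log(f / w))

gap_at_iterate = log_sum(f_t) - minorizer(f_t)            # tangency: ~0
gaps = [log_sum(f) - minorizer(f)
        for f in rng.uniform(0.1, 1.0, size=(200, m))]    # bound: all >= 0
print(gap_at_iterate, min(gaps))
```

Because the bound touches the objective at θ^(t) and lies below it elsewhere, any increase of the minorizer guarantees an increase of the original log-likelihood.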

2.3 Steps of the MixMPLN optimization algorithm

We used a coordinate ascent approach in conjunction with a block update strategy to optimize L. We present the details of parameter initialization and subsequent iterative updates below.

Parameter initialization

The n samples are partitioned into K clusters (components) using the K-means algorithm. The samples assigned to a component are then used to estimate the parameters of that component by inverting the moments of an MPLN distribution (Aitchison and Ho, 1989); here, σ_ij denotes the ij-th entry in Σ, m_j the sample mean of the counts of taxon j and s_ij the ij-th entry of the sample covariance matrix of the counts:

σ_jj = log( (s_jj − m_j) / m_j² + 1 ),   σ_ij = log( s_ij / (m_i m_j) + 1 ) for i ≠ j,   μ_j = log(m_j) − σ_jj / 2.
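The moment inversion can be sketched as follows (illustrative Python, not the authors' R code; the closed-form inversion shown is the standard one for the Aitchison–Ho moments, and note that the log arguments can become non-positive for small or noisy components, in which case an implementation needs a fallback):

```python
import numpy as np

def mpln_moment_init(Xl):
    """Moment-based MPLN parameter estimates for one component, inverting
    the Aitchison-Ho moments:
      E[x_j]        = m_j = exp(mu_j + sigma_jj / 2)
      Var(x_j)      = m_j + m_j**2 * (exp(sigma_jj) - 1)
      Cov(x_i, x_j) = m_i * m_j * (exp(sigma_ij) - 1)  for i != j."""
    m = Xl.mean(axis=0)
    S = np.cov(Xl, rowvar=False)
    Sigma = np.log(S / np.outer(m, m) + 1.0)      # off-diagonal inversion
    # On the diagonal, first subtract the Poisson noise term m_j:
    np.fill_diagonal(Sigma, np.log((np.diag(S) - m) / m**2 + 1.0))
    mu = np.log(m) - np.diag(Sigma) / 2.0
    return mu, Sigma

# Recover hypothetical parameters from simulated counts of one component:
rng = np.random.default_rng(0)
mu0 = np.array([1.0, 2.0])
Sig0 = np.array([[0.3, 0.1],
                 [0.1, 0.4]])
Lam = rng.multivariate_normal(mu0, Sig0, size=50000)
mu_hat, Sig_hat = mpln_moment_init(rng.poisson(np.exp(Lam)))
print(mu_hat, Sig_hat)
```

The K-means partition itself can come from any standard implementation; only the per-cluster moment inversion is specific to the MPLN.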

Parameter estimation in iteration t + 1

The portion of the function L in Equation (11) that depends on the mixing coefficients can be optimized separately. Since Σ_{l=1}^{K} π_l = 1, we can optimize this portion, L1, by introducing a Lagrange multiplier and identifying a stationary point of the resulting Lagrangian (Bilmes, 1998). This yields

π_l^(t+1) = (1/n) Σ_{i=1}^{n} w_il,

where w_il is calculated using Equation (10). Considering the part of the L function that depends on Λ and Θ, we have the following term to maximize:

L2 = Σ_{i=1}^{n} Σ_{l=1}^{K} w_il log p_l(x_i, λ_i^l; Θ_l).

Expanding p_l using Equation (3), we have (up to an additive constant):

L2 = Σ_{i=1}^{n} Σ_{l=1}^{K} w_il [ Σ_{j=1}^{d} (x_ij λ_ij^l − e^{λ_ij^l} − log x_ij!) − (1/2) log |Σ_l| − (1/2) (λ_i^l − μ_l)^T Σ_l^{-1} (λ_i^l − μ_l) ],

where λ_ij^l is the j-th entry in λ_i^l. We solve for the parameters separately using partial derivatives. The partial derivative of L2 with respect to λ_ij^l gives us:

∂L2/∂λ_ij^l = w_il [ x_ij − e^{λ_ij^l} − ⟨(Σ_l^{-1})_j, λ_i^l − μ_l⟩ ],

where ⟨u, v⟩ denotes the inner product of vectors u and v, and (Σ_l^{-1})_j denotes the j-th column of Σ_l^{-1}. We use the Newton–Raphson method to estimate λ_ij^{l,(t+1)} from this equation, using the values from iteration t for μ_l and Σ_l. Setting the partial derivative with respect to μ_l to zero gives us:

μ_l^(t+1) = Σ_{i=1}^{n} w_il λ_i^{l,(t+1)} / Σ_{i=1}^{n} w_il.

Thus, μ_l^(t+1) can be estimated from the weights w_il and the updated λ_i^{l,(t+1)}. And finally, setting the partial derivative with respect to Σ_l^{-1} to zero and applying matrix derivative rules gives the estimate of the covariance matrix in iteration t + 1:

Σ_l^(t+1) = Σ_{i=1}^{n} w_il (λ_i^{l,(t+1)} − μ_l^(t+1)) (λ_i^{l,(t+1)} − μ_l^(t+1))^T / Σ_{i=1}^{n} w_il.
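The per-coordinate Newton–Raphson step can be illustrated in one dimension, where the Gaussian prior term reduces to a scalar penalty (a deliberately simplified sketch; the actual update couples the d coordinates of λ_i^l through Σ_l^{-1}, and the values used here are hypothetical):

```python
import numpy as np

def lam_update(x, mu, sig2, iters=25):
    """Newton-Raphson for a single latent log-mean: maximize
    g(lam) = x*lam - exp(lam) - (lam - mu)**2 / (2*sig2),
    whose gradient is x - exp(lam) - (lam - mu)/sig2."""
    lam = np.log(x + 1.0)                     # start near the log of the count
    for _ in range(iters):
        grad = x - np.exp(lam) - (lam - mu) / sig2
        hess = -np.exp(lam) - 1.0 / sig2      # strictly negative: g is concave
        lam -= grad / hess
    return lam

lam = lam_update(x=7, mu=1.5, sig2=0.4)
print(lam)   # at the optimum the gradient x - exp(lam) - (lam - mu)/sig2 vanishes
```

Because the objective in λ is strictly concave (the Hessian is always negative), the Newton iteration converges reliably from a reasonable starting point.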

2.4 Inferring sparse networks using an l1-penalty model

We extended the MixMPLN framework by incorporating an l1-norm penalty as follows:

L_pen = L − Σ_{l=1}^{K} ρ_l ‖Σ_l^{-1}‖_1,

where ‖Σ_l^{-1}‖_1 is the l1-norm of the precision matrix of the l-th component, and ρ_1, …, ρ_K are tuning parameters that can be selected independently. This framework allows for the inference of sparse networks associated with the K components. For a fixed tuning parameter ρ and a multivariate Gaussian distribution, the problem of selecting a precision matrix Θ under the l1-norm penalty can be stated as (Friedman et al., 2007):

maximize over Θ ≻ 0:  log det Θ − tr(SΘ) − ρ‖Θ‖_1,

where S denotes the empirical covariance matrix. In our implementation of the extended MixMPLN framework, we calculated the empirical covariance matrix for each component using Equation (20) in each iteration. We used the ‘glasso’ and ‘huge’ packages in R to solve the sparse precision matrix selection problem (Friedman et al., 2007; Zhao et al.). We applied three different strategies using each of the two packages. In MixMPLN + glasso(cross validation), we used cross validation to determine the value of ρ (i.e. we picked the ρ value that gave the best log-likelihood value among the subsamples). In MixMPLN + glasso(fixed tuning parameter), we used a fixed ρ computed from the precision matrix estimated after the initialization step of MixMPLN, as proposed in Biswas et al. (2015). In MixMPLN + glasso(iterative tuning parameter), we used the same formulation to initialize the ρ value, but then updated it in each iteration based on the new estimated precision matrix in that iteration. In MixMPLN + huge(StARS), we used the stability approach to regularization selection (StARS) method (Liu et al.); this method selects the coefficient that results in the most edge stability in the final graph (Kurtz et al.). The tuning parameter selection methods in MixMPLN + huge(fixed tuning parameter) and MixMPLN + huge(iterative tuning parameter) were the same as in the corresponding implementations using glasso.
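For readers outside R, scikit-learn's graphical_lasso solves the same penalized problem, maximize log det Θ − tr(SΘ) − ρ‖Θ‖_1, and can stand in for the ‘glasso’ call (an illustrative sketch with a hypothetical chain-structured ground truth, not the paper's pipeline):

```python
import numpy as np
from sklearn.covariance import graphical_lasso

# Hypothetical ground truth: a 4-taxon chain, so the precision is tridiagonal.
Prec = np.diag([2.0] * 4)
for i in range(3):
    Prec[i, i + 1] = Prec[i + 1, i] = -0.8

rng = np.random.default_rng(0)
Z = rng.multivariate_normal(np.zeros(4), np.linalg.inv(Prec), size=4000)
S = np.cov(Z, rowvar=False)                    # empirical covariance matrix

# Solve: maximize log det(Theta) - tr(S @ Theta) - alpha * ||Theta||_1
cov_hat, prec_hat = graphical_lasso(S, alpha=0.05)
print(np.round(prec_hat, 2))                   # off-band entries shrink toward 0
```

The l1 penalty drives the entries corresponding to absent edges toward zero, which is exactly the sparsity assumption used for the K component networks.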

2.5 Performance evaluation and datasets

Synthetic sample-taxa count matrices were generated in order to assess the performance of the various MixMPLN algorithms. We evaluated the convergence properties of the algorithms as a function of increasing sample sizes. Since, in practice, sample-taxa count matrices generated from biological samples are compositional in nature, we also evaluated the effect of sampling from the true (or absolute) counts, and the subsequent application of data normalization, on network recovery and convergence. In addition, we also evaluated the accuracy in recovering sparse networks. Finally, we applied our mixture model framework to analyze a real dataset.

Datasets

Synthetic data generation: Each sample-taxa count matrix X was produced by combining samples generated from K component MPLN distributions, where component l generated n_l samples (d-length count vectors) such that Σ_{l=1}^{K} n_l = n. For each component l, the d × d covariance matrix of its MPLN distribution was derived from a randomly generated d × d positive definite precision matrix containing a fixed number of zero entries (as given by the sparsity level sp, which denotes the fraction of matrix entries that are zero). The mean vector for each MPLN component was a random d-length vector. For a sample-taxa matrix X generated this way, it is assumed that each entry x_ij in the matrix is the true (or absolute) abundance of taxon j in sample i. We refer to X as the original count matrix. To simulate compositional count data and sequencing depth differences between the biological samples, we generated a sampled data matrix XS from X by first normalizing each entry in a sample (by dividing by the sample's total count) and subsequently scaling that value by a sample-specific random number (to model the sequencing depth for the biological sample). Finally, to study the effect of data normalization, we applied the trimmed mean of M-values (TMM) normalization procedure [from the ‘edgeR’ package (Robinson and Oshlack, 2010)] to XS to generate the TMM normalized data matrix TS.

Real data: We also applied our mixture modeling framework to a sample-taxa count matrix produced by a recent microbiome study (Yooseph et al.) that explored connections between gut microbiome composition and the risk of Plasmodium falciparum infection. In this study, stool samples from a cohort of 195 Malian adults and children were collected and analyzed. The samples were assayed by sequencing the 16S ribosomal RNA gene to determine the bacterial communities they contained. This generated a sample-taxa count matrix with 195 samples and 221 bacterial genera, which we analyzed in this paper.
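The synthetic pipeline, sparse positive definite precision → MPLN counts → compositional resampling at a random depth, can be sketched as follows (illustrative Python; the diagonal-dominance construction for positive definiteness and the specific parameter ranges are our assumptions, not necessarily those used in the paper):

```python
import numpy as np

def random_sparse_precision(d, sp, rng):
    """Random symmetric precision matrix with roughly a fraction sp of
    zero off-diagonal entries; diagonal dominance makes it positive definite."""
    A = rng.uniform(-1.0, 1.0, size=(d, d)) * (rng.random((d, d)) > sp)
    A = np.triu(A, 1)
    A = A + A.T
    np.fill_diagonal(A, np.abs(A).sum(axis=1) + 1.0)
    return A

rng = np.random.default_rng(0)
d, n = 10, 200
Prec = random_sparse_precision(d, sp=0.7, rng=rng)
Sigma = np.linalg.inv(Prec)
mu = rng.uniform(0.5, 2.0, size=d)

# Original (absolute) count matrix X:
Lam = rng.multivariate_normal(mu, Sigma, size=n)
X = rng.poisson(np.exp(Lam))

# Sampled matrix XS: normalize each sample to proportions, then rescale
# by a random sample-specific sequencing depth.
depth = rng.integers(5_000, 50_000, size=n)
XS = np.floor(X / X.sum(axis=1, keepdims=True) * depth[:, None])
print(X.shape, XS.shape)
```

TMM normalization of XS would then be applied with a dedicated implementation (in the paper, edgeR's).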

Benchmarking criteria

Let C denote a true precision matrix and Ĉ an inferred (or predicted) precision matrix. For evaluating the performance of our algorithms on synthetic data, we used three different criteria to compare the inferred precision matrices with the original precision matrices that were used to generate the sample-taxa matrices. (i) Relative difference between two matrices A and B (the norm of their difference scaled by the norms of A and B). (ii) Frobenius norm of the difference between the partial correlations of matrices A and B; the Frobenius norm of a matrix M is defined as ‖M‖_F = sqrt(Σ_{i,j} M_ij²), and for a precision matrix C, its partial correlation matrix P is calculated as P_ij = −C_ij / sqrt(C_ii C_jj). (iii) Area under the ROC (AUC): this measure was used to assess the recovery of the edges of the microbial network (i.e. identification of the zero entries in the precision matrix). The AUC was calculated by applying varying thresholds to the original and estimated precision matrices to define zero and non-zero entries. As any non-zero entry in the estimated precision matrix represents an edge in the graph, the specificity and sensitivity of detecting an edge at different thresholds can be computed and used to plot the receiver-operating characteristic (ROC) curve. For the above measures, smaller values for relative difference and Frobenius norm indicate increased proximity to the ground truth, and AUC values closer to 1 indicate increased accuracy in reconstructing the network topology. When K > 1, we first matched each predicted precision matrix (of a component) to the nearest true precision matrix from the set of K true precision matrices, using the Frobenius norm measure. After pairing the true and predicted matrices, we report their mean value statistics.
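The partial-correlation criterion can be made concrete (an illustrative sketch; the partial correlations are computed from a precision matrix in the standard way, P_ij = −C_ij / sqrt(C_ii C_jj), and the matrices here are hypothetical):

```python
import numpy as np

def partial_corr(C):
    """Partial correlation matrix from a precision matrix C:
    P_ij = -C_ij / sqrt(C_ii * C_jj), with unit diagonal."""
    D = np.sqrt(np.diag(C))
    P = -C / np.outer(D, D)
    np.fill_diagonal(P, 1.0)
    return P

def frob(M):
    """Frobenius norm: square root of the sum of squared entries."""
    return float(np.sqrt((np.asarray(M) ** 2).sum()))

C_true = np.array([[ 2.0, -0.5],
                   [-0.5,  1.0]])
C_pred = np.array([[ 2.1, -0.4],
                   [-0.4,  1.1]])
print(frob(partial_corr(C_true) - partial_corr(C_pred)))
```

Comparing partial correlations rather than raw precision entries makes the criterion insensitive to per-variable scaling of the two matrices.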

3 Results

We implemented the MixMPLN algorithms in R, and assessed their performance using the synthetic datasets. For our assessments, we generated sample-taxa count matrices X (and their corresponding XS and TS matrices) for four sets of parameter combinations; for each dataset, the mixing coefficients were equal across components. In addition, five replicates were generated for each parameter combination. In total, 465 datasets were generated and analyzed. The datasets with sp = 0.9 were used to assess the performance of the MixMPLN algorithms in recovering sparse networks. First, we evaluated the ability of the MixMPLN algorithm to recover the true precision matrices with increasing sample size n. For this, we used the original count data, the sampled data and the TMM normalized data. Figure 1 shows the benchmark results for one parameter combination and K = 1, 2, 3; Supplementary Figures S1 and S2 show the corresponding results for the remaining parameter combinations. The three benchmark criteria (relative difference, Frobenius norm and AUC) show that the entries in the predicted precision matrices approach their true values as the sample size n increases. A strong convergence trend is seen using the original count data (blue curves/bar chart), with the AUC values approaching 0.97, 0.96 and 0.9 for K = 1, 2 and 3, respectively. The drop in performance with increasing K is not unexpected given the smaller fraction of data available per component to infer the component parameters. The figure also shows that the accuracy of recovery of the true precision matrices is not as high when using the sampled data (red curves/bar chart) and the TMM normalized data (orange curves/bar chart). In addition, from our analysis, it is not immediately evident that applying a TMM-type data normalization is advantageous for the purpose of inferring the underlying covariance structure of the network.
Fig. 1.

Performance on synthetic data. (a) Relative difference between the predicted and true precision matrices. (b) Frobenius norm of the difference. (c) Area under the ROC

We also evaluated the performance of MixMPLN and its l1-norm penalty variants in recovering sparse networks. Table 1 and Figure 2 show the performance of these methods for d = 200, K = 1, 2, 3 and sparsity level sp = 0.9. Sample sizes of n = 2d, 5d and 10d were used in these evaluations. As can be seen, the performance of the methods generally improves with increasing n, and the l1-norm penalty variants perform better than the unpenalized MixMPLN model on these data. The performances of the l1-norm penalty variants are generally quite comparable to each other [with MixMPLN + huge(StARS) having a slightly lower performance compared to the others].
Table 1.

Performance on synthetic data with d = 200 and sp = 0.9 (sample sizes n = 2d, 5d and 10d)

                                        n = 2d      n = 5d      n = 10d
One component, Frobenius norm
  MixMPLN                                14.16       13.99        7.34
  MixMPLN + huge(StARS)                   5.62       16.02        7.15
  MixMPLN + huge(fixed ρ)                 4.86        2.87        2.08
  MixMPLN + huge(iterative ρ)             5.62        3.01        2.15
  MixMPLN + glasso(cross validation)      4.17        2.96        2.09
  MixMPLN + glasso(fixed ρ)               4.86        2.87        2.08
  MixMPLN + glasso(iterative ρ)           5.62        3.01        2.15
One component, relative distance
  MixMPLN                                27.36       30.23       11.75
  MixMPLN + huge(StARS)                   1.96       30.31        9.62
  MixMPLN + huge(fixed ρ)                 3.10        1.04        0.72
  MixMPLN + huge(iterative ρ)             4.80        1.49        1.07
  MixMPLN + glasso(cross validation)      1.52        1.23        0.77
  MixMPLN + glasso(fixed ρ)               3.10        1.04        0.72
  MixMPLN + glasso(iterative ρ)           4.80        1.49        1.07
Two components, Frobenius norm
  MixMPLN                                73.67       19.76        8.79
  MixMPLN + huge(StARS)                   5.60        5.42       15.29
  MixMPLN + huge(fixed ρ)                27.24        5.28        3.48
  MixMPLN + huge(iterative ρ)            25.94        4.33        2.88
  MixMPLN + glasso(cross validation)      5.04        3.73        2.84
  MixMPLN + glasso(fixed ρ)              28.83        5.28        3.48
  MixMPLN + glasso(iterative ρ)          27.63        4.33        2.88
Two components, relative distance
  MixMPLN                             14443.03       56.52       14.42
  MixMPLN + huge(StARS)                   1.98        1.83       25.50
  MixMPLN + huge(fixed ρ)               957.78        4.68        2.52
  MixMPLN + huge(iterative ρ)           772.93        2.73        1.29
  MixMPLN + glasso(cross validation)      1.82        1.30        1.04
  MixMPLN + glasso(fixed ρ)             920.39        4.68        2.52
  MixMPLN + glasso(iterative ρ)         842.48        2.73        1.29
Three components, Frobenius norm
  MixMPLN                                20.07       14.66       13.61
  MixMPLN + huge(StARS)                   6.93        5.90        5.46
  MixMPLN + huge(fixed ρ)                19.15        8.58        6.04
  MixMPLN + huge(iterative ρ)            18.56        6.11        4.97
  MixMPLN + glasso(cross validation)      6.48        6.25        5.88
  MixMPLN + glasso(fixed ρ)              19.67       11.02        8.03
  MixMPLN + glasso(iterative ρ)          19.44        6.11        4.98
Three components, relative distance
  MixMPLN                             48131.48     4571.65       24.22
  MixMPLN + huge(StARS)                8959.39        2.18        1.87
  MixMPLN + huge(fixed ρ)             16096.08       12.68        6.70
  MixMPLN + huge(iterative ρ)          5891.79        5.63        3.04
  MixMPLN + glasso(cross validation)      2.60        2.46        3.55
  MixMPLN + glasso(fixed ρ)           11828.42     2029.51       10.91
  MixMPLN + glasso(iterative ρ)        5836.67        5.63        3.32
Fig. 2.

AUC values for the synthetic datasets generated using d = 200 and sp = 0.9

Since MixMPLN + glasso(cross validation) performs better overall than the other approaches, we decided to apply this method to analyze the real dataset. We used the silhouette method from the ‘factoextra’ R package (Kassambara, 2017) to compute the optimal number of components from this sample-taxa matrix. Figure 3 shows the results of our analysis. We find that there is strong evidence for two underlying (and different) microbial networks (K = 2 components) associated with this sample-taxa data. We assigned component membership to the samples based on their final weights w_il [Equation (10)]. This resulted in Component 1 containing 158 samples and Component 2 containing 37 samples. The average age of the individuals in Component 1 was 9 years, while that of the individuals in Component 2 was 0.7 years. Our reconstructed networks are consistent with the observation that infants have a different gut microbiome composition compared to older children and adults (Yooseph et al.). The constructed networks include edges involving bacterial groups like Bifidobacterium, Staphylococcus, Streptococcus and Escherichia–Shigella that are known to be key players in early gut microbiome development. Our method identifies both positive and negative interactions between pairs of taxa (green and red edges, respectively) for the chosen threshold of 0.3; Supplementary Figure S3 shows the graph structures for other selected threshold values. The biological significance of these interactions needs to be investigated further.
Fig. 3.

Application of MixMPLN + glasso with cross validation to a real dataset. Green and red edges represent positive and negative entries, respectively, in the estimated partial correlation matrices. (a) Graph of Component 1, which contains 158 samples. (b) Graph of Component 2, which contains 37 samples. The threshold to select the edges is 0.3. (c) Selection of the optimal number of components


4 Conclusion

We presented a mixture model framework and network inference algorithms to analyze sample-taxa matrices that are associated with K microbial networks. Future work will include the development of Bayesian approaches for model selection for this problem.

Conflict of Interest: none declared.
References (23 in total)

1.  Genome sequencing in microfabricated high-density picolitre reactors.

Authors:  Marcel Margulies; Michael Egholm; William E Altman; Said Attiya; Joel S Bader; Lisa A Bemben; Jan Berka; Michael S Braverman; Yi-Ju Chen; Zhoutao Chen; Scott B Dewell; Lei Du; Joseph M Fierro; Xavier V Gomes; Brian C Godwin; Wen He; Scott Helgesen; Chun Heen Ho; Chun He Ho; Gerard P Irzyk; Szilveszter C Jando; Maria L I Alenquer; Thomas P Jarvie; Kshama B Jirage; Jong-Bum Kim; James R Knight; Janna R Lanza; John H Leamon; Steven M Lefkowitz; Ming Lei; Jing Li; Kenton L Lohman; Hong Lu; Vinod B Makhijani; Keith E McDade; Michael P McKenna; Eugene W Myers; Elizabeth Nickerson; John R Nobile; Ramona Plant; Bernard P Puc; Michael T Ronan; George T Roth; Gary J Sarkis; Jan Fredrik Simons; John W Simpson; Maithreyan Srinivasan; Karrie R Tartaro; Alexander Tomasz; Kari A Vogt; Greg A Volkmer; Shally H Wang; Yong Wang; Michael P Weiner; Pengguang Yu; Richard F Begley; Jonathan M Rothberg
Journal:  Nature       Date:  2005-07-31       Impact factor: 49.962

Review 2.  Whole-genome re-sequencing.

Authors:  David R Bentley
Journal:  Curr Opin Genet Dev       Date:  2006-10-18       Impact factor: 5.578

3.  MM Algorithms for Some Discrete Multivariate Distributions.

Authors:  Hua Zhou; Kenneth Lange
Journal:  J Comput Graph Stat       Date:  2010-09-01       Impact factor: 2.302

4.  Sparse inverse covariance estimation with the graphical lasso.

Authors:  Jerome Friedman; Trevor Hastie; Robert Tibshirani
Journal:  Biostatistics       Date:  2007-12-12       Impact factor: 5.899

Review 5.  The microbial engines that drive Earth's biogeochemical cycles.

Authors:  Paul G Falkowski; Tom Fenchel; Edward F Delong
Journal:  Science       Date:  2008-05-23       Impact factor: 47.728

6.  A human gut microbial gene catalogue established by metagenomic sequencing.

Authors:  Junjie Qin; Ruiqiang Li; Jeroen Raes; Manimozhiyan Arumugam; Kristoffer Solvsten Burgdorf; Chaysavanh Manichanh; Trine Nielsen; Nicolas Pons; Florence Levenez; Takuji Yamada; Daniel R Mende; Junhua Li; Junming Xu; Shaochuan Li; Dongfang Li; Jianjun Cao; Bo Wang; Huiqing Liang; Huisong Zheng; Yinlong Xie; Julien Tap; Patricia Lepage; Marcelo Bertalan; Jean-Michel Batto; Torben Hansen; Denis Le Paslier; Allan Linneberg; H Bjørn Nielsen; Eric Pelletier; Pierre Renault; Thomas Sicheritz-Ponten; Keith Turner; Hongmei Zhu; Chang Yu; Shengting Li; Min Jian; Yan Zhou; Yingrui Li; Xiuqing Zhang; Songgang Li; Nan Qin; Huanming Yang; Jian Wang; Søren Brunak; Joel Doré; Francisco Guarner; Karsten Kristiansen; Oluf Pedersen; Julian Parkhill; Jean Weissenbach; Peer Bork; S Dusko Ehrlich; Jun Wang
Journal:  Nature       Date:  2010-03-04       Impact factor: 49.962

7.  A scaling normalization method for differential expression analysis of RNA-seq data.

Authors:  Mark D Robinson; Alicia Oshlack
Journal:  Genome Biol       Date:  2010-03-02       Impact factor: 13.583

Review 8.  Bacterial competition: surviving and thriving in the microbial jungle.

Authors:  Michael E Hibbing; Clay Fuqua; Matthew R Parsek; S Brook Peterson
Journal:  Nat Rev Microbiol       Date:  2010-01       Impact factor: 60.633

9.  Environmental genome shotgun sequencing of the Sargasso Sea.

Authors:  J Craig Venter; Karin Remington; John F Heidelberg; Aaron L Halpern; Doug Rusch; Jonathan A Eisen; Dongying Wu; Ian Paulsen; Karen E Nelson; William Nelson; Derrick E Fouts; Samuel Levy; Anthony H Knap; Michael W Lomas; Ken Nealson; Owen White; Jeremy Peterson; Jeff Hoffman; Rachel Parsons; Holly Baden-Tillson; Cynthia Pfannkoch; Yu-Hui Rogers; Hamilton O Smith
Journal:  Science       Date:  2004-03-04       Impact factor: 47.728

10.  Dirichlet multinomial mixtures: generative models for microbial metagenomics.

Authors:  Ian Holmes; Keith Harris; Christopher Quince
Journal:  PLoS One       Date:  2012-02-03       Impact factor: 3.240

View more
