E Mejía-Roa, P Carmona-Saez, R Nogales, C Vicente, M Vázquez, X Y Yang, C García, F Tirado, A Pascual-Montano.
Abstract
In the last few years, advances in high-throughput technologies have generated large amounts of biological data that require analysis and interpretation. Nonnegative matrix factorization (NMF) has been established as a very effective method for revealing the complex latent relationships in experimental data sets. Using this method as part of the exploratory data analysis workflow would help in interpreting and understanding the complex biological mechanisms that underlie experimental data. We have developed bioNMF, a web-based tool that implements the NMF methodology in different analysis contexts to support some of the most important reported applications in biology. This online tool provides a user-friendly interface, combined with a computationally efficient parallel implementation of the NMF methods, to explore the data in different analysis scenarios. In addition to online access, bioNMF provides the same functionality as the website through a public web services interface, enabling users with more computer expertise to launch jobs on the bioNMF server from their own scripts and workflows. The bioNMF application is freely available at http://bionmf.dacya.ucm.es.
Year: 2008 PMID: 18515346 PMCID: PMC2447803 DOI: 10.1093/nar/gkn335
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic representation of the NMF model applied to gene-expression data. The input data matrix V is represented as a gene–experiment matrix and is decomposed into the product of two new nonnegative matrices, W and H. The k columns of W therefore have the dimension of a single array (genes) and are known as basis experiments or factors. The columns of H are known as encoding vectors and are in one-to-one correspondence with the experiments of the gene-expression matrix. Consequently, each row of H has the dimension of a single gene (one value per experiment) and is denoted a basis gene.
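The decomposition described in this caption can be sketched in a few lines of Python. This is a minimal illustration using the classical multiplicative update rules for the Euclidean cost, which is an assumption made here for brevity; it is not necessarily the algorithm or the parallel implementation that bioNMF runs. The function name `nmf` and its parameters (`n_iter`, `eps`, `seed`) are hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (genes x experiments) as W @ H.

    Uses multiplicative updates for the squared Euclidean error (an
    assumption for this sketch, not a statement about bioNMF itself).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_experiments = V.shape
    # Random positive starting points; the updates keep them nonnegative.
    W = rng.random((n_genes, k)) + eps        # k basis experiments (columns)
    H = rng.random((k, n_experiments)) + eps  # one encoding vector per experiment
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy gene-expression matrix: 6 genes, 4 experiments, built from 2 factors.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 4))
W, H = nmf(V, k=2)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each column of `W` has one entry per gene and each row of `H` has one entry per experiment, matching the dimensions given in the caption.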
Figure 2. Snapshot of the output of the Sample Classification module. Results show the cophenetic correlation coefficient (left) for different values of k and the reordered consensus matrices (right) calculated for the AML–ALL data set. (A) The consensus matrix pattern for k = 2 indicates a stable classification into two classes (most of the values are either 0 or 1, represented in red and blue in the picture). This is the expected clustering pattern in this two-class data set. (B) Consensus matrix for k = 5, showing a scattered pattern that indicates a more unstable classification into five classes.
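The two quantities shown in this figure can be illustrated with a short Python sketch. It assumes that each NMF run yields one class label per sample and that the consensus matrix is the fraction of runs in which two samples share a label. The use of average linkage on the distance 1 − consensus is also an assumption of this sketch, and the function names `consensus_matrix` and `cophenetic_coefficient` are hypothetical, not part of bioNMF.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

def consensus_matrix(labelings):
    """Fraction of runs in which each pair of samples gets the same label."""
    labelings = np.asarray(labelings)  # shape: (runs, samples)
    # connectivity[r, i, j] is True when run r groups samples i and j together.
    connectivity = labelings[:, :, None] == labelings[:, None, :]
    return connectivity.mean(axis=0)

def cophenetic_coefficient(consensus):
    """Correlation between 1 - consensus and the distances implied by
    hierarchical clustering of it (average linkage assumed here)."""
    distance = 1.0 - consensus
    np.fill_diagonal(distance, 0.0)
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="average")
    coefficient, _ = cophenet(tree, condensed)
    return float(coefficient)

# Three runs that always agree (up to label names) versus three that do not.
stable_runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
unstable_runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 0]]
stable_score = cophenetic_coefficient(consensus_matrix(stable_runs))
unstable_score = cophenetic_coefficient(consensus_matrix(unstable_runs))
```

When every run produces the same partition, the consensus matrix contains only 0s and 1s and the coefficient reaches its maximum of 1; disagreement between runs produces intermediate entries and a lower coefficient.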
Figure 3. A heatmap showing the subset of genes and samples in the bicluster. All samples are shown sorted by their association with the bicluster (local pattern). The plot in the upper part of the image represents the coefficients of all samples in the corresponding row of H. Samples that show their largest coefficient for that factor are in blue, while samples associated with other factors are in green.
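The sample ordering and coloring described in this caption can be sketched as follows. The sketch assumes that samples are sorted in descending order of their coefficient in the chosen row of H. The function name `rank_samples_for_factor` is hypothetical, and the gene-selection step of the biclustering analysis is not covered.

```python
import numpy as np

def rank_samples_for_factor(H, factor):
    """Order samples by their coefficient in one row of H and flag those
    whose largest coefficient across all factors falls in that row."""
    H = np.asarray(H, dtype=float)
    # Descending sort by the coefficients of the chosen factor (assumed order).
    order = np.argsort(H[factor])[::-1]
    # True for samples "in blue": this factor is their strongest one.
    belongs = np.argmax(H, axis=0) == factor
    return order, belongs[order]

# Toy encoding matrix: 2 factors (rows) by 3 samples (columns).
H_example = np.array([[0.9, 0.1, 0.5],
                      [0.2, 0.8, 0.6]])
order, in_bicluster = rank_samples_for_factor(H_example, factor=0)
```

Here `order` gives the left-to-right sample order for the plot and `in_bicluster` marks, in that same order, which samples would be drawn in blue.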