Ruiqi Liao, Yifan Zhang, Jihong Guan, Shuigeng Zhou.
Abstract
In the past decades, advances in high-throughput technologies have led to the generation of huge amounts of biological data that require analysis and interpretation. Recently, nonnegative matrix factorization (NMF) has been introduced as an efficient way to reduce the complexity of data and to aid their interpretation, and it has been applied to various fields of biological research. In this paper, we present CloudNMF, a distributed open-source implementation of NMF on a MapReduce framework. Experimental evaluation demonstrated that CloudNMF is scalable and can handle huge amounts of data, which may enable various kinds of high-throughput biological data analysis in the cloud. CloudNMF is freely accessible at http://admis.fudan.edu.cn/projects/CloudNMF.html.
Keywords: Bioinformatics; MapReduce; Nonnegative matrix factorization
Year: 2013 PMID: 23933456 PMCID: PMC4411332 DOI: 10.1016/j.gpb.2013.06.001
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Algorithm for CloudNMF
| Input: nonnegative matrix |
| Output: nonnegative matrices |
| 1: initiate |
| 2: for each iteration: |
| 3: calculate |
| 4: calculate |
| 5: update |
| 6: calculate |
| 7: calculate |
| 8: update |
| 9: output |
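The step listing above has lost its matrix symbols in extraction. Its calculate/calculate/update pattern, repeated twice per iteration, is consistent with the widely used multiplicative update rules for NMF (Lee and Seung), so the sketch below assumes that reading. It is a single-machine illustration, not CloudNMF's distributed code, and the names `A`, `W`, `H`, `k`, `eps`, `nmf` and `frob_error` are assumptions introduced here.

```python
import random

def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    Yt = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, k, iterations=200, eps=1e-9, seed=0):
    """Approximate nonnegative A (m x n) as W (m x k) times H (k x n).

    Assumed update rules (not confirmed by the source):
      H <- H * (W^T A) / (W^T W H)
      W <- W * (A H^T) / (W H H^T)
    with elementwise multiply and divide; eps guards against division by zero.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Step 1: initiate W and H with small positive random values.
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):                      # step 2
        Wt = transpose(W)
        WtA = matmul(Wt, A)                          # step 3
        WtWH = matmul(matmul(Wt, W), H)              # step 4
        H = [[H[a][j] * WtA[a][j] / (WtWH[a][j] + eps)
              for j in range(n)] for a in range(k)]  # step 5
        Ht = transpose(H)
        AHt = matmul(A, Ht)                          # step 6
        WHHt = matmul(W, matmul(H, Ht))              # step 7
        W = [[W[i][a] * AHt[i][a] / (WHHt[i][a] + eps)
              for a in range(k)] for i in range(m)]  # step 8
    return W, H                                      # step 9

def frob_error(A, W, H):
    """Frobenius norm of A - W H, used to check the fit."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5
```

Calling `nmf(A, k)` on a small nonnegative matrix and comparing `frob_error` after one iteration and after many gives a quick check that the updates reduce the reconstruction error while keeping both factors nonnegative.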
Figure 1. Using CloudNMF with a local Hadoop cluster
Figure 2. Using CloudNMF with Amazon Web Services
Figure 3. Performance of CloudNMF. A. Performance of CloudNMF on four real datasets shows that the runtime per iteration is linearly correlated with the number of nonzero elements in the matrix. B. Performance of CloudNMF on simulated matrices of different sizes but with the same number of nonzero elements shows that the runtime per iteration is linear in the logarithm of the matrix size. Note that the X-axis is on a logarithmic scale.
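One way to picture why the runtime per iteration could track the number of nonzero elements is to write a matrix product in map/reduce style over the nonzero entries only. The sketch below is an illustrative assumption, not CloudNMF's actual job layout: it computes the product of the transposed factor `W` with a sparse matrix given as `(row, column, value)` triples, and the function names `map_entry`, `reduce_column` and `wt_times_a` are hypothetical.

```python
from collections import defaultdict

def map_entry(i, j, value, W):
    """Map: one nonzero A[i][j] emits its contribution to column j of W^T A."""
    yield j, [value * w for w in W[i]]

def reduce_column(partials, k):
    """Reduce: sum the k-length partial vectors that share a column key."""
    total = [0.0] * k
    for vec in partials:
        for a, x in enumerate(vec):
            total[a] += x
    return total

def wt_times_a(nonzeros, W, k):
    """Compute the nonempty columns of W^T A from the nonzero triples of A.

    One map record is produced per nonzero entry, so the work in this
    sketch grows with the number of nonzeros rather than with m * n.
    """
    groups = defaultdict(list)
    for i, j, value in nonzeros:
        for key, vec in map_entry(i, j, value, W):
            groups[key].append(vec)   # stands in for the shuffle phase
    return {j: reduce_column(vecs, k) for j, vecs in groups.items()}
```

In a real MapReduce job the grouping would be done by the framework's shuffle rather than an in-memory dictionary; the dictionary here only keeps the example self-contained.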