Jinyu Chen (1, 2), Shihua Zhang (1, 2, 3).
Abstract
The increasing availability of high-throughput biological data, especially multi-dimensional genomic data measured on the same samples, has created an urgent need for modular and integrative analysis tools that can reveal the relationships among different layers of cellular activity. To this end, we present a MATLAB package, Matrix Integration Analysis (MIA), which implements and extends four published methods built on two classical techniques: non-negative matrix factorization (NMF) and partial least squares (PLS). The package can integrate diverse types of genomic data (e.g., copy number variation, DNA methylation, gene expression, and microRNA expression profiles, and/or gene network data) to identify the underlying modular patterns with each method. In particular, we demonstrate the differences between these two classes of methods, which gives users guidance on selecting a suitable method in MIA. MIA is a flexible tool that can handle a wide range of biological problems and data types. We also provide an executable version for users without a MATLAB license.
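The abstract describes NMF-based integration of several data types measured on the same samples. The sketch below illustrates the joint NMF (jNMF) idea under a stated assumption: each data matrix X_I (samples in rows) is approximated as W H_I, where the non-negative sample factor W is shared across data types, and all factors are fitted with plain Lee-Seung multiplicative updates. MIA itself is a MATLAB package; this Python code and its function names (`jnmf`, `objective`) are hypothetical, and MIA's implementation may add stopping rules, repeated random restarts, or other details not shown.

```python
import random


def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def elementwise_update(M, num, den, eps):
    """Multiplicative update M * num / (den + eps), entry by entry."""
    return [[m * n / (d + eps) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]


def jnmf(Xs, k, iters=200, seed=0, eps=1e-9):
    """Hypothetical jNMF sketch: X_I ~ W H_I for every data type I, W shared.

    Xs: list of non-negative matrices, each n_samples x m_I (samples in rows).
    Returns W (n x k) and a list of H_I (k x m_I).
    """
    rng = random.Random(seed)
    n = len(Xs[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    Hs = [[[rng.random() for _ in range(len(X[0]))] for _ in range(k)] for X in Xs]
    for _ in range(iters):
        # Each H_I with W fixed: H_I <- H_I * (W^T X_I) / (W^T W H_I)
        Wt = transpose(W)
        WtW = matmul(Wt, W)
        Hs = [elementwise_update(H, matmul(Wt, X), matmul(WtW, H), eps)
              for X, H in zip(Xs, Hs)]
        # Shared W with all H_I fixed: W <- W * (sum_I X_I H_I^T) / (W sum_I H_I H_I^T)
        num = [[0.0] * k for _ in range(n)]
        HHt = [[0.0] * k for _ in range(k)]
        for X, H in zip(Xs, Hs):
            Ht = transpose(H)
            num = add(num, matmul(X, Ht))
            HHt = add(HHt, matmul(H, Ht))
        W = elementwise_update(W, num, matmul(W, HHt), eps)
    return W, Hs


def objective(Xs, W, Hs):
    """Sum over data types of the squared Frobenius reconstruction error."""
    total = 0.0
    for X, H in zip(Xs, Hs):
        WH = matmul(W, H)
        total += sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr))
    return total
```

For example, `jnmf([X1, X2], k=2)` factorizes two data types into two factors; each column of W, together with the matching rows of the H_I matrices, describes one candidate md-module. Sharing W is what couples the data types: a sample pattern must explain all of them at once.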
Keywords: bioinformatics; data integration; matrix integrative analysis; module discovery; multi-dimensional genomics; non-negative matrix factorization (NMF); partial least squares (PLS)
Year: 2018; PMID: 29910825; PMCID: PMC5992392; DOI: 10.3389/fgene.2018.00194
Source DB: PubMed; Journal: Front Genet; ISSN: 1664-8021; Impact factor: 4.599
Brief summary of the four methods in MIA.

| Method | Description | Non-negativity constraint | Prior network | Input data matrices | Reference |
| --- | --- | --- | --- | --- | --- |
| jNMF | Factorizes multi-dimensional genomic data simultaneously to reveal multi-dimensional modules in an unsupervised manner. | Yes | No | Multiple | Zhang et al., |
| SNMNMF | Incorporates prior networks into jNMF for two types of data to enhance co-module discovery. | Yes | Yes | Pairwise | Zhang et al., |
| sMBPLS | Extends the sparse PLS regression model to analyze multi-dimensional genomic data simultaneously and reveal multi-dimensional regulatory modules. | No | No | Multiple | Li et al., |
| SNPLS | Incorporates prior networks into sMBPLS for pairwise data matrices to reveal co-module patterns. | No | Yes | Pairwise | Chen and Zhang, |
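The PLS-based methods in the table (sMBPLS, SNPLS) build on sparse PLS. As a rough illustration only, the Python sketch below extracts one pair of sparse weight vectors (u, v) for two blocks X and Y by alternating soft-thresholded power iterations on M = X^T Y, a scheme used in several sparse PLS formulations. This is an assumption, not MIA's MATLAB code: sMBPLS handles multiple input blocks with block-level weights and SNPLS adds network regularization, neither of which is modeled here. The function names and the penalties `lam_u`/`lam_v` (applied on the scale of the unnormalized M v) are hypothetical.

```python
import math


def soft_threshold(vec, lam):
    """Shrink each entry toward zero by lam; entries with |x| <= lam become 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]


def normalize(vec):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cross_product(X, Y):
    """M = X^T Y for X (n x p) and Y (n x q) with samples in rows."""
    return [[sum(xr[i] * yr[j] for xr, yr in zip(X, Y)) for j in range(len(Y[0]))]
            for i in range(len(X[0]))]


def sparse_pls_component(X, Y, lam_u=0.0, lam_v=0.0, iters=100):
    """One pair of sparse weight vectors (u, v) approximately maximizing u^T X^T Y v.

    Alternates u <- normalize(soft(M v)) and v <- normalize(soft(M^T u)),
    i.e. a power iteration on M = X^T Y with L1 shrinkage.
    """
    M = cross_product(X, Y)
    p, q = len(M), len(M[0])
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(iters):
        u = normalize(soft_threshold(
            [sum(M[i][j] * v[j] for j in range(q)) for i in range(p)], lam_u))
        v = normalize(soft_threshold(
            [sum(M[i][j] * u[i] for i in range(p)) for j in range(q)], lam_v))
    return u, v
```

Raising `lam_u` drives weakly covarying X features to exactly zero weight, which is how a sparse PLS component selects module members; the NMF-based methods instead rely on non-negativity to keep factors interpretable.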
Figure 1. Illustration of MIA. MIA implements four methods that take as input multiple matrices representing different biological features measured on the same samples (optionally with prior network knowledge) and output different types of multi-dimensional modules (md-modules). MIA also reports basic statistics of the md-modules, such as their sizes, member lists, and size distributions (A), as well as heatmaps (B). The reported md-modules can be readily used in downstream biological analyses such as survival analysis (C), enrichment analyses (D), and network analysis (E).
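Reporting module sizes, member lists, and size distributions requires turning continuous factor loadings into discrete members. One common rule, assumed here rather than taken from MIA, is to z-score each factor's loadings and keep the items whose score exceeds a threshold t. The Python sketch below (MIA itself is MATLAB; `module_members`, `size_distribution`, and `t` are hypothetical names) applies that rule and tallies sizes.

```python
import statistics
from collections import Counter


def module_members(loadings, names, t=2.0):
    """Select, for each factor, the items whose z-scored loading exceeds t.

    loadings: one list of loadings per factor (e.g. a row of H_I or a column of W).
    names: item labels aligned with each loading list.
    """
    modules = []
    for vec in loadings:
        mu = statistics.mean(vec)
        sd = statistics.pstdev(vec)
        if sd == 0:
            modules.append([])  # a constant factor singles out no items
            continue
        modules.append([name for name, v in zip(names, vec) if (v - mu) / sd > t])
    return modules


def size_distribution(modules):
    """Map module size -> number of modules of that size."""
    return Counter(len(m) for m in modules)
```

The same function can be applied to each data type's loadings separately, so that one md-module collects members from several feature types plus the samples selected from W.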
Figure 2. Example output figures of SNMNMF. (A) Heatmaps of the input data. (B) Example heatmaps of identified md-modules (outlined in red), alongside randomly selected features for comparison. (C) Sample-wise correlations between the original data and the data reconstructed from the factorized matrices. (D) Size distributions of the three types of components in md-modules.
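Panel (C) checks how well the factorization reproduces each sample. A minimal Python sketch of that check, assuming samples are rows of X and the reconstruction is the product W H (as in the NMF formulation X ≈ W H), computes one Pearson correlation per sample. The function names are hypothetical and this is not MIA's MATLAB code.

```python
import math


def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def samplewise_correlations(X, W, H):
    """Correlate each sample (row) of X with its reconstruction in W H."""
    X_hat = matmul(W, H)
    return [pearson(x_row, r_row) for x_row, r_row in zip(X, X_hat)]
```

Values near 1 indicate that a sample's profile is well captured by the factors; a constant row (in X or in W H) would make the correlation undefined and raise a division error in this sketch.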
Figure 3. Comparison of jNMF and sMBPLS in terms of relevance scores under two types of gold standards, derived from the characteristics of the NMF method (A) and the PLS method (B). jNMF and sMBPLS were each applied to 50 sets of simulated data matrices at each noise level.
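The caption does not define the relevance score. A definition common in module-detection and biclustering benchmarks, assumed here, averages over the identified modules the best Jaccard similarity each one achieves against any gold-standard module; swapping the two arguments gives the companion "recovery" score. The Python sketch below uses hypothetical names and is not taken from MIA.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two collections of member IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_score(modules, reference):
    """Average, over `modules`, of the best Jaccard match found in `reference`."""
    if not modules:
        return 0.0
    best = [max((jaccard(m, r) for r in reference), default=0.0) for m in modules]
    return sum(best) / len(best)


def relevance(found, gold):
    """How well each identified module matches some gold-standard module."""
    return match_score(found, gold)


def recovery(found, gold):
    """How well each gold-standard module is recovered by some identified module."""
    return match_score(gold, found)
```

With found = [{1, 2, 3}] and gold = [{1, 2, 3, 4}, {5, 6}], relevance is 0.75 (the single found module overlaps one gold module well) while recovery is 0.375 (one gold module is missed entirely), which is why the direction of the comparison matters.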
```matlab
% Input is the MIA input structure; YBlockInd is left empty in this example
Input.YBlockInd = [];
```