Pedro Carmona-Saez, Roberto D Pascual-Marqui, F Tirado, Jose M Carazo, Alberto Pascual-Montano.
Abstract
BACKGROUND: The widespread use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in the analysis of such datasets is to discover local structures composed of sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated with different physiological states.
Year: 2006 PMID: 16503973 PMCID: PMC1434777 DOI: 10.1186/1471-2105-7-78
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. General schema of the method. nsNMF approximates the original matrix as a product of two submatrices, W and H. Columns of W are basis experiments, while rows of H constitute basis genes (columns of W and rows of H are drawn apart for better visibility). The coefficients in each pair of basis gene and basis experiment are used to sort conditions and genes in the original matrix. Conditions and genes with high values in the same basis gene and basis experiment are highly related in a sub-portion of the data and are co-clustered in the upper left corner of the sorted array.
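The sorting step described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the factors W (genes x factors) and H (factors x conditions) have already been computed, and the function name `sort_by_factor` and the toy matrices are hypothetical.

```python
def sort_by_factor(V, W, H, k):
    """Reorder V (genes x conditions) using factor k.

    Genes are ranked by their coefficients in column k of W (a basis
    experiment); conditions are ranked by their coefficients in row k
    of H (a basis gene). Both rankings are in descending order, so the
    most related genes and conditions end up in the upper left corner.
    """
    gene_order = sorted(range(len(V)), key=lambda i: W[i][k], reverse=True)
    cond_order = sorted(range(len(V[0])), key=lambda j: H[k][j], reverse=True)
    sorted_V = [[V[i][j] for j in cond_order] for i in gene_order]
    return sorted_V, gene_order, cond_order


# Toy data (hypothetical): genes 1 and 3 are co-expressed in conditions 1 and 3.
V = [
    [1, 1, 1, 1],
    [1, 9, 1, 8],
    [1, 1, 1, 1],
    [1, 8, 1, 9],
]
W = [[0.1], [0.9], [0.2], [0.8]]  # one factor: a single basis experiment
H = [[0.2, 0.9, 0.1, 0.8]]        # one factor: a single basis gene

sorted_V, gene_order, cond_order = sort_by_factor(V, W, H, k=0)
for row in sorted_V:
    print(row)
```

With these toy values the high-expression block moves to the upper left corner of the sorted matrix, which is the behaviour the caption describes for each pair of basis gene and basis experiment.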
Figure 2. Results from synthetic dataset A. (a) Original dataset with the two embedded patterns. (b) Dataset sorted by two-way hierarchical clustering. Dataset sorted by (c) the first basis gene and basis experiment and (d) the second basis gene and basis experiment yielded by nsNMF at k = 3. Conditions belonging to pattern P1a are marked in green and conditions belonging to pattern P2a are marked in orange. The two plots over the heatmaps represent the coefficients of conditions in each sorted basis gene.
Figure 3. Results from synthetic dataset B. (a) Original dataset with the three embedded patterns and (b) the same dataset sorted by two-way hierarchical clustering. Heatmaps of the original dataset sorted by the (c) first, (d) second, (e) third and (f) fourth basis genes and basis experiments yielded by nsNMF at k = 4 are shown in the bottom part of the figure. Non-overlapping conditions of P1b are marked in red, non-overlapping conditions of P2b are marked in green and non-overlapping conditions of P3b are marked in magenta. The overlapping area between P1b and P2b is marked in brown, while the overlapping columns between P2b and P3b are marked in orange. Columns of P4b are marked in blue. Plots over the heatmaps represent coefficients of conditions in each sorted basis gene. The sorted basis genes present gaps indicating the set of conditions belonging to each pattern.
Figure 4. Structures from the human transcriptome dataset. Plots in the first row represent coefficients of samples in the (a) fourth, (b) fifth and (c) eighth sorted basis genes. Heatmaps in the second row represent the expression matrix in which genes (in rows) and samples (in columns) are sorted by their coefficients in the corresponding basis experiment and basis gene. Only genes that were highly representative of each basis experiment are shown. Dashed lines in the third heatmap mark the positions of genes that were included in the testis-gene module but were placed far from the testis-gene group by hierarchical clustering.
Enrichment of GO categories in gene modules obtained from (a) the human transcriptome dataset and (b) the soft tissue tumor dataset. Only functional categories containing at least 6 genes and with p-values less than 0.01 are reported.
| Factor | GO category | Genes | p-value |
|---|---|---|---|
| (a) Human transcriptome dataset | | | |
| Factor 4 (726 genes) | Neurogenesis | 43 | 0.0 |
| | Cell adhesion | 33 | 1.30E-04 |
| | Transport | 32 | 0.003 |
| | Synaptic transmission | 31 | 0.0 |
| | Regulation of transcription, DNA-dependent | 25 | 0.0 |
| | Central nervous system development | 17 | 0.0 |
| | Small GTPase mediated signal transduction | 17 | 4.70E-04 |
| | Potassium ion transport | 11 | 0.008 |
| | Sodium ion transport | 10 | 2.40E-04 |
| | Microtubule-based movement | 9 | 0.0 |
| | Neuropeptide signaling pathway | 8 | 2.40E-04 |
| | Regulation of apoptosis | 8 | 0.009 |
| | ATP synthesis coupled proton transport | 6 | 0.001 |
| | Microtubule polymerization | 6 | 3.00E-05 |
| | Vesicle-mediated transport | 6 | 7.70E-04 |
| Factor 5 (414 genes) | Immune response | 78 | 0.0 |
| | Signal transduction | 47 | 0.0 |
| | Intracellular signaling cascade | 29 | 0.0 |
| | Inflammatory response | 26 | 0.0 |
| | Cellular defense response | 21 | 0.0 |
| | Antigen presentation, endogenous antigen | 18 | 0.0 |
| | Antigen processing, endogenous antigen via MHC class I | 18 | 0.0 |
| | Proteolysis and peptidolysis | 17 | 0.003 |
| | Cell motility | 16 | 0.0 |
| | Cell surface receptor linked signal transduction | 15 | 0.0 |
| | Chemotaxis | 13 | 0.0 |
| | Positive regulation of I-kappaB kinase/NF-kappaB cascade | 13 | 0.0 |
| | Regulation of apoptosis | 12 | 0.0 |
| | Heterophilic cell adhesion | 11 | 0.0 |
| | Antimicrobial humoral response (sensu Vertebrata) | 10 | 1.00E-05 |
| | Small GTPase mediated signal transduction | 10 | 0.004 |
| | Anti-apoptosis | 9 | 1.40E-04 |
| | Defense response | 8 | 7.00E-05 |
| | Induction of apoptosis | 8 | 0.002 |
| | Response to virus | 7 | 0.0 |
| | Cell recognition | 6 | 0.0 |
| | Integrin-mediated signaling pathway | 6 | 2.90E-04 |
| Factor 8 (339 genes) | Spermatogenesis | 13 | 0.0 |
| | Transcription | 11 | 0.0 |
| | Mitosis | 6 | 0.002 |
| (b) Soft tissue tumor dataset | | | |
| Factor 1 (546 genes) | Regulation of transcription, DNA-dependent | 32 | 2.00E-05 |
| | Development | 16 | 0.003 |
| | Neurogenesis | 9 | 0.008 |
| | Transcription from Pol II promoter | 7 | 0.007 |
| | Morphogenesis | 6 | 0.002 |
| | Skeletal development | 6 | 0.007 |
| | Chromosome organization and biogenesis (sensu Eukarya) | 6 | 2.00E-05 |
| Factor 2 (674 genes) | Signal transduction | 32 | 7.00E-05 |
| | Immune response | 30 | 0.0 |
| | Cell adhesion | 24 | 0.0 |
| | Inflammatory response | 17 | 0.0 |
| | Chemotaxis | 16 | 0.0 |
| | Proteolysis and peptidolysis | 15 | 0.002 |
| | Cell growth and/or maintenance | 13 | 0.002 |
| | Cell-cell signaling | 13 | 7.60E-04 |
| | Cell proliferation | 13 | 0.002 |
| | Antimicrobial humoral response (sensu Vertebrata) | 12 | 0.0 |
| | G-protein coupled receptor protein signaling pathway | 11 | 3.80E-04 |
| | Cell motility | 10 | 2.00E-05 |
| | Cellular defense response | 9 | 0.0 |
| | Protein complex assembly | 6 | 0.002 |
| | Positive regulation of cell proliferation | 6 | 0.008 |
| | Cell-matrix adhesion | 6 | 8.00E-05 |
| | Blood coagulation | 6 | 0.001 |
| | Heterophilic cell adhesion | 6 | 0.002 |
| Factor 3 (524 genes) | Signal transduction | 22 | 0.004 |
| | Protein folding | 6 | 0.002 |
| Factor 4 (610 genes) | Metabolism | 16 | 1.00E-04 |
| | Muscle development | 11 | 1.00E-05 |
| | Electron transport | 10 | 0.005 |
| | Carbohydrate metabolism | 9 | 1.90E-04 |
| | Muscle contraction | 9 | 0.0 |
| | DNA replication | 6 | 0.002 |
| | Energy pathways | 6 | 4.20E-04 |
| | Fatty acid metabolism | 6 | 5.00E-05 |
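The reporting rule stated in the table caption (at least 6 genes, p-value below 0.01) can be illustrated with a short sketch. The text here does not say which statistical test produced the p-values; the one-sided hypergeometric test below is an assumption, chosen because it is a common choice for GO enrichment. The function names, the background size and the toy categories are all hypothetical.

```python
from math import comb


def hypergeom_pvalue(background, annotated, module, hits):
    """P(X >= hits) for a module of `module` genes drawn from `background`
    genes, of which `annotated` belong to the category (assumed test)."""
    upper = min(annotated, module)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, module - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background, module)


def reportable(categories, background, module, min_genes=6, max_p=0.01):
    """Keep categories that pass both thresholds named in the caption."""
    rows = []
    for name, annotated, hits in categories:
        if hits < min_genes:
            continue  # too few module genes in this category
        p = hypergeom_pvalue(background, annotated, module, hits)
        if p < max_p:
            rows.append((name, hits, p))
    # Largest categories first, as in the table above.
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Toy input (hypothetical): (category, genes annotated in background, genes in module)
toy_categories = [
    ("Category A", 20, 8),    # strongly over-represented
    ("Category B", 500, 25),  # about what chance would give
    ("Category C", 5, 5),     # significant, but fewer than 6 genes
]
report = reportable(toy_categories, background=1000, module=50)
for name, hits, p in report:
    print(f"{name}\t{hits}\t{p:.2E}")
```

Only the first toy category survives both filters: the second is not significant and the third falls below the 6-gene minimum.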
Figure 5. Structures from the soft-tissue tumor dataset. Each heatmap represents the expression matrix in which samples and genes were sorted by the (a) first, (b) second, (c) third and (d) fourth basis gene and basis experiment. Only genes that were selected as highly representative of each basis experiment are shown. The blue line corresponds to monophasic synovial sarcomas, the brown line to gastrointestinal stromal tumors and the orange line to six of the eleven leiomyosarcoma samples.