Yan Cui, Chun-Hou Zheng, Jian Yang.
Abstract
Identifying subspace gene clusters from gene expression data is useful for discovering novel functional gene interactions. In this paper, we propose using low-rank representation (LRR) to identify subspace gene clusters from microarray data. LRR seeks the lowest-rank representation among all candidates that can represent the genes as linear combinations of the bases in the dataset. The clusters can be extracted from the block-diagonal representation matrix obtained by LRR, and they capture the intrinsic patterns of genes with similar functions well. Meanwhile, the parameter of LRR balances the effect of noise, so the method can extract useful information from data with a high level of background noise. Compared with traditional methods, our approach can identify genes that have similar functions but dissimilar expression profiles. It can also assign one gene to several clusters. Moreover, our method is robust to noise and can identify more biologically relevant gene clusters. When applied to three public datasets, the LRR-based method proves superior to existing methods for identifying subspace gene clusters.
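The abstract does not state the optimization problem. In the LRR literature it is usually written as minimizing ||Z||_* + λ||E||_{2,1} subject to X = XZ + E, where λ would be the noise-balancing parameter mentioned above. The sketch below covers only the noise-free special case (E = 0), whose minimizer is known to be Z = V Vᵀ from the skinny SVD X = U S Vᵀ. It assumes that genes are stored as columns of X and uses synthetic data from two independent subspaces; it is an illustration of the block-diagonal structure, not the authors' implementation.

```python
import numpy as np

def lrr_noiseless(X, tol=1e-10):
    """Closed-form minimiser of ||Z||_* subject to X = X Z (no noise term)."""
    # Skinny SVD: keep only directions with non-negligible singular values.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))
    v = vt[:rank].T
    return v @ v.T

rng = np.random.default_rng(0)
# Synthetic data (assumption): two independent 2-D subspaces in a 10-D space,
# with 5 "genes" (columns) drawn from each subspace.
basis_a = rng.standard_normal((10, 2))
basis_b = rng.standard_normal((10, 2))
X = np.hstack([basis_a @ rng.standard_normal((2, 5)),
               basis_b @ rng.standard_normal((2, 5))])

Z = lrr_noiseless(X)
# Symmetric affinity commonly fed to spectral clustering.
affinity = np.abs(Z) + np.abs(Z.T)
# Entries linking the two groups; these vanish for independent subspaces.
off_block = affinity[:5, 5:]
```

For noisy data the full problem has no closed form and is typically solved iteratively; the affinity matrix would then be passed to a spectral clustering step to extract the clusters.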
Year: 2013 PMID: 23527177 PMCID: PMC3602020 DOI: 10.1371/journal.pone.0059377
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Clustering and subspace clustering of a gene expression matrix: (A) a gene cluster must contain all columns; (B) subspace clusters correspond to arbitrary subsets of rows and columns, shown here as rectangles.
Figure 2. ROC curves for synthetic data (SNR denotes the signal-to-noise ratio).
AUC statistics for synthetic data.
| Method | SNR = 0.5 | SNR = 1.0 | SNR = 1.5 | SNR = 2.0 |
| K-means | 0.6643 | 0.7547 | 0.8253 | 0.9233 |
| GPCA | 0.6145 | 0.7128 | 0.8652 | 0.9255 |
| LRR | 0.8928 | 0.9435 | 0.9681 | 0.9908 |
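The record does not say how the AUC values were computed. As a minimal sketch, the AUC of a ROC curve can be obtained from binary labels and scores through the rank (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example scores higher. The labels and scores below are made up for illustration.

```python
def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = gene pair truly co-clustered, 0 = not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_from_scores(labels, scores)  # 8 of 9 pairs are ordered correctly
```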
The most enriched GO categories from modular enrichment analysis in each gene cluster uncovered by LRR from the yeast dataset.
| Cluster | No. of genes within functional category | Major GO categories | Corrected P-value |
| C1 (121 genes) | 10 | Starch and sucrose metabolism | 2.99342E-11 |
| C2 (86 genes) | 4 | structural constituent of cytoskeleton | 4.7389E-2 |
| C3 (30 genes) | 14 | response to stress | 6.33965E-30 |
| C4 (663 genes) | 151 | integral to membrane | 3.67556E-2 |
| C5 (45 genes) | 27 | oxidation-reduction process | 1.04843E-25 |
| C6 (38 genes) | 3 | DNA repair | 4.92582E-5 |
| C7 (69 genes) | 10 | ion transport | 6.84207E-13 |
| C8 (71 genes) | 16 | Glycolysis/Gluconeogenesis | 3.15299E-19 |
| C9 (181 genes) | 92 | ribosome biogenesis | 9.62533E-119 |
| C10 (87 genes) | 5 | prospore membrane | 2.30387E-5 |
| C11 (34 genes) | 10 | helicase activity | 9.29836E-11 |
| C12 (414 genes) | 10 | hydrolase activity | 2.44361E-5 |
| C13 (551 genes) | 11 | regulation of transcription, DNA-dependent | 2.0876E-3 |
| C14 (114 genes) | 37 | cellular amino acid biosynthetic process | 2.33136E-41 |
| C15 (393 genes) | 3 | mitotic recombination | 1.67731E-4 |
| C16 (25 genes) | 20 | transposition, RNA-mediated | 3.23626E-35 |
| C17 (130 genes) | 116 | structural constituent of ribosome | 3.58803E-202 |
| C18 (454 genes) | 77 | Biosynthesis of secondary metabolites | 1.57577E-32 |
| C19 (83 genes) | 3 | sporulation resulting in formation of a cellular spore | 2.12806E-2 |
| C20 (511 genes) | 3 | oxidation-reduction process | 4.34938E-4 |
| C21 (75 genes) | 3 | transferase activity, transferring phosphorus-containing groups | 9.06387E-3 |
| C22 (27 genes) | 3 | metal ion binding | 1.04585E-5 |
| C23 (675 genes) | 5 | transport | 5.76795E-3 |
| C24 (183 genes) | 92 | ribosome biogenesis | 3.82873E-112 |
| C25 (553 genes) | 73 | transcription, DNA-dependent | 2.91171E-7 |
| C26 (287 genes) | 35 | extracellular region | 2.62718E-24 |
| C27 (50 genes) | 31 | mitochondrion | 2.79651E-53 |
| C28 (801 genes) | 38 | vesicle-mediated transport | 1.40759E-8 |
| C29 (347 genes) | 5 | guanyl-nucleotide exchange factor activity | 1.6044E-4 |
| C30 (258 genes) | 30 | fungal-type cell wall | 1.30198E-22 |
The columns of the table summarize the total size of each cluster (numbers in parentheses), the number of genes annotated in the cluster, the GO categories associated with the cluster, and the P-value after FDR correction.
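The record does not name the test behind these enrichment P-values. A common choice for this kind of analysis is the one-sided hypergeometric test, sketched below as an assumption rather than as the authors' procedure. The population size and the number of genes carrying the annotation are hypothetical; only the cluster size (30) and the hit count (14) are taken from the C3 row.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, cluster_size, hits):
    """P(X >= hits) when cluster_size genes are drawn without replacement
    from a population in which `annotated` genes carry the category."""
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical background: 6000 genes, 200 of them annotated with the category.
p_raw = hypergeom_enrichment_p(population=6000, annotated=200,
                               cluster_size=30, hits=14)
```

The raw P-value from such a test is what a multiple-testing correction would then adjust.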
Results of gene analysis in gene clusters uncovered by LRR from the yeast dataset.
| Cluster | Major GO categories | Genes |
| C3 (14/30) | response to stress | YBL075C, YBR082C, YBR169C, YDR171W, YDR214W, YDR258C, YER103W, YFL016C, YJL034W, YJR045C, YLL024C, YLL026W, YLR259C, YMR186W |
| C16 (20/25) | transposition, RNA-mediated | YAR009C, YBL005W-A, YBR012W-A, YBR012WB, YCL019W, YCL020W, YER138C, YER160C, YHR214CB, YJR026W, YJR027W, YJR028W, YJR029W, YML039W, YML040W, YML045W, YMR045C, YMR046C, YMR050C, YMR051C |
Only two selected enriched functional categories and their annotated genes are presented. The columns of the table summarize the number of annotated genes in the cluster versus the total size of the cluster (numbers in parentheses), the GO categories associated with the cluster, and the set of annotated genes.
Singular enrichment of GO (or KEGG) categories in gene clusters uncovered by LRR from the yeast dataset.
| Cluster | NG | Corrected P-value | Annotations |
| C3 | 17 | 1.22794E-23 | protein folding (BP) |
| | 15 | 2.25951E-20 | unfolded protein binding (MF) |
| | 3 | 2.89824E-5 | TRC complex (CC) |
| | 11 | 5.73217E-14 | Protein processing in endoplasmic reticulum (KEGG) |
| C5 | 25 | 3.27373E-25 | oxidation-reduction process (BP) |
| | 27 | 1.23536E-25 | oxidoreductase activity (MF) |
| | 4 | 2.51171E-3 | mitochondrial intermembrane space (CC) |
| | 5 | 1.79973E-8 | Linoleic acid metabolism (KEGG) |
| C9 | 92 | 5.76648E-107 | ribosome biogenesis (BP) |
| | 13 | 1.13044E-15 | snoRNA binding (MF) |
| | 113 | 3.1653E-108 | nucleolus (CC) |
| | 26 | 7.46338E-20 | Ribosome biogenesis in eukaryotes (KEGG) |
| C14 | 37 | 6.24985E-41 | cellular amino acid biosynthetic process (BP) |
| | 30 | 2.12287E-10 | catalytic activity (MF) |
| | 2 | 4.07894E-3 | sulfite reductase complex (NADPH) (CC) |
| | 32 | 3.31541E-20 | Biosynthesis of secondary metabolites (KEGG) |
| C16 | 20 | 9.50421E-34 | transposition, RNA-mediated (BP) |
| | 12 | 5.27074E-20 | ribonuclease H activity (MF) |
| | 20 | 1.38697E-34 | retrotransposon nucleocapsid (CC) |
| C17 | 113 | 3.08315E-199 | cytoplasmic translation (BP) |
| | 116 | 2.51791E-169 | structural constituent of ribosome (MF) |
| | 67 | 5.19679E-102 | cytosolic large ribosomal subunit (CC) |
| | 116 | 1.16868E-198 | Ribosome (KEGG) |
| C18 | 54 | 8.94213E-11 | oxidation-reduction process (BP) |
| | 69 | 2.07793E-11 | catalytic activity (MF) |
| | 215 | 7.75913E-15 | plasma membrane enriched fraction (CC) |
| | 77 | 6.06951E-34 | Biosynthesis of secondary metabolites (KEGG) |
| C24 | 92 | 3.82873E-112 | ribosome biogenesis (BP) |
| | 13 | 1.33056E-15 | snoRNA binding (MF) |
| | 113 | 2.07758E-107 | nucleolus (CC) |
| | 26 | 9.9672E-20 | Ribosome biogenesis in eukaryotes (KEGG) |
| C26 | 31 | 7.8069E-19 | cellular cell wall organization (BP) |
| | 11 | 8.43025E-8 | cyclin-dependent protein kinase regulator activity (MF) |
| | 35 | 4.84777E-25 | fungal-type cell wall (CC) |
| | 26 | 1.1831E-10 | Cell cycle-yeast (KEGG) |
| C27 | 10 | 1.40896E-16 | ATP synthesis coupled proton transport (BP) |
| | 10 | 8.57624E-14 | proton-transporting ATPase activity, rotational mechanism (MF) |
| | 31 | 1.69019E-34 | mitochondrial inner membrane (CC) |
| | 31 | 1.42892E-50 | Oxidative phosphorylation (KEGG) |
| C30 | 50 | 4.52988E-17 | cell cycle (BP) |
| | 10 | 5.70948E-7 | cyclin-dependent protein kinase regulator activity (MF) |
| | 30 | 2.94031E-22 | fungal-type cell wall (CC) |
| | 26 | 9.06875E-12 | Cell cycle-yeast (KEGG) |
Only significantly enriched functional categories (corrected P-value < 10^-20) are presented. The columns of the table summarize the cluster, the number of annotated genes in the cluster (NG), the P-value after FDR correction, and the GO or KEGG categories associated with the cluster (BP: biological process; MF: molecular function; CC: cellular component).
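The captions refer to P-values "after FDR correction" without naming the procedure. The Benjamini-Hochberg step-up adjustment is one widely used option; the sketch below assumes that procedure purely for illustration, and the raw P-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P-values for four categories tested in one cluster.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

Each adjusted value is the smallest false discovery rate at which the corresponding category would still be called significant.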
Figure 3. Enriched combinations of significant Biological Process annotations of Cluster C17: (A) pie graph, (B) bar graph.
Figure 4. Enriched combinations of significant Molecular Function annotations of Cluster C17: (A) pie graph, (B) bar graph.
The most enriched categories from modular enrichment analysis in each gene cluster uncovered by K-means clustering from the yeast dataset.
| Cluster | No. of genes within functional category | Major GO categories | Corrected P-value |
| C1 (133 genes) | 39 | regulation of cyclin-dependent protein | 1.4052E-11 |
| C2 (259 genes) | 54 | regulation of transcription, DNA-dependent | 2.05448E-10 |
| C3 (259 genes) | 57 | DNA binding | 2.72567E-13 |
| C4 (327 genes) | 11 | Peroxisome | 4.10943E-6 |
| C5 (219 genes) | 11 | protein targeting to ER | 3.96874E-5 |
| C6 (131 genes) | 33 | regulation of transcription, DNA-dependent | 2.1956E-8 |
| C7 (216 genes) | 17 | nucleotide binding | 4.16395E-4 |
| C8 (152 genes) | 22 | protein folding | 1.68904E-15 |
| C9 (193 genes) | 5 | ATP binding | 2.8324E-4 |
| C10 (203 genes) | 18 | hydrolase activity | 1.07105E-13 |
| C11 (396 genes) | 131 | translation | 7.02517E-134 |
| C12 (171 genes) | 3 | nucleic acid binding | 3.46107E-3 |
| C13 (152 genes) | 3 | nucleosome assembly | 1.48603E-2 |
| C14 (126 genes) | 30 | mitochondrial translation | 3.81352E-44 |
| C15 (191 genes) | 12 | mRNA processing | 9.64789E-6 |
| C16 (261 genes) | 71 | membrane | 2.59589E-20 |
| C17 (131 genes) | 21 | response to stress | 2.73901E-9 |
| C18 (96 genes) | 13 | cellular amino acid biosynthetic process | 5.76184E-16 |
| C19 (321 genes) | 14 | cytoplasm | 8.42409E-13 |
| C20 (130 genes) | 13 | nucleotide binding | 2.13419E-8 |
| C21 (227 genes) | 3 | membrane | 3.85374E-2 |
| C22 (154 genes) | 79 | integral to membrane | 4.31568E-11 |
| C23 (207 genes) | 51 | transcription | 2.17224E-13 |
| C24 (435 genes) | 133 | Ribosome biogenesis | 1.04031E-103 |
| C25 (127 genes) | 59 | integral to membrane | 2.25528E-4 |
| C26 (180 genes) | 66 | integral to membrane | 4.56057E-4 |
| C27 (277 genes) | 17 | sporulation resulting in formation of a cellular spore | 4.20206E-2 |
| C28 (165 genes) | 4 | ubiquitin-dependent protein catabolic process | 2.09826E-4 |
| C29 (123 genes) | 24 | mitochondrion | 6.43371E-33 |
| C30 (223 genes) | 12 | oxidation-reduction process | 6.82702E-4 |
The columns of the table summarize the total sizes of the cluster (numbers in parentheses), the number of genes annotated in the cluster, the GO categories associated with the cluster, and the P-value after FDR correction.
The most enriched categories from modular enrichment analysis in each gene cluster uncovered by GPCA from the yeast dataset.
| Cluster | No. of genes within functional category | Major GO categories | Corrected P-value |
| C1 (271 genes) | 4 | oxidation-reduction process | 3.05689E-2 |
| C2 (194 genes) | 3 | metabolic process | 2.5142E-2 |
| C3 (214 genes) | 5 | ribosome biogenesis | 1.92418E-3 |
| C4 (234 genes) | 17 | regulation of transcription, DNA-dependent | 4.2973E-4 |
| C5 (203 genes) | 17 | transposition, RNA-mediated | 1.61358E-7 |
| C6 (207 genes) | 6 | proteolysis | 5.36602E-6 |
| C7 (194 genes) | 3 | metal ion binding | 2.00179E-5 |
| C8 (228 genes) | 5 | DNA replication | 7.49636E-5 |
| C9 (200 genes) | 7 | catalytic activity | 5.24079E-5 |
| C10 (173 genes) | 20 | cytoplasmic translation | 6.84959E-11 |
| C11 (219 genes) | 3 | ubiquitin-protein ligase activity | 2.88482E-5 |
| C12 (205 genes) | 6 | glycolysis | 1.42749E-8 |
| C13 (210 genes) | 5 | protein refolding | 4.29455E-7 |
| C14 (183 genes) | 5 | transport | 1.77438E-5 |
| C15 (224 genes) | 5 | purine base biosynthetic process | 1.73835E-7 |
| C16 (235 genes) | 4 | phosphorylation | 1.65757E-5 |
| C17 (200 genes) | 7 | ATP binding | 5.24079E-5 |
| C18 (89 genes) | 3 | DNA repair | 1.89736E-6 |
| C19 (203 genes) | 7 | structural constituent of ribosome | 5.25787E-7 |
| C20 (189 genes) | 4 | flavin adenine dinucleotide binding | 1.10535E-3 |
| C21 (215 genes) | 3 | mitotic spindle elongation | 1.06714E-4 |
| C22 (185 genes) | 3 | sequence-specific DNA binding | 1.66864E-4 |
| C23 (197 genes) | 52 | ribosome biogenesis | 1.69208E-32 |
| C24 (202 genes) | 9 | nucleosome assembly | 5.30185E-13 |
| C25 (190 genes) | 17 | rRNA processing | 7.47168E-7 |
| C26 (168 genes) | 3 | small GTPase mediated signal transduction | 1.2521E-4 |
| C27 (219 genes) | 36 | regulation of transcription, DNA-dependent | 6.05907E-9 |
| C28 (216 genes) | 8 | cellular aldehyde metabolic process | 9.75059E-11 |
| C29 (221 genes) | 8 | ergosterol biosynthetic process | 8.66254E-10 |
| C30 (205 genes) | 28 | structural constituent of ribosome | 6.60734E-14 |
The columns of the table summarize the total sizes of the cluster (numbers in parentheses), the number of genes annotated in the cluster, the GO categories associated with the cluster, and the P-value after FDR correction.
Figure 5. Two heatmaps of expression values of genes analyzed by the proposed algorithm from the yeast dataset: (A) genes in Cluster C17, which show similar expression patterns across samples; (B) genes in Cluster C14, which show different expression patterns across samples (denoted as a and b).
Comparison of statistical significance of enriched functional categories in gene clusters uncovered by LRR and K-means from the yeast dataset.
| Major GO categories | LRR | K-means |
| hydrolase activity | 2.44361E-5 (10/414) | 1.07105E-13 (18/203) |
| response to stress | 6.33965E-30 (14/30) | 2.73901E-9 (21/131) |
| cellular amino acid biosynthetic process | 2.33136E-41 (37/114) | 5.76184E-16 (13/96) |
| integral to membrane | 3.67556E-2 (151/663) | 4.56057E-4 (66/180) |
| sporulation resulting in formation of a cellular spore | 2.12806E-2 (3/83) | 4.20206E-2 (17/277) |
| mitochondrion | 2.79651E-53 (31/50) | 6.43371E-33 (24/123) |
| oxidation-reduction process | 1.04843E-25 (27/45) | 6.82702E-4 (12/223) |
| regulation of transcription, DNA-dependent | 2.0876E-3 (11/551) | 2.05448E-10 (54/259) |
Only selected significantly enriched functional categories common to both methods are presented. The columns of the table summarize the GO categories associated with the cluster, the P-values after FDR correction for each approach, and the number of genes in the cluster annotated with the corresponding GO category versus the total size of the cluster (numbers in parentheses).
Average negative logarithm of the corrected P-values on the three datasets.
| Method | (a) Yeast Dataset | (b) Yeast_Spellman Dataset | (c) Normal Human Tissue Dataset |
| K-means | 17.0343 | 10.7687 | 6.6664 |
| GPCA | 6.7035 | 5.3445 | 9.7273 |
| LRR | 27.0948 | 13.5402 | 20.1414 |
In the table, (a), (b) and (c) list the average negative logarithm of the corrected P-values obtained by the three methods on the Yeast Dataset, the Yeast_Spellman Dataset and the Normal Human Tissue Dataset, respectively.
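The summary statistic in this table can be sketched as follows. The record does not state the base of the logarithm; base 10 is assumed here. The three example inputs are the corrected P-values listed for LRR clusters C3, C5 and C6 of the yeast dataset, used only to show the calculation and not to reproduce the reported averages.

```python
import math

def mean_neg_log10(pvalues):
    """Average of -log10(p) over a list of corrected P-values (base 10 assumed)."""
    if not pvalues:
        raise ValueError("pvalues must not be empty")
    return sum(-math.log10(p) for p in pvalues) / len(pvalues)

# Corrected P-values of LRR clusters C3, C5 and C6 (yeast dataset).
example = [6.33965e-30, 1.04843e-25, 4.92582e-5]
score = mean_neg_log10(example)
```

A larger average indicates that, across clusters, the most enriched categories are more statistically significant.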
Figure 6. Two heatmaps of expression values of genes analyzed by the proposed algorithm from the yeast_Spellman dataset: (A) genes in Cluster C27, which show similar expression patterns across samples; (B) genes in Cluster C10, which show different expression patterns across samples (denoted as a and b).
Figure 7. Two heatmaps of expression values of genes analyzed by the proposed algorithm from the normal human tissue dataset: (A) genes in Cluster C18, which show similar expression patterns across samples; (B) genes in Cluster C3, which show different expression patterns across samples (denoted as a and b).