Md Altaf-Ul-Amin, Tetsuo Katsuragi, Tetsuo Sato, Naoaki Ono, Shigehiko Kanaya.
Abstract
This work presents a novel approach to predicting functional relations between genes from gene expression data. Genes may be related in various ways: for example, through regulatory relations, or through involvement in the same protein complex or the same metabolic or signaling pathway, and gene expression data should contain clues to such relations. The approach first digitizes the log-ratio gene expression data of S. cerevisiae into a matrix of 1, 0, and −1, indicating highly expressed, no major change, and highly suppressed conditions for genes, respectively. For each gene pair, a probability mass function table of nine joint probabilities is constructed. Gene pairs are then selected based on the linear and probabilistic relation between their profiles, indicated by the sum of the probability masses at selected points. The selected gene pairs share many Gene Ontology terms. Furthermore, a network is constructed by selecting a large number of gene pairs based on FDR analysis, and clustering this network generates many modules rich in genes of similar function. In addition, the promoters of the gene sets in many modules are rich in binding sites of known transcription factors, indicating the effectiveness of the proposed approach in predicting regulatory relations.
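The digitization step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the cutoff used, so the `threshold` parameter and the function name `digitize_profile` are assumptions made only for illustration.

```python
def digitize_profile(log_ratios, threshold=1.0):
    """Map log-ratio expression values to 1, 0, or -1.

    `threshold` is a hypothetical cutoff; the abstract does not give one.
    """
    digitized = []
    for value in log_ratios:
        if value >= threshold:
            digitized.append(1)    # highly expressed
        elif value <= -threshold:
            digitized.append(-1)   # highly suppressed
        else:
            digitized.append(0)    # no major change
    return digitized


# Made-up log-ratio values for one gene across five conditions.
profile = [2.3, 0.1, -1.7, 0.9, -0.2]
print(digitize_profile(profile))  # [1, 0, -1, 0, 0]
```

Applying this function to every row of a log-ratio matrix would give a digitized matrix of the kind the abstract refers to.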
Year: 2014 PMID: 24800208 PMCID: PMC3988973 DOI: 10.1155/2014/154594
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: Distribution of the genes with respect to the count of 1 in their profiles in the context of the digitized matrix.
Nine joint probabilities calculated for each gene pair.
| | 1 | 0 | −1 |
|---|---|---|---|
| 1 | | | |
| 0 | | | |
| −1 | | | |
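As a rough illustration of how the nine joint probabilities for a gene pair might be estimated, the sketch below counts how often each combination of digitized values occurs across conditions. The function names and the example profiles are hypothetical. The cells summed at the end (the diagonal) are only an illustrative choice, since the text here does not say which points enter the paper's score.

```python
from collections import Counter

LEVELS = (1, 0, -1)


def joint_probabilities(profile_a, profile_b):
    """Return {(a, b): probability} for all nine pairs of digitized values."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    counts = Counter(zip(profile_a, profile_b))
    n = len(profile_a)
    return {(a, b): counts[(a, b)] / n for a in LEVELS for b in LEVELS}


def mass_at(table, points):
    """Sum the probability mass at the chosen cells of the table."""
    return sum(table[p] for p in points)


# Made-up digitized profiles for two genes over eight conditions.
gene_x = [1, 1, 0, -1, 0, 1, -1, 0]
gene_y = [1, 1, 0, -1, 1, 1, 0, 0]

table = joint_probabilities(gene_x, gene_y)
print(table[(1, 1)])  # 0.375

# Assumption for illustration only: sum the three diagonal cells.
diagonal = [(1, 1), (0, 0), (-1, -1)]
print(mass_at(table, diagonal))  # 0.75
```

The nine values always sum to 1, so a score built from a subset of cells is bounded between 0 and 1.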
Figure 2: Distribution of gene pairs in the context of (a) P(1,1) and (b) LPRpos.
Figure 3: (a) The x-axis is the percentage of gene pairs from the distribution of Figure 2(b) selected based on higher LPRpos values, and the y-axis is the percentage of selected gene pairs that share at least 1, 2, or 3 GO terms. Empty markers correspond to gene pairs selected by the proposed method and filled markers correspond to an equal number of randomly selected gene pairs. (b) Actual number of GO terms shared by selected and random gene pairs corresponding to the 1% point of (a).
Figure 4: (a) Distribution of the gene pairs with respect to the χ-square P-values. (b) Plot of FDR with respect to cutoff P-values.
Figure 5: Distribution of the modules with respect to −log(P-value). P-values are determined in the context of all three types of GO terms: (a) biological process (BP), (b) molecular function (MF), and (c) cellular component (CC). The lower part of each graph is enlarged in the insets.
Richness of similar function genes in selected clusters. For each cluster, hypergeometric P-values, corresponding GO terms, and also the actual number of genes of a particular function are indicated.
| CID | Total number of genes | P-value | Some relevant GO terms (corresponding number of genes) |
|---|---|---|---|
| 4 | 97 | 1.20 | Cytosolic ribosome (94), structural constituent of ribosome (94), cytoplasmic translation (93), ribosome (96) |
| 16 | 76 | 6.42 | Ribosomal subunit (37), structural molecule activity (38) |
| 19 | 73 | 3.29 | Ribonucleoprotein complex (47), intracellular part (73) |
| 226 | 8 | 1.50 | Nuclear nucleosome (8), DNA bending complex (8) |
| 1 | 113 | 1.42 | Cellular metabolic process (104), intracellular part (109) |
| 44 | 34 | 2.89 | Cytosolic part (21), cytoplasm (34) |
| 35 | 44 | 3.35 | Gene expression (41), primary metabolic process (43) |
| 85 | 17 | 4.76 | Mitochondrial part (14), mitochondrion (16) |
| 155 | 11 | 6.28 | Protein folding (9), protein binding (11), cellular protein metabolic process (10) |
| 278 | 7 | 3.00 | Proteasome complex (7), proteasome storage granule (5) |
| 87 | 16 | 5.26 | Nucleolus (12), non-membrane-bounded organelle (14) |
| 107 | 14 | 1.97 | Mitochondrion organization (12), cellular component organization (13) |
| 121 | 13 | 5.32 | Glycolysis (7), generation of precursor metabolites and energy (9) |
| 442 | 5 | 1.55 | Mitochondrial respiratory chain (5), oxidoreductase complex (5) |
| 173 | 10 | 1.56 | Protein folding (7), unfolded protein binding (5), protein binding (8) |
| 282 | 7 | 5.58 | Modification-dependent protein catabolic process (7), proteasomal ubiquitin-independent protein catabolic process (5) |
| 71 | 15 | 5.90 | Ribosome (13), ribonucleoprotein complex (14) |
| 725 | 3 | 1.61 | Acid phosphatase activity (2) |
| 214 | 9 | 2.88 | Hydrogen ion transmembrane transporter activity (5), single-organism metabolic process (7) |
| 736 | 3 | 4.03 | Asparaginase activity (3) |
| 1092 | 3 | 2.26 | Heme-copper terminal oxidase activity (3) |
| 270 | 7 | 2.32 | Ion transmembrane transporter activity (6) |
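A hypergeometric enrichment P-value of the kind reported for each cluster can be sketched as the probability of seeing at least the observed number of annotated genes when a cluster of that size is drawn at random from the background. The function name, the parameter names, and the numbers in the example are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, cluster_size, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    population   -- number of genes in the background set
    annotated    -- background genes carrying the GO term
    cluster_size -- genes in the cluster (drawn without replacement)
    hits         -- cluster genes carrying the GO term
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Made-up example: 20 background genes, 5 annotated, cluster of 4 with 3 hits.
p = hypergeom_enrichment_pvalue(population=20, annotated=5, cluster_size=4, hits=3)
print(p)  # equals 155/4845, roughly 0.032
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the intermediate terms, which matters when clusters are large and the resulting P-values are very small.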
Richness of binding sites in the promoters of the module genes corresponding to 10 different transcription factors.
| CID | Size | TF | Number of promoters (PRIMA) | | Known regulatory relations (YEASTRACT) |
|---|---|---|---|---|---|
| 3 | 98 | YP00066 [SFP1] | 58 | 2.82 | 98 |
| 5 | 95 | M00213 [RAP1] | 55 | 3.82 | 93 |
| 72 | 18 | YP00036 [MBP1] | 10 | 4.40 | 12 |
| 155 | 11 | M00169 [HSF] | 7 | 2.38 | 11 |
| 230 | 8 | YP00068 [SIP4] | 5 | 7.89 | 4 |
| 227 | 8 | YP00064 [RPN4] | 8 | 1.01 | 8 |
| 725 | 3 | M00064 [PHO4] | 3 | 1.08 | 3 |
| 259 | 7 | YP00076 [STB1] | 5 | 8.97 | 2 |
| 736 | 3 | YP00013 [DAL82] | 3 | 3.65 | 0 |
| 233 | 8 | YP00043 [MSN4] | 8 | 1.03 | 7 |