Huanping Zhang, Xiaofeng Song, Huinan Wang, Xiaobai Zhang.
Abstract
Computational analysis of microarray data provides an effective way to identify disease-related genes. Traditional methods for selecting disease genes from microarray data, such as statistical tests, prioritize genes individually and therefore focus on genes that are differentially expressed between sample classes. Because they ignore interactions between genes, these methods can miss differentially coexpressed (DEC) gene subsets. In this paper, the MIClique algorithm is proposed to identify DEC gene subsets based on mutual information and clique analysis. Mutual information is used to measure the coexpression relationship between each pair of genes in two different kinds of samples. Clique analysis is commonly used on biological networks, where a clique generally represents a biological module of similar function. Applying the MIClique algorithm to real gene expression data (a colon dataset and a leukemia dataset) detects DEC gene subsets that are correlated under one experimental condition but uncorrelated under the other.
Year: 2010 PMID: 20169000 PMCID: PMC2822236 DOI: 10.1155/2009/642524
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Figure 1. Illustration of a differentially coexpressed (DEC) disease gene subset between normal samples and disease samples. The left 20 samples are normal samples and the right 20 samples are disease samples.
Concepts of entropy and MI defined by Shannon's theory of information.
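| Concepts of Shannon's theory of information | Descriptions |
|---|---|
| The uncertainty of a random variable $X$ is measured by its entropy $H(X)$ | $H(X) = -\sum_{x} p(x) \log p(x)$ |
| The uncertainty of a random variable $X$ given knowledge of $Y$ is measured by the conditional entropy $H(X \mid Y)$ | $H(X \mid Y) = -\sum_{x,y} p(x,y) \log p(x \mid y)$ |
| The uncertainty of a pair of random variables $X, Y$ is measured by the joint entropy $H(X,Y)$ | $H(X,Y) = -\sum_{x,y} p(x,y) \log p(x,y)$ |
| Given two random variables $X$ and $Y$, the reduction in the uncertainty of $X$ due to knowledge of $Y$ is the mutual information $\mathrm{MI}(X,Y)$ | $\mathrm{MI}(X,Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X,Y)$ |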
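The entropy and MI concepts in the table above can be illustrated with a small estimator. The sketch below is only one possible approach: it assumes expression values are discretized into equal-width bins, and the function names, the bin count, and the toy profiles are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter

def discretize(values, n_bins=3):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(x, y, n_bins=3):
    """Estimate MI(X, Y) = H(X) + H(Y) - H(X, Y) from binned values."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))

# Toy expression profiles over six samples (illustrative, not real data).
gene_a = [0.1, 0.2, 0.9, 1.0, 0.15, 0.95]
gene_b = [1.1, 1.2, 1.9, 2.0, 1.15, 1.95]  # follows gene_a closely
gene_c = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]    # constant, carries no information

mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)
print(mi_ab, mi_ac)
```

A pair of genes that move together gets a high MI, while a pair in which one gene is uninformative gets an MI of zero. The estimate depends on the binning, so the bin count would need tuning on real data.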
Figure 2. Gene networks for simulated gene data with different thresholds. (a) T1 = 2.2; T2 = 0.8; (b) T1 = 2.0; T2 = 1.0; (c) T1 = 1.8; T2 = 1.2.
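To make the role of the two thresholds concrete, the sketch below builds an edge list from a pair of MI matrices. The edge rule is an assumption for illustration: a gene pair is connected when its MI is at least T1 in one kind of sample and at most T2 in the other. The function name and the toy matrices are hypothetical and are not the simulated data behind the figure.

```python
def dce_edges(mi_a, mi_b, t1, t2):
    """Return pairs (i, j), i < j, with mi_a[i][j] >= t1 and mi_b[i][j] <= t2.

    Assumed rule: strongly coexpressed in condition A, weakly in condition B.
    """
    n = len(mi_a)
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if mi_a[i][j] >= t1 and mi_b[i][j] <= t2]

# Toy symmetric MI matrices for four hypothetical genes.
mi_normal = [
    [0.0, 2.3, 2.1, 0.5],
    [2.3, 0.0, 1.9, 0.4],
    [2.1, 1.9, 0.0, 0.3],
    [0.5, 0.4, 0.3, 0.0],
]
mi_disease = [
    [0.0, 0.7, 0.9, 0.5],
    [0.7, 0.0, 1.1, 0.4],
    [0.9, 1.1, 0.0, 0.3],
    [0.5, 0.4, 0.3, 0.0],
]

# The three threshold pairs are the ones quoted in the caption of Figure 2.
for t1, t2 in [(2.2, 0.8), (2.0, 1.0), (1.8, 1.2)]:
    print(t1, t2, dce_edges(mi_normal, mi_disease, t1, t2))
```

On this toy input, narrowing the gap between T1 and T2 admits more edges, which is the general effect of loosening the thresholds.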
Figure 3. Different threshold values lead to different results for the colon dataset. (a) Percentage of isolated vertices. (b) Density of the graph (number of edges divided by the maximum possible number of edges, which is $C_{2000}^{2}$).
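The two quantities plotted in Figure 3 follow directly from an edge list. The sketch below is a minimal illustration; the function name and the five-vertex example graph are hypothetical.

```python
from math import comb

def graph_stats(n_vertices, edges):
    """Return (percentage of isolated vertices, density) for an undirected graph."""
    unique_edges = {frozenset(edge) for edge in edges}
    touched = {v for edge in unique_edges for v in edge}
    isolated_pct = 100.0 * (n_vertices - len(touched)) / n_vertices
    density = len(unique_edges) / comb(n_vertices, 2)
    return isolated_pct, density

# Maximum possible number of edges among 2000 genes.
max_edges = comb(2000, 2)
print(max_edges)

# Small hypothetical graph: 5 vertices, 2 edges, vertices 3 and 4 isolated.
stats = graph_stats(5, [(0, 1), (1, 2)])
print(stats)
```

With 2000 vertices the denominator of the density is 1,999,000, so even a graph with thousands of edges remains very sparse.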
Gene accession numbers in each clique identified by MIClique from the colon dataset.
| Clique number | Genes in each clique |
|---|---|
| 1 | M63391 H64489 R87126 X74295 |
| 2 | H64489 R87126 T92451 X74295 |
| 3 | H64489 R87126 X74295 J02854 |
| 4 | R87126 X74295 X86693 U19969 |
| 5 | R87126 X74295 J02854 U19969 |
| 6 | M63391 R87126 X74295 U19969 |
Figure 4. The cohesive subgroup identified from the colon dataset: an overlapping clique group with six cliques and eight genes.
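The structure of this overlapping clique group can be checked with a standard maximal-clique enumeration. The sketch below uses the basic Bron-Kerbosch algorithm, which is a swapped-in textbook method and not necessarily the clique procedure used by MIClique. It also assumes that the graph among these genes contains exactly the edges implied by the six cliques listed in the table above.

```python
from itertools import combinations

# The six cliques as listed in the table (gene accession numbers).
reported_cliques = [
    {"M63391", "H64489", "R87126", "X74295"},
    {"H64489", "R87126", "T92451", "X74295"},
    {"H64489", "R87126", "X74295", "J02854"},
    {"R87126", "X74295", "X86693", "U19969"},
    {"R87126", "X74295", "J02854", "U19969"},
    {"M63391", "R87126", "X74295", "U19969"},
]

# Assumption: every pair of genes inside a reported clique is an edge,
# and there are no other edges among these genes.
adjacency = {}
for clique in reported_cliques:
    for u, v in combinations(clique, 2):
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

def bron_kerbosch(r, p, x, adj, out):
    """Collect all maximal cliques (basic Bron-Kerbosch without pivoting)."""
    if not p and not x:
        out.append(set(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

maximal = []
bron_kerbosch(set(), set(adjacency), set(), adjacency, maximal)

genes = set().union(*maximal)          # all genes in the cohesive subgroup
shared = set.intersection(*maximal)    # genes present in every clique
print(len(maximal), len(genes), sorted(shared))
# prints: 6 8 ['R87126', 'X74295']
```

Under these assumptions the enumeration recovers the six listed cliques over eight genes, and two genes (R87126 and X74295) belong to every clique.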
Figure 5. Images of the MI matrices for the eight genes in the colon dataset. (a) Normal samples. (b) Disease samples.
Eight differentially coexpressed genes in the cohesive subgroup identified from the colon dataset.
| Accession number | Gene symbol | UniProtKB ID | Gene descriptions |
|---|---|---|---|
| M63391 | DESMIN (DES) | P17661 | Human desmin gene, complete cds |
| H64489 | CD37 | P11049 | Leukocyte antigen CD37 (Homo sapiens) |
| R87126 | MYH9_HUMAN | P14105 | Myosin heavy chain, nonmuscle (Gallus gallus) |
| T92451 | TPM2 | P07951 | Tropomyosin, Fibroblast and epithelial muscle-type (Human) |
| X74295 | ITGA7 | Q13683 | H.sapiens mRNA for alpha 7B integrin |
| J02854 | MYL2 | P10916 | Myosin regulatory light chain 2, smooth muscle isoform (Human) |
| X86693 | SPARCL1(Hevin) | Q14515 | H.sapiens mRNA for hevin like protein |
| U19969 | ZEB1(ZEB) | Q13088 | Human two-handed zinc finger protein ZEB mRNA |
Figure 6. Differentially coexpressed profiles of the eight genes in two kinds of samples; samples 1–22 represent normal samples and samples 23–62 are disease samples.
GO annotations of the eight DEC genes identified from the colon dataset by MIClique.
| Gene symbol | Ontology | GO terms |
|---|---|---|
| DESMIN | Biological process | Cytoskeleton organization; muscle contraction; regulation of heart contraction |
| | Cellular component | Z disc |
| | Molecular function | Protein binding |
| CD37 | Biological process | Protein amino acid N-linked glycosylation |
| | Cellular component | Plasma membrane |
| MYH9 | Biological process | Actin cytoskeleton reorganization; actin filament-based movement; angiogenesis; blood vessel endothelial cell migration; cytokinesis; membrane protein ectodomain proteolysis; monocyte differentiation; platelet formation; protein transport; regulation of cell shape |
| | Cellular component | Cleavage furrow; contractile ring; cytoplasm; cytosol; integrin complex; nucleus; plasma membrane; ruffle; stress fiber |
| | Molecular function | Actin filament binding; ATPase activity; microfilament motor activity; protein anchor; protein homodimerization activity |
| TPM2 | Biological process | Regulation of ATPase activity |
| | Cellular component | Muscle thin filament tropomyosin |
| | Molecular function | Actin binding; structural constituent of muscle |
| ITGA7 | Biological process | Cell-matrix adhesion; muscle organ development; integrin-mediated signaling pathway |
| | Molecular function | Calcium ion binding; protein binding; receptor activity |
| MYL2 | Biological process | Cardiac myofibril assembly; heart contraction; negative regulation of cell growth; regulation of striated muscle contraction; ventricular cardiac muscle morphogenesis |
| | Cellular component | Sarcomere |
| | Molecular function | Actin monomer binding; calcium ion binding; myosin heavy chain binding; protein binding; structural constituent of muscle |
| SPARCL1 | Biological process | Signal transduction |
| | Molecular function | Calcium ion binding |
| ZEB1 | Biological process | Cell proliferation; immune response; negative regulation of transcription from RNA polymerase II promoter; regulation of transcription, DNA-dependent |
| | Molecular function | Transcription coactivator activity; transcription corepressor activity; transcription factor activity; zinc ion binding |
Figure 7. Images of the Euclidean distance matrices for the eight genes from the colon dataset. (a) Normal samples. (b) Disease samples.
Figure 8. Images of the Pearson correlation coefficient matrices for the eight genes from the colon dataset. (a) Normal samples. (b) Disease samples.
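For readers who want to reproduce matrices of the kind shown in Figures 7 and 8, the sketch below computes pairwise Euclidean distance and Pearson correlation matrices for a set of expression profiles. The helper names and the toy profiles are hypothetical. In practice the same functions would be applied once to the normal samples and once to the disease samples.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pairwise(profiles, measure):
    """Square matrix of measure(p, q) over all pairs of profiles."""
    return [[measure(p, q) for q in profiles] for p in profiles]

# Toy profiles for three hypothetical genes over four samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the first gene
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with the first gene
]

corr_matrix = pairwise(profiles, pearson)
dist_matrix = pairwise(profiles, euclidean)
print(corr_matrix[0][1], corr_matrix[0][2], dist_matrix[0][1])
```

The first two toy genes are perfectly correlated yet far apart in Euclidean distance, which shows that the two measures capture different notions of similarity.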
Differentially coexpressed genes correlated in ALL but not in AML in the leukemia dataset.
| Accession numbers | Gene symbols | UniProt | Gene descriptions |
|---|---|---|---|
| HG4074-HT4344 | FEN1(RAD2) | P39748 | Rad2 |
| L41870 | RB1 | P06400 | Retinoblastoma 1 (including osteosarcoma) |
| U18062 | TAF7(TAFII55) | Q15545 | Human TFIID subunit TAFII55 mRNA |
| M92287 | CCND3 | P30281 | Cyclin D3 |
| U28833 | RCAN1(DSCR1) | Q9UF15 | Down syndrome critical region protein (DSCR1) mRNA |
| X56468 | YWHAQ | P27348 | 14-3-3 protein tau |
| X84373 | NRIP1(RIP140) | P48552 | Nuclear factor RIP140 |
| Z23064 | RBMX | P38159 | Heterogeneous nuclear ribonucleoprotein G |