Michael Gormley, Aydin Tozeren.
Abstract
BACKGROUND: Large-scale compilation of gene expression microarray datasets across diverse biological phenotypes provided a means of gathering a priori knowledge through the identification and annotation of bimodal genes in the human and mouse genomes. These switch-like genes make up 15% of known human genes and are enriched with genes coding for extracellular and membrane proteins. It is of interest to determine the predictive potential of bimodal genes for class discovery in large-scale datasets.
Year: 2008 PMID: 19014681 PMCID: PMC2620272 DOI: 10.1186/1471-2105-9-486
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Model-based clustering of bimodal gene expression identifies cohesive clusters in 19 tissue types. Heat map representation of posterior pairwise probabilities for classification of tissue phenotype. Left column: classification with 1265 bimodal genes. Right column: classification with 300 bimodal genes encoding extracellular matrix or plasma membrane proteins. Top row: model-based clustering identifies all tissues distinctly. Middle and bottom rows: K-means and hierarchical clustering classify samples into three or four groups: brain, cardiac and skeletal muscle, and the remaining tissues. Blue, green, yellow, orange and red regions of the color bar indicate ovary, stomach, small intestine, pancreas and thymus tissue samples, respectively. Tissues in the heat map are ordered by decreasing sample size from left to right.
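The excerpt does not spell out how the pairwise posterior probabilities in the heat map are computed. A minimal sketch, assuming a fitted mixture model gives each sample a vector of posterior component-membership probabilities `z[i][k]` and that the pairwise value is the probability that two samples fall in the same component, i.e. the sum over k of `z[i][k] * z[j][k]`; the function name and input layout are hypothetical:

```python
def pairwise_coassignment(posteriors):
    """Return an n x n matrix whose (i, j) entry is sum_k z[i][k] * z[j][k].

    posteriors[i][k] is the (assumed) posterior probability that sample i
    belongs to mixture component k; each row should sum to 1. Treating the
    two samples' memberships as independent, entry (i, j) is the probability
    that samples i and j are assigned to the same component.
    """
    n = len(posteriors)
    return [
        [sum(zi * zj for zi, zj in zip(posteriors[i], posteriors[j])) for j in range(n)]
        for i in range(n)
    ]


# Three hypothetical samples, two components: samples 0 and 1 mostly share component 0.
z = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
P = pairwise_coassignment(z)
print([[round(v, 2) for v in row] for row in P])
```

Ordering the rows and columns of such a matrix by known tissue label and rendering it as a heat map gives bright diagonal blocks wherever the clustering keeps a tissue together.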
Microarray datasets used in this study
| Tissue Phenotype Data | ||
| Tissue | No. of samples | Gene Expression Omnibus/ArrayExpress accession nos. |
| Adipose | 10 | GSE3526 |
| Adrenal | 20 | GSE3526, GSE8514, GSE2316 |
| Brain | 89 | GSE3526, GSE7621, GSE7307, GSE2361, E-AFMX-11, E-TABM-20, |
| Colon | 10 | E-TABM-176, GSE8671, GSE9254, GSE9452 |
| Epidermal | 25 | GSE1133, GSE2361, GSE3419, GSE3526, GSE7307 |
| Heart | 38 | E-AFMX-11, E-MIMR-27, GSE1133, GSE2240, GSE2361, GSE3526, GSE3585, GSE7307 |
| Kidney | 10 | E-AFMX-11, GSE2004, GSE2361, GSE3526, GSE7392 |
| Liver | 10 | E-AFMX-11, GSE2004, GSE3526, GSE6764 |
| Lung | 26 | E-MEXP-231, GSE10072, GSE1133, GSE2361, GSE3526 |
| Mammary | 15 | E-TABM-66, GSE2361, GSE3526, GSE7307, GSE7904 |
| Muscle | 64 | GSE10760, GSE2328, GSE3526, GSE5110, GSE6798, GSE7307, GSE9103, |
| Ovary | 10 | GSE2361, GSE3526, GSE6008, GSE7307 |
| Pancreas | 6 | GSE1133, GSE2361, GSE7307 |
| Peripheral blood | 12 | GSE7462, GSE8608, GSE8668, GSE8762, GSE9692 |
| Small intestine | 7 | GSE2361, GSE7307 |
| Spleen | 12 | GSE2004, GSE2361, GSE3526, GSE7307 |
| Stomach | 10 | GSE2361, GSE3526, GSE7307 |
| Testis | 38 | E-AFMX-11, GSE1133, GSE2361, GSE3218, GSE3526, GSE7307, GSE7808 |
| Thymus | 5 | GSE1133, GSE2361, GSE7307 |
| Infectious Disease | ||
| Disease | No. of samples | Gene Expression Omnibus/ArrayExpress accession nos. |
| Hepatitis C | 147 | GSE11190, GSE7123 |
| HIV | 41 | GSE6740, GSE9927 |
| Influenza A | 28 | GSE6269 |
| Malaria | 15 | GSE5418 |
Adjusted Rand index comparing the observed partitions with the true classification of samples in the tissue phenotype data
| | K-means | Hierarchical | Model-based |
| All bimodal genes | 0.291 | 0.463 | 0.683 |
| ECM/MEM genes | 0.456 | 0.304 | 0.881 |
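The adjusted Rand index (ARI) scores agreement between a clustering and the known tissue labels, corrected for chance agreement. A minimal sketch of the standard Hubert and Arabie computation from two label sequences; the function name is hypothetical, and the excerpt does not say which implementation the authors used:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions given as label sequences."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two samples")
    # Contingency counts n_ij, row sums a_i and column sums b_j.
    n_ij = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)
    sum_ij = sum(comb(c, 2) for c in n_ij.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions are a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))
```

The index is invariant to how clusters are numbered, so a clustering that recovers the tissues perfectly scores 1 even if its cluster IDs differ from the tissue labels.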
Figure 2. Binarized expression of bimodal genes in brain, lung, skeletal muscle and cardiac muscle. Top: heat map of the expression of 1265 bimodal genes in 217 tissue samples; a black/white point at (i, j) indicates that gene i is "on"/"off" in sample j. Bottom: bimodal gene expression mapped onto the KEGG cell adhesion molecules diagram. Genes marked in red are "on" in brain tissue and "off" in muscle tissue; genes marked in yellow are "off" in muscle tissue.
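The "on"/"off" calls split each bimodal gene's expression into a low and a high state, but the excerpt does not state the rule used. A minimal sketch, assuming the gene is already known to be bimodal, that a two-component Gaussian mixture is fitted to its expression values by EM, and that a sample is called "on" when the higher-mean component has posterior probability above 0.5; `binarize_gene`, `n_iter` and `min_sigma` are hypothetical names:

```python
import math


def _log_normal_pdf(x, mu, sigma):
    return -math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * ((x - mu) / sigma) ** 2


def _responsibilities(values, w, mu, sigma):
    """E-step: posterior probability of each component for each value (log-sum-exp for stability)."""
    resp = []
    for x in values:
        logp = [math.log(w[k]) + _log_normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
        m = max(logp)
        p = [math.exp(lp - m) for lp in logp]
        s = sum(p)
        resp.append([pk / s for pk in p])
    return resp


def binarize_gene(values, n_iter=100, min_sigma=1e-3):
    """Fit a two-component 1-D Gaussian mixture by EM; return 1 ("on") or 0 ("off") per sample."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise one component on the lower half of the data and one on the upper half.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    sigma = [max((xs[-1] - xs[0]) / 4, min_sigma)] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        resp = _responsibilities(values, w, mu, sigma)
        # M-step: re-estimate weight, mean and standard deviation of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            w[k] = nk / len(values)
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, values)) / nk
            sigma[k] = max(math.sqrt(var), min_sigma)
    resp = _responsibilities(values, w, mu, sigma)
    high = 0 if mu[0] > mu[1] else 1
    return [1 if r[high] > 0.5 else 0 for r in resp]


print(binarize_gene([1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]))
```

The log-sum-exp step keeps the responsibilities finite when a value lies far from both components, and the `min_sigma` floor stops a component from collapsing onto a single point.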
GO categories significantly enriched with "on" genes in brain tissue
| Biological process | Cellular component | Molecular function |
| ▪ Neuron migration | ▪ Cytoskeleton | ▪ Actin binding |
| ▪ Transport | ▪ Microtubule | ▪ GTPase activity |
| ▪ Ion transport | ▪ Microtubule associated complex | ▪ Transmembrane receptor protein tyrosine |
| ▪ Negative regulation of microtubule depolymerization | ▪ Neurofilament | ▪ Structural molecule activity |
| ▪ Cell adhesion | ▪ Membrane | ▪ Structural constituent of cytoskeleton |
| ▪ Neuron adhesion | ▪ Integral to membrane | ▪ Ion channel activity |
| ▪ Transmembrane receptor protein tyrosine phosphatase signaling pathway | ▪ Synaptosome | ▪ Structural constituent of myelin sheath |
| ▪ Synaptic transmission | ▪ Cell junction | |
| ▪ Neuromuscular synaptic transmission | ▪ Axon | |
| ▪ Nervous system development | ▪ Growth cone | |
| ▪ Synaptogenesis | ▪ Synapse | |
| ▪ Central nervous system development | ▪ Postsynaptic membrane | |
| ▪ Neuron recognition | ||
| ▪ Anterograde axon cargo transport | ||
| ▪ Neuron differentiation |
P-values ≤ 0.001 indicate significance.
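The excerpt gives the P ≤ 0.001 cutoff but not the test behind it. Many GO enrichment tools use a one-sided hypergeometric test, so a minimal sketch under that assumption: with `N` background genes (for example the bimodal genes), `K` of them annotated to a GO term, and `n` genes "on" in a tissue of which `k` carry the annotation, the P-value is the probability of seeing at least `k` annotated genes by chance. All parameter names and the numbers in the usage line are hypothetical:

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N population, K annotated, n drawn)."""
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # impossible outcomes contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 20 of 100 "on" genes carry a term held by 40 of 1265 background genes.
p = hypergeom_enrichment_p(k=20, n=100, K=40, N=1265)
print(p, p <= 0.001)
```

Exact integer arithmetic in the numerator and denominator avoids the underflow that products of small probabilities would cause for large gene counts.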
Figure 3. Model-based clustering of bimodal gene expression classifies infectious disease states separately and identifies tissue specificity in hepatitis C infection. Heat map representation of pairwise posterior probabilities derived from model-based clustering of infectious disease expression data. Left column: classification of hepatitis C, HIV, influenza A and malaria profiled in peripheral blood mononuclear cells (PBMCs). Right column: classification of hepatitis C infection profiled in PBMCs and liver biopsies.
GO categories significantly enriched with "on" genes in infectious disease
| Biological process | Cellular component | Molecular function |
| ▪ Immune response1, 2, 3, 4, 5 | ▪ B cell receptor complex1, 2, 4, 5 | ▪ Antigen binding1, 2, 4, 5 |
| ▪ Humoral immune response mediated by circulating immunoglobulin1, 2, 4, 5 | ▪ Immunoglobulin complex, circulating1, 2, 4, 5 | ▪ Succinate dehydrogenase activity2,3,4 |
| ▪ Positive regulation of B cell proliferation1, 2, 4, 5 | ▪ Perinuclear region of cytoplasm1, 2, 4, 5 | ▪ RNA binding3 |
| ▪ Early endosome to late endosome transport1, 2, 4, 5 | ▪ External side of plasma membrane1,4 | ▪ Structural constituent of cytoskeleton3 |
| ▪ Positive regulation of peptidyl-tyrosine phosphorylation1, 2, 4, 5 | ▪ Membrane fraction4,5 | ▪ Protein binding3 |
| ▪ B cell receptor signaling pathway1, 2, 4, 5 | ▪ Cytoplasm3,5 | ▪ Electron-transferring-flavoprotein dehydrogenase activity5 |
| ▪ Activation of MAPK activity1, 2, 4 | ▪ Cytoskeleton3 | |
| ▪ tRNA aminoacylation for protein translation1,4 | ▪ Actin cytoskeleton3 | ▪ Endopeptidase inhibitor activity5 |
| ▪ Antigen processing and Presentation1,4 | ▪ Extracellular region5 | ▪ Structural molecule activity5 |
| ▪ DNA methylation3 | ▪ Proteinaceous extracellular matrix5 | ▪ Extracellular matrix structural constituent5 |
| ▪ Translational initiation3 | ▪ Collagen5 | |
| ▪ Negative regulation of protein kinase activity3 | ||
| ▪ Defense response3 | ||
| ▪ Inflammatory response4 | ||
| ▪ Hemocyte development4 | ||
| ▪ Cell-cell adhesion4 | ||
| ▪ Pyridine nucleotide biosynthetic process4 | ||
| ▪ Respiratory burst4 | ||
| ▪ Response to calcium ion3,4 | ||
| ▪ Tricarboxylic acid cycle5 | ||
| ▪ Cell adhesion5 | ||
| ▪ Blood coagulation5 | ||
| ▪ Sensory perception of sound3,5 |
P-values ≤ 0.001 indicate significance in malaria, influenza A, hepatitis C (PBMCs) and hepatitis C (liver); P-values ≤ 0.01 indicate significance in HIV. Superscripts denote the dataset: 1, malaria; 2, influenza A; 3, HIV; 4, hepatitis C (PBMCs); 5, hepatitis C (liver).
Figure 4. Bimodal genes switched "on" as a result of HIV infection in the KEGG T-cell receptor signaling pathway. Bimodal genes marked in red are "on" in the KEGG T-cell receptor signaling pathway in HIV infection.
Figure 5. Effect of sample size, class separation and number of informative genes on classification of simulated expression data. Classification accuracy is measured as the area under the receiver operating characteristic (ROC) curve, which plots sensitivity against 1 − specificity, as shown. Expression data were simulated while controlling the separation between classes, the number of samples and the number of genes related to the class distinction.
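The area under the ROC curve has a convenient equivalent form: it is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney statistic). A minimal sketch of that formulation; the function name and the 0/1 label encoding are assumptions, and the excerpt does not say how the authors computed the area:

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))
```

The pairwise loop is quadratic in sample size, which is fine for datasets of a few hundred samples; a rank-based version scales better for larger simulations.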
Figure 6. Classification accuracy in supervised clustering of tissue phenotypes. Values are the proportions of samples from each true class assigned to each predicted class over 100 iterations of training and testing. Values representing correct classification are outlined in bold.
Figure 7. Classification accuracy in supervised clustering of infectious disease. Values are the proportions of samples from each true class assigned to each predicted class over 100 iterations of training and testing.
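Tables of this kind are row-normalized confusion matrices accumulated over repeated train/test splits. A minimal sketch, assuming predictions are pooled across iterations before normalizing each true-class row (averaging per-iteration proportions would give slightly different numbers when test-set class sizes vary); the function name and input layout are hypothetical:

```python
def classification_proportions(runs, classes):
    """runs: list of (true_labels, predicted_labels) pairs, one per train/test iteration.

    Returns table[t][p] = fraction of test samples of true class t predicted as p,
    pooled over all iterations. Every label must appear in `classes`.
    """
    counts = {t: {p: 0 for p in classes} for t in classes}
    for true_labels, pred_labels in runs:
        for t, p in zip(true_labels, pred_labels):
            counts[t][p] += 1
    table = {}
    for t in classes:
        row_total = sum(counts[t].values())
        table[t] = {p: (counts[t][p] / row_total if row_total else 0.0) for p in classes}
    return table


# Two hypothetical iterations; in the second, the liver sample is misclassified as brain.
runs = [(["brain", "liver"], ["brain", "liver"]), (["brain", "liver"], ["brain", "brain"])]
print(classification_proportions(runs, ["brain", "liver"]))
```

The diagonal entries are the per-class correct-classification rates that the bold outlines highlight.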
Contingency table comparing two partitions
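| | V_1 | V_2 | ⋯ | V_C | Sums |
| U_1 | n_11 | n_12 | ⋯ | n_1C | a_1 |
| U_2 | n_21 | n_22 | ⋯ | n_2C | a_2 |
| ⋮ | ⋮ | ⋮ | ⋱ | ⋮ | ⋮ |
| U_R | n_R1 | n_R2 | ⋯ | n_RC | a_R |
| Sums | b_1 | b_2 | ⋯ | b_C | n |
U = {U_1, …, U_R} and V = {V_1, …, V_C} are the two partitions of the n samples; n_ij is the number of samples in both U_i and V_j, and a_i and b_j are the row and column sums.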
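The adjusted Rand index is computed from such a contingency table. With n_ij the number of samples shared by class i of one partition and class j of the other, a_i and b_j the row and column sums, and n the total number of samples, the standard form of Hubert and Arabie is given below; the excerpt does not reproduce the paper's own formula, so this is the textbook definition rather than a quotation:

$$
\mathrm{ARI} = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\right]\Big/\binom{n}{2}}
{\frac{1}{2}\left[\sum_{i}\binom{a_i}{2} + \sum_{j}\binom{b_j}{2}\right] - \left[\sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\right]\Big/\binom{n}{2}}
$$

The index equals 1 for identical partitions (up to relabeling) and has expected value 0 under random labeling, so it can be compared across clustering methods that produce different numbers of clusters.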