Markey C McNutt, Ron Tongbai, Wenwu Cui, Irene Collins, Wendy J Freebern, Idalia Montano, Cynthia M Haggerty, Gvr Chandramouli, Kevin Gardner.
Abstract
BACKGROUND: The purpose of this study is to determine whether there is a nonrandom grouping of cis-regulatory elements within gene promoters that can be perceived independently of gene expression data, and whether this grouping correlates with the biological function of the gene.
Year: 2005 PMID: 16232321 PMCID: PMC1274301 DOI: 10.1186/1471-2105-6-259
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Cluster analysis of gene expression data set from mitogen-stimulated T-cells compared to promoter TFBS composition. (a) K-means cluster analysis of cDNA expression profiles of phorbol ester and ionomycin stimulated Jurkat T-cells collected at 0, 1, 2, 6, 12, and 24 hours after stimulation [11]. The total number of genes in each cluster is indicated in parentheses. (b) Centroid plot representing average kinetic profiles of the four clusters at the six measured time intervals. (c) Principal component analysis (PCA) of TFBS frequencies in the genomic sequences extracted from the 7,298 genes profiled in Figure 1a (1200 base pairs upstream and 200 base pairs downstream from the start of transcription). Prior to analysis, each gene was color-coded by its respective cluster shown in Figure 1a (red = cluster/group 1, no change; green = cluster/group 2, early elevated expression; blue = cluster/group 3, repressed expression; and yellow = cluster/group 4, late elevated expression). (d) The extracted promoter sequences of each gene were then compared with respect to TFBS composition alone by K-means clustering. Nine out of sixteen clusters contained more than 4 genes (indicated as groups 1, 4, 6, 9, 10, 12, 13, 15, and 16).
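
The grouping of promoters by TFBS composition in panel (d) can be illustrated with a minimal K-means sketch. This is not the authors' implementation: the count vectors, the helper names (`sq_dist`, `init_centroids`, `kmeans`) and the deterministic farthest-point seeding are all assumptions made for illustration, and the caption does not state which software or initialization was used.

```python
# Hypothetical TFBS count vectors: one row per promoter, one column per site.
# The counts are invented for illustration only.
tfbs_counts = [
    [5, 0, 1],
    [6, 1, 0],
    [5, 1, 1],
    [0, 6, 0],
    [1, 5, 0],
    [0, 5, 1],
]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    """Deterministic farthest-point seeding (an assumption, not the paper's)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each promoter joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, centroids = kmeans(tfbs_counts, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

With real data each row would hold the frequencies of all scored TFBSs for one promoter, and `k` would be set to the number of clusters requested (sixteen in panel d).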
Distribution of ontology terms within gene clusters. The gene clusters identified in Figure 1d were analyzed for asymmetric distribution of ontology terms using the GoMiner software [16]. The top 40 gene ontology terms for each cluster, ranked by significance (Fisher's exact test), are shown. Total numbers of genes in each cluster are indicated in parentheses. Statistical ranking of asymmetrically distributed gene ontology terms is represented by an estimated p-value (Fisher's exact test).
| p-value | Gene ontology term |
| --- | --- |
| 0.0003 | DNA dependent DNA replication |
| 0.0003 | mitotic cell cycle |
| 0.0008 | DNA replication |
| 0.001 | structural constituent of cytoskeleton |
| 0.0014 | metabolism |
| 0.0015 | proteolysis and peptidolysis |
| 0.0016 | cell cycle |
| 0.0016 | hydrolase activity |
| 0.0016 | S phase of mitotic cell cycle |
| 0.0021 | protein metabolism |
| 0.0024 | protein catabolism |
| 0.0028 | DNA replication and chromosome cycle |
| 0.0029 | small ribosomal subunit |
| 0.0031 | intracellular |
| 0.0031 | extracellular |
| 0.004 | DNA replication factor C complex |
| 0.0059 | nucleic acid binding activity |
| 0.0059 | ATP dependent helicase activity |
| 0.006 | transmembrane receptor protein phosphatase activity |
| 0.006 | transmembrane receptor protein tyrosine phosphatase activity |
| 0.0061 | cell proliferation |
| 0.0063 | mitochondrial inner membrane |
| 0.0065 | extracellular space |
| 0.0065 | macromolecule catabolism |
| 0.0071 | protein phosphatase activity |
| 0.0073 | nucleobase, nucleoside, nucleotide and nucleic acid metabolism |
| 0.0074 | replication fork |
| 0.0078 | protein amino acid dephosphorylation |
| 0.0078 | dephosphorylation |
| 0.0078 | protein-ligand dependent protein catabolism |
| 0.0081 | mitochondrial ribosome |
| 0.009 | inner membrane |
| 0.0092 | mitochondrion |
| 0.0095 | cellular_component unknown |
| 0.0111 | helicase activity |
| 0.0113 | organellar ribosome |
| 0.0123 | N-linked glycosylation |
| 0.0123 | di-, tri-valent inorganic cation homeostasis |
| 0.014 | proton-transporting ATP synthase complex |
| 0.014 | spindle |
| 0.0003 | mitochondrion |
| 0.0005 | metabolism |
| 0.0008 | intracellular |
| 0.0018 | biosynthesis |
| 0.0022 | complement activation, alternative pathway |
| 0.003 | complement activation |
| 0.0044 | complement activity |
| 0.0047 | sugar binding activity |
| 0.0047 | carbohydrate binding activity |
| 0.006 | humoral defense mechanism (sensu Vertebrata) |
| 0.0067 | plasma membrane |
| 0.007 | cell adhesion molecule activity |
| 0.0071 | 1-phosphatidylinositol 3-kinase complex |
| 0.0071 | membrane attack complex |
| 0.0071 | hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances |
| 0.0071 | phosphatidylinositol 3-kinase activity |
| 0.0079 | ATP-binding cassette (ABC) transporter activity |
| 0.0098 | cell adhesion |
| 0.0099 | chemotaxis |
| 0.0099 | taxis |
| 0.0125 | cell-cell adhesion |
| 0.013 | mitochondrial membrane |
| 0.0151 | lectin |
| 0.0156 | G-protein coupled receptor protein signaling pathway |
| 0.0176 | cellular_component unknown |
| 0.0187 | P-P-bond-hydrolysis-driven transporter activity |
| 0.02 | thyroid hormone generation |
| 0.02 | lipid raft |
| 0.02 | ethanol oxidation |
| 0.02 | ethanol metabolism |
| 0.02 | flowering |
| 0.02 | thyroid hormone metabolism |
| 0.02 | aldo-keto reductase activity |
| 0.02 | alcohol dehydrogenase activity, iron-dependent |
| 0.02 | alcohol dehydrogenase activity, metal ion-independent |
| 0.02 | T-cell differentiation |
| 0.02 | negative regulation of Wnt receptor signaling pathway |
| 0.02 | fluid secretion |
| 0.022 | homophilic cell adhesion |
| 0.0266 | humoral immune response |
| 0.0002 | cytoplasm |
| 0.001 | transcription |
| 0.0012 | regulation of transcription, DNA-dependent |
| 0.0013 | regulation of transcription |
| 0.0015 | transcription, DNA-dependent |
| 0.0029 | immune response |
| 0.0029 | nucleus |
| 0.0034 | transferase activity, transferring sulfur-containing groups |
| 0.0034 | solute:sodium symporter activity |
| 0.005 | defense response |
| 0.0051 | phenol metabolism |
| 0.0051 | catecholamine metabolism |
| 0.0051 | organic acid transporter activity |
| 0.0053 | cell communication |
| 0.0055 | response to biotic stimulus |
| 0.0059 | protein modification |
| 0.0063 | protein kinase CK2 activity |
| 0.0069 | solute:cation symporter activity |
| 0.0071 | response to external stimulus |
| 0.0084 | negative regulation of transcription |
| 0.0093 | biogenic amine metabolism |
| 0.0093 | adherens junction |
| 0.0096 | cAMP-dependent protein kinase activity |
| 0.0096 | cyclic-nucleotide dependent protein kinase activity |
| 0.0096 | casein kinase activity |
| 0.0097 | transcription from Pol II promoter |
| 0.0099 | secretin-like receptor activity |
| 0.0099 | neurotransmitter:sodium symporter activity |
| 0.0099 | neurotransmitter transporter activity |
| 0.0099 | biogenic amine biosynthesis |
| 0.0103 | protein amino acid phosphorylation |
| 0.0106 | G-protein coupled receptor activity |
| 0.0112 | neurogenesis |
| 0.0119 | transmembrane receptor protein serine/threonine kinase signaling pathway |
| 0.0128 | phosphorylation |
| 0.0139 | small GTPase mediated signal transduction |
| 0.0141 | protein kinase activity |
| 0.0151 | brain development |
| 0.016 | frizzled receptor signaling pathway |
| 0.016 | frizzled receptor activity |
| <.0001 | nucleobase, nucleoside, nucleotide and nucleic acid metabolism |
| <.0001 | nucleus |
| <.0001 | intracellular |
| <.0001 | extracellular space |
| <.0001 | extracellular |
| <.0001 | RNA binding activity |
| <.0001 | nucleic acid binding activity |
| 0.0001 | plasma glycoprotein |
| 0.0001 | oxidoreductase activity, acting on the CH-NH2 group of donors, oxygen as acceptor |
| 0.0003 | oxidoreductase activity, acting on the CH-NH2 group of donors |
| 0.0003 | molecular_function |
| 0.0003 | alpha-type channel activity |
| 0.0004 | response to external stimulus |
| 0.0004 | channel/pore class transporter activity |
| 0.0005 | chymotrypsin activity |
| 0.0005 | RNA metabolism |
| 0.0007 | trypsin activity |
| 0.0011 | metabolism |
| 0.0014 | immune response |
| 0.0015 | defense response |
| 0.0016 | RNA processing |
| 0.0025 | response to biotic stimulus |
| 0.0028 | cell surface receptor linked signal transduction |
| 0.0028 | integral to membrane |
| 0.0031 | regulation of transcription |
| 0.0032 | transcription |
| 0.0037 | signal transducer activity |
| 0.0039 | translation regulator activity |
| 0.004 | regulation of transcription, DNA-dependent |
| 0.004 | membrane |
| 0.0042 | voltage-gated ion channel activity |
| 0.0044 | ligand-dependent nuclear receptor activity |
| 0.0044 | potassium channel activity |
| 0.0044 | steroid hormone receptor activity |
| 0.0045 | ion transport |
| 0.005 | small GTPase mediated signal transduction |
| 0.0051 | nucleoplasm |
| 0.0052 | cation channel activity |
| 0.0054 | digestion |
| 0.0058 | ligand-regulated transcription factor activity |
| 0.0002 | development |
| 0.0002 | extracellular matrix structural constituent |
| 0.0003 | muscle development |
| 0.0004 | muscle contraction |
| 0.0007 | intramolecular isomerase activity |
| 0.0013 | cell differentiation |
| 0.0014 | mitochondrion |
| 0.002 | cellular process |
| 0.002 | organogenesis |
| 0.0022 | cell adhesion |
| 0.0027 | cytoskeleton |
| 0.0032 | oncogenesis |
| 0.0032 | structural constituent of cytoskeleton |
| 0.0033 | cell communication |
| 0.0036 | morphogenesis |
| 0.0037 | troponin complex |
| 0.0037 | NGF/TNF (6 C-domain) receptor activity |
| 0.0042 | circulation |
| 0.0046 | structural molecule activity |
| 0.0048 | actin cytoskeleton |
| 0.0049 | cell motility |
| 0.005 | muscle fiber |
| 0.0056 | photoreceptor activity |
| 0.0056 | G-protein coupled photoreceptor activity |
| 0.0056 | collagen type I |
| 0.011 | intermediate filament cytoskeleton |
| 0.011 | intermediate filament |
| 0.0125 | transcription cofactor activity |
| 0.0128 | extracellular matrix structural constituent conferring tensile strength activity |
| 0.0128 | sarcomere |
| 0.0128 | myofibril |
| 0.0128 | collagen |
| 0.0139 | response to stress |
| 0.0149 | hydrolase activity |
| 0.016 | intramolecular isomerase activity, interconverting aldoses and ketoses |
| 0.016 | phosphagen metabolism |
| 0.016 | neurofilament |
| 0.016 | galactose binding lectin |
| 0.016 | inactivation of MAPK |
| 0.0176 | striated muscle thin filament |
| <.0001 | cell communication |
| 0.0001 | signal transduction |
| 0.0078 | development |
| 0.0103 | phosphate metabolism |
| 0.0103 | phosphorus metabolism |
| 0.0159 | neurogenesis |
| 0.016 | cell adhesion |
| 0.0179 | intracellular signaling cascade |
| 0.0196 | amino acid transport |
| 0.0311 | small GTPase mediated signal transduction |
| 0.0384 | coreceptor activity |
| 0.0464 | heme-copper terminal oxidase activity |
| 0.0464 | acute-phase response |
| 0.0464 | regulation of metabolism |
| 0.0476 | cell-cell signaling |
| 0.085 | beta3-adrenergic receptor activity |
| 0.085 | purine ribonucleoside catabolism |
| 0.085 | purine ribonucleoside metabolism |
| 0.085 | pentose catabolism |
| 0.085 | pentose metabolism |
| 0.085 | ribose catabolism |
| 0.085 | adenosine metabolism |
| 0.085 | manganese ion transport |
| 0.085 | ADP-sugar diphosphatase activity |
| 0.085 | bile acid biosynthesis |
| 0.0858 | cellular respiration |
| 0.094 | organelle organization and biogenesis |
| 0.0966 | alcohol catabolism |
| 0.1096 | xenobiotic metabolism |
| 0.1096 | neuropeptide signaling pathway |
| 0.1105 | meiosis |
| 0.1136 | deaminase activity |
| 0.1198 | synaptic transmission |
| 0.1215 | transmission of nerve impulse |
| 0.1314 | monovalent inorganic cation transporter activity |
| 0.1491 | chloride transport |
| 0.1627 | internalization receptor activity |
| 0.1627 | regulation of mitotic cell cycle |
| 0.1627 | cAMP metabolism |
| 0.1627 | regulation of cell volume |
| 0.0002 | mitochondrion |
| 0.0004 | intracellular |
| 0.0008 | metabolism |
| 0.0012 | extracellular |
| 0.0026 | DNA repair |
| 0.0031 | immune response |
| 0.0041 | extracellular space |
| 0.0045 | phosphatidylinositol transporter activity |
| 0.0061 | cytosolic large ribosomal subunit (sensu Eukarya) |
| 0.0065 | defense response |
| 0.0069 | nucleobase, nucleoside, nucleotide and nucleic acid metabolism |
| 0.0071 | RNA binding activity |
| 0.0078 | large ribosomal subunit |
| 0.0083 | heme biosynthesis |
| 0.0083 | sex determination |
| 0.0095 | G-protein coupled receptor protein signaling pathway |
| 0.0097 | integral to membrane |
| 0.0098 | biosynthesis |
| 0.0101 | integral to plasma membrane |
| 0.0109 | mitotic cell cycle |
| 0.0147 | pigment biosynthesis |
| 0.0147 | post Golgi transport |
| 0.015 | nucleus |
| 0.0157 | cyclohydrolase activity |
| 0.0157 | protein amino acid methylation |
| 0.0157 | RNA-nucleus export |
| 0.0157 | transferase activity, transferring pentosyl groups |
| 0.0169 | porphyrin biosynthesis |
| 0.0169 | chromatin remodeling complex |
| 0.0169 | heme metabolism |
| 0.0178 | plasma membrane |
| 0.0192 | S phase of mitotic cell cycle |
| 0.0193 | coenzymes and prosthetic group biosynthesis |
| 0.0209 | cell surface receptor linked signal transduction |
| 0.021 | ion transport |
| 0.0233 | trypsin activity |
| 0.0234 | pigment metabolism |
| 0.0236 | inorganic anion transport |
| 0.0266 | apoptosis regulator activity |
| 0.0268 | nucleic acid binding activity |
| 0.0007 | blood vessel development |
| 0.0007 | angiogenesis |
| 0.001 | phosphotransferase activity, alcohol group as acceptor |
| 0.0013 | nuclear localization sequence binding activity |
| 0.0017 | protein kinase activity |
| 0.002 | response to pest/pathogen/parasite |
| 0.0023 | protein serine/threonine kinase activity |
| 0.0024 | kinase activity |
| 0.0025 | cellular process |
| 0.0028 | cell migration |
| 0.0038 | actin polymerization and/or depolymerization |
| 0.0048 | spermatid development |
| 0.0048 | NLS-bearing substrate-nucleus import |
| 0.0048 | galactosyltransferase activity |
| 0.0051 | signal transduction |
| 0.0053 | protein tyrosine kinase activity |
| 0.0059 | embryogenesis and morphogenesis |
| 0.006 | neurogenesis |
| 0.007 | immune response |
| 0.0073 | cell-matrix adhesion |
| 0.0073 | nucleotide binding activity |
| 0.0077 | Golgi apparatus |
| 0.0079 | transferase activity, transferring phosphorus-containing groups |
| 0.0088 | phosphate metabolism |
| 0.0088 | phosphorus metabolism |
| 0.0091 | protein amino acid phosphorylation |
| 0.0097 | response to wounding |
| 0.0097 | response to biotic stimulus |
| 0.0106 | phosphorylation |
| 0.0111 | RAN protein binding activity |
| 0.0112 | morphogenesis |
| 0.0113 | development |
| 0.0113 | purine nucleotide binding activity |
| 0.012 | actin filament-based process |
| 0.0121 | importin, beta-subunit |
| 0.0121 | actin modulating activity |
| 0.0121 | actin monomer binding activity |
| 0.0121 | regulation of actin polymerization and/or depolymerization |
| 0.0124 | cytoskeleton organization and biogenesis |
| 0.0137 | cell communication |
| 0.0004 | immune response |
| 0.0008 | oncogenesis |
| 0.0009 | defense response |
| 0.0042 | ionic insulation of neurons by glial cells |
| 0.0125 | inflammatory response |
| 0.0245 | histogenesis and organogenesis |
| 0.0261 | sarcomere alignment |
| 0.0261 | phagocytosis, engulfment |
| 0.0261 | negative regulation of osteoclast differentiation |
| 0.0261 | regulation of osteoclast differentiation |
| 0.0261 | negative regulation of cell differentiation |
| 0.0261 | NO mediated signal transduction |
| 0.0326 | activation of NF-kappaB-inducing kinase |
| 0.0327 | oogenesis |
| 0.0453 | cell activation |
| 0.0483 | humoral immune response |
| 0.0491 | protein modification |
| 0.0575 | regulation of cell differentiation |
| 0.0673 | cell cycle |
| 0.07 | biotin metabolism |
| 0.0806 | phosphate metabolism |
| 0.0806 | phosphorus metabolism |
| 0.0888 | sensory organ development |
| 0.0888 | G-protein signaling, adenylate cyclase activating pathway |
| 0.1073 | pattern specification |
| 0.1111 | gametogenesis |
| 0.119 | peptide receptor activity |
| 0.1221 | microtubule-based process |
| 0.1251 | phosphate transport |
| 0.1251 | glutathione conjugation reaction |
| 0.1251 | G-protein chemoattractant receptor activity |
| 0.1256 | phagocytosis |
| 0.1256 | carbohydrate kinase activity |
| 0.1299 | regulation of transcription |
| 0.1309 | fatty acid metabolism |
| 0.1435 | antimicrobial humoral response (sensu Invertebrata) |
| 0.1435 | protein amino acid phosphorylation |
| 0.1454 | NIK-I-kappaB/NF-kappaB cascade |
| 0.1507 | protein phosphatase type 2C activity |
| 0.1507 | heavy metal ion transport |
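
The p-values in the table above come from Fisher's exact test on the distribution of ontology terms. As a rough sketch of how one such value can be computed, the function below evaluates the one-sided (over-representation) tail of the hypergeometric distribution. The function name, its parameters and the example counts are hypothetical, and the one-sided form is an assumption; the source does not say which tail GoMiner reported here.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact p-value for over-representation.

    k: genes in the cluster annotated with the term
    n: genes in the cluster
    K: genes in the whole set annotated with the term
    N: genes in the whole set
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Invented example: 20 genes overall, 5 carry the term,
# and a 5-gene cluster contains 3 of them.
p = fisher_enrichment_p(k=3, n=5, K=5, N=20)
print(round(p, 4))  # prints 0.0726
```

Ranking the terms of a cluster by this value in ascending order gives an ordering of the kind shown in the table.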
Figure 2. Significance ranking of TFBSs in respective clusters. (a) The TFBSs in the sequences of each cluster were sorted and ranked by ANOVA to determine the sites that best discriminated the different clusters. The TFBSs in each cluster were then assigned ranks (1–164) according to their significance (p-value) from the ANOVA. Highest ranking in red, lowest in blue. (b) Partitioning of gene promoter composition with more stringent position weight matrix (PWM) similarity thresholds reduces the number of clusters identified by K-means analysis. Shown are the four of six clusters that contain more than 4 genes (groups 1, 2, 5 and 6). (c) Analysis of the most discriminating TFBSs in the four clusters in Figure 2b by ANOVA, as in Figure 2a.
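
The ANOVA ranking described for Figure 2 can be sketched with a one-way ANOVA F statistic computed per TFBS. The site names, counts and function names below are hypothetical. The sketch ranks by F rather than by p-value; because every site is tested across the same clusters with the same degrees of freedom, a larger F always corresponds to a smaller p-value, so the two orderings agree.

```python
def anova_f(groups):
    """One-way ANOVA F statistic; `groups` holds one list of values per cluster."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:
        # No within-cluster variation: perfectly separated or fully constant.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def rank_sites(counts_by_site):
    """Order site names from most to least discriminating (rank 1 first)."""
    return sorted(counts_by_site, key=lambda s: anova_f(counts_by_site[s]), reverse=True)


# Invented per-promoter counts for three sites in two clusters.
counts_by_site = {
    "SITE_A": [[5, 6, 5], [0, 1, 0]],
    "SITE_B": [[2, 3, 2], [2, 2, 3]],
    "SITE_C": [[1, 4, 2], [3, 1, 5]],
}

ranking = rank_sites(counts_by_site)
print(ranking)  # prints ['SITE_A', 'SITE_C', 'SITE_B']
```

Converting F to an explicit p-value would require the F distribution's survival function, which the standard library does not provide, so it is left out of this sketch.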
Figure 3. Analysis of ontology term distribution. The 20 best-discriminating gene ontology terms in each cluster were sorted for over-representation (red) and under-representation (green) and compared to the top 10 discriminating TFBSs for each cluster as determined by ANOVA (Figure 2). The top 10 over-represented (red) and under-represented (green) TFBSs for each cluster are shown. The transcription factors that recognize the TFBSs were grouped and then analyzed for asymmetric distribution of ontology terms using GoMiner (TF ontology terms, right). Transcription factor genes that are known to bind the over-represented TFBSs (TF Genes, enriched) are shown enclosed in boxes. Transcription factor ontology terms that overlap the gene cluster ontology terms within 2 branches of the ontology clade are shown in bold. Those terms with exact matches in the gene cluster ontologies are indicated with an asterisk. The numbers in parentheses indicate the total number of ontology terms associated with each respective cluster. The numbers in brackets indicate those ontology terms with a significance measurement p-value < 0.05 (Fisher's exact test). Representative genes from clusters one, six and thirteen are shown in supplemental Table 3.
Figure 4. Analysis of ontology term distribution. The 20 best-discriminating gene ontology terms in each cluster were sorted for over-representation (red) and under-representation (green) and compared to the top 10 discriminating TFBSs for each cluster as determined by ANOVA (Figure 2). The top 10 over-represented (red) and under-represented (green) TFBSs for each cluster are shown. The transcription factors that recognize the TFBSs were grouped and then analyzed for asymmetric distribution of ontology terms using GoMiner (TF ontology terms, right). Transcription factor genes that are known to bind the over-represented TFBSs (TF Genes, enriched) are shown enclosed in boxes. Transcription factor ontology terms that overlap the gene cluster ontology terms within 2 branches of the ontology clade are shown in bold. Those terms with exact matches in the gene cluster ontologies are indicated with an asterisk. The numbers in parentheses indicate the total number of ontology terms associated with each respective cluster. The numbers in brackets indicate those ontology terms with a significance measurement p-value < 0.05 (Fisher's exact test). Representative genes from clusters one, six and thirteen are shown in supplemental Table 3.
Figure 5. Analysis of ontology term distribution. The 20 best-discriminating gene ontology terms in each cluster were sorted for over-representation (red) and under-representation (green) and compared to the top 10 discriminating TFBSs for each cluster as determined by ANOVA (Figure 2). The top 10 over-represented (red) and under-represented (green) TFBSs for each cluster are shown. The transcription factors that recognize the TFBSs were grouped and then analyzed for asymmetric distribution of ontology terms using GoMiner (TF ontology terms, right). Transcription factor genes that are known to bind the over-represented TFBSs (TF Genes, enriched) are shown enclosed in boxes. Transcription factor ontology terms that overlap the gene cluster ontology terms within 2 branches of the ontology clade are shown in bold. Those terms with exact matches in the gene cluster ontologies are indicated with an asterisk. The numbers in parentheses indicate the total number of ontology terms associated with each respective cluster. The numbers in brackets indicate those ontology terms with a significance measurement p-value < 0.05 (Fisher's exact test). Representative genes from clusters one, six and thirteen are shown in supplemental Table 3.
Figure 6. Analysis of ontology term distribution. The 20 best-discriminating gene ontology terms in each cluster were sorted for over-representation (red) and under-representation (green) and compared to the top 10 discriminating TFBSs for each cluster as determined by ANOVA (Figure 2). The top 10 over-represented (red) and under-represented (green) TFBSs for each cluster are shown. The transcription factors that recognize the TFBSs were grouped and then analyzed for asymmetric distribution of ontology terms using GoMiner (TF ontology terms, right). Transcription factor genes that are known to bind the over-represented TFBSs (TF Genes, enriched) are shown enclosed in boxes. Transcription factor ontology terms that overlap the gene cluster ontology terms within 2 branches of the ontology clade are shown in bold. Those terms with exact matches in the gene cluster ontologies are indicated with an asterisk. The numbers in parentheses indicate the total number of ontology terms associated with each respective cluster. The numbers in brackets indicate those ontology terms with a significance measurement p-value < 0.05 (Fisher's exact test). Representative genes from clusters one, six and thirteen are shown in supplemental Table 3.
Figure 7. Analysis of ontology term distribution. The 20 best-discriminating gene ontology terms in each cluster were sorted for over-representation (red) and under-representation (green) and compared to the top 10 discriminating TFBSs for each cluster as determined by ANOVA (Figure 2). The top 10 over-represented (red) and under-represented (green) TFBSs for each cluster are shown. The transcription factors that recognize the TFBSs were grouped and then analyzed for asymmetric distribution of ontology terms using GoMiner (TF ontology terms, right). Transcription factor genes that are known to bind the over-represented TFBSs (TF Genes, enriched) are shown enclosed in boxes. Transcription factor ontology terms that overlap the gene cluster ontology terms within 2 branches of the ontology clade are shown in bold. Those terms with exact matches in the gene cluster ontologies are indicated with an asterisk. The numbers in parentheses indicate the total number of ontology terms associated with each respective cluster. The numbers in brackets indicate those ontology terms with a significance measurement p-value < 0.05 (Fisher's exact test). Representative genes from clusters one, six and thirteen are shown in supplemental Table 3.