Jérémy Gruel, Michel LeBorgne, Nolwenn LeMeur, Nathalie Théret.
Abstract
BACKGROUND: Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods.
Year: 2011 PMID: 21910886 PMCID: PMC3215511 DOI: 10.1186/1471-2105-12-365
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Each panel indicates the mean, median, standard deviation (σ), and the minimal and maximal z values obtained.
Figure 2. Overview of the workflow leading to CEXlists. A) The atomic motifs are extracted from the cisRED database for gene pairs associating the gene of interest g and the 18,000 other genes described in the database. B) The number of SSMs is computed for all the studied SSM types (L) and all the gene pairs. C) The numbers of SSMs are corrected by the amount of potential SSMs (SSMC), and the p-value testing the null hypothesis "the SSMC value is not greater than expected by chance" is computed. D) The lowest p-value obtained for each gene pair is used as the cp-value. The CEXlist(g, L, t) retains the genes for which a cp-value below a threshold t is computed when paired with g.
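Steps C and D of the workflow can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `cexlist`, the dictionary layout, the gene labels, and the p-values are all hypothetical, and the per-type p-values are assumed to have been computed upstream.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption for this sketch):
# p_values[partner_gene][ssm_type] -> p-value for the pair (g, partner_gene),
# where an SSM type is written as a tuple such as (6, 0) or (8, 1).
SSMType = Tuple[int, int]
PValues = Dict[str, Dict[SSMType, float]]


def cexlist(p_values: PValues, ssm_types: List[SSMType], t: float) -> Dict[str, float]:
    """Return {gene: cp-value} for genes whose cp-value is below threshold t.

    The cp-value of a gene pair is taken as the lowest p-value over the
    SSM types under study (step D of the workflow).
    """
    selected: Dict[str, float] = {}
    for gene, by_type in p_values.items():
        candidates = [by_type[L] for L in ssm_types if L in by_type]
        if not candidates:
            continue  # no p-value available for the requested SSM types
        cp = min(candidates)
        if cp < t:  # "below a threshold t": strict inequality
            selected[gene] = cp
    return selected


# Made-up p-values for three partner genes of a gene of interest g.
example: PValues = {
    "GENE_A": {(6, 0): 0.20, (8, 1): 0.003},
    "GENE_B": {(6, 0): 0.40, (8, 1): 0.09},
    "GENE_C": {(6, 0): 0.0004},
}
result = cexlist(example, [(6, 0), (8, 1)], t=0.01)
```

With these made-up values, GENE_A and GENE_C are retained and GENE_B is dropped, because its lowest p-value (0.09) is not below the threshold.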
Figure 3. Analysis of the correlation between the number of SSMs and the number of potential SSMs. SSMs were identified for 50,000 pairs of randomly selected genes. Results are presented for 4 SSM types: (6,0)SSM; (8,1)SSM; (10,2)SSM and (14,4)SSM.
CEXlist test set
| GENE SYMBOL | DESCRIPTION |
|---|---|
| APLP1 | Amyloid beta (A4) precursor-like protein 1 |
| C6orf62 | HBV X-transactivated gene 12 protein |
| C9orf3 | aminopeptidase O |
| CLK3 | CDC-like kinase 3 |
| DEFA3 | Defensin, alpha 3, neutrophil-specific |
| DUSP12 | Dual specificity phosphatase 12 |
| EEF1D | Eukaryotic translation elongation factor 1 delta (guanine nucleotide exchange protein) |
| FSHR | Follicle stimulating hormone receptor |
| MNT | MAX binding protein |
| MRGPRF | MAS-related GPR, member F |
| SH3D19 | SH3 domain protein D19 |
| TRIM61 | Putative tripartite motif-containing protein 61 |
| C1orf216 | chromosome 1 open reading frame 216 |
| C2orf67 | chromosome 2 open reading frame 67 |
| OR51Q1 | Olfactory receptor, family 51, subfamily Q, member 1 |
| CCDC64B | Coiled-coil domain-containing protein 64B |
| SLC9A3R2 | solute carrier family 9 isoform 3 regulator 2 |
| SPG7 | Spastic paraplegia protein 7 |
| WISP2 | WNT1 inducible signaling pathway protein 2 |
| SNRPD2 | Small nuclear ribonucleoprotein Sm D2 (snRNP core protein D2) (Sm-D2) |
| ADAM12 | ADAM metallopeptidase domain 12 (meltrin alpha) |
| SMAD2 | SMAD family member 2 |
| SMAD3 | SMAD family member 3 |
| AURKA | Aurora kinase A |
| AURKB | Aurora kinase B |
| AURKC | Aurora kinase C |
| ACTA1 | Actin, alpha 1, skeletal muscle |
| ALB | Albumin |
| ALDOA | aldolase A, fructose-bisphosphate |
| DES | Desmin |
| LRRTM1 | Leucine rich repeat transmembrane neuronal 1 |
Twenty genes (rows 1 to 20) were randomly selected from the cisRED database. Eleven genes (rows 21 to 31) with known functions were arbitrarily added to complete the random selection and used as internal controls for gene annotation analyses.
Figure 4. Variation of the . SSM analysis was applied to genes from the test set and non-specific variations were calculated for 200 random gene lists (green area).
Representative categories
| CATEGORY | p-value |
|---|---|
| olfactory receptor activity (MF) | 5.7 × 10⁻¹⁰ |
| sensory perception of smell (BP) | 2.4 × 10⁻¹⁰ |
| skeletal muscle fiber development (BP) | 3.8 × 10⁻³ |
| myoblast migration (BP) | 4.8 × 10⁻³ |
| fibroblast growth factor activity (MF) | 1.2 × 10⁻³ |
| intracellular signaling cascade (BP) | 1.8 × 10⁻³ |
| intracellular (CC) | 1.7 × 10⁻⁷ |
| focal adhesion (K) | 6.1 × 10⁻³ |
| positive regulation of neurogenesis (BP) | 2.5 × 10⁻⁴ |
Representative categories in CEXlists from OR51Q1, MYOG and ADAM12. MF, Molecular Function; BP, Biological Process; CC, Cellular Component and K, KEGG pathways.
Figure 5. Association between SSM enrichment and co-expression. Genes (circles) from the sample set were submitted to both SSM and Gemma analyses and the overlap between genes was expressed as a density value (number of Gemma genes per gene in CEXlist) according to different cp-value thresholds. Circle size is correlated with the number of genes in CEXlists.
Comparison between co-expressed genes and not co-expressed genes in CEXlists.
| Gene | 0.05 | 0.01 | 0.005 | 0.001 |
|---|---|---|---|---|
| ACTA1 | 1.601 | 1.168 | 1.187 | 2.215 |
| DES | 1.366* | 1.285 | 0.659 | 0.000 |
| SPG7_HUMAN | 2.078* | 1.957* | 1.846* | 2.078* |
| SMAD2 | 1.877* | 1.414* | 1.297 | 1.361 |
| SMAD3 | 1.363* | 1.354 | 1.206 | 0.000 |
| ADAM12 | 0.803 | 0.965 | 1.003 | 0.777 |
| C9orf3 | 1.662* | 1.400* | 1.443* | 1.381 |
| APLP1 | 0.879 | 1.228 | 1.380 | 0.525 |
| WISP2 | 1.311 | 0.559 | 0.000 | 0.000 |
| SLC9A3R2 | 1.079 | 1.083 | 1.380 | 2.673 |
| MNT | 1.789* | 1.893* | 1.426 | 0.855 |
| ALDOA_HUMAN | 1.008 | 0.986 | 0.759 | 0.000 |
| DEFA3 | 0.974 | 1.576 | 0.000 | 0.000 |
| AURKA | 1.274 | 1.247 | 0.687 | 1.145 |
| AURKB | 1.999* | 2.263* | 2.249* | 2.353* |
| FSHR | 0.000 | 0.000 | 0.000 | 0.000 |
| LRRTM1 | 0.871 | 0.000 | 0.000 | 0.000 |
| AURKC_HUMAN | 0.556* | 0.515 | 1.180 | 0.000 |
| MRGPRF | 1.559 | 1.697 | 1.300 | 0.000 |
| Q658L9_HUMAN | 0.000 | 0.000 | 0.000 | 0.000 |
| DUSP12 | 1.446* | 2.139* | 1.555 | 1.993 |
| EEF1D | 1.649* | 1.515* | 1.634* | 1.348 |
| ALB | 1.953* | 1.877* | 1.769* | 0.972 |
| C6orf62 | 1.930* | 1.756* | 1.926* | 1.626 |
| CLK3 | 0.759* | 0.970 | 0.558 | 1.953 |
For four cp-value thresholds, t, the table compares Gemma genes present in CEXlists (cp-value
Comparative analysis of SSM counts with TIGER and KEGG databases
| Threshold | KEGG | TIGER |
|---|---|---|
| 0.01 | 1.081* | 1.117* |
| 0.005 | 1.087* | 1.134* |
| 0.001 | 1.094* | 1.115 |
| 0.0001 | 1.197 | 1.234 |
| 0.00001 | 1.404 | 3.484* |
Analysis of SSM counts for genes expressed within specific tissues (TIGER database) or pathways (KEGG database). For each threshold, the number of gene pairs with an exceptional SSM count divided by the total number of gene pairs was computed separately for gene pairs that share a tissue or pathway and for gene pairs that do not. Data are expressed as the ratio between these two values; a ratio greater than 1 indicates an enrichment of gene pairs with an exceptional SSM count when the genes are expressed in the same tissue or pathway (*, p ≤ 0.05).
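The arithmetic behind the ratio in this table can be illustrated with a short Python sketch. The function name, argument names, and counts below are hypothetical and only mirror the description in the caption; the significance test behind the asterisks is not reproduced here.

```python
def enrichment_ratio(exc_assoc: int, total_assoc: int,
                     exc_other: int, total_other: int) -> float:
    """Ratio of the exceptional-pair fractions in two groups of gene pairs.

    exc_assoc / total_assoc: pairs sharing a tissue or pathway.
    exc_other / total_other: pairs not sharing one.
    """
    if total_assoc <= 0 or total_other <= 0:
        raise ValueError("both groups must contain at least one gene pair")
    if exc_other == 0:
        raise ValueError("ratio is undefined when the reference fraction is zero")
    frac_assoc = exc_assoc / total_assoc
    frac_other = exc_other / total_other
    return frac_assoc / frac_other


# Made-up counts: 25 of 100 associated pairs and 125 of 1000 other pairs
# have an exceptional SSM count, so the ratio is 0.25 / 0.125 = 2.0.
ratio = enrichment_ratio(exc_assoc=25, total_assoc=100,
                         exc_other=125, total_other=1000)
```

A value above 1, as in this made-up case, would correspond to an enrichment of exceptional pairs among genes sharing a tissue or pathway.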