Sven Nelander, Erik Larsson, Erik Kristiansson, Robert Månsson, Olle Nerman, Mikael Sigvardsson, Petter Mostad, Per Lindahl.
Abstract
BACKGROUND: The expression of gene batteries, genomic units of functionally linked genes which are activated by similar sets of cis- and trans-acting regulators, has been proposed as a major determinant of cell specialization in metazoans. We developed a predictive procedure to screen the mouse and human genomes and transcriptomes for cases of gene-battery-like regulation.
Year: 2005 PMID: 15882449 PMCID: PMC1134656 DOI: 10.1186/1471-2164-6-68
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Gene coverage of the analysis
| | MOUSE | HUMAN | BOTH |
| Ensembl genes total: | 23954 | 21961 | |
| Ensembl transcripts total: | 34076 | 35685 | |
| Ortholog pairs of Ensembl genes: | | | 20188 |
| Ortholog pairs with upstream sequence extracted: | | | 13272 |
| Ortholog pairs with upstream sequence extracted (redundancies removed): | | | |
| Ensembl genes matching SymAtlas probes: | 17552 | 16929 | |
| Ortholog pairs with expression data in both species: | | | |
| Two-species expression data AND regulatory sequence: | | | |
Numbers in the 'MOUSE' and 'HUMAN' columns signify the number of unique Ensembl identifiers in each respective species. Numbers in column 'BOTH' signify ortholog pairs of Ensembl entries. The overlap between the nonredundant sequence database (*) and the nonredundant expression dataset (**) was 9561 ortholog pairs.
Figure 1. Cluster statistics. A: Histogram showing the log number of clusters as a function of log cluster size, based on the clustering at a Pearson correlation coefficient cut-off of 0.75. Numbers on the x axis denote cluster size intervals (2), (3–4), (5–8), (9–16), ... B: Co-expression as a predictor for shared function, protein interaction and paralogy. We identified all gene pairs that correlated above or below a threshold T (X axis). We measured the fraction of such pairs for which there was (i) a BIND database protein-protein interaction recorded in human, (ii) at least one shared gene ontology term, and (iii) evidence of paralogy. We then computed the relative probability that gene pairs above T have each feature, compared to gene pairs below T. At expression correlation 0.80, co-expression was associated with a 100-fold relative probability for genes to encode protein interactors, a 10-fold relative probability for genes to share functional annotation, but only a 3-fold relative probability for genes to be paralogs. C: Fraction of clusters with at least one over-represented GO term (Y axis), as a function of cluster size (X axis). GO term over-representations were computed at a 10% false discovery rate.
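The relative probability described for Figure 1B can be illustrated with a minimal Python sketch. It assumes that "relative probability" means the ratio between the fraction of gene pairs above T that carry a feature and the same fraction among pairs below T. The function name, the toy gene labels and the toy correlation values are hypothetical and are not taken from the paper.

```python
def relative_probability(correlations, pairs_with_feature, threshold):
    """Ratio of feature frequency among gene pairs at or above `threshold`
    to feature frequency among gene pairs below it (assumed definition)."""
    above = [pair for pair, r in correlations.items() if r >= threshold]
    below = [pair for pair, r in correlations.items() if r < threshold]
    if not above or not below:
        raise ValueError("threshold leaves one of the two groups empty")
    frac_above = sum(pair in pairs_with_feature for pair in above) / len(above)
    frac_below = sum(pair in pairs_with_feature for pair in below) / len(below)
    if frac_below == 0:
        return float("inf")  # feature never seen below the threshold
    return frac_above / frac_below


# Toy data (hypothetical): pairwise expression correlations and the set of
# pairs annotated with some feature, e.g. a recorded protein interaction.
toy_correlations = {
    ("a", "b"): 0.90, ("a", "c"): 0.85, ("b", "c"): 0.20,
    ("a", "d"): 0.10, ("b", "d"): 0.30, ("c", "d"): -0.10,
}
toy_feature_pairs = {("a", "b"), ("a", "c"), ("b", "d")}

print(relative_probability(toy_correlations, toy_feature_pairs, 0.80))
```

In the toy data both pairs above 0.80 carry the feature and one of the four pairs below does, so the sketch reports a 4-fold relative probability.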
Figure 2. A smooth muscle differentiation battery: The bar chart (left) illustrates the average expression level of cluster members (Y axis) across arbitrarily ordered tissues (X axis) in two species (red = mouse and blue = human). Three tables list over-represented functional terms (upper small table), over-represented motifs (PFMs) (middle table), and cluster members (lower table).
Figure 3. B-lymphocyte differentiation battery: Tables and charts are organized as in figure 2.
Figure 4. Testis selective battery: Over-representation of RFX and SOX17 motifs indicates new roles for these factors as coordinators of testis selective gene expression. Tables and charts are organized as in figure 2.
Figure 5. Endoplasmatic reticulum associated genes: Over-representation of XBP-1, NRF and ETS motifs suggests novel functions for NRF and ETS family factors in the regulation of ER-related genes. Tables and charts are organized as in figure 2.
Figure 6. Ribosomal genes: Tables and charts are organized as in figure 2.
Figure 7. NF-kappaB pathway: Over-representation of REL and NF-kappaB motifs indicates feedback signalling. Tables and charts are organized as in figure 2.
Over-represented motifs detected at <10% false discovery rate
| Cluster | FDR | PFM number | PFM annotation |
| 1: Protein synthesis | <2.5% | 190 | M00025:Elk-1, M00007:Elk-1 |
| | <2.5% | 110 | M00050:E2F, MA0024:E2F |
| | <2.5% | 57 | M00108:NRF-2, MA0028:Elk-1, MA0062:NRF-2 |
| | <2.5% | 181 | MA0076:SAP-1 |
| | <10% | 18 | M00074:c-Ets-1(p54) |
| | <10% | 78 | M00262:Staf |
| 2: Oocyte / fertilized egg | <2.5% | 71 | M00024:E2F |
| | <2.5% | 190 | M00025:Elk-1, M00007:Elk-1 |
| | <2.5% | 9 | M00032:c-Ets-1(p54) |
| | <2.5% | 110 | M00050:E2F, MA0024:E2F |
| | <2.5% | 57 | M00108:NRF-2, MA0028:Elk-1, MA0062:NRF-2 |
| | <2.5% | 181 | MA0076:SAP-1 |
| | <10% | 238 | MA0088:Staf, M00264:Staf |
| 3: Neural tissues | <2.5% | 99 | M00189:AP-2 |
| | <2.5% | 115 | M00196:Sp1 |
| | <2.5% | 141 | M00256:NRSF |
| | <10% | 75 | M00243:Egr-1 |
| 4: Lymphocytes | <2.5% | 143 | MA0050:Irf-1, M00062:IRF-1, M00063:IRF-2 |
| | <10% | 74 | M00054:NF-kappaB, MA0061:NF-kappaB |
| | <10% | 28 | M00258:ISRE |
| 5: Testis / spermatogenesis | <2.5% | 109 | M00281:RFX1 |
| | <2.5% | 142 | MA0078:SOX17 |
| | <10% | 108 | M00036:v-Jun |
| | <10% | 248 | M00041:CRE-BP1/c-Jun |
| | <10% | 65 | M00100:CdxA |
| 6: Liver | <2.5% | 16 | M00134:HNF-4 |
| | <2.5% | 212 | M00158:COUP-TF / HNF-4, MA0017:COUP-TF |
| | <2.5% | 33 | M00206:HNF-1 |
| | <2.5% | 203 | MA0046:HNF-1, M00132:HNF-1 |
| | <2.5% | 234 | MA0047:HNF-3beta, M00131:HNF-3beta |
| | <2.5% | 113 | MA0065:PPARgamma-RXRal |
| | <10% | 46 | M00155:ARP-1 |
| | <10% | 212 | M00158:COUP-TF / HNF-4, MA0017:COUP-TF |
| | <10% | 146 | MA0071:RORalfa-1, M00156:RORalpha1 |
| 8: ECM | <10% | 215 | M00378:Pax-4 |
| 9: Cardiac muscle | <2.5% | 223 | M00026:RSRFC4 |
| | <2.5% | 144 | M00152:SRF |
| | <2.5% | 59 | M00231:MEF-2 |
| | <2.5% | 222 | M00232:MEF-2 |
| | <2.5% | 161 | M00252:TATA |
| | <2.5% | 259 | M00418:TGIF, M00419:MEIS1 |
| | <2.5% | 160 | MA0052:MEF2 |
| | <10% | 60 | M00006:MEF-2 |
| 12: Skeletal muscle | <2.5% | 201 | M00184:MyoD, M00001:MyoD |
| | <10% | 17 | M00002:E47 |
| | <10% | 59 | M00231:MEF-2 |
| 13: Endoplasmatic reticulum | <10% | 190 | M00025:Elk-1, M00007:Elk-1 |
| | <10% | 57 | M00108:NRF-2, MA0028:Elk-1, MA0062:NRF-2 |
| | <10% | 181 | MA0076:SAP-1 |
| 15: Erythrocyte | <10% | 209 | M00128:GATA-1, M00127:GATA-1 |
| | <10% | 122 | M00203:GATA-X |
| | <10% | 198 | M00413:AREB6 |
| 16: B lymphocyte | <2.5% | 133 | MA0081:SPI-B |
| 17: Kidney | <2.5% | 33 | M00206:HNF-1 |
| | <2.5% | 188 | M00411:HNF-4alpha1 |
| 22: Cell cycle genes | <10% | 110 | M00050:E2F, MA0024:E2F |
| 24: Pancreas | <10% | 121 | M00071:E47 |
| | <10% | 193 | M00080:Evi-1, M00082:Evi-1 |
| 30: Small intestine | <10% | 31 | M00346:GATA-1, M00347:GATA-1, M00348:GATA-2 |
| 40: Smooth muscle | <2.5% | 144 | M00152:SRF |
| | <2.5% | 245 | M00186:SRF, M00215:SRF |
| | <2.5% | 88 | MA0083:SRF |
| 44: Retina | <2.5% | 196 | M00087:Ik-2 |
| 45: Testis (mouse signal only) | <2.5% | 164 | M00253:cap |
| 49: Lung/endothelium (mouse signal only) | <10% | 66 | M00199:AP-1, M00037:NF-E2 |
| 65: NfkappaB signalling | <2.5% | 235 | M00051:NF-kappaB (p50), MA0105:p50 |
Over-represented motifs arranged by cluster number. FDR column: False Discovery Rate (estimated probability for the over-representation to be a spurious detection). Motifs are shown both by their numerical identifiers (PFM number) and by their annotation (PFM annotation). In cases where a PFM is a composite based on more than one source, the components are given separated by commas. The data were generated from the PCC = 0.75 clustering, 2 kb sequence database, at 90% phylogenetic conservation.
Figure 8. EMSA validation of EBF binding sites. A: The figure displays EMSAs in which binding of EBF to a mb-1 promoter EBF site is competed for by the inclusion of 300 or 1000-fold molar excess of unlabelled oligonucleotides that correspond to the predicted motifs. The name of the gene and the position of the motif are given in the figure. (m) indicates mouse and (h) human. "EBF" shows the position of the DNA/protein complex, and "Probe" indicates the position of free DNA. See supplementary information for a detailed description of the sites. B: The mb-1 promoter EBF site interacts specifically with EBF protein in a pre-B cell nuclear extract. The figure displays an autoradiogram in which a labelled EBF binding site from the mb-1 promoter has been incubated with nuclear extracts from 40EI pre-B cells and competitors or antibodies as indicated. EBF denotes the bound EBF protein and EBF-SS the super-shifted complex obtained by the addition of the EBF reactive antibody to the reaction mixture.
Figure 9. False discovery rate estimation. A: Illustration of how false discovery rates (FDRs) were estimated by use of simulations. Values on the y axis represent the number of motifs below a certain p-score (x axis). Triangles show results for observed data, filled circles show results for permuted data (error bars show the 90% confidence interval from 100 simulations). The FDR was calculated as the ratio between the simulated expectation and the observation. Red dotted line: The number of clusters with at least one enriched motif. Note that several motifs were over-represented in the same cluster. B: Removal of paralogous genes from each cluster did not affect the number of detected motifs. Consequently, co-expressed paralogs are not an important source of false positives. C: The amount of DNA used per gene and the phylogenetic footprinting stringency have a strong effect on the number of detected over-represented motifs. The sensitivity is higher when the amount of DNA is reduced. Error bars in B and C were obtained by using the 5th and 95th percentiles in the simulation to define the FDR.
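The FDR ratio described for Figure 9A can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name, the strict "below the cutoff" comparison, the cap at 1.0 and all toy p-scores are hypothetical. The simulated expectation is taken as the mean count over the permuted datasets.

```python
from statistics import mean


def estimate_fdr(observed_scores, permuted_score_sets, cutoff):
    """FDR at `cutoff` = expected number of motifs below the cutoff in
    permuted data, divided by the number observed below it in real data."""
    observed = sum(score < cutoff for score in observed_scores)
    if observed == 0:
        return None  # nothing detected, so the ratio is undefined
    permuted_counts = [
        sum(score < cutoff for score in scores)
        for scores in permuted_score_sets
    ]
    expected = mean(permuted_counts)
    return min(1.0, expected / observed)  # cap at 1.0 (assumption)


# Toy data (hypothetical): p-scores from the real clustering and from
# four permutations of the cluster assignments.
toy_observed = [0.001, 0.002, 0.004, 0.008, 0.2, 0.5]
toy_permuted = [
    [0.003, 0.4, 0.6, 0.7, 0.8, 0.9],
    [0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    [0.05, 0.1, 0.3, 0.5, 0.7, 0.9],
    [0.009, 0.02, 0.3, 0.6, 0.8, 0.95],
]

print(estimate_fdr(toy_observed, toy_permuted, cutoff=0.01))
```

With these toy values, four motifs fall below the cutoff in the observed data against an average of 0.5 in the permutations, giving an estimated FDR of 0.125. Substituting the 5th or 95th percentile of the permuted counts for the mean would give error bounds of the kind described for panels B and C.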
Interpretation of over-represented motifs with respect to published evidence. (See footnote for definition of the categories.)
| Category | Cluster | Tissue / function | Factors [references] |
| I | 4 | lymphocytes | Irf1/Irf2/ISRE, NFKB [34-36] |
| I | 6 | liver | HNF-1 alpha and beta, HNF4 alpha [32, 59-61] |
| I | 9,12 | cardiac and skeletal muscle | MyoD/E47, MEF2 family factors, SRF [4, 6, 48] |
| I | 15 | erythroid cells | GATA-1 [62] |
| I | 16 | B lymphocytes | Spi-B, Oct-I* [37, 40, 63] |
| I | 22 | cell cycle | E2F family factors [49, 50] |
| I | 40 | smooth muscle | SRF [5, 64] |
| II | 3 | neural tissue | NRSF [65] |
| II | 5 | testis | SOX17, RFX2 (RFX1-RFX3) [41-43] |
| II | 6 | liver | ARP-1/COUP-TF, PPARγ [66, 67] |
| II | 13 | ER | XBP-1 [68] |
| II | 15,16 | erythroid cells/ B cells | AREB6* [38] |
| II | 17 | kidney | HNF-4alpha, HNF-1-alpha/beta [69, 70] |
| III | 1 | protein synthesis | NRF and ETS family factors |
| III | 2 | oocyte | E2F and ETS family factors |
| III | 6 | liver | ROR-alpha |
| III | 8 | ECM | Pax4 |
| III | 9 | cardiac muscle | TALE family factors TGIF and MEIS1 |
| III | 13 | ER | NRF and ETS family factors |
| III | 24 | pancreas | E47 |
| III | 30 | small intestine | HNF4-alpha and GATA factors |
| III | 44 | retina/eye | Ik-2 |
| III | 45 | testis | cap |
Criteria for inclusion in the different groups: I. Existing evidence of co-regulation of cluster members by the factor, and/or a mutant mouse phenotype that affects differentiation of the cell type. II. Evidence of the factor acting on a limited number of genes in the tissue, or evidence of the transcription factor being expressed in the tissue. III. No or limited evidence in the literature. * = Detected at 20% FDR.