Andrew D Smith, Pavel Sumazin, Michael Q Zhang.
Abstract
Transcription factor-binding sites and the cis-regulatory modules they compose are central determinants of gene expression. We previously showed that binding site motifs and modules in proximal promoters can be used to predict a significant portion of mammalian tissue-specific transcription. Here, we report on a systematic analysis of promoters controlling tissue-specific expression in heart, kidney, liver, pancreas, skeletal muscle, testis and CD4 T cells, for both human and mouse. We integrated multiple sources of expression data to compile sets of transcripts with strong evidence for tissue-specific regulation. The analysis of the promoters corresponding to these sets produced a catalog of predicted tissue-specific motifs and modules, and cis-regulatory elements. Predicted regulatory interactions are supported by statistical evidence, and provide a foundation for targeted experiments that will improve our understanding of tissue-specific regulatory networks. In a broader context, methods used to construct the catalog provide a model for the analysis of genomic regions that regulate differentially expressed genes.
Year: 2007 PMID: 17224917 PMCID: PMC1800356 DOI: 10.1038/msb4100114
Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429
Ability of single versus multiple votes to predict tissue-specificity of a transcript's ortholog
| Tissue | Human, multiple | Human, single | Mouse, multiple | Mouse, single | Common evidence, multiple (n/N) | Common evidence, multiple (%) | Common evidence, single (n/N) | Common evidence, single (%) | P-value |
|---|---|---|---|---|---|---|---|---|---|
| CD4 T-cells | 2 | 247 | 6 | 212 | 3/7 | 42.9% | 36/435 | 8.3% | 1.79E−02 |
| Heart | 28 | 260 | 105 | 560 | 35/122 | 28.7% | 102/766 | 13.3% | 3.78E−05 |
| Kidney | 43 | 188 | 172 | 540 | 42/200 | 21.0% | 66/706 | 9.3% | 1.74E−05 |
| Liver | 152 | 411 | 271 | 651 | 148/354 | 41.8% | 184/982 | 18.7% | 6.46E−17 |
| Pancreas | 31 | 186 | 47 | 313 | 26/61 | 42.6% | 75/450 | 16.7% | 9.93E−06 |
| Skeletal muscle | 49 | 394 | 141 | 681 | 52/174 | 29.9% | 137/1000 | 13.7% | 4.47E−07 |
| Testis | 38 | 287 | 446 | 668 | 67/471 | 14.2% | 86/923 | 9.3% | 4.09E−03 |

Columns labeled ‘Multiple’ and ‘Single’ give the number of transcripts with multiple and single votes for specificity, respectively. Columns under ‘Common evidence’ show the proportion of transcripts with single and multiple votes for tissue-specificity that have an ortholog with at least one vote for specificity in the same tissue. Excluding CD4 T-cells (which represents a small sample), human and mouse transcripts with multiple votes for tissue-specificity are significantly more likely to have an ortholog with at least one vote for specificity in the same tissue.
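
As a rough illustration of how the two ‘Common evidence’ proportions can be compared, the Python sketch below recomputes the liver percentages from the counts in the table and applies a one-sided Fisher's exact test. The choice of test is an assumption (the record does not name the test behind the final column), the function names `percent` and `fisher_one_sided` are hypothetical, and the sketch is not expected to reproduce the tabulated values exactly.

```python
from math import comb


def percent(hits, total):
    """Percentage of transcripts with a same-tissue ortholog vote, to one decimal."""
    return round(100 * hits / total, 1)


def fisher_one_sided(a, n1, b, n2):
    """One-sided Fisher's exact test (assumed test, not confirmed by the source).

    a of n1 transcripts in group 1 (multiple votes) and b of n2 in group 2
    (single vote) have a supporting ortholog. Returns P(X >= a), where X is
    hypergeometric with the row and column totals held fixed.
    """
    k = a + b            # total transcripts with a supporting ortholog
    n = n1 + n2          # total transcripts
    upper = min(k, n1)   # largest count group 1 could possibly show
    tail = sum(comb(n1, x) * comb(n2, k - x) for x in range(a, upper + 1))
    return tail / comb(n, k)


# Liver row of the table: 148/354 (multiple votes) versus 184/982 (single vote).
liver_multiple = percent(148, 354)
liver_single = percent(184, 982)
liver_p = fisher_one_sided(148, 354, 184, 982)
```

Exact integer arithmetic via `math.comb` keeps the tail sum stable even when the binomial coefficients are far too large for floating point; only the final ratio is converted to a float.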
Transcripts with multiple votes for tissue-specificity in both human and mouse skeletal muscle
| Symbol | Name | Human RefSeq | Mouse RefSeq | Votes |
|---|---|---|---|---|
| MYH2 | Myosin, heavy polypeptide 2 | NM_017534 | NM_144961 | 7 |
| TTID | Myotilin | NM_006790 | NM_021484 | 6 |
| TNNT3 | Troponin T type 3 | NM_006757 | NM_011620 | 6 |
| TNNC2 | Troponin C type 2 | NM_003279 | NM_009394 | 6 |
| MYBPC2 | Myosin binding protein C | NM_004533 | NM_146189 | 6 |
| HUMMLC2B | Fast skeletal myosin light chain 2 | NM_013292 | NM_016754 | 6 |
| ACTN2 | Actinin α2 | NM_001103 | NM_033268 | 5 |
| VAMP5 | Vesicle-associated membrane protein 5 | NM_006634 | NM_016872 | 4 |
| TRIP10 | Thyroid hormone receptor interactor 10 | NM_004240 | NM_134125 | 4 |
| TPM3 | Tropomyosin 3 | NM_153649 | NM_022314 | 4 |
| SGCG | Sarcoglycan γ | NM_000231 | NM_011892 | 4 |
| MYOD1 | Myogenic differentiation 1 | NM_002478 | NM_010866 | 4 |
| MYF6 | Myogenic factor 6 (herculin) | NM_002469 | NM_008657 | 4 |
| CKM | Creatine kinase, muscle | NM_001824 | NM_007710 | 4 |
| CACNG1 | Calcium channel, voltage-dependent γ1 | NM_000727 | NM_007582 | 4 |

The ‘Votes’ column gives the total number of votes for skeletal muscle-specificity in both human and mouse. Our analysis used promoter sets of size 100 for both tissues, including promoters that correspond to transcripts with a single vote for tissue-specific regulation. Tables for the remaining 6 tissues are given in
Significance of elevated ranks for motifs associated with important factors in liver, skeletal muscle and testis
| Tissue | Factors | Motifs | Human | Mouse |
|---|---|---|---|---|
| Liver | HNF-1, HNF-3, HNF-4, C/EBP, DBP | 68 | 2.72E−18 | 4.19E−12 |
| Skeletal muscle | MEF-2, SRF, Myogenin, Sp1 | 45 | 1.33E−14 | 2.29E−5 |
| Testis | SRY, CREM, RFX | 30 | 0.087 | 1.89E−4 |

The ‘Motifs’ column gives the total number of motifs associated with the listed factors.
Evidence for expression and classification quality of binding-site motifs for factors with known tissue-specific regulatory roles
| Tissue | Factor | Description | Similar motifs | Expressed (Hs) | Expressed (Mm) | Classifies (Hs) | Classifies (Mm) | Comment |
|---|---|---|---|---|---|---|---|---|
| Liver | HNF-1 | Hepatocyte nuclear factor 1; Member of the homeodomain factor family. | | Yes | Yes | Yes | Yes | Ranks in the top 3 in both species. |
| | HNF-3 | Hepatocyte nuclear factor 3; Member of the fork-head factor family. | HFH, XFD and certain other fork-head family members. | Yes | No | Yes | Yes | Unranked in mouse or mouse orthologs of human liver-specific promoters. |
| | HNF-4 | Hepatocyte nuclear factor 4; Member of thyroid hormone receptor-like family. | PPAR, COUP; Steroid/thyroid hormone receptor-like super-family. | Yes | Yes | Yes | Yes | Ranks 1st in both species. |
| | HNF-6 | Hepatocyte nuclear factor 6; A homeodomain factor from the CUT subfamily. | | No | No | No | No | The known motif in TRANSFAC (v9.4) may be poorly characterized. |
| | C/EBP | CCAAT/enhancer binding protein; Variants form a subfamily of basic region leucine zipper family. | | Yes | Yes | Yes | Yes | Ranks in the top 10 in both species and is known to interact with other high-ranking motifs. |
| | DBP | Albumin D-site binding protein; A member of PAR family of b-ZIP factors. | PAR family: HLF, TEF and VBP. | No | Yes | Yes | Yes | All three sources call DBP present in mouse, none of the three call DBP present in human. |
| Skeletal muscle | MEF-2 | Myocyte-specific enhancer factor 2; A member of the MADS domain family. | Splice variant RSRFC4. | Yes | Yes | Yes | Yes | Mouse skeletal muscle promoters are enriched with C/G-rich motifs. The A/T-rich MEF-2 ranks 2nd after GC-content correction. |
| | SRF | Serum response factor; A member of the MADS domain family. | | Yes | Yes | Yes | No | The A/T-rich SRF is not identified in mouse even after GC-content correction. |
| | MyoD | Myogenic factor 3; Member of the myogenin family. | Myogenin family, E-box binding factors. | Yes | Yes | Yes | Yes | Ranks in the top 5 in both species and predicted to interact with SRF and MEF-2. |
| | Sp1 | Stimulating protein 1; A ubiquitous factor with a Cys2His2 zinc finger domain. | | Yes | Yes | Yes | Yes | The C/G-rich motif is highly ranked with and without GC-content correction. |
| Testis | SRY | Sex-determining region on Y chromosome; Member of the high-mobility group (HMG) class of factors. | SOX factors. | Yes | Yes | Yes | No | The A/T-rich motif is ranked 2nd in human after GC-content correction. |
| | CREM | Cyclic AMP-responsive element modulator; Member of the CREB/ATF subfamily of the bZIP factors. | CREB/ATF family. | Yes | Yes | Yes | Yes | C/G-balanced and synergistic, ranks 1st in mouse with and without GC-content correction. |
| | RFX | Regulatory factor X; Subfamily of the forkhead factors with winged-helix binding domains. | | Yes | Yes | No | Yes | The core of the RFX motif (GTTGCCA) is highly similar to the reverse of the core MYB motif (CCGTTG), ranking top in human. |

A motif classifies foreground from background if it is ranked in the top 20 distinct motif classifiers.
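
The top-20 rule in the note above can be sketched as follows. This is a minimal illustration under stated assumptions: the record does not say how ‘distinct’ motifs are determined, so the sketch assumes a hypothetical mapping `family_of` that groups similar motifs and keeps only the best-ranked motif of each group. The function names and the toy motif identifiers are likewise made up for illustration.

```python
def top_distinct(ranked, family_of, k=20):
    """Return the first k motifs of `ranked`, keeping one motif per family.

    `ranked` lists motif identifiers from best to worst classifier.
    `family_of` maps a motif to its family; unmapped motifs form their own
    family (an assumption about what 'distinct' means).
    """
    seen, kept = set(), []
    for motif in ranked:
        family = family_of.get(motif, motif)
        if family in seen:
            continue  # a better-ranked, similar motif was already kept
        seen.add(family)
        kept.append(motif)
        if len(kept) == k:
            break
    return kept


def classifies(family, ranked, family_of, k=20):
    """True if some motif of `family` is among the top k distinct classifiers."""
    return any(family_of.get(m, m) == family
               for m in top_distinct(ranked, family_of, k))


# Toy ranking with hypothetical motif identifiers.
ranking = ["hnf4_a", "hnf1_a", "hnf4_b", "cebp_a", "sp1_a"]
families = {"hnf4_a": "HNF-4", "hnf4_b": "HNF-4", "hnf1_a": "HNF-1",
            "cebp_a": "C/EBP", "sp1_a": "Sp1"}
top3 = top_distinct(ranking, families, k=3)
```

With `k=3`, the second HNF-4 motif is skipped as a near-duplicate, so the C/EBP motif still makes the cut while Sp1 does not.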
Figure 1. Verified and predicted binding sites in human albumin and skeletal muscle α-actin promoters. Predicted sites are represented by horizontal bars and verified sites by vertical bars. Verified sites for albumin (Paonessa; Sawadaishi; Frain; Li) and for α-actin (Boxer; MacLellan) were mapped to the promoter from CSHLmpd to obtain their correct locations relative to the TSS.
Figure 2. Predicted binding sites for selected factors in promoters from the human tissue-specific sets. The selected factors are among the top ranked in the corresponding tissues.