Xueping Yu, Jimmy Lin, Donald J Zack, Jiang Qian.
Abstract
BACKGROUND: Evolutionary conservation has been used successfully to help identify cis-acting DNA regions that are important in regulating tissue-specific gene expression. Motivated by increasing evidence that some DNA regulatory regions are not evolutionarily conserved, we have developed an approach for cis-regulatory region identification that does not rely upon evolutionary sequence conservation.
Year: 2007 PMID: 17996093 PMCID: PMC2194798 DOI: 10.1186/1471-2105-8-437
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic of the module detection method based on TF interactions. Based on gene expression profiles across different tissues, we identified groups of genes that are preferentially expressed in specific tissues (e.g. genes C and D in the schematic). For each group of genes, we searched the promoter regions for binding sites of known TFs and determined the TF pairs whose binding sites tend to co-occur in close proximity. This analysis yielded a tissue-specific TF interaction network. We then scanned the genomic regions and identified cis-regulatory modules (CRMs), defined as regions enriched with TF interactions. Note that the first steps were implemented in our previous work [22], while this paper focuses on the last step.
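The co-occurrence step described in the Figure 1 caption can be illustrated with a minimal Python sketch. It only counts, per promoter, which TF pairs have binding sites within a fixed distance; the function names, the 50 bp cutoff, and the toy site positions are assumptions made for illustration, and the statistical test that decides whether a pair "tends to" co-occur is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(sites, max_distance=50):
    """Return the set of TF pairs with binding sites within max_distance bp.

    sites: list of (tf_name, position) tuples for a single promoter.
    Each unordered pair is reported at most once per promoter.
    The 50 bp default is an illustrative assumption, not a published value.
    """
    pairs = set()
    for (tf_a, pos_a), (tf_b, pos_b) in combinations(sites, 2):
        if tf_a != tf_b and abs(pos_a - pos_b) <= max_distance:
            pairs.add(tuple(sorted((tf_a, tf_b))))
    return pairs


def pair_counts(promoters, max_distance=50):
    """Count, over all promoters, how many promoters contain each close TF pair."""
    counts = Counter()
    for sites in promoters:
        counts.update(cooccurring_pairs(sites, max_distance))
    return counts


# Toy data (hypothetical positions) for two promoters of one gene group.
promoters = [
    [("MEF-2", 100), ("SRF", 130), ("MYOD", 400)],
    [("MEF-2", 20), ("SRF", 60), ("MYOD", 75)],
]
counts = pair_counts(promoters)
print(counts.most_common())
```

In practice such counts would be compared against a background expectation before a pair is added to the tissue-specific interaction network.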
Figure 2. Two examples of predicted CRMs. (A) The region from 5 kb upstream to the translational start site of gene ALDOA. (B) The same region for gene CNGB3. Upper panels show the "potential energy" based on TF interactions. Middle panels show the density of all known TFBSs (306 in total) in a sliding window along the region. Bottom panels show the conservation scores of the regions. The dashed lines are the thresholds used in our prediction. Positions with energy lower than the threshold are predicted as CRMs (indicated by vertical bars). The red dots in (A) indicate the positions of known regulatory sites.
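The thresholding rule in the Figure 2 caption (positions with energy below the threshold are predicted as CRMs) can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation: the function name `call_regions`, the energy values, and the threshold of -3.0 are all assumptions, and how the potential energy itself is computed is not covered.

```python
def call_regions(energy, threshold):
    """Merge consecutive positions with energy < threshold into intervals.

    energy: per-position energy values along a genomic region.
    Returns a list of half-open (start, end) index intervals.
    """
    regions = []
    start = None
    for i, value in enumerate(energy):
        if value < threshold:
            if start is None:
                start = i  # a new low-energy run begins here
        elif start is not None:
            regions.append((start, i))  # the run ended at the previous position
            start = None
    if start is not None:
        regions.append((start, len(energy)))  # run extends to the end
    return regions


# Toy energy profile (hypothetical values).
energy = [0.0, -1.2, -3.5, -4.0, -0.5, 0.0, -3.1, -3.3]
regions = call_regions(energy, threshold=-3.0)
print(regions)
```

Each returned interval corresponds to one of the vertical bars marking a predicted CRM in the figure.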
Summary statistics of the predictions for the 30 tissues examined.
| bladder | 58 | 35 | SREBP-1, NRF-1, NF-Y, ETF, MAX |
| blood | 319 | 150 | FOXJ2, ELF-1, ETF, CDP, PEA3 |
| bone | 4 | 4 | OCT-1, RFX1, EF-C, FOXJ2, NKX3A |
| bone marrow | 138 | 73 | SREBP-1, NRF-1, STAT1, HLF, TEF |
| brain | 757 | 149 | FREAC-7, OCT-1, SOX-9, FREAC-3, NKX6-2 |
| cervix | 174 | 90 | ETF, NRF-1, SP1, AP-2, C-MYC/MAX |
| colon | 225 | 68 | CDP, AFP1, HNF-1, OCT-1, ALX-4 |
| eye | 242 | 78 | FOXJ2, POU3F2, OCT-1, CHX10, CRX |
| heart | 192 | 62 | MEF-2, POU3F2, GATA-6, AP-1, IRF1 |
| kidney | 180 | 83 | HNF-1, COUP-TF/HNF-4, CRX, OCT-1 |
| larynx | 225 | 99 | ETF, SP1, NRF-1, AP-2, WHN |
| liver | 300 | 110 | HNF-1, ALX-4, HNF-3alpha, C/EBPgamma |
| lung | 64 | 28 | ETF, MTF-1, C-MYC/MAX, LHX3, NRF-1 |
| lymph node | 194 | 111 | ICSBP, PU.1, ETF, ELK-1, NRF-1 |
| mammary gland | 180 | 69 | RSRFC4, CDP, MEF-2, FAC1, LHX3 |
| muscle | 169 | 69 | MEF-2, RSRFC4, SRF, MYOD, CDP_CR3 |
| ovary | 121 | 47 | VDR, MAZ, MZF1, SP1, CREB |
| pancreas | 110 | 48 | MYOD, ATF, SP1, E47, AREB6 |
| PNS | 260 | 55 | POU3F2, OCT-1, NKX6-2, HNF-6, HFH-3 |
| placenta | 121 | 57 | LHX3, AFP1, CHX10, NKX6-2, CDP |
| prostate | 212 | 64 | LHX3, POU3F2, CDC5, C/EBPgamma, CART-1 |
| skin | 50 | 28 | AREB6, LMO2_COMPLEX, ALX-4, ARP-1 |
| small intestine | 435 | 68 | POU3F2, LHX3, NKX6-2, HNF-1, FOXD3 |
| soft tissue | 217 | 70 | FOXO4, C/EBPgamma, FOXO1, SRY, RSRFC4 |
| spleen | 94 | 44 | RSRFC4, LBP-1, CDP, MEF-2, NF-KAPPAB |
| stomach | 131 | 76 | ETF, SP1, AP-2gamma, AREB6, SRY |
| testis | 579 | 296 | NRF-1, ETF, SP1, AP-2, C-MYC/MAX |
| thymus | 155 | 47 | POU3F2, NKX6-2, E4BP4, TAX/CREB, ETF |
| tongue | 183 | 104 | ETF, SP1, NRF-1, HIF-1, CREB |
| uterus | 143 | 28 | POU3F2, OCT-1, E4BP4, POU1F1, NKX6-1 |
The important TFs of a tissue are the top 5 TFs that contribute most to the potential energies of the CRMs.
Figure 3. Enrichment and sensitivity of predictions. We evaluated prediction performance using sensitivity and enrichment. Two types of predictions were compared: the TF interaction-based method and a method based solely on conservation. (A) Using known regulatory elements as positive controls. (B) Using DNase I hypersensitive sites as positive controls.
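The caption does not define the two measures, so the sketch below uses common working definitions as an explicit assumption: sensitivity is the fraction of known positive sites that fall inside predicted regions, and enrichment is that fraction divided by the fraction of the sequence covered by predictions. The function name and the toy numbers are hypothetical.

```python
def sensitivity_and_enrichment(predicted, known, region_length):
    """Score predictions against known positive positions.

    predicted: list of half-open (start, end) predicted intervals.
    known: list of positions of positive-control sites.
    region_length: total number of positions scanned.
    Definitions are assumed, not taken from the paper:
      sensitivity = known sites covered / all known sites
      enrichment  = sensitivity / fraction of sequence predicted
    """
    if not known:
        raise ValueError("at least one known site is required")
    covered = set()
    for start, end in predicted:
        covered.update(range(start, end))
    if not covered:
        raise ValueError("no positions were predicted")
    hits = sum(1 for pos in known if pos in covered)
    sensitivity = hits / len(known)
    coverage = len(covered) / region_length
    return sensitivity, sensitivity / coverage


# Toy example: 20 of 100 positions predicted, 2 of 4 known sites recovered.
sens, enrich = sensitivity_and_enrichment(
    predicted=[(10, 20), (50, 60)], known=[12, 55, 80, 90], region_length=100
)
print(sens, enrich)
```

Under these definitions, an enrichment above 1 means known sites fall inside predictions more often than expected from coverage alone.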
Figure 4. Dependence of regulatory activity on position relative to gene structure. We calculated the probability that each position contains a CRM. The reference positions (origins of the x-axis) in the three regions are transcription start sites, intron start sites, and transcription end sites, respectively. The pink curve in the left panel is from random sequences generated with the same nucleotide composition and first-order transition probabilities as those of all promoter sequences in the human genome.
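Random sequences that preserve nucleotide composition and first-order transition probabilities, as used for the pink control curve, can be generated with a first-order Markov chain. The sketch below is one minimal way to do this in Python; the function names and the toy training sequence are assumptions, and the authors' actual generator is not described in the caption.

```python
import random
from collections import Counter, defaultdict


def fit_first_order(sequences):
    """Estimate base counts and first-order transition counts."""
    composition = Counter()
    transitions = defaultdict(Counter)
    for seq in sequences:
        composition.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            transitions[prev][nxt] += 1
    return composition, transitions


def sample_sequence(composition, transitions, length, rng):
    """Sample a sequence of the given length from the fitted chain."""
    if length <= 0:
        return ""
    bases = list(composition)
    base_weights = [composition[b] for b in bases]
    # The first base is drawn from the overall composition.
    seq = [rng.choices(bases, weights=base_weights)[0]]
    while len(seq) < length:
        nxt = transitions.get(seq[-1])
        if nxt:
            options = list(nxt)
            seq.append(rng.choices(options, weights=[nxt[o] for o in options])[0])
        else:
            # No transition was observed from this base: fall back to composition.
            seq.append(rng.choices(bases, weights=base_weights)[0])
    return "".join(seq)


# Toy training data (hypothetical); real input would be promoter sequences.
composition, transitions = fit_first_order(["ACACACAC"])
random_seq = sample_sequence(composition, transitions, 20, random.Random(0))
print(random_seq)
```

Because the toy training sequence only contains A→C and C→A transitions, every sampled sequence alternates between the two bases, which makes the preserved dinucleotide structure easy to see.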
Figure 5. Energy landscapes for PITX2. The landscape in the upper panel was calculated from placenta-specific TF interactions; the one in the bottom panel was calculated from eye-specific TF interactions.
Enriched functional categories in cCRMs and ncCRMs.
| transcription factor activity | 53 | 26.3 | 6.0 |
| ubiquitin conjugating enzyme activity | 7 | 1.3 | 3.8 |
| protein amino acid dephosphorylation | 12 | 3.6 | 3.8 |
| nuclear matrix | 4 | 0.4 | 3.7 |
| postsynaptic membrane | 12 | 3.2 | 5.3 |
| signal complex formation | 6 | 1.1 | 4.5 |
| lamin binding | 6 | 1.1 | 4.5 |
| synaptic transmission | 31 | 15.3 | 4.4 |