Mark Ziemann, Antony Kaspi, Ross Lazarus, Assam El-Osta.
Abstract
Reliable identification of cis-regulatory elements influencing transcription remains a challenging problem in molecular bioinformatics. This is especially true for enhancer elements, which are often located hundreds of kilobases from the gene promoter. High-resolution DNase hypersensitivity and connectivity profiling by the ENCODE consortium provides evidence of millions of interacting cis-acting elements in the human genome. This prior knowledge can be incorporated into genome-wide expression analyses, in the form of gene sets sharing regulatory sequence motifs in known DNase hypersensitivity peak regions. Strong enrichment of these gene sets among the most extreme differentially transcribed genes from controlled biological experiments may suggest novel hypotheses about signalling pathways. The utility of this approach is demonstrated with the reanalysis of a microarray-derived gene expression data set through the Gene Set Enrichment Analysis pipeline, uncovering new putative distal cis elements in the context of innate immunity. The DNase Hypersensitivity Connectivity informed Motif Enrichment in Gene Expression (DHC-MEGE) method described here has the advantage of identifying distal elements such as enhancers, which are often overlooked with standard promoter motif analysis. AVAILABILITY: The DHC-MEGE shell script can be obtained from SourceForge https://sourceforge.net/projects/dhcmege/ and the generated GMT file is attached as supplementary data.
Keywords: DNase hypersensitivity; enhancer; gene expression; gene set enrichment analysis; motif enrichment
Year: 2013 PMID: 23519433 PMCID: PMC3602893 DOI: 10.6026/97320630009212
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. (A) Schematic diagram of the DHC-MEGE methodology for generating custom motif-gene association sets, with the file format shown in brackets; (B) Statistics from an example analysis using a publicly available microarray expression data set with a correlation threshold of 0.8, minimum sequence similarity score of 10 and maximum gene size of 1000.
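The thresholding step described in the Figure 1 caption can be sketched as follows. This is a minimal Python illustration, not the DHC-MEGE shell script itself: the record layout (motif, gene, DH peak correlation, motif match score), the function name `build_gmt_lines`, the use of inclusive thresholds, and the reading of "maximum gene size" as a cap on the number of genes per set are all assumptions made for the example. The output follows the GMT layout used by GSEA (set name, description, then gene symbols, separated by tabs).

```python
from collections import defaultdict

def build_gmt_lines(records, min_corr=0.8, min_score=10, max_genes=1000):
    """Group genes by motif and return GMT-formatted lines.

    records: iterable of (motif, gene, correlation, score) tuples.
    This layout is hypothetical, chosen only for illustration.
    """
    sets = defaultdict(set)
    for motif, gene, corr, score in records:
        # Assumption: both thresholds are inclusive.
        if corr >= min_corr and score >= min_score:
            sets[motif].add(gene)

    lines = []
    for motif in sorted(sets):
        genes = sorted(sets[motif])
        # Assumption: oversized sets are dropped rather than truncated.
        if len(genes) > max_genes:
            continue
        lines.append("\t".join([motif, "DHC_motif_set"] + genes))
    return lines

# Toy records invented for the example (not taken from the paper's data).
example = [
    ("MOTIF_A", "GENE1", 0.95, 12.0),
    ("MOTIF_A", "GENE2", 0.85, 11.0),
    ("MOTIF_A", "GENE3", 0.50, 15.0),  # fails the correlation threshold
    ("MOTIF_B", "GENE1", 0.90, 8.0),   # fails the score threshold
]
gmt_lines = build_gmt_lines(example)
print("\n".join(gmt_lines))
```

Writing the returned lines to a `.gmt` file would give a gene set database that GSEA can load alongside an expression data set.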
Figure 2. Example GSEA using DHC motif gene sets for gene expression analysis. The LPS-stimulated THP-1 cell microarray data are publicly available [10]. (A) The top 10 ranked motif gene sets in up-regulated genes contain known and novel motifs (ranked by FDR-adjusted p-value). Abbreviations: SIZE, number of genes in the set detected in the experiment; ES, enrichment score; NES, normalised enrichment score; NOM p-val, nominal p-value; (B) Enrichment plot of the newly identified TATGACAATC motif gene set, showing that the majority of genes associated with this motif are highly up-regulated; (C) List of the top 10 up-regulated genes associated with the TATGACAATC motif; (D) Example of long-range cis elements interacting with the ADORA2A promoter. The TATGACAATC motif is positioned in a distal DH peak 274 kbp upstream of the ADORA2A promoter DH peak (red line) in a CABIN1 intron. ADORA2A is also associated with GATA-IR3 (orange), CTTACGTAAGTT (blue) and FOXA1 (pink) motifs that were significant in GSEA (FDR<0.05).
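For readers unfamiliar with the ES column in panel A, the running-sum statistic behind a GSEA enrichment plot can be sketched in Python as below. This follows the weighted Kolmogorov-Smirnov-like definition commonly given for GSEA and is not code from the GSEA software or from DHC-MEGE; the function name `enrichment_score`, its parameters and the toy gene list are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Return the maximum deviation from zero of the GSEA running sum.

    ranked_genes: gene symbols ordered from most up- to most down-regulated.
    scores: ranking metric for each gene, in the same order.
    gene_set: set of gene symbols (e.g. genes linked to one motif).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    # Hits are weighted by the magnitude of their ranking metric.
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total = sum(hit_weights)
    if total == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, es = 0.0, 0.0
    for is_hit, w in zip(hits, hit_weights):
        # Step up on a set member, step down on a non-member.
        running += w / total if is_hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es

# Toy ranked list invented for the example.
genes = ["A", "B", "C", "D", "E", "F"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(genes, metric, {"A", "B"})
es_bottom = enrichment_score(genes, metric, {"E", "F"})
print(es_top, es_bottom)
```

A set concentrated at the top of the ranked list gives a positive score and one concentrated at the bottom gives a negative score. The NES, nominal p-value and FDR reported in panel A additionally depend on permutation-based normalisation, which this sketch does not attempt.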