Eugenia G Giannopoulou, Olivier Elemento.
Abstract
BACKGROUND: Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) enables unbiased, genome-wide mapping of protein-DNA interactions and epigenetic marks. The first step in ChIP-seq data analysis is the identification of peaks (i.e., genomic locations with a high density of mapped sequence reads). The next step is to interpret the biological meaning of the peaks through their association with known genes, pathways, and regulatory elements, and through integration with other experiments. Although several programs have been published for the analysis of ChIP-seq data, they often focus on the peak detection step and are usually not well suited for thorough, integrative analysis of the detected peaks.
Year: 2011 PMID: 21736739 PMCID: PMC3145611 DOI: 10.1186/1471-2105-12-277
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
The main tools of the ChIPseeqer framework.
| Tool name | Description | GUI availability |
|---|---|---|
| QcAnalysisTools | Offers basic quality control tools. | NA |
| ChIPseeqerSplitReadFiles | Splits read files (e.g., bed, eland) into one read file per chromosome. | √ |
| ChIPseeqer | Peak detection algorithm. | √ |
| ChIPseeqerSummaryPromoters | Creates a promoters-based annotation of the detected peaks (i.e., gene name-description, peaks) | √ |
| ChIPseeqerAnnotate | Finds the peaks distribution in the genome (e.g., exons/introns/intergenic) and creates lists of these peaks. | √ |
| ChIPseeqerPeaksTrack | Creates a UCSC Genome Browser track for the detected peaks. | √ |
| ChIPseeqerMakeReadDensityTrack | Creates a UCSC Genome Browser track for the reads density. | √ |
| ChIPseeqerNongenicAnnotate | Finds the peaks that overlap with repeat elements, CpG islands and segmental duplications. | √ |
| ChIPseeqerFIRE | Runs FIRE for the detected peaks, in order to perform an unsupervised motif discovery. | √ |
| ChIPseeqerMotifMatch | Runs MyScanACE for the detected peaks, in order to look for specific motifs (Jaspar, Bulyk PBM databases). | √ |
| ChIPseeqeriPAGE | Runs PAGE for the genes associated with the detected peaks, in order to perform pathways analysis. | √ |
| ChIPseeqerPathwayMatch | Looks for genes (and their corresponding peaks) that are associated with a specific pathway (e.g., apoptosis, GO:0060742). | √ |
| ChIPseeqerCons | Estimates the conservation scores for the detected peaks and for random intervals to allow comparison. | √ |
| ChIPseeqerDensityMatrix | Creates a reads density matrix for a window around the TSS or the TES of the genes, or for any interval selected. | NA |
| ChIPseeqerReadCountMatrix | Estimates the avg/max reads number for every input peak, across multiple ChIP-seq datasets and creates a peak-based reads matrix. | NA |
| ChIPseeqerCluster | Clusters a matrix (e.g., k-means, hierarchical, SOMs) and visualizes the clustering. | NA |
| CompareIntervals | Compares two lists of peaks and finds the overlapping peaks and the peaks that are unique in each list. | √ |
| CompareGenes | Compares two lists of genes and finds the common genes and the genes that are unique in each list. | √ |
| ChIPseeqerComputeJaccardIndex | Estimates the Jaccard similarity coefficient for a set of peak files. The larger the coefficient, the more similar two peak files are. | √ |
| ChIPseeqerMakeGenepartsMatrix | Creates gene-based matrices (one for promoters, one for exons, etc) for many peak files. Summarizes the number of peaks that fall in specific gene parts, across many different peak files (TFs). | NA |
| ChIPseeqerFindDistalPeaks | Finds peaks that are away from known genes. | NA |
| ChIPseeqerFindClosestGenes | Finds the closest gene(s) for each peak. | NA |
| ChIPseeqerGetReadCountInPeakRegions | Estimates the avg/max reads number for every peak, for a ChIP-seq dataset and creates a peak-based read matrix. | NA |
| FindPeaksWithMotif | Extracts the peaks that have a specific FIRE motif (can be applied after running FIRE). | NA |
| MakePAGEInput | Creates the input file for iPAGE from a list of genes. | NA |
The table shows the names of the tools, short description of their functionality and their availability within the ChIPseeqer interface. This is not an exhaustive list; all available tools are documented online [37].
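As an illustration of the similarity coefficient mentioned in the table, the following is a minimal Python sketch of a Jaccard index between two peak lists. It is not the ChIPseeqerComputeJaccardIndex implementation: it assumes peaks on a single chromosome given as half-open `(start, end)` intervals and measures overlap in base pairs, whereas the actual tool may count overlapping peaks instead. The function names `merge`, `covered_bp`, and `jaccard_bp` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumption)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard_bp(a: List[Interval], b: List[Interval]) -> float:
    """Base-pair Jaccard index: |A intersect B| / |A union B|."""
    union = covered_bp(a + b)
    if union == 0:
        return 0.0
    # Inclusion-exclusion gives the intersection from the three coverages.
    intersection = covered_bp(a) + covered_bp(b) - union
    return intersection / union


# Made-up example peaks, for illustration only.
peaks_a = [(100, 200), (300, 400)]
peaks_b = [(150, 250), (300, 400)]
similarity = jaccard_bp(peaks_a, peaks_b)
```

A value of 1 means the two peak sets cover exactly the same bases, and 0 means they share none, which matches the table's statement that a larger coefficient indicates more similar peak files.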
Figure 1. Workflow use cases. Examples of workflows that can be easily generated using tools from the ChIPseeqer framework are shown. The starting point is always the result of peak detection: a set of enriched regions/peaks. (A) The aim of the workflow is to analyze a subset of the peaks that have a specific motif. From all the peaks that have the motif, we look for those that bind in the promoters of known genes. Pathway analysis is then performed on these genes in order to reveal enriched pathways associated with this particular subset of peaks. (B) This workflow allows locating and characterizing distal regulatory elements (i.e., intergenic peaks) that overlap with enhancer marks (e.g., H3K4me1 binding), in terms of motifs and conservation. Different workflows can be created using any combination of the ChIPseeqer tools.
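The first two steps of workflow (A) can be sketched in Python as follows. This is an illustrative simplification, not ChIPseeqer code: the motif is matched as an exact substring (real motifs are usually degenerate patterns), the promoter is assumed to be a window of 2 kb on either side of the TSS, and the `Peak` and `Gene` records, the function names, and the example data are all hypothetical. The pathway analysis step is not shown.

```python
from typing import List, NamedTuple, Set


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    sequence: str  # DNA sequence under the peak (assumed to be available)


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int  # transcription start site coordinate


def peaks_with_motif(peaks: List[Peak], motif: str) -> List[Peak]:
    """Keep peaks whose sequence contains the motif as an exact substring."""
    return [p for p in peaks if motif in p.sequence]


def promoter_genes(peaks: List[Peak], genes: List[Gene], window: int = 2000) -> Set[str]:
    """Names of genes whose TSS +/- window overlaps at least one peak."""
    hits: Set[str] = set()
    for g in genes:
        lo, hi = g.tss - window, g.tss + window
        if any(p.chrom == g.chrom and p.start < hi and p.end > lo for p in peaks):
            hits.add(g.name)
    return hits


# Made-up peaks and genes, for illustration only.
peaks = [
    Peak("chr1", 1000, 1200, "AATCCTAGAAA"),
    Peak("chr1", 50000, 50200, "GGGGTCCTAGA"),
    Peak("chr2", 900, 1100, "ACGTACGT"),
]
genes = [Gene("G1", "chr1", 1500), Gene("G2", "chr2", 1000)]

motif_peaks = peaks_with_motif(peaks, "TCCTAGA")
genes_for_pathway_analysis = promoter_genes(motif_peaks, genes)
```

The resulting gene set would then be the input to a pathway enrichment step.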
Figure 2. Analysis of the ETS1 ChIP-seq dataset. (A) The ChIPseeqerAnnotate module outputs the distribution of the ETS1 binding peaks in gene parts, as well as several lists of peaks that were found in a specific gene part (e.g., promoters, exons, introns). (B) The occurrence of specific motifs among the ETS1 peaks is shown, after using ChIPseeqerMotifMatch. The underlined motifs represent transcription factors of the ETS domain. (C) Unsupervised motif discovery, using ChIPseeqerFIRE, reveals multiple motifs that derive from the same regions. The fraction of ETS1 peaks containing at least one instance of each motif is given, with the expected frequency of the motif in the random regions given in parentheses.
Figure 3. Identification of putative enhancers. This workflow shows the identification of putative enhancers, by progressively filtering the distal peaks with histone modification enhancer marks (i.e., presence of H3K4me1 and absence of H3K4me3) and CBP binding. De novo motif discovery and conservation analysis were then performed, which showed highly enriched ETS-domain motifs and high conservation scores in the set of putative enhancers compared to random regions.
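The progressive filtering described in this caption can be sketched in Python as a chain of interval-overlap tests. This is a hedged illustration under simple assumptions, not the ChIPseeqer workflow itself: regions are `(chrom, start, end)` tuples with half-open coordinates, any overlap of at least one base counts as presence of a mark, and the function names and example regions are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), half-open (assumption)


def overlaps_any(peak: Region, regions: List[Region]) -> bool:
    """True if the peak shares at least one base with any region."""
    chrom, start, end = peak
    return any(chrom == c and start < e and s < end for c, s, e in regions)


def putative_enhancers(
    distal_peaks: List[Region],
    h3k4me1: List[Region],
    h3k4me3: List[Region],
    cbp: List[Region],
) -> List[Region]:
    """Distal peaks with H3K4me1, without H3K4me3, and with CBP binding."""
    return [
        p
        for p in distal_peaks
        if overlaps_any(p, h3k4me1)
        and not overlaps_any(p, h3k4me3)
        and overlaps_any(p, cbp)
    ]


# Made-up regions, for illustration only.
distal = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 500, 600)]
me1 = [("chr1", 150, 300), ("chr1", 1050, 1200), ("chr2", 550, 700)]
me3 = [("chr1", 1000, 1020)]
cbp_sites = [("chr1", 120, 180), ("chr1", 1090, 1150)]

enhancers = putative_enhancers(distal, me1, me3, cbp_sites)
```

In this toy input, the second peak is dropped because it carries H3K4me3 and the third because it lacks CBP binding, leaving one candidate for motif discovery and conservation analysis.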
Pathways analysis between the ETS1 distal and promoter peaks.
| Pathway or signature | p-value |
|---|---|
| Leukocyte differentiation, GO:0002521 | p < 1 |
| Lymphocyte activation, GO:0046649 | p < 1 |
| Hemopoiesis, GO:0030097 | p < 1 |
| Hemopoietic or lymphoid organ development, GO:0048534 | p < 1 |
| Immune response, GO:0006955 | p < 1 |
| Immune system development, GO:0002520 | p < 1 |
| B cell proliferation, GO:0042100 | p < 1 |
| B cell activation, GO:0042113 | p < 1 |
| Biopolymer catabolic process, GO:0043285 | p < 1 |
| RNA splicing, GO:0008380 | p < 1 |
| DNA metabolic process, GO:0006259 | p < 1 |
| Tcell_PIind_CalciumDefPtdown4x_Feske_Fig6 | p < 1 |
| CD40_upregulated_Burkitt_lymphoma | p < 1 |
| CD40_downregulated_Burkitt_lymphoma | p < 1 |
| Pax5_repressed | p < 1 |
| Tcell_PIind4x_Feske_Fig6 | p < 1 |
| Tcell_PIind_CsAdown4x | p < 1 |
| Ribosomal_protein | p < 1 |
| Myeloma_PR_subgroup_up | p < 1 |
The table shows some of the pathways and lymphoma-related signatures that were found enriched in the distal peaks and the promoter peaks groups. The distal peaks group was highly associated with T cell and B cell related ontologies and signatures, while for the promoter peaks group more general and housekeeping categories were enriched. The Gene Ontology and the SignatureDB gene expression signatures were used for this analysis (ChIPseeqeriPAGE module). The p-values for each pathway for both groups are also shown.
List of the 39 genes with both promoter and distal ETS1 peaks.
| # | Gene ID | Gene Description |
|---|---|---|
| 1 | AKAP11 | A-kinase anchor protein 11 |
| 2 | AKR1A1 | alcohol dehydrogenase |
| 3 | ATP5O | ATP synthase subunit O, mitochondrial precursor |
| 4 | C1orf109 | hypothetical protein LOC54955 |
| 5 | C2orf29 | hypothetical protein LOC55571 |
| 6 | C9orf123 | transmembrane protein C9orf123 |
| 7 | CDK9 | cell division protein kinase 9 |
| 8 | CHSY1 | chondroitin sulfate synthase 1 |
| 9 | CKAP2L | cytoskeleton-associated protein 2-like |
| 10 | CLINT1 | clathrin interactor 1 |
| 11 | DUSP2 | dual specificity protein phosphatase 2 |
| 12 | DUSP6 | dual specificity protein phosphatase 6 isoform |
| 13 | HSPC157 | hypothetical LOC29092 |
| 14 | KIAA0427 | CBP80/20-dependent translation initiation factor |
| 15 | LDHA | L-lactate dehydrogenase A chain isoform 5 |
| 16 | LOC100188949 | hypothetical LOC100188949 |
| 17 | LOC285456 | hypothetical LOC285456 |
| 18 | LSM14B | protein LSM14 homolog B |
| 19 | MAX | protein max isoform a |
| 20 | MRPS18A | 28S ribosomal protein S18a, mitochondrial |
| 21 | MTF2 | metal-response element-binding transcription |
| 22 | NAIF1 | nuclear apoptosis-inducing factor 1 |
| 23 | NDUFA10 | NADH dehydrogenase [ubiquinone] 1 alpha |
| 24 | POMP | proteasome maturation protein |
| 25 | PSMA6 | proteasome subunit alpha type-6 |
| 26 | RBM16 | putative RNA-binding protein 16 |
| 27 | RBM38 | RNA-binding protein 38 isoform a |
| 28 | RPN1 | dolichyl-diphosphooligosaccharide--protein |
| 29 | SEPHS2 | selenide, water dikinase 2 |
| 30 | SIRPG | signal-regulatory protein gamma isoform 1 |
| 31 | SPRED2 | sprouty-related, EVH1 domain-containing protein |
| 32 | TFRC | transferrin receptor protein 1 |
| 33 | TMEM18 | transmembrane protein 18 |
| 34 | TRIP13 | thyroid receptor-interacting protein 13 isoform |
| 35 | TXN2 | thioredoxin, mitochondrial precursor |
| 36 | UBE2D2 | ubiquitin-conjugating enzyme E2 D2 isoform 1 |
| 37 | ZFAT | zinc finger protein ZFAT isoform 1 |
| 38 | ZNF212 | zinc finger protein 212 |
| 39 | ZNF683 | zinc finger protein 683 |
The table shows the 39 genes that were found to have both promoter and intergenic ETS1 peaks. It is possible that ETS1 binding at the promoters and enhancers of these genes is explained by looping of the distal elements onto proximal promoters. This hypothesis could be tested using chromosome conformation capture based techniques.
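A gene list of this kind is the intersection of two gene sets: genes with a promoter peak and genes with a distal peak. The Python sketch below shows the comparison in the spirit of the CompareGenes tool, without claiming to reproduce its implementation. The function name `compare_genes` is hypothetical, and the two input lists are made-up examples; only the three shared names are taken from the table above, and the `GENE_*` entries are placeholders.

```python
from typing import Iterable, Set, Tuple


def compare_genes(
    genes_a: Iterable[str], genes_b: Iterable[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (common genes, genes unique to A, genes unique to B)."""
    a, b = set(genes_a), set(genes_b)
    return a & b, a - b, b - a


# Illustrative inputs; the GENE_* names are placeholders, not real results.
promoter_peak_genes = ["AKAP11", "CDK9", "MAX", "GENE_P1", "GENE_P2"]
distal_peak_genes = ["AKAP11", "CDK9", "MAX", "GENE_D1"]

both, promoter_only, distal_only = compare_genes(promoter_peak_genes, distal_peak_genes)
```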
Figure 4. ChIPseeqer graphical interface. (A) The users can control all parameters of the tools. For example, in the Find Pathway tool (the GUI version of ChIPseeqerPathwayMatch) the user can select: the input peaks, the species of their data, the gene annotation database used to extract the genes related to the input peaks, which subset of the peaks to include in the analysis (e.g., promoter peaks, intergenic peaks), and which pathways database to use in order to look for the pathway. The desired pathway can be either selected from a list of available pathways or typed by the user (e.g., apoptosis, development). (B) The typical output of each tool is a table summarizing all peaks resulting from the analysis, as well as basic statistics (e.g., how many peaks were found). Here, the peaks that contain the TCCTAGA motif are shown, after using the Find Motif in peaks tool (the GUI version of ChIPseeqerMotifMatch). (C) Several tools also provide graphical output. For example, the summary result of the iPAGE tool (the GUI version of ChIPseeqeriPAGE) is a pathway enrichment table showing the level of enrichment for all pathways found in the genes related to the input peaks (category 1), compared to the genes used as background (category 0). (D) The output of the Similarity coefficient tool (the GUI version of ChIPseeqerComputeJaccardIndex) is a color-coded matrix in which a darker red color marks pairs of datasets that share more peaks.