Hatice Ulku Osmanbeyoglu, Ryan J Hartmaier, Steffi Oesterreich, Xinghua Lu.
Abstract
BACKGROUND: Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is increasingly applied to study the genome-wide binding sites of transcription factors. There is growing interest in understanding the mechanism of action of co-regulator proteins, which do not bind DNA directly but exert their effects by binding to transcription factors such as the estrogen receptor (ER). However, because co-regulators contact DNA only indirectly, their ChIP-seq signals can be relatively weak, and biologically meaningful interactions remain difficult to identify.
Year: 2012 PMID: 22369349 PMCID: PMC3439677 DOI: 10.1186/1471-2164-13-S1-S1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Number of peaks called by different algorithms at different thresholds, and the corresponding number of mapped genes
| Method | Total number of peaks | Number of genes mapped |
|---|---|---|
| MACS, p = 1E-8 | 1,966 | 996 |
| MACS, p = 1E-5 | 4,678 | 2,054 |
| MACS, p = 1E-3 | 23,306 | 6,341 |
| T-PIC, p = 1E-3 | 4,453 | 1,676 |
| T-PIC, p = 1E-2 | 6,598 | 2,318 |
| BayesPeak (PP = 0.90) | 15,622 | 4,495 |
| BayesPeak (PP = 0.70) | 21,373 | 5,507 |
| BayesPeak (PP = 0.5) | 27,990 | 6,533 |
| Union* | 38,324 | 8,057 |
| Intersection* | 4,811 | 2,029 |
* Union and intersection of the peaks called by the three methods at the cutoffs used in Figure 1 (MACS p = 1E-3, T-PIC p = 1E-2, BayesPeak PP = 0.5).
Figure 1. Peak calling by different algorithms. A Venn diagram shows the overlaps among the peaks called by MACS (P value cutoff of 10^-3), T-PIC (P value cutoff of 10^-2) and BayesPeak (posterior probability (PP) cutoff of 0.5), with the number of peaks in each region. The numbers of peaks in the union and intersection of the three sets, and of the genes they map to, are shown in Table 1.
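The union and intersection counts depend on how overlapping peaks from different callers are matched, which this summary does not specify. A minimal sketch, assuming peaks are half-open `(chromosome, start, end)` intervals, that overlapping or touching peaks from any caller are merged into one union region, and that a union region counts toward the intersection only if every caller has at least one peak overlapping it. The function names and toy intervals are hypothetical.

```python
from typing import List, Tuple

# A peak is a half-open genomic interval: (chromosome, start, end).
Peak = Tuple[str, int, int]


def merge_union(*peak_sets: List[Peak]) -> List[Peak]:
    """Pool peaks from all callers and merge overlapping or touching intervals."""
    pooled = sorted(p for peaks in peak_sets for p in peaks)
    merged: List[Peak] = []
    for chrom, start, end in pooled:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the current region on the same chromosome.
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps_any(region: Peak, peaks: List[Peak]) -> bool:
    """True if `region` shares at least one base with any peak in `peaks`."""
    chrom, start, end = region
    return any(c == chrom and s < end and start < e for c, s, e in peaks)


def intersect_all(*peak_sets: List[Peak]) -> List[Peak]:
    """Union regions supported by at least one peak from every caller."""
    return [region for region in merge_union(*peak_sets)
            if all(overlaps_any(region, peaks) for peaks in peak_sets)]


if __name__ == "__main__":
    macs = [("chr1", 100, 200), ("chr1", 500, 600)]
    tpic = [("chr1", 150, 250)]
    bayespeak = [("chr1", 180, 220), ("chr2", 10, 50)]
    print(len(merge_union(macs, tpic, bayespeak)))  # 3
    print(intersect_all(macs, tpic, bayespeak))     # [('chr1', 100, 250)]
```

The overlap test is a linear scan for clarity; with tens of thousands of peaks per caller, a sorted sweep or an interval tree would be the natural replacement.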
Comparison of different machine learning algorithms by the number and fraction of peaks with an ERE motif match
| Method | Number of peaks | Number of peaks with ERE motif | Ratio of peaks with ERE motif match |
|---|---|---|---|
| MACS p = 1E-10 | 1,286 | 941 | 0.73 |
| MACS p = 1E-8 | 1,966 | 1,416 | 0.72 |
| MACS p = 1E-5 | 4,678 | 3,077 | 0.66 |
| k-means (city block) | |||
| Cluster 1 | 26,211 | 11,943 | 0.46 |
| Cluster 2 | 12,113 | 3,245 | 0.27 |
| supervised-NB(th = 0.8,1:2) | |||
| Positively labeled | 11,835 | 8,196 | 0.69 |
| Negatively labeled | 26,489 | 6,992 | 0.26 |
| supervised-SVM(kernel = polynomial,1:2) | |||
| Positively labeled | 14,915 | 8,425 | 0.56 |
| Negatively labeled | 23,409 | 6,763 | 0.29 |
| supervised-RF(th = 0.7,1:2) | |||
| Positively labeled | 10,428 | 6,514 | 0.62 |
| Negatively labeled | 27,896 | 8,674 | 0.31 |
| semi-supervised-NB(th = 0.8,1:2, I = 75) | |||
| Positively labeled | 12,597 | 8,458 | 0.67 |
| Negatively labeled | 25,727 | 6,730 | 0.26 |
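The last column of the table above is the number of peaks with an ERE match divided by the number of peaks (for example, 941 / 1,286 ≈ 0.73 for MACS at p = 1E-10). The summary does not say how a motif match was called; the sketch below assumes a scan for the consensus ERE GGTCAnnnTGACC that tolerates a hypothetical `max_mismatches` at the fixed positions. Because this consensus is its own reverse complement, scanning one strand gives the same result as scanning both.

```python
from typing import Iterable, Tuple

# Consensus estrogen response element; N marks the 3-bp spacer (any base).
ERE = "GGTCANNNTGACC"


def mismatches(window: str, motif: str = ERE) -> int:
    """Count mismatches at the fixed (non-N) positions of the motif."""
    return sum(1 for w, m in zip(window, motif) if m != "N" and w != m)


def has_ere(seq: str, max_mismatches: int = 1) -> bool:
    """True if any window of `seq` matches the ERE within the mismatch budget.

    The consensus is palindromic (equal to its reverse complement), so the
    reverse strand yields the same mismatch counts and is not scanned separately.
    """
    seq = seq.upper()
    k = len(ERE)
    return any(mismatches(seq[i:i + k]) <= max_mismatches
               for i in range(len(seq) - k + 1))


def ere_ratio(sequences: Iterable[str], max_mismatches: int = 1) -> Tuple[int, float]:
    """Return (peaks with an ERE match, fraction of peaks with a match)."""
    seqs = list(sequences)
    hits = sum(has_ere(s, max_mismatches) for s in seqs)
    return hits, (hits / len(seqs) if seqs else 0.0)
```

In practice, a position weight matrix scan would usually replace the mismatch count; the counting logic for the ratio stays the same.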
Comparison of different methods for identifying functional peaks
| Method | Total number of peaks | Number of genes mapped | Intersection with SRC1-dependent genes |
|---|---|---|---|
| MACS p = 1E-10 | 1,286 | 684 | 44 |
| MACS p = 1E-8 | 1,966 | 996 | 57 |
| MACS p = 1E-5 | 4,678 | 2,054 | 123 |
| supervised-NB(th = 0.8,1:2) | 11,835 | 3,875 | 238 |
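The last column counts the genes mapped from each peak set that also belong to a set of SRC1-dependent genes. The peak-to-gene mapping rule is not given here; the sketch below assumes one common convention, assigning each peak to the nearest transcription start site (TSS) on the same chromosome within a hypothetical 50 kb window. Gene names, coordinates and the window size are illustrative placeholders.

```python
from typing import Dict, List, Set, Tuple

Peak = Tuple[str, int, int]        # (chromosome, start, end)
Tss = Dict[str, Tuple[str, int]]   # gene -> (chromosome, TSS position)


def map_peaks_to_genes(peaks: List[Peak], tss: Tss, window: int = 50_000) -> Set[str]:
    """Map each peak centre to the nearest TSS within `window` bp (assumed rule)."""
    genes: Set[str] = set()
    for chrom, start, end in peaks:
        centre = (start + end) // 2
        best = None  # (distance, gene) of the closest TSS seen so far
        for gene, (g_chrom, pos) in tss.items():
            dist = abs(pos - centre)
            if g_chrom == chrom and dist <= window and (best is None or dist < best[0]):
                best = (dist, gene)
        if best is not None:
            genes.add(best[1])
    return genes


def overlap_with_reference(peaks: List[Peak], tss: Tss, reference: Set[str]) -> int:
    """Number of mapped genes that are also in the reference (e.g. SRC1-dependent) set."""
    return len(map_peaks_to_genes(peaks, tss) & reference)
```

Because several peaks can map to the same gene, the gene counts in the table grow more slowly than the peak counts.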
Performance of different classifiers in a 9-fold cross-validation setting
| Classifier | Precision | Recall | Accuracy |
|---|---|---|---|
| NB(th = 0.8,1:2) | 0.89 | 1.00 | 0.96 |
| SVM(kernel = polynomial,1:2) | 0.89 | 0.96 | 0.94 |
| RF(th = 0.7,1:2) | 0.72 | 1.00 | 0.91 |
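The precision, recall and accuracy above come from cross-validation. The summary does not say whether the metrics were pooled over folds or averaged per fold; the sketch below pools the held-out predictions from all folds before computing the metrics. `fit` and `predict` are hypothetical stand-ins for any classifier in the table, and the toy threshold classifier exists only to exercise the code.

```python
import random
from typing import Callable, List, Sequence, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


def cross_validate(X, y, fit: Callable, predict: Callable, k: int = 9, seed: int = 0):
    """k-fold CV: train on k-1 folds, predict the held-out fold, pool all predictions."""
    y_true, y_pred = [], []
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_true += [y[i] for i in fold]
        y_pred += [predict(model, X[i]) for i in fold]
    return metrics(y_true, y_pred)


# Toy classifier (hypothetical): threshold halfway between the two class means.
def fit_threshold(X: List[float], y: List[int]) -> float:
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold: float, x: float) -> int:
    return 1 if x > threshold else 0


if __name__ == "__main__":
    X = list(range(100, 109)) + list(range(9))
    y = [1] * 9 + [0] * 9
    print(cross_validate(X, y, fit_threshold, predict_threshold, k=9))  # (1.0, 1.0, 1.0)
```

Pooling and per-fold averaging agree when folds are equal in size and every fold contains both classes; with small or imbalanced folds they can differ noticeably.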
Figure 2. Self-training. Percentage of predicted positive peaks containing an ERE motif over self-training iterations, for the different TP:TN ratios of the training set indicated in the legend.
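A minimal sketch of self-training, assuming a multinomial naive Bayes classifier over trigram counts. In each round the model is retrained, unlabeled peaks whose posterior probability of being positive is at least `threshold` are added as positives, those at or below `1 - threshold` are added as negatives, and the loop stops when no new peak is labeled or `max_iter` is reached. How `threshold` and `max_iter` correspond to the `th` and `I` values in the table labels is an assumption, and the initial TP:TN ratio is left to the caller.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

TRIGRAMS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIGRAMS)}
Model = Dict[int, Tuple[float, List[float]]]  # class -> (log prior, log P(trigram | class))


def featurize(seq: str) -> List[int]:
    """Count the 64 DNA trigrams, skipping windows that contain N or other symbols."""
    counts = [0] * len(TRIGRAMS)
    seq = seq.upper()
    for i in range(len(seq) - 2):
        j = INDEX.get(seq[i:i + 3])
        if j is not None:
            counts[j] += 1
    return counts


def train_nb(X: List[List[int]], y: List[int], alpha: float = 1.0) -> Model:
    """Multinomial naive Bayes with Laplace smoothing; both classes must be present."""
    model: Model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        assert rows, f"no training examples for class {c}"
        totals = [alpha + sum(col) for col in zip(*rows)]
        denom = sum(totals)
        model[c] = (math.log(len(rows) / len(y)), [math.log(t / denom) for t in totals])
    return model


def prob_positive(model: Model, x: List[int]) -> float:
    """Posterior P(class = 1 | x), computed stably from log scores."""
    scores = {c: prior + sum(n * lp for n, lp in zip(x, logp))
              for c, (prior, logp) in model.items()}
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    return weights[1] / (weights[0] + weights[1])


def self_train(X_lab, y_lab, X_unlab, threshold: float = 0.8, max_iter: int = 10):
    """Repeatedly retrain and absorb confidently classified unlabeled peaks."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        model = train_nb(X_lab, y_lab)
        remaining = []
        for x in pool:
            p = prob_positive(model, x)
            if p >= threshold or p <= 1 - threshold:
                X_lab.append(x)
                y_lab.append(1 if p >= threshold else 0)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # nothing new was labeled this round
            break
        pool = remaining
    return train_nb(X_lab, y_lab), pool
```

Tracking the ERE-motif rate of the peaks labeled positive after each round (for instance with a motif scan) would give a curve of the kind plotted in Figure 2.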
Figure 3. Overlap of top trigrams with the ERE motif. Potential matching locations, within the ERE motif, of the top-ranking nucleotide trigrams identified by feature selection algorithms.
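The feature-selection algorithms behind Figure 3 are not named here. As a deliberately simple stand-in, the sketch below ranks trigrams by the difference in relative frequency between positive and negative peaks, then lists where each trigram could sit within the consensus GGTCAnnnTGACC. Windows with fewer than `min_fixed` non-N bases (a hypothetical parameter) are skipped, since a spacer-only window would match any trigram.

```python
from itertools import product
from typing import Dict, List

ERE = "GGTCANNNTGACC"  # consensus ERE; N is the 3-bp spacer


def trigram_freqs(seqs: List[str]) -> Dict[str, float]:
    """Relative frequency of each of the 64 DNA trigrams across all sequences."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=3)}
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - 2):
            if s[i:i + 3] in counts:
                counts[s[i:i + 3]] += 1
    total = sum(counts.values())
    return {t: (c / total if total else 0.0) for t, c in counts.items()}


def rank_trigrams(pos_seqs: List[str], neg_seqs: List[str], top: int = 5) -> List[str]:
    """Trigrams most enriched in positive peaks relative to negative peaks."""
    fp, fn = trigram_freqs(pos_seqs), trigram_freqs(neg_seqs)
    return sorted(fp, key=lambda t: fp[t] - fn[t], reverse=True)[:top]


def ere_positions(trigram: str, motif: str = ERE, min_fixed: int = 2) -> List[int]:
    """0-based offsets in `motif` where `trigram` fits (N matches any base)."""
    hits = []
    for i in range(len(motif) - 2):
        window = motif[i:i + 3]
        if sum(m != "N" for m in window) < min_fixed:
            continue
        if all(m == "N" or m == c for m, c in zip(window, trigram)):
            hits.append(i)
    return hits
```

Since the consensus is palindromic, a trigram and its reverse complement land at mirror-image offsets, so checking one strand is enough for this purpose.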