Mingqi Wu, Faming Liang, Yanan Tian.
Abstract
BACKGROUND: ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing ChIP-chip data, such as sliding window methods, hidden Markov model-based methods, and Bayesian methods. Because they jointly account for uncertainty in the model and in its parameters, Bayesian methods can potentially work better than the other two classes of methods; however, the existing Bayesian methods do not perform satisfactorily. They usually require multiple replicates or some extra experimental information to parametrize the model, and they need long CPU times because they involve MCMC simulations.
Year: 2009 PMID: 19857265 PMCID: PMC2779819 DOI: 10.1186/1471-2105-10-352
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Comparison results for the ER data. (a) original data; (b) the joint posterior probability produced by the Bayesian latent model; (c) the joint posterior probability produced by BAC; and (d) the posterior probability produced by tileHMM.
Figure 2. Sensitivity analysis for the hyperparameters.
Sensitivity analysis for the parameters w and m.
| | Adjusted Rand Index | | |
|---|---|---|---|
| | 3 | 5 | 7 |
| 2 | 0.987(0.006) | -- | -- |
| 5 | 0.987(0.006) | 0.994(0.005) | 0.839(0.020) |
| 7 | 0.991(0.005) | 0.985(0.006) | 0.834(0.010) |
| 10 | 0.994(0.001) | 0.987(0.007) | 0.831(0.007) |
The average of adjusted Rand indices (standard error in the parentheses) is calculated based on 5 independent runs. The entries for the cells with 2w
Computational results for the p53-FL data with a cutoff of 0.5.
| Method | Chip A: V(2) | Chip A: Total | Chip B: V(3) | Chip B: Total | Chip C: V(9) | Chip C: Total | p53: V(14) | p53: Total |
|---|---|---|---|---|---|---|---|---|
| Bayesian latent | 2 | 15 | 2 | 28 | 8 | 27 | 12 | 70 (127) |
| BAC | 2 | 38 | 1 | 29 | 9 | 33 | 12 | 100 (1864) |
| tileHMM | 2 | 29708 | 3 | 1944 | 9 | 2144 | 14 | 33796 |
For each method and each chip, both the total number of detected regions and the number of detected regions verified by quantitative PCR (V) are reported. The columns under "p53" summarize the results on chips A, B and C. The number in parentheses is the number of clusters needed to cover all 14 experimentally validated bound regions.
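As an illustration of how a posterior-probability cutoff such as 0.5 can turn per-probe scores into a list of regions, here is a minimal Python sketch. The source states only that a cutoff of 0.5 was used; the rule for merging consecutive probes, the use of `>=` rather than `>`, and the function names `call_regions` and `count_verified` are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int]  # (start, end) probe indices, both inclusive


def call_regions(posterior: Sequence[float], cutoff: float = 0.5) -> List[Region]:
    """Group consecutive probes whose posterior probability is >= cutoff.

    Assumption: a "region" is a maximal run of adjacent probes passing the cutoff.
    """
    regions: List[Region] = []
    start = None
    for i, p in enumerate(posterior):
        if p >= cutoff:
            if start is None:
                start = i  # open a new run
        elif start is not None:
            regions.append((start, i - 1))  # close the current run
            start = None
    if start is not None:
        regions.append((start, len(posterior) - 1))
    return regions


def count_verified(regions: Sequence[Region], verified_probes: Sequence[int]) -> int:
    """Count verified probe positions covered by at least one called region."""
    return sum(any(s <= v <= e for s, e in regions) for v in verified_probes)


if __name__ == "__main__":
    scores = [0.1, 0.7, 0.9, 0.2, 0.5, 0.4, 0.8]  # made-up posterior values
    called = call_regions(scores)
    print("Total regions:", len(called))
    print("Verified regions hit:", count_verified(called, [2, 3, 6]))
```

In this toy setup the "Total" column would correspond to `len(called)` and the "V" column to `count_verified(...)`, under the assumptions stated above.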
Figure 3. Averaged ROC curves and error rates for different models on simulated datasets. (a) ROC curve; (b) error rate. All the plots were obtained by averaging over the results for the 10 datasets. The plots on the right provide a closer view for the area enclosed by the dotted line and axis on the left.
Computational results for the simulated datasets.
| Method | Total | ND | FD | r | |
|---|---|---|---|---|---|
| Bayesian Latent | 50.5 (0.58) | 2.3 (0.33) | 2.8 (0.57) | 0.9545 (0.0080) | -- |
| tileHMM | 48 (0.77) | 4.2 (0.57) | 2.2 (0.55) | 0.9250 (0.0107) | 0.02 |
| BAC | 2934.7 (6.60) | 0 (0) | 2884.7 (6.6) | 0.0609 (0.0003) | 0.00 |
| Wilcox | 56.1 (0.95) | 3.9 (0.48) | 6.4 (0.62) | 0.9221 (0.0088) | 0.007 |
| | 78.9 (2.11) | 3.1 (0.31) | 27.6 (1.71) | 0.9047 (0.0089) | 0.0003 |
| EB t-scan | 71.5 (1.52) | 3.0 (0.39) | 20.9 (1.38) | 0.9176 (0.0068) | 0.001 |
"Total" denotes the average number of bound regions identified for each of the 10 datasets, ND denotes the number of true bound regions that are not discovered by the algorithm, FD denotes the number of false bound regions discovered by the algorithm, r is the adjusted Rand index, the number in the parentheses is the standard error, and "EB t-scan" refers to the empirical Bayesian t-scan method proposed by Ji and Wong [6].