Russell L Zaretzki, Michael A Gilchrist, William M Briggs, Artin Armagan.
Abstract
BACKGROUND: Tag-based techniques, such as
Year: 2010 PMID: 20128916 PMCID: PMC2829012 DOI: 10.1186/1471-2105-11-72
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Tag formation model. Plot showing cDNA cleavage sites for SAGE with the associated probabilities of tag formation. Adopted from [6, Figure 1].
Figure 2. Dirichlet-Poisson-Binomial estimates with the flat prior. Probability estimates and inferences for S. cerevisiae log-phase data based on the DPB model with the flat prior, α = 1, for all genes. The 20 genes with the largest tag counts are arranged in decreasing rank order along the x-axis. The observed tag proportions are marked with open circles, the bias-corrected MLE with open triangles. The analytically computed posterior mode when α = 1 coincides exactly with the corrected MLE. Also included are the estimated posterior mean and the upper and lower marginal 95% Bayesian posterior bounds based on MCMC sampling.
Figure 3. Dirichlet-Poisson-Binomial estimates with the tub prior. Probability estimates and inferences for S. cerevisiae log-phase data based on the DPB model with the tub prior, α = 1/l, for all genes. The 20 genes with the largest tag counts are arranged in decreasing rank order along the x-axis. The observed tag proportions are marked with open circles, the bias-corrected MLE with open triangles. In this case the analytically derived posterior modes deviate substantially from the corrected MLE, while the estimated posterior mean is identical to it. Upper and lower marginal 95% Bayesian posterior bounds are also given.
Autocorrelation in Gibbs samples of proportions.
| | DPB | | DMB | | MD | |
|---|---|---|---|---|---|---|
| 10 | 0.011 | 0.000 | 0.015 | 0.019 | 0.053 | 0.008 |
| 20 | 0.047 | 0.000 | -0.025 | 0.013 | -0.012 | 0.010 |
| 40 | -0.001 | -0.028 | -0.019 | 0.013 | -0.034 | 0.012 |
| 80 | 0.000 | 0.027 | 0.000 | -0.007 | 0.011 | -0.010 |
Autocorrelation estimates for the three proposed algorithms across posterior samples of the mRNA proportion at the open reading frame YAL003W in the S. cerevisiae log-phase data.
Autocorrelation in Gibbs samples of population size.
| | DPB | | MD | |
|---|---|---|---|---|
| 10 | 0.905 | 0.024 | 0.606 | 0.007 |
| 20 | 0.863 | 0.018 | 0.344 | -0.012 |
| 40 | 0.783 | -0.029 | 0.088 | -0.034 |
| 80 | 0.666 | 0.027 | -0.011 | 0.011 |
Autocorrelation estimates for mRNA population size N in the DPB algorithm and number of unconverted transcripts r in the MD algorithm.
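Autocorrelation values like those in the two tables above can be computed for any stored chain of posterior samples. The following is a minimal Python sketch, not the authors' implementation: it uses the standard sample autocorrelation estimator, and the two chains (`iid_chain`, `sticky_chain`) are synthetic stand-ins invented for illustration, one well mixed and one slowly mixing. The row labels 10, 20, 40 and 80 are assumed here to be lags.

```python
import numpy as np


def autocorrelation(chain, lag):
    """Sample autocorrelation of a 1-D chain at a positive integer lag."""
    x = np.asarray(chain, dtype=float)
    if not 0 < lag < x.size:
        raise ValueError("lag must satisfy 0 < lag < len(chain)")
    centered = x - x.mean()
    # Lag-k autocovariance divided by the lag-0 autocovariance
    # (both use the same 1/n normalisation, so it cancels).
    return float(np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered))


rng = np.random.default_rng(0)
n_samples = 20_000

# Hypothetical well-mixed chain: independent draws.
iid_chain = rng.normal(size=n_samples)

# Hypothetical slowly mixing chain: AR(1) process with coefficient 0.99.
noise = rng.normal(size=n_samples)
sticky_chain = np.empty(n_samples)
sticky_chain[0] = noise[0]
for t in range(1, n_samples):
    sticky_chain[t] = 0.99 * sticky_chain[t - 1] + noise[t]

lags = (10, 20, 40, 80)  # assumed to match the row labels of the tables
iid_acf = {k: autocorrelation(iid_chain, k) for k in lags}
sticky_acf = {k: autocorrelation(sticky_chain, k) for k in lags}

for k in lags:
    print(f"lag {k:3d}: iid {iid_acf[k]: .3f}   sticky {sticky_acf[k]: .3f}")
```

Values near zero at every lag indicate nearly independent draws, whereas values that decay slowly with the lag suggest that the chain should be run longer or thinned.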
Simulated coverage probabilities for proposed methods.
| Range of m | Library Size | Gene Count | Flat: DPB | Flat: DMB | Flat: MD | Tub: DPB | Tub: DMB | Tub: MD |
|---|---|---|---|---|---|---|---|---|
| 0 - 1.67 × 10⁻⁵ | 6178 | 1181 | 0.800 | 0.855 | 0.854 | 0.104 | 0.104 | 0.104 |
| - | 1000 | 25 | 0.825 | 0.834 | 0.834 | 0.107 | 0.108 | 0.107 |
| 1.67 × 10⁻⁵ - 1.23 × 10⁻⁴ | 6178 | 3678 | 0.947 | 0.975 | 0.976 | 0.520 | 0.520 | 0.520 |
| - | 1000 | 209 | 0.9425 | 0.950 | 0.950 | 0.558 | 0.558 | 0.558 |
| 1.23 × 10⁻⁴ - 9.12 × 10⁻⁴ | 6178 | 1173 | 0.961 | 0.951 | 0.948 | 0.899 | 0.898 | 0.898 |
| - | 1000 | 578 | 0.952 | 0.957 | 0.957 | 0.911 | 0.911 | 0.911 |
| 9.12 × 10⁻⁴ - 6.74 × 10⁻³ | 6178 | 133 | 0.939 | 0.485 | 0.479 | 0.945 | 0.944 | 0.944 |
| - | 1000 | 165 | 0.950 | 0.950 | 0.945 | 0.934 | 0.934 | 0.934 |
| 6.74 × 10⁻³ - 1.35 × 10⁻¹ | 6178 | 13 | 0.809 | 0.009 | 0.005 | 0.953 | 0.951 | 0.951 |
| - | 1000 | 23 | 0.948 | 0.794 | 0.780 | 0.943 | 0.939 | 0.939 |
Coverage percentages for 95% posterior intervals for MCMC methods. Percentages represent the average number of intervals out of 1000 that covered the true proportion m over the given range of m values. Standard errors of the reported proportions are below 1.5% and typically close to 1%.
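The coverage calculation described above amounts to counting how often an interval contains the generating value. The following is a minimal Python sketch under loudly stated assumptions: it does not implement the DPB, DMB or MD samplers. Instead it swaps in a single-gene binomial model with a flat Beta(1, 1) prior, whose posterior is available in closed form, and takes interval endpoints from quantiles of posterior draws. The function `coverage` and all variable names are hypothetical; the values m = 0.027, 15,000 tags and 1,000 libraries are borrowed from the figure captions.

```python
import numpy as np


def coverage(lower, upper, truth):
    """Fraction of intervals [lower, upper] containing truth, with its binomial standard error."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    hits = (lower <= truth) & (truth <= upper)
    p = hits.mean()
    se = np.sqrt(p * (1.0 - p) / hits.size)
    return float(p), float(se)


rng = np.random.default_rng(1)
true_m = 0.027        # generating proportion (value quoted for m_2 in Figure 5)
n_tags = 15_000       # tags per simulated library (Figure 4)
n_libraries = 1_000   # number of simulated libraries
n_draws = 2_000       # posterior draws per library (assumption)

# Stand-in model (assumption): the tag count for one gene is Binomial(n_tags, true_m).
counts = rng.binomial(n_tags, true_m, size=n_libraries)

# Flat Beta(1, 1) prior gives a Beta(1 + x, 1 + n - x) posterior for the proportion.
a = 1 + counts
b = 1 + n_tags - counts
draws = rng.beta(a[:, None], b[:, None], size=(n_libraries, n_draws))

# Equal-tailed 95% posterior interval for each simulated library.
lower, upper = np.quantile(draws, [0.025, 0.975], axis=1)

cov, cov_se = coverage(lower, upper, true_m)
print(f"coverage = {cov:.3f} (SE {cov_se:.3f})")
```

In this simplified setting the coverage should land close to the nominal 95%. The departures reported in the table arise from features of the full models and priors that this stand-in deliberately omits.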
Figure 4. Trends in coverage probabilities. Comparison of marginal coverage probabilities of 95% posterior intervals for both priors across 13 genes for simulated libraries with 6,187 genes and 15,000 tags. Upper and lower bars represent the average upper and lower endpoints across 1,000 simulated libraries. The percentages shown give the coverage probability and are located at the true generating proportion. The left and right panels correspond to the flat and tub priors, respectively.
Figure 5. Simulated intervals for a large-proportion gene. 95% posterior intervals for the gene with the 2nd largest count across a range of simulated libraries. The left frame represents the flat prior (actual coverage = 0.652), the right frame the tub prior (actual coverage = 0.961). U and L are the upper and lower confidence points for each simulated value. The open circle represents the mean value of the samples from each library. The solid line is the true generating proportion m₂ = 0.027, ϕ = 0.959.
Figure 6. Simulated intervals for a small-proportion gene. 95% posterior intervals for the gene with the 5500th largest count across a range of simulated libraries. The left frame represents the flat prior (actual coverage = 0.867), the right frame the tub prior (actual coverage = 0.127). U and L are the upper and lower confidence points for each simulated value. The open circle represents the mean value of the samples from each library. The solid line is the true generating proportion m₅₅₀₀ = 0.000009, ϕ = 0.909.