Michael A Gilchrist, Hong Qin, Russell Zaretzki.
Abstract
BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from experimentally generated sequence-based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across genes and between experiments. As a consequence, these analyses result in biased estimators and posterior probability intervals for gene expression levels in the transcriptome.
Year: 2007 PMID: 17945026 PMCID: PMC2217564 DOI: 10.1186/1471-2105-8-403
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Diagram of a hypothetical mRNA transcript with its potential AE cut sites (indicated by arrows) and their tag formation probabilities φ. AE sites are assumed to be cleaved independently of one another with cleaving efficiency p. From an individual mRNA, the tag formed is from the 3'-most AE site that is actually cleaved. The probability of forming a tag at the jth site, counting from the 3' end, is therefore $p(1 - p)^{j-1}$.
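The cleavage model in the Figure 1 caption can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it assumes sites are indexed from the 3' end, so that a tag forms at site j only when that site is cut and the j - 1 sites 3' of it are not. The function names `site_tag_probs` and `any_tag_prob` are hypothetical, and p = 0.56 is simply the log phase posterior mode quoted in the Figure 3 caption.

```python
def site_tag_probs(p, k):
    """Per-site tag formation probabilities for k AE sites.

    Assumption: sites are indexed j = 1..k starting at the 3' end, and
    each site is cut independently with cleavage efficiency p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("cleavage efficiency p must lie in [0, 1]")
    if k < 0:
        raise ValueError("number of sites k must be non-negative")
    # Site j yields the tag only if it is cut and the j - 1 sites
    # closer to the 3' end are all left uncut.
    return [p * (1.0 - p) ** (j - 1) for j in range(1, k + 1)]


def any_tag_prob(p, k):
    """Probability that at least one of the k sites is cut."""
    return 1.0 - (1.0 - p) ** k


# Illustrative values only: a transcript with three AE sites.
probs = site_tag_probs(0.56, 3)
total = any_tag_prob(0.56, 3)
print(probs, total)
```

Under these assumptions the per-site probabilities sum to 1 - (1 - p)^k, the chance that the transcript produces any tag at all, so transcripts with few AE sites are less likely to be tagged.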
Symbol Definitions
| Symbol | Definitions |
| Total number of genes with potential AE sites in a transcriptome. | |
| Total number of AE cleavage sites within the transcripts of a gene (or gene | |
| Total number of AE cleavage sites within the coding region of gene | |
| Global cleavage efficiency of the AE. | |
| The frequency of mRNA for the | |
| { | |
| mRNA frequency of gene | |
| mRNA frequency of gene | |
| The frequency of the observed tags for the | |
| { | |
| Tag frequency of gene | |
| Tag formation probability of gene | |
| Mean tag formation probability which is the sum of | |
| Number of observed tags (at the | |
| { | |
| Total number of observed informative tags, which is ∑ | |
| Parameter for the prior of | |
| { | |
| Sum of all prior parameters, i.e. ∑ | |
| Sum of prior parameters for genes other than | |
Parameter Estimates
| | Experimental Treatment | | |
| Variables | L | S | G2/M |
| AE cleavage efficiency p: | 0.577 (0.545, 0.569) | 0.61 (0.597, 0.623) | 0.748 (0.735, 0.758) |
| Joint Posterior Mode Estimate: | 0.764 | 0.797 | 0.862 |
| Simulation Based Estimate: | 0.777 (0.773, 0.781) | 0.806 (0.802, 0.809) | 0.861 (0.857, 0.865) |
Posterior mode and 95% PI values for the AE cleavage efficiency p, the posterior mode value for the mean tag formation probability, and simulation-based estimates of the mean tag formation probability for the three experiments in [9]: log growth (L), S phase-arrest (S), and G2/M phase-arrest (G2/M). Numbers in parentheses are the lower and upper bounds of the 95% PI. The parameters for the Dirichlet prior distribution were α = 1 for all genes.
Figure 2. Posterior probability distributions for the AE cutting efficiencies from three different SAGE experiments discussed in [9]. The experiments were performed with cells at log growth (L), S-phase arrested (S), or G2/M-phase arrested (G2/M). Distributions were generated as in Appendix A. The posterior modes and 95% probability intervals are provided in Table 2.
Figure 3. Composite diagram of tag formation probabilities φ and adjustment of mRNA estimates due to the tagging process for the log phase experiment in [9]. Histogram of the relative frequencies of tag formation probabilities φ for the Saccharomyces cerevisiae genome during log growth phase and the corresponding scaling. The histogram scale is indicated on the left axis. Tag formation probabilities were calculated using eqns. (1) and (2) with the cutting efficiency parameter set to the posterior mode for this experiment, i.e. p = 0.56. The relative difference between the adjusted and standard mRNA estimates for each gene is plotted relative to the right axis and indicated with a •.
Figure 4. Examples of posterior marginal probability distributions for four genes, YFL060C, YPR035W, YOL040C, and YKL152C, during log phase based on data in [9]. Genes were chosen to cover a wide range of tag formation probabilities φ and tag counts T. More specifically, these genes had tag formation probabilities φ of 0.356879, 0.44494, 0.98255, and 0.555, respectively, and observed tag counts of 0, 10, 103, and 228, respectively.
Figure 5. Illustration of how changing the tag formation probability φ affects the posterior marginal distributions under two different scenarios: (a) when no tags are observed for a particular gene and (b) when ten tags are observed for a particular gene. In (a), where no tags are observed, the posterior mode occurs on the boundary of the parameter space and changing φ has no effect on the mode. Increasing φ does, however, decrease the width of the distribution. In (b), where ten tags are observed, increasing φ leads to a decrease in the mode and also decreases the absolute width of the distribution (which is indicated on the log scale by shifting to the left).
Figure 6. Comparison of the tag and mRNA frequency marginal modes during log growth phase. Data are presented on a log-log scale with a 1:1 line for reference. Genes whose tag formation probability φ is greater than the mean tag formation probability are overrepresented in the tag pool and, consequently, occur below the 1:1 line. Conversely, genes whose tag formation probability φ is less than the mean are underrepresented in the tag pool and occur above the 1:1 line.
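The over- and underrepresentation described in the Figure 6 caption can be illustrated with a small numerical sketch. It rests on an assumption that is not spelled out in the surviving text: that the expected tag frequency of a gene is its mRNA frequency times its tag formation probability, divided by the mean tag formation probability, where the mean is weighted by mRNA frequency. The function name `expected_tag_frequencies` and all input values are hypothetical and chosen only for illustration.

```python
def expected_tag_frequencies(mrna_freq, phi):
    """Expected tag frequencies under the assumed proportional model.

    Assumption: tag frequency of gene i = m_i * phi_i / mean_phi, with
    mean_phi = sum_i m_i * phi_i (mRNA-frequency-weighted mean).
    """
    if len(mrna_freq) != len(phi):
        raise ValueError("mrna_freq and phi must have the same length")
    weighted = [m * f for m, f in zip(mrna_freq, phi)]
    mean_phi = sum(weighted)
    if mean_phi <= 0.0:
        raise ValueError("mean tag formation probability must be positive")
    return [w / mean_phi for w in weighted], mean_phi


# Made-up inputs: three genes whose mRNA frequencies sum to 1.
mrna = [0.5, 0.3, 0.2]
phi = [0.9, 0.6, 0.3]
tags, mean_phi = expected_tag_frequencies(mrna, phi)

for m, f, t in zip(mrna, phi, tags):
    status = "over" if f > mean_phi else "under"
    print(f"phi={f:.2f} mRNA={m:.3f} tag={t:.3f} ({status}represented)")
```

In this toy example only the gene with φ above the weighted mean has a tag frequency larger than its mRNA frequency, which corresponds to a point below the 1:1 line in the figure.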