Jessica C Mar, Yasumasa Kimura, Kate Schroder, Katharine M Irvine, Yoshihide Hayashizaki, Harukazu Suzuki, David Hume, John Quackenbush.
Abstract
BACKGROUND: High-throughput real-time quantitative reverse transcriptase polymerase chain reaction (qPCR) is a widely used technique in experiments where expression patterns of genes are to be profiled. Current technology allows the acquisition of profiles for a moderate number of genes (50 to a few thousand), and this number continues to grow. The use of appropriate normalization algorithms for qPCR-based data is therefore a highly important aspect of the data preprocessing pipeline.
Year: 2009 PMID: 19374774 PMCID: PMC2680405 DOI: 10.1186/1471-2105-10-110
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
GO Categories for the Rank-Invariant Genes
| Gene | Cellular component | Molecular function | Biological process |
| --- | --- | --- | --- |
| GAPDH | Cytoplasm | Glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) activity; protein binding | Glycolysis |
| ENO1 | -- | Phosphopyruvate hydratase activity | -- |
| HSPCB | Cytoplasm | Nitric-oxide synthase regulator activity | Response to unfolded protein; positive regulation of nitric oxide biosynthetic process |
| ACTB | Cytoplasm; cytoskeleton | Protein binding; structural constituent of cytoskeleton | Cell motility |
| EEF1A1 | Cytoplasm; eukaryotic elongation factor 1 complex | GTP binding; protein binding | Translational elongation |
The GO categories are listed for the five genes in the rank-invariant set that were identified as reasonable controls for normalization of the PMA dataset. The presence of GAPDH and ACTB as controls was not surprising for this dataset. The inclusion of some of the other categories, such as heat-shock protein activity and translation, was more surprising.
Figure 1. Coefficient of Variation for Different Normalized Data Sets. The CV values for the three different normalization methods on the PMA dataset are represented here in a bar chart. The CV for the non-normalized (raw) dataset is included as a reference. The quantile method is associated with the lowest CV, implying the greatest reduction in technical variation in the data.
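The comparison summarized in Figure 1 can be illustrated with a minimal sketch. It assumes that the CV is the sample standard deviation divided by the mean of each gene's profile, averaged over all genes in a dataset; the exact definition used by the authors is not given in this record. The function names and the Ct values are hypothetical and serve only as an illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / abs(m)


def mean_cv(profiles):
    """Average the per-gene CV over all genes in one dataset (assumed summary)."""
    return mean(coefficient_of_variation(p) for p in profiles.values())


# Hypothetical Ct values (gene -> one value per sample), not taken from the paper.
raw = {"geneA": [20.0, 22.0, 24.0], "geneB": [30.0, 33.0, 36.0]}
normalized = {"geneA": [21.5, 22.0, 22.5], "geneB": [32.0, 33.0, 34.0]}

# One summary value per dataset, which is what a bar chart like Figure 1 would show.
cv_by_method = {"raw": mean_cv(raw), "normalized": mean_cv(normalized)}
```

Under these assumptions, a lower value in `cv_by_method` indicates less spread relative to the mean, which is how the caption interprets the result for the quantile method.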
Figure 2. Exemplar graph to clarify the interpretation of Figure 3. The graph presents a visual pairwise comparison between two normalization algorithms, Q1 and Q2, on the same data set. For each gene, we calculate the variance of its Q1-normalized expression profile and of its Q2-normalized expression profile, and plot the log2-ratio of these variances on the y-axis, where Y = log2(variance of Q1-normalized profile / variance of Q2-normalized profile). A gene's log variance ratio is plotted against its expression (mean Ct value) on the x-axis. The regions where the data points fall in the graph indicate which normalization algorithm produces noisier data and whether there is a differential bias in expression for the genes most affected by this noise.
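The quantities plotted in this kind of graph can be sketched as follows. This is an illustration only: it assumes the sample variance is used and that the mean Ct on the x-axis is taken from the non-normalized data, neither of which is stated in this record. The function names and values are hypothetical.

```python
from math import log2
from statistics import mean, variance


def log_variance_ratio(q1_profile, q2_profile):
    """log2 of the Q1-normalized variance over the Q2-normalized variance."""
    return log2(variance(q1_profile) / variance(q2_profile))


def comparison_points(q1, q2, raw_ct):
    """Return gene -> (mean Ct, log2 variance ratio), one point per gene."""
    return {
        gene: (mean(raw_ct[gene]), log_variance_ratio(q1[gene], q2[gene]))
        for gene in q1
    }


# Hypothetical profiles (gene -> one value per sample), not taken from the paper.
raw_ct = {"geneA": [20.0, 22.0, 24.0], "geneB": [30.0, 31.0, 32.0]}
q1 = {"geneA": [1.0, 2.0, 3.0], "geneB": [0.0, 4.0, 8.0]}
q2 = {"geneA": [2.0, 4.0, 6.0], "geneB": [1.0, 2.0, 3.0]}

points = comparison_points(q1, q2, raw_ct)
```

A negative ratio means the Q1-normalized profile has the lower variance for that gene, and a positive ratio means the Q2-normalized profile does. A gene with zero variance under either method would need separate handling, since the ratio is then undefined.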
Figure 3. Pairwise Comparisons of Different Normalized Data Sets. Pairwise comparisons between the three different normalization methods and the non-normalized dataset. The graphs show the log variance ratio for each gene versus its average Ct value. The red line is the smoothed lowess curve that captures the overall trend of the data in the plot. The dotted blue line represents the horizontal axis. The direction of the ratio is reflected in each individual figure title, e.g. the ratios in Figure 3.3A are constructed by taking the log2 transformation of the GAPDH-normalized variance divided by the non-normalized variance for each gene. Points below the dotted blue line correspond to genes for which single-gene GAPDH normalization has reduced the variance relative to the variance of these genes in the non-normalized data.
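As a rough illustration of the GAPDH-versus-raw panel, the sketch below assumes that single-gene normalization means subtracting the GAPDH Ct from each gene's Ct within every sample (a common delta-Ct convention, not confirmed by this record). It then reports the fraction of genes whose log2 variance ratio falls below zero. All names and values are hypothetical.

```python
from math import log2
from statistics import variance


def normalize_to_reference(ct, reference="GAPDH"):
    """Subtract the reference gene's Ct from every other gene, sample by sample."""
    ref = ct[reference]
    return {
        gene: [value - r for value, r in zip(values, ref)]
        for gene, values in ct.items()
        if gene != reference
    }


def fraction_below_zero(normalized, raw):
    """Share of genes whose variance is lower after normalization than before."""
    # log2 raises ValueError if a normalized profile has zero variance.
    ratios = [log2(variance(normalized[g]) / variance(raw[g])) for g in normalized]
    return sum(r < 0 for r in ratios) / len(ratios)


# Hypothetical Ct values (gene -> one value per sample), not taken from the paper.
raw_ct = {
    "GAPDH": [18.0, 19.0, 20.0],
    "geneA": [24.0, 25.5, 26.0],  # drifts with GAPDH, so normalization removes variance
    "geneB": [30.0, 29.0, 28.0],  # moves against GAPDH, so normalization adds variance
}

delta_ct = normalize_to_reference(raw_ct)
share_reduced = fraction_below_zero(delta_ct, raw_ct)
```

Here `share_reduced` plays the role of counting the points below the dotted line in a panel. The lowess trend line described in the caption is not reproduced in this sketch.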