Guillaume Brysbaert, François-Xavier Pellay, Sebastian Noth, Arndt Benecke.
Abstract
In view of potential application to biomedical diagnosis, tight transcriptome data quality control is compulsory. Usually, quality control is achieved using labeling and hybridization controls added at different stages throughout the processing of the biological RNA samples. These control measures, however, only reflect the performance of the individual technical manipulations during the entire process and have no bearing on the continued integrity of the RNA sample itself. Here we demonstrate that intrinsic statistical properties of the resulting transcriptome data signal and signal-variance distributions and their invariance can be identified independently of the animal species studied and the labeling protocol used. From these invariant properties we have developed a data model, the parameters of which can be estimated from individual experiments and used to compute relative quality measures based on similarity with large reference datasets. These quality measures add supplementary, non-redundant information to standard quality control estimates based on spike-in and hybridization controls, and are exploitable in data analysis. A software application for analyzing datasets as well as a reference dataset for AB1700 arrays are provided. They should allow AB1700 users to easily integrate this method into their analysis pipeline, and might instigate similar developments for other transcriptome platforms.
© 2010 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.
Year: 2010 PMID: 20451162 PMCID: PMC5054119 DOI: 10.1016/S1672-0229(10)60006-X
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. A. Heatmaps of the signal-normalized logarithmic coefficient of variation (V) against the logarithmic signal (S) for one actual dataset example from each of the three species for which AB1700 arrays are available. The color gradient from blue to black to red indicates the density of the data points. The distribution is characteristic of AB1700 data. B. Similar heatmaps showing the distribution of ln(V) vs. ln(S) for two monkey species hybridized to the human AB1700 arrays. C. A heatmap of the same distribution for random data generated from our AB1700 signal and signal-variance model. Note that the distribution closely resembles those from actual data.
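The density shown in such ln(V) vs. ln(S) heatmaps can be approximated by binning the per-probe value pairs on a regular grid. The following is a minimal sketch, assuming that each probe comes with a positive signal and a positive coefficient of variation; the function name `log_density_grid`, the bin width, and the toy values are illustrative assumptions and are not taken from the authors' software.

```python
import math
from collections import Counter

def log_density_grid(signals, cvs, bin_width=0.5):
    """Count probes per (ln S, ln V) cell; cells are bin_width wide on both axes."""
    grid = Counter()
    for s, v in zip(signals, cvs):
        if s <= 0 or v <= 0:
            continue  # logarithm undefined, so such probes are skipped
        cell = (math.floor(math.log(s) / bin_width),
                math.floor(math.log(v) / bin_width))
        grid[cell] += 1
    return grid

# Hypothetical toy values, not taken from any AB1700 dataset.
signals = [1.0, 1.2, 10.0, 100.0, -3.0]
cvs = [0.5, 0.55, 0.1, 0.05, 0.2]
grid = log_density_grid(signals, cvs)
```

Each key of `grid` is a pair of bin indices and each value is the number of probes falling into that cell, which is the quantity a color gradient would encode.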
Figure 2. Superposition of the two S.I. value histograms for the 750× and 500× reference files. Note that the first (empty) bin is of different size than the others.
Figure 3. Superposition of the S.I. value histograms for the two data series GSE3155 (black) and GSE6806 (red) as calculated using the 500× reference dataset. Note that the first (empty) bin is of different size than the others. The auto-referenced set of 500 arrays making up the 500× reference dataset is shown as backdrop (white bars).
Figure 4. A. Ln(V) versus ln(S) heatmaps for two experiments from the GSE6806 series. The S.I. values based on the 500× reference set are indicated. B. A Venn diagram depicting the number of probes considered statistically significantly (P<0.05) upregulated when comparing the Dicer knockdown (Dicer-KO) versus the wild-type (Dicer-WT) case either with or without the GSM157090 experiment.
Comparison of the biological processes detected as significantly (P<0.05) enriched in the analysis of statistically significantly upregulated probes, with or without the GSM157090 experiment
| Biological process | Dicer KO vs. Dicer WT: Count | Dicer KO vs. Dicer WT: Expect | Dicer KO vs. Dicer WT: P value | [w/o GSM157090]: Count | [w/o GSM157090]: Expect | [w/o GSM157090]: P value |
|---|---|---|---|---|---|---|
| Protein biosynthesis | 157 | 32.68 | 8.31E-55 | | 33.02 | |
| Chromosome segregation | 25 | 6.15 | 1.72E-6 | | 6.22 | |
| Translational regulation | 20 | 4.97 | 5.90E-5 | 20 | 5.02 | 6.89E-5 |
| Chromatin packaging and remodeling | 31 | 12.36 | 8.69E-4 | 31 | 12.48 | 1.05E-3 |
| DNA replication | 19 | 6.60 | 1.03E-2 | 19 | 6.66 | 1.17E-2 |
| Oxidative phosphorylation | 14 | 4.23 | 2.49E-2 | 14 | 4.28 | 2.75E-2 |
| Nuclear transport | 15 | 5.02 | 4.13E-2 | 15 | 5.07 | 4.58E-2 |
Note: “Count” indicates the number of probes annotated to the corresponding ontology term and present in the list of statistically significantly regulated probes of the corresponding condition. “Expect” is the number of probes annotated to the ontology term that would be expected under a random null hypothesis. P values were determined using a Bonferroni correction for multiple testing.
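The relationship between "Count", "Expect", and a Bonferroni-corrected P value can be illustrated with a small over-representation calculation. This is a minimal sketch under stated assumptions: the source does not say which enrichment test was used, so a hypergeometric upper tail is swapped in here purely for illustration, and all function names and toy numbers are hypothetical.

```python
from math import comb

def expected_count(term_size, list_size, total):
    """Probes expected in the list for a term if the list were drawn at random."""
    return term_size * list_size / total

def hypergeom_upper_tail(count, term_size, list_size, total):
    """P(X >= count) when list_size probes are drawn without replacement
    from `total` probes, of which term_size carry the annotation."""
    upper = min(term_size, list_size)
    favorable = sum(comb(term_size, k) * comb(total - term_size, list_size - k)
                    for k in range(count, upper + 1))
    return favorable / comb(total, list_size)

def bonferroni(p_value, n_tests):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical toy numbers: 20 probes on the array, 5 annotated to the term,
# 4 probes in the regulated list, 3 of them annotated; 10 terms tested.
expect = expected_count(5, 4, 20)
p_raw = hypergeom_upper_tail(3, 5, 4, 20)
p_adj = bonferroni(p_raw, 10)
```

A term is reported as enriched when its observed count clearly exceeds `expect` and the corrected P value stays below the chosen threshold.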
Figure 5. A. S.I. histogram of the individual experiments of series GSE10503 using the 500× reference file. The inset shows the same series analyzed using the 300× reference file. The two outlier experiments are displayed with a red border. B. A principal component analysis in correspondence space of the GSE10503 series. The two outlier experiments as identified in (A) are indicated using the same color code as in (A). Every biological condition is displayed using its own coloring. C. A Venn diagram depicting the number of probes considered statistically significantly (P<0.01) regulated when comparing the Hdac3-null versus the Hdac3-control experiments either with or without the two outliers identified in (A).
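The counts displayed in a two-set Venn diagram of this kind follow directly from set operations on the two lists of significant probe IDs. A minimal sketch, assuming each analysis yields a list of probe IDs; the function name `venn_counts` and the IDs below are hypothetical and do not come from the GSE10503 series.

```python
def venn_counts(with_all, without_outliers):
    """Sizes of the three regions of a two-set Venn diagram."""
    a, b = set(with_all), set(without_outliers)
    return {
        "only_with_all": len(a - b),
        "shared": len(a & b),
        "only_without_outliers": len(b - a),
    }

# Hypothetical probe IDs, for illustration only.
with_all = [101, 102, 103, 104]
without_outliers = [103, 104, 105, 106, 107]
counts = venn_counts(with_all, without_outliers)
```

Probes in the `only_without_outliers` region are those that become detectable once the outlier experiments are excluded.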
Comparison of the biological process detected as most significantly enriched in the analysis of statistically significantly regulated probes, with or without the outlier experiments
| Biological process | P17∩P28 Hdac3-null vs. Hdac3-Control: Count | P17∩P28 Hdac3-null vs. Hdac3-Control: Expect | P17∩P28 Hdac3-null vs. Hdac3-Control: P value | [w/o GSM265476 and GSM265477]: Count | [w/o GSM265476 and GSM265477]: Expect | [w/o GSM265476 and GSM265477]: P value |
|---|---|---|---|---|---|---|
| Lipid, fatty acid and steroid metabolism | 26 | 4.29 | 7.12E-12 | | 5.37 | |
Additional genes identified as being statistically significantly regulated in both the P17 and P28 biological conditions after removal of the two outliers in the P17 condition
| Probe ID | Gene name | Gene symbol | Average fold change: P17 [w/o replicates] | Average fold change: P28 |
|---|---|---|---|---|
| 381504 | monoacylglycerol O-acyltransferase 2 | Mogat2 | 2.9058 | 5.9897 |
| 400599 | ATP-binding cassette, sub-family A (ABC1), member 8a | Abca8a | -1.1872 | -3.1906 |
| 437440 | hydroxysteroid (17-beta) dehydrogenase 9 | Hsd17b9 | -1.2689 | -4.2387 |
| 441362 | sulfotransferase family, cytosolic, 1C, member 2 | Sult1c2 | -1.0925 | -1.9719 |
| 501043 | ethanolamine kinase 2 | Etnk2 | -1.2624 | -1.8293 |
| 772131 | cytochrome P450, family 2, subfamily d, polypeptide 13 | Cyp2d13 | -1.0796 | -5.3676 |
| 829262 | acyl-CoA thioesterase 10 / acyl-CoA thioesterase 9 | Acot10 / Acot9 | 1.2657 | 2.2433 |
| 916709 | hexosaminidase A | Hexa | 1.2189 | 1.0132 |
| 920047 | cytochrome b5 reductase 3 | Cyb5r3 | 1.0222 | 1.4030 |
Note: The average fold changes are expressed as log2.
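Because the fold changes are given on a log2 scale, a value of 1 corresponds to a doubling and a value of -1 to a halving. The helper below is a minimal sketch of the conversion to a signed linear fold change; the function name `log2_to_fold` and the signed convention for down-regulation are assumptions made for illustration.

```python
def log2_to_fold(log2_fc):
    """Signed linear fold change: 2**x if x >= 0, otherwise -(2**-x)."""
    if log2_fc >= 0:
        return 2.0 ** log2_fc
    return -(2.0 ** -log2_fc)

# Mogat2 at P28 (log2 value 5.9897 in the table above) is just under 64-fold up.
mogat2_p28 = log2_to_fold(5.9897)
# Cyp2d13 at P28 (log2 value -5.3676) is a strong down-regulation.
cyp2d13_p28 = log2_to_fold(-5.3676)
```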