Asim S Siddiqui, Allen D Delaney, Angelique Schnerch, Obi L Griffith, Steven J M Jones, Marco A Marra.
Abstract
We present the results of a simple statistical assay that measures the G+C content sensitivity bias of gene expression experiments without requiring a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, 'Classic' Massively Parallel Signature Sequencing (MPSS) and 'Signature' MPSS. We demonstrate that the methods have systematic and random errors that lead to differing G+C content sensitivities. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags, and Affymetrix data show different biases depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily affects genes expressed at lower levels. Despite the larger sampling depth of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).
Year: 2006 PMID: 16840527 PMCID: PMC1524917 DOI: 10.1093/nar/gkl404
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Differences in G+C content lead to differences in observed gene sets
| Species | Tissue | Additional information | Technology | Library identifier | Sampling depth (tags) | Genes identified | Genes identified per sampled tag | Genes identified by both methods | Genes identified by this method only | A+T/C+G library bias (a) | A+T/C+G bias of tags/probes of genes missed by this method (b) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mouse | Liver | Same mRNA used in both experiments | LongSAGE | SM100 | 108 117 | 3593 | 0.033 | 1425 | 2168 | −4.31 | −0.88 |
| Mouse | Liver | Same mRNA used in both experiments | MPSS (signature) | GSM17243 | 1 724 799 | 2266 | 0.0013 | 1425 | 841 | −18.91 | 8.7 |
| Mouse | Kidney | P84, male; adult male, cortex | LongSAGE | SM104 | 883 305 | 7291 | 0.0083 | 2924 | 4367 | 4.49 | −5.22 |
| Mouse | Kidney | P84, male; adult male, cortex | MPSS (signature) | GSM34298 | 2 230 467 | 3547 | 0.0016 | 2924 | 623 | −34.06 | 25.24 |
| Mouse | Visual cortex | P27, visual cortex, male | LongSAGE | SM029 | 115 803 | 5473 | 0.047 | 3713 | 1718 | −0.32 | −7.58 |
| Mouse | Visual cortex | P27, visual cortex, male | LongSAGE-Lite | SM040 | 137 562 | 4702 | 0.034 | 3713 | 989 | −11.84 | 10.89 |
| Arabidopsis | Callus | Biological replicate | MPSS (classic) | CAF | 1 637 407 | 9732 | 0.0059 | 6106 | 3626 | 3.46 | −13.2 |
| Arabidopsis | Callus | Biological replicate | MPSS (signature) | CAS | 1 433 143 | 7075 | 0.0049 | 6106 | 969 | −26.31 | 29.78 |
| Arabidopsis | Inflorescence | Biological replicate | MPSS (classic) | INF | 1 455 847 | 9922 | 0.0068 | 7904 | 2018 | 3.65 | −13.43 |
| Arabidopsis | Inflorescence | Biological replicate | MPSS (signature) | INS | 2 516 138 | 9943 | 0.004 | 7904 | 2039 | −19.39 | 25.95 |
| Arabidopsis | Leaves | Biological replicate | MPSS (classic) | LEF | 2 457 736 | 10 311 | 0.0042 | 7592 | 2719 | 3.22 | −11.22 |
| Arabidopsis | Leaves | Biological replicate | MPSS (signature) | LES | 2 752 425 | 9071 | 0.0033 | 7592 | 1479 | −23.16 | 30.58 |
| Arabidopsis | Root | Biological replicate | MPSS (classic) | ROF | 3 002 218 | 9900 | 0.0033 | 7244 | 2656 | 10.01 | −20.68 |
| Arabidopsis | Root | Biological replicate | MPSS (signature) | ROS | 2 047 569 | 9193 | 0.0045 | 7244 | 1949 | −26.38 | 38.27 |
| Arabidopsis | Silique | Biological replicate | MPSS (classic) | SIF | 1 673 908 | 9715 | 0.0058 | 6157 | 3558 | 7.04 | −13.05 |
| Arabidopsis | Silique | Biological replicate | MPSS (signature) | SIS | 1 869 453 | 7373 | 0.0039 | 6157 | 1216 | −22.51 | 30.63 |
| Human | ES | H7 p22 | Affymetrix (A/B) | WA07 a22b24 | n/a | 6423 | n/a | 3643 | 2780 | 22.44 | −16.76 |
| Human | ES | H7 p22 | LongSAGE | SHE13 | 272 422 | 6440 | 0.024 | 3643 | 2797 | −2.08 | 1.42 |
| Human | ES | H9 MEFs p38 | Affymetrix (A/B) | WA09 a1b2 | n/a | 6014 | n/a | 3773 | 2241 | 25.68 | −21.22 |
| Human | ES | H9 MEFs p38 | LongSAGE | SHES2 | 466 042 | 7358 | 0.016 | 3773 | 3585 | 2.57 | 1.11 |
| Human | ES | H1 p54 matrigel | Affymetrix (A/B) | WA01 a20b21 | n/a | 6101 | n/a | 3375 | 2727 | 25.74 | −18.11 |
| Human | ES | H1 p54 matrigel | LongSAGE | SHE16 | 218 169 | 5894 | 0.027 | 3375 | 2519 | −4.03 | 3.13 |
| Human | ES | H14 p22 | Affymetrix (A/B) | WA14 a23b25 | n/a | 6028 | n/a | 3326 | 2702 | 24.35 | −17.29 |
| Human | ES | H14 p22 | LongSAGE | SHE14 | 212 136 | 6020 | 0.028 | 3326 | 2694 | 2.56 | 2.50 |
| Human | ES | HES4 p36 | Affymetrix (A/B) | ES04 a87b91 | n/a | 5940 | n/a | 3262 | 2678 | 26.21 | −19.0 |
| Human | ES | HES4 p36 | LongSAGE | SHE11 | 209 177 | 6134 | 0.029 | 3262 | 2872 | −0.36 | 3.77 |
| Mouse | Heart (atria) | Theiler stage 14 | LongSAGE | SM006 | 106 604 | 4600 | 0.043 | 3618 | 982 | 2.9 | −4.22 |
| Mouse | Heart (bulbus cordis) | Theiler stage 14 | LongSAGE | SM005 | 107 297 | 4761 | 0.044 | 3618 | 1143 | −3.22 | 6.9 |
| Mouse | Liver | Left lobe | MPSS (signature) | GSM32357 | 1 900 569 | 2910 | 0.0015 | 1580 | 1330 | −17.48 | −10.46 |
| Mouse | Liver | Right lobe | MPSS (signature) | GSM36724 | 1 810 280 | 2014 | 0.001 | 1580 | 434 | −21.76 | −3.48 |
| Mouse | Visual cortex, P27 | Biological replicate | LongSAGE-Lite | SM073 | 109 117 | 4495 | 0.041 | 3527 | 968 | −10.28 | 1.1 |
| Mouse | Visual cortex, P27 | Biological replicate | LongSAGE-Lite | SM040 | 137 562 | 4702 | 0.034 | 3527 | 1175 | −11.84 | 4.3 |
| Human | ES | H1 p54 matrigel | Affymetrix (A/B) | WA01 a26b30 | n/a | 6285 | n/a | 5827 | 458 | 27.91 | −3.47 |
| Human | ES | H1 p54 matrigel | Affymetrix (A/B) | WA01 a21b22 | n/a | 6102 | n/a | 5827 | 275 | 25.74 | 2.01 |
(a) The A+T/C+G bias of the library is measured in units of the number of standard deviations by which the observed bias deviates from neutral. A positive number indicates that the dataset is A+T rich relative to an unbiased sample. Within each pair of experiments, the A+T rich dataset is always given first.
(b) This column gives the A+T/C+G bias of the tags or probes of genes that this method did not observe. Consistently, the A+T rich dataset misses C+G rich tags and vice versa.
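Two columns of the gene-set table appear to be derived from others: genes identified per sampled tag is genes identified divided by sampling depth, and, within a pair, genes identified by both methods plus genes identified by this method only gives the genes identified. The sketch below checks that arithmetic for the liver pair (SM100 and GSM17243); the `Library` class and the function names are hypothetical helpers, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class Library:
    """One library from the gene-set comparison table (hypothetical container)."""
    technology: str
    library_id: str
    sampling_depth: int          # number of tags sampled
    genes_identified: int
    genes_this_method_only: int


def genes_per_sampled_tag(lib: Library) -> float:
    """Genes identified divided by the number of tags sampled."""
    return lib.genes_identified / lib.sampling_depth


def genes_shared(lib: Library) -> int:
    """Genes also seen by the paired method: identified minus method-only."""
    return lib.genes_identified - lib.genes_this_method_only


# Liver pair from the table (same mRNA used in both experiments).
sage = Library("LongSAGE", "SM100", 108_117, 3593, 2168)
mpss = Library("MPSS (signature)", "GSM17243", 1_724_799, 2266, 841)

for lib in (sage, mpss):
    print(f"{lib.library_id}: {genes_per_sampled_tag(lib):.4f} genes per tag, "
          f"{genes_shared(lib)} genes shared")
# SM100: 0.0332 genes per tag, 1425 genes shared
# GSM17243: 0.0013 genes per tag, 1425 genes shared
```

Both libraries recover the same shared count (1425), which is what the merged "both methods" cell implies for this pair.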
The number of NM RefSeq genes that can be identified by each method
| Experiment | Species | Number of genes that can be identified |
|---|---|---|
| LongSAGE (NlaIII) | Human | 14 129 |
| MPSS (DpnII) | Human | 13 555 |
| Affymetrix U133A/B | Human | 11 700 (14 186 probe sets) |
| Affymetrix U133 A | Human | 11 193 (13 173 probe sets) |
| LongSAGE (NlaIII) | Mouse | 15 054 |
| MPSS (DpnII) | Mouse | 14 685 |
| MPSS (DpnII) | Arabidopsis | 22 381 |
For SAGE and MPSS, only tags that map uniquely to a single gene are counted.
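The unique-mapping requirement can be sketched as a filter over a tag-to-gene map: a gene counts as identifiable only if at least one tag maps to it and to no other gene. This is an assumed reading of the note above; the tags and gene IDs below are hypothetical placeholders, not real NlaIII/DpnII tags or RefSeq accessions.

```python
def identifiable_genes(tag_to_genes: dict[str, set[str]]) -> set[str]:
    """Return genes that own at least one tag mapping to them and no other gene."""
    genes: set[str] = set()
    for targets in tag_to_genes.values():
        if len(targets) == 1:       # unique mapping: keep the single gene
            genes |= targets
        # tags hitting several genes are ambiguous and ignored
    return genes


# Hypothetical tags and placeholder gene IDs, for illustration only.
tag_map = {
    "tag_1": {"GENE_A"},
    "tag_2": {"GENE_B", "GENE_C"},  # ambiguous: shared by two genes
    "tag_3": {"GENE_C"},
}
print(sorted(identifiable_genes(tag_map)))  # ['GENE_A', 'GENE_C']
```

Under this reading GENE_B is not counted, because its only tag is shared with GENE_C.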
Figure 1. Histogram showing the distribution of bias among five experimental methods. The A+T/C+G bias of an individual experiment is measured in units of the number of standard deviations by which the observed bias deviates from the expected bias. The biases of the individual experiments comprising each series are plotted as a histogram. The peak position and the width of the distribution differ between methods, reflecting differences in systematic and random error, respectively, with respect to A+T/G+C bias.
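This excerpt does not spell out the statistic behind "standard deviations from the expected bias". As a stand-in, the sketch below uses a one-sample binomial z-score, comparing the observed number of A+T-rich tags with the number expected from an unbiased sample. The function name and the `expected_fraction` input are assumptions for illustration, not the authors' method.

```python
import math


def at_bias_z(n_at_rich: int, n_total: int, expected_fraction: float) -> float:
    """Standard deviations by which the observed A+T-rich count departs from
    the count expected under a binomial null (positive = A+T rich).

    Stand-in statistic, not the published assay.
    """
    if n_total <= 0 or not 0.0 < expected_fraction < 1.0:
        raise ValueError("need n_total > 0 and 0 < expected_fraction < 1")
    expected = n_total * expected_fraction
    sd = math.sqrt(n_total * expected_fraction * (1.0 - expected_fraction))
    return (n_at_rich - expected) / sd


# Hypothetical counts: 5,300 of 10,000 tags are A+T rich where an unbiased
# sample would give 50%.
print(round(at_bias_z(5_300, 10_000, 0.5), 2))  # 6.0
```

A positive value follows the tables' sign convention (A+T rich); a negative value indicates a C+G-rich sample.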
Mean and standard deviation of the distributions of G+C DS for each experimental series (Figure 1)
| Experiment series | Number of experiments | Mean G+C DS | Standard deviation of G+C DS | Number of replicates required to achieve a standard error in the mean of 1.0 |
|---|---|---|---|---|
| MPSS (Signature) Mus. | 67 | −27.76 | 6.48 | 42 |
| LongSAGE | 83 | 0.15 | 3.75 | 15 |
| LongSAGELite Mus. | 21 | −9.48 | 4.89 | 24 |
| MPSS (Classic) Arab. | 5 | 5.48 | 2.98 | 9 |
| MPSS (Signature) Arab. | 12 | −23.7 | 5.79 | 35 |
| Affymetrix (ES series) U133A/B | 28 | 26.36 | 2.32 | 6 |
| Affymetrix (ES series) U133A RMA | 28 | −7.56 | 3.54 | 13 |
| Affymetrix (ES series) U133A GC-RMA | 28 | 18.35 | 1.96 | 4 |
| Affymetrix (993 expts.) U133A | 993 | 21.43 | 7.62 | 59 |
| LongSAGE Human ES | 16 | −0.57 | 4.07 | 17 |
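The last column is consistent with requiring the standard error of the mean, SD/√n, to fall to 1.0, i.e. taking the smallest integer n with n ≥ SD². A minimal sketch of that rule (the function name is illustrative):

```python
import math


def replicates_for_standard_error(sd: float, target_se: float = 1.0) -> int:
    """Smallest n such that sd / sqrt(n) <= target_se, i.e. n >= (sd / target_se) ** 2."""
    return math.ceil((sd / target_se) ** 2)


# Standard deviations copied from the table above.
series_sd = {
    "MPSS (Signature) Mus.": 6.48,
    "LongSAGE": 3.75,
    "Affymetrix (ES series) U133A GC-RMA": 1.96,
    "Affymetrix (993 expts.) U133A": 7.62,
}
for name, sd in series_sd.items():
    print(f"{name}: {replicates_for_standard_error(sd)}")
# MPSS (Signature) Mus.: 42
# LongSAGE: 15
# Affymetrix (ES series) U133A GC-RMA: 4
# Affymetrix (993 expts.) U133A: 59
```

Applied to MPSS (Signature) Arab. (SD 5.79), the same rule gives 34 rather than the tabulated 35, so the authors' exact rounding may differ from this sketch.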
Figure 2. Histogram showing the distribution of bias among individual series of Affymetrix experiments. Each series has its own peak position and distribution width, illustrating that individual laboratories have their own bias and their own variation in bias. Consequently, combining experiments from different laboratories broadens the pooled distribution (Affymetrix, 993 experiments). The series identifiers in the figure (e.g. GSE994) are GEO series accession numbers.
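The broadening of the pooled distribution follows from the law of total variance: the variance of the combined experiments equals the weighted mean of the within-series variances plus the weighted variance of the series means. The sketch below combines series given as (count, mean, SD); it assumes population SDs, and the two laboratories in the example are hypothetical, not values from Figure 2.

```python
import math


def pooled_mean_sd(groups: list[tuple[int, float, float]]) -> tuple[float, float]:
    """Mean and SD of several series combined, each given as (n, mean, sd).

    Assumes population SDs. Total variance = weighted mean of the
    within-series variances + weighted variance of the series means.
    """
    n_total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / n_total
    within = sum(n * s ** 2 for n, _, s in groups) / n_total
    between = sum(n * (m - mean) ** 2 for n, m, _ in groups) / n_total
    return mean, math.sqrt(within + between)


# Two hypothetical laboratories: equal spread, different systematic offset.
labs = [(10, 15.0, 2.0), (10, 25.0, 2.0)]
mean, sd = pooled_mean_sd(labs)
print(f"pooled mean {mean:.1f}, pooled SD {sd:.2f}")  # pooled mean 20.0, pooled SD 5.39
```

Even though each laboratory's own SD is 2.0, the pooled SD is about 5.39 because the between-laboratory offsets add variance, the effect the caption describes for the 993-experiment Affymetrix distribution.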