Frank F Millenaar, John Okyere, Sean T May, Martijn van Zanten, Laurentius A C J Voesenek, Anton J M Peeters.
Abstract
BACKGROUND: Short oligonucleotide arrays for transcript profiling have been available for several years. Generally, raw data from these arrays are analysed with the aid of the Microarray Analysis Suite or GeneChip Operating Software (MAS or GCOS) from Affymetrix. Recently, more methods to analyse the raw data have become available. Ideally all these methods should come up with more or less the same results. We set out to evaluate the different methods and include work on our own data set, in order to test which method gives the most reliable results.
Year: 2006 PMID: 16539732 PMCID: PMC1431565 DOI: 10.1186/1471-2105-7-137
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The number of differentially expressed genes. Number of genes significantly (t-test, p < 0.05) up- or down-regulated between air control and ethylene or low-light treatment, as calculated with six different algorithms. The number of biological replicates for each treatment is three. The number of probe sets per chip is 22746.
| Method | Control – ethylene | Control – Low-light |
| MAS 5.0 | 2201 | 2384 |
| dChip PMMM | 2559 | 2665 |
| dChip PM | 2435 | 2583 |
| RMA | 2470 | 2705 |
| GC-RMA | 2258 | 2411 |
| PDNN | 2972 | 3262 |
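The counting behind this table can be sketched as follows. This is a minimal illustration, not the authors' procedure: the source only states "Ttest p < 0.05" with three replicates, so the pooled-variance (Student) form of the test, the two-tailed critical value for 4 degrees of freedom, the helper names and the signal values are all assumptions.

```python
from statistics import mean, variance

# Two-tailed critical value of Student's t for alpha = 0.05 and
# 4 degrees of freedom (3 + 3 replicates - 2). Assumes a pooled-variance test.
T_CRIT_DF4 = 2.776


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def count_significant(control, treated, t_crit=T_CRIT_DF4):
    """Count probe sets whose |t| exceeds the critical value (p < 0.05)."""
    return sum(
        abs(t_statistic(control[ps], treated[ps])) > t_crit
        for ps in control
    )


# Hypothetical signals: three probe sets, three biological replicates each.
control = {
    "ps1": [100.0, 110.0, 105.0],
    "ps2": [50.0, 70.0, 60.0],
    "ps3": [200.0, 210.0, 190.0],
}
treated = {
    "ps1": [200.0, 210.0, 190.0],
    "ps2": [55.0, 65.0, 62.0],
    "ps3": [100.0, 95.0, 105.0],
}

print(count_significant(control, treated))
```

With only three replicates per group the test has 4 degrees of freedom, so the threshold on |t| is high and small differences in how each algorithm estimates variance can move a gene across the p = 0.05 line.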
Figure 1. Venn diagram. Venn diagram of genes significantly (t-test, p < 0.05) up- or down-regulated after three hours of ethylene exposure, depending on the method used to calculate gene expression. This diagram shows the differences and similarities between all the methods. PDNN, MAS 5.0 (MAS, or GCOS), dChip PMMM (PMMM), dChip PM only (PM), RMA and GC-RMA were used. Only 790 genes were in common for all four algorithms. Comparable results were obtained from the low-light treatment. Areas with one letter show genes that are unique to one method, areas with two letters show genes that are in common only between these two methods, and so on.
Figure 2. Gene signal intensity. Gene signal intensity from control plants of all genes (22747) calculated with six methods: MAS 5.0 (MAS), dChip PMMM (PMMM), dChip PM only (PM), RMA, GC-RMA and PDNN. The signal intensity is sorted from low to high. Similar results were observed with expression data from treated plants.
Correlations between the expression values. Pearson correlation between the expression values calculated with six different methods. The average signal intensity from all 22746 probe sets from the three control chips was used.
| | MAS5 | dChip PMMM | dChip PM | RMA | GC-RMA |
| dChip PMMM | 0.9831 | | | | |
| dChip PM | 0.9846 | 0.9889 | | | |
| RMA | 0.9913 | 0.9778 | 0.9862 | | |
| GC-RMA | 0.9435 | 0.9325 | 0.9391 | 0.9417 | |
| PDNN | 0.9767 | 0.9736 | 0.9793 | 0.9764 | 0.9534 |
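A lower-triangular correlation table of this kind can be produced along the following lines. This is a rough sketch under stated assumptions: the method names "A", "B" and "C" and their signal values are invented, each list stands for per-probe-set signals already averaged over the control chips, and no log transformation is applied because the source does not say whether one was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_table(signals):
    """Pairwise correlations for the lower triangle (each pair once)."""
    names = list(signals)
    return {
        (row, col): pearson(signals[row], signals[col])
        for i, row in enumerate(names)
        for col in names[:i]
    }


# Hypothetical averaged signals for four probe sets under three methods.
signals = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
}

for pair, r in correlation_table(signals).items():
    print(pair, round(r, 4))
```

A high correlation over all probe sets is dominated by the large dynamic range of expression, so it does not guarantee agreement on which individual genes are called differentially expressed.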
Figure 3. Relation between signal intensity from MAS 5.0 and RMA. Relation between the signal intensity calculated with MAS 5.0 and RMA software for all probe sets. In general there is a good correlation (r² = 0.9913), see also table 1. However, variation increased closer to the unity. For example, a signal of 4 in MAS 5.0 results in a signal between 4 and 5.5 in RMA on a ln scale.
Figure 4. Spiked-in data. (A) Average observed ln intensity plotted against normalized ln concentration for 42 spiked-in genes of the Affymetrix spike-in experiment. The observed concentrations are adjusted so that all lines have the same intercept at a ln concentration of 2.8 (16 pmol). The solid line without symbols represents the ideal slope-1 line. (B) The accuracy of picking up the spiked-in genes. The significance of the difference between two successive spike-in concentrations (0–0.125; 0.125–0.25; etc.) was calculated for each gene. For each spike-in concentration, the number of significantly up-regulated genes was counted and is presented on the y-axis as a fraction of all spiked-in genes. A value of "1" therefore means that all 42 genes were significant at the given concentration.
Minimum signal. Signal intensity from the RNA spike-in experiments at a concentration (1 pmol) where the spiked-in concentrations were significantly different from the background. The number (#) and percentage (%) of genes below the minimal signal (Min signal) are obtained from non-transformed data.
| | MAS5 | dChip PMMM | dChip PM | RMA | GC-RMA | PDNN |
| Min signal | 40.4 | 46.5 | 112.2 | 59.2 | 27.4 | 261.3 |
| # genes | 11109 | 2651 | 3895 | 6750 | 9174 | 2651 |
| % genes | 48.8 | 11.7 | 17.1 | 29.7 | 40.3 | 11.7 |
Figure 5. Reproducibility. Reproducibility of expression data between three biological replicates (air), compared between MAS 5.0, dChip PM, dChip PMMM, RMA, GC-RMA and PDNN. Reproducibility is calculated as the standard deviation divided by the average signal, which is the coefficient of variation (CV). The CV values are sorted from low to high. The PM, RMA and PDNN algorithms give the most reproducible results and MAS 5.0 the least. The two other replicated treatments, ethylene and low-light, gave similar results (data not shown).
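The CV calculation described in this caption can be illustrated with a short sketch. The probe set names and signal values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the caption does not say which form was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean signal."""
    return stdev(replicates) / mean(replicates)


def sorted_cvs(signals):
    """CV per probe set, sorted from low to high as in the figure."""
    return sorted(coefficient_of_variation(r) for r in signals.values())


# Hypothetical signals from three biological replicates (air control).
signals = {
    "ps1": [90.0, 100.0, 110.0],
    "ps2": [100.0, 100.0, 100.0],
    "ps3": [40.0, 100.0, 160.0],
}

print([round(cv, 3) for cv in sorted_cvs(signals)])
```

Because the CV is scale-free, it allows the six algorithms to be compared even though their absolute signal values differ widely.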
Biological comparisons. Expression of genes in the ethylene biosynthesis and signal transduction cascade as calculated with six different methods. Numbers in brackets after a gene name show how many genes of a family are on the microarray, followed by the size of the family. Expression data of a gene family is the sum of the expression of the individual gene members. The 'p' columns show the significance of the fold change ('x' columns) between air control and ethylene (eth) treated plants. ns = not significant, * = p < 0.05, ** = p < 0.01, *** = p < 0.001.
| | MAS 5.0 | | | | dChip PMMM | | | | dChip PM | | | |
| gene | Air | eth | p | x | Air | eth | p | x | Air | eth | p | x |
| ACC synthase (10/11) | 378 | 339 | ns | - | 1882 | 1760 | * | 0.9 | 2489 | 2373 | ns | - |
| ACC oxidase (10/10) | 2960 | 5845 | ** | 2.0 | 6196 | 12866 | ** | 2.1 | 8301 | 16931 | *** | 2.0 |
| Receptors (5/5) | 327 | 932 | ** | 2.9 | 806 | 2104 | ** | 2.6 | 1331 | 2881 | *** | 2.2 |
| | 55 | 242 | *** | 4.4 | 120 | 614 | *** | 5.1 | 278 | 877 | *** | 3.1 |
| | 191 | 182 | ns | - | 429 | 415 | ns | - | 588 | 558 | ns | - |
| | 578 | 652 | ** | 1.1 | 1464 | 1714 | ** | 1.2 | 2075 | 2377 | ** | 1.1 |
| | RMA | | | | PDNN | | | | GC-RMA | | | |
| gene | Air | eth | p | x | Air | eth | p | x | Air | eth | p | x |
| ACC synthase (10/11) | 1304 | 1186 | ns | - | 4337 | 4057 | * | 0.9 | 801 | 567 | *** | 0.7 |
| ACC oxidase (10/10) | 7341 | 15768 | *** | 2.2 | 14731 | 28773 | *** | 2.0 | 9116 | 23075 | *** | 2.5 |
| Receptors (5/5) | 869 | 2150 | *** | 2.5 | 2419 | 5059 | *** | 2.1 | 600 | 2221 | *** | 3.7 |
| | 210 | 904 | *** | 4.3 | 558 | 1566 | *** | 2.8 | 140 | 1031 | *** | 7.4 |
| | 608 | 563 | ns | - | 1029 | 974 | ns | | 407 | 369 | ns | 0.9 |
| | 1493 | 1740 | *** | 1.2 | 3678 | 4416 | * | 1.2 | 1785 | 2207 | *** | 1.2 |
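The family-level values in this table are sums over individual family members, and the 'x' columns are ethylene/air ratios. A minimal sketch of both steps is given below. The function names and the per-member signals are hypothetical; only the family totals 2960 and 5845 (ACC oxidase, MAS 5.0) are taken from the table.

```python
def family_signal(member_signals):
    """Family expression as the sum of the individual members' signals."""
    return sum(member_signals)


def fold_change(air, eth):
    """Ethylene/air ratio, as reported in the 'x' columns."""
    return eth / air


# Hypothetical per-member signals that add up to a family total.
members_air = [120.0, 80.0, 60.0]
members_eth = [300.0, 150.0, 70.0]
print(fold_change(family_signal(members_air), family_signal(members_eth)))

# Family totals for ACC oxidase under MAS 5.0, taken from the table.
print(round(fold_change(2960, 5845), 1))
```

Summing members before taking the ratio means that highly expressed members dominate the family fold change, which is worth keeping in mind when comparing families across algorithms with different signal scales.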
Figure 6. Examples of the relation between PM and MM signals. Relation between the PM and MM signals of four probe sets from all 9 arrays (A...D). Data points are plotted only when the MM signal intensity is smaller than the PM signal. In panels A and B there is no correlation between the PM and MM signals, as can be seen from the low slope and Pearson correlation coefficient. This is in contrast to the results in panels C and D, where the slope and Pearson correlation coefficient are large. These signals are obtained from the microarray scanner and are the input for the six calculation methods.
Figure 7. Relation between PM and MM signals. Slope and Pearson correlation coefficient calculated between the PM and MM signals from 200 randomly chosen probe sets. Only probe sets that represent a single gene are used. Both slope and correlation are sorted from low to high. See figure 6 for further explanation and individual examples.
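The per-probe-set slope and correlation shown in figures 6 and 7 could be computed roughly as follows. This is a sketch under assumptions: the captions do not state which signal is the independent variable, so MM is regressed on PM here, ordinary least squares is assumed, and the signal values are invented. The filter that keeps only pairs with MM smaller than PM follows the figure 6 caption.

```python
from math import sqrt


def pm_mm_slope_and_r(pm, mm):
    """Least-squares slope of MM on PM and Pearson r, using MM < PM pairs only."""
    pairs = [(p, m) for p, m in zip(pm, mm) if m < p]
    n = len(pairs)
    mean_p = sum(p for p, _ in pairs) / n
    mean_m = sum(m for _, m in pairs) / n
    sxx = sum((p - mean_p) ** 2 for p, _ in pairs)
    syy = sum((m - mean_m) ** 2 for _, m in pairs)
    sxy = sum((p - mean_p) * (m - mean_m) for p, m in pairs)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical PM and MM signals for one probe set; the last pair has
# MM > PM and is therefore excluded.
pm = [100.0, 200.0, 300.0, 400.0, 50.0]
mm = [50.0, 100.0, 150.0, 200.0, 80.0]

slope, r = pm_mm_slope_and_r(pm, mm)
print(slope, r)
```

A slope and correlation near zero suggest the MM probes carry little gene-specific signal for that probe set, whereas large values suggest the MM probes also hybridize to the target transcript.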
Correlation between microarray and RT-PCR data. Pearson correlation of expression values between six microarray algorithms and three Real Time RT-PCR algorithms on a subset of 18 genes. The negative sign of the numbers in the second and last columns is removed, because a lower ΔCt value means more gene product is present, which leads to a negative correlation with the microarray data. All correlations were significant. ns = not significant (p < 0.01). * = p < 0.05, ** = p < 0.01
| | ΔCt | Assumption Free | ΔCt + Assumption Free |
| MAS 5.0 | 0.515** | 0.311* | 0.406** |
| dChip PMMM | 0.468** | 0.298* | 0.382** |
| dChip PM | 0.460** | 0.300* | 0.397** |
| RMA | 0.378** | 0.459** | |
| GC-RMA | 0.495** | 0.297* | 0.407** |
| PDNN | 0.462** | 0.252ns | 0.386** |
RT-PCR data of RMA specific genes. Ethylene/air expression signals from 10 genes calculated with RMA and compared to the Real Time RT-PCR data as calculated with ΔCt. All RMA signals from ethylene treated plants are significantly (0.05 > p > 0.01) different from the air controls (average p = 0.037). The Real Time RT-PCR data shown are averages, standard deviations and p values after comparison with the control. The average control signal is set to 1. The numbers in bold indicate genes that show a similar change in expression for both methods. For RMA, n = 3 and for Real Time RT-PCR, n = 3–5.
| AGI | Eth/air RMA | Eth/air RT-PCR | | |
| | average | average | stdev | p |
| At5g23060 | | | 0.075 | 0.456 |
| At2g36850 | | | 0.135 | 0.181 |
| At1g70710 | | | 0.089 | 0.155 |
| At5g03760 | | | 0.109 | 0.021 |
| At3g49530 | 1.29 | 0.99 | 0.134 | 0.958 |
| At2g31800 | 1.31 | 0.72 | 0.104 | 0.065 |
| At5g25930 | | | 0.523 | 0.700 |
| At1g77380 | 1.44 | 0.84 | 0.066 | 0.270 |
| At5g57120 | 1.51 | 0.72 | 0.227 | 0.356 |
| At1g64710 | 1.99 | 0.96 | 0.132 | 0.804 |
RT-PCR primers. Primers for the genes used in Real-Time RT-PCR and the length of the PCR product. Each gene is represented with the Affymetrix probe number and the Arabidopsis Genome Initiative (AGI) code.
| probe | AGI | length (bp) | Forward primer | Reverse primer |
| Genes common for all microarray methods | ||||
| 245098_at | At2g40940 | 148 | 5'-CGGAACTCAGAGGAACCATT-3' | 5'-GCAGATACCAAGCCTGATGA-3' |
| 245264_at | At4g17245 | 50 | 5'-CAAGACGGTGACACCTTACG-3' | 5'-TGGAATCCATGTTTGCATCT-3' |
| 247486_at | At5g62140 | 158 | 5'-ACGGAGGAATCGATAGGAGA-3' | 5'-TGTAGATCGGCGAGACACTC-3' |
| 247954_at | At5g56870 | 96 | 5'-CAGAGAGTTCCGGTGTGAGA-3' | 5'-TTCCTGCTGGTGTAGCAAAC-3' |
| 249125_at | At5g43450 | 99 | 5'-GTTCTTGAGCGTGGAGCATA-3' | 5'-ACCGTGGAATTTGGAGAAAG-3' |
| 250598_at | At5g07690 | 75 | 5'-AACAAGCGTTGATGAAGACG-3' | 5'-AAATCGGAATGGTCAAGGAG-3' |
| 250911_at | At5g03730 | 67 | 5'-CGAGATGAGCCGTCTAATGA-3' | 5'-TAGCAAGCTCCCACAAGATG-3' |
| 251058_at | At5g01790 | 133 | 5'-AGAAACTGTCGGCTTCATCA-3' | 5'-TCAGCAGAAGAGTCGAAGGA-3' |
| 251373_at | At3g60530 | 129 | 5'-ACCGCTTGGACCTAAAACAC-3' | 5'-TTCCGATGAGAGTTCGAATG-3' |
| 253302_at | At4g33660 | 83 | 5'-ACCACCACGAAAAGTTGGTT-3' | 5'-GGTCACAGCAACATTCATCC-3' |
| 254371_at | At4g21760 | 51 | 5'-TTCATCTTCCAGCACAGAGC-3' | 5'-TTGTTTCAGGCACCAATCAT-3' |
| 258181_at | At3g21670 | 96 | 5'-GCTTACGTTGGACAGCTTGA-3' | 5'-TCCCATCGATATCGTGCTTA-3' |
| 258468_at | At3g06070 | 100 | 5'-GTTCCTTTATCCCCAAGCAA-3' | 5'-GCATGATGAAAGGTGATGCT-3' |
| 259982_at | At1g76410 | 142 | 5'-TGGCTTGGATCACACTCTTC-3' | 5'-CAGGACCATCTTCACGTTGT-3' |
| 263653_at | At1g04310 | 83 | 5'-ACGCTTGCCAAAACATTGTA-3' | 5'-TGAGACGCTTTTCACCAAAC-3' |
| 264624_at | At1g08930 | 135 | 5'-TGGAATGCATCAGGAATGTT-3' | 5'-TTGCACAGAGTTGTTGAGCA-3' |
| 265194_at | At1g05010 | 59 | 5'-TATAATCCGGGAAGCGACTC-3' | 5'-GCTTCTTTTCCGATCAGCTC-3' |
| 266884_at | At2g44790 | 152 | 5'-ACTCCTACCACACCGGAATC-3' | 5'-ATCGAGACTCCCACCAAAAC-3' |
| RMA specific genes | ||||
| 262870_at | At1g64710 | 129 | 5'-CCGATGGAAAGACCAGATTC-3' | 5'-GAAACAGAGGGTCCACCTTG-3' |
| 260181_at | At1g70710 | 107 | 5'-ACACAAAGCCTCGAGGAAAC-3' | 5'-CATGCTTATTGTCCCAACCA-3' |
| 246389_at | At1g77380 | 61 | 5'-AATGTACATCGCGCAGAAGA-3' | 5'-GACTTGAAGGCAAACCCATC-3' |
| 263461_at | At2g31800 | 52 | 5'-GGGACCTTGGGAGCTATCTT-3' | 5'-ACTTTGGCTGGAGAAAGACG-3' |
| 263891_at | At2g36850 | 139 | 5'-ATGGCTACTCGTGGGTTGTT-3' | 5'-CCAAGGACACATGCAAACAT-3' |
| 267592_at | At2g39710 | 128 | 5'-CGTATTGCATATCCGGTTCA-3' | 5'-CCGGTCGAAATAAGGTAACG-3' |
| 252278_at | At3g49530 | 73 | 5'-CGTGACCGGTTTTGTGTTTA-3' | 5'-TACTGCCGCCCTAAAGAGTC-3' |
| 250892_at | At5g03760 | 77 | 5'-CTGCTTGTGGACTCTCATGG-3' | 5'-TTTGATCGTTGGATCAGTGG-3' |
| 249876_at | At5g23060 | 103 | 5'-GCACAACAGACGTCAAAAGC-3' | 5'-CAGCAGCAACAACAATGACA-3' |
| 249822_at | At5g23710 | 52 | 5'-CGGGAAAAAGTTCGGTAATG-3' | 5'-AATCGAAGCAAAAGCTCCAG-3' |