BACKGROUND: The measurement of gene expression using microarray technology is a complicated process in which a large number of factors can be varied. Due to the lack of standard calibration samples such as are used in traditional chemical analysis, it can be difficult to evaluate whether changes made to the microarray procedure actually improve the identification of truly differentially expressed genes. The purpose of the present work is to report the optimization of several steps in the microarray process, both in laboratory practices and in data processing, using criteria that do not rely on external standards. RESULTS: We performed a cDNA microarray experiment including RNA from samples with high expected differential gene expression, termed "high contrasts" (rat cell lines AR42J and NRK52E), compared to self-self hybridization, and optimized a pipeline to maximize the number of genes found to be differentially expressed in the "high contrasts" RNA samples by estimating the false discovery rate (FDR) using a null distribution obtained from the self-self experiment. The proposed high-contrast versus self-self method (HCSSM) requires only four microarrays per evaluation. The effects of blocking reagent dose, filtering, and background correction methodologies were investigated. In our experiments a dose of 250 ng LNA (locked nucleic acid) dT blocker, no background correction and weight based filtering gave the largest number of differentially expressed genes. The choice of background correction method had a stronger impact on the estimated number of differentially expressed genes than the choice of filtering method. Cross platform microarray (Illumina) analysis was used to validate that the increase in the number of differentially expressed genes found by HCSSM was real. CONCLUSION: The results show that HCSSM can be a useful and simple approach to optimize microarray procedures without including external standards.
Our optimizing method is highly applicable to both long oligo-probe microarrays which have become commonly used for well characterized organisms such as man, mouse and rat, as well as to cDNA microarrays which are still of importance for organisms with incomplete genome sequence information such as many bacteria, plants and fish.
Gene-expression microarrays are widely used for large scale studies in cells, and are emerging as promising tools in clinical diagnosis, with potential impact on the assessment of prognosis and choice of treatment [1]. Microarray data are also increasingly becoming essential for data driven model building in systems biology approaches [2,3]. However, the measurement of gene expression using e.g. cDNA microarray technology is a complex process that involves several steps, which have not yet been fully optimized. In a typical cDNA microarray experiment, total RNA from two biological sources in which gene expression levels are to be compared is labelled with two different fluorescent dyes and hybridized to an array of spotted cDNA probes. After hybridization and scanning the data are further pre-processed. This often includes image analysis, filtering of data and background subtraction before normalization and transformation of data. Relative gene expression levels are given as ratios of intensities of fluorescent emission from the labelled RNAs. Many different approaches and methods can be used at each stage of a microarray experiment, and there is still no definitive consensus about which methods to choose (reviewed and discussed in [4]).
Sample quality, labeling protocol, hybridization conditions, scanning protocols and image acquisition as well as the different stages in the microarray data analysis pipeline can all contribute to the overall uncertainty of the conclusions drawn. There has therefore recently been an emerging focus on the need for universally applicable standards, reference materials and analytic guidelines to assist in the standardization of microarray experiments [4-10]. In order to choose or optimize microarray methods several criteria can be used. The use of external RNA spike-in controls is the most recommended approach for technology assessment and optimization. However, the recommended use of spike-in RNA might not always be feasible.
For optimal application several controls and probes must be used and the spike-in controls must be representative of the endogenous RNA with respect to e.g. length, sequence characteristics, melting temperature and cross-hybridization risk [7]. On some arrays suitable spike-in controls are not printed, and printing a sufficient number of spikes can leave insufficient room for the genes of biological interest. Other valuable tools are calibrated reference RNA samples such as those developed in the MAQC project [10] or mixed tissue RNA samples developed by Thompson et al. [11,12]. However, it is not always possible to have RNA samples with known differences in the expression level of many genes from the species of interest.
One general strategy that can be used is to minimize the variability of self-self hybridizations [13]. Unfortunately, this criterion focuses on the zero ratios that are the least interesting biologically and will tend to minimize variability (noise) as well as signal, which may not improve the overall ability to detect differentially expressed genes. One major concern is that the optimization criterion should increase the chances of discovering truly differentially expressed genes. Here we propose an optimization method that requires only four microarrays per evaluation and uses both hybridizations of RNA from samples with marked differences in gene expression, termed "high contrasts", as well as self-self hybridizations. A change in either laboratory procedure or data processing can increase real signal or just increase the noise. If the noise is increased this will also affect the self-self hybridizations. On the other hand, if signal is lost and genes are pushed artificially towards zero ratios, this will shrink ratios in the high contrast experiments. In the high contrast versus self-self method (HCSSM) proposed here we maximize the difference between a self-self and a high contrast experiment.
This difference is quantified as the number of significantly differentially expressed genes in the high contrast experiment at a certain false discovery rate (FDR) estimated using the self-self experiments as a null distribution. Genes are scored for differential expression on the basis of the high contrast and the self-self hybridization separately using a modified T-statistic [14]:

$$T_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{\nu_g}}$$

where $\hat{\beta}_g$ is the estimated effect for gene g, $\tilde{s}_g^2$ is the modified gene variance and $\nu_g$ is the inverse of the number of degrees of freedom for gene g. A T-score threshold is chosen, and the FDR is the ratio between the number of genes above the cut-off on the self-self list and the number above the cut-off on the high contrast list. To find genes at a specific false discovery rate (e.g. 0.05), T-scores computed for the high contrast and the self-self experiment are sorted and the T-score cut-off is lowered until the chosen false discovery rate is obtained (Figure 1). The number of genes found at a standard false discovery rate (e.g. 5 %) may then be used as an optimization criterion to be maximized. It is important to note that this method of determining the FDR relies on a large number of truly differentially expressed genes in the high contrast experiment and should not be used as a tool for analysis of biological experiments where only a few genes may be differentially expressed.
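The cut-off search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hcssm_count` is hypothetical, scores are compared by absolute value (an assumption, since the text only says "larger than the cut-off"), and because the estimated FDR need not change monotonically with the cut-off, the sketch returns the lowest cut-off that still satisfies the target.

```python
import numpy as np

def hcssm_count(t_high, t_self, target_fdr=0.05):
    """Return (n_significant, cutoff) for the lowest cut-off whose estimated
    FDR (self-self hits / high-contrast hits) is at most target_fdr."""
    # Assumption: two-sided scoring, so absolute T-scores are used.
    high = np.sort(np.abs(np.asarray(t_high, dtype=float)))
    null = np.sort(np.abs(np.asarray(t_self, dtype=float)))
    best_n, best_cut = 0, float("inf")
    # Lower the cut-off step by step through the observed high-contrast scores.
    for cut in np.unique(high)[::-1]:
        n_high = len(high) - np.searchsorted(high, cut, side="left")
        n_null = len(null) - np.searchsorted(null, cut, side="left")
        if n_null / n_high <= target_fdr:
            best_n, best_cut = int(n_high), float(cut)
    return best_n, best_cut

# Toy data: ten high-contrast scores and three self-self scores.
high_scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
self_scores = [4.5, 2.5, 0.5]
n_found, cutoff = hcssm_count(high_scores, self_scores, target_fdr=0.15)
```

With these toy scores, lowering the cut-off below 3 would admit a second self-self gene and push the estimated FDR above the target, so the search stops at a cut-off of 3 with eight genes declared significant. The returned count is the quantity that HCSSM seeks to maximize when comparing protocols.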
Figure 1
The high contrast versus self-self method (HCSSM). The figure illustrates how the false discovery rate is determined in HCSSM. For a chosen T cut-off, the genes with T-scores larger than the cut-off are declared significant. The false discovery rate is then determined by dividing the number of genes deemed significant in the self-self experiment by the number of genes deemed significant in the high contrast experiment. For example, if for a given T cut-off six genes are declared significant from the self-self experiment and 127 genes are declared significant from the high contrast experiment, the false discovery rate will be 6/127 ≈ 0.05. If a specific false discovery rate is wanted (often 0.05 or 0.01), the T cut-off can be adjusted to obtain it.
Applications
There is a large number of steps in the microarray production, hybridization, scanning and data analysis that have adjustable parameters that may be optimized. We will show how HCSSM can be used to optimize data analysis and hybridization protocols. As mentioned above, the standard microarray data analysis pipeline consists of image analysis, filtering of data, background correction, normalization and identification of differentially expressed genes. Recent studies have focused on the normalization step and evaluated a large number of normalization methods for cDNA microarray data [15,16]. We have therefore chosen to test the HCSSM on filtering and background correction, as these are fields where there is still considerable debate in the literature [4,17].
Effects of level of filtration on the number of differentially expressed genes
It is common in microarray analysis to reduce the impact of spots that are malformed or have intensities outside the linear range of the scanner. Such spots are either removed or down-weighted using some measure of spot quality. Filtering has been shown to introduce bias in microarray studies [18], but has also been shown to significantly reduce variation in self-self hybridizations [13]. However, as far as we know, no work has shown how the bias versus variance trade-off in practice influences the ability to detect differentially expressed genes. Therefore, we tested a number of different filtering methods of increasing complexity to evaluate how they influence the ability to find differentially expressed genes in a study.
Often filtering consists of removing spots with intensities below an (arbitrary) threshold [19]. We have, however, found no investigations into the effect of such filtering. Several filtering methods based on spot quality statistics have been reported [13,18,20]. These quality measures can be used as filters by setting a spot quality threshold, but can also be used as weights to reduce the impact of low quality spots.
We propose a simple quality measure using the variability information available from most image analysis software. The variation in a log ratio (S) can be approximated from the variation in the pixel intensities using Equation (2) [19]:

$$S \approx \sqrt{\frac{\sigma_R^2}{R^2} + \frac{\sigma_G^2}{G^2}} \qquad (2)$$

where G and R are the green and red signal intensities, and $\sigma_R$ and $\sigma_G$ are the pixel standard deviations in the red and green channels, respectively. Quality weights are then obtained from S, with a small delta term added to stabilize spots with very small uncertainties.
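As a rough illustration of such a quality measure, the sketch below propagates the pixel standard deviations to the log ratio with a first-order (delta method) approximation and converts the result to a weight. The function names, the assumption of independent channels, the use of the natural logarithm, and in particular the inverse-variance weight form 1/(S² + δ) are assumptions made for this example; the exact weighting formula is not recoverable from the text.

```python
import numpy as np

def log_ratio_sd(R, G, sd_R, sd_G):
    """First-order SD of ln(R/G), assuming independent channels."""
    R, G, sd_R, sd_G = (np.asarray(x, dtype=float) for x in (R, G, sd_R, sd_G))
    return np.sqrt((sd_R / R) ** 2 + (sd_G / G) ** 2)

def spot_weights(S, delta=0.01):
    """Hypothetical weight form: inverse variance, stabilized by delta."""
    S = np.asarray(S, dtype=float)
    return 1.0 / (S ** 2 + delta)

# Two toy spots: a bright, uniform spot and a dim, noisy one.
S = log_ratio_sd(R=[1000, 100], G=[1000, 100], sd_R=[100, 50], sd_G=[100, 50])
w = spot_weights(S)
```

The dim, noisy spot receives a much smaller weight than the bright one but is not discarded, which is the practical difference between weighting and threshold filtering. Without the delta term, a spot with near-zero pixel variation would receive an almost unbounded weight.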
Effects of background correction on the number of differentially expressed genes
The rationale of background correction is that, for example, the observed green signal intensity ($G_{obs}$) is a linear combination of a true signal coming from the labelled RNA ($G_{true}$) and a background signal ($G_{bg}$), so that:

$$G_{obs} = G_{true} + G_{bg}$$

More accurate log ratios can then be obtained by correcting the observed signal by subtracting the background. The problem then is to correctly estimate the background signal for each spot in the image analysis phase, and to correct this estimate to satisfy prior criteria, e.g. background smoothness and non-negativity of the true signal. Background correction in itself may also destabilize results by increasing variance [5], and may decrease the ability to detect differentially expressed genes. Therefore, we use the HCSSM to compare the ability to find differentially expressed genes when no background correction is used versus a non-negative background correction method [21].
Effect of choice and dose of dT blockers in cDNA microarray experiments
The HCSS-method can be applied to the laboratory procedures as well as to the data processing steps in microarray research. One major source of bias in cDNA microarrays is cross-hybridization of the labelled RNA to non-target homologous probe sequences on the array [22-24]. A substantial proportion of the non-specific cross-hybridization signal is due to poly(dA)-poly(dT) cross hybridization since poly(dT)-containing molecules produced during labelling of poly(dA) tails of sample RNAs by reverse transcription can bind promiscuously to the poly(dA) stretch of cDNA probe spots. These poly(dA) hybridization signals reduce the ability to detect differentially expressed genes. Thus, minimization of poly(dA) hybridization is a major challenge in cDNA microarray experiments. To reduce the poly(dA) signals it is common practice to add a blocker like synthetic poly(dA) in the probe or poly(dT) in the hybridization mixture [25-27]. Recently, an LNA dT Blocker containing Locked Nucleic Acid (LNA) nucleotides [28] has been introduced. There is currently no standardized protocol available that can be used to evaluate and optimize blocking procedures with respect to optimizing the observed number of differentially expressed genes. In the present study we therefore compared the effect of poly(dA)40–60 and different doses of LNA dT blocker on the number of differentially expressed genes, and used the HCSSM to optimize identification of truly differentially expressed genes.
Results and discussion
The choice and dose of blocking reagent influences the number of differentially expressed genes estimated
In order to evaluate how different conditions designed to block poly(dA)-poly(dT) cross hybridization affect the ability to detect differentially expressed genes, we applied the HCSS-method to analysis of microarray experiments performed in the presence of either poly(dA)40–60 or varying amounts of LNA. Poly(dA)40–60 binds to the poly(dT) segment of the labelled target in the hybridization solution and thus competes with the poly(dA) segment in the microarray cDNA probes. LNA dT blocker contains Locked Nucleic Acid (LNA) nucleotides [28] at key positions within the (dT) synthetic strand, and is designed to block poly(dA) sequences present in the microarray cDNA probes and prevent them from hybridizing to the poly(dT) segment of the labelled targets.
Six different blocking conditions with dye-swap and self versus self were investigated in a total of 24 hybridizations (Figure 2). Significantly differentially expressed genes were determined by estimating the false discovery rate using a null distribution obtained from the self-self experiments, as described in the Background section. Filtration was performed as described below and in the Methods section. Our results clearly demonstrate the effect of blocking the poly(dT) segment of the labelled target, since addition of poly(dA)40–60 increased the number of estimated differentially expressed genes from 167 (no poly dA/dT blocking reagent) to 588. The LNA dT blocker further increased the number of estimated differentially expressed genes in a dose dependent manner, with a maximum of 2064 differentially expressed genes estimated in the presence of 250 ng LNA (Figure 3). In general, more than five biological experiments are recommended for detecting differentially expressed genes in microarray experiments [4]. Technical replicates are mainly recommended in quality-control studies.
We would like to emphasize that although our HCSS-method requires only four microarrays per evaluation, more experiments are needed to determine accurately whether the peak in the number of differentially expressed genes at 250 ng LNA is a real optimum or noise in the dose-response curve. It is also important to note that the optimal dose and choice of blocker may vary for different arrays and microarray protocols. However, our results indicate that the concept of using self-self and high contrast hybridizations in control experiments is well suited to identify optimal blocking conditions.
Figure 2
Graphical representation of experimental design of the microarray experiment. The nodes correspond to RNA from samples with high expected differential gene expression (rat cell lines NRK52E and AR42J) compared to self-self hybridization (rat cell line AR42J). The samples were hybridized to rat 15 k cDNA duplicates under six different blocking conditions including no blocker, 1000 ng poly(dA)40–60, and 25 to 1000 ng LNA dT blocker. Dye-swap and self versus self were performed for all blocking conditions (total of 24 hybridizations). Green-labelled samples are placed at the tail and red labelled samples at the head of the arrows.
Figure 3
Effect of choice and dose of dT blocker on the number of differentially expressed genes. The figure shows data analysed with weighted filtration and no background correction (see Methods section).
Background correction method has a stronger impact on the estimated number of differentially expressed genes than filtering
The data from the blocking study were also analysed using combinations of filter methods and background correction (Figure 4 and Additional file 1). We tested the following five filtering methods of increasing complexity: coarse, medium, fine, uncertain and weighting filter. Filters were based on spot foreground intensity, percentage of pixels saturated and ratio uncertainties as described in the Methods section. In addition we applied three different background corrections: none, where only the foreground signal is used to calculate ratios; Edwards, which is a subtraction method that ensures positive values [21]; and dampened Edwards, where a small number is added to the corrected signal to avoid extreme ratios in those cases where the background corrected intensities would be very low. Weight based filtering, which uses the ratio uncertainty to reduce the impact of "bad" spots without removing them completely, found the highest numbers of differentially expressed genes regardless of background correction method and blocking agent dose (Figure 4 and Additional file 1). The choice of background correction method had a higher impact on the number of differentially expressed genes than filtering. No background correction resulted in the highest number of differentially expressed genes estimated, the numbers ranging from 515 to 2064 under the five different filtering methods and blocking with 250 ng LNA (Figure 4). Edwards and dampened Edwards gave 66–120 and 191–588 differentially expressed genes, respectively, under the same filtering conditions. Our results strongly indicate that omission of background correction consistently improves the results. The reason may be that background correction introduces a lot of variability to remove a small bias. As long as the red and green backgrounds are highly correlated they will dampen the ratios, but only significantly for the low intensity spots, for which the ratios are uncertain anyway.
One strategy may be to use error propagation to evaluate background correction, and only background correct the spots where the bias is large, thus not increasing the variance of all spots, but still removing the bias for some of the worst affected spots.
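The bias versus variance argument can be made concrete with a small simulation. This is an illustrative sketch only: the background level (mean 100, SD 10), the noise of the local background estimate (SD 30), the independence of the two channels, and the plain subtraction with a floor are all assumptions chosen for the example, and the subtraction is a simple stand-in rather than the Edwards method [21].

```python
import numpy as np

def simulate_log_ratios(r_true, g_true, n=10000, seed=0):
    """Return (uncorrected, corrected) log2 ratios for n simulated spots."""
    rng = np.random.default_rng(seed)
    # Actual background under each spot (assumed mean 100, SD 10).
    obs_r = r_true + rng.normal(100.0, 10.0, n)
    obs_g = g_true + rng.normal(100.0, 10.0, n)
    # Noisy local background estimates (assumed mean 100, SD 30).
    est_r = rng.normal(100.0, 30.0, n)
    est_g = rng.normal(100.0, 30.0, n)
    raw = np.log2(obs_r / obs_g)
    # Plain subtraction; the floor of 1 keeps corrected signals positive.
    cor = np.log2(np.maximum(obs_r - est_r, 1.0) / np.maximum(obs_g - est_g, 1.0))
    return raw, cor

# A high contrast spot with a true four-fold difference (log2 ratio of 2).
raw, cor = simulate_log_ratios(r_true=800.0, g_true=200.0)
bias_raw, bias_cor = raw.mean() - 2.0, cor.mean() - 2.0
sd_raw, sd_cor = raw.std(), cor.std()
```

Under these assumptions the uncorrected ratios are compressed towards zero but tightly distributed, whereas the corrected ratios are nearly unbiased but several times more variable. Which effect dominates for gene detection depends on the real noise levels, which is the question HCSSM is meant to answer empirically.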
Figure 4
Effects of background correction and level of filtration on the number of differentially expressed genes estimated. The figure shows data from hybridization with 250 ng LNA blocker added.
Validation of cDNA microarray data generated by HCSSM
In our experiments a dose of 250 ng LNA dT blocker gave the highest number of differentially expressed genes. When HCSSM was further used to evaluate steps in the computational analysis of microarray data, we found that no background correction and weight based filtering estimated 2064 differentially expressed genes (gene list A in Figure 5), compared to 588 when the data were analysed with dampened Edwards background correction and weight based filtering (gene list B in Figure 5). Thus, omission of background correction added 1478 unique genes to the list of genes identified as differentially expressed. Only 2 of the 588 genes identified with dampened Edwards background correction and weight based filtering were not detected when no background correction was performed (Figure 5).
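The count of added genes follows directly from the sizes of the two lists and the two genes unique to list B. A short check, with a hypothetical helper name `list_overlap`:

```python
def list_overlap(n_a, n_b, n_b_only):
    """Return (shared, a_only) from the list sizes and the B-only count."""
    shared = n_b - n_b_only   # genes present in both lists
    a_only = n_a - shared     # genes found only in list A
    return shared, a_only

# Gene list A: 2064 genes; gene list B: 588 genes; 2 genes only in B.
shared, a_only = list_overlap(2064, 588, 2)
```

Here 586 genes are shared between the lists, leaving 1478 genes that appear only when background correction is omitted.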
Figure 5
Comparison of gene lists generated with different background correction methods. The left figure illustrates the number of differentially expressed genes estimated with no background correction (A) and dampened Edwards background correction (B). The Venn diagram shows overlap of differentially expressed genes estimated with no background correction (blue) and dampened Edwards (grey). All data are from hybridization with 250 ng LNA blocker added and weight based filtering.
In order to investigate the level of concordance of biological themes represented in the gene lists A and B, we used GeneTools, an "all in one" annotation web tool package [29] recently described by Beisvag et al., 2006 [30]. The two gene lists (reporter lists) were submitted to eGOn (explore GeneOntology), which automatically associates Gene Ontology (GO) terms from public databases with the submitted gene list. We found that the main GO terms were represented in both gene list A and gene list B. Furthermore, the proportion (number of genes in A/number of genes in B) associated with specific GO terms was similar for all terms, as illustrated for selected GO terms in Table 1. The average proportion for all GO terms associated with five or more genes was 2.9 ± 0.8, n = 64 (Additional file 2). This indicated that the increase in differentially expressed genes estimated by HCSSM was not random. We then applied cross platform analysis to validate that the 1478 genes added to the gene list were real.
Table 1
Selected GO categories associated with genes differentially expressed in the cell lines AR42J versus NRK52E.

| GO number | GO term | A | B | A∩B | A\B | B\A | Proportion |
|---|---|---|---|---|---|---|---|
| GO:0008150 | biological process | 724 | 233 | 233 | 491 | 0 | 3.1 |
| GO:0008152 | metabolic process | 459 | 135 | 135 | 324 | 0 | 3.4 |
| GO:0032502 | developmental process | 246 | 91 | 91 | 155 | 0 | 2.7 |
| GO:0007154 | cell communication | 202 | 69 | 69 | 133 | 0 | 2.9 |
| GO:0050896 | response to stimulus | 165 | 64 | 64 | 101 | 0 | 2.6 |
| GO:0048468 | cell development | 112 | 38 | 38 | 74 | 0 | 2.9 |
| GO:0006950 | response to stress | 101 | 32 | 32 | 69 | 0 | 3.2 |
| GO:0009056 | catabolic process | 65 | 30 | 30 | 35 | 0 | 2.2 |
| GO:0007049 | cell cycle | 60 | 13 | 13 | 47 | 0 | 4.6 |
| GO:0016265 | death | 60 | 18 | 18 | 42 | 0 | 3.3 |
| GO:0006928 | cell motility | 51 | 17 | 17 | 34 | 0 | 3.0 |
| GO:0040007 | growth | 42 | 19 | 19 | 23 | 0 | 2.2 |
| GO:0006952 | defence response | 32 | 7 | 7 | 25 | 0 | 4.6 |
| GO:0019725 | cell homeostasis | 30 | 12 | 12 | 18 | 0 | 2.5 |

A: Gene list generated with no background correction and weight based filtering. B: Gene list generated with dampened Edwards background correction and weight based filtering. A∩B: number of genes differentially expressed in both gene list A and B. A\B: number of genes differentially expressed in gene list A but not in gene list B. B\A: number of genes differentially expressed in gene list B but not in gene list A. Proportion: number of genes in A/number of genes in B. The genes were annotated using GeneTools [30] according to UniGene build #163. All GO categories associated with more than five genes are shown in Additional file 2.
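The Proportion column is simply the A count divided by the B count for each GO term. The snippet below recomputes it for a few rows of Table 1 as a consistency check; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
# (A, B) gene counts for a few GO terms, copied from Table 1.
counts = {
    "biological process": (724, 233),
    "metabolic process": (459, 135),
    "cell cycle": (60, 13),
    "growth": (42, 19),
}

# Proportion = number of genes in A / number of genes in B, to one decimal.
proportions = {term: round(a / b, 1) for term, (a, b) in counts.items()}
```

A roughly constant proportion across unrelated GO terms is what one would expect if the additional genes in list A reflect a general gain in sensitivity rather than an artefact concentrated in a few categories.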
Cross platform microarray analysis
The combined use of multiple microarray platforms has recently been suggested as an alternative that is complementary to qRT-PCR for validation of gene expression profiles [31-33]. In an attempt to validate a large fraction of the cDNA microarray results we used the Illumina Gene Expression system (San Diego, CA) for cross-platform analysis of differentially expressed genes in AR42J cells versus NRK52E cells. The same sources of total RNA used in the cDNA microarray experiments were hybridized to Illumina's Sentrix® RatRef-12 Expression BeadChips, with six technical replicates for each cell line as described in the Methods section. Only genes common to both platforms were included in the analysis. Of the 1478 additional genes identified with no background correction, 553 genes were represented on both the spotted cDNA arrays and the Illumina RatRef-12 Expression BeadChips. Of these 553 genes, 398 (72 %) were identified as differentially expressed in the same direction (higher versus lower) on both platforms (Fig 6). The overlap of differentially expressed genes across commercial microarray platforms has recently been reported to be ~80–90 % [10,32,33].
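The concordance figure can be expressed as a small function: a gene counts as confirmed when it is significant on the second platform and its log ratio has the same sign as on the cDNA platform. This is a sketch of that definition as read from the text; the function name and argument layout are hypothetical, and all input genes are assumed to be differentially expressed on the cDNA platform.

```python
def concordance(cdna_log2, other_log2, other_significant):
    """Fraction of genes significant on the other platform with the same
    direction of change as on the cDNA platform."""
    if not (len(cdna_log2) == len(other_log2) == len(other_significant)):
        raise ValueError("inputs must have the same length")
    agree = sum(
        1
        for c, o, sig in zip(cdna_log2, other_log2, other_significant)
        if sig and c * o > 0
    )
    return agree / len(cdna_log2)

# Toy example: four genes, two confirmed in the same direction.
toy = concordance([1.2, -0.8, 2.0, -1.5], [0.9, -0.4, -1.1, -0.2],
                  [True, True, True, False])

# Reported counts: 398 of 553 shared genes agree.
reported_percent = round(100 * 398 / 553)
```

In the toy data the third gene is significant but changes in the opposite direction and the fourth is not significant, so neither counts as confirmed.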
Figure 6
Cross platform microarray analysis. The figure shows ratios (log2 based) from 553 genes comparable between the cDNA and the Illumina platforms. Red dots: genes significantly up-regulated on the Illumina platform. Green dots: genes significantly down-regulated on the Illumina platform. Black dots: genes not identified as significantly different between the cell lines (AR42J versus NRK52E) on the Illumina platform. All the genes were identified as differentially expressed on the cDNA platform, and the concordance between cDNA and Illumina is 72 %.
Since RT-PCR is still considered the "gold standard" for gene expression measurements, we further used SYBR green-based quantitative real-time PCR to validate the relative gene expression for a few of the 1478 additional genes identified with optimal conditions. Commonly, a 2 fold change is reported as the cut-off below which microarray and qRT-PCR data begin to lose correlation, although recent reports observed significant correlation where genes exhibited 1.4 fold change or higher [34,35]. We therefore selected seven genes in the fold ratio range 4.8 to 1.8 (log2 ratio range 2.2 to 0.9 on the cDNA arrays). Results are shown in Additional files 3 and 4. Descriptions of the PCR primers and the qRT-PCR protocol are given in Additional file 5. Five of the selected genes (Ica1, c-fos, Btg2, Uhrf and Ube2b) were detected as differentially expressed on both microarray platforms. Two genes (Hoxa2b and Irfrd) were only identified on the cDNA platform. Five genes with fold ratio > 2 were verified by qRT-PCR (see Additional files 3 and 4). Two genes (Hoxa2 and Ube2b) with fold ratio < 2 were not verified. Hox2b was excluded from the results due to technical problems (see Additional file 4). The gene Ube2b was expressed in the opposite direction relative to the cDNA microarray, but in the same direction as on the Illumina platform (higher in AR42J than in NRK52E on Illumina, lower on cDNA).
Consistent with other reports [35,36], the microarray ratios were compressed compared to qRT-PCR results (see Additional file 3). Overall, the qRT-PCR measurement corresponded 100 % (5/5) with the Illumina data and 71 % (5/7) with the cDNA microarray data. The latter confirmation rate is in accordance with cDNA microarray results of others ([37], and references therein).
Some of the discrepancy between cDNA microarray versus Illumina and qRT-PCR may be due to errors (reported by e.g. [38]) in the IMAGE collection used for probes in our cDNA microarrays. We thus consider the results from qRT-PCR and the Illumina platform more likely to be "true data" in the present study. In spite of the known errors on cDNA platforms, cross platform analysis confirmed 72 % of the results. Validation of a few selected genes by qRT-PCR supported this finding. Taken together, functional annotation analysis and the alternative methods for validation of the cDNA microarray data indicated that the increase in the number of differentially expressed genes found by HCSSM was mainly real.
Conclusion
In the present study we have used the HCSSM in cDNA microarray control experiments to evaluate blocking conditions, filtration and background correction. We found that the choice and dose of poly(dT)-poly(dA) blocking agents during hybridization influenced the estimated number of differentially expressed genes. When HCSSM was used to evaluate steps in the computational analysis of microarray data, we found that no background correction and weight based filtering gave the largest estimated numbers of differentially expressed genes. The background correction method, however, had a stronger impact on the number of differentially expressed genes found than the filtering method. The results show that HCSSM may be a useful and simple approach to optimize cDNA microarray procedures without including external standards.
Methods
Cells
AR42J cells (rat pancreatic acinar cell derived, ATCC, Rockville, MD) were grown in Dulbecco's modified Eagle's medium (DMEM) with 4.5 g/l glucose (Invitrogen, Carlsbad, CA), 15 % fetal calf serum (FCS; Euroclone Ltd, Devon, UK), 1 mM sodium pyruvate, 0.1 mg/ml L-glutamine (Invitrogen, Carlsbad, CA), 10 U/ml penicillin/streptomycin (Invitrogen, Carlsbad, CA), and 1 μg/ml fungizone (Invitrogen, Carlsbad, CA). NRK52E cells (rat kidney epithelial cells, ATCC, Rockville, MD) were grown in DMEM with 4.5 g/l glucose (Invitrogen, Carlsbad, CA) supplemented with 5 % FCS (Euroclone Ltd), 0.1 mg/ml L-glutamine (Invitrogen, Carlsbad, CA) and 10 U/ml penicillin/streptomycin (Invitrogen, Carlsbad, CA). All cells were grown at 37°C in a humidified atmosphere at 5 % CO2.
Treatment of cells and isolation of RNA
AR42J and NRK52E cells were cultured in 75 cm2 culture flasks for 72 h until confluence was reached. Total RNA was isolated using the RNeasy® Midi kit (Qiagen, Germantown, MD) according to the manufacturer's instructions. RNA was quantified and assessed for purity by measurement of OD260 and OD280 using a UV fiberoptic spectrophotometer (Nanodrop Technologies, Rockland, DE) and was qualitatively assessed by measurement of relative 28S and 18S ribosomal band intensities using a Bioanalyzer and RNA LabChip capillary gel electrophoresis assay (Agilent Technologies, Palo Alto, CA). Aliquots were kept frozen at -80°C until further processing.
cDNA Microarray procedure
cDNA microarrays were manufactured by the Norwegian Microarray Consortium [39] using 15000 cDNA rat probes from Research Genetics (IMAGE collection) printed in duplicate on Corning CMT Gaps II slides (Corning Inc., NY). The probes were dissolved in 50 % DMSO to ensure high printing quality. The microarray slides were UV cross linked at 300 mJ in order to fix the DNA to the glass slides.
Labelling was performed by reverse transcription of total RNA from the cell lines (3 μg each) in the presence of primers containing the capture sequence for subsequent hybridization of Cy3- and Cy5-labelled dendrimers, using the Genisphere 3DNA Array 350 Expression Array Detection kit (Genisphere, Hatfield, PA) as described in the manufacturer's protocol. Different combinations of blocking reagents were added to the hybridization mixture: Mouse COT-1 DNA (Life Technologies) was added to all mixtures (0.1 μg/μg RNA) in order to block hybridization of repetitive elements. Different concentrations (0.1–4.0 μl = 25–1000 ng) of LNA dT Blocker (Genisphere, Hatfield, PA) and 1 μl = 1000 ng Poly(dA)40–60 (Stratagene Spotreporter; La Jolla, CA) were used as illustrated in Figure 2.
Hybridization was done in a humidified hybridization chamber (Corning Inc., NY) at 60°C for 14–15 h in a total hybridization volume of 60 μl. Post hybridization washes were done for 15 min at 55°C with 2× SSC and 0.2 % SDS, for 10 min at room temperature with 2× SSC, and finally for 10 min at room temperature with 0.2× SSC. After the washing, the Cy3- and Cy5-labelled dendrimers were hybridized to the capture sequence on the reverse transcribed sample RNA at 60°C for 3 h as described in the manufacturer's protocol. After dendrimer hybridization, the slides were washed as described above and dried by centrifugation at 1500 rpm for 5 min.
Scanning and image analysis of cDNA microarrays
The slides were scanned at a resolution of 10 μm using a Packard Bioscience Scanarray Express HT scanner (Packard BioScience, Billerica, MA). A laser power of 100 % was used, and excitation of Cy3 and Cy5 was performed at wavelengths of 532 nm and 635 nm, respectively. Signals were detected by use of photomultiplier tubes (PMTs) with two channels. In order to maximize the certainty of the weakest spots, the PMT voltage was adjusted to keep the background intensity between 200 and 400 (mean spot intensity) in each channel as suggested in [40]. The GenePix 5.0 image analysis software (Axon Instruments, Inc., Union City, CA) was used for spot segmentation and intensity calculations. Spots and regions with high unspecific binding of dye or dust particles were manually flagged and excluded from the analysis.
Data analysis of cDNA microarrays
Modified T-scores and false discovery rate (FDR) values were calculated as described under Applications in the Background section for all combinations of filter, background corrections, and blocking treatments.
Filtering methods
A progression of increasingly complex filter functions was evaluated.
Coarse filter: Spots flagged as bad or missing during the image analysis are removed.
Medium filter: Also removes spots with intensities below 200 and spots with more than 70 % saturation in both channels.
Fine filter: Also removes spots that are small, i.e. spots with diameters less than 20.
Uncertain filter: Removes spots flagged by GenePix 5.0 as "bad" or "missing" and spots with deltaM > 3.
Weight based filter: Quality weighting is obtained by using Equation (3) as described in the Background section. The delta term in that equation is a small number added to stabilize spots with very small uncertainties.
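The nested structure of the first three filters, and the separate "uncertain" filter, can be illustrated with a minimal sketch. The `Spot` record and its field names are hypothetical and do not correspond to GenePix column names. The thresholds are those stated above. Treating "intensities below 200" as applying to either channel is an assumption, since the text does not specify the channel. The weight based filter is omitted because it depends on Equation (3), which is given in the Background section.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """Hypothetical per-spot record; field names are illustrative only."""
    flagged_bad: bool   # flagged "bad" or "missing" during image analysis
    red: float          # Cy5 foreground intensity
    green: float        # Cy3 foreground intensity
    sat_red: float      # fraction of saturated pixels, Cy5 channel
    sat_green: float    # fraction of saturated pixels, Cy3 channel
    diameter: float     # spot diameter as reported by the image analysis
    delta_m: float      # uncertainty of the log ratio (deltaM)


def coarse(s: Spot) -> bool:
    """Keep spots that were not flagged as bad or missing."""
    return not s.flagged_bad


def medium(s: Spot) -> bool:
    """Coarse filter plus intensity and saturation criteria."""
    too_weak = s.red < 200 or s.green < 200               # assumption: either channel
    saturated = s.sat_red > 0.70 and s.sat_green > 0.70   # both channels
    return coarse(s) and not too_weak and not saturated


def fine(s: Spot) -> bool:
    """Medium filter plus removal of small spots (diameter < 20)."""
    return medium(s) and s.diameter >= 20


def uncertain(s: Spot) -> bool:
    """Remove flagged spots and spots with deltaM > 3."""
    return coarse(s) and s.delta_m <= 3
```

A list of spots would then be filtered with, for example, `[s for s in spots if fine(s)]`.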
Background correction methods
Three different background correction strategies were applied: None, where only the foreground signal is used to calculate ratios; Edwards, which is a subtraction method that ensures positive values [21]; and dampened Edwards, where a small number (e.g. 20) is added to the corrected signal to avoid extreme ratios in those cases where the background corrected intensity would be very low.
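The three strategies can be compared on a single channel with the sketch below. The text only states that the Edwards method is a subtraction that ensures positive values; the log-linear form used here for low signals, and the threshold `delta`, are assumptions based on a commonly used formulation of the method and may differ from the exact implementation used in this study. The offset of 20 is the example value given above.

```python
import math


def correct_none(fg: float, bg: float) -> float:
    """No correction: the foreground signal is used as is."""
    return fg


def correct_edwards(fg: float, bg: float, delta: float = 1.0) -> float:
    """Subtract background, switching to a smooth positive curve when
    the difference falls below `delta` (assumed formulation).
    Requires fg > 0."""
    diff = fg - bg
    if diff >= delta:
        return diff
    # Positive, monotone in fg, and equal to delta where fg - bg == delta.
    return delta * math.exp(1.0 - (bg + delta) / fg)


def correct_dampened_edwards(fg: float, bg: float,
                             delta: float = 1.0,
                             offset: float = 20.0) -> float:
    """Edwards correction plus a small constant to avoid extreme ratios."""
    return correct_edwards(fg, bg, delta) + offset
```

For a weak spot with foreground 100 and background 150, plain subtraction would give a negative value, whereas `correct_edwards` returns a small positive number and `correct_dampened_edwards` returns a value slightly above 20, which keeps the resulting log ratios moderate.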
Statistical analysis
After filtering and background correction, all arrays were normalized using loess to remove intensity dependent variations in the ratios [41]. Differentially expressed genes were found at the 5 % FDR level using the methodology described in the Background section.
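As an illustration of how a self-self null distribution can be used to count differentially expressed genes at a given FDR level, consider the following minimal sketch. It is not the exact procedure of the Background section: it assumes that the FDR at a threshold is estimated as the expected number of null scores exceeding the threshold, scaled to the number of tested genes, divided by the number of genes called. The function names are hypothetical.

```python
def estimate_fdr(test_scores, null_scores, threshold):
    """Return (estimated FDR, number of genes called) at |T| >= threshold.

    test_scores: T-scores from the high-contrast comparison.
    null_scores: T-scores from the self-self comparison (null distribution).
    """
    called = sum(1 for t in test_scores if abs(t) >= threshold)
    if called == 0:
        return 0.0, 0
    null_fraction = (
        sum(1 for t in null_scores if abs(t) >= threshold) / len(null_scores)
    )
    expected_false = null_fraction * len(test_scores)
    return min(1.0, expected_false / called), called


def count_de_genes(test_scores, null_scores, fdr_level=0.05):
    """Largest number of genes that can be called with FDR <= fdr_level."""
    best = 0
    for threshold in sorted({abs(t) for t in test_scores}):
        fdr, called = estimate_fdr(test_scores, null_scores, threshold)
        if fdr <= fdr_level:
            best = max(best, called)
    return best
```

Under this scheme, a processing pipeline that yields a larger value of `count_de_genes` at the same FDR level would be preferred, which is the optimization criterion used throughout this work.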
Cross-platform microarray analysis
The Illumina Gene Expression system was used for cross-platform analysis of differentially expressed genes in AR42J cells versus NRK52E cells. RNA amplifications and hybridization were performed at the Finnish DNA Microarray Centre at Turku Centre of Biotechnology [42]. Briefly, aliquots from the same RNA as in the cDNA microarray experiments were amplified with Ambion's Illumina® TotalPrep RNA Amplification kit (cat no AMIL1791) using 400 ng of total RNA as input material. The in vitro transcription (IVT) amplification that incorporated biotin-labelled nucleotides was performed overnight (14 h) at 37°C. After the amplifications, the cRNA concentrations were checked with a NanoDrop ND-1000 and cRNA quality was controlled with BioRad's Experion electrophoresis station.
A total of 750 ng of each biotin-labelled cRNA sample was hybridized to Illumina's Sentrix® RatRef-12-v1 Expression BeadChips at 58°C overnight (17 h) according to the manufacturer's instructions. The hybridized biotinylated cRNA was detected with 1 μg/ml Cyanine3-streptavidin (GE Healthcare Biosciences; cat no PA43001) and the BeadChips were scanned with an Illumina BeadArray Reader (Factor = 1, PMT = 521, Filter = 100 %). Numerical results were extracted with BeadStudio v3.0.19.0 without any normalization or background subtraction.
The Illumina data were analysed using a modified T-test [14] with p-values adjusted for false discovery rate [43], and genes with adjusted p-values < 0.05 were taken as significant. Entrez Gene IDs were used as common identifiers for the microarray platforms. The genes were identified according to UniGene build #163 in GeneTools [29,30].
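The adjustment of p-values for false discovery rate can be sketched as follows. The sketch assumes that the adjustment is the Benjamini-Hochberg step-up procedure, which is the most common choice but is not stated explicitly in the text. The function name `bh_adjust` is hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, level=0.05):
    """Indices of genes with adjusted p-value below `level`."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q < level]
```

Genes returned by `significant` on the Illumina platform could then be matched to the cDNA gene list through their shared Entrez Gene IDs.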
Database submission of microarray data
The microarray data were prepared according to the minimum information about a microarray experiment (MIAME) recommendations [44] and deposited in ArrayExpress [45]. Detailed information about the microarray designs (platforms) and raw data files from the experiments are accessible in ArrayExpress under the following accession numbers: A-MEXP-358 (cDNA platform), E-TABM-202 (cDNA experiment) and E-TABM-307 (Illumina experiment).
Authors' contributions
TB and EA contributed equally to this work, conceived the study and wrote the manuscript. BDE, HB and VB contributed with ideas on the microarray experiments and revised the manuscript. BDE and TB performed the cDNA microarray experiments and cDNA image analysis. TB performed and analysed the qRT-PCR assays. EA performed the data analysis and tested and improved the statistics. HB produced the spotted cDNA microarrays and performed the database submission of the cDNA microarray data. VB investigated the level of concordance of biological themes represented in different gene lists, and performed the database submission of the data from the Illumina Gene Expression system. AL supervised the study and revised the manuscript. All authors have read and approved the final manuscript.
Additional file 1
Table S1. Number of differentially expressed genes under different blocking, background correction and levels of filtration. The file shows all combinations of blocking, background correction and levels of filtration together with the estimated numbers of differentially expressed genes with the different combinations.
Additional file 2
Table S2. Level of concordance of biological themes represented in the gene lists generated with different background correction methods. The file shows all GO terms at hierarchy level 1–3 which were associated with five or more genes (n = 64).
Additional file 3
Figure S1. Validation of differentially expressed genes by SYBR green-based quantitative real-time PCR. The figure shows fold changes for a few selected genes on the microarray platforms and the correspondence with relative gene expression data from qRT-PCR assays.
Additional file 4
Cross platform microarray analysis (Table S3a) and SYBR green-based quantitative real-time PCR (Table S3b). The file shows log2 ratio differences and fold ratios for a few selected genes on the microarray platforms (Table S3a) compared to ΔΔCt values and calculated fold change values from validation by qRT-PCR (Table S3b).
Additional file 5
PCR primers (Table S4) and qRT-PCR protocol. The file gives information about the PCR primers and the qRT-PCR protocol used in the study.