Literature DB >> 16504060

Tests for differential gene expression using weights in oligonucleotide microarray experiments.

Pingzhao Hu¹, Joseph Beyene, Celia M T Greenwood.

Abstract

BACKGROUND: Microarray data analysts commonly filter out genes based on a number of ad hoc criteria prior to any high-level statistical analysis. Such ad hoc approaches could lead to conflicting conclusions with no clear guidance as to which method is most likely to be reproducible. Furthermore, the number of tests performed with concomitant inflation in type I error also plagues the statistical analysis of microarray data, since the number of tested quantities in a study significantly affects the family-wise error rate. It would, therefore, be very useful to develop and adopt strategies that allow quantification of the quality of each probeset, to filter out or give little credence to low-quality or unexpressed probesets, and to incorporate these strategies into gene selection within a multiple testing framework.
RESULTS: We have proposed a unified scheme for filtering and gene selection. For Affymetrix gene expression microarrays, we developed new methods for measuring the reliability of a particular probeset in a single array, and we used these to develop measures for a set of arrays. These measures are then used as weights in standard t-statistic calculations, and are incorporated into the multiple testing procedures. We demonstrated the advantages of our methods using simulated data, publicly available spiked-in data as well as data comparing normal muscle to muscle from patients with Duchenne muscular dystrophy (DMD), in which a set of truly differentially expressed genes is known.
CONCLUSION: Our quality measures provide convenient ways to search for individual genes of high quality. The quality weighting strategies we proposed for testing differential gene expression have demonstrable improvement on the traditional filtering methods, the standard t-statistic and a regularized t-statistic in Affymetrix data analysis.

Entities: Chemical Disease Gene Mutation Species

Mesh：

Year: 2006 PMID： 16504060 PMCID： PMC1420292 DOI： 10.1186/1471-2164-7-33

Source DB: PubMed Journal: BMC Genomics ISSN： 1471-2164 Impact factor: 3.969

Background

Affymetrix GeneChip™ microarrays are used to measure gene expression for thousands of transcripts simultaneously. Each transcript is measured by 11–20 probesets, where a probeset consists of two almost identical sequences of length 25 bp. One member of the pair is the perfect match (PM) probe, where the sequence is the exact complement of a section of the mRNA. The other member of the pair, the mismatch probe (MM), is identical to the PM probe except for the nucleotide in 13th position and is intended to measure and control for non-specific binding signals. The accuracy and sensitivity of the measurement for any one gene or EST depends on the uniqueness and the binding properties of the probes. It is well known that most probesets perform consistently and reliably, in that similar estimates of expression are obtained from two replicates of an experiment. However, in many cases, the signals from a probeset can be hard to interpret. There may be substantial variability across the probesets in the estimated level of expression, or in the PM-MM differences. PM signals can be smaller than MM signals suggesting high levels of non-specific binding. It is also well known that a large proportion of the genes are expressed in only a few tissues or at a particular developmental stage, and hence many of the genes are not expected to have a measurable transcribed product. Su et al. [1] have estimated that in most cases, only 30–40% of the genome will be expressed in any one tissue. In such situations, the probesets give measurements for PM and MM that fluctuate near the lower detectable limit. The latest release for the Affymetrix GeneChip Expression Analysis Platform, GeneChip Human Genome U133 Plus 2.0 Array [2], provides comprehensive coverage of the transcribed human genome on a single array. The array includes more than 54,000 probesets with 38,500 well-characterized human genes. When analyzing such a large number of genes, the adjustment of significance levels through multiple testing procedures such as the Bonferroni method [3], or Benjamini and Hochberg [4], may be dramatic enough to make it very difficult to identify differentially expressed genes. However, if we knew which transcripts were either not expressed in the tissue under study, or were measured unreliably due to poor probe specificity, we could exclude these transcripts from the analysis and pay a smaller penalty for multiple testing. In 2001, Affymetrix produced a new algorithm for summarizing the results of a probeset, and the algorithm includes a detection p-value, which represents the probability that the probeset (gene) expression is above zero (i.e., turned on), and measured reliably and consistently across the probe. In addition, "Present/Marginal/Absent" (P/M/A) calls for each transcript are based on the detection p-value together with thresholds that can be altered by the user [5]. The calls are often used as filters to keep genes whose transcripts are detectable in a particular experiment [6-8], and this filtering concept has been integrated into some software such as dChip [9,10]. For example, Blalock, et al. [7] removed a probeset from further statistical analysis if there were > = 60% absent calls, for that probeset, in at least one treatment group. However, the stringency of the filtering procedure can strongly affect downstream analyses and the final results. Too much filtering may exclude some important genes, whereas too little filtering will reduce power by increasing the number of tests performed. Moreover, it is possible that the statistical error introduced by imperfect filtering criteria makes the overall error worse. Pounds and Cheng [11] recently argued that it is better to define gene filters based on the detection p-value than on the calls, so that more of the available information about probeset reliability can be used. They developed two pooled filters for each probeset based on detection p-values. An alternative strategy to filtering is to treat probe-specific variability as a quality index to measure the reliability of each probeset. Although many summaries of Affymetrix GeneChip expression have been proposed (e.g. MAS5 [5], dChip [9,10], RMA [12], PLIER [13] and gcRMA [14]), there have been few studies on quality measures for Affymetrix GeneChip expression data. In previous studies, typically, once a filtering decision was made, the estimated intensities of each probeset were considered to be equally well measured, so that probesets with highly variable signals were considered as reliable as probesets with inconsistent signals. However, probesets that detect transcripts expressed at a very high level would be expected to show a more robust signal with greater quality than probesets that are performing poorly or detecting very low level transcripts. Seo et al. [15] used the detection p-values to develop weighted Pearson correlations between expression measurements from two different arrays. The weight was defined to be a function of the two detection p-values for each probeset. They incorporated these weighted correlations into unsupervised clustering analysis, with the goal of choosing the best algorithm for summarizing across the probesets. In this study, we focus on the problem of testing for differential expression between groups of arrays, while adjusting for multiple testing in a weighted framework. We redefine some of the widely used filtering methods [6-9] and also propose some new methods for measuring the quality of a particular probeset in a single array, in an experimental group of arrays, and in an entire study. These measures are then used as weights in t-statistic calculations, and are incorporated into the multiple testing procedures. We applied these methods to a dataset of expression in muscle tissue [16], Choe's spiked-in data [17] and simulated data. One of our quality measures for an experimental group is based on the measure developed very recently by Pounds and Cheng [11]; however, we go further in combining measures across groups, and in discussing the usefulness of these measures when testing for differential gene expression.

Results

Test statistics and quality measures

We propose a weighting paradigm for including quality measures into analysis when testing for differential expression. Suppose that an unweighted test statistic for gene g is represented by t, and that the quality measure is called Q. We propose to evaluate the significance of gene g using the weighted test statistic = tQ. The impact of the weighting on the pattern of results, across all genes, is then taken into account when calculating significance adjusted for multiple testing, or when estimating the false discovery rate. Conceptually, giving a low weight, Q≈0, to a particular gene can be thought of as excluding that gene from consideration. Therefore, our modifications to multiple-testing methods are based on adjusting the number of genes tested, based on the quality measures. Measure Qmust represent a summary across treatment groups w ∈ {1 ...W}. We propose where is a treatment-group specific measure of quality. By using the maximum value across groups in equation (1), a gene that is clearly expressed in one or more groups will be considered as a gene measured with high quality. We examine the performance of several choices for the group-specific quality measure . In particular, = 1.0 leads, after using equation (1), to a measure that is, in fact, no quality filtering: all genes are included in the analysis at full weight. The common practice of analyzing only genes with present calls can be based on { = 1.0 when all arrays in group w have present calls, and = 0 otherwise}. We will denote this as . Instead of using the Affymetrix present/absent calls, more sensitivity can be gained by basing on the actual detection p-values, qfor gene g, array i, and group w. Since a highly-expressed gene corresponds to a small detection p-value, we propose = 1 - , where is the mean detection p-value for group w, leading to a quality measure that we call . The mean may not give enough weight to small p-values, therefore we also investigated two measures that are based on parametric model assumptions for the distribution of detection p-values. Firstly, we assumed that -log(q) follow an exponential distribution with group-specific mean λ. Under this distribution, the probability that the detection p-values will be smaller than a threshold ν can be written as , with corresponding following from equation (1). Although, under the null hypothesis of no signal, the detection p-values can be expected to follow a uniform distribution (so that -log(q) is exponential with mean 1.0), when there is a signal, the p-value distribution is better described by the two-parameter Beta distribution. However, detection p-values are based on rank tests and there is often little variability in the p-values across arrays for highly expressed genes. This leads to difficulties in estimating two parameters. Pounds and Cheng [11] proposed a one-parameter Beta distribution for detection p-values, so that , and we incorporate this measure into our corresponding across-group quality measure . For comparison, two very different test statistics were also used: (1) Weights based on the detection p-values were incorporated directly in the calculation of group-specific means and standard deviations – we refer to this test statistic as to follow our naming convention, although no quality measure is defined; (2) In addition, we used the local pooled error (LPE) test [18], which is based on pooling errors within genes and between replicate arrays for genes in which expression values are similar. This test is referred to as . Unweighted multiple testing corrections were implemented for these two tests. Table 1 summarizes the different test statistics and quality measures used in the paper.

Table 1

Test statistics and quality measures. For the quality-based tests, the summarized quality measure, across treatment groups, is defined by Q= max.

Quality Measures	Array specific measure	Group-specific measure R g w MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGsbGudaqhaaWcbaGaem4zaCgabaGaem4DaChaaaaa@30D4@	Notes
Q g 0 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaeGimaadaaaaa@3049@	1.0	1.0	No filtering
Q g c a l l MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaem4yamMaemyyaeMaemiBaWMaemiBaWgaaaaa@34B7@	q g i w * = { 1 i f c a l l i s P 0 o t h e r w i s e MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGXbqCdaqhaaWcbaGaem4zaCMaemyAaKMaem4DaChabaGaeiOkaOcaaOGaeyypa0ZaaiqaaeaafaqabeGacaaabaGaeGymaedabaGaemyAaKMaemOzayMaaGPaVlabdogaJjabdggaHjabdYgaSjabdYgaSjaaykW7cqWGPbqAcqWGZbWCcaaMc8UaemiuaafabaGaeGimaadabaGaem4Ba8MaemiDaqNaemiAaGMaemyzauMaemOCaiNaem4DaCNaemyAaKMaem4CamNaemyzaugaaaGaay5Eaaaaaa@5486@	min ⁡ i ∈ { 1 … I } q g i w * MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacyGGTbqBcqGGPbqAcqGGUbGBdaWgaaWcbaGaemyAaKMaeyicI4Saei4EaSNaeGymaeJaeSOjGSKaemysaKKaeiyFa0habeaakiabdghaXnaaDaaaleaacqWGNbWzcqWGPbqAcqWG3bWDaeaacqGGQaGkaaaaaa@40AD@	All arrays must have present call to be included
Q g m e a n p MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemyBa0MaemyzauMaemyyaeMaemOBa4MaemiCaahaaaaa@362A@	q_giw= detection p-value	1 − q ¯ g w = 1 − ∑ i ∈ w ( q g i w ) / n w MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqaIXaqmcqGHsislcuWGXbqCgaqeamaaBaaaleaacqWGNbWzcqWG3bWDaeqaaOGaeyypa0JaeGymaeJaeyOeI0YaaabuaeaacqGGOaakcqWGXbqCdaWgaaWcbaGaem4zaCMaemyAaKMaem4DaChabeaakiabcMcaPiabc+caViabd6gaUnaaBaaaleaacqWG3bWDaeqaaaqaaiabdMgaPjabgIGiolabdEha3bqab0GaeyyeIuoaaaa@47CB@	n_wis the number of arrays in group w
Q g exp ⁡ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGagiyzauMaeiiEaGNaeiiCaahaaaaa@338F@	q_giw= detection p-value	exp⁡⌊λ^gwlog⁡ν⌋ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacyGGLbqzcqGG4baEcqGGWbaCdaGbdaqaaGGaciqb=T7aSzaajaWaaSbaaSqaaiabdEgaNjabdEha3bqabaGccyGGSbaBcqGGVbWBcqGGNbWzcqWF9oGBaiaawcp+caGL7Jpaaaa@4098@ for sensitivity parameter ν	λ ^ g w = − n w / ∑ i ∈ w log ⁡ ( q g i w ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuaH7oaBgaqcamaaBaaaleaacqWGNbWzcqWG3bWDaeqaaOGaeyypa0JaeyOeI0IaemOBa42aaSbaaSqaaiabdEha3bqabaGccqGGVaWldaaeqbqaaiGbcYgaSjabc+gaVjabcEgaNbWcbaGaemyAaKMaeyicI4Saem4DaChabeqdcqGHris5aOGaeiikaGIaemyCae3aaSbaaSqaaiabdEgaNjabdMgaPjabdEha3bqabaGccqGGPaqkaaa@497C@
Q g b e t a MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemOyaiMaemyzauMaemiDaqNaemyyaegaaaaa@34B7@	q_giw= detection p-value	ν a ^ g w MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF9oGBdaahaaWcbeqaaiqbdggaHzaajaWaaSbaaWqaaiabdEgaNjabdEha3bqabaaaaaaa@32EE@	a ^ g w = q ¯ g w / ( 1 − q ¯ g w ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGHbqygaqcamaaBaaaleaacqWGNbWzcqWG3bWDaeqaaOGaeyypa0JafmyCaeNbaebadaWgaaWcbaGaem4zaCMaem4DaChabeaakiabc+caViabcIcaOiabigdaXiabgkHiTiqbdghaXzaaraWaaSbaaSqaaiabdEgaNjabdEha3bqabaGccqGGPaqkaaa@3F94@
Q g w e MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaem4DaCNaemyzaugaaaaa@3225@	q_giw= detection p-value	N/A	Weighted means x¯gw MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWG4baEgaqeamaaBaaaleaacqWGNbWzcqWG3bWDaeqaaaaa@3137@ and weighted standard deviations S_gw
Q g L P E MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemitaWKaemiuaaLaemyraueaaaaa@32B8@	N/A	N/A	Local pooled error test [18]

Analysis of simulated data

The performance of our proposed methods was evaluated using simulated probe level data generated from a model incorporating probe level effects, optical noise, and non-specific binding, as well as true signals [14]. We have run five simulation models following the simulation procedures described in Materials and Methods. Treatment effects on the signal were varied between 0.5 (very small) and 2.5 (very large) in the five models. To compare performance, we used Summarized Receiver Operating Characteristic (SROC) curves, where the test sensitivities and specificities (true positive and true negative proportions) for a range of p-value cutoffs (or FDR cutoffs for results with multiple testing adjustments) were averaged over 100 simulated datasets. Figures 1 and 2 show SROC curves of models 2 and 4, where large and small treatment effect sizes, respectively, were chosen in the generated models.

Figure 1

Receiver Operator Characteristic (ROC) plots for tests of differential expression in the simulated data with treatment effect δg = 2.0. Six weighted tests and the local pooled error (LPE) test are compared. (a) p-values unadjusted for multiple testing, (b) p-values adjusted by the weighted Benjamini and Hochberg (WBH) multiple testing method.

Figure 2

Receiver Operator Characteristic (ROC) plots for tests of differential expression in the simulated data with treatment effect δg = 1.0. (a) p-values unadjusted for multiple testing, (b) p-values adjusted by the weighted Benjamini and Hochberg (WBH) multiple testing method.

The SROC curve's overall behavior can be measured by the Area Under the Curve (AUC) [19]. Table 2 shows AUCs for five simulation models under different weighting and multiple testing strategies. As seen from the table and figures, weighted t-statistics based on quality measure , in which a gene was excluded if one or more arrays have an absent call, had the lowest AUC in all models. Whether p-values were unadjusted or adjusted by the weighted Benjamini-Hochberg (WBH) method, outperformed the other quality scores in models 1, 2, and 3, where a moderate to large treatment effect was chosen. Although had the overall best performance for models 4 and 5, where a small treatment effect was used, has the best sensitivity when specificity is larger than 95% (shown in Figure 2). We also observed that had similar performance to for models 1, 2 and 3, but outperformed for models 4 and 5. has slightly worse performance than for models 1,2 and 3, but their performance is almost the same for models 4 and 5; conversely, the statistic performed better, relative to the other methods, for small effect sizes. The performance of no filtering at all, , was also a good choice for small changes in gene expression. Therefore, the best choice of weighting statistic in the presence of adjustments for multiple testing appears to depend on the size of the treatment effects. We also found that it is better to define quality measures directly from Affymetrix detection p-values rather than from the Affymetrix Present/Marginal/Absent calls.

Table 2

Area under the curves (AUCs) for five simulation models *

Model	Multiple Testing	Quality Measures

		Q g 0 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaeGimaadaaaaa@3049@	Q g c a l l MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaem4yamMaemyyaeMaemiBaWMaemiBaWgaaaaa@34B7@	Q g exp ⁡ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGagiyzauMaeiiEaGNaeiiCaahaaaaa@338F@	Q g b e t a MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemOyaiMaemyzauMaemiDaqNaemyyaegaaaaa@34B7@	Q g m e a n p MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemyBa0MaemyzauMaemyyaeMaemOBa4MaemiCaahaaaaa@362A@	Q g w e MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaem4DaCNaemyzaugaaaaa@3225@	Q g L P E MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemitaWKaemiuaaLaemyraueaaaaa@32B8@
Model 1 δ_g= 2.5	Unadjusted	0.777	0.380	0.925	0.892	0.871	0.743	0.924
	WBH	0.812	0.382	0.929	0.895	0.885	0.772	0.932
Model 2 δ_g= 2.0	Unadjusted	0.840	0.387	0.934	0.907	0.898	0.809	0.932
	WBH	0.878	0.387	0.937	0.912	0.911	0.846	0.937
Model 3 δ_g= 1.5	Unadjusted	0.895	0.394	0.933	0.918	0.922	0.868	0.932
	WBH	0.925	0.395	0.937	0.923	0.932	0.902	0.936
Model 4 δ_g= 1.0	Unadjusted	0.918	0.399	0.921	0.913	0.926	0.889	0.915
	WBH	0.935	0.401	0.917	0.918	0.934	0.912	0.891
Model 5 δ_g= 0.5	Unadjusted	0.876	0.408	0.875	0.876	0.884	0.850	0.850
	WBH	0.882	0.410	0.858	0.861	0.882	0.860	0.770

For and , the sensitivity parameter ν was set to 0.05.

* Simulated probe level data was summarized using RMA

Duchenne muscular dystrophy vs. normal muscle

Haslett et al. [16] compared gene expression between 12 quadriceps biopsies from Duchenne muscular dystrophy (DMD) patients and 12 quadriceps biopsies of normal skeletal muscle. Using a method called geometric fold change (GFC), they identified 133 differentially expressed genes (139 probesets) with a permutation-based false discovery rate of 2.3 × 10-3; Of these, 12 genes (13 probesets) were confirmed by RT-PCR. This set will be referred to as the "RT-PCR" probesets. Tables 3 and 4 compare the agreement between probesets selected by our test statistics and Haslett et al. [16]. In Table 3, for each test statistic, we selected the top T = 30, 50, 100 or 139 significantly expressed probesets, ranked based on our adjusted false discovery rates, and we counted the number of these probesets that were also in the top T probesets identified by GFC and ranked based on their absolute fold changes. Table 4 shows a similar comparison for the RT-PCR set.

Table 3

Agreement in probeset selections between our methods and Haslett et al. [16]: Given a chosen number of selected probesets, how many of the probesets selected by GFC were also selected by the methods in this paper (corresponding false discovery rate WBH-FDR)

Summarization Method	Number of probesets selected	Q g 0 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaeGimaadaaaaa@3049@	Q g c a l l MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaem4yamMaemyyaeMaemiBaWMaemiBaWgaaaaa@34B7@	Q g exp ⁡ e MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGagiyzauMaeiiEaGNaeiiCaa3aaSbaaWqaaiabbwgaLbqabaaaaaaa@350D@	Q g b e t a e MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemOyaiMaemyzauMaemiDaqNaemyyae2aaSbaaWqaaiabbwgaLbqabaaaaaaa@3635@	Q g m e a n p MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemyBa0MaemyzauMaemyyaeMaemOBa4MaemiCaahaaaaa@362A@	Q g w e MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaem4DaCNaemyzaugaaaaa@3225@	Q g L P E MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemitaWKaemiuaaLaemyraueaaaaa@32B8@
MAS5	30^a	10^b(3.6e-06^c)	6 (7.6e-06)	10 (6.2e-04)	10 (4.9e-06)	10 (3.6e-06)	11 (3.6e-06)	17 (2.3e-21)
	50	21 (1.4e-05)	13 (1.6e-05)	25 (2.3e-03)	22 (1.0e-05)	21 (1.2e-05)	22 (7.8e-06)	32 (6.0e-13)
	100	56 (3.5e-05)	40 (1.5e-04)	59 (4.1e-03)	57 (2.2e-05)	56 (2.6e-05)	56 (1.9e-05)	61 (3.3e-09)
	139 (2.3e-03)	94 (1.1e-04)	65 (3.3e-04)	96 (9.2e-03)	95 (8.4e-05)	94 (9.3e-05)	94 (6.1e-05)	76 (4.1e-07)
RMA	30	8^d(1.3e-05)	7 (2.5e-05)	9 (1.9e-03)	8 (7.5e-06)	8 (9.7e-06)	8 (5.6e-06)	20 (1.8e-18)
	50	24 (4.3e-05)	13 (5.0e-05)	26 (3.4e-03)	24 (2.8e-05)	24 (3.2e-05)	24 (2.0e-05)	30 (1.8e-11)
	100	60 (1.4e-04)	42 (3.3e-04)	63 (1.1e-02)	60 (9.8e-05)	60 (1.3e-04)	59 (8.0e-05)	62 (3.5e-07)
	139	85 (2.5e-04)	69 (7.4e-04)	92 (1.9e-02)	92 (2.0e-04)	89 (2.2e-04)	85 (1.5e-04)	77 (2.2e-05)

a The top 30 probesets were selected by each method.

b 10 probesets overlapped between the top 30 probesets selected by GFC and the top 30 probesets selected by , when data were summarized using MAS5.

c The WBH-FDR for the top 30 probesets selected by was 3.6e-06.

d 8 probesets overlapped between the top 30 probesets selected by GFC (data normalized by MAS5) and the top 30 probesets selected by (data normalized by RMA).

e Sensitivity parameter ν = 0.05.

Table 4

Comparison of our methods and Haslett et al. [16]. Identification of differentially-expressed probesets validated by RT-PCR: given a chosen number of probesets selected by GFC, and the number of probesets validated by RT-PCR within this set, how many of the RT-PCR probesets were selected by the methods in this paper.

Method	# of probesets selected: # of GFC selections validated by RT-PCR	Q g 0 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaeGimaadaaaaa@3049@	Q g c a l l MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaem4yamMaemyyaeMaemiBaWMaemiBaWgaaaaa@34B7@	Q g exp ⁡ e MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGagiyzauMaeiiEaGNaeiiCaa3aaSbaaWqaaiabbwgaLbqabaaaaaaa@350D@	Q g b e t a e MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemOyaiMaemyzauMaemiDaqNaemyyae2aaSbaaWqaaiabbwgaLbqabaaaaaaa@3635@	Q g m e a n p MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemyBa0MaemyzauMaemyyaeMaemOBa4MaemiCaahaaaaa@362A@	Q g w e MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaem4DaCNaemyzaugaaaaa@3225@	Q g L P E MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemitaWKaemiuaaLaemyraueaaaaa@32B8@
MAS5	30^a: 8^b	5^c	1	5	5	5	5	4
	50 : 11	8	2	7	7	8	7	6
	100 : 12	11	3	10	10	11	10	11
	139 : 13	12	4	11	11	12	12	12
RMA	30 : N/A	2^d	1	4	2	2	2	6
	50 : N/A	6	2	7	6	6	6	7
	100 : N/A	9	4	9	9	9	8	8
	139 : N/A	10	4	11	10	10	10	11

The top 30 probesets were selected by each method.

b 8 probesets in the top 30 were validated by RT-PCR in Haslett et al. [16].

c 5 RT-PCR probesets were among the top 30 probesets selected by . The 5 RT-PCR probesets are among the 8 RT-PCR probesets selected by GFC.

d 2 RT-PCR probesets were among the top 30 probesets selected by (data normalized by RMA). The 2 RT-PCR probesets are among the 8 RT-PCR probesets selected by GFC, using the MAS5 normalization.

e Sensitivity parameter ν = 0.05.

The measure had the worst performance in concordance for the lists of top T probesets in Table 3, and for identifying RT-PCR probesets in Table 4. The weighted t-statistics based on and performed quite similarly, and gave results that were close also to three other statistics (), but showed slightly better results in Table 3 than these other methods. Interestingly, the FDR values in Table 3 associated with are visibly larger than for the other methods. Although there is more agreement between GFC and in Table 3 than for other methods when the number of selected probesets is small (< = 100), the agreement decreases when more probesets are selected, even though the FDR estimates are smaller. This statistic is obviously very different from the others. When examining Table 3 for T = 30, agreement for all methods with the GFC paper was slightly better using MAS5. This may not be surprising since Haslett et al. [16] used MAS5 also, although, in Table 3, this relationship not entirely consistent across different values of T. In Table 4, however, the agreement is always better using MAS5.

Analysis of Choe's spiked-in data

The availability of data from spiked-in experiments (Choe et al. [17]) provides an excellent opportunity to examine the performance of our weighted statistics on real data where the answers are known. Selected transcripts were added at a range of known concentrations; some were chosen to have differential expression between two groups of samples (true positive changes in expression); others were spiked-in at the same concentration in the two groups (true negative changes in expression). Figures 3 and 4 show the ROC curves for the comparison of the two groups for MAS5 and RMA, respectively. Table 5 shows the AUC under different weighting and multiple testing strategies. For both the RMA and MAS5 strategies, it is still clear that whether p-values were unadjusted or adjusted by the WBH method, outperforms the other quality scores. However, unlike the results in the DMD as well as the simulated data, here the quality measure had better performance than , and . In general, the results based on the MAS5 strategy are slightly better than those based on RMA for any quality-based test statistic, but the relative rankings of the different statistics remain almost the same across the two summarization methods. The LPE test appears to be particularly sensitive to the summarization method. It performs very well for the MAS5 method without multiple testing corrections and very poorly when the BH multiple testing is used. Test statistics based on and had better performance than the LPE test when p-values were adjusted by the WBH in both MAS5 and RMA.

Figure 3

Receiver Operator Characteristic (ROC) plots for tests of differential expression in Choe's spiked-in data summarized by MAS5. (a) p-values unadjusted for multiple testing, (b) p-values adjusted by the weighted Benjamini and Hochberg (WBH) multiple testing method.

Figure 4

Receiver Operator Characteristic (ROC) plots for tests of differential expression in Choe's spiked-in data was summarized by RMA. (a) p-values unadjusted for multiple testing, (b) p-values adjusted by the weighted Benjamini and Hochberg (WBH) multiple testing method.

Table 5

Area under the curves (AUCs) of Choe's spiked-in data (ν = 0.05)

Method	Multiple Testing	Quality Measures

		Q g 0 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaeGimaadaaaaa@3049@	Q g c a l l MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaem4yamMaemyyaeMaemiBaWMaemiBaWgaaaaa@34B7@	Q g exp ⁡ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGagiyzauMaeiiEaGNaeiiCaahaaaaa@338F@	Q g b e t a MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemOyaiMaemyzauMaemiDaqNaemyyaegaaaaa@34B7@	Q g m e a n p MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemyBa0MaemyzauMaemyyaeMaemOBa4MaemiCaahaaaaa@362A@	Q g w e MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaem4DaCNaemyzaugaaaaa@3225@	Q g L P E MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemitaWKaemiuaaLaemyraueaaaaa@32B8@
RMA	Unadjust	0.800	0.881	0.885	0.882	0.850	0804	0.854
	WBH	0.779	0.871	0.874	0.868	0.832	0.776	0.847
MAS5	Unadjust	0.815	0.889	0.901	0.896	0.869	0.814	0.922
	WBH	0.800	0.877	0.895	0.884	0.856	0.789	0.653

Discussion

In traditional gene selection methods, pre-filtering and selection are two separate procedures, and commonly used pre-filtering methods are based on Affymetrix calls [6-9]. We unified gene filtering and gene selection procedures together, where the importance of a gene is measured by its quality score, defined across all arrays and experimental groups, rather than by a given cutoff call value. The methods, therefore, overcome the shortcomings of the traditional methods to filter out genes before any high-level data analysis, such as gene selection, is carried out. Our measure is essentially the same as the traditional method of keeping only probesets where all arrays record a present call. Our results clearly demonstrated that the weighted t-statistic based on has the worst performance in the real data and all simulation models; Pounds and Cheng [11] also found that simple filtering based on Present/Marginal/Absent calls was a poor choice. However, this statistic performed well in the spiked-in data. This artificially-created dataset contained only transcripts that were added in known concentrations, therefore the expression signals are likely to be much more consistent across arrays than signals from different individuals. The best choice of weighting statistic in the presence of adjustments for multiple testing appears to depend on the size of the treatment effects. The exponential model measure, , appears to have a slight advantage over the one-parameter beta distribution , although for small effect sizes they perform very similarly. The distribution of quality scores for the exponential model gives, in general, lower weight to probesets with small detection p-values than the beta distribution. This may lead to a small reduction in sensitivity – the effect of these lower weights can be seen when examining the FDR cutoffs in the DMD data, and in the figures. However, these lower weights also improve specificity which tends to lead to better overall prediction. For quality measures and , the performance of the weight depends on the sensitivity parameter ν. In order to analyze the effect of different values of parameter ν on our results, we did another simulation study. Table 6 summarizes the sensitivity and specificity of the proposed test procedures to detect differentially expressed genes (using the weighted Benjamini-Hochberg procedure to control the false discovery rate at 5%), as a function of ν when the treatment effect was set to 2.0. The sensitivity decreases and specificity increases as ν gets smaller. The sensitivity associated with is slightly better than across this table, but the specificity is worse. It can be argued that the optimal choice of ν is the one that comes closest to perfect prediction. For ν = 0.05 and 0.01, the Euclidean distances from the point where sensitivity = 1 and specificity = 1 are 0.153 and 0.157, for the exponential distribution, and 0.219 and 0.221, for the beta distribution, respectively. Therefore, we fixed ν = 0.05 in our analysis; this choice can be thought of as a p-value of 0.05, leading to the interpretation that a representative transcript from a particular experimental group will be called present when the expected (under the assumed distribution) p-value <0.05. In fact, the significance level of 0.05 is widely used in statistical hypothesis testing and is comparable to the thresholds for Present/Marginal/Absent calls used in the Affymetrix software.

Table 6

Sensitivity and specificity for detecting differentially expressed genes, as a function of the sensitivity parameter ν

ν	Q g exp ⁡ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGagiyzauMaeiiEaGNaeiiCaahaaaaa@338F@		Q g b e t a MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaqhaaWcbaGaem4zaCgabaGaemOyaiMaemyzauMaemiDaqNaemyyaegaaaaa@34B7@

	Sensitivity	Specificity	Sensitivity	Specificity
0.6	0.9996	0.7562	0.9998	0.7207
0.5	0.9995	0.7763	0.9997	0.7372
0.4	0.9981	0.7930	0.9994	0.7490
0.3	0.9967	0.8070	0.9990	0.7590
0.2	0.9922	0.8211	0.9984	0.7668
0.1	0.9831	0.8373	0.9956	0.7752
0.05	0.9660	0.8507	0.9936	0.7812
0.01	0.9167	0.8665	0.9413	0.7873

It is now a common practice to use procedures for controlling multiple testing when identifying differentially expressed genes. Various procedures, such as the Bonferroni correction, the Benjamini and Hochberg false discovery rate [4], or the Benjamini and Yekutieli false discovery rate [20], have been widely used. Due to the quality adjustment in our proposed t-statistics, we can no longer assume that the p-values phave a uniform distribution under the null hypothesis, and we can not assume that the test statistics have identical distributions. Therefore, we developed weighted multiple testing procedures. Our findings showed that the proposed weighted Benjamini and Hochberg (WBH) adjustment procedure is better than the weighted Bonferroni (WB) and weighted Benjamini and Yekutieli (WBY) adjustment procedures (results not shown). However, we also observed that the proposed t-statistics using WB and WBY had poor sensitivity, regardless of the type of quality measure score (data not shown). We are planning to further extend weighted multiple testing methods in order to redefine Storey and Tibshirani's positive false discovery rate (pFDR) [21], where the test statistics are assumed to be identically distributed. Several other modified t-statistics, such as SAM [22], penalized t-statistics [23], or the local pooled error method used here [18], have been developed for microarray data analysis. All of these methods focus on overcoming the shortcoming of the ordinary t-statistics for ranking genes, due to unstable variance estimates that may arise when sample size is small. Each method used a slightly different strategy to estimate a penalty parameter for smoothing unstable variance estimates, using information from all genes rather than relying solely on variance estimates from an individual gene. However, as we discussed in the "Background" Section, the quality of this information is not the same across genes. Therefore, the estimate of the penalty parameter may not be reliable if we assume that all gene-specific information has the same quality. Here, we take another strategy to improve the performance of ordinary t statistics by putting a high weight for the genes with high quality and a low weight for those with low quality in the t-statistic calculations. The initial comparison of our strategy with the LPE test demonstrated that our weighted tests based on and are promising – AUC results were fairly similar, and better in several cases. Although our weighting strategy could be combined with a revised variance-smoothing algorithm to potentially improve the performance of the test statistics in small samples, this was not the focus of this work. It should be noted that the probe-level data analysis (such as background correction, normalization and summarization methods) may influence the results of the test procedures we discussed. Our initial analysis of real data and spiked-in data show that the weighted test statistics based on the MAS5 strategy may have slightly better performance than those based on the RMA strategy. In future investigations, we plan to explore how different probe-level data analysis methods, such as MAS5, dCHIP, RMA, PLIER and gcRMA, and different detection p-value calculation methods [17], may influence our weighting strategy in detail. Our results showed that some true differences are missed with any filtering method, as was noted by others [11]. It is known that the mismatch probes measure some true signal [12], and therefore is possible that, especially for genes expressed at a low level, quality scores for real data are lower than in our simulated data. Defining an alternative to the detection p-value that does not depend on the mismatch data may lead to better sensitivity. However, if mismatch probes do contain signal, and a study wishes to identify small changes in gene expression, the decision to use any filter at all must be carefully considered, given that some true effects are likely to be excluded. In addition, when effects are small, it must be realized that using multiple testing corrections will also lead to the exclusion of real effects. The proposed quality measures may also be useful in other applications of microarray experiments. For example, in microarray meta-analysis we could weight based on not just the quality of a study [24], but the quality of each measurement [25,26].

Methods

Quality measures

The quality of a probeset may depend on many quantities such as spatial arrangement on arrays, upper and lower threshold effects, etc. Here we focus on measuring reliability and consistency of a probeset's expression using the probeset's detection p-value. The distribution of expression within a probeset, leading to a detection p-value, is influenced by all stages of the microarray process including scanner brightness, background, RNA quality, chip design, etc. [5]. Therefore, it can be thought as a synthesis index to represent the probeset's quality. In this paper, we defined quality measures hierarchically at 3 levels (Table 1): (a) gene and array level, (b) gene and group level, summarizing across arrays within each group, (c) gene level, summarizing across arrays as well as groups. Suppose there are i = 1, 2, ..., I arrays in total, each containing g = 1, 2, ..., G genes. Further, assume that there are W treatment groups, each consisting of narrays, for w = 1, 2, ..., W. Therefore, for the first aspect (a), we measured the quality of the measure of expression for one transcript based on the detection p-value or the Affymetrix Present/Absent/Marginal call [5]. Two measures were defined, which we denote by q, the detection p-value, and , based on the present/absent call (see Table 1). Approaches for summarizing across a set of arrays take two forms. Firstly, the quality of a gene's measurement across a particular group of arrays can be defined; we call this . Several different quality scores are described in Table 1, with more detail below for the exponential and beta distributions. Secondly, group-level quality scores can be used to create a single summary measure that applies to all arrays being analyzed. For the latter, we use Q= max[] . Use of the maximum leads to the desirable property that genes present under one set of experimental conditions but absent under another will be retained for analysis with a high quality score. To develop a model-based quality measure, we argued as follows: if a gene is not expressed or cannot be measured, then the detection p-values (q) are expected to follow a uniform distribution. Equivalently, if a gene's expression cannot be detected, then we can assume a common distribution for this gene for all arrays in group w, such that -log(q) ~ Exp(1). That is, the negative log(q) follows an exponential distribution with a rate parameter value of 1 [25]. We therefore made the assumption that the qfor gene g and array i follows the one-parameter exponential distribution with a group specific mean λ, w = 1, ..., W, so that -log(q) ~ Exp(λ). The maximum likelihood estimate of λis the inverse of the group-specific sample mean. Define ν, a sensitivity parameter, to be a desired threshold for the detection p-values, representing 1 minus the probability that any probeset in a particular treatment group shows a detectable signal. Then P(q≤ ν) leads to . Although the distribution of the qmay not follow an exponential distribution exactly, this simple assumption may give adequate results for developing quality measures. Theoretically, p-values are expected to follow a two-parameter Beta distribution. However, estimating these two parameters is occasionally impossible due to the fact that the detection p-values are derived from rank tests, and all arrays may have exactly the same p-value when genes are highly expressed. Pounds and Cheng [11] recently proposed a one-parameter Beta distribution to model detection p-values within each experimental group, , with sensitivity parameter ν. The parameter can be estimated by [27], where

Tests of differential expression with quality weights

In traditional meta-analysis, quality measures are often used when combining results from different studies [28]. Without loss of generality, we can assume that we are comparing two groups of microarrays (with Narrays in group A and Nin group B), and testing for differentially-expressed genes with the two-sample Welch t-statistic, not assuming equal variances. For gene g, the test statistic is therefore , where and denote the sample average intensities in groups A and B, respectively, and and denote the corresponding sample variances. For more than two groups, an F statistic could be used instead. The quality measure (Q) for gene g is then incorporated by = tQ. Therefore, = twhen Q= 1, and goes to zero with low quality scores. We converted this modified t-test statistic to a p-value by reference to a standard t-distribution with degrees of freedom based on Satterthwaite's approximation [29], assuming unequal variances between groups A and B. Of course, it is clear that will no longer follow the t-distribution, since the kurtosis will be substantially altered as Qgets closer to zero. However, the next section describes how these altered p-values were used in modified adjustments for multiple testing. This simple approach of weighting the t-test statistics can be contrasted with the approach of weighting the expression intensities within the test statistic. Using standard formula for weighting based on quality, we constructed , where and are the group-specific weighted means and standard deviations [30], weighted by 1 - q. Here, w = A, B. We obtained pfrom a t-distribution with degrees of freedom based on Satterthwaite's approximation [29]. For brevity, we refer to this approach as even though there was no specific quality measure attached to each gene.

Multiple hypotheses testing with weights

Several authors have considered the problem of including weights in multiple hypothesis testing situations [31-33]. Often, weights have been introduced when some of the hypotheses Hare deemed more important than others. For example, Holm [31] first introduced weights into his sequentially rejective multiple hypothesis testing procedure, where weight was used to indicate the importance of the hypotheses. In that spirit, testing for differences in genes that are not expressed or non-specific in such genes should be considered less important than testing for differences among genes that are specific and well-expressed. To give another perspective, our quality-weighted statistics t* can also be considered as an implementation of a filtering method. For example, is a straightforward implementation of filtering based on including only genes with present calls on all arrays, and our other measures effectively exclude genes with small quality measures when the weighted statistic t* is close to zero. Therefore, we propose to adjust the effective number of genes tested to correspond to the number of tested genes of high quality. Assume that quality measures Q1, Q2, ..., Qand significance values p1, p2, ..., phave been calculated for all G genes. Let denote the ordered significance values, and let denote the quality measures in the same order. We redefined the Benjamini and Hochberg (BH) multiple testing procedure as . The sums of the quality measures in this formula estimate the number of high quality genes instead of the number of genes. We call this approach the Weighted Benjamini and Hochberg (WBH) method. Similar modifications lead to a weighted Benjamini and Yekutieli (WBY) procedure (not shown). We applied WBH to the t-statistics modified by the quality measures , , and . For the t-statistics modified by quality measures and as well as local pooled error (LPE) test , the standard form of the Benjamini and Hochberg (BH) multiple testing procedure was used.

Simulation method

Affymetrix probe level data were generated based on a unified model proposed by Wu et al. [14]. When treatment effects are considered for different conditions, such as cancer and normal tissues, gene expression across these conditions can be modeled as: where Yand Ware the PM and MM intensities for the probe j in probeset g on array i respectively; O denotes optical noise. N represents non-specific binding (NSB) noise. S is a quantity proportional to RNA expression and the coefficient 0 < Φ < 1 accounts for the fact that for some probe-pairs the MM detects signal. Following Wu et al. [14], we simulated and using independent draws from log2() ~ N(5, 0.1) and log2() ~ N(5, 0.1). We set = = 4.6, and assumed that and follow a bivariate normal distribution with mean 0, variance 1, and correlation 0.88. We then generated identically and independently distributed random variates e ~ N(0,0.08), so that and similarly . When mismatch probe j of gene g is attached by picking up stray signal, Φis generated as Φ~ Beta(0.5,5), otherwise, Φ= 0. The proportion of attached probes among total probes was set to 0.01. Since S follows a power law, we set its base to 2. Therefore, if we denote γas the baseline log expression level for probeset g, we can select log2(γ) expression levels from 0 to 12, which were generated from γ~ 12* Beta(1,3) + 1. δis the expected differential expression of gene g across different conditions, which is varied in the simulations. b, which describes the need for normalization, was set to be zero. αis the signal detecting ability of probe j in gene g on array i, which is assumed to follow a normal distribution with mean zero and signal detection variance . Multiplicative errors and were generated independently from N(0, ). Values of and were varied in the simulations. Since the theoretical maximum value of an Affymetrix scanner is 216, we kept only generated Yand Wless than 216, that is, and . The RMA algorithm [12] was used to summarize simulated probe-level data into a signal value, and the MAS5 algorithm [5] was used to obtain the detection p-value.

Design of simulation study

The simulation design is shown in Table 7. We assumed two groups, A and B, with Nand Narrays, respectively. G genes were generated, of them the proportion of expressed genes is k, and the proportion of differentially expressed genes is d of the G*k expressed genes. We set the number of up-and -down regulated genes to be the same in this study. Therefore, the G genes are divided into four groups: a non-expressed gene group where genes are not expressed across all arrays in group A and group B; a non-differentially expressed gene group where genes are expressed but not differentially between groups A and B; an up-regulated gene group where the mean gene expression of gene g in group B is larger by δthan that in group A; and a down-regulated gene group where the mean gene expression of gene g in group A is larger by δthan that in group B.

Table 7

Simulation structure of Affymetrix microarray data

		Group A			Group B

		Array 1	......	Array N_A	Array 1	......	Array N_B
Up-regulated Gene Group	g₁	γ_g			γ_g= γ_g+ δ_g
	...
	g_dkG/2
Down-regulated Gene Group	g_dkG/2+1	γ_g= γ_g+ δ_g			γ_g
	...
	g_dkG
Non-differentially Expressed Gene Group	g_dkG+1	γ_g≠ 0 δ_g= 0
	...
	g_k*G
Non- Expressed Gene Group	g_k*G+1	γ_g= 0δ_g= 0
	...
	G

We ran five simulation models following the above design. The specific parameters used in the five models are: number of genes: 1000; sample size: 25 arrays in groups A and B, respectively (for a total of 50 arrays); number of probes within each probeset: 16; proportion of expressed genes: 0.5; proportion of differentially expressed genes: 0.1; signal detection variance: 0.25; multiplicative error variance: 0.05 and sensitivity parameter ν: 0.05.

Duchenne muscular dystrophy data

Haslett et al. [16] hybridized the total RNA to HG-U95Av2 GeneChips. They used MAS5 to obtain signal intensities and normalized with a linear regression. Tests of differential expression were based on geometric fold change (GFC) [16]. The differential expression of 12 genes (13 probesets) was confirmed by quantitative RT-PCR analysis of seven DMD biopsies and four unaffected biopsies. We used only 23 arrays (only 11 DMD arrays) for our re-analysis since one file was truncated. Raw data were converted to signal estimates using both MAS5 [5] and RMA [12], both were implemented using the affy package in Bioconductor [34].

Spiked-in data of Choe et al. [17]

The 'spiked-in' experiment (Choe et al. [17]) for Affymetrix Genechips provides a controlled dataset of 3,860 RNA species with known sequence and known concentration. Two different samples were prepared and hybridized in triplicate to Affymetric GeneChips; these are called the 'constant' (C) and 'spike' (S) samples. Out of 3,860 RNA species, 2,551 of them were created to have the same concentrations in both samples while the rest (1,309) were spiked in with different concentrations between the S and C samples. Ten fold-change levels, ranging from 1.2 to 4-fold, were assigned to the spiked-in RNAs. Basically, all the RNAs with positive log fold changes can be thought of as differentially expressed genes. In this study, we considered the top 1000 probesets as differentially expressed genes (as did Choe et al. [17]).

Authors' contributions

CG initiated, designed and managed the study. CG also proposed the methods in this manuscript. JB participated in designing and managing the study. PH conducted data analysis and drafted the manuscript. CG and JB revised the manuscript. All authors read and approved the final manuscript.

22 in total

1. Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays.

Authors: Nitin Jain; Jayant Thatte; Thomas Braciale; Klaus Ley; Michael O'Connell; Jae K Lee
Journal: Bioinformatics Date: 2003-10-12 Impact factor: 6.937

2. Tests for differentiation in gene expression using a data-driven order or weights for hypotheses.

Authors: Gerhard Hommel; Siegfried Kropf
Journal: Biom J Date: 2005-08 Impact factor: 2.207

3. An approximate distribution of estimates of variance components.

Authors: F E SATTERTHWAITE
Journal: Biometrics Date: 1946-12 Impact factor: 2.571

4. Meta-analysis in epidemiology, with special reference to studies of the association between exposure to environmental tobacco smoke and lung cancer: a critique.

Authors: J L Fleiss; A J Gross
Journal: J Clin Epidemiol Date: 1991 Impact factor: 6.437

5. Large-scale analysis of the human and mouse transcriptomes.

Authors: Andrew I Su; Michael P Cooke; Keith A Ching; Yaron Hakak; John R Walker; Tim Wiltshire; Anthony P Orth; Raquel G Vega; Lisa M Sapinoso; Aziz Moqrich; Ardem Patapoutian; Garret M Hampton; Peter G Schultz; John B Hogenesch
Journal: Proc Natl Acad Sci U S A Date: 2002-03-19 Impact factor: 11.205

6. Gene expression comparison of biopsies from Duchenne muscular dystrophy (DMD) and normal skeletal muscle.

Authors: Judith N Haslett; Despina Sanoudou; Alvin T Kho; Richard R Bennett; Steven A Greenberg; Isaac S Kohane; Alan H Beggs; Louis M Kunkel
Journal: Proc Natl Acad Sci U S A Date: 2002-11-01 Impact factor: 11.205

7. The meaning and use of the area under a receiver operating characteristic (ROC) curve.

Authors: J A Hanley; B J McNeil
Journal: Radiology Date: 1982-04 Impact factor: 11.105

8. Skeletal muscle dictates the fibrinolytic state after exercise training in overweight men with characteristics of metabolic syndrome.

Authors: Dustin S Hittel; William E Kraus; Eric P Hoffman
Journal: J Physiol Date: 2003-02-28 Impact factor: 5.182

9. Interactively optimizing signal-to-noise ratios in expression profiling: project-specific algorithm selection and detection p-value weighting in Affymetrix microarrays.

Authors: Jinwook Seo; Marina Bakay; Yi-Wen Chen; Sara Hilmer; Ben Shneiderman; Eric P Hoffman
Journal: Bioinformatics Date: 2004-04-29 Impact factor: 6.937

10. Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset.

Authors: Sung E Choe; Michael Boutros; Alan M Michelson; George M Church; Marc S Halfon
Journal: Genome Biol Date: 2005-01-28 Impact factor: 13.583

4 in total

1. Data integration in genetics and genomics: methods and challenges.

Authors: Jemila S Hamid; Pingzhao Hu; Nicole M Roslin; Vicki Ling; Celia M T Greenwood; Joseph Beyene
Journal: Hum Genomics Proteomics Date: 2009-01-12

2. The effect of insulin on expression of genes and biochemical pathways in human skeletal muscle.

Authors: Xuxia Wu; Jelai Wang; Xiangqin Cui; Lidia Maianu; Brian Rhees; James Rosinski; W Venus So; Steven M Willi; Michael V Osier; Helliner S Hill; Grier P Page; David B Allison; Mitchell Martin; W Timothy Garvey
Journal: Endocrine Date: 2007-02 Impact factor: 3.633

Review 3. Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery.

Authors: Christopher J Walsh; Pingzhao Hu; Jane Batt; Claudia C Dos Santos
Journal: Microarrays (Basel) Date: 2015-08-21

4. Using the ratio of means as the effect size measure in combining results of microarray experiments.

Authors: Pingzhao Hu; Celia M T Greenwood; Joseph Beyene
Journal: BMC Syst Biol Date: 2009-11-05

4 in total