Henrik Rudolf1, Gerd Nuernberg2, Dirk Koczan3, Jens Vanselow4, Tanja Gempe5, Martin Beye6, Gérard Leboulle7, Kaspar Bienefeld8, Norbert Reinsch9. 1. Institut für Genetik und Biometrie, Leibniz-Institut für Nutztierbiologie, Dummerstorf, DE, Germany. rudolf@fbn-dummerstorf.de. 2. Institut für Genetik und Biometrie, Leibniz-Institut für Nutztierbiologie, Dummerstorf, DE, Germany. gnuernbg@fbn-dummerstorf.de. 3. Institut für Immunologie, Universität Rostock, Rostock, DE, Germany. dirk.koczan@med.uni-rostock.de. 4. Institut für Genetik und Biometrie, Leibniz-Institut für Nutztierbiologie, Dummerstorf, DE, Germany. vanselow@fbn-dummerstorf.de. 5. Institut für Evolutionsgenetik, Heinrich Heine Universität Düsseldorf, Düsseldorf, DE, Germany. tanja.gempe@uni-duesseldorf.de. 6. Institut für Evolutionsgenetik, Heinrich Heine Universität Düsseldorf, Düsseldorf, DE, Germany. martin.beye@uni-duesseldorf.de. 7. Neurobiologie, Freie Universität Berlin, Berlin, DE, Germany. gerleb@zedat.fu-berlin.de. 8. Länderinstitut für Bienenkunde Hohen Neuendorf, Hohen Neuendorf, DE, Germany. kaspar.bienefeld@hu-berlin.de. 9. Institut für Genetik und Biometrie, Leibniz-Institut für Nutztierbiologie, Dummerstorf, DE, Germany. reinsch@fbn-dummerstorf.de.
Abstract
BACKGROUND: Pooled samples are frequently used in experiments measuring gene expression. In this method, RNA from different individuals sharing the same experimental conditions and explanatory variables is blended and their concentrations are jointly measured. As a matter of principle, individuals are represented in equal shares in each pool. However, some degree of disproportionality may arise from the limits of technical precision. As a consequence a special kind of technical error occurs, which can be modelled by a respective variance component. Previously published theory - allowing for variable pool sizes - has been applied to four microarray gene expression data sets from different species in order to assess the practical relevance of this type of technical error in terms of significance and size of this variance component. RESULTS: The number of transcripts with a significant variance component due to imperfect blending was found to be 4329 (23 %) in mouse data and 7093 (49 %) in honey bees, but only 6 in rats and none whatsoever in human data. These results correspond to a false discovery rate of 5 % in each data set. The number of transcripts found to be differentially expressed between treatments was always higher when the blending error variance was neglected. Simulations clearly indicated overly-optimistic (anti-conservative) test results in terms of false discovery rates whenever this source of variability was not represented in the model. CONCLUSIONS: Imperfect equality of shares when blending RNA from different individuals into joint pools of variable size is a source of technical variation with relevance for experimental design, practice at the laboratory bench and data analysis. Its potentially adverse effects, incorrect identification of differentially expressed transcripts and overly-optimistic significance tests, can be fully avoided, however, by the sound application of recently established theory and models for data analysis.
BACKGROUND: Pooled samples are frequently used in experiments measuring gene expression. In this method, RNA from different individuals sharing the same experimental conditions and explanatory variables is blended and their concentrations are jointly measured. As a matter of principle, individuals are represented in equal shares in each pool. However, some degree of disproportionality may arise from the limits of technical precision. As a consequence a special kind of technical error occurs, which can be modelled by a respective variance component. Previously published theory - allowing for variable pool sizes - has been applied to four microarray gene expression data sets from different species in order to assess the practical relevance of this type of technical error in terms of significance and size of this variance component. RESULTS: The number of transcripts with a significant variance component due to imperfect blending was found to be 4329 (23 %) in mouse data and 7093 (49 %) in honey bees, but only 6 in rats and none whatsoever in human data. These results correspond to a false discovery rate of 5 % in each data set. The number of transcripts found to be differentially expressed between treatments was always higher when the blending error variance was neglected. Simulations clearly indicated overly-optimistic (anti-conservative) test results in terms of false discovery rates whenever this source of variability was not represented in the model. CONCLUSIONS: Imperfect equality of shares when blending RNA from different individuals into joint pools of variable size is a source of technical variation with relevance for experimental design, practice at the laboratory bench and data analysis. Its potentially adverse effects, incorrect identification of differentially expressed transcripts and overly-optimistic significance tests, can be fully avoided, however, by the sound application of recently established theory and models for data analysis.
In gene expression profiling pooling is a method to reduce hybridization costs and compensate for insufficient amounts of mRNA. In the subsequent statistical analyses of gene expression data, where a log-transformation during preprocessing is standard, it is important to consider how the expectation and variance of the gene expression of pools relate to individual samples. The impact of pooling on the identification of differential gene expression has been studied in Kendziorski et al. [1], separately for different pool sizes. It has been shown that biological averaging occurs for most of the transcripts and differential expression inferences are comparable for individuals and pools. In Zhang et al. [2] approximations for the expectation and variance of pooled samples were derived. Furthermore, it was shown that biases as well as heteroscedasticity are introduced by variable pool sizes. Experiments with unequal pool sizes therefore were recommended to be avoided. As demonstrated in Rudolf et al. [3], however, a wide class of experiments, in which pool size can be handled as a nuisance effect and is cross-classified with treatment, allows for tests of unbiased contrasts. In the case of a balanced cross-classification the pool size effect must not explicitly appear in the model at all, though hypotheses on treatments remain unbiased, as shown in Rudolf et al. [3]. In any case variable pool sizes have an effect on the covariance of observations. This can be taken into account by considering how many individuals are allocated to each pool and by introducing a random effect for blending along with a corresponding variance component. The latter can be interpreted as a second kind of technical variability induced by inaccuracies in blending slightly unequally-sized aliquots of mRNA from several individuals into common pools. Though this subject has been treated theoretically as described, investigation into the practical importance of this second kind of technical variability is lacking.Consequently a study was performed, in which gene expression data from experiments with four different species were analyzed to investigate the relevance of the aforementioned new kind of technical error in terms of size and significance of the corresponding variance component. Furthermore, we investigated potential consequences on the number of transcripts identified as differentially expressed between different treatments when analyses neglect this kind of error.
Methods
This section offers a short recap of the underlying statistical models. The four experimental data sets are then introduced. In all of them - whether from single- or two-color arrays - there are more observations than pools (see Table 1), which allows for the estimation of all desired variance components. Data simulations are also described and have been included as a useful aid for the interpretation of the experimental data results. Finally, the statistical methods applied for parameter estimation and statistical testing are described.
Table 1
Characteristics of experimental data sets
Characteristics
Mouse
Rat
Bee
Human
Individuals
60
24
14
55
Pools
12
22
12
16
Pool size
5
2,3,12
2,4
3
Observations
44
56
22
30
1-/2- color-array
1
1
2
2
Characteristics of experimental data sets
Random effects in gene expression experiments with variable pool sizes
When aliquots of mRNA from different individuals are blended into common pools, the inaccuracies of this procedure may induce a special kind of technical error. Respective random effects, together with a corresponding variance component, were proposed [3] as a means of modeling the variability of pooled observations in gene expression experiments with variable pool sizes (i.e. differing numbers of individuals per pool). Thus, for background-corrected and normalized log-intensities y (length of vector y equals the number of arrays) of a certain transcript, the model in matrix notation is:where X and Z are the design matrices of the fixed (=(μ,β)⊤) and random (u1,u2) effects. The distribution of u is assumed to be with covariance matrices ( are the variance components) and the residuals are . Random effects of single individuals are assumed to be independently identically distributed with a biological variance , while observations from a number of γ pooled individuals have a biological variance . The vector u1 may comprise biological effects of single individuals as well as average biological effects of groups of individuals constituting common pools, according to the experimental design.The random effect of blending (i.e. for the technical procedure of building a pool) only applies to observations from pools and not to observations from single individuals. Therefore, u2 consists of one effect per mixture, which had been prepared in the lab. The associated variance component is . So, the variance of the observations becomes:The model of this variance structure is based on the closed form approximation of the variance of pools on the scale of log-intensities, proposed in [2]where is the pooling technical variance and is the biological variance of individuals. The substitutions and led to our assumed variance structure (2).In the following, the relevance of accounting for the blending error variance component is investigated in four experimental data sets by comparing the described full model (m2) described above with a reduced one (m1) that lacks this particular variance component. The methodology was checked by a simulation beforehand.
Experimental data
Mouse data
Mouse data consisted of observations from 44 one-color microarrays. RNA for this experiment was extracted from the ovaries of 60 female mice, 30 of which came from a long-term selection line with an extraordinary litter size. All others came from a control line. Pooled samples were built by blending RNA from five mice per sample. Each mouse was only represented in a single pool. For the sake of technical replication, all 12 pooled samples were measured twice by preparing two microarrays per sample. Additionally, animals from two pools per line (ten animals per line) were measured individually. These individual measurements were not included in the previously published analysis of this data [4], where more details of the experiment can be found.Twenty-eight (14 per line) different biological effects were defined per transcript. The dimensions of the design matrix are therefore 44 × 28. In detail, random biological effects were assigned to all individually measured mice (individuals 1 to 10 within each line) and corresponding entries in equal 1. The biological effects of the same ten individuals (in two groups of five) were assigned to the observations from the first and second pooled samples in each line (two observations per pool due to technical replication). In this case, non-zero entries in are 1/5. However, pooled samples numbered three to six within each line each had a biological effect of their own, modeling the average effect of the five respective members of each pool. Note that for the pools three to six the corresponding non-zero entries in are 1. The 28×28 covariance matrix scales the random biological effects and has non-zero entries only on the diagonal, each of them equals the inverse pool size 1/γ. The 22 observations from the first line are represented in the upper part of the design matrix :The technical variability due to blending individual samples only comes into play when observing pooled samples, not for measurements of individuals. Since blending was done only once per pool, there are 12 different effects due to imperfect blending. Therefore, the 44×12 design matrix (see Additional file 1) contains zero rows for observations from single animals. The corresponding 12×12 covariance matrix is diagonal with entries , according to Eq. (3).This study did not involve in vivo experiments. Animals were housed according to the German law for animal protection (TierSchG) and in compliance with the European legislation on the care and use of animals.
Rat data
This data set was analyzed by Kendziorki et al. [1] and contains one-color array data. Rats of the treatment group were treated with Retinoic acid. For the details of data generation and preprocessing, please see the original paper [1]. Rats from the groups A (control) and B (treatment) were measured individually and in pools of various sizes. Each of the twelve rats from both groups was used four times, for an individual measurement and in pools of 2, 3, and 12. For the sample composition we again defined the random effects from the smallest disjunct elements. Therefore, with the help of the matrices and , convex linear combinations were built from the 24 individuals. Here, is the 24 × 24 unity matrix and contains a row for each measurement with entries according to reciprocal pool sizes. Per group, there are 28 measurements partitioned into 12 individual samples, 6 pools of 2, 4 pools of 3, and one of 12, plus 5 technical replications. Thus, the dimensions of the matrix are 56×24, detailed in the Additional file 1. In each group, there were 11 pools, and the diagonal matrix has the dimensions 22×22 with entries . The matrix was constructed analogously to .
Honey bee data
This data set stems from a honeybee project dealing with differences in the pathogen resistance of so-called hygienic and non-hygienic worker bees as far as they are reflected in gene expression differences. Bees designated as ’hygienic’ were observed to open brood cells and assisting the removal of diseased brood. The bees’ activities were recorded on a Varroa-parasitized section of a brood comb. Pooling was applied in a preliminary experiment with a limited number of bees and microarrays. For seven hygienic bees and seven control bees, mRNA was extracted from nerve tissues of the mushroom body (MB), antennal lobe (AL) and Antennae (ANT). The number of individuals blended into a pool was either two or four. Out of the 14 bees, six different sample compositions were built and analyzed for all three tissues with two-color arrays (for the design see Fig. 1). A few individual hybridizations were not carried out due to an insufficient amount of amplified RNA (single samples from AL). For the normalized two-color microarray data we used a model for differences M of log-intensities from the red (R) and green (G) channel
Fig. 1
Design of the honey bee experiment. Scheme for the design of the two-color microarray experiment with honey bees. The numbered arrows (1–22) represent two-color arrays, the arrowheads (tails) indicate Cy5 (Cy3) dye. Light (dark) boxes symbolize RNA from hygienic (control) bees. Pool size (1, 2, 4) and mixture (Roman numerals) are shown in each box. Tissues are abbreviated as MB (mushroom body), AL (antennal lobe), and ANT (Antennae). Boxes in the same column share the same biological effect, indicated as pool compositions 1 to 6
Design of the honey bee experiment. Scheme for the design of the two-color microarray experiment with honey bees. The numbered arrows (1–22) represent two-color arrays, the arrowheads (tails) indicate Cy5 (Cy3) dye. Light (dark) boxes symbolize RNA from hygienic (control) bees. Pool size (1, 2, 4) and mixture (Roman numerals) are shown in each box. Tissues are abbreviated as MB (mushroom body), AL (antennal lobe), and ANT (Antennae). Boxes in the same column share the same biological effect, indicated as pool compositions 1 to 6Here M is the vector of log-ratios () for one transcript with dimension n, equal to the number of arrays. The design matrix X for the fixed effects links observations to the overall mean μ (which includes the dye effect, i.e. the difference of red and green channel), the differences between the behaviors (hygienic minus control) and two differences between tissues (b12 for MB minus AL, b23 for AL minus ANT). The latter effect has been included since data from all tissues were jointly analyzed due to the limited number of arrays. The random effect u1 for each sample composition has a variance structure determined by and . The variance structure of the second random effect u2 for the blending of individuals is generated by and . Both design matrices for the random effects differ, however, from experiments with one-color arrays: each row of Z1 and Z2 contains two non-zero elements (as opposed to a single one) in order to model the differences between effects with entries of 1 for the red and −1 for the green channel. The residual errors are again assumed to be stochastically independent and include the technical errors created through the hybridization, imaging, and scanning of each array.
Human data
The human data was taken from the GC6 (Grand Challenge in Global Health no. 6 - Biomarkers of protective immunity against Tuberculosis) project. For the project data, please see Maertzdorf et al. [5] and Jacobsen et al. [6]. One focus of this project was to identify immune system differences between people who were exposed to Tuberculosis but never became sick and those who developed severe symptoms. Therefore, as a part of this larger study the three classes TST+, TST− and TB were compared, where TST stands for the tuberculosis skin test (+ and - indicate positive and negative results, respectively) and TB for acute tuberculosis. Overall, the data set consists of samples from 55 humans in 16 pools of three and in ten single samples, which were labeled on 30 two-color arrays. In the sample composition, one also sees correlations between pools in three cases, where individuals were used more than once, i.e. in different pools (see matrix ). For each observation we modeled fixed effects for the mean (including dye effect) and treatment (3 levels) as well as random effects of sample composition and imperfect blending. Because there were two samples on each array, the design matrix for the composition of the samples had two entries per row, as presented in the Additional files 1. Each pool was built only once, so is a diagonal matrix with dimensions 16×16 and entries . The random effects of imperfect blending were assigned to measurements via , with two non-zero entries per measurement.This study was approved by ethical committees in both Stellenbosch (South Africa) and Berlin (Germany) and written informed consent has been obtained from all study participants (details in [6]).
Simulated data
The relationship between the variance of a random effect of a pool and deviations from the homogene aliquots of individuals in a pool sample, given in Eq. (3), is based on a theoretically derived approximation [2]. Furthermore, true proportions of aliquots are not available. Therefore, the equality of the estimated variance component and the product of variances was checked by fitting the model to simulated data, in order to assay the estimations when the true state of nature is known.By setting the vector of individual gene expressions of the individuals of a pool and w the vector of weights (proportions of individuals in the pooled RNA of a joint sample), we calculated a value for true gene expression on the log-scale asThe technical errors, distributed as , were then added. Note that, due to (1), each observation is composed by the fixed effects X=μ, the distortion due to biological variation and the difference generated by imperfect blending , plus the log-bias . For the simulation of weights the Dirichlet distribution with parameters , i=1,…,γ was used. Then, , and the expectation of each weight is . Therefore, the variance of the weights - theoretically - is . Using the approximation for the variance of weights w from [2], the Dirichlet parameters a can be chosen in order to obtain weights with a given variance .Various proportions of transcripts (0, 1/3, 1) were simulated as affected by imperfect blending. In order to investigate the distribution of the RLRT-statistic under the null hypothesis (), the pooling technical variance was set to zero for all transcripts. Then, one third of the transcripts were simulated with imperfect blending, as well as data where all transcripts contained these effects.As a test case, further simulations were tailored for a comparison of models with regard to the power to detect differential expression in the presence of imperfect pooling at all loci. Variances were set to , and according to the estimations from the mouse data. This was simulated with 100 repetitions. An experiment consisting of 60 individuals from two equally-sized treatment groups was simulated, in a 44 one-color microarray setting. The observations generated were both from single individuals (20) and pools of size five (24). The individual values used in the first two pools of each line were also used as single individuals. For the full details of the design, please see the description of the mouse data set above, which has an identical structure. For each of the 9000 transcripts, a mean expression level was randomly chosen from a uniform distribution over the interval [8,14]. A subgroup of 3000 transcripts was randomly chosen to be differentially expressed between both treatment groups. For each of these, a mean treatment effect was sampled from a uniform distribution over the interval [0.5,1.5] with a random sign ∈{−1,1}. False positive and negative test results were then evaluated using the mean number of transcripts, averaged over all 100 repetitions.
Statistical analyses
Three variance components were considered: first, biological variance (); second, blending error variance (); and third, residual variance (). Similar models that lack the second variance component have been used previously (e.g. [7]). Transcripts were excluded from analyses if the log-expressions of both groups were smaller than eight (corresponds to 256 at the original scale), which is frequently considered to be a threshold for meaningful gene expression. This resulted in 8554 observations for the mouse data, 6264 for rats, 13,761 for bees and 12,348 for the human data set. An EM-REML algorithm was used to estimate the variance components. Then the mixed model equations
where and , were solved for the estimates of the fixed and random effects and the REML-log-likelihood was calculated.For each transcript, a residual likelihood ratio test (RLRT) was used to test the null hypothesis , thereby assuming a half-half mixture of a -distribution and a point mass at zero (see e.g. [8]). According to this assumed distribution of the test statistic, the distribution of p-values from all transcripts in one experiment under the null hypothesis deviates from the uniform distribution (see Fig. 2). The proportion of transcripts with a relevant blending error variance was estimated as . Therein, the estimated proportion of true null hypotheses () was estimated as described in Dabney and Storey [9]. The proportion was then compared with the proportion of transcripts simulated without blending errors. After correcting all p-values according to a false discovery rate (FDR) of 5 %, the transcripts with a significant RLRT were determined. Beyond that, we evaluated the proportions of the estimated variance component in relation to the total variance.
Fig. 2
Estimates of blending error variance for simulated data. Log-estimates of the blending error variance
(left) and p-values (right) of RLRT () for simulated data. Top: perfectly blended individuals were simulated. The p-values of the interval [0,0.5) are uniformly distributed and nearly half of the transcripts have a p-value of 1. Middle: 3000 out of 9000 transcripts affected by imperfect blending of individuals. Bottom: all transcripts were simulated with imperfect blending
Estimates of blending error variance for simulated data. Log-estimates of the blending error variance
(left) and p-values (right) of RLRT () for simulated data. Top: perfectly blended individuals were simulated. The p-values of the interval [0,0.5) are uniformly distributed and nearly half of the transcripts have a p-value of 1. Middle: 3000 out of 9000 transcripts affected by imperfect blending of individuals. Bottom: all transcripts were simulated with imperfect blendingThe practical relevance of the variance component for imperfect blending of samples was further investigated by comparing the number of transcripts identified as differentially expressed in different treatment levels by means of the full model (m2, Eq. 1) and the null model (m1) y=X+Z1u1+e without a random effect of imperfect blending. Degrees of freedom for the applied F-Tests of fixed effects in mixed models were adjusted according to Kenward and Roger [10]. In order to account for multiple testing, an FDR of 5 % was applied to the p-values of the latter F-tests.For the mouse data set, the normalization was done with the gcrma method [11]. Loess- and quantile normalization [12] was used for the two-color array data. The rat data set was downloaded as normalized.The open-source statistical programming package R [13] was used to implement an EM-REML algorithm for the estimation of all three variance components. The formulas for the expectation and maximization steps can be obtained from e.g. Mrode and Thompson [14]. Convergence of the EM algorithm was assumed when the conditionwas fulfilled [15], where ε=10−8 and is the vector of estimates of the variance components in the n-th iteration. False discovery rates were computed with the help of the R-package qvalue [16]. In the case of p-values from RLRT test statistics, the ’bootstrap’ option was used to estimate π0, as suggested by Storey [17].
Results and Discussion
Simulated data sets
First, the results of the RLRT for the blending error variance component are shown for the case of the validity of the null hypothesis (). Here, a uniform distribution of p-values can be observed on the interval [0,0.5) as expected (see Fig. 2, topright). The Distributions of log-estimates of (Fig. 2, left panels from top to bottom) show an increasing proportion of large values, in full accordance with the increase in the simulated proportions of transcripts with a relevant blending error variance (which was 0, 1/3 and 1). The corresponding p-values (right panels of Fig. 2, top to bottom) fairly mirror the same trend. The estimates for approximated the simulated proportions of affected transcripts well. However, when it came to the identification of individual transcripts, their number clearly lagged behind the proportions present in the data. Corresponding results are shown in Table 2.
Table 2
Number of transcripts with non-zero blending error variance
Results of the restricted likelihood ratio tests of the hypothesis for transcripts exceeding the minimum expression level (crit. >8). Numbers of transcripts with a significant variance component for imperfect blending (sign VC) were counted according to the FDR correction level of 5 %. is the estimated proportion of transcripts with . Simulated data sets s1, s2 and s3 refer to scenarios where none, one third, and all transcripts were associated with a non-zero blending error variance component
Number of transcripts with non-zero blending error varianceResults of the restricted likelihood ratio tests of the hypothesis for transcripts exceeding the minimum expression level (crit. >8). Numbers of transcripts with a significant variance component for imperfect blending (sign VC) were counted according to the FDR correction level of 5 %. is the estimated proportion of transcripts with . Simulated data sets s1, s2 and s3 refer to scenarios where none, one third, and all transcripts were associated with a non-zero blending error variance componentDifferences in both models’ abilities to find differential expression in the simulated data sets were also observed (Table 3). The null model yielded an average of 3407 expressed transcripts declared as differentially expressed, compared to 3157 from the full model. The average shared number is 3128, but the 3000 simulated as differentially expressed in a total of 9000 transcripts was clearly outbid by both models. Figure 3 shows the average numbers of four sets of transcripts and their intersections: the set of transcripts with a simulated differential expression, one set of transcripts identified as differentially expressed for each of both models, and the set of transcripts, which were identified as connected with an attributable (larger than zero in terms of FDR) blending error variance. Upon counting the numbers in the intersection regions which corresponded to true discoveries, a similarly high power for both models was observed. Only 7 (m1) and 10 (m2) of the transcripts simulated as differentially expressed have not been found. But, adding the numbers which correspond to false discoveries yielded a value of (1+25+64+77)/6000=0.028 for m2 and (65 +208 + 64 + 77)/6000=0.069 for m1. This is clearly larger than 5 %, the chosen level of permitted false discoveries. The number of transcripts incorrectly labelled as differentially expressed in the group of transcripts with a significant blending error variance was inflated by a factor of about three for m1 (285) in comparison with m2 (102).
Table 3
Detection of differential expression by model and data set
Number of
Data set
transcripts
Simulated
Experimental
identified
s1
s2
s3
Mouse
Rat
Human
m1 & m2
3112
3119
3128
3344
1636
787
m1
48
113
279
504
141
350
m2
4
13
29
516
12
154
Number of transcripts identified as differentially expressed at an FDR of 5 % by data set and model. Simulated data sets s1, s2, and s3 refer to scenarios where none, one third, and all transcripts were associated with a non-zero blending error variance component. The number of transcripts identified with both models is indicated by m1 & m2, transcripts identified solely with the null model (m1) or the full model (m2) are shown in the second-to-last and the last line
Fig. 3
Sets of DEGs and transcripts with significant RLRT for repeatedly simulated data. Sets of differentially expressed transcripts (DEGs) for both models and coincidences of transcripts with a significant variance component for imperfect blending. These were averaged over 100 repetitions of the simulated experiment based on the mouse design and variance components and (all transcripts with effects for imperfect blending). The average counts of the sets of differentially expressed transcripts are labeled with ’DEG m1’ for the null model, ’DEG m2’ for the full model, ’VC significant’ for transcripts with a significant blending error variance, and ’DEG sim’ for the transcripts simulated as differentially expressed
Sets of DEGs and transcripts with significant RLRT for repeatedly simulated data. Sets of differentially expressed transcripts (DEGs) for both models and coincidences of transcripts with a significant variance component for imperfect blending. These were averaged over 100 repetitions of the simulated experiment based on the mouse design and variance components and (all transcripts with effects for imperfect blending). The average counts of the sets of differentially expressed transcripts are labeled with ’DEG m1’ for the null model, ’DEG m2’ for the full model, ’VC significant’ for transcripts with a significant blending error variance, and ’DEG sim’ for the transcripts simulated as differentially expressedDetection of differential expression by model and data setNumber of transcripts identified as differentially expressed at an FDR of 5 % by data set and model. Simulated data sets s1, s2, and s3 refer to scenarios where none, one third, and all transcripts were associated with a non-zero blending error variance component. The number of transcripts identified with both models is indicated by m1 & m2, transcripts identified solely with the null model (m1) or the full model (m2) are shown in the second-to-last and the last lineFurthermore, in a series of simulations, the pooling technical variance was varied within the range of (0,2.7]. A plot of the obtained estimates of against the simulated values (see Additional file 2) shows nearly perfect consistency. The exception is some upward bias for very small simulated values, which can be attributed to the well-known properties of the REML-method [18].Therefore, it can be concluded at the very least that tests for differential expression with the m1 model tend to be too optimistic, depending on the given experimental conditions. To summarize, should the model contain the additional random effect of imperfect blending, the statistical analysis yields results which agree very well with the simulated characteristics.Histograms of log-transformed estimations of the variance components due to imperfect blending are shown in Fig. 4. Estimates range from nearly zero (10−6) to less than one hundred (102). A clear bimodal distribution can be observed in all cases, where the left part of each distribution (values less than approximately 10−3) represents very small values close to zero while the other part represents more substantial values. In the mouse and the bee data, the proportion of transcripts with substantially large values clearly exceeds the proportion of small values. For the human data, the proportion of small estimates also prevails somewhat, while a balance between minor and substantial values can be observed for the rat data. This is also reflected in the average (over all transcripts) of all three variance components obtained with the reduced (m1) and the full (m2) models, as shown in Table 4. In light of the averages, the inclusion of a blending error variance had the consequence of a more or less reduced residual variance, most pronounced in the mouse and honey bee data. In the human data, the average residual variance remained almost constant, yet the average biological variance decreased - a phenomenon not observed in the other data sets. Distributions of the size of relative to the total variance of a standard observation - with respective pool sizes of 5, 3, 4, and 3 for mouse, rat, bee and human data are given in Fig. 5 (right, top to bottom). All distributions exhibit a clear spike near zero, followed by estimates that nearly exceed the full range of variance ratios. The rat data are an exception; hardly any values larger than 0.6 were observed.
Fig. 4
Estimates of blending error variance in empirical data. Histogram of log-estimates of the variance component for the experimental data sets mouse, rat, bee, and human
Estimated variance components for residuals (), biological effects (), and imperfect blending () - averaged over all analyzed transcripts for the null model (m1) and the full model (m2)
Fig. 5
p-values of RLRT and variance ratios. For each experimental data set, a histogram of p-values of the likelihood ratio test statistic for the test of are shown (left), as well as histograms of the variance components for imperfect blending, expressed as the proportion of the total variance (right) of a standard observation. y-axis: count of transcripts
Estimates of blending error variance in empirical data. Histogram of log-estimates of the variance component for the experimental data sets mouse, rat, bee, and humanp-values of RLRT and variance ratios. For each experimental data set, a histogram of p-values of the likelihood ratio test statistic for the test of are shown (left), as well as histograms of the variance components for imperfect blending, expressed as the proportion of the total variance (right) of a standard observation. y-axis: count of transcriptsMean estimated variance componentsEstimated variance components for residuals (), biological effects (), and imperfect blending () - averaged over all analyzed transcripts for the null model (m1) and the full model (m2)These impressions are mirrored by the distributions of p-values from RLRT-tests for the hypothesis of a non-existing () blending error variance (left panels in Fig. 5, top to bottom). The number of individual transcripts, which could be associated with a non-zero blending error variance at a false discovery rate of 5 %, varied strongly between data sets. There were 4329 of such transcripts in the mouse data and 7093 in the honey bee data, while only six were identified in the rat data and none at all in the human data (Table 2). These high numbers are consistent with considerable estimates for the fraction () of non-zero variances in mouse () and honey bee () data (Table 2). Note that the respective estimated proportions were and in the rat and human data (Table 2), also indicating the existence of non-zero blending error variances in these two data sets, though almost no particular non-zero variance could have successfully been identified at the chosen false discovery rate of 5 %.Counts of differentially expressed transcripts detected with both models are shown in Table 3. About half of all transcripts analyzed were declared differentially expressed in the mouse data. About five hundred were exclusively detected with one of both models: 504 with the null model and 516 with the full model. The list of the top 100 transcripts - ranked by their p-values - showed a large dissimilarity as indicated by a value of 0.11 for Kendall’s correlation test. In the rat data, 1636 differentially expressed transcripts were jointly identified by both models, while 141 were solely found with the help of m1 and 12 with m2. No numbers appear in Table 3 for the honeybee data, as no differentially expressed transcripts were found. Finally, there were 1137 differentially expressed transcripts from the null model in the human data, from which only 787 were ’confirmed’ by the full model.
Conclusions
In light of the large numbers of blending error variances diagnosed as greater than zero in the mouse and honey bee data, the practical relevance of this second kind of technical error has been clearly demonstrated. In both other data sets, estimates of , the proportion of positive blending error variances, may be taken as an indicator of their existence, though hardly any particular values could be identified, presumably due to a lack of power. As demonstrated mainly by simulation, there are also consequences for the detection of differentially expressed transcripts, in which the nominal FDR-level was shown to be too optimistic when the blending error variance was not taken into account. Therefore, we strongly recommend the application of adequate models (as described in [3]) including random blending effects and their variances when observations from pools of different sizes are to be jointly analysed.
Availability of supporting data
The mouse and honey bee data sets have been deposited at the Gene Expression Omnibus (GEO) website (www.ncbi.nlm.nih.gov/geo), under the joint accession no. GSE72944. The human data is part of the data set GSE6112 and the rat data has the GEO accession no. GSE2331.
Authors: J Maertzdorf; D Repsilber; S K Parida; K Stanley; T Roberts; G Black; G Walzl; S H E Kaufmann Journal: Genes Immun Date: 2010-09-23 Impact factor: 2.676
Authors: Marc Jacobsen; Dirk Repsilber; Andrea Gutschmidt; Albert Neher; Knut Feldmann; Hans J Mollenkopf; Andreas Ziegler; Stefan H E Kaufmann Journal: J Mol Med (Berl) Date: 2007-02-23 Impact factor: 5.606