
Key issues in conducting a meta-analysis of gene expression microarray datasets.

Adaikalavan Ramasamy, Adrian Mondry, Chris C Holmes, Douglas G Altman.

Abstract


Year: 2008 PMID: 18767902 PMCID: PMC2528050 DOI: 10.1371/journal.pmed.0050184

Source DB: PubMed Journal: PLoS Med ISSN: 1549-1277 Impact factor: 11.069


Microarray technology measures the mRNA levels of tens of thousands of genes in tissue samples simultaneously in a high-throughput and cost-effective manner. Since its introduction over a decade ago [1], it has found widespread use in the fields of molecular genetics and functional genomics. It has been applied in order to understand underlying biological mechanisms [2], to discover novel subgroups of diseases [3-5], to examine drug response [6,7], to classify patients into disease groups [3], and to predict disease outcomes [8-10]. Some molecular signatures discovered with microarray technology are now being evaluated in prospective randomized clinical trials [11,12]. Despite their great promise, microarray-based studies may report findings that are not reproducible [13] or not robust to the mildest of data perturbations [14,15]. Common causes include improper analysis or validation, insufficient control of false positives, and inadequate reporting of methods [16,17]. The situation is exacerbated by the small sample sizes relative to large numbers of potential predictors; typically tens of thousands of probes are investigated in only tens or hundreds of biological samples. Generalizability across studies [18] also needs to be assessed before considering widespread practical application. For example, the findings of a study using historical controls from a particular geographical region may not be applicable to newer cohorts of patients or different regions. Combining information from multiple existing studies can increase the reliability and generalizability of results. 
The use of statistical techniques to combine results from independent but related studies is called “meta-analysis.” However, the term meta-analysis is also widely used to describe the whole study process (as we do here), for which an alternative term is “systematic review,” rather than just the statistical techniques. Through meta-analysis, we can increase the statistical power to obtain a more precise estimate of gene expression differentials, and assess the heterogeneity of the overall estimate. Meta-analysis is relatively inexpensive, since it makes comprehensive use of already available data. Indeed, the advantages of meta-analysis of gene expression microarray datasets have not gone unnoticed by researchers in various fields [19-28]. Several meta-analysis techniques have been proposed in the context of microarrays [19,22,29-40]. There is a considerable literature to guide the whole review process, including statistical methods for clinical trials and epidemiological studies [41-43]. As yet, however, no comprehensive framework exists on how to carry out a meta-analysis of microarray datasets. Therefore, in this paper, we disentangle this complex topic and identify seven distinct key issues specific to meta-analysis of microarray datasets, each comprising several steps. The first five issues are related to data acquisition and curation. We discuss the sixth issue, choosing a meta-analysis technique, using the two-class comparison as an example. The seventh issue of analyzing, presenting, and interpreting data is discussed briefly using an illustrative meta-analysis of 25 datasets. We provide a practical checklist, shown in Table 1, that should enable the reader to make informed decisions on how to conduct a meta-analysis, and to understand better the underlying concepts that make this approach so attractive for analysis of microarray data.

Table 1

A Checklist for Conducting Meta-Analysis of Microarray Datasets

Improvements in microarray technology and its increasing use have led to the generation of many highly complex datasets that often try to address similar biological questions. Meta-analysis, a statistical approach that combines results from independent but related studies, is a relatively inexpensive option that has the potential to increase both the statistical power and generalizability of single-study analysis. Meta-analysis of microarray datasets, and genomic data in general, is desirable, and is much enhanced when raw data are available. We identify seven key issues and suggest a stepwise approach in conducting meta-analysis of microarray datasets: (1) Identify suitable microarray studies; (2) Extract the data from studies; (3) Prepare the individual datasets; (4) Annotate the individual datasets; (5) Resolve the many-to-many relationship between probes and genes; (6) Combine the study-specific estimates; (7) Analyze, present, and interpret results. We give practical guidance to assist those conducting or reviewing such a meta-analysis. The approaches presented here can be adapted to other areas of high-throughput biological data analysis.

Issue 1: Identify Suitable Microarray Datasets

The first step in any research project is to clearly define the objectives (Step 1). Meta-analysis could be used to identify genes expressed differentially between two groups [19,22,29,30,32,33,35,37,38,40], to robustify cross-platform classification [34], to identify overlaps between samples from heterologous datasets [30], to identify co-expressed genes, or to reconstruct gene networks [31,36,39]. Having a detailed review protocol can further help to clarify the research objectives and methods and to minimize bias from unplanned data-driven analysis. We suggest developing the review protocol by outlining the solutions to the steps in the checklist shown in Table 1. For example, Step 7 (Check the selected study against inclusion-exclusion criteria) might be expanded in the review protocol as follows: “Two reviewers will check the eligibility of the identified studies, with disagreements resolved by a third reviewer. A log of excluded studies, with reasons for exclusions, will be maintained.” The protocol can be turned into a useful project management tool by incorporating timelines and division of labor. The inclusion-exclusion criteria (Step 2) are eligibility criteria for studies that will help achieve the stated objectives. These criteria could be biological (e.g., specific disease, type of outcome, type of tissues) or technical (e.g., density of array, minimum number of arrays). The retrieved articles must be evaluated as to whether they met the inclusion criteria. Once the inclusion-exclusion criteria have been defined, one needs to perform a comprehensive literature search (Step 3) to identify suitable studies, usually based on appropriate keywords for automated queries. We recommend searching all the major online repositories of abstracts listed in Table 2 to maximize data acquisition. 
Reading the latest review articles and directly contacting researchers in relevant fields (Step 5) may help to identify both work potentially missed by automated search, and ongoing research efforts with possibly unpublished data.

Table 2

Useful Internet Resources to Identify Studies for Meta-Analysis of Microarray Studies

In the case of microarrays, one should also search public microarray data repositories [44-46] recommended by the Minimum Information About a Microarray Experiment (MIAME) requirements [47,48], as well as a few more specialized repositories [49,50], listed in Table 2 (Step 4). Having identified potentially eligible studies from abstracts, one needs to retrieve the articles, where available, and confirm eligibility (Step 7). This process may best be done by at least two people.

Issue 2: Extract Data from Studies

Before we consider how to extract the data, we need to first decide what type of data to extract. This partially depends on the choice of meta-analysis technique (Issue 6), but the underlying principles will be discussed here. Figure 1 shows the four types of data arising from microarray analysis.

Figure 1

The Flow from Data to Information to Biological Knowledge in Gene Expression Microarray Research

The image files are obtained from optical scanning of hybridized samples.


A published gene list (PGL) represents the genes that are declared as differentially expressed in a given study. PGLs are often presented in the main or supplementary text of microarray-based studies and are thus easy to obtain. Unfortunately, such PGLs are of limited use for meta-analysis since they represent only a subset of the genes actually studied, and information from many genes will be completely absent. Furthermore, PGLs depend heavily on the preprocessing algorithm, the analysis method, the significance threshold, and the annotation builds used in the original study, all of which usually differ between studies [51]. Thus individual patient-level data (IPD), which for microarrays represents the measurement for every probe in every hybridization, are far more useful. Ioannidis et al. [52] discuss further the advantages of a meta-analysis using IPD versus PGLs. The gene expression data matrix (GEDM) represents the gene expression summary for every probe and sample and is thus ideally suited as input for meta-analysis. Published GEDMs, however, are unsuitable for meta-analysis because they depend on the choice of the preprocessing algorithms used, which may produce non-combinable results. At present, image files are neither routinely deposited in public microarray repositories nor technologically uniform enough to be used as input for meta-analysis. In order to eliminate bias due to specific algorithms used in the original studies, and to allow consistent handling of all datasets, we recommend obtaining the feature-level extraction output (FLEO) files (Step 8), such as CEL and GPR files, and converting them to GEDMs in a consistent manner (see Issue 3). FLEO files are likely to be available, especially for newer studies, because the widely supported MIAME requirements [48] now ask authors to make the FLEO data available in public microarray repositories.
If the main text and supplementary information do not state the location of the FLEO data, then one should try searching public microarray repositories or the research group's Web page before contacting the authors (Step 9). If multiple publications use overlapping sets of data, one should identify and use the most comprehensive dataset available (Step 10), and combine any datasets that were split for algorithm training and validation purposes.

Issue 3: Prepare Datasets from Different Platforms

FLEO data have to be converted into GEDMs, which can then be used as input for the meta-analysis. The same preprocessing algorithm should be used for multiple studies conducted on the same platform. To combine studies from different platforms, which may have different designs and thus have different options of preprocessing algorithms, it is desirable to try to identify comparable preprocessing algorithms. There are many microarray platforms, but we focus on the most popular: the Affymetrix platform and a set of platforms that could be generically classified as “two-color technology” platforms. Before the preprocessing step, one may wish to first identify and remove any arrays that are of poor quality (Step 11). There are many comprehensive, free, and open-source packages in BioConductor [53] for quality assessment, including arrayMagic [54] for the two-color technology platform, and Simpleaffy [55] and affyPLM [56] for the Affymetrix platform. Next, all good quality arrays should be preprocessed consistently to remove any systematic differences (Step 12). This is an important stage, since preprocessing directly affects the gene expression measurements, and thus all subsequent steps. In practice, researchers are likely to combine datasets from multiple platforms, and there are very few preprocessing algorithms that can be applied universally; one example is the variance stabilizing normalization [57], which accounts for the dependence between variance and mean of the output expression measure. By contrast, it is more common to use different preprocessing algorithms for each platform [58-61]. Unfortunately, there is currently no consensus on which preprocessing algorithm(s) produce comparable expression measurements across different platforms. Third, one may also want to check and correct for any batch effects (Step 13), especially in large studies. Unsupervised visualization [62] can help to identify any grouping caused by experimental factors.
Fourth, one needs to decide whether to use all available probes on the array, or a filtered set of probes (Step 14). In single-study analysis, it is common to filter out probes that have visible defects (e.g., using quality flags), probes with unfavorable probe-set calls (e.g., absent/present calls from the MAS 5.0 preprocessing algorithm), or probes that show little variation (e.g., using a minimum coefficient of variation). However, it is unclear if such filtering is beneficial from a meta-analysis perspective. Fifth, one needs to deal with multiple technical replicates (i.e., multiple measurements from the same biological subject) if relevant (Step 15). These should not be treated as independent observations. One approach is to select one of the replicates at random. Alternatively, one can average the replicates. If we assume that all technical replicates have similar array quality, then a simple average or median can be used. Finally, one could check that the processed expression values from multiple platforms are comparable (Step 16). Microarray platform manufacturers typically include housekeeping genes, which are expected to be transcribed at a constant level, or negative controls, and these may be used for this purpose. Additionally, one may use a visualization technique such as multidimensional scaling [63,64] to inspect for any clustering of arrays by studies.
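As a minimal sketch of the replicate handling in Step 15, the following Python fragment averages technical replicates probe by probe so that each biological subject contributes one expression profile. The function name, the dictionary layout, and the toy values are hypothetical illustrations and are not taken from the original analysis, which was carried out in R.

```python
from collections import defaultdict
from statistics import mean


def average_technical_replicates(expr_by_array, subject_of_array):
    """Collapse technical replicates to one profile per biological subject.

    expr_by_array: dict mapping array id -> list of expression values
                   (all lists share the same probe order).
    subject_of_array: dict mapping array id -> biological subject id.
    Both structures are hypothetical; real data would come from a GEDM.
    """
    grouped = defaultdict(list)
    for array_id, values in expr_by_array.items():
        grouped[subject_of_array[array_id]].append(values)
    # zip(*profiles) walks the replicates probe by probe.
    return {
        subject: [mean(probe_values) for probe_values in zip(*profiles)]
        for subject, profiles in grouped.items()
    }


# Toy example: patientA has two replicate arrays, patientB has one.
arrays = {"a1": [2.0, 4.0], "a2": [4.0, 6.0], "b1": [1.0, 1.0]}
subjects = {"a1": "patientA", "a2": "patientA", "b1": "patientB"}
collapsed = average_technical_replicates(arrays, subjects)
```

Replacing `mean` with `statistics.median` would give the median variant mentioned above; either choice assumes that the replicates are of similar array quality.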

Issue 4: Annotate the Individual Datasets

Microarray probe designers use short, highly specific regions in genes of interest because using the full-length gene sequence can lead to non-specific binding or noise. Different design criteria lead to the creation of multiple probes for the same gene. Therefore, one needs to identify which probes represent a given gene within and across the datasets. One option is to cluster the probes based on the sequence data (Step 17a) using the BLAST algorithm [65], for example, by using the Ensembl browser [66] (Step 18a). It has been shown that sequence-matched datasets can increase cross-platform concordance [67]. Such methods can also accommodate Affymetrix probe-set redefinitions [68], which better addresses the problem of alternative splicing. However, the probe sequence may not be available for all platforms and the clustering of probe sequences could be computer intensive for very large numbers of probes. Alternatively, one can map probe-level identifiers such as I.M.A.G.E. CloneID, Affymetrix ID, or GenBank accession numbers to a gene-level identifier such as UniGene, RefSeq, or Entrez Gene ID. UniGene [69], which is an experimental system for automatically partitioning sequences into non-redundant gene-oriented clusters, is a popular choice to unify the different datasets. For example, UniGene Build #211 (released March 12, 2008) reduces nearly 7 million human sequences to 124,181 clusters. To translate probe-level identifiers to gene-level identifiers, one can use either the annotation packages in BioConductor [53] or Web tools such as SOURCE [70] and RESOURCERER [71] (Step 18b). We suggest using I.M.A.G.E. CloneID [72] or Affymetrix ID first, if available, as they are more sequence-specific (Step 17b). The same mapping build, ideally the most recent, should be used for all datasets to avoid inconsistencies between releases [73,74].

Issue 5: Resolve the Many-to-Many Relationships between Probes and Genes

In this section, we will refer to either the sequence cluster ID or the gene-level identifier (such as UniGene ID or RefSeq ID) used to annotate the datasets, simply as the GeneID. Many probes can map to the same GeneID because of the clustering nature of the UniGene, RefSeq, and BLAST systems involved, or because the microarray chips used contain duplicate spotted probes. On the other hand, a probe may map to more than one GeneID if the probe sequence is not specific enough. Sometimes, a probe has insufficient information to be mapped to any GeneID, and we recommend omitting these from further analysis (Step 19). Inconsistencies between annotation databases or releases and software [73-75] complicate the matter further. The illustrative example of a meta-analysis of 25 datasets presented later in this paper contains 537,686 probes. Of these probes, 47,154 (or 8.7%) could not be mapped to any UniGene ID, while 29,774 (or 6.1%) of the remaining probes mapped to more than one UniGene ID. This “many-to-many” relationship can fragment the available information for meta-analysis. For example, a probe could map to GeneID X in half of the datasets but to both GeneIDs X and Y in the remaining datasets. Software that performs automated meta-analysis on several thousand genes will treat such probes as two separate gene entities, failing to fully combine the information for GeneID X from all studies. A simple approach is to use only the probes with one-to-one mapping for further analysis, but this means losing information, and so is not recommended. In the example above, potentially half of the information for GeneID X (i.e., from probes mapping to both X and Y) will be ignored. Therefore, when relevant, we recommend replacing probes with multiple GeneIDs by a new record for each GeneID (Step 21). This greedy approach of “expanding” the probes with multiple GeneIDs ensures the software uses all possible information. 
On the other hand, how should one deal with multiple probes that map to the same GeneID within a given study? Grützmann et al. [24] treated these as independent observations in the meta-analysis, but we recommend summarizing them (Step 22) into a single representative value per GeneID within a study. Several options are available to summarize information in this situation. First, one could select a probe at random, but this means losing information. Simply averaging the expression profiles before proceeding is not desirable either, as different probe sequences have different binding affinity, giving rise to the problem of different measurement scales. Thus, it is preferable to work with standardized measures such as the p-value or effect size. When working with standardized measures, one could select the most extreme value, since it is least likely to occur by chance. For example, Rhodes et al. [19] used the smallest p-value of the probes that corresponded to each GeneID. A more sophisticated approach, when working with effect size, is to meta-analyze the probes. Recently, the MicroArray Quality Control (MAQC) project [61] described another alternative to resolve the many-to-many mapping. For a probe that mapped to multiple RefSeq IDs, the authors selected the RefSeq ID that was annotated by TaqMan assays and, secondarily, one that was present in the majority of platforms. Next, if many probes mapped to a given RefSeq ID, they chose the one closest to the 3′ end of the gene. After resolving the many-to-many relationship by expanding and summarizing probes, we are left with one summary statistic per GeneID per study. In the next step, we proceed with meta-analyzing the summary statistic for each GeneID in turn across the studies.
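The expand-then-summarize logic of this section can be sketched in a few lines of Python. This is an illustration under stated assumptions: the probe names, the mapping dictionary, and the use of the smallest p-value as the summary (one of the options described above) are hypothetical choices, not the authors' code.

```python
def expand_probes(probe_to_gene_ids, probe_stats):
    """Create one (gene_id, statistic) record per GeneID a probe maps to.

    Probes that map to no GeneID yield no record, which mirrors the
    recommendation to omit them from further analysis.
    """
    records = []
    for probe, stat in probe_stats.items():
        for gene_id in probe_to_gene_ids.get(probe, []):
            records.append((gene_id, stat))
    return records


def summarise_by_min_p(records):
    """Keep the smallest p-value per GeneID within a single study."""
    best = {}
    for gene_id, p_value in records:
        if gene_id not in best or p_value < best[gene_id]:
            best[gene_id] = p_value
    return best


# Hypothetical study: probe2 maps to both X and Y, probe3 maps to nothing.
mapping = {"probe1": ["X"], "probe2": ["X", "Y"], "probe3": []}
p_values = {"probe1": 0.04, "probe2": 0.01, "probe3": 0.20}
per_gene = summarise_by_min_p(expand_probes(mapping, p_values))
```

Here the information from `probe2` is retained for both GeneIDs rather than being discarded, which is the point of the "expanding" step; when working with effect sizes, the summary function could instead pool the probes with a fixed effect model.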

Issue 6: Choosing a Meta-Analysis Technique

The choice of meta-analysis technique depends on the type of response (e.g., binary, continuous, survival) and objective. In this article, we focus on a fundamental application of microarrays: the two-class comparison where the objective is to identify genes expressed differentially between two well-known conditions. There are four generic ways of combining information in such a situation. (For clarity of presentation, we indicate the steps only for the inverse-variance technique.)

Vote counting.

Here, one counts the number of studies in which a gene was declared significant [76]. For very small numbers of studies, the results can be visualized using a Venn diagram [77]. Vote counting in the context of microarrays is perhaps best described by Rhodes et al. [22], who also suggest calculating the null distribution of votes using permutation testing. Alternatively, one could calculate the significance of the overlaps using the normal approximation to binomial as described in Smid et al. [30]. Yang et al. [35] extend both of these techniques into the concept of meta-analysis pattern matches.
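A minimal Python sketch of plain vote counting follows. The gene labels and study lists are hypothetical, and the permutation or binomial significance assessment described above is deliberately left out.

```python
from collections import Counter


def vote_count(significant_genes_per_study):
    """Count, for each gene, the number of studies that declared it significant."""
    votes = Counter()
    for genes in significant_genes_per_study:
        # set() guards against a gene being listed twice in one study.
        votes.update(set(genes))
    return votes


# Three hypothetical published gene lists.
studies = [["X", "Y"], ["X"], ["X", "Z", "Z"]]
votes = vote_count(studies)
top_gene, top_votes = votes.most_common(1)[0]
```

With only a handful of studies, many genes share the same vote total, which illustrates why this technique ranks genes only coarsely.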

Combining ranks.

Unlike vote counting, this technique accounts for the order of genes declared significant. DeConde et al. [37] use three different approaches to aggregate the rankings of, say, the top 100 lists (the 100 most significantly up-regulated or down-regulated genes) from different studies. Two of the algorithms use Markov chains to convert the pair-wise preference between the gene lists to a stationary distribution; the third algorithm is based on an order-statistics model. Zintzaras and Ioannidis [40] proposed METa-analysis of RAnked DISCovery datasets (METRADISC), which is based on the average of the standardized rank and has the advantage of incorporating the between-study heterogeneity (sum of squared deviations from the average). The null distributions for the average rank and heterogeneity are then estimated using non-parametric Monte Carlo permutation testing and matched for pattern of occurrence in studies. Hong et al. [38] proposed the RankProd [78], which calculates the product of the rank of pair-wise differences between every biological sample in one group versus another group across the studies.

Combining p-values.

Rhodes et al. [19] use Fisher's sum of logs method [79], which sums the logarithm of the (one-sided hypothesis testing) p-values across k studies for a given gene. The test statistic, minus twice this sum, can be compared against a chi-square distribution with 2k degrees of freedom.
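Fisher's method for a single gene can be sketched as below. The function name is hypothetical, and the sketch relies on the closed-form upper tail of the chi-square distribution for an even number of degrees of freedom, so it needs only the standard library. It assumes every p-value is strictly greater than zero.

```python
import math


def fisher_combined_p(p_values):
    """Combine one-sided p-values from k studies with Fisher's method.

    Returns the test statistic -2 * sum(log p) and its p-value under a
    chi-square distribution with 2k degrees of freedom.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # For even df = 2k: P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Two studies that each report p = 0.05 for the same gene.
statistic, combined_p = fisher_combined_p([0.05, 0.05])
```

Two moderately small p-values combine into a smaller one (about 0.017 here), which is the sense in which pooling studies increases power.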

Combining effect sizes.

Choi et al. [29] and others [24,32,80] used the inverse-variance technique [81,82] in the context of microarrays. The first step is to calculate the effect size and the variance associated with the effect size for every gene in every study (Step 20). Effect size can be calculated as the Cohen's d [83], which is the difference in two group means standardized by its pooled standard deviation [84]. Hedges and Olkin (1985) showed that this standardized difference overestimates the effect size for studies with small sample sizes. They proposed a small correction factor to calculate the unbiased estimate of the effect size, which is known as the Hedges' adjusted g. The study-specific effect sizes for every gene are then combined across studies into a weighted average (Step 24). As the name suggests, the study weights are inversely proportional to the variance of the study-specific estimates. Additionally, the integrative correlation technique proposed by Parmigiani et al. [33] could be first used to select only the “reproducible” genes for meta-analysis. First, the correlation profile of gene G is calculated as the correlation between gene G and every other gene in a study. Next, the correlation of correlation profiles of gene G in every pair of studies is computed, and if the average exceeds a certain threshold, the gene is called reproducible. Given the various statistical options for meta-analysis, how should one choose the most suitable technique? We present a series of questions that could help a meta-analyst make an informed choice. First, what are the minimum data required for each technique? Fisher's method, the inverse-variance technique, METRADISC, and the RankProd all require IPD, which are less readily available than PGLs. Vote counting, DeConde and colleagues' algorithms, and combining p-values are techniques that in theory could use the PGLs, but may not be able to do so in practice. 
For example, most publications report the significant genes or their rankings based on two-sided p-values, while vote counting and rank aggregation techniques require a one-sided p-value. Using p-values from two-sided testing means ignoring the directionality of the significance and may lead one to select genes that are discordant in direction of gene regulation between the studies. As noted earlier in Issue 2, we strongly prefer to use the IPD to minimize the influence of differing methods across datasets. Second, which set of genes does each technique use? Vote counting and rank aggregation techniques (using PGLs) only consider the genes declared significant in the original studies. Thus, these techniques depend on an arbitrary threshold, and completely ignore genes that fall below this selected threshold. By contrast, the rank aggregation technique (using IPD), Fisher's method, and the inverse-variance technique consider information from all available genes. However, it is also important to note that the ranking of genes in an individual study depends on which other genes are included in the chip, and thus can influence the rank aggregation techniques. Since microarrays are often used as a hypothesis generating tool, we would prefer a technique that captures information from as many genes as possible. The third question, related to the previous question, is how does each technique treat frequently studied and rarely studied genes? Newer microarray chips have more comprehensive sets of genes compared to older chips. Thus some genes will be studied more frequently across the studies than others. For example, Affymetrix version HGU-133 plus 2.0 (released in 2003) contains almost all of the 6,065 UniGene IDs available in Affymetrix version HU-6800 (released in 1998), plus a further 13,624 UniGene IDs. Ideally, we would prefer a technique that treats a frequently studied and a rarely studied gene equally.
Since vote counting and rank aggregation use the genes declared significant in the original studies, they do not account for the frequency of the genes. For example, a gene found significant in four studies and not significant in 16 studies will be favored over a gene found significant in three studies but absent in the other 17 studies. METRADISC accounts for this by matching each gene to the null distribution of genes that have the same absent/present patterns. Although the test statistic for Fisher's method is based on an unstandardized sum, it can address this problem by comparing it to a chi-square distribution where the degrees of freedom are determined by the number of studies, or by permutation. The inverse-variance technique addresses this problem directly as it calculates a weighted average of the effect sizes. Fourth, what is the ability of each technique to rank the genes, especially if only a small number of studies, say three to five, are available? A ranked list can help researchers to prioritize genes for further testing and validation. The vote counting technique produces very coarse results, since a gene can only receive a whole number of votes, while other techniques produce results on a much finer scale. Fifth, what is the computational complexity involved for each technique once the datasets have been prepared and annotated? The computing times for meta-analyzing the prepared and annotated GEDM for the 25 datasets in the illustrative example that follows, using vote counting, Fisher's method, and the inverse-variance technique, are approximately two minutes, two minutes, and eight minutes, respectively. We used R version 2.5.1 [85] on a Windows-based personal computer with a 1.86 GHz Intel Pentium M processor and 1 GB of RAM. Further, any technique that uses PGLs has to extract the information and annotation in a standardized format. The question of computational complexity becomes important, especially when one wants to estimate the null distribution using permutation techniques.
We believe that combining the effect sizes using an inverse-variance model is the most comprehensive approach for meta-analysis of two-class gene expression microarrays. In addition to the characteristics discussed above, this method has several other decisive advantages. First, it yields a biologically interpretable discrimination measure—the pooled effect size of differential expression and its standard error. Second, it is the only technique that weights the contribution of each study by its precision, which is related to the study sample size. Third, one is able to use a forest plot [86] to visually investigate the contributions of individual studies and the amount of heterogeneity across datasets. The use of effect size, a unitless measure not dependent on sample size, facilitates the combining of signals from one-color and expression ratios from two-color technology platforms.
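The inverse-variance technique described in this section can be sketched for a single gene as follows. The article does not spell out the formulas, so the small-sample correction factor and the variance approximation used here are the commonly quoted textbook forms and should be read as assumptions; the function names and toy data are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(group1, group2):
    """Hedges' adjusted g and its approximate variance for one gene in one study."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    )
    d = (mean(group1) - mean(group2)) / pooled_sd  # Cohen's d
    # Assumed small-sample correction: J = 1 - 3 / (4 * (n1 + n2) - 9).
    correction = 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)
    # Assumed large-sample approximation to the variance of d.
    var_d = (n1 + n2) / (n1 * n2) + d * d / (2.0 * (n1 + n2))
    return correction * d, correction ** 2 * var_d


def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted average and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


# Toy expression values for one gene: tumor versus normal in one study.
g, var_g = hedges_g([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
# Pooling two hypothetical study-specific estimates for the same gene.
pooled, pooled_se = fixed_effect_pool([0.0, 1.0], [1.0, 0.25])
```

In the pooling example the second study has a quarter of the variance of the first and therefore four times the weight, so the pooled estimate (0.8) sits much closer to it.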

Illustrative Example: Differential Gene Expression in Cancer Tissues

We demonstrate one exemplary meta-analysis using a subset of an ongoing meta-analysis where we look at the differences between cancerous tissues relative to normal tissues across various cancer types. This example stops short of discussing the biological significance of the findings, which is beyond the scope of this article. We concisely describe the meta-analysis protocol in Table 3, using the same ordering as in Table 1. Figure 2 shows the data acquisition process, and Table 4 lists the characteristics of the 21 studies included [87-107]. Arrays from the Affymetrix-based studies were preprocessed using the robust multichip average [108], and arrays from two-color technology were LOESS (local regression) normalized [109,110]. All analysis (unless stated otherwise) was carried out in R version 2.5.1 [85] and BioConductor release 2.0 [53]. The R code is available upon request.

Table 3

Outline of the Illustrative Example of Meta-Analysis

Figure 2

Data Acquisition to Summarize Steps 3–10 in Table 3

In total, 21 studies (6 + 3 + 8 + 4) are included in the meta-analysis. The characteristics of the included studies are given in Table 4.

Table 4

Datasets Used in the Illustrative Meta-Analysis


We chose to combine the effect sizes using the inverse-variance model for the reasons described previously. Note that there are two variants of the inverse-variance technique. The random effects model differs from the fixed effect model in that it incorporates the between-study heterogeneity into the study weights. We use the random effects model in Step 24, where we can expect significant between-study heterogeneity since the studies combined are both biologically (e.g., different tumors) and technically (e.g., different platforms, laboratories) diverse. We used the fixed effect model in Step 22 to summarize probes within a study, as we can expect a reasonable level of homogeneity within a study. The pooled effect size and its 95% confidence interval for all 16,803 genes can be visualized simultaneously as in Figure 3.
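To make the difference between the two variants concrete, the sketch below pools the study-specific effect sizes of one gene under a random effects model. The article does not name the estimator of between-study variance, so the DerSimonian-Laird moment estimator used here is an assumption, as are the function name and the toy numbers.

```python
import math


def random_effects_pool(effects, variances):
    """Random effects inverse-variance pooling for one gene.

    Assumes the DerSimonian-Laird moment estimator of the between-study
    variance tau^2; returns (pooled effect, standard error, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # tau^2 is added to every study variance before weighting.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Two hypothetical studies that disagree: heterogeneity widens the interval.
pooled, se, tau2 = random_effects_pool([0.0, 2.0], [1.0, 1.0])
ci_95 = (pooled - 1.96 * se, pooled + 1.96 * se)
```

When the estimated between-study variance is zero, the weights reduce to those of the fixed effect model; otherwise the standard error grows, which reflects the extra uncertainty from combining biologically and technically diverse studies.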

Figure 3

A Summary Plot of the Pooled Effect Size (Black Dots) and Its 95% Confidence Interval (Gray Bars) Sorted by the FDR

The GenBank identifier (if available) for the top five most statistically significant up-regulated and down-regulated genes is shown.


The GenBank identifier (if available) for the top five most statistically significant up-regulated and down-regulated genes is shown. The z-statistic (ratio of the pooled effect size to its standard error) for every UniGene ID was compared to a standard normal distribution to obtain the p-value and adjusted for false discovery rate (FDR) [111] (Step 25). Table 5 shows the output from the inverse-variance technique for the top five statistically significant up-regulated and down-regulated genes.
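As a rough illustration of Step 25, the sketch below converts z-statistics into two-sided p-values under the standard normal distribution and then adjusts them for the FDR. It assumes the Benjamini-Hochberg step-up procedure is the adjustment meant by [111]; the function names and the example z-statistics are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value of a z-statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical z-statistics (pooled effect size / standard error) for five genes.
z_stats = [4.1, -3.6, 0.4, 2.2, -0.9]
p_values = [two_sided_p(z) for z in z_stats]
fdr = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(fdr) if q <= 0.01]
```

The sign of each z-statistic can then be used to split the significant genes into up-regulated and down-regulated lists.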

Table 5

The Output from the Inverse-Variance Technique for the Top Five Statistically Significant Up-Regulated and Down-Regulated Genes

At an FDR of 1%, we found 168 significantly down-regulated and 325 significantly up-regulated genes. At this rate, we should expect 1% of each list of significant genes, in this case 1.68 and 3.25 genes respectively, to be false positives. After having identified the genes of most interest, we can proceed as in a traditional meta-analysis and visualize the contribution of individual studies using forest plots (Step 27). Figure 4 shows the forest plot for the most significantly up-regulated (Hs.478481) and down-regulated (Hs.117835) genes.

Figure 4

Forest Plot of the Most Statistically Significant Up-Regulated and Down-Regulated Genes Identified from the Meta-Analysis

We can also proceed as in a typical single-study analysis. For example, using significant genes identified from the meta-analysis, we can use computational tools such as pathway enrichment (Step 28), conduct a literature search, and/or validate them on an alternative technology or on different patient sets (Step 29). In this illustrative example of a meta-analysis, we have shown how the inverse-variance technique can identify consistently up- or down-regulated genes, information that suggests further lines of investigation.

Discussion

Meta-analysis of microarray datasets shares many features with meta-analysis in other areas of health care research. Perhaps the main differences are the large numbers of variables involved and technical complexities of integrating data across multiple platforms. Furthermore, most microarray studies are not prospectively planned and often do not have detailed protocols, but rather tend to make use of existing samples. Table 6 gives an overview of the advantages and disadvantages of various aspects of meta-analysis of microarray datasets. We discuss some of these points below.

Table 6

Advantages and Disadvantages of Various Aspects of Meta-Analysis of Microarray Datasets

Working with FLEO files allows for better standardization of information and the incorporation of data from unpublished studies, but it also requires significant effort to acquire and manage the datasets due to increased data complexity. This is further hampered by data sharing issues ([112-115] and Ramasamy et al., unpublished data).

Sample matching between “cases” and “controls” may be a problem in meta-analysis as much as in single studies. Leaving aside the choice of biological equivalency of cases and controls, the numerical problem is highlighted by the imbalance of samples between the two groups in the illustrative example (see Table 4). For example, while the proportion of normal to total biological samples in prostate and lung cancer (the two tissues with the greatest number of biological samples in the illustrative example) is far less than half, the proportions do vary (105 out of 452 or 23.2% in prostate cancer versus 60 out of 356 or 16.9% in lung cancer).

Another major concern associated with meta-analysis in many clinical and epidemiological studies is the problem of publication bias, which is a consequence of selectively publishing statistically significant and favorable results [116,117]. On the surface, we do not expect to find a publication bias at a gene level in a given study because of the discovery-driven and high-density nature of microarrays. However, anecdotal evidence based on sales figures (J. P. Ioannidis, personal communication) suggests that data from only 10% of all the Affymetrix chips sold are published. The possibility of publication bias in microarray research needs further investigation.

Furthermore, within a single-study microarray analysis, the particular choice of down-stream analysis may lead to different results depending on the objective of the study [118,119]. It is unclear to what extent this problem affects meta-analysis of microarrays, even with coherently preprocessed datasets.
Finally, the sensitivity of the results from meta-analysis, as with any other research study, should be tested before a final conclusion is reached (Step 26). We did not present any sensitivity analysis for the illustrative example presented here, but there are several possibilities. First, we could investigate the sensitivity of the results to the choices we made here (e.g., using probes present in at least five studies). Second, we could test whether any particular study is especially influential by repeating the meta-analysis without each study in turn and comparing the change. Finally, we could test whether including studies that provide only the GEDM alongside the studies that provide FLEO data changes the results.

In this paper, we have formulated and explored key issues encountered in conducting a meta-analysis of microarray datasets. We considered the available solutions and made some practical recommendations. First, we showed how to obtain suitable datasets by searching the published literature and public microarray repositories. Second, we proposed that using FLEO files allows for better standardization of information. Third, we outlined the issues involved in preparing datasets from multiple platforms. Fourth, we discussed how to match the different datasets using gene-level identifiers. Fifth, we explained how to resolve the problems caused by the many-to-many relationship between the probes and genes by “expanding” probes with multiple GeneIDs and then “summarizing” the multiple probes that correspond to a GeneID within a study. Sixth, we argued that the inverse-variance technique, initially proposed in the microarray context by Choi et al. [29], has many desirable properties over other techniques used for two-class comparison of gene expression microarray studies. Finally, we presented an illustrative meta-analysis of 21 datasets to briefly demonstrate how to present, analyze, and interpret a meta-analysis of microarray datasets. All of this information is captured in a practical checklist, shown in Table 1.
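The leave-one-study-out check mentioned in the Discussion (Step 26) was not carried out for the illustrative example, but a minimal sketch of how it might look for a single gene is given here. The function names and input values are hypothetical, and the DerSimonian-Laird random effects estimator is an assumption made for illustration.

```python
import math

def pooled_z(effects, variances):
    """Random effects pooled z-statistic for one gene.

    Uses the DerSimonian-Laird estimate of the between-study variance
    (an assumed estimator). Both arguments are lists, one entry per study.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    tau2 = 0.0
    if k > 1:
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)
    w = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # z = pooled / SE, with SE = 1 / sqrt(sum of weights).
    return pooled * math.sqrt(sum(w))

def leave_one_out(effects, variances):
    """Recompute the pooled z-statistic with each study removed in turn.

    Returns the full-data z-statistic and a list of
    (omitted study index, z without that study, change from full z).
    """
    full = pooled_z(effects, variances)
    results = []
    for i in range(len(effects)):
        z = pooled_z(effects[:i] + effects[i + 1:],
                     variances[:i] + variances[i + 1:])
        results.append((i, z, z - full))
    return full, results

# Hypothetical gene where the fourth study disagrees with the other three.
full_z, loo = leave_one_out([1.0, 1.1, 0.9, -2.0], [0.05, 0.05, 0.05, 0.05])
most_influential = max(loo, key=lambda r: abs(r[2]))[0]
```

A study whose removal changes the z-statistic far more than any other is a candidate for closer inspection; in a full analysis the same loop would be run for every gene and the resulting gene lists compared.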

Glossary

Feature-level extraction output file (FLEO): A file representing the quantification of optical image scans of a microarray chip. Every row in this file gives the pixel-level summaries of foreground and background signals for a probe as well as any quality measure. Examples of FLEO files generated include those with .CEL and .GPR file extensions.

Gene expression data matrix (GEDM): A file that contains the summary gene expression from all the FLEO files in a given study. The format is typically a matrix where every row represents a probe and every column represents a hybridization.

Individual patient-level data (IPD): In microarray studies, a dataset that provides the gene expression summary for every hybridized sample.

Minimum Information About a Microarray Experiment (MIAME): Data-reporting requirements that have been widely adopted by many journals.

Preprocessing algorithm: An important step in microarray analysis that tries to minimize systematic variation. It typically consists of background noise correction within an array, normalization between arrays, and a probe-set summary.

Probe: The DNA sequence spotted on the microarray surface to represent a gene. For a given gene, many probes can be designed. A probe can ambiguously map to more than one gene if its sequence is not specific enough.

Published gene list (PGL): A published list of genes that are declared differently expressed in a given study. It depends on the preprocessing algorithm, analysis method, chosen significance threshold, and annotation build used.

Sample: Biological material from a research participant or subject (e.g., a patient or animal) that can be hybridized onto a microarray chip.
