Genome-wide expression profiling of schizophrenia using a large combined cohort.

Abstract

Numerous studies have examined gene expression profiles in post-mortem human brain samples from individuals with schizophrenia compared with healthy controls, to gain insight into the molecular mechanisms of the disease. Although some findings have been replicated across studies, there is a general lack of consensus on which genes or pathways are affected. It has been unclear whether these differences are due to the underlying cohorts or to methodological considerations. Here, we present the most comprehensive analysis to date of expression patterns in the prefrontal cortex of individuals with schizophrenia compared with unaffected controls. Using data from seven independent studies, we assembled a data set of 153 affected and 153 control individuals. Remarkably, we identified expression differences in the brains of schizophrenics that are validated by up to seven laboratories using independent cohorts. Our combined analysis revealed a signature of 39 probes that are upregulated in schizophrenia and 86 that are downregulated. Some of these genes were previously identified in studies that were not included in our analysis, while others are novel to our analysis. In particular, we observe gene expression changes associated with various aspects of neuronal communication and alterations of processes affected as a consequence of changes in synaptic functioning. A gene network analysis predicted previously unidentified functional relationships among the signature genes. Our results provide evidence for a common underlying expression signature in this heterogeneous disorder.

Year: 2012 PMID: 22212594 PMCID: PMC3323740 DOI: 10.1038/mp.2011.172

Source DB: PubMed Journal: Mol Psychiatry ISSN: 1359-4184 Impact factor: 15.992

Introduction

Schizophrenia is a severe psychotic disorder that affects approximately one percent of the population worldwide (1). Many groups have attempted to identify changes in gene expression in the brains of schizophrenics, often focusing on the prefrontal cortex (2–4). Such studies have suggested several altered molecular processes including (but not limited to) synaptic machinery and mitochondrial-related transcripts (5–8), immune function (9) and a reduction in oligodendrocyte and myelination-related genes (10–12). The variety and scope of these processes, found in different subject cohorts, raises the question as to whether there are underlying commonalities in molecular signatures among schizophrenics. Such commonalities are presupposed by most genetic studies, which look for alleles overrepresented in large numbers of schizophrenic individuals (13–15). It is important to establish if there are any common features of the disease at the molecular level. The diversity of results in transcriptome studies can be attributed to many sources. Besides differences in the sampled cohorts and disease heterogeneity, discrepancies between transcriptome studies can be due to methodological differences in sample preparation, choice of platform, and data analysis. There are issues that are especially pertinent to the analysis of post-mortem human brain tissue. One is the confounding effect of factors such as age, gender and medication. Such factors are often associated with relatively large gene expression changes (16), while psychiatric illnesses such as schizophrenia are associated with small effect sizes. If these factors are not correctly controlled for, they can mask or masquerade as expression patterns associated with the disease. Standard practice involves minimizing the effects of such factors either in the experimental design by sample matching or treating these factors as covariates in regression models. 
It is also increasingly appreciated that technical artifacts such as ‘batch effects’ can result in substantial variability (17–20). In addition, post-mortem brain tissue is a limited resource, leading to small sample sizes with low statistical power. For this reason, most studies have not applied multiple test correction, and perform validation only on the same RNA samples that were used for profiling. All of these issues are likely to contribute to the differences in findings across studies. We propose that a good way to address these problems is to re-analyze and meta-analyze the studies in question, a task we undertake in this paper. Meta-analysis is increasingly used in neuropsychiatry to combine high-throughput genomics studies (14, 17, 21–23). Combining datasets across studies increases power and facilitates the identification of gene expression changes that are consistent and reliable, reducing false positives. In a meta-analysis, multiple studies are statistically pooled to provide an overall estimate of the significance of an effect, highlighting important yet subtle variations. While meta-analysis has been used in the study of gene expression data (24–26), to our knowledge only a few studies have done so with post-mortem human brain data (17, 22, 27, 28). A cross-study analysis of psychosis was conducted across seven datasets using samples from the Stanley Medical Research Institute (SMRI) post-mortem brain collections (22). Additionally, the SMRI reports results from a cross-study analysis across schizophrenia datasets in its online genomics database (http://stanleygenomics.org), computing ‘consensus’ fold changes while adjusting for confounding variables. However, the studies used in these analyses use samples from the same two brain collections and are therefore not entirely independent.
More recently, a comparative analysis was conducted across two independent schizophrenia cohorts; probes were identified as differentially expressed within each study and the intersecting probes between the two studies were reported (29). Thus, while there have been attempts to meta-analyze schizophrenia expression profiling data, there has not yet been an integration using the primary data of more than two independent microarray studies. In the current study we present a cross-study analysis of seven microarray datasets comprising a total of 153 schizophrenia samples and 153 normal controls. We applied a linear modeling approach to control for factors such as age, brain pH and batch effects, and applied multiple testing corrections to control the false discovery rate. We show that we are able to detect small yet consistent and statistically significant changes. Careful control of extraneous factors using probe-specific statistical modeling isolates gene expression changes associated with the disease effect. Our results confirm some previously reported expression changes in schizophrenia in addition to identifying potential new targets suggesting alterations in synaptic function.

Materials and Methods

Data pre-processing and quality control

Genome-wide expression data sets were selected on the basis of microarray platform, use of prefrontal cortex (BA 9, 10 or 46), the availability of information on covariates such as age, and finally the availability of the raw data. Each dataset comprises a cohort of neuropathologically normal subjects and a cohort of schizophrenia subjects, as diagnosed and reported in their respective studies (Table 1). Sources for data include the Stanley Medical Research Institute (SMRI), the Harvard Brain Bank, and the Gene Expression Omnibus (GEO). GEO studies were identified by extensive manual and keyword searches. While the SMRI has additional data sets, these represent repeated runs of the samples from the same subjects, so we selected one dataset to represent each of the two SMRI brain collections. Two additional studies were obtained from the authors (30, 31). Datasets consisted of single-channel intensity data generated from two Affymetrix platforms, but only probe sets on the HG-U133A chip from each dataset were used for analysis. Probe sets were re-annotated at the sequence level by alignment to the hg19 genome assembly, using methods essentially as described in (20), and also cross-referenced with problematic probe lists provided by http://masker.nci.nih.gov/ev/. The raw data (“CEL”) files from all the datasets were pooled together and expression levels were summarized, log transformed and normalized using the R Bioconductor ‘affy’ package (R Development Core Team, 2005) with default settings for the RMA algorithm. Data were also processed using four other pre-processing methods to evaluate the robustness of our meta-signatures (see Supplementary Text). We decided to retain standard RMA as the method on which to centre our analysis, because RMA has been shown independently to be a high performer on gold standard data sets (32–34).
Sample outliers were then identified and removed from each dataset based on inter-sample correlation analysis (see Supplement), resulting in the removal of 13 samples (2 of these are the same outliers identified in a previous analysis of SMRI data; http://stanleygenomics.org). Batch information was obtained using the ‘scan date’ stored in the CEL files; chips run on different days were considered different batches. The final data matrix consisted of expression values for 22,215 probe sets and 306 samples. Sample characteristics for the subjects were collected and are summarized in Table 2.

Table 1

Schizophrenia Datasets

Dataset	Reference	Microarray Platform	Brain region(s)	No. of Subjects (CTL : SZ)
Stanley Bahn	SMRI database	HG-U133A	Frontal BA46	31 : 34
Stanley AltarC	SMRI database	HG-U133A	Frontal BA46/10	11 : 9
Mclean	HBTRC	HG-U133A	Prefrontal cortex (BA9)	26 : 19
Mirnics	Garbett K. et al, 2008 (30)	HG-U133A/B	Prefrontal cortex (BA46)	6 : 9
Haroutunian	Katsel P. et al, 2005 (31)	HG-U133A/B	Frontal (BA10/46)	29 : 31
GSE17612	Maycox P. et al, 2009 (29)	HG-U133 Plus 2.0	Anterior prefrontal cortex (BA10)	21 : 26
GSE21138	Narayan S. et al, 2008 (52)	HG-U133 Plus 2.0	Frontal (BA46)	29 : 25

SMRI, Stanley Medical Research Institute; HBTRC, Harvard Brain Tissue Resource Centre (Mclean66 collection)

Table 2

Summary of demographic variables across combined cohort

	Control	Schizophrenia	P-value
Number of Subjects	153	153
Age	56.25 ± 20	55.27 ± 19	p = 0.67
Sex	101M : 52F	113M : 40F	p = 0.1
Brain pH	6.5 ± 0.28	6.39 ± 0.29	p = 0.001
PMI	21.95 ± 15.3	22.65 ± 15.2	p = 0.69

F, female; M, male; PMI, post-mortem interval. There were 319 samples collected across seven datasets, of which 306 passed quality control analysis. The summary demographics (mean ± standard deviation) and t-test p-values for group differences are shown for those subjects used in the analysis. For sex we report the p-value generated from a chi-squared test for equality of proportions.
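As a rough check on the sex comparison in Table 2, the chi-squared statistic for a 2×2 table can be computed directly from the reported counts. The sketch below is illustrative only: the function name is hypothetical, and it assumes a Pearson test without continuity correction, which the text does not specify.

```python
import math


def chi2_two_by_two(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction) for a 2x2 table.

    Rows are groups, columns are categories:
        [[a, b],
         [c, d]]
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts from Table 2: controls 101 M / 52 F, schizophrenia 113 M / 40 F.
stat, p_value = chi2_two_by_two(101, 52, 113, 40)
print(f"chi2 = {stat:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p-value comes out slightly above 0.1, in line with the rounded value in the table; applying a continuity correction would give a somewhat larger value.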

Statistical Modeling

Gene expression values for each probe set were modeled using a standard fixed effects linear model (FEM) framework. We treated Disease, Age, Brain pH, Batch date and Study as fixed effects for which unknown constants are to be estimated from the data. We also employed a model selection procedure, in which each probe set was modeled using the full model including all five factors, as well as various sub-models (details in Supplemental Methods; our approach is similar to that used in (35)). For each probe set, the t-statistic for the disease effect was then extracted from the selected model and p-values were computed using one-sided tests, performed independently for the two alternative null hypotheses (i.e. gene expression does not increase with schizophrenia and gene expression does not decrease with schizophrenia). The resulting p-values for the up- and down-regulated signatures were further adjusted for multiple testing using the q-value method (36) to control the false discovery rate (FDR).
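The per-probe modeling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, the toy design holds only an intercept, a disease indicator and age (batch and study would enter as additional indicator columns), model selection is omitted, and the t-statistic is not converted to a one-sided p-value because that requires a t-distribution function outside the Python standard library. The adjustment shown is the Benjamini-Hochberg procedure, which is a conservative stand-in for the q-value method of (36): it corresponds to fixing the estimated proportion of true nulls at 1.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def disease_effect(expression, design, disease_col):
    """Ordinary least squares fit; returns (coefficient, t-statistic) for one column."""
    n, k = len(design), len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, expression)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [y - sum(b * x for b, x in zip(beta, row))
                 for row, y in zip(design, expression)]
    sigma2 = sum(r * r for r in residuals) / (n - k)  # residual variance
    std_err = math.sqrt(sigma2 * xtx_inv[disease_col][disease_col])
    return beta[disease_col], beta[disease_col] / std_err


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy data for one probe set: log2 expression drops by 0.2 in the disease group.
ages = [30, 40, 50, 60, 30, 40, 50, 60]
disease = [0, 0, 0, 0, 1, 1, 1, 1]
noise = [0.05, -0.05, -0.05, 0.05, 0.05, -0.05, -0.05, 0.05]
design = [[1.0, d, a] for d, a in zip(disease, ages)]
expression = [8.0 - 0.2 * d + 0.01 * a + e for d, a, e in zip(disease, ages, noise)]
coef, t_stat = disease_effect(expression, design, disease_col=1)
print(coef, t_stat)
```

A negative t-statistic here would feed the one-sided test for down-regulation; the resulting p-values across all probe sets would then be passed to the adjustment step.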

Literature-derived signatures

Our signatures were compared to probe lists obtained from the original publication for each of the datasets used in our analysis. As the two SMRI datasets were unpublished, gene lists were compiled from the SMRI online genomics database. For the Mclean dataset we used the list of ‘significant probes’ as reported in (29). For the Haroutunian data we chose to use probes selected at the ‘low stringency criteria’ described in (31). Details on each of these gene sets can be found in Table 4 (probes were excluded if they were not on the HG-U133A chip). Additional signatures for comparison were obtained for published schizophrenia expression profiling studies, and a list of the top 45 candidate schizophrenia genes reported in the SZGene database (13). Agreement of the meta-signature ranking with each validation gene set was assessed using receiver operating characteristic (ROC) curve analysis described in greater detail in Supplementary Methods.
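The ROC comparison of a ranked meta-signature against a reference gene set can be sketched as below. The details of the authors' procedure are in their Supplementary Methods; this is only a generic illustration, with a hypothetical function name and made-up probe identifiers. It assumes a strict ranking without ties and computes the AUC as the fraction of (member, non-member) pairs in which the member is ranked higher.

```python
def ranking_auc(ranked_probes, gene_set):
    """AUC for how well a ranking (best first) places members of gene_set near the top."""
    members = set(gene_set)
    n_pos = sum(1 for probe in ranked_probes if probe in members)
    n_neg = len(ranked_probes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one member and one non-member in the ranking")
    negatives_seen = 0
    pairs_won = 0  # (member, non-member) pairs where the member ranks higher
    for probe in reversed(ranked_probes):  # walk from worst-ranked to best-ranked
        if probe in members:
            pairs_won += negatives_seen
        else:
            negatives_seen += 1
    return pairs_won / (n_pos * n_neg)


# Hypothetical ranking of five probes, most significant first.
ranking = ["p1", "p2", "p3", "p4", "p5"]
print(ranking_auc(ranking, {"p1", "p2"}))  # reference set sits at the very top
print(ranking_auc(ranking, {"p1", "p4"}))  # reference set is spread out
```

An AUC near 0.5 indicates that the reference set is scattered at random through the ranking, while values approaching 1 indicate that it is concentrated at the top.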

Table 4

Comparison of meta-signatures with findings from original study

Dataset	Significance Criteria	Down-regulated Probes	Down-regulated AUC	Down-regulated Overlap	Up-regulated Probes	Up-regulated AUC	Up-regulated Overlap
Stanley AltarC	p < 0.05	848	0.70	14	34	0.78	0
Stanley Bahn	p < 0.05	69	0.85	6	91	0.89	5
Mclean (29)	p < 0.05, intensity > 30	570	0.75	13	300	0.76	7
Mirnics (30)	p < 0.05, |ALR| > 0.58	7	0.55	0	4	0.94	1
Haroutunian (BA10) (31)	p < 0.05, |FC| > 1.4, present calls > 60%	5	0.5	0	14	0.66	1
Haroutunian (BA46) (31)	p < 0.05, |FC| > 1.4, present calls > 60%	50	0.21	0	11	0.59	0
GSE17612 (29)	p < 0.05, intensity > 30	548	0.74	22	466	0.71	7
GSE21138 (52) (Short DOI)	p < 0.05; |FC| > 1.25	482	0.50	1	173	0.57	1
GSE21138 (52) (Int DOI)	p < 0.05; |FC| > 1.25	132	0.60	0	78	0.69	0
GSE21138 (52) (Long DOI)	p < 0.05; |FC| > 1.25	37	0.63	1	89	0.63	1

DOI, duration of illness; AUC, area under the curve; FC, fold change

The findings from each dataset are summarized including only probes used in our analysis. The ‘Probes’ column indicates the number of probes reported by the study based on study-specific significance criteria. AUC values were computed from an ROC analysis of each gene set against the corresponding ranked meta-signatures. Overlap values report the number of probes in each gene set that overlap with probes from the meta-signatures at q < 0.1.

Functional and network analysis

We analyzed each signature for enrichment of Gene Ontology (GO) terms (37), using the gene score re-sampling (GSR) method in ErmineJ (38, 39). We also evaluated the path-length and node degree (number of associations) properties of the meta-signature genes in a large human protein-protein interaction network (PPIN) obtained by aggregating data from multiple sources (40–45). The network contains 100,623 unique interactions among 11,697 genes. Path lengths in the network were measured using Dijkstra’s algorithm (46). Statistical significance was assessed by reference to an empirical null distribution obtained by randomly sampling 10,000 gene sets of similar size and node degree.
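The path-length test can be illustrated with a small sketch. It is an assumption-laden simplification rather than the authors' implementation: the function names and the toy network are hypothetical, edges carry unit weights, and the random gene sets are matched on size only, whereas the analysis described above also matches on node degree.

```python
import heapq
import random


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm; graph maps node -> {neighbour: edge weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def mean_pairwise_distance(graph, genes):
    """Average shortest-path length over all connected pairs in `genes`."""
    genes = list(genes)
    total, pairs = 0.0, 0
    for i, a in enumerate(genes):
        dist = shortest_path_lengths(graph, a)
        for b in genes[i + 1:]:
            if b in dist:
                total += dist[b]
                pairs += 1
    return total / pairs if pairs else float("inf")


def empirical_p(graph, genes, n_samples=1000, seed=0):
    """Fraction of random same-size gene sets at least as close together as `genes`."""
    rng = random.Random(seed)
    genes = list(genes)
    observed = mean_pairwise_distance(graph, genes)
    nodes = sorted(graph)
    hits = sum(
        mean_pairwise_distance(graph, rng.sample(nodes, len(genes))) <= observed
        for _ in range(n_samples)
    )
    return (hits + 1) / (n_samples + 1)


# Toy network: a chain A - B - C - D - E with unit edge weights.
chain = {
    "A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1, "D": 1},
    "D": {"C": 1, "E": 1}, "E": {"D": 1},
}
print(mean_pairwise_distance(chain, ["A", "B"]))
print(empirical_p(chain, ["A", "B"], n_samples=200))
```

A small empirical p-value would indicate that the genes of interest sit closer together in the network than randomly drawn sets of the same size.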

Results

Schizophrenia and control groups had no significant differences in age and PMI, and the numbers of males and females in the two groups were fairly well matched (Table 2). Brain pH was significantly different (t-test; p = 0.001). P-value distributions for each demographic variable were also assessed to help determine the selection of factors used as fixed effects for our model. We found it was necessary to correct for “batch effects” (technical artifacts caused by running chips on different days or even years (20)), as they contributed the vast majority of variance in gene expression (Supplementary Figure 1). Each factor was considered in a model selection procedure (see Methods and Supplementary Methods), and a final set of linear models was used to identify probe sets that were differentially expressed between schizophrenic and control samples. After multiple test correction we identified a meta-signature of 39 up-regulated and 86 down-regulated probes at an FDR of 0.1 (Supplementary Table 2, Table 3). If we assess the number of unique genes that appear in each signature we obtain a list of 25 up-regulated and 70 down-regulated genes. These numbers highlight several cases of a gene that appears in our signature more than once, suggesting higher confidence in the finding of expression changes for those genes. Figure 1 shows the expression levels of the top down-regulated probe we identified. As expected, expression changes were small (~15% expression change), and more evident in some datasets than in others. As required by our modeling procedure, the direction of expression changes is mostly consistent.

Figure 1

Example of consistent expression changes for a gene across data sets

Expression data within each dataset after covariate correction are presented for the top down-regulated gene NECAB3. Plots are labelled with the associated dataset. Samples were separated into disease and control cohorts and expression was plotted as a boxplot. Individual sample values were overlaid, with red squares representing control individuals and blue triangles representing schizophrenics.

To test the robustness of these findings, we used a jackknife procedure, sequentially removing one of the seven studies and performing the meta-analysis on the remaining six, for each study in turn. We expected that results highly influenced by a single data set would not be stable across jackknife runs. Each leave-out iteration resulted in a new meta-signature, which was then ranked by q-value and compared against the final meta-signature (Supplementary Table 3). The range of rank correlations among jackknife iterations (0.87 – 0.99) illustrates the robustness of our meta-signatures, demonstrating that our results are not highly biased by any single dataset. The lowest correlations were observed upon removal of the Bahn and GSE21138 datasets (0.88 and 0.87, respectively) suggesting that these datasets may be contributing a slightly stronger signal, particularly to the up-regulated signature. The lack of significant genes at a q < 0.1 in the signature for those jackknife runs corroborates this finding. Finally, the top 100 probes were taken from each jackknife signature and an intersection set was retained to form a ‘core signature’ of 16 down-regulated and 14 up-regulated probes (Table 3). We consider these probes to be the most reliable findings from our study as they are relatively insensitive to the choice of data sets used. In Figure 2, we have assembled the ‘core signatures’ and plotted expression levels within each dataset with samples separated into control and schizophrenia groups. For some studies we observe a more obvious gradient between the two groups illustrating expression change, and for others the difference is more subtle.
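The leave-one-study-out logic and the construction of a core signature can be sketched as follows. This is a schematic only: `jackknife_core` and `rank_by_summed_score` are hypothetical names, the ranking function is a trivial stand-in for the full meta-analysis, and the scores are made up.

```python
def jackknife_core(studies, rank_probes, top_k=100):
    """Leave one study out at a time and keep probes in the top_k of every run.

    studies     : dict mapping study name -> that study's data (any type)
    rank_probes : callable taking a dict of studies and returning probes ordered
                  from most to least significant (stand-in for the meta-analysis)
    """
    core = None
    for left_out in studies:
        subset = {name: data for name, data in studies.items() if name != left_out}
        top = set(rank_probes(subset)[:top_k])
        core = top if core is None else core & top
    return core if core is not None else set()


def rank_by_summed_score(subset):
    """Toy ranking: order probes by the sum of their per-study scores (lower is better)."""
    probes = set().union(*(scores.keys() for scores in subset.values()))
    return sorted(probes, key=lambda p: sum(s.get(p, 1.0) for s in subset.values()))


# Made-up per-study scores for three probes.
toy_studies = {
    "s1": {"A": 0.01, "B": 0.20, "C": 0.50},
    "s2": {"A": 0.02, "B": 0.30, "C": 0.40},
    "s3": {"A": 0.03, "B": 0.90, "C": 0.05},
}
print(jackknife_core(toy_studies, rank_by_summed_score, top_k=2))
```

In the toy data only probe A stays in the top two regardless of which study is removed, mirroring how the core signature retains probes that are insensitive to the choice of data sets.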

Table 3

Core signatures retained after jackknife validation

A: Up-regulated in schizophrenia
Probe	Gene Symbol	Gene Description	Fold Change	Q-value
210057_at	SMG1 [g]	SMG1 homolog, phosphatidylinositol 3-kinase-related kinase	1.04	0.009
202619_s_at	PLOD2 ▲	procollagen-lysine, 2-oxoglutarate 5-dioxygenase 2	1.12	0.05
203548_s_at	LPL ▲	lipoprotein lipase	1.11	0.009
*202975_s_at	RHOBTB3 [b]	Rho-related BTB domain containing 3	1.16	0.063
*216048_s_at	RHOBTB3 [b]	Rho-related BTB domain containing 3	1.11	0.02
*213015_at	BBX [b]	bobby sox homolog (Drosophila)	1.05	0.082
213016_at	BBX [b]	bobby sox homolog (Drosophila)	1.11	0.086
219426_at	EIF2C3 [b]
213187_x_at	FTL	ferritin, light polypeptide	1.11	0.022
207543_s_at	P4HA1 [g]	prolyl 4-hydroxylase, alpha polypeptide I	1.12	0.074
218345_at	TMEM176A	transmembrane protein 176A	1.10	0.071
221503_s_at	KPNA3	karyopherin alpha 3 (importin alpha 4)	1.02	0.072
209069_s_at	LOC440093 [b, m], H3F3B	multiple gene mappings	1.11	0.020
209747_at	BCYRN1 [b, g], TGFB3	multiple gene mappings	1.11	0.072

* insensitive to pre-processing method;
▲ identified in previous expression profiling study;
[a] Altar study finding;
[b] Bahn study finding;
[g] GSE17612 study finding;
[m] Mclean study finding.

Q-value is an FDR adjusted p-value, see (36)

Figure 2

Expression changes in the ‘core signatures’

For each probe in the core signatures (meaning it is retained as significant even after the removal of any single study), the corresponding data from each study were extracted and converted to a heat map. Expression values were normalized across all samples within each dataset, and as in Figure 1 the data are corrected for covariates such as batch and age. Rows represent probes, each labeled with its unique gene mapping if one exists. Columns represent samples. Grey bars represent the control brain samples, and the black bar represents the schizophrenia samples. Light values in the heat map indicate higher expression values.

To assess the sensitivity of our results to the choice of pre-processing algorithm we re-analyzed our data with four different methods (see Supplemental Text). We obtained good agreement between the results of each method and our final meta-signatures despite dramatic changes to the preprocessing procedure (Supplementary Table 4). Additionally, we took the intersection of significant probes from each of the different methods to assemble a list of probes that are completely insensitive to the choice of preprocessing method. This list comprises a total of 5 up-regulated and 8 down-regulated probes, highlighting novel genes and genes that have been previously implicated in independent studies (Table 3; Supplementary Table 2). The set of differentially expressed genes identified from our analysis implicates a variety of genes and functional groups, many of which have been previously reported in the literature. For example, down-regulation of mu-crystallin (CRYM), potassium channel subfamily K member 1 (KCNK1), F-box protein 9 (FBXO9) and up-regulation of lipoprotein lipase (LPL) and lysyl hydroxylase 2 (PLOD2) are concordant with findings from previous studies (7, 9, 12, 47). We manually evaluated the significant genes in our list (q < 0.1) individually according to literature reports and Uniprot definitions for each, to characterize genes into high-level functional categories. In the down-regulated signature we found genes to cluster into functional groups pertaining to various molecular mechanisms of neuronal communication. On the pre-synaptic side we find genes involved in cell adhesion (for example, OPCML), and neurotransmitter secretion (for example, APBA2, PCSK2). We also find genes involved in signalling pathways that elicit metabotropic effects (for example, GNAL, OPN3, CRHR, RGS7, GNB5). 
Concordant with previous studies, we also identified various genes involved in oxidative phosphorylation (for example, CYP26B1, COQ4, SLC25A15, ATP5C1, SLC25A12) and ubiquitination (for example, FBXO9, COPS7B, USP19, TACC2, DCAF8). From our up-regulated signature we find a number of transcription-related genes (for example, BAZ1A, CBFA2T2, BBX, ANP32A) and genes involved in translation (for example, EIF3E, EIF2C3, PAIP2B). Other genes include cell organization/maintenance factors (for example, PKP4, PLOD2) and various stress response genes (for example, SMG1). Additionally for both signatures we find a small group of genes with unknown function. We performed a functional analysis to systematically detect enrichment of biological processes, using Gene Ontology (GO) annotations. After multiple test correction, we were unable to identify any significant terms using the over-representation method (ORA), but significant terms found using the threshold-free GSR algorithm (39) corroborate findings from the above manual evaluation. For genes with decreasing expression levels in schizophrenia, the top GO categories included those involved in energy metabolism, and ubiquitination, neurotransmitter transport and various metabolic processes. The schizophrenia up-regulated genes showed enrichment in various immune-related GO categories in addition to terms related to cellular localization (full results from this analysis can be found in Supplementary Table 5). Because the genes we identified were functionally diverse, we hypothesized there might be additional insight gained at the level of gene networks. In particular we asked whether the signature genes had any unusual properties in their protein interaction patterns, compared to carefully selected groups of background genes (see Methods). We specifically looked at within-group connectivity, node degree (the number of connections) and path lengths between genes. 
Our most striking finding is that the genes within our set were significantly closer to one another in the network than expected by chance (p<0.02). This relationship suggests a higher likelihood of functional relationships among the signature genes (41, 48). In contrast, the signature genes did not possess a particularly high node degree within the network (23rd percentile in the whole network), that is, they tend not to be ‘hubs’. We illustrate these properties for the up and down-regulated “core” signature genes in Supplementary Figure 2. We also evaluated each meta-signature against modules of co-expressed genes in the human cortex as reported in(49). Details on this analysis can be found in the Supplemental Methods. Our up- and down-regulated signatures significantly overlap with the “turquoise” and “brown” modules (p < 0.01 and p < 0.05 respectively; Supplementary Table 6). These are modules of interest as they display a notable extent of preservation across datasets in (49), suggesting that differential expression of our signature genes may be disrupting core networks in the human brain. This also reinforces the importance of gene network structure analysis in determining the basis of this disorder. To characterize our schizophrenia signatures with respect to cellular organization in the cortex we cross-referenced our ranked meta-signatures with published lists of CNS cell type markers (50). An ROC analysis of the meta-signatures for astrocytes, oligodendrocytes and neurons revealed no preferential association with our ranked meta-signatures. However, evaluating only the significant probes (q<0.1) in our signatures, we find an enrichment of probes mapping to neuronal markers in the down-regulated signature (Supplementary Table 2). While our linear modeling approach controlled for the effects of age and brain pH, we checked our signatures against gene lists for pH and age from our previous study of normal post-mortem human brain (16). 
The overlap was significant only for our down-regulated signature, which contains 33 genes previously identified to be down-regulated by age. Because our profiles are age-corrected and our cohorts age-matched, this suggests overlap in expression changes in age and schizophrenia rather than a confounding effect. We also sought to address factors that were not accounted for in our linear modeling, such as medication effects and alcohol and drug abuse. Using gene lists provided from the SMRI Online Genomics Database (http://www.stanleygenomics.org), we extracted significant gene lists (p < 0.001; FC > 1.2) pertaining to the effects of lifetime alcohol use (23 genes), lifetime drug use (26 genes), and lifetime antipsychotics (69 genes) in subjects with schizophrenia. A comparison of each of these lists to our meta-signatures identified only two overlapping genes. We found that KCNK1, which is present in our down-regulated signature, also increases with lifetime alcohol use. From the up-regulated signature, the gene LPL appears to increase with lifetime antipsychotic use and decrease with increased drug use. Each meta-signature was evaluated against the top 45 candidate schizophrenia genes reported in the SZGene database (http://www.szgene.org/). Agreement of the meta-signature ranking with the SZGene set was assessed using receiver operating characteristic (ROC) curve analysis. The SZGene list appears to be randomly distributed across our ranking. We also computed a simple overlap between the 45 candidate genes and our results, identifying OPCML as the only common gene. We were interested in comparing our re-analysis of these seven data sets to the “hit lists” provided by the data set providers. We first tested whether our meta-signature gene rankings were enriched for genes reported by the original study, using ROC analysis (Table 4; see Supplementary Methods).
We observed high AUC scores for most gene sets; however the Haroutunian and GSE21138 studies exhibited exceptionally low scores, possibly in part because the original studies have an added dimension of variability as gene sets were generated for stratified cohorts as opposed to a case versus control comparison. While high AUCs suggest some similarity in the results, a more sensitive analysis examines just the very top of the rankings. We therefore computed the overlap of each reference gene set with the meta-signature of genes collected at q<0.1. This reveals a handful of probes in each study that also show up in our significant gene lists (Table 4). We also re-analyzed each individual dataset using our linear modeling approach. This allowed a more fair evaluation of the contribution of each to the final meta-signatures, since the original studies used a variety of methods for gene selection. After correcting for multiple testing, only two of the data sets (Altar and Haroutunian) yielded significant genes at q < 0.1. We therefore considered the top 100 probes from each dataset, and computed overlaps with our meta-signatures (Supplementary Table 7). The overlap is highest with the Bahn and GSE21138 datasets, which is in accord with the finding that these datasets contribute a stronger signal to the meta-signature than the others. Despite being the only two data sets which have significant differential expression after multiple test correction, the Altar and Haroutunian results showed very little overlap with the final meta-signature. We note that considering the 7 data sets independent of our meta-signature, there was no overlap among their top 100 probes. Similarly, there was little correlation of the overall rankings of probes among the data sets (Supplementary Table 8). 
Overall these results suggest that our re-analysis is concordant with the analysis conducted by the original study authors, subject to important differences likely attributable to our analytic approach (for example, correction for batch effects), and only revealing commonalities through meta-analysis which contribute weakly to the findings of the individual studies.

Discussion

In this study we present expression changes associated with schizophrenia consistent across up to seven independent cohorts of subjects. To our knowledge, the degree of validation and confirmation inherent in our analysis is unprecedented. Unlike previous studies, which use PCR assays to check results on the same RNA samples used for microarrays, or which compare at most two cohorts, we identified changes in expression that are shared across independent subject cohorts, analyzed by laboratories distributed around the world. Our study provides a new window into the molecular changes that might underlie schizophrenia. The larger number of down-regulated probes (86 vs. 39) is in agreement with previous reports (2, 8, 9). Many of the genes we have identified have been previously reported to be expressed in the brain, with some genes showing neuronal specificity. Some of the genes we report as differentially expressed have been previously implicated in schizophrenia, either through expression profiling studies of schizophrenia (KCNK1, CRYM, FBXO9), or genetic association studies (OPCML (15)). We also identify three genes in our signature (up-regulated genes WNK1 and ABCA1 and down-regulated gene SNN) that overlap with results from a comparative analysis of two of the studies we used (29). Additionally, we found functional gene groups discussed in previous expression studies of schizophrenia. Many of the same metabolic processes were observed in a study of 71 different metabolic gene groups in schizophrenia (7). Also in agreement, various energy pathway genes were found previously in DLPFC studies of schizophrenia (47, 51). Over-expression of immune responses from our GSR analysis is also concordant with recent findings of over-expression in genes related to immune function in schizophrenia (9, 52, 53). Thus, our results are supportive of at least some previous findings and reveal a previously unrecognized similarity across studies.
Our meta-signatures contain a number of interesting new candidate genes, particularly the down-regulated meta-signature, which potentially reflects alterations in neuronal communication. NOVA1 is a regulator of RNA splicing recently found to inhibit splicing of exon 6 from the dopamine receptor D2 gene, resulting in D2L, the long isoform of the receptor (54). With NOVA1 expression decreased in schizophrenia, this inhibition may be relieved, leading to higher-than-normal levels of the spliced D2S isoform, which is involved in neuron firing and dopamine release. The DLGAP1 gene encodes a protein that interacts with PSD-95 and a complex of other proteins in the postsynaptic density. Decreased expression of this scaffold protein may have consequences for anchoring and organizing receptors and signalling molecules on the postsynaptic side. Moreover, we have identified several genes associated with calcium signalling (CACNB3), calcium binding (SLC25A12, NECAB3) and calcium homeostasis (CCL3, ATP2B2), processes of likely relevance to schizophrenia (55). We have also identified genes associated with the G-protein coupled receptor (GPCR) signalling pathway. One example is GNAL, a gene encoding the alpha subunit of the G-protein Golf, expressed in many regions of the brain. Given the critical roles of G-proteins, it is plausible that GNAL (and other GPCR-related genes) may have a role in the pathophysiology of schizophrenia (56). GNAL expression has not been previously shown to be affected by schizophrenia, but the gene is located in a chromosomal region (18p.2) that has been linked to schizophrenia and bipolar disorder. More specifically, a di-nucleotide repeat in intron 5 of the GNAL gene has been linked to schizophrenia in some families (57). These expression changes concerning synaptic function may reduce neuronal energy demand in the brains of affected patients, thus providing an explanation for the down-regulation of various oxidative phosphorylation and energy metabolism genes that we observe.
We also sought to examine whether our signature genes could be inferred to share some previously unknown function, making use of gene network analysis. One way to do this is by the principle of “guilt by association”, which states that genes with shared function are more likely to interact (58). However, the meta-signature genes have a fairly low number of interaction partners, making “guilt” difficult to ascertain. Another property to examine is path length in the network, where genes that have short paths between them might be more functionally related. In general, low node degrees would imply longer paths among the genes, but this was not the case for our gene set. That is, the signature genes are linked by unusually short paths in the network. Additionally, we found that each of our meta-signatures overlapped significantly with previously identified gene co-expression modules in the human cortex (49). This suggests a relationship among the genes that is not reflected in current annotations, and a network analysis of these schizophrenia genes will need to be investigated in greater detail in future studies.
We found that some of the down-regulated schizophrenia genes overlap with genes that decrease in expression with age. Many of the biological processes affected by age also tend to appear as affected processes in schizophrenia, both in this study and in existing profiling studies (7, 9, 12, 47, 51). These findings suggest that many genes affected by age are also affected by schizophrenia, but they also raise the possibility of confounding effects. To investigate this possibility more thoroughly, one could filter the list of schizophrenia candidate genes by simply removing known age- and pH-affected genes from the final signature (leaving 31 up- and 51 down-regulated probes).
Our results should be interpreted in the context of several caveats. First, our approach is specifically designed to find concordant results across studies, and does not detract from the potential novel findings that might be found in any single data set. We do suggest that genes found to be commonly differentially expressed by multiple studies are of particularly high value in identifying underlying etiological influences in schizophrenia. As is the case for all postmortem brain studies, we also cannot be sure whether the expression changes we have identified are direct effects of the illness or are secondary to an underlying pathology. An additional caveat is that because we were unable to obtain medication or illicit drug use information for all subjects, we were not able to incorporate this information into our analysis. To help address this, we compared our signatures against gene lists derived from a recent review on convergent antipsychotic mechanisms (59) and observed no overlap. In addition to antipsychotics, the use and abuse of recreational drugs and smoking can also confound the study of disease-related gene expression. Due to a lack of sufficient information on these factors, we were unable to strictly control for them in our analysis. However, using gene lists provided by the SMRI Online Genomics Database (http://www.stanleygenomics.org), we were able to make comparisons to address some of these factors and identified two overlapping genes. While the small number of overlapping genes suggests that the genes in our signature are not affected by such extraneous factors, we acknowledge that we cannot entirely exclude the possibility that the gene expression changes we have identified are still in some way influenced by them.
In conclusion, we have contributed the most comprehensive meta-analysis of schizophrenia expression profiling studies to date. Our most striking finding is that despite the heterogeneity of the disorder, we were able to detect a common signature of schizophrenia. Additionally, we have elaborated on the biological relevance of our gene list, illustrating a need for further genetic study to improve our understanding of how these changes in expression relate to the illness. The signatures we identified are consistent with current hypotheses of molecular dysfunction in schizophrenia, including alterations in synaptic transmission and energy metabolism. However, the diversity of genes we found suggests that systems biology approaches, exemplified by the analysis of gene network structure, will be of value in determining the basis of this disorder. The approaches used in our study should be applicable to other neuropsychiatric disorders if sufficient data are available.
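As an illustration only, and not the procedure used in this study, the path-length argument made in the Discussion (low-degree genes that are nevertheless linked by short paths) can be sketched in Python on a toy interaction network. The network, the gene labels and the function names below are hypothetical. The null distribution is assumed to be random gene sets of equal size, which is simpler than a degree-matched comparison.

```python
from collections import deque
from itertools import combinations
import random

# Hypothetical undirected toy network: one hub "H", three degree-1 genes
# (A, B, C) attached to it, and a chain H - X1 - X2 - X3 - X4.
TOY_NETWORK = {
    "H": ["A", "B", "C", "X1"],
    "A": ["H"], "B": ["H"], "C": ["H"],
    "X1": ["H", "X2"], "X2": ["X1", "X3"],
    "X3": ["X2", "X4"], "X4": ["X3"],
}
SIGNATURE = ["A", "B", "C"]  # hypothetical low-degree "signature" genes


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def mean_pairwise_path_length(graph, genes):
    """Average shortest-path length over all connected pairs in `genes`."""
    lengths = []
    for a, b in combinations(sorted(genes), 2):
        dist = shortest_path_length(graph, a, b)
        if dist is not None:
            lengths.append(dist)
    return sum(lengths) / len(lengths) if lengths else float("nan")


def empirical_p_value(graph, genes, n_draws=1000, seed=0):
    """Share of random same-size gene sets whose mean path length is at most
    the observed one. Assumption: the null is NOT degree-matched."""
    rng = random.Random(seed)
    observed = mean_pairwise_path_length(graph, genes)
    nodes = sorted(graph)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(nodes, len(genes))
        if mean_pairwise_path_length(graph, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_draws + 1)


if __name__ == "__main__":
    print("mean path length:", mean_pairwise_path_length(TOY_NETWORK, SIGNATURE))
    print("empirical p-value:", empirical_p_value(TOY_NETWORK, SIGNATURE))
```

In this toy example each signature gene has a single interaction partner, yet every pair is only two steps apart because they share a hub. A real analysis would likely need a curated interaction network and a null model that controls for node degree.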

58 in total

1. Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions.

Authors: Hon Nian Chua; Wing-Kin Sung; Limsoon Wong
Journal: Bioinformatics Date: 2006-04-21 Impact factor: 6.937

Review 2. Molecular profiling of antipsychotic drug function: convergent mechanisms in the pathology and treatment of psychiatric disorders.

Authors: Elizabeth A Thomas
Journal: Mol Neurobiol Date: 2006-10 Impact factor: 5.590

3. Meta-analysis of 12 genomic studies in bipolar disorder.

Authors: Michael Elashoff; Brandon W Higgs; Robert H Yolken; Michael B Knable; Serge Weis; Maree J Webster; Beata M Barci; E Fuller Torrey
Journal: J Mol Neurosci Date: 2007 Impact factor: 3.444

Review 4. Gene expression profiling in schizophrenia and related mental disorders.

Authors: Kazuya Iwamoto; Tadafumi Kato
Journal: Neuroscientist Date: 2006-08 Impact factor: 7.519

5. A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function.

Authors: John D Cahoy; Ben Emery; Amit Kaushal; Lynette C Foo; Jennifer L Zamanian; Karen S Christopherson; Yi Xing; Jane L Lubischer; Paul A Krieg; Sergey A Krupenko; Wesley J Thompson; Ben A Barres
Journal: J Neurosci Date: 2008-01-02 Impact factor: 6.167

6. Molecular evidence for increased expression of genes related to immune and chaperone function in the prefrontal cortex in schizophrenia.

Authors: Dominique Arion; Travis Unger; David A Lewis; Pat Levitt; Károly Mirnics
Journal: Biol Psychiatry Date: 2007-06-13 Impact factor: 13.382

7. Meta-analysis of two genome-wide association studies of bipolar disorder reveals important points of agreement.

Authors: A E Baum; M Hamshere; E Green; S Cichon; M Rietschel; M M Noethen; N Craddock; F J McMahon
Journal: Mol Psychiatry Date: 2008-05 Impact factor: 15.992

8. MINT: the Molecular INTeraction database.

Authors: Andrew Chatr-aryamontri; Arnaud Ceol; Luisa Montecchi Palazzi; Giuliano Nardelli; Maria Victoria Schneider; Luisa Castagnoli; Gianni Cesareni
Journal: Nucleic Acids Res Date: 2006-11-29 Impact factor: 16.971

9. Capturing heterogeneity in gene expression studies by surrogate variable analysis.

Authors: Jeffrey T Leek; John D Storey
Journal: PLoS Genet Date: 2007-08-01 Impact factor: 5.917

Review 10. Inflammation-related genes up-regulated in schizophrenia brains.

Authors: Peter Saetre; Lina Emilsson; Elin Axelsson; Johan Kreuger; Eva Lindholm; Elena Jazin
Journal: BMC Psychiatry Date: 2007-09-06 Impact factor: 3.630
