An online database for brain disease research.

Brandon W Higgs¹, Michael Elashoff, Sam Richman, Beata Barci.

Abstract

BACKGROUND: The Stanley Medical Research Institute online genomics database (SMRIDB) is a comprehensive web-based system for understanding the genetic effects of human brain disease (i.e., bipolar disorder, schizophrenia, and depression). This database contains fully annotated clinical metadata and gene expression patterns generated within 12 controlled studies across 6 different microarray platforms.
DESCRIPTION: A thorough collection of gene expression summaries is provided, inclusive of patient demographics, disease subclasses, regulated biological pathways, and functional classifications.
CONCLUSION: The combination of database content, structure, and query speed offers researchers an efficient tool for data mining of brain disease, complete with information such as cross-platform comparisons, biomarker elucidation for target discovery, and lifestyle/demographic associations with brain diseases.

Year: 2006 PMID: 16594998 PMCID: PMC1489945 DOI: 10.1186/1471-2164-7-70

Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969

Background

Brain disease studies based on genome-wide microarray measurements have traditionally been more challenging than studies in other disease areas. The biological results are often hindered by the statistical issues of small sample sizes, small effect sizes, and patient-to-patient variability [1-3]. Clinical information for patients is also typically sparse, so that unknown clinical covariates, rather than the primary disease, can drive or confound many of the observed gene expression patterns and trends. Corrections using such clinical information can greatly improve inference in determining markers for disease, as well as in elucidating patterns within the disease. Technical problems in microarray data can also affect the analyses. Meaningful results are often limited by array platform-to-platform comparisons and by the overall organization and presentation of large data sets and results. Studies conducted on disparate platforms are inherently more difficult to analyze than those conducted on the same platform [4]. Cross-platform comparisons present analysis challenges due to differences in, among other things, scaling and sensitivity, which reduce reproducibility [5-8]. Large data sets and comprehensive results summaries present another challenge: both analytical and bioinformatics information (e.g. expression profiles, gene summary information, pathway diagrams, fold change value comparisons, etc.) must be organized into a user-friendly format to facilitate efficient data mining. A relational web-based tool that logically combines all of these factors can enhance researchers' ability to determine the underlying genomic patterns in brain disease. The SMRIDB is an online data warehouse and analytical system designed to aid researchers in understanding the biological associations both between and within the brain disorders of schizophrenia, bipolar disorder, and major depression.
This open source database combines genomic patterns of brain disease with patient clinical metadata into a user-friendly query interface to enable efficient data mining for purposes of biomarker discovery and elucidating biological mechanisms of brain disease. The metadata includes a full summary of clinical history for each patient with hyperlinks to disease-level information, such that demographic- and lifestyle-associated effects can be determined as they relate to brain disorders. The genomic data has been compiled from 12 separate labs (identified as studies), each data set generated from brain tissue isolated from two controlled populations of 165 patients, diagnosed with one of the three brain disorders (plus unaffected control brain tissue). This genomic data has been generated across 6 separate human array platforms (Affymetrix: hgu133a, hgu133plus, hgu95av2, Agilent, Codelink, and cDNA custom array) providing patterns/trends and analytical inferences that are not limited by platform dependencies.

Construction and content

Bioinformatics mappings

NIAID's Database for Annotation, Visualization and Integrated Discovery (DAVID 2.0) was used as the standard source for gene annotation information [9]. The primary fields extracted from DAVID include LocusLink, gene symbol, and gene summary. Additional annotations include gene product mappings to the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Gene Ontology Consortium (GO) for pathway and GO terms/classes, respectively. For Affymetrix arrays, queries were based on the Affymetrix probe ID (AFFYID). For other arrays, the Genbank accessions (GENBANK) were used.

Individual study-level analysis

For each of the individual studies, a series of analyses was performed. Each array (representing a single patient) was subjected to a quality control (QC) analysis of chip-level parameters (e.g. scaling factor, gene calls, control gene ratios, average correlation) with respect to the reference distribution of those parameters across the arrays. The QC analysis is presented as both graphics (e.g. heatmaps, scatter plots, and histograms (Figure 1)) and table summaries, allowing users to readily identify the arrays determined to be outliers in the study. A total of 41 clinical demographic variables (Tables 1, 2, 3, 4) were assessed for their effects on a gene-by-gene basis. Continuous variables and ordered categorical variables were cut at values as close as possible to the median (e.g. PMI > 30 vs. PMI < 30; Drug Use = 'Heavy' vs. Drug Use = 'None, Light, Moderate'). The genes determined to be most significant (p-value < 0.01 and fold change > 1.3) for each demographic variable are reported in a table, accompanied by a summary of the percentage of significant genes for each variable (Figure 2). Each gene found to be significant for a demographic variable links to a gene-centric page (discussed in the Gene details page section). Such results allow researchers to determine markers that are related to lifestyle or clinical demographic information and to identify confounding variables within a disease class.
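
The median cut and the significance filter described above can be illustrated with a minimal sketch. It is not the SMRIDB implementation: the function names, the toy PMI values, and the gene labels are hypothetical, the p-values are taken as given rather than computed, and treating down-regulation symmetrically (a ratio of 0.70 counted like 1/0.70) is an assumption that the text does not state.

```python
from statistics import median


def split_at_median(values):
    """Return (cutoff, labels); labels[i] is True when values[i] exceeds the median."""
    cutoff = median(values)
    return cutoff, [v > cutoff for v in values]


def significant_genes(results, p_max=0.01, fc_min=1.3):
    """Keep genes with p < p_max and a fold change beyond fc_min in either direction.

    results maps a gene label to (p_value, fold_change), where fold_change is a
    ratio and 1.0 means no change. Symmetric handling of down-regulation is an
    assumption made for this sketch.
    """
    kept = []
    for gene, (p_value, fold_change) in results.items():
        magnitude = max(fold_change, 1.0 / fold_change)  # 0.5 is treated like 2.0
        if p_value < p_max and magnitude > fc_min:
            kept.append(gene)
    return sorted(kept)


# Toy post-mortem interval values in hours (hypothetical).
pmi_hours = [12, 18, 25, 31, 40, 44]
cutoff, high_pmi = split_at_median(pmi_hours)

# Toy per-gene results for one demographic variable (hypothetical).
demo = {
    "GENE_A": (0.004, 1.45),
    "GENE_B": (0.200, 2.10),
    "GENE_C": (0.008, 0.70),
    "GENE_D": (0.001, 1.10),
}
hits = significant_genes(demo)
print(cutoff, high_pmi, hits)
```

Both criteria must hold at once: in the toy input, GENE_B has a large fold change but fails on the p-value, and GENE_D has a small p-value but fails on the fold change.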

Figure 1

QC histograms. Examples of distribution thresholds used to assess outliers for an individual study.

Table 1

Patient demographic variables for all diseases

All Patient Variables	Values
Age >45	Yes/No
PMI > 30	Yes/No
Brain pH > 6.5	Yes/No
Left Brain	Yes/No
Sex	Male/Female
Smoking at Time of Death	Yes/No
Herpes simplex virus 1 OD	High/Low
Herpes simplex virus 2 OD	High/Low
Toxo IgG OD	High/Low
EBV OD	High/Low
HHV6 OD	High/Low
CMV OD	High/Low
Hervk 18 SNP	Positive/Negative
Hervk 18 Expression	Positive/Negative

Table 2

Patient demographic variables for Bipolar patients

Bipolar Patient Variables	Values
Bipolar Severity	Severe/Other
Bipolar Heavy Alcohol Use	Yes/No
Bipolar Heavy Drug Use	Yes/No
Bipolar Psychotic Feature	Yes/No
Bipolar Sudden Death	Yes/No
Bipolar Suicide Status	Yes/No
Bipolar Lifetime Antipsychotics >0	Yes/No
Bipolar Antipsychotics	Yes/No
Bipolar Antidepressants	Yes/No
Bipolar Mood Stabilizer	Yes/No
Bipolar Lithium	Yes/No
Bipolar Valproate	Yes/No

Table 3

Patient demographic variables for Schizophrenic patients

Schizophrenic Patient Variables	Values
Schizophrenia Severity	Severe/Other
Schizophrenia Heavy Alcohol Use	Yes/No
Schizophrenia Heavy Drug Use	Yes/No
Schizophrenia	Paranoid/Undiff
Schizophrenia Sudden Death	Yes/No
Schizophrenia Suicide	Yes/No
Schizophrenia Lifetime Antipsychotics >45,000	Yes/No
Schizophrenia Anticholinergic	Yes/No
Schizophrenia Antidepressants	Yes/No
Schizophrenia Stabilizer	Yes/No
Schizophrenia Lithium	Yes/No
Schizophrenia Valproate	Yes/No

Table 4

Patient demographic variables for Depressed patients

Depressed Patient Variables	Values
Depression Heavy Alcohol Use	Yes/No
Depression Heavy Drug Use	Yes/No
Depression Suicide	Yes/No

Figure 2

Demographic gene table. Table of genes determined to be significant (p < 0.01 and fold change > 1.3) with the demographic variables for an individual study.

The three disease classes were analyzed to provide a list of discriminating genes (adjusted for the demographic terms that met the criteria of significance for that gene) or markers indicative of disease (Figure 3) between the control patients and each disease class (schizophrenia, bipolar, depression). In addition to table summaries (genes in table also link to their respective gene detail page), both 2D clustering heatmaps (Figure 4) and principal components scatter plots (Figure 5) are provided for a visual representation of the data. Utilizing these disease markers, the most regulated pathways and GO terms were identified for each disease comparison based on a Fisher's exact test. Each pathway and GO term (from each of the three GO functional classifications separately) is ranked by p-value for each disease comparison to indicate the most regulated pathway/GO terms (Figure 6). Additionally, each pathway and GO term in the table links to a pathway/GO detail page.
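
As an illustration of ranking pathways with a Fisher's exact test, the sketch below computes a one-sided (over-representation) p-value from the hypergeometric distribution and sorts pathways by it. The one-sided form, the function names, and the toy gene and pathway labels are assumptions made for this example; the text does not specify the sidedness of the test or the gene universe used.

```python
from math import comb


def fisher_enrichment_p(hits_in_pathway, hits_total, pathway_size, genes_total):
    """One-sided Fisher's exact p-value, P(overlap >= hits_in_pathway).

    Under the null, the overlap between the marker list and the pathway follows
    a hypergeometric distribution.
    """
    max_overlap = min(hits_total, pathway_size)
    denominator = comb(genes_total, hits_total)
    tail = sum(
        comb(pathway_size, k) * comb(genes_total - pathway_size, hits_total - k)
        for k in range(hits_in_pathway, max_overlap + 1)
    )
    return tail / denominator


def rank_pathways(pathways, marker_genes, measured_genes):
    """Return (pathway, p_value) pairs sorted from most to least significant."""
    universe = set(measured_genes)
    markers = set(marker_genes) & universe
    rows = []
    for name, members in pathways.items():
        members = set(members) & universe  # only genes actually measured count
        p_value = fisher_enrichment_p(
            len(markers & members), len(markers), len(members), len(universe)
        )
        rows.append((name, p_value))
    return sorted(rows, key=lambda row: row[1])


# Toy data (hypothetical labels): 8 measured genes, 4 of them disease markers.
measured = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
markers = ["g1", "g2", "g3", "g4"]
pathways = {
    "pathway_A": ["g1", "g2", "g3", "g5"],
    "pathway_B": ["g5", "g6", "g7", "g8"],
}
ranked = rank_pathways(pathways, markers, measured)
print(ranked)
```

Restricting both the marker list and each pathway to the measured genes keeps the 2x2 table consistent, which matters when platforms differ in gene coverage.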

Figure 3

Disease gene table. Table of genes determined to be significant (p < 0.01 and fold change > 1.3) with the disease for an individual study.

Figure 4

Study-level visuals (heatmap). Two-dimensional hierarchical clustering heatmap containing the most significant genes in schizophrenic disease for an individual study.

Figure 5

Study-level visuals (PCA scatter plot). Principal components plots generated with the most significant genes in schizophrenic disease for an individual study.

Figure 6

Pathway table. Table of most regulated pathways for an individual study.

Pathway/GO details page

Within this pathway/GO detail page is a comprehensive summary of the gene expression profiles for each gene that is mapped to the associated pathway or GO term within each separate disease class. A confidence interval boxplot is provided within each disease comparison inclusive of every gene mapped to that pathway or GO term queried in the study (Figure 7), along with a link to the pathway network representation provided by KEGG. Such results allow researchers to understand the most regulated biological mechanisms and cellular sites for each disease class.

Figure 7

Fold change boxplots. Fold change (with confidence intervals) values for bipolar patients for every gene that maps to the Alzheimer's pathway.

Gene details page

For every probe across the 6 array platforms, primary annotations were determined such that each probe is mapped to either a gene name or an EST identifier (refer to the Bioinformatics mappings section for mapping criteria). Each gene summary page therefore contains probe-level information for all of the 6 array platforms and 12 studies within the database. In addition to general bioinformatics annotations (e.g. biological summary, LocusLink ID, PubMed search link, and gene symbol) and pathway/GO mappings (associations for the gene that link to pathway/GO-centric pages), this page contains gene expression summaries for every probe that maps to this gene across all studies (Figure 8). A cross-study 'consensus' fold change was calculated for each gene and disease/demographic comparison, based on a weighted combination of the individual fold changes and standard errors for the probes that map to each gene across the platforms/studies. Weights were determined in a probeset-specific manner to account for the differing levels of precision associated with each probeset that maps to a given gene across the platforms. Confidence interval boxplots inclusive of each probe for the gene on this page are provided for the following: normalized expression across all patients, fold changes within each disease class, percent present calls for the former two comparisons, and all 41 demographic variables for the gene (Figure 9). Additionally, a general search engine supports queries by gene name, symbol, pathway, GO term, and LocusLink ID, giving direct access to any gene detail page or pathway/GO detail page.
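
One common way to combine per-probe fold changes according to their precision is inverse-variance weighting on the log scale, sketched below. This is an assumed stand-in, not the documented SMRIDB weighting scheme: the text states only that the weights are probeset-specific and reflect precision. The function name, the log2 scale for the standard errors, and the normal-approximation 99% interval (z = 2.576) are likewise assumptions.

```python
from math import log2, sqrt


def consensus_fold_change(fold_changes, standard_errors, z=2.576):
    """Combine per-probe fold changes into one estimate with a confidence interval.

    fold_changes are ratios (1.0 means no change); standard_errors are on the
    log2 scale. Each probe is weighted by 1 / SE^2, so precise probes dominate.
    Returns (consensus, lower, upper) as ratios.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    mean_log = sum(w * log2(fc) for w, fc in zip(weights, fold_changes)) / total
    pooled_se = sqrt(1.0 / total)  # standard error of the weighted mean
    lower = 2 ** (mean_log - z * pooled_se)
    upper = 2 ** (mean_log + z * pooled_se)
    return 2 ** mean_log, lower, upper


# Two hypothetical probes for one gene: equal precision, then unequal precision.
equal = consensus_fold_change([2.0, 1.0], [1.0, 1.0])
unequal = consensus_fold_change([2.0, 1.0], [0.1, 1.0])
print(equal, unequal)
```

With equal standard errors the consensus is the geometric mean of the two ratios; when the first probe is ten times more precise, the consensus moves close to its value of 2.0.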

Figure 8

Gene summary page (truncated). Portion of gene summary page for the gene reelin (RELN).

Figure 9

Fold change boxplots. Fold change (with 99% confidence intervals) for the gene reelin across all 41 demographic variables.

Cross-platform analysis

To date, making comparisons across disparate gene expression platforms has been very difficult [5-8]. Chip manufacturing differences such as probe selection, processing protocols, and spot normalization algorithms contribute to variability that can distort mRNA transcript abundance measurements and introduce inconsistencies that hinder cross-platform comparisons. Some success has been demonstrated in reducing the problem to the most consistent sequence-verified gene annotations between two platforms (e.g. UniGene cluster membership) and examining correlations, ratio values, or gene calls, although the sensitivity and global statistical inference of such approaches remain a challenge [7,10-12]. The cross-platform comparisons within the SMRIDB are based on scaled representations of the individual study-level analyses across studies to extract biological patterns and relationships. These cross-platform results are provided for both the gene level (Figure 10) and the pathway/GO level in a study-centric (Figure 11) and gene-centric (Figure 12) visualization. For the gene-level cross-platform analysis, the fold changes and confidence intervals are calculated as described in the Gene details page section. For the pathway/GO-level analysis, the p-values calculated by the Fisher's exact test from each study individually for disease-related genes were scaled across studies and provided in an interactive sortable heatmap, where each cell has a clickable link to a pathway/GO details page. Additionally, the same analysis and visual representation are provided for the demographic variables (Figure 13). Such a data representation allows researchers to quickly determine the most regulated pathways or functional classifications across all platforms or for a specific demographic variable.
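
The scaling applied to the per-study p-values is not specified here, so the following sketch shows only one plausible scheme, chosen for illustration: each p-value is converted to -log10(p) and divided by the largest score in its study, so that every study contributes cells in the range 0 to 1, and pathways are then sorted by their mean scaled score. All names, the scaling rule, the sort key, and the toy p-values are assumptions.

```python
from math import log10


def scale_study(p_values, floor=1e-300):
    """Map one study's pathway p-values to scores in [0, 1] (1 = most significant)."""
    scores = {name: -log10(max(p, floor)) for name, p in p_values.items()}
    top = max(scores.values())
    if top == 0:  # every p-value equals 1, so nothing stands out
        return {name: 0.0 for name in scores}
    return {name: score / top for name, score in scores.items()}


def heatmap_rows(studies):
    """Build heatmap rows as (pathway, cells per study, mean score), best first.

    A cell is None when the pathway was not tested in that study.
    """
    scaled = {study: scale_study(p_values) for study, p_values in studies.items()}
    study_order = sorted(studies)
    pathway_names = sorted({name for p in studies.values() for name in p})
    rows = []
    for name in pathway_names:
        cells = [scaled[study].get(name) for study in study_order]
        present = [cell for cell in cells if cell is not None]
        rows.append((name, cells, sum(present) / len(present)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Toy p-values for two hypothetical studies.
studies = {
    "study_1": {"pathway_A": 0.001, "pathway_B": 0.1},
    "study_2": {"pathway_A": 0.01, "pathway_B": 1.0},
}
rows = heatmap_rows(studies)
for name, cells, mean_score in rows:
    print(name, cells, round(mean_score, 3))
```

Scaling within each study keeps a platform that yields systematically smaller p-values from dominating the combined ordering, which is the motivation for comparing scaled rather than raw values.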

Figure 10

Summary statistic table. Gene-level summary table of significant probes across all studies for depression.

Figure 11

Pathway clickable heatmap. Study-centric clickable heatmap of top regulated pathways in schizophrenia. Each column can be sorted by a particular study or the three last summary columns. Study 12 was omitted from this visual.

Figure 12

GO term clickable heatmap. Gene-centric clickable heatmap of top regulated GO terms (molecular function) in schizophrenia. Each column can be sorted by a disease.

Figure 13

Pathway/demographic clickable heatmap. Demographic variable clickable heatmap of top regulated pathways. Each column can be sorted by a demographic variable.

Utility and discussion

The user interface was constructed to enable intuitive navigation and efficient data mining. The main site contains the primary index for the database's 4 general areas: Patients, Studies, Genes, and Analysis. Each is a gateway to a unique focus area, with mutual associations between them, such as clinical information vs. genomics results and individual study content vs. cross-platform combined analyses. The Genes tab contains an open text search engine (with partial matches) to enable queries by gene, LocusLink, or pathway for any single or combined study results. The intended users of the database include any genomics researchers facing the persistent challenges of sensitivity for biomarker discovery and cross-platform microarray comparisons. However, the content within the SMRIDB is primarily designed for biologists, clinical researchers, bioinformaticians, and scientists in the field of brain disease. The size and scope of the SMRIDB make it a unique contribution to genomics-based brain disease research. With combined gene expression profile summaries across 12 studies and 6 platforms, there is greater confidence in scientific findings such as biomarkers for disease, biological functional roles, and regulated pathways, as compared to results obtained from any one individual study.
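
A partial-match text search of the kind described for the Genes tab can be sketched as a case-insensitive substring test over the searchable fields. The record layout, field names, identifiers, and pathway labels below are placeholders invented for this example and do not reflect the SMRIDB schema or real annotations.

```python
def search_genes(records, query):
    """Return symbols of records whose symbol, name, ID, or pathways contain the query."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for record in records:
        fields = [
            record["symbol"],
            record["name"],
            record["locuslink_id"],
            *record["pathways"],
        ]
        # A record matches when the query is a substring of any searchable field.
        if any(needle in field.lower() for field in fields):
            hits.append(record["symbol"])
    return hits


# Placeholder records: IDs and pathway labels are not real annotations.
records = [
    {
        "symbol": "RELN",
        "name": "reelin",
        "locuslink_id": "ID-0001",
        "pathways": ["placeholder pathway X"],
    },
    {
        "symbol": "GENE_B",
        "name": "hypothetical gene B",
        "locuslink_id": "ID-0002",
        "pathways": ["placeholder pathway Y"],
    },
]
print(search_genes(records, "reel"))
print(search_genes(records, "Pathway Y"))
```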

Conclusion

The SMRIDB is a comprehensive data mining tool that enables researchers to elucidate the biological mechanisms of bipolar disorder, schizophrenia, and depression. A diverse patient population combines with data generated across six microarray platforms and 12 studies to provide robust results that enhance the understanding of brain disease.

Availability and requirements

The SMRIDB can be accessed at . All users must register (name and email address) to obtain a username and password.

Authors' contributions

BWH and ME conducted the data analysis and were involved in drafting the manuscript. SR developed the web services and database backend. BB collected and catalogued the clinical information and samples. All authors read and approved the final manuscript.

12 in total

1. A comparison of oligonucleotide and cDNA-based microarray systems.

Authors: Nancy Mah; Anders Thelin; Tim Lu; Susanna Nikolaus; Tanja Kühbacher; Yesim Gurbuz; Holger Eickhoff; Günther Klöppel; Hans Lehrach; Björn Mellgård; Christine M Costello; Stefan Schreiber
Journal: Physiol Genomics Date: 2004-02-13 Impact factor: 3.107

2. DAVID: Database for Annotation, Visualization, and Integrated Discovery.

Authors: Glynn Dennis; Brad T Sherman; Douglas A Hosack; Jun Yang; Wei Gao; H Clifford Lane; Richard A Lempicki
Journal: Genome Biol Date: 2003-04-03 Impact factor: 13.583

3. Evaluation of gene expression measurements from commercial microarray platforms.

Authors: Paul K Tan; Thomas J Downey; Edward L Spitznagel; Pin Xu; Dadin Fu; Dimiter S Dimitrov; Richard A Lempicki; Bruce M Raaka; Margaret C Cam
Journal: Nucleic Acids Res Date: 2003-10-01 Impact factor: 16.971

4. Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements.

Authors: Brigham H Mecham; Gregory T Klus; Jeffrey Strovel; Meena Augustus; David Byrne; Peter Bozso; Daniel Z Wetmore; Thomas J Mariani; Isaac S Kohane; Zoltan Szallasi
Journal: Nucleic Acids Res Date: 2004-05-25 Impact factor: 16.971

5. Differential gene expression patterns revealed by oligonucleotide versus long cDNA arrays.

Authors: Jiang Li; Matthew Pankratz; Jeffrey A Johnson
Journal: Toxicol Sci Date: 2002-10 Impact factor: 4.849

6. Analysis of matched mRNA measurements from two different microarray technologies.

Authors: Winston Patrick Kuo; Tor-Kristian Jenssen; Atul J Butte; Lucila Ohno-Machado; Isaac S Kohane
Journal: Bioinformatics Date: 2002-03 Impact factor: 6.937

7. Comparison of microarray-based mRNA profiling technologies for identification of psychiatric disease and drug signatures.

Authors: Linda W Jurata; Yury V Bukhman; Vinod Charles; Frank Capriglione; Jeffrey Bullard; Andrew L Lemire; Ali Mohammed; Quyen Pham; Pascal Laeng; Jeffrey A Brockman; C Anthony Altar
Journal: J Neurosci Methods Date: 2004-09-30 Impact factor: 2.390

8. Comparing cDNA and oligonucleotide array data: concordance of gene expression across platforms for the NCI-60 cancer cells.

Authors: Jae K Lee; Kimberly J Bussey; Fuad G Gwadry; William Reinhold; Gregory Riddick; Sandra L Pelletier; Satoshi Nishizuka; Gergely Szakacs; Jean-Phillipe Annereau; Uma Shankavaram; Samir Lababidi; Lawrence H Smith; Michael M Gottesman; John N Weinstein
Journal: Genome Biol Date: 2003-11-25 Impact factor: 13.583

9. Analysis of strain and regional variation in gene expression in mouse brain.

Authors: P Pavlidis; W S Noble
Journal: Genome Biol Date: 2001-09-27 Impact factor: 13.583
