Literature DB >> 21269471

Gene set enrichment analysis of microarray data from Pimephales promelas (Rafinesque), a non-mammalian model organism.

Michael A Thomas1, Luobin Yang, Barbara J Carter, Rebecca D Klaper.   

Abstract

BACKGROUND: Methods for gene-class testing, such as Gene Set Enrichment Analysis (GSEA), incorporate biological knowledge into the analysis and interpretation of microarray data by comparing gene expression patterns to pathways, systems and emergent phenotypes. However, to use GSEA to its full capability with non-mammalian model organisms, a microarray platform must be annotated with human gene symbols. Doing so enables the ability to relate a model organism's gene expression, in response to a given treatment, to potential human health consequences of that treatment. We enhanced the annotation of a microarray platform from a non-mammalian model organism, and then used the GSEA approach in a reanalysis of a study examining the biological significance of acute and chronic methylmercury exposure on liver tissue of fathead minnow (Pimephales promelas). Using GSEA, we tested the hypothesis that fathead livers, in response to methylmercury exposure, would exhibit gene expression patterns similar to diseased human livers.
RESULTS: We describe an enhanced annotation of the fathead minnow microarray platform with human gene symbols. This resource is now compatible with the GSEA approach for gene-class testing. We confirmed that GSEA, using this enhanced microarray platform, is able to recover results consistent with a previous analysis of fathead minnow exposure to methylmercury using standard analytical approaches. Using GSEA to compare fathead gene expression profiles to human phenotypes, we also found that fathead methylmercury-treated livers exhibited expression profiles that are homologous to human systems & pathways and results in damage that is similar to those of human liver damage associated with hepatocellular carcinoma and hepatitis B.
CONCLUSIONS: This study describes a powerful resource for enabling the use of non-mammalian model organisms in the study of human health significance. Results of microarray gene expression studies involving fathead minnow, typically used for aquatic ecological toxicology studies, can now be used to generate hypotheses regarding consequences of contaminants and other stressors on humans. The same approach can be used with other model organisms with microarray platforms annotated in a similar manner.

Entities:  

Mesh:

Year:  2011        PMID: 21269471      PMCID: PMC3037904          DOI: 10.1186/1471-2164-12-66

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

One of the challenges facing researchers conducting microarray studies is deriving meaning from lists of thousands of differentially expressed genes among the phenotypes examined [1-3]. A relatively new approach for systems-based analyses involves testing for enrichment of gene classes or sets; the most popular method employing gene-class analysis is Gene Set Enrichment Analysis (GSEA [4]). GSEA tests whether a set of genes, defined a priori, is enriched in expression in one treatment relative to another. Each GSEA set consists of genes united by a shared association (e.g., functional classification, pathway or disease state), leveraging prior knowledge into the analysis and thereby providing an advantage over an approach in which only individual genes are examined. An enrichment score is calculated for each set to reflect the distribution of set constituents across a list of genes ranked by correlation with the experimental treatment. A higher enrichment score corresponds to a shifting of gene set constituents towards either end of the ranked list representing strongly positive or negative correlations. For a specific microarray experiment, GSEA tests whether genes from a given set are randomly distributed or, alternatively, are up- or down-regulated in one phenotype relative to the other. The statistical significance of a set's enrichment score is determined by comparison to a distribution of scores generated by permuting the ranked list by phenotype class. A measure of expected false discovery rate (FDR) is used to refine the significance of high-scoring sets when a large collection of gene sets is analyzed. The GSEA approach uses a modified Kolmogorov-Smirnov test. Some authors have criticized the GSEA approach for being a Rube Goldberg Machine-like solution to a problem that could be adequately solved with a simpler instrument, such as a X 2 test that assumes a normal distribution of scores [5]. While it is unclear which statistical approach best tests for gene set enrichment, it is generally accepted that the GSEA approach is able to provide novel insights from complex expression patterns [2,3]. GSEA is human-centric: the GSEA platform [6] is built around human genome data and HUGO (Human Genome Organization) gene symbols, a standard vocabulary of gene terms [7]; array elements without associated HUGO symbols are ignored by GSEA. For each known human gene, the HUGO Gene Nomenclature Committee (HGNC) approves a single gene name and symbol. Researchers not using human microarray platforms must annotate elements in their array with HUGO terms in order to use GSEA along with its database of gene sets. This is clearly a non-trivial task for distantly related model organisms, as a substantial proportion of genetic elements on such an array will have no known human homolog. However, this enhanced annotation allows access to thousands of curated gene sets available in the Molecular Signatures Database (MSigDB [4,8]) that leverage human health knowledge, allow meaningful comparisons between humans and distantly related model organisms, and potentially provide novel insights into human health. McGary et al. (2010) demonstrated the value of comparisons between humans and distantly related model organisms for understanding the evolution of emergent phenotypes arising from sets of conserved genes [9]. In that study, sets of genes associated with specific human genetic disorders were mapped to sets of homologous genes in model organisms associated with functions distinct from the human genes (e.g., human X-linked breast cancer vs. a high frequency of male progeny in C. elegans). That approach was used to identify novel candidate genes for the human disorder. Similarly, a GSEA-based analysis could leverage phenotype homology with two different approaches. First, starting with a gene set associated with a given human disorder (e.g., genes associated with a given human autoimmune disorder characterized by an unknown environmental trigger), GSEA could be used to assay a number of conditions in which those genes might be differentially expressed in an appropriate model organism. In this way, GSEA could provide novel insights into that disorder by generating hypotheses about circumstances under which the human disorder is mimicked, exacerbated or even triggered. Second, GSEA could be used to predict the human health consequences of a given treatment or condition (e.g., selenium contamination and concentration in streams and wetlands, as in [10]) by comparing the gene expression profile associated with that contaminant (in an appropriate model organism) to a collection of candidate human gene sets chosen to represent a range of reasonable pathways, functions or phenotypes of interest. Here, we describe an enhanced annotation of the EcoArray fathead minnow 15 k microarray (EcoArray, Gainesville, Florida) using HUGO symbols [11]. This enhanced microarray resource allows analyses using the GSEA approach and comparisons between fathead expression and sets associated with human health. To test the ability to GSEA to recover results consistent with standard microarray analyses, we reanalyzed a previous study of fathead minnows, Pimephales promelas (Rafinesque), that used the EcoArray fathead minnow 15 k gene microarray platform to examine gene expression changes in response to methylmercury exposure [12,13]. That study considered genes with greater than two-fold differences from control in liver tissue from fish exposed to methylmercury over short (96 hour, "acute") or long (600 day, "chronic") exposure periods. They identified 650 genes that exceeded this threshold following acute treatment and 267 genes following chronic treatment. Examining these genes with the FatiGO functional profiling tool [1,14], they identified Gene Ontology (GO) categories [15] found to be enriched in response to the acute and chronic methylmercury treatments. In all treatments, analyses identified methylmercury-induced changes in expression of apoptosis-associated genes, including caspase, tumor necrosis factor and fatty acid synthase. In order to test the ability of GSEA to conduct meaningful comparisons of human disease-associated sets with model organism expression profiles, we compared liver gene expression profiles from the fathead minnow methylmercury treatments [12,13] to MSigDB-derived sets associated with gene expression in human livers damaged by hepatocellular carcinoma (HCC) and hepatitis B. We predicted that these sets would be enriched in the fish expression profiles, reflecting similar processes associated with generalized liver damage. These comparisons were designed to investigate the broader biological significance of methylmercury exposure by fathead minnows while providing a clear comparison of liver damage in a non-mammalian model organism to human liver damage associated with specific disorders.

Results

Annotation of EcoArray 15 k fathead minnow microarray for GSEA analysis

We identified 12,032 HUGO symbols for the 15,208 elements on the EcoArray fathead minnow microarray platform (79%). The annotated fish genes include homology to 10,069 unique HUGO symbols, about 36% of the ~28,000 symbols in the HUGO database. This number of human homologs is consistent with the number identified (10-11,000) for five other fish species using similar methods [16], given that the 15,208 elements on the fathead array were selected in part for their degree of conservation with other species [12,13]. Gene duplication in the fish lineage may have led to cases where more than one fish array elements mapped to the same human HUGO symbol. There were 1679 array elements that shared a HUGO symbol association with at least one other array element. These include situations where 1) fathead genes are represented by more than one array element; and 2) two (or more) fathead genes were homologous to the same human gene due to gene duplication in the fish lineage. Using the EST sequence data, it is difficult to distinguish between these two possibilities; GSEA assumes that all such duplicates fall into the first category. We concluded that cases from the second category would not unduly influence the interpretation of the analyses since the post-duplication sister genes likely have similar functions. Similarly, gene duplication in the mammalian lineage has led to cases in which a single fish array element mapped equally to more than one human protein (and HUGO symbol). Of the ~250 such cases, in which the BLAST search identified two or more hits with equal e-values (generally 0.0), we selected the hit with the higher score (see Materials & Methods). These cases were examined individually.

GSEA analysis results for GO categories derived from Klaper et al. study

Of the 26 GO-BP (Biological Processes) classes identified by the Klaper et al. [12,13] study (using FatiGO), 14 sets for acute and 12 sets for chronic treatments, GSEA identified six significantly enriched sets in the acute treatment (Table 1) and no sets in the chronic treatment (Table 2). Consistent with the Klaper study, GSEA identified the majority of sets to have experienced up-regulation (irrespective of statistical significance of the enrichment) in the treatments relative to the controls. Nearly all the enriched gene sets were associated with metabolic and biosynthetic processes.
Table 1

GSEA tests for GO classes enriched following acute treatment

GO termGO definitionSizeESNESNOM p-value
UP-regulated in treatment:
GO:0042254Ribosome biogenesis and assembly140.7081.5910.000
GO:0006364rRNA processing110.7171.5100.000
GO:0015031Protein transport1220.4371.6140.014
GO:0006754ATP biosynthesis560.4381.3600.014
GO:0006486Protein amino acid glycosylation750.3781.2800.133
GO:0006605Protein targeting850.4361.5570.013
GO:0009058Biosynthesis3270.3241.3150.149
GO:0045045Secretory pathway590.4221.4480.064
GO:0007046Ribosome biogenesis940.5521.5530.054
GO:0006457Protein folding440.4661.3870.086
GO:0006633Fatty acid biosynthesis110.4111.1620.296
GO:0006888ER to golgi vesicle mediated transport110.3650.8640.676
DOWN-regulated in treatment:
GO:0009165Nucleotide biosynthesis12-0.379-1.0470.458
GO:0007498Mesoderm development10-0.647-1.6100.000

Results for gene sets representing each GO-BP class identified by Klaper et al. [12,13], tested for enrichment by GSEA following acute methylmercury treatment. GO terms and definitions can be used to retrieve each set (and constituent genes) from MSigDB [8]; size refers to the number of genes in the set; ES and NES are the enrichment scores and normalized enrichment scores (respectively) for the set; NOM p-value is the nominal p-value for the NES. Gene sets representing GO-BP classes with statistically significant normalized enrichment scores are highlighted in bold.

Table 2

GSEA tests for GO classes enriched following chronic treatment

GO termGO definitionSizeESNESNOM p-value
UP-regulated in treatment:

GO:0006605Protein targeting850.3871.4160.054

GO:0016485Protein processing320.4131.2670.086

GO:0006897Endocytosis1620.2531.0910.278

GO:0000226Microtubule cytoskeleton organization and biogenesis280.3761.0040.484

GO:0007409Axonogenesis290.3620.9920.614

GO:0045893Positive regulation of transcription, DNA dependent2700.2000.8980.502

GO:0006887Exocytosis150.4020.8560.763

GO:0006289Nucleotide-excision repair140.2870.8420.806

GO:0007269Neurotransmitter secretion100.3020.7020.786

DOWN-regulated in treatment:

GO:0006917Induction of apoptosis208-0.286-1.0930.396

GO:0009615Response to virus21-0.301-1.1980.130

GO:0001756Somitogenesis17-0.356-1.0200.502

Results for each GO-BP class identified by Klaper et al. [12,13], tested for enrichment by GSEA following chronic methylmercury treatment. Size refers to the number of genes in the set; ES and NES are the enrichment scores and normalized enrichment scores (respectively) for the set; NOM p-value is the nominal p-value for the NES.

GSEA tests for GO classes enriched following acute treatment Results for gene sets representing each GO-BP class identified by Klaper et al. [12,13], tested for enrichment by GSEA following acute methylmercury treatment. GO terms and definitions can be used to retrieve each set (and constituent genes) from MSigDB [8]; size refers to the number of genes in the set; ES and NES are the enrichment scores and normalized enrichment scores (respectively) for the set; NOM p-value is the nominal p-value for the NES. Gene sets representing GO-BP classes with statistically significant normalized enrichment scores are highlighted in bold. GSEA tests for GO classes enriched following chronic treatment Results for each GO-BP class identified by Klaper et al. [12,13], tested for enrichment by GSEA following chronic methylmercury treatment. Size refers to the number of genes in the set; ES and NES are the enrichment scores and normalized enrichment scores (respectively) for the set; NOM p-value is the nominal p-value for the NES.

GSEA analysis of the MSigDB C2 collection of gene sets

Due to the highly conservative nature of the false discover rate (FDR) implemented by GSEA, many gene sets from the MSigDB C2 collection ("curated gene sets from online pathway databases, publications in PubMed, and knowledge of domain experts" [4]) with statistically significant enrichment scores failed to surpass the FDR threshold and, consequently, could not be definitively identified as enriched following methylmercury treatment. Table 3 lists gene sets with normalized enrichment scores (NES, see Table captions for details) that were below a normalized p-value of 0.05 and a FDR of 0.25.
Table 3

GSEA tests for enrichment of MSigDB C2 gene sets

Gene Set NameSizeESNESNOM p-valueFDR q-value
Acute exposureDown-regulatedNone
Up-regulatedXU_CBP_UP190.6331.9870.0000.051
CALRES_MOUSE_DN280.5751.9200.0000.108
BYSTRYKH_HSC_CIS_GLOCUS820.5161.8890.0000.097
BYSTRYKH_HSC_BRAIN_CIS_GLOCUS430.6001.8550.0000.102
PENG_GLUTAMINE_DN2090.5191.8280.0000.130
UVC_LOW_ALL_DN380.4731.8170.0000.121
CMV_ALL_UP640.5301.7640.0000.214
SANSOM_APC_LOSS5_UP550.5441.7340.0370.274
BLEO_MOUSE_LYMPH_HIGH_24HRS_DN310.6691.7300.0000.255
SHEPARD_CRASH_AND_BURN_MUT_VS_WT_UP1240.4521.7100.0000.289
PENG_RAPAMYCIN_DN1630.5161.7060.0230.275
PRMT5_KD_DN190.5721.6940.0380.298
HSA00970_AMINOACYL_TRNA_BIOSYNTHESIS290.7021.6900.0290.286
MRNA_SPLICING440.6331.6890.0000.266
WANG_MLL_CBP_VS_GMP_DN300.5791.6880.0000.254
JAIN_NEMO_DIFF590.4581.6880.0000.240
HSA00563_GLYCOSYLPHOSPHATIDYLINOSITOL_ANCHOR_BIOSYNTHESIS160.6161.6840.0000.239
BLEO_MOUSE_LYMPH_LOW_24HRS_DN220.6971.6810.0000.238
OLDONLY_FIBRO_UP280.4851.6740.0000.243
CMV_24HRS_UP540.5241.6730.0000.232
KUROKAWA_5FU_IFN_SENSITIVE_VS_RESISTANT_DN220.6241.6690.0000.232
CHANG_SERUM_RESPONSE_UP1090.4801.6680.0000.224
ROS_MOUSE_AORTA_UP210.7311.6560.0360.237
UVB_NHEK2_DN640.4561.6490.0000.242
SCHUMACHER_MYC_UP440.5041.6490.0230.234
STEMCELL_COMMON_UP1490.4581.6430.0230.242
MYC_ONCOGENIC_SIGNATURE1370.4061.6410.0000.239
ZHAN_MMPC_SIM_BC_AND_MM360.4721.6350.0000.242
AMINOACYL_TRNA_BIOSYNTHESIS210.7321.6330.0480.238
Chronic exposureDown-regulatedTNFR1PATHWAY24-0.724-1.9680.0000.026
CASPASEPATHWAY16-0.684-1.8650.0000.049
DEATHPATHWAY25-0.719-1.8360.0000.063
HIVNEFPATHWAY44-0.593-1.7650.0000.104
MITOCHONDRIAPATHWAY17-0.785-1.7280.0000.151
TSA_PANC50_UP25-0.584-1.6840.0000.216
Up-regulatedNone

Results for MSigDB C2 gene sets (drawn from 1892 sets representing all systems, pathways and functions in that database), listing sets with NES scores that are both statistically significant and below the FDR threshold. Size refers to the number of genes in the set; ES and NES are the enrichment scores and normalized enrichment scores (respectively) for the set; NOM p-value is the nominal p-value for the NES, FDR q-vlaue is the false discovery rate ratio (values lower than 0.25 are considered to be passing as per Subramanian et al. [4]). One of these sets (in bold) is a liver associated set highlighted in Table 4.

GSEA tests for enrichment of MSigDB C2 gene sets Results for MSigDB C2 gene sets (drawn from 1892 sets representing all systems, pathways and functions in that database), listing sets with NES scores that are both statistically significant and below the FDR threshold. Size refers to the number of genes in the set; ES and NES are the enrichment scores and normalized enrichment scores (respectively) for the set; NOM p-value is the nominal p-value for the NES, FDR q-vlaue is the false discovery rate ratio (values lower than 0.25 are considered to be passing as per Subramanian et al. [4]). One of these sets (in bold) is a liver associated set highlighted in Table 4. In the acute treatment, GSEA identified 20 significantly enriched up-regulated sets under the FDR threshold (Table 3). These consisted of metabolic, biosynthetic and several cancer-associated sets. There were no significantly enriched down-regulated sets passing the FDR test; failing FDR (not listed in Table 3) were significantly down-regulated sets associated with cytotoxicity, which is of biological significance for hepatotoxicity and, potentially, for HCC. In the chronic treatment, GSEA identified 6 significantly enriched down-regulated sets under the FDR threshold (Table 3). These primarily include apoptosis and caspase sets. There were no significantly enriched up-regulated sets passing the FDR test; failing FDR (not listed in Table 3) were significantly up-regulated metabolic and biosynthetic processes sets, which are of biological significance for hepatotoxicity and, potentially, for HCC.

GSEA analysis results for human HCC and hepatitis B gene sets

A sub-set of the MSigDB C2 collection, consisting of 38 human gene sets associated with HCC and hepatitis B, was also tested against the P. promelas methylmercury treatments (see Materials & Methods and Table 4). In the acute treatment, GSEA found one down-regulated ("Gene up-regulated both by expression of Hepatitis B virus HBx protein in normal hepatoctyes and by HBV infection in liver samples with chronic active hepatitis") and two up-regulated sets ("Genes highly expressed in HCC resistant to 5-Fluorouracil + interferon" and "Genes highly expressed in interferon-resistant hepatoma cell lines vs. sensitive cell lines"). In the chronic treatment, GSEA found no down-regulated and one up-regulated set ("Genes highly expressed in hepatocellular carcinoma with poor survival").
Table 4

GSEA tests for enrichment of HCC- and hepatitis-associated gene sets

Gene Set NameSizeESNESNOM p-value
Up-regulated in acute liver treatment

KUROKAWA_5FU_IFN_SENSITIVE_VS_RESISTANT_DN250.5851.6420.025

WONG_IFNA_HCC_RESISTANT_VS_SENSITIVE_UP100.5071.4070.046

Down-regulated in acute liver treatment

HBX_HEP_UP12-0.540-1.4600.038

Up-regulated in chronic liver treatment

HCC_SURVIVAL_GOOD_VS_POOR_DN1120.4521.5490.000

Down-regulated in chronic liver treatment

None

Results for gene sets representing Hepatocellular carcinoma (HCC) and hepatitis, tested for enrichment by GSEA following methylmercury treatment. Here, we test whether liver-associated sets are enriched following methylmercury exposure. Size refers to the number of genes in the set; ES and NES are the enrichment scores and normalized enrichment scores (respectively) for the set; NOM p-value is the nominal p-value for the NES. Descriptions of each set (and lists of constituent genes) can be found by searching for set names at the MSigDB [8].

GSEA tests for enrichment of HCC- and hepatitis-associated gene sets Results for gene sets representing Hepatocellular carcinoma (HCC) and hepatitis, tested for enrichment by GSEA following methylmercury treatment. Here, we test whether liver-associated sets are enriched following methylmercury exposure. Size refers to the number of genes in the set; ES and NES are the enrichment scores and normalized enrichment scores (respectively) for the set; NOM p-value is the nominal p-value for the NES. Descriptions of each set (and lists of constituent genes) can be found by searching for set names at the MSigDB [8].

Discussion

Comparisons of GSEA and FatiGO profiles

Our GSEA results are consistent with the FatiGO results of Klaper et al. [12,13]. While few of the FatiGO sets were significantly enriched in the GSEA analysis, all sets were up- or down-regulated in the same direction (mainly up-regulation) and the sets significant in the GSEA analysis tended to include those GO processes highlighted by Klaper et al. as being important consequences of liver methylmercury exposure. This includes predominantly up-regulation of metabolic and biosynthetic processes (following acute exposure) and down-regulation of apoptosis (following chronic exposure). It is noteworthy that the GSEA-based approach uses the entirety of the ~12,000 annotated elements on the array, rather than only the most highly differentially expressed genes. This difference in the scope of analysis provides an explanation for why all FatiGO process terms were not significantly enriched in the GSEA analysis (and why some GSEA-enriched sets were not identified by FatiGO; see below). In other words, GSEA enrichment of a set containing many genes associated with a given function is somewhat different than a FatiGO-style analysis that reports the function of a single given up-regulated gene: While FatiGO asks whether any system or pathway is statistically over-represented in the (100 or so) most up-regulated genes (out of ~15,000 on the chip), GSEA asks if the dozens of genes associated with a given system or pathway are significantly collectively enriched (i.e. up- or down-regulated) in a study that simultaneously considers all annotated genes on a chip (~12,000 in our experiment). There are important distinctions between the expression profiles associated with acute and chronic exposure, potentially reflecting metabolic changes occurring after prolonged exposure to a toxin (rather than short term, acute exposure) observed by other researchers [17]. This difference may be due to either accumulation of methylmercury over time or acclimation of the tissue over time. These differences also reflect the very different nature of the two experiments, performed by different labs using different protocols. Overall, the C2 results were limited by a FDR due to the size of the MSigDB-C2 database. The few gene sets that passed this threshold were consistent with FatiGO results and the general conclusions of Klaper et al. [12,13]. Primarily, these include sets associate with up-regulation of metabolic and biosynthetic processes and down-regulation of caspase and apoptosis pathways. Published studies of apoptosis induction by methylmercury involved neurological [18] and reproductive systems [19], so it is not necessarily unexpected to observe the opposite response in the liver, the primary site of detoxification. The up-regulation of general metabolic processes might be involved in detoxification, although none of the sets we identified were specifically associated with DNA repair, which has been noted in similar experiments [17,20].

Comparisons to human liver disease

The smaller, select group of gene sets, involving liver damage associated with HCC and hepatitis B, provided meaningful comparison between human liver disease and fish methylmercury exposure rather than simply a list of the altered biological functions (Table 4). The scores of these sets are similar to the scores of the highest-scoring sets from the broader analysis (those gene sets in Table 3). Fathead minnow liver tissues following acute methylmercury treatment were enriched for a set (KUROKAWA [21]) composed of genes that differentiate among patients with advanced HCC with respect to response to chemotherapy treatment. Fish in the acute treatment are also enriched for genes highly expressed in interferon-resistant hepatoma cell lines (WONG [22]). This indicates a shared response between acute methylmercury exposure (in fish) and hepatocellular carcinoma (in humans), which is potentially useful for guiding the understanding of methylmercury toxicology. Fathead minnow liver tissues following chronic methylmercury treatment were enriched for sets associated with HCC (HCC_survival, reviewed in [23]) and hepatitis B (Lizuka [24], HBX_NL [25]); see Table 4. These sets are indicative of a shared response between chronic exposure (in fish) and serious liver impairment (in humans) due to cancer and hepatitis infection. That both HCC and hepatitis B are enriched is not surprising, since there is a relationship between the two conditions [26]. While it is well known that other heavy metals are directly associated with cancer [27], for mercury it is less clear, despite indirect evidence involving liver cancer [28]. HCC is generally secondary to hepatitis infection or cirrhosis-associated liver damage by hepatotoxicity (caused by alcohol or non-alcohol toxins). Mercury induces oxidative stress, which leads to the enhanced biosynthesis of liver enzymes associated with antioxidant and toxic response systems [29] and genotoxicity [30]. Liver damage can occur from even topical exposure [31]. Using exhaustive histological and microarray comparisons between zebrafish livers and human cells exposed to mercury, Ung et al. [32] identified mercury-induced hepatotoxicity involving many of the same systems and pathways revealed by our GSEA-based study, including apoptosis, proteasomes, and other systems associated with toxicity response (in general) and hepatotoxicity (in particular).

Conclusions

The comparison with Klaper et al. [12,13] is an indication that GSEA (and gene-class approaches, in general) are useful for model organisms that are not yet endowed with completely sequenced and fully annotated genomes. This provides both a systems-based gene-class analytical tool (generally only used with human, mouse and rat data, and, recently, zebrafish [33]) and a mechanism to compare to gene expression profiles from non-human model organisms to profiles associated with human phenotypes and another tool for understanding human health implications when using model organisms and a framework from which testable hypotheses can be generated. For these applications, the strength of gene-class approaches is in the a priori designation of sets to be tested. Also, in considering all genes on a chip rather than only those most up-regulated, gene-class analyses can identify subtle patterns easily missed by FatiGO-like analyses.

Methods

Fish and microarray work

Klaper et al. [12,13] describe the fish methylmercury exposure and microarray experimental design. The data for these experiments have been deposited at NCBI GEO [34] (accession GEO:GSE22261). Briefly, fish were treated with either an acute or chronic exposure to methymercury. For the acute treatment, male adult fathead minnows were injected with a sublethal methylmercury concentration of 2.0 μg per g of body mass. There were 12 treated and 12 control fish (controls were given a vehicle injection without methylmercury). At the end of the 96-hour treatment period, livers were removed from euthanized fish (gonads were also collected but those data were not considered in the present GSEA reanalysis). For the chronic treatment, 90-day-old fathead minnows were exposed to methylmercury concentrations of 4.0 μg per g of dry mass (for treated fish) added to food, fed daily to the fish in quantities of approximately 5% of their body mass. (The fish food also contained naturally occurring methylmercury of ~0.05 μg per g of dry mass.) At the end of the 600 day treatment period, livers were removed from euthanized fish (as in the acute treatment, gonads were also collected). The microarray data were normalized using Gene Spring version GX10.0.1. A standard "thresholding" substitution was conducted, in which expression values below 0.01 were set to 0.01. This was performed to remove very small or negative expression values prior to log-transforming the data, in order to eliminate large negative or missing values in the normalized data. The baseline was set to the median of all data. There was a percentile shift to the 50th percentile, and each spot was normalized to the median of all spots. The EcoArray Fathead Minnow 8 × 15K Microarray v1.0 platform (NCBI GEO accession GEO:GPL7351) was enhanced with human HUGO gene symbols. To accomplish this, we conducted BLASTX searches using 6-frame translated full-length EST sequences (representing each array element) against the human RefSeq protein database (November 2009 release). We used the BLOSUM62 matrix and default protein search options, after determining that these were the correct settings given the sequence divergence between fish and mammals. The e-value threshold was set to 0.01 and the 10 best matching sequences were reported. This relatively high e-value allowed us to see the range of hits recovered by BLAST; for purposes of selecting a BLAST hit for our annotation, only hits with e-values smaller than 10-10 were considered. We developed custom Perl scripts to parse BLAST output files and extract the RefSeq protein ID from the hit with the lowest e-value. We found 10575 elements with hits having an e-value lower than < 10-10. When there were multiple hits with equally low e-values, all were extracted and converted into HUGO symbols and retained for inspection. Generally, when more than one protein was found by BLAST, they shared a single HUGO symbol. When the BLAST of a given EST sequence search had highly significant hits on multiple RefSeq protein IDs, we found they were generally isoforms of the same gene. In other cases, the results were individually curated to determine which gene symbol to use, if any. Generally, when the e-value was the same (e.g., 0.0), we chose the symbol associated with the highest score. To convert a RefSeq protein ID into a HUGO value, we used the Babelomics ID converter tool [35]. We confirmed this symbol by obtaining the NCBI-derived HUGO symbol, along with its known aliases, for that protein ID and comparing it to the Babelomics-derived symbol. We further compared these two symbols to the HGNC-derived HUGO symbol, which had to be mapped through the RefSeq mRNA ID: first, we extracted the RefSeq mRNA ID from NCBI for each RefSeq protein ID; we then downloaded the database of all gene symbols, their aliases, and RefSeq mRNA IDs from the HGNC website. For each protein ID, via the mRNA ID, we were able to identify the appropriate HGNC HUGO symbol and compare it to the Babelomics- and NCBI-derived symbols. These steps ensured that we assigned an appropriate HUGO symbol to each RefSeq protein ID, and that this was the primary symbol, rather than an alias. In order to support the result of the fathead EST to human RefSeq protein search, we also conducted a BLASTX search of each fathead ESTs against the zebrafish RefSeq protein database (June 21, 2010 version). We extracted the zebrafish RefSeq protein IDs if there is a significant BLAST hit with a low e-value (<10-10). We then downloaded the zebrafish to human orthology data file from ZFIN (http://zfin.org) and compared the zebrafish symbol to the human HUGO gene symbols; when these did not agree, they were human-curated (differences generally involved aliases). We assembled this enhanced annotation in a GSEA-formatted file, EcoArray_Fathead.chip (additional file 1). Finally, for each gene element in this file, we crosschecked the symbol we assigned with those in GENE_SYMBOL.chip (from GSEA) to ensure that our symbol was the GSEA-recognized HUGO symbol, rather than an alias.

Conducting GSEA analyses

Data files were created to GSEA specifications for the HUGO-annotated microarray (.CHIP format), normalized expression values from Klaper et al. [12,13] (.GCT format) paired with phenotype descriptions (.CLS format), and all gene sets used in GSEA analyses (.GMX format). These files are available as additional files 2, 3, 4, 5, 6. Analyses used GSEA release 2.06 and MSigDB release 2.5. Weighted enrichment scores were calculated using gene expression lists ranked by signal-to-noise ratio. The maximum gene set size was set to 500 genes; the minimum gene set size was set to 10 genes; the number of permutations was set to 1000. For details of GSEA parameter usage, see Subramanian et al. [4] and the GSEA web site [6]. Gene sets were examined to ensure they contained only GSEA-recognized primary HUGO symbols, rather than aliases or unapproved symbols. This was accomplished through the use of a custom PHP script that compared each gene in a given set to the GENE_SYMBOLS.chip file (see above) containing a list of HUGO symbols with accepted aliases. Gene set components listed as aliases in this file were replaced with the appropriate HUGO symbol. Three groups of analyses were conducted; each group consisted of a number of gene sets, with each set tested against one or both of the acute & chronic treatments relative to their respective control. In the first group of analyses, we tested gene sets corresponding to GO-BP classes identified by the Klaper FatiGO analysis (Appendix II) identified as enriched by acute or chronic treatments. We eliminated gene sets that contained insufficient or excessive numbers of genes for GSEA analysis (leaving 14 and 12 sets for acute and chronic treatments, respectively). For each methylmercury treatment tested by GSEA, only sets corresponding to GO categories found by Klaper et al. for that treatment were considered in GSEA tests. These analyses were designed to test whether GSEA results were consistent with conclusions drawn by Klaper et al. [12,13]. Second, we tested all sets from the MSigDB collection containing canonical pathways and chemical and genetic perturbations (MSigDB C2, containing 1892 sets, of which 1617 were specific to human) against acute and chronic liver treatments. Third, a more limited subset of the C2 collection, consisting of curated human gene sets associated with normal liver function, hepatocellular carcinoma (HCC) and hepatitis infection were tested against the fathead minnow liver treatments. These analyses were designed to highlight expression profiles in response to exposure to methylmercury that are similar to profiles associated with other liver insults in humans. These sets were obtained by searching the MSigDB C2 database for human gene sets using the search terms "liver OR hepatocellular OR hepatitis" in which 38 gene sets were recovered. These sets were pruned to 24 sets, eliminating sets with fewer than 10 genes with homologs in the EcoArray annotated microarray. Gene sets from each of the three collections were tested for enrichment among the ~12,000 annotated genes from the EcoArray 15 k chip, ranked by signal-to-noise ratio. The nominal p-value reports the significance for a given set in an analysis of a collection of sets in which the researcher is specifically interested (like our 24 human liver sets). However, in an exploratory analysis (like the 1892 C2 collection of functional sets), an FDR test is critical to control for multiple comparisons, as it accounts for potential overlap in the gene sets (i.e., a given gene may appear in several sets).

Authors' contributions

RDK conceived the original study from which data for this study were derived. MAT and RDK conceived this re-analysis. LY annotated the EcoArray fathead minnow microarray for GSEA analysis. BJC processed the original microarrays and conducted the Genespring analyses. MAT performed the new GSEA analyses, integrated these with the original GO analyses, and wrote the manuscript with RDK. All authors read and approved the final manuscript.

Additional file 1

EcoArrayFathead15 k.chip. The list of fathead minnow elements from the EcoArray 15 k chip, annotated in this study and formatted for GSEA analysis. Click here for file

Additional file 2

Liver-Acute.GMX. The list of genes used in the acute liver analysis, formatted for GSEA. Click here for file

Additional file 3

Liver-Chronic.GMX. The list of genes used in the chronic liver analysis, formatted for GSEA. Click here for file

Additional file 4

Liver-HCC-hepatitis-human2.GMX. The list of genes used in the human liver analysis, formatted for GSEA. Click here for file

Additional file 5

MeHg-data.GCT. The raw data from the methylmercury experiment, formatted for GSEA. Click here for file

Additional file 6

MeHg-phenotypes.CLS. The phenotypes in the methylmercury experiment, formatted for GSEA. Click here for file
  30 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes.

Authors:  Fátima Al-Shahrour; Ramón Díaz-Uriarte; Joaquín Dopazo
Journal:  Bioinformatics       Date:  2004-01-22       Impact factor: 6.937

3.  Molecular prediction of response to 5-fluorouracil and interferon-alpha combination chemotherapy in advanced hepatocellular carcinoma.

Authors:  Yukinori Kurokawa; Ryo Matoba; Hiroaki Nagano; Masato Sakon; Ichiro Takemasa; Shoji Nakamori; Keizo Dono; Koji Umeshita; Noriko Ueno; Shin Ishii; Kikuya Kato; Morito Monden
Journal:  Clin Cancer Res       Date:  2004-09-15       Impact factor: 12.531

Review 4.  [Mercury and neurotoxicity].

Authors:  M E Crespo-López; A M Herculano; T C Corvelo; J L Do Nascimento
Journal:  Rev Neurol       Date:  2005 Apr 1-15       Impact factor: 0.870

5.  Self-organizing-map-based molecular signature representing the development of hepatocellular carcinoma.

Authors:  Norio Iizuka; Masaaki Oka; Hisafumi Yamada-Okabe; Naohide Mori; Takao Tamesa; Toshimasa Okada; Norikazu Takemoto; Kazuhiko Sakamoto; Kenji Hamada; Hideo Ishitsuka; Takanobu Miyamoto; Shunji Uchimura; Yoshihiko Hamamoto
Journal:  FEBS Lett       Date:  2005-02-14       Impact factor: 4.124

6.  [Acute hepatitis after cutaneous application of an artisanal topical ointment with mercury].

Authors:  P-Y Girault; P Rey; Y Veran
Journal:  Gastroenterol Clin Biol       Date:  2008-09-19

7.  Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors:  Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-30       Impact factor: 11.205

8.  Endogenous antioxidant defence system in rat liver following mercury chloride oral intoxication.

Authors:  Inmaculada Bando; María Isabel Sánchez Reus; David Andrés; María Cascales
Journal:  J Biochem Mol Toxicol       Date:  2005       Impact factor: 3.642

9.  Transcriptional profiling identifies gene expression changes associated with IFN-alpha tolerance in hepatitis C-related hepatocellular carcinoma cells.

Authors:  Nathalie Wong; Kathy Y-Y Chan; Pascale F Macgregor; Paul B-S Lai; Jeremy A Squire; Ben Beheshti; Monique Albert; Thomas W-T Leung
Journal:  Clin Cancer Res       Date:  2005-02-01       Impact factor: 12.531

10.  Comparative effects of dietary methylmercury on gene expression in liver, skeletal muscle, and brain of the zebrafish (Danio rerio).

Authors:  P Gonzalez; Y Dominique; J C Massabuau; A Boudou; J P Bourdineaud
Journal:  Environ Sci Technol       Date:  2005-06-01       Impact factor: 9.028

View more
  21 in total

1.  Effect of periodontal pathogens on the metatranscriptome of a healthy multispecies biofilm model.

Authors:  Jorge Frias-Lopez; Ana Duran-Pinedo
Journal:  J Bacteriol       Date:  2012-02-10       Impact factor: 3.490

2.  Gene-class analysis of expression patterns induced by psychoactive pharmaceutical exposure in fathead minnow (Pimephales promelas) indicates induction of neuronal systems.

Authors:  Michael A Thomas; Parag P Joshi; Rebecca D Klaper
Journal:  Comp Biochem Physiol C Toxicol Pharmacol       Date:  2011-06-06       Impact factor: 3.228

3.  Identification of a novel glycolysis-related gene signature that can predict the survival of patients with lung adenocarcinoma.

Authors:  Chang Liu; Yinyan Li; Minjie Wei; Lin Zhao; Yangyang Yu; Guang Li
Journal:  Cell Cycle       Date:  2019-02-17       Impact factor: 4.534

4.  Identification of a glycolysis-related gene signature for predicting pancreatic cancer survival.

Authors:  Jiachao Zhang; Zhehao Liu; Zhensheng Zhang; Rong Tang; Yongchao Zeng; Pingping Chen
Journal:  J Gastrointest Oncol       Date:  2022-02

5.  A New Approach to Drug Repurposing with Two-Stage Prediction, Machine Learning, and Unsupervised Clustering of Gene Expression.

Authors:  Yi Cong; Misaki Shintani; Fuga Imanari; Naoki Osada; Toshinori Endo
Journal:  OMICS       Date:  2022-06-03

6.  Psychoactive pharmaceuticals induce fish gene expression profiles associated with human idiopathic autism.

Authors:  Michael A Thomas; Rebecca D Klaper
Journal:  PLoS One       Date:  2012-06-06       Impact factor: 3.240

7.  Genome-wide transcriptomics of aging in the rotifer Brachionus manjavacas, an emerging model system.

Authors:  Kristin E Gribble; David B Mark Welch
Journal:  BMC Genomics       Date:  2017-03-01       Impact factor: 3.969

8.  Identification of novel cell glycolysis related gene signature predicting survival in patients with endometrial cancer.

Authors:  Zi-Hao Wang; Yun-Zheng Zhang; Yu-Shan Wang; Xiao-Xin Ma
Journal:  Cancer Cell Int       Date:  2019-11-14       Impact factor: 5.722

9.  Glycolysis gene expression profilings screen for prognostic risk signature of hepatocellular carcinoma.

Authors:  Longyang Jiang; Lan Zhao; Jia Bi; Qiutong Guan; Aoshuang Qi; Qian Wei; Miao He; Minjie Wei; Lin Zhao
Journal:  Aging (Albany NY)       Date:  2019-12-02       Impact factor: 5.682

10.  Identification of a Novel Epithelial-Mesenchymal Transition Gene Signature Predicting Survival in Patients With HNSCC.

Authors:  Wei Xin; Chaoran Zhao; Longyang Jiang; Dongmei Pei; Lin Zhao; Chengpu Zhang
Journal:  Pathol Oncol Res       Date:  2021-03-29       Impact factor: 3.201

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.