ERGR: An ethanol-related gene resource.

An-Yuan Guo, Bradley T Webb, Michael F Miles, Mark P Zimmerman, Kenneth S Kendler, Zhongming Zhao.

Abstract

Over the last decade, rapid progress has been made in the study of ethanol-related traits, including alcohol abuse and dependence and behavioral responses to ethanol, in both humans and animal models. To collect, curate and integrate these results so as to make them easily accessible and interpretable for researchers, we developed ERGR, a comprehensive ethanol-related gene resource. We collected and curated more than 30 large-scale data sets including linkage, association and microarray gene expression from the literature and 21 mouse QTLs from public databases. At present, ERGR holds ethanol-related information on approximately 7000 genes from five organisms: human (3311), mouse (2129), rat (679), fly (614) and worm (228). ERGR provides gene annotations and orthologs, detailed gene study information (e.g. fold changes of gene expression, P-values), and both text and BLAST searches. Moreover, ERGR has data integration tools, such as data union and intersection, and candidate gene selection based on evidence in multiple datasets or organisms. The ERGR database is evolving with new data releases, and more functions will be added. ERGR has a user-friendly web interface with browse and search functions at multiple levels. It is freely available at http://bioinfo.vipbg.vcu.edu/ERGR/.

Substances: Ethanol

Year: 2008 PMID: 18978021 PMCID: PMC2686553 DOI: 10.1093/nar/gkn816

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Alcohol dependence, ethanol response and ethanol-related traits have been extensively studied in both humans and animal models. It is now clear that there are correlations between acute behavioral responses to ethanol and ethanol consumption or incidence of alcoholism in both animals and humans (1). In humans, alcoholism (alcohol dependence) is a common, genetically influenced complex disorder across the world. Family, twin and adoption studies demonstrated that genetic factors play a strong role in the etiology of alcoholism, accounting for 50–60% of the population variance in both men and women (2,3). Although genetic factors are important, alcoholism is a complex disease with environmental influences. Further, the architecture likely involves many genes with small effects along with environmental influences, as well as potential interactions between them. Therefore, it is a challenge to explore the molecular mechanisms underlying the genetic propensity to excessive alcohol consumption and use these for the development of new treatments for alcoholism. Many experimental strategies [linkage scan, association study, quantitative trait loci (QTLs) and microarray gene expression] have been applied in the studies of alcoholism and ethanol response in order to identify genes or chromosomal regions in both humans and model organisms (4,5). Rapid progress in genetic studies over the past decade has identified a relatively large number of chromosomal locations or candidate genes that are linked to alcoholism, alcohol-related phenotypes and behavioral responses to ethanol (6–13). Human genetic studies have generally focused on alcohol dependence using linkage studies and association studies. 
The recent advances of high-throughput molecular technologies such as large-scale genotyping and DNA microarrays have greatly accelerated the generation of data used in studies searching for specific variants contributing to the genetic risk for alcoholism and ethanol response behaviors (14,15). The production of ethanol-related data is expected to accelerate further in the near future since the cost of conducting genome-wide association studies (GWAS) is decreasing rapidly. Thus, these data provide us with an unprecedented opportunity for integrating the wealth of results and making them easily accessible and interpretable. So far, a few databases and computational tools, such as WebQTL (16), PhenoGen (17), WebGestalt (18) and Ontological Discovery Environment (http://ontologicaldiscovery.org/), have been developed for analyzing biological data of phenotypes and complex traits. However, the ethanol genetics research community has so far lacked a comprehensive ethanol-related gene resource that presents and integrates cross-species and cross-platform data. Here, we present such a database, the ethanol-related gene resource (ERGR, http://bioinfo.vipbg.vcu.edu/ERGR/). To the best of our knowledge, it is a unique public database for ethanol-related genes. Aiming to efficiently integrate and analyze all or most of the published ethanol-related gene studies, we collected and annotated the representative large-scale ethanol-related gene datasets. These data were generated by different approaches including linkage scan, genome-wide association study, microarray expression and QTLs, retrieved from other public databases, or collected by a systematic literature search. We obtained data from the five most-studied model organisms: human, mouse, rat, fly and worm. In addition to information such as dataset description, gene annotations and gene ortholog information, ERGR also provides tools for data integration (e.g. data union and intersection) and candidate gene selection based on multiple datasets or organisms. ERGR seeks to be a useful resource for the ethanol research community and a model database of data collection and integration for other complex diseases such as schizophrenia and Alzheimer's disease.

DATA SOURCE AND METHODS

Data collection and curation

Currently, ERGR contains ethanol-related gene data from five organisms (human, mouse, rat, fly and worm). These data were collected from different technology platforms that have been widely used in alcohol dependence or ethanol response studies: linkage scan, genome-wide association study, microarray gene expression and QTLs, or by literature search. We collected these data by the following three approaches. First, we searched publications of large-scale alcohol dependence or ethanol response studies in NCBI PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) and then extracted and checked the data from these publications. Using this approach, we collected data including alcohol-related microarray gene expression studies from humans or other animals, human alcohol dependence linkage studies and a genome-wide association study. Specifically for the linkage data, we selected linkage regions by LOD scores and obtained the physical locations of the corresponding markers from the UCSC genome browser (http://genome.ucsc.edu/) (19). Then, we retrieved the genes in the linkage regions from the Ensembl database (http://www.ensembl.org/index.html). Besides the genes in alcoholism or ethanol response studies, we also included other related genes. For example, we collected an addiction candidate gene list in which 130 genes were selected for a haplotype-based analysis of addiction (20). Second, we extracted related data from other public databases. We obtained the mouse alcohol behavior related QTLs from PARC (Portland Alcohol Research Center, http://www.ohsu.edu/parc/by_phen.shtml). Only those QTLs that were marked significant and whose genomic locations could be identified were extracted. Then, we retrieved genes mapped in the QTL regions from the Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) (21). Further, we searched alcohol-related genes in HuGE Navigator (http://www.hugenavigator.net/), a database of genetic associations and human genome epidemiology (22). 
We found nine alcohol-related phenotypes in HuGE Navigator and extracted all the genes associated with them. Third, we performed a systematic literature search by searching the titles and abstracts of all the publications available in PubMed. We searched each protein coding gene symbol with one of the three keywords: alcohol, ethanol and alcoholism. To reduce false-positives, we manually checked those gene symbols having fewer than three letters/digits or having more than 100 hits of publications. For each dataset, we compiled a summary description from the original data source. The summary includes the experimental method and treatment, platform, organism, tissue and phenotype, and the publication. For each gene in a dataset, more detailed information was extracted from the data source such as gene expression fold change, P-value, and tissue type.
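The manual-check rule for the literature search can be sketched in a few lines. This is only an illustration of the stated thresholds (symbols with fewer than three characters, or more than 100 publication hits); the function name and the hit counts below are hypothetical and are not taken from ERGR.

```python
def needs_manual_review(symbol: str, hit_count: int,
                        min_symbol_len: int = 3, max_hits: int = 100) -> bool:
    """Flag a gene symbol whose literature hits should be checked by hand.

    A symbol is flagged when it has fewer than `min_symbol_len` characters
    or when it returned more than `max_hits` publications.
    """
    return len(symbol) < min_symbol_len or hit_count > max_hits


# Hypothetical hit counts per gene symbol (illustrative values only).
hits = {"T": 40, "ADH6": 12, "NPY": 250, "GAD1": 8}

to_review = sorted(s for s, n in hits.items() if needs_manual_review(s, n))
accepted = sorted(s for s, n in hits.items() if not needs_manual_review(s, n))

print("manual review:", to_review)
print("accepted automatically:", accepted)
```

Short symbols are flagged because they are more likely to collide with ordinary words or abbreviations in titles and abstracts, and very frequent symbols because a small false-positive rate still produces many spurious records.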

Gene ID

We used NCBI Entrez Gene ID as the central ID for cross linking and annotation. However, specific studies used different IDs, such as gene symbol, mRNA accession number, EST accession number, clone ID, Affymetrix probe ID, Ensembl ID or UniGene ID. We applied the following three approaches to convert different IDs to gene IDs. (i) We downloaded gene2accession, gene2unigene, and gene_info files from NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/) and obtained the corresponding gene IDs for the accession numbers, UniGene IDs or gene symbols used in different studies. (ii) We used an online tool, IDConverter (http://idconverter.bioinfo.cnio.es/IDconverter.php), to convert the original IDs in the publications to gene IDs (23). (iii) When the IDs could not be converted by the above two approaches, we manually searched the NCBI databases for the gene IDs.
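The three-step conversion can be read as a tiered lookup: try the local NCBI-derived tables first, then an online converter, and fall back to manual curation. The sketch below assumes that reading; the function, the stand-in online lookup and the identifiers `PROBE_X` and `CLONE_Y` are hypothetical, and only the ADH6 to gene ID 130 pair comes from the ERGR example URL cited in the text.

```python
from typing import Callable, Dict, Optional, Tuple


def to_gene_id(source_id: str,
               local_table: Dict[str, int],
               online_lookup: Callable[[str], Optional[int]]
               ) -> Tuple[Optional[int], str]:
    """Resolve a study-specific ID to an Entrez Gene ID.

    Returns (gene_id, tier), where tier records which approach succeeded.
    """
    # (i) Tables built from downloaded NCBI files (gene2accession, etc.).
    if source_id in local_table:
        return local_table[source_id], "ncbi_files"
    # (ii) An online conversion tool, represented here by a callable.
    gene_id = online_lookup(source_id)
    if gene_id is not None:
        return gene_id, "online_tool"
    # (iii) Nothing matched: leave the ID for manual searching.
    return None, "manual_search"


local_table = {"ADH6": 130}


def fake_online_lookup(source_id: str) -> Optional[int]:
    """Stand-in for a web service; the mapping is made up for illustration."""
    return {"PROBE_X": 999001}.get(source_id)


results = {s: to_gene_id(s, local_table, fake_online_lookup)
           for s in ["ADH6", "PROBE_X", "CLONE_Y"]}
for source_id, (gene_id, tier) in results.items():
    print(source_id, gene_id, tier)
```

Recording the tier alongside the gene ID keeps track of which conversions were automatic and which still need a curator.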

Gene annotation

We downloaded gene annotation files from the NCBI FTP site. Then, we extracted annotation information for the genes in our database from these files. We parsed the gene_info file to retrieve the gene information such as gene symbol, alias, full name, chromosome, genetic location and gene type. We obtained gene ontology (GO) annotations from the gene2go file downloaded from the GO website (http://www.geneontology.org/) (24). The accession numbers of the reference sequences and the chromosomal location of each gene were parsed from the gene2refseq file.

Orthologs

A cross-species gene discovery and validation scheme can provide both powerful confirmation of candidate genes and mechanistic information about gene-behavior relationships. Thus, we searched orthologs of the ethanol-related genes in this database. For the nonhuman genes in our datasets, their human ortholog information was obtained and curated. We obtained human/mouse and human/rat ortholog information from MGI (ftp://ftp.informatics.jax.org/pub/reports/index.html), and human/fly and human/worm from the Inparanoid database (version 6.1, http://inparanoid.sbc.su.se/download/6.1/) (25).

DATABASE CONTENT AND ORGANIZATION

Data overview

As summarized in Table 1, we have collected and curated more than 30 genome-wide or large-scale datasets including microarray gene expression, linkage and genome-wide association studies, results of literature search and 21 mouse QTLs. At present, ERGR includes ∼7000 genes in five organisms: human (3311), mouse (2129), rat (679), fly (614) and worm (228). For rat, fly and worm, ERGR only has microarray expression data. Because of the major research interest in humans, the human data in ERGR is the most comprehensive, which includes microarray gene expression, linkage studies, genome-wide association study and literature search results.

Table 1.

Summary of the data in ERGR

Species	Method	No. of datasets	No. of genes	Source
Human (Total: 3311 genes)	Microarray expression	7	831	Literature
	Genome-wide association	1	71	Literature
	Linkage	3	918	Literature
	HuGE Navigator	9 diseases	203	HuGE Navigator (http://www.hugenavigator.net/)
	Addiction array list	1	130	Literature
	Literature search	1	1726	Literature
Mouse (Total: 2129 genes)	Microarray expression	11	682	Literature
	QTL	21 QTLs	1568	PARC Alcohol QTLs (http://www.ohsu.edu/parc/by_phen.shtml)
Rat	Microarray expression	6	679	Literature
Fly	Microarray expression	2	614	Literature
Worm	Microarray expression	1	228	Literature

Database organization

We used MySQL, an open-source client/server relational database management system that has been commonly used in the development of biomedical databases, to store and manage the data. One table was specifically designed to store the summary descriptions of all datasets. Because the formats of the datasets generated by the different methods often varied, we managed datasets of gene expression, linkage, association or QTLs in separate tables. Annotations of gene information, gene ontology, reference sequences and ortholog information were also stored in individual tables. Dataset name, PubMed ID and Entrez gene ID are the keys that link the tables.
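A minimal sketch of this organization follows. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server, and every table and column name is an assumption made for illustration rather than the actual ERGR schema. The single data row reuses the ADH6 example quoted elsewhere in the text (dataset '15816859', gene ID 130, fold change 0.69, P-value 7.00 × 10⁻³).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One summary table for all datasets (hypothetical columns).
CREATE TABLE dataset (
    dataset_name TEXT PRIMARY KEY,
    pubmed_id    INTEGER,
    method       TEXT
);
-- One result table per method; only gene expression is shown here.
CREATE TABLE expression_result (
    dataset_name TEXT REFERENCES dataset(dataset_name),
    gene_id      INTEGER,
    fold_change  REAL,
    p_value      REAL
);
-- Gene annotation keyed by Entrez Gene ID.
CREATE TABLE gene_annotation (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT
);
""")

# pubmed_id is left NULL because the text does not state it for this dataset.
conn.execute("INSERT INTO dataset VALUES (?, ?, ?)",
             ("15816859", None, "microarray expression"))
conn.execute("INSERT INTO expression_result VALUES (?, ?, ?, ?)",
             ("15816859", 130, 0.69, 7.0e-3))
conn.execute("INSERT INTO gene_annotation VALUES (?, ?)", (130, "ADH6"))

# Dataset name and gene ID act as the join keys between tables.
rows = conn.execute("""
    SELECT g.symbol, d.method, e.fold_change, e.p_value
    FROM expression_result AS e
    JOIN dataset AS d ON d.dataset_name = e.dataset_name
    JOIN gene_annotation AS g ON g.gene_id = e.gene_id
""").fetchall()
print(rows)
```

Keeping method-specific results in separate tables lets each table carry only the columns its data type needs, while the shared keys still allow a gene page to pull together evidence from every dataset.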

WEB INTERFACE

A user-friendly web interface was designed and implemented for ERGR. It is freely available at http://bioinfo.vipbg.vcu.edu/ERGR/. The user can browse and search all the data at different levels or combine the data by the integration functions.

Data browse

To help the user browse the data easily, ERGR provides four browsing methods: (i) by species; (ii) by method or platform, such as microarray expression, linkage or association; (iii) by chromosome of a species; or (iv) through a summary page that lists all the datasets available in ERGR. Dataset browsing is cascading, i.e. from dataset list to gene list, and then to gene information. Clicking a dataset name on a dataset list page shows the dataset description and the corresponding list of genes. Selecting a gene ID links to the gene information page, which includes gene ID, symbol and name, GO annotation, RefSeq, chromosome location, database cross links and the detailed study information (e.g. fold change of gene expression and P-value) extracted from the original ethanol studies. For example, based on the microarray dataset ‘15816859’, the fold change and P-value of ADH6 gene expression were 0.69 and 7.00 × 10⁻³, respectively (http://bioinfo.vipbg.vcu.edu/ERGR/geneinfo.php?id=130). Moreover, the user may find detailed information on the studied single nucleotide polymorphisms (SNPs) via dynamic links to the NCBI dbSNP database.

Data search

ERGR provides three approaches for searching the data including text search and sequence search. First, the user may find a quick search box in the top right of the web page for searching gene ID and symbol. It supports wildcard searches such as using partial gene symbol (e.g. ADH) or using an asterisk (e.g. ADH*). Second, ERGR provides an advanced search page, on which users may combine different search terms (e.g. ID, symbol, phenotype, physical location and GO term) for a user-defined search. Third, BLAST search against the nucleotide or protein sequences of the ethanol-related genes in each organism or all the five organisms is available in ERGR.
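The quick-search behavior can be illustrated with Python's standard fnmatch module. This sketch assumes that a query without an asterisk is treated as a partial (substring) match and that matching is case-insensitive; neither detail is stated in the text, and the symbol list is an illustrative sample, not ERGR content.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def quick_search(query: str, symbols: Iterable[str]) -> List[str]:
    """Return the symbols matching a quick-search query.

    A query containing '*' is used as a wildcard pattern as given;
    otherwise it is wrapped in '*' so that it matches as a substring.
    """
    q = query.upper()
    pattern = q if "*" in q else f"*{q}*"
    return [s for s in symbols if fnmatchcase(s.upper(), pattern)]


symbols = ["ADH1B", "ADH6", "ALDH2", "NPY", "GAD1"]

print(quick_search("ADH*", symbols))
print(quick_search("ADH", symbols))
print(quick_search("adh6", symbols))
```

In a database-backed implementation the same idea is usually expressed by translating the asterisk to the SQL `LIKE` wildcard `%` before querying.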

DATA INTEGRATION

One current opportunity and challenge is the increasingly large amount of public data that can be applied to the study of alcohol-related traits. Given the growth and scale of these data, efficient integration is necessary in studying a complex disorder such as alcohol dependence. Currently, ERGR provides users with data union and intersection functions for data integration; more functions are being developed in this ongoing project. Moreover, ERGR provides candidate gene selection results based on the evidence in multiple datasets and multiple organisms.

Data union and intersection

ERGR supports the union operation of any two datasets from the same organism and the intersection operation of any two datasets. There are three rules for the dataset intersection operation. First, if the two datasets to be compared are from the same organism, ERGR performs the intersection operation and outputs the results based on gene ID. Second, if the two datasets are from different organisms and one of them is from human, ERGR uses human genes as the reference, transforms non-human genes to their human orthologs, and then performs the intersection operation based on human gene IDs. Third, if the two datasets are from two different non-human organisms, ERGR transforms the genes in both datasets to human orthologs and then performs the intersection operation based on human gene IDs. An example of data integration is shown in Figure 1.
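The three intersection rules reduce to one idea: compare gene IDs directly within an organism, and otherwise compare human ortholog IDs. The following sketch assumes a one-to-one ortholog table per organism, which is a simplification; all gene IDs and dataset contents are made up for illustration, and the function names are hypothetical.

```python
from typing import Dict, Set

Dataset = Dict[str, object]            # {"organism": str, "genes": Set[int]}
Orthologs = Dict[str, Dict[int, int]]  # organism -> {gene ID -> human gene ID}


def to_human(dataset: Dataset, orthologs: Orthologs) -> Set[int]:
    """Map a dataset's genes to human gene IDs, dropping unmapped genes."""
    if dataset["organism"] == "human":
        return set(dataset["genes"])
    table = orthologs[dataset["organism"]]
    return {table[g] for g in dataset["genes"] if g in table}


def intersect(a: Dataset, b: Dataset, orthologs: Orthologs) -> Set[int]:
    # Rule 1: same organism, intersect on the organism's own gene IDs.
    if a["organism"] == b["organism"]:
        return set(a["genes"]) & set(b["genes"])
    # Rules 2 and 3: different organisms, intersect on human gene IDs.
    return to_human(a, orthologs) & to_human(b, orthologs)


def union(a: Dataset, b: Dataset) -> Set[int]:
    # Union is only offered for two datasets from the same organism.
    if a["organism"] != b["organism"]:
        raise ValueError("union requires datasets from the same organism")
    return set(a["genes"]) | set(b["genes"])


# Toy data with invented gene IDs.
orthologs = {"mouse": {101: 1, 102: 4}, "rat": {201: 4, 202: 2}}
human_ds = {"organism": "human", "genes": {1, 2, 3}}
mouse_ds = {"organism": "mouse", "genes": {101, 102, 103}}
mouse_ds2 = {"organism": "mouse", "genes": {102, 104}}
rat_ds = {"organism": "rat", "genes": {201, 202}}

print(intersect(mouse_ds, mouse_ds2, orthologs))  # rule 1
print(intersect(human_ds, mouse_ds, orthologs))   # rule 2
print(intersect(mouse_ds, rat_ds, orthologs))     # rule 3
print(union(mouse_ds, mouse_ds2))
```

Genes without a human ortholog drop out of cross-species intersections in this sketch, which is one consequence of using human genes as the common reference.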

Figure 1.

An example of data integration and gene information page. (A) Functional menu on the head of each page. (B) Data browser by method or dataset. (C) Data integration page. (D) An example of data integration. (E) Detailed information of each gene identified by the data integration (partial page).

Candidate gene selection

The candidate gene selection and prioritization function can be used to select candidate genes for follow up experimental replication or bioinformatics analysis. To make the ERGR data more useful and serve the community more effectively, ERGR provides some candidate gene selection results based on the evidence in multiple datasets either in one organism or multiple organisms. At present, ERGR contains four such candidate gene lists for ethanol response or alcoholism-related traits. The first one is a candidate gene list generated from the datasets of all organisms using human genes as reference. The non-human genes were mapped to human orthologous genes so that they could be compared systematically. There were 42 human genes or their orthologs that had evidence in more than four datasets. The other three candidate gene lists were selected from all the available datasets in one organism (human, mouse or rat) only. These candidate genes include those that have been well studied for alcohol dependence such as ADH, ALDH, GABA receptors, and NPY (4). However, the candidate gene lists also include genes with multiple lines of evidence but not so well studied such as CPE, GFAP, CRYAB, GAD1 and NTRK2, among which two [GAD1 (26) and NTRK2 (27)] had association studies reported. We found three genes (KCNJ9, GNB1 and ATP1A2) that had evidence in at least five datasets from the mouse, rat or fly but no evidence yet in any human datasets.
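Selection by evidence in multiple datasets amounts to counting, for each gene, the number of datasets in which it appears and keeping the genes above a threshold. The sketch below assumes simple unweighted counting, with non-human genes already mapped to human gene identifiers; the function name, dataset names and gene labels are placeholders rather than ERGR data.

```python
from collections import Counter
from typing import Dict, Iterable


def select_candidates(datasets: Dict[str, Iterable[str]],
                      more_than: int) -> Dict[str, int]:
    """Return genes found in more than `more_than` datasets, with counts."""
    counts: Counter = Counter()
    for genes in datasets.values():
        # set() ensures a gene is counted once per dataset.
        counts.update(set(genes))
    return {g: n for g, n in counts.items() if n > more_than}


# Toy datasets with placeholder gene labels.
datasets = {
    "dataset_a": ["G1", "G2"],
    "dataset_b": ["G1", "G3", "G3"],
    "dataset_c": ["G1", "G2"],
}

candidates = select_candidates(datasets, more_than=1)
print(sorted(candidates.items()))
```

With `more_than=4` the same function would express the "evidence in more than four datasets" criterion described above.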

FUTURE PERSPECTIVES

ERGR is a unique database for ethanol-related genes and their annotations. It is freely available to the public and also serves as a core data management system for the local VCU alcohol research community (VCU ARC). We will continue to collect and curate ethanol-related data, especially from genome-wide association studies. We will develop more tools that allow users to customize their gene ranking, track their own data, and present the genes or integration results graphically.

FUNDING

The National Institute on Alcohol Abuse and Alcoholism (R21AA017437, R01AA011408, R01AA014717, U01AA016667 and U01AA016662). Funding for open access charge: R21AA017437. Conflict of interest statement. None declared.

27 in total

1. Nucleotide sequence variation within the human tyrosine kinase B neurotrophin receptor gene: association with antisocial alcohol dependence.

Authors: K Xu; T R Anderson; K M Neyer; N Lamparella; G Jenkins; Z Zhou; Q Yuan; M Virkkunen; R H Lipsky
Journal: Pharmacogenomics J Date: 2007-01-02 Impact factor: 3.550

2. Pooled association genome scanning for alcohol dependence using 104,268 SNPs: validation and use to identify alcoholism vulnerability loci in unrelated individuals from the collaborative study on the genetics of alcoholism.

Authors: Catherine Johnson; Tomas Drgon; Qing-Rong Liu; Donna Walther; Howard Edenberg; John Rice; Tatiana Foroud; George R Uhl
Journal: Am J Med Genet B Neuropsychiatr Genet Date: 2006-12-05 Impact factor: 3.568

3. Identification of susceptibility loci for alcohol-related traits in the Irish Affected Sib Pair Study of Alcohol Dependence.

Authors: Po-Hsiu Kuo; Michael C Neale; Brien P Riley; Bradley Todd Webb; Patrick F Sullivan; Jen Vittum; Diana G Patterson; Dawn L Thiselton; Edwin J van den Oord; Dermot Walsh; Kenneth S Kendler; Carol A Prescott
Journal: Alcohol Clin Exp Res Date: 2006-11 Impact factor: 3.455

4. Glutamate decarboxylase genes and alcoholism in Han Taiwanese men.

Authors: El-Wui Loh; Hsien-Yuan Lane; Chien-Hsiun Chen; Pi-Shan Chang; Li-Wen Ku; Kathy H T Wang; Andrew T A Cheng
Journal: Alcohol Clin Exp Res Date: 2006-11 Impact factor: 3.455

5. Genomewide linkage study in the Irish affected sib pair study of alcohol dependence: evidence for a susceptibility region for symptoms of alcohol dependence on chromosome 4.

Authors: C A Prescott; P F Sullivan; P-H Kuo; B T Webb; J Vittum; D G Patterson; D L Thiselton; J M Myers; M Devitt; L J Halberstadt; V P Robinson; M C Neale; E J van den Oord; D Walsh; B P Riley; K S Kendler
Journal: Mol Psychiatry Date: 2006-06 Impact factor: 15.992

6. WebGestalt: an integrated system for exploring gene sets in various biological contexts.

Authors: Bing Zhang; Stefan Kirov; Jay Snoddy
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971

7. IDconverter and IDClight: conversion and annotation of gene and protein IDs.

Authors: Andreu Alibés; Patricio Yankilevich; Andrés Cañada; Ramón Díaz-Uriarte
Journal: BMC Bioinformatics Date: 2007-01-10 Impact factor: 3.169

8. The UCSC Genome Browser Database: 2008 update.

Authors: D Karolchik; R M Kuhn; R Baertsch; G P Barber; H Clawson; M Diekhans; B Giardine; R A Harte; A S Hinrichs; F Hsu; K M Kober; W Miller; J S Pedersen; A Pohl; B J Raney; B Rhead; K R Rosenbloom; K E Smith; M Stanke; A Thakkapallayil; H Trumbower; T Wang; A S Zweig; D Haussler; W J Kent
Journal: Nucleic Acids Res Date: 2007-12-17 Impact factor: 16.971

9. InParanoid 6: eukaryotic ortholog clusters with inparalogs.

Authors: Ann-Charlotte Berglund; Erik Sjölund; Gabriel Ostlund; Erik L L Sonnhammer
Journal: Nucleic Acids Res Date: 2007-11-30 Impact factor: 16.971

10. The PhenoGen informatics website: tools for analyses of complex traits.

Authors: Sanjiv V Bhave; Cheryl Hornbaker; Tzu L Phang; Laura Saba; Razvan Lapadat; Katherina Kechris; Jeanette Gaydos; Daniel McGoldrick; Andrew Dolbey; Sonia Leach; Brian Soriano; Allison Ellington; Eric Ellington; Kendra Jones; Jonathan Mangion; John K Belknap; Robert W Williams; Lawrence E Hunter; Paula L Hoffman; Boris Tabakoff
Journal: BMC Genet Date: 2007-08-30 Impact factor: 2.797