Phenopedia and Genopedia: disease-centered and gene-centered views of the evolving knowledge of human genetic associations.

W Yu, M Clyne, M J Khoury, M Gwinn.

Abstract

SUMMARY: We developed web-based applications that encourage the exploration of the literature on human genetic associations by using a database that is continuously updated from PubMed. These applications provide user-friendly interfaces for searching summarized information on human genetic associations, using either genes or diseases as the starting point. AVAILABILITY: Phenopedia and Genopedia can be freely accessed at http://www.hugenavigator.net/HuGENavigator/startPagePhenoPedia.do and http://www.hugenavigator.net/HuGENavigator/startPagePedia.do, respectively.

Bioinformatics, 2009. PMID: 19864262; PMCID: PMC2796820; DOI: 10.1093/bioinformatics/btp618; ISSN: 1367-4803


1 INTRODUCTION

Advances in genomic technologies have dramatically boosted research on genetic associations (Kim and Misra, 2007), including genome-wide association studies (Neale and Purcell, 2008). The rapid growth of this field is reflected in the burgeoning number of related publications in publicly accessible databases such as PubMed (http://www.ncbi.nlm.nih.gov/pubmed/). Providing easy, comprehensive and systematic access to published information is a critical first step in synthesizing and translating genomic research data. However, information overload makes retrieving, curating and presenting such data an extremely challenging task. Since 2001, we have systematically collected and curated data on genetic associations retrieved from PubMed and deposited them in a database (Lin et al., 2006). We recently developed a screening program for the genetic association literature that classifies abstracts automatically using a support vector machine (SVM), a supervised machine learning technique. This program significantly increased the recall, specificity and precision of screening (Yu et al., 2008a). Together with the new screening tool, the deployment of HuGE Navigator, a new web-based system for querying and filtering the data (Yu et al., 2008b), makes the database more complete, robust and user-friendly. Here, we present two extensions of HuGE Navigator, Phenopedia and Genopedia: integrated, web-based applications that display comprehensive summaries of published gene–disease associations, organized either by disease or by gene.
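The screening program itself (GAPscreener) is described in Yu et al. (2008a). The sketch below is not that implementation. It is a minimal, hypothetical illustration of SVM-based abstract screening: a linear SVM trained with the Pegasos stochastic subgradient solver on bag-of-words features. The tokenizer, the regularization constant `lam`, the epoch count and the training abstracts are all invented for illustration.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Bag-of-words features: lowercase alphanumeric tokens with their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def score(w, x):
    """Linear decision value w . x over sparse feature dictionaries."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_pegasos(docs, labels, lam=0.01, epochs=50, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty, no bias term) with Pegasos.

    labels: +1 for genetic association abstracts, -1 for everything else.
    """
    rng = random.Random(seed)
    data = [(tokenize(d), y) for d, y in zip(docs, labels)]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos step size
            violated = y * score(w, x) < 1   # margin checked with the current weights
            shrink = 1.0 - eta * lam         # gradient step on the L2 regularizer
            for f in w:
                w[f] *= shrink
            if violated:                     # hinge-loss subgradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 (screen in) or -1 (screen out) for an abstract."""
    return 1 if score(w, tokenize(text)) >= 0 else -1


# Toy training abstracts, invented for illustration (not GAPscreener data)
TRAIN_DOCS = [
    "MTHFR polymorphism associated with stroke risk in case-control study",
    "ACE genotype associated with hypertension risk",
    "genome-wide association study identifies susceptibility loci for diabetes",
    "crystal structure of bacterial membrane protein",
    "surgical outcomes after knee replacement",
    "protein folding kinetics measured by fluorescence",
]
TRAIN_LABELS = [1, 1, 1, -1, -1, -1]
```

A real screener would need many more labeled abstracts, richer features and a held-out evaluation of recall, specificity and precision. Pegasos is used here only because it fits in a few self-contained lines.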

2 IMPLEMENTATION

As components of the integrated knowledge base on human genome epidemiology (HuGE Navigator) (Yu et al., 2008b), Phenopedia and Genopedia were built on J2EE technology (http://java.sun.com/javaee/) and on open source Java frameworks and libraries such as Hibernate (http://www.hibernate.org/), Struts (http://struts.apache.org/) and JCharts (http://jcharts.krysalis.org/). They also use the Google Maps API (http://www.google.com/apis/maps/documentation/). The database contains a curated collection of records retrieved weekly from PubMed since 2001. Each week, an automatic literature screening program (Yu et al., 2008a) screens PubMed for abstracts reporting gene–disease associations. A genetic epidemiologist then selects the abstracts that meet the inclusion criteria and indexes them by gene, category and study type (Lin et al., 2006; Supplementary Table). After staff of the National Center for Biotechnology Information (NCBI) have assigned Medical Subject Headings (MeSH) terms to the abstracts in PubMed, these terms are retrieved using the NCBI E-Utilities (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html) and used to index the records in the HuGE Navigator database. Disease terms in HuGE Navigator include all MeSH terms under the Diseases category of the MeSH vocabulary (http://www.ncbi.nlm.nih.gov/sites/entrez?db=mesh). The Metathesaurus of the Unified Medical Language System (Lindberg et al., 1993) serves as a lookup table for disease-term synonyms. Records from the NCBI Entrez Gene database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=gene) serve as the reference standard for gene information. Data from the Kyoto Encyclopedia of Genes and Genomes (KEGG; Kanehisa and Goto, 2000) populate the pathway information. The detailed schema of the HuGE Navigator database is described by Yu et al. (2007).
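The MeSH-retrieval step can be sketched with Python's standard library. This is a hypothetical illustration, not HuGE Navigator's Java code. It builds an EFetch request URL and parses the `MeshHeading/DescriptorName` elements from PubMed XML. It then keeps only those descriptors whose unique IDs belong to a set of Diseases-category UIs, which the caller is assumed to have loaded from the MeSH vocabulary beforehand. The sample record is invented, and its PMID is a placeholder.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build an EFetch request URL for PubMed records in XML format."""
    params = {"db": "pubmed", "id": ",".join(str(p) for p in pmids), "retmode": "xml"}
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_xml(pmids):
    """Download records (needs network access; observe NCBI usage limits)."""
    with urllib.request.urlopen(efetch_url(pmids), timeout=30) as resp:
        return resp.read().decode("utf-8")


def mesh_descriptors(xml_text):
    """Map each PMID to its list of (descriptor UI, descriptor name) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        names = article.findall(
            "MedlineCitation/MeshHeadingList/MeshHeading/DescriptorName")
        result[pmid] = [(d.get("UI"), d.text) for d in names]
    return result


def disease_terms(descriptors, disease_uis):
    """Keep descriptors whose UI is in a caller-supplied set of MeSH disease UIs."""
    return [name for ui, name in descriptors if ui in disease_uis]


# Invented record in the shape of PubMed EFetch XML; the PMID is a placeholder
SAMPLE_XML = """<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">12345</PMID>
<MeshHeadingList>
<MeshHeading><DescriptorName UI="D020521" MajorTopicYN="Y">Stroke</DescriptorName></MeshHeading>
<MeshHeading><DescriptorName UI="D005838" MajorTopicYN="N">Genotype</DescriptorName></MeshHeading>
</MeshHeadingList>
</MedlineCitation>
</PubmedArticle>
</PubmedArticleSet>"""
```

Separating URL construction from parsing means the parser can be tested offline. Only `fetch_xml` touches the network.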

3 FEATURES

Phenopedia provides a disease-centered view of genetic association studies. Information about genes studied in relation to a particular disease (e.g. stroke) or phenotype (e.g. hypertension) is summarized on the web page in tabular format. To perform a search, the user enters a disease term in the search box. The system then maps the search text onto one or more candidate MeSH disease terms, and the user selects from among them. Search results include: (i) the number of published genetic association studies, including the numbers of meta-analyses and genome-wide association studies (GWAS); (ii) the number of genes studied; (iii) the number of investigators (published authors) in the field; and (iv) temporal and geographic publication trends. A separate table lists genes in descending order of how often they have been studied for association with the disease. For each gene, the table gives the total number of publications and the numbers of meta-analyses, GWAS and gene–environment interaction studies. A link leads to a display of publication trends for that specific gene–disease association (Supplementary Fig. 1A and D). Each number in the table is a hyperlink to a corresponding detailed information page. For example, the publication count links to the results of the relevant query in HuGE Literature Finder, another application in HuGE Navigator. Following the method of Goh et al. (2007), we built a literature-based disease–gene network in which two diseases are ‘connected’ if they have been studied for association with the same gene(s). For a given disease, the display page lists each connected disease together with the genes studied for association with both diseases (Supplementary Fig. 1B). These data are shown in tabular form, without a graphical representation. In addition, genes in the tables can be grouped by KEGG pathway (Supplementary Fig. 1C).
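The ranked gene table and the list of connected diseases both reduce to simple aggregations over indexed records. The sketch below assumes a hypothetical record shape of (PMID, disease, gene, study type) with invented study-type labels and toy data. It is not Phenopedia's actual query logic. The network step follows the projection described by Goh et al. (2007), in which diseases are linked through the genes they share.

```python
from collections import defaultdict

# Hypothetical indexed records: (PMID, MeSH disease, gene symbol, study type)
RECORDS = [
    ("1", "Stroke", "MTHFR", "candidate-gene"),
    ("2", "Stroke", "MTHFR", "meta-analysis"),
    ("3", "Stroke", "APOE", "GWAS"),
    ("4", "Stroke", "ACE", "candidate-gene"),
    ("5", "Hypertension", "ACE", "candidate-gene"),
    ("6", "Hypertension", "AGT", "gene-environment"),
    ("7", "Alzheimer Disease", "APOE", "GWAS"),
]


def rank_genes(records, disease):
    """Genes studied with `disease`, most-published first, with counts by study type."""
    pmids = defaultdict(set)                          # gene -> distinct PMIDs
    by_type = defaultdict(lambda: defaultdict(set))   # gene -> study type -> PMIDs
    for pmid, dis, gene, study_type in records:
        if dis == disease:
            pmids[gene].add(pmid)
            by_type[gene][study_type].add(pmid)
    rows = [
        {
            "gene": gene,
            "publications": len(ids),
            "meta_analyses": len(by_type[gene]["meta-analysis"]),
            "gwas": len(by_type[gene]["GWAS"]),
            "gxe": len(by_type[gene]["gene-environment"]),
        }
        for gene, ids in pmids.items()
    ]
    # Descending publication count; ties broken alphabetically for a stable order
    rows.sort(key=lambda r: (-r["publications"], r["gene"]))
    return rows


def disease_network(records, disease):
    """Diseases sharing at least one studied gene with `disease`, with the shared genes."""
    genes_by_disease = defaultdict(set)
    for _pmid, dis, gene, _type in records:
        genes_by_disease[dis].add(gene)
    target = genes_by_disease.get(disease, set())
    links = {
        other: sorted(target & genes)
        for other, genes in genes_by_disease.items()
        if other != disease and target & genes
    }
    # Diseases with the most shared genes first
    return dict(sorted(links.items(), key=lambda kv: (-len(kv[1]), kv[0])))
```

Swapping the roles of disease and gene in `disease_network` gives the analogous gene–gene connections.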
Where available, the Phenopedia summary page also provides links to major disease-specific databases and to published field synopses (Khoury et al., 2009). Similarly, Genopedia provides a gene-centered summary view of genetic association studies. The system translates a gene name, gene symbol, gene alias or protein name entered by the user into a HUGO gene symbol. Genopedia displays information about the diseases that have been studied in association with a given gene, using a format similar to that of Phenopedia (Supplementary Fig. 2A). A gene–disease network is also generated by defining two genes as ‘connected’ if they have been studied for association with the same disease(s) (Goh et al., 2007). The list of connected genes is displayed along with the disease(s) that connect them (Supplementary Fig. 2B). Each search result page provides links to the foremost gene-centered databases, including NCBI Entrez Gene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=gene), GeneCards (http://www.genecards.org/), PharmGKB (http://www.pharmgkb.org/) and ALFRED (http://alfred.med.yale.edu/). In turn, Genopedia has become a major linkout resource for these gene-centered databases. As of June 2009, the HuGE Navigator database contained 2506 disease MeSH terms and 5456 genes. Phenopedia and Genopedia search results (a list of genes or a list of diseases) can be downloaded as tab-delimited text files. The complete datasets of connections between genes and diseases are also available in this format. Both applications are components of HuGE Navigator, which lets users navigate among all of its components as needed.
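Two of the steps above can be sketched in a few lines: translating user input into an approved gene symbol, and producing a tab-delimited download. The alias table is a tiny hypothetical stand-in. A real table might be built from Entrez Gene or HGNC records. The export column names are also assumptions, not Genopedia's actual file layout.

```python
import csv
import io

# Hypothetical alias table (normalized query -> approved HUGO symbol)
ALIASES = {
    "tp53": "TP53",
    "p53": "TP53",
    "tumor protein p53": "TP53",
    "mthfr": "MTHFR",
    "methylenetetrahydrofolate reductase": "MTHFR",
}


def to_symbol(query, aliases=ALIASES):
    """Map a gene name, symbol, alias or protein name to a symbol; None if unknown."""
    normalized = " ".join(query.lower().split())  # case-fold and collapse whitespace
    return aliases.get(normalized)


def gene_disease_tsv(rows):
    """Serialize (gene, disease, publication count) rows as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "disease", "publications"])  # assumed header names
    writer.writerows(rows)
    return buf.getvalue()
```

Using the `csv` module rather than joining strings by hand means that any field containing a tab or newline is quoted correctly.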

4 CONCLUSIONS

Our goal in developing Phenopedia and Genopedia was to give researchers quick and easy access to up-to-date information on human genetic association studies. This access is meant to facilitate knowledge synthesis, which is the first step in translating new knowledge from basic research into applications for clinical practice and public health (Khoury et al., 2007). These applications do not attempt to quantify specific genotype–phenotype associations. Instead, they provide a starting point for systematic review and for evaluation of associations by meta-analysis or other methods (Ioannidis et al., 2008). Phenopedia and Genopedia also serve as resources for developing field synopses, which are regularly updated summaries of genetic associations in a particular field of research defined by a phenotype or a family of genes (Khoury et al., 2009). Our database differs in several key respects from Online Mendelian Inheritance in Man (OMIM; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM), which focuses on rare disease-causing genetic variants and has only recently begun to include more common diseases. Our database systematically collects population-based genetic association studies of common diseases, regardless of whether they report positive findings. Our applications display search results in tabular format, which is more efficient and user-friendly than the free-text format used by OMIM. The automatic generation of hypothetical disease–gene networks and the integration of pathway information provide additional means of exploring potential hidden connections (Ekins et al., 2007; Goh et al., 2007). Currently, the association data in our database are indexed only at the gene level.
For meta-analyses only, we have experimented with extracting and displaying gene–disease association data at the variant level (see the example in Supplementary Fig. 1E). These data include the published reference, phenotype, number of studies, numbers of cases and controls, contrast, effect size and heterogeneity. In future work, we plan to collect variant-level information systematically and display it on the web in tabular format. We also plan to create application programming interfaces or web services to facilitate integration with other systems.

Conflict of Interest: none declared.
REFERENCES

1. M Kanehisa, S Goto. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000-01-01.

2. Sean Ekins, Yuri Nikolsky, Andrej Bugrim, Eugene Kirillov, Tatiana Nikolskaya. Pathway mapping tools for analysis of high content data. Methods Mol Biol. 2007.

3. Kwang-Il Goh, Michael E Cusick, David Valle, Barton Childs, Marc Vidal, Albert-László Barabási. The human disease network. Proc Natl Acad Sci U S A. 2007-05-14.

4. Benjamin M Neale, Shaun Purcell. The positives, protocols, and perils of genome-wide association. Am J Med Genet B Neuropsychiatr Genet. 2008-10-05.

5. Bruce K Lin, Melinda Clyne, Matthew Walsh, Onnalee Gomez, Wei Yu, Marta Gwinn, Muin J Khoury. Tracking the epidemiology of human genes in the literature: the HuGE Published Literature database. Am J Epidemiol. 2006-04-26.

6. D A Lindberg, B L Humphreys, A T McCray. The Unified Medical Language System. Methods Inf Med. 1993-08.

7. Muin J Khoury, Lars Bertram, Paolo Boffetta, Adam S Butterworth, Stephen J Chanock, Siobhan M Dolan, Isabel Fortier, Montserrat Garcia-Closas, Marta Gwinn, Julian P T Higgins, A Cecile J W Janssens, James Ostell, Ryan P Owen, Roberta A Pagon, Timothy R Rebbeck, Nathaniel Rothman, Jonine L Bernstein, Paul R Burton, Harry Campbell, Anand Chockalingam, Helena Furberg, Julian Little, Thomas R O'Brien, Daniela Seminara, Paolo Vineis, Deborah M Winn, Wei Yu, John P A Ioannidis. Genome-wide association studies, field synopses, and the development of the knowledge base on genetic variation and human diseases. Am J Epidemiol. 2009-06-04.

8. John P A Ioannidis, Paolo Boffetta, Julian Little, Thomas R O'Brien, Andre G Uitterlinden, Paolo Vineis, David J Balding, Anand Chokkalingam, Siobhan M Dolan, W Dana Flanders, Julian P T Higgins, Mark I McCarthy, David H McDermott, Grier P Page, Timothy R Rebbeck, Daniela Seminara, Muin J Khoury. Assessment of cumulative evidence on genetic associations: interim guidelines. Int J Epidemiol. 2007-09-26.

9. Wei Yu, Ajay Yesupriya, Anja Wulf, Junfeng Qu, Muin J Khoury, Marta Gwinn. An open source infrastructure for managing knowledge and finding potential collaborators in a domain-specific subset of PubMed, with an example from human genome epidemiology. BMC Bioinformatics. 2007-11-09.

10. Wei Yu, Melinda Clyne, Siobhan M Dolan, Ajay Yesupriya, Anja Wulf, Tiebin Liu, Muin J Khoury, Marta Gwinn. GAPscreener: an automatic tool for screening human genetic association literature in PubMed using the support vector machine technique. BMC Bioinformatics. 2008-04-22.