Online genetic databases informing human genome epidemiology.

Angela J Frodsham, Julian P T Higgins.

Abstract

BACKGROUND: With the advent of high throughput genotyping technology and the information available via projects such as the human genome sequencing and the HapMap project, more and more data relevant to the study of genetics and disease risk will be produced. Systematic reviews and meta-analyses of human genome epidemiology studies rely on the ability to identify relevant studies and to obtain suitable data from these studies. A first port of call for most such reviews is a search of MEDLINE. We examined whether this could be usefully supplemented by identifying databases on the World Wide Web that contain genetic epidemiological information.
METHODS: We conducted a systematic search for online databases containing genetic epidemiological information on gene prevalence or gene-disease association. In those containing information on genetic association studies, we examined what additional information could be obtained to supplement a MEDLINE literature search.
RESULTS: We identified 111 databases containing prevalence data, 67 databases specific to a single gene and only 13 that contained information on gene-disease associations. Most of the latter 13 databases were linked to MEDLINE, although five contained information that may not be available from other sources.
CONCLUSION: There is no single resource of structured data from genetic association studies covering multiple diseases, and in relation to the number of studies being conducted there is very little information specific to gene-disease association studies currently available on the World Wide Web. Until comprehensive data repositories are created and utilized regularly, new data will remain largely inaccessible to many systematic review authors and meta-analysts.

Year:  2007        PMID: 17610726      PMCID: PMC1929117          DOI: 10.1186/1471-2288-7-31

Source DB:  PubMed          Journal:  BMC Med Res Methodol        ISSN: 1471-2288            Impact factor:   4.615


Background

Following the human genome project [1] and with the increasing efficiency and throughput of genotyping techniques, very high numbers of genetic variants can be examined for predisposition to disease [2]. Vast untapped resources of genotyping data sit in laboratories across the world, unlikely ever to be published owing to a natural tendency to better disseminate the more striking of these findings [3]. As the world of genetics moves into the era of whole genome association studies, the amount of data generated will increase still further [2]. Interpretation of the findings of genetic association studies is problematic, not only due to the selective reporting of findings, but also due to limitations of design, conduct, sample size, suboptimal analysis, and inconsistent findings across studies [4,5]. Systematic reviews and meta-analyses offer valuable means of assembling and synthesising the totality of evidence. They offer maximal power to detect true effects, the highest precision to estimate gene prevalences and gene-disease associations, and enable investigation of differences and inconsistencies across studies. However, when based solely on the available published literature they are dependent on what results have been reported, and publication-related biases may be substantial. To partly overcome this, data can be requested from primary investigators, although lack of response, changes in personnel, lack of access to archived data and unwillingness to share data can hamper such attempts. A preferable approach is collaborative combined analysis by consortia of multiple studies [6]. The Human Genome Epidemiology Network (HuGENet) is promoting meta-analyses of genetic association studies [7], which all, to some extent, depend on information being available about which groups have examined which genetic variants. One means of making genetic association information available is through online databases.
A discussion paper published in 2000 concluded that, although resources for the provision of genomic information on the web were adequate, the availability of genetic epidemiology data was limited. This was in part blamed on the relative youth of the field of genetic epidemiology at the time [8]. Here we present findings from a systematic search for genetic epidemiology data available on the World Wide Web. Our primary motivation was to seek resources that would facilitate thorough systematic reviews or meta-analyses of gene prevalence or genetic association. We were interested both in identification of relevant studies and in availability of data that might not be published in journal articles. For genetic association information we further sought to evaluate the role of online databases as a supplement to information contained in MEDLINE, from the point of view of either a literature-based meta-analysis or the preliminary stages of a collaborative combined analysis.

Methods

We sought databases containing epidemiological information on gene prevalence or genetic association. Prevalence databases were defined as those with information on population prevalence of genetic variants without information on the evidence that such variants are involved in disease susceptibility or progression. Association databases were defined as those containing epidemiological information relating specific genetic variants to specific health or disease outcomes. To identify these we investigated the databases listed in the 2005 Nucleic Acids Research Database issue [9] and those listed on the Centers for Disease Control and Prevention (CDC) Office of Genomics and Disease Prevention website [10]. We supplemented this with a search of the World Wide Web using the Google [11] search engine, using the search term "database (genetics OR genomics)(phenotype OR disease OR epidemiology OR association)" on 14 October 2005. Links from all databases identified were followed to identify further databases. We excluded general purpose reference databases (such as MEDLINE and EMBASE), databases primarily presenting information on genomics or proteomics without information on epidemiological studies, databases providing a resource for families and health care practitioners, and reported databases whose websites were found to be non-functional. We produced a list of prevalence databases, and a list of databases addressing variants of a single gene. Databases including association information on more than one gene were the subject of detailed investigation. We extracted information from these on content, source of data, regularity of update, size of the database, accessibility, search functions, connections to other databases, administration and funding, using a pre-piloted pro forma.
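The inclusion and categorisation rules described above can be expressed as a small decision function. The following is only an illustrative sketch: the `Candidate` record, its field names and the order in which the rules are checked are assumptions made for this example, not the pro forma actually used in the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one website found by the search."""
    name: str
    functional: bool                 # website could be reached
    general_reference: bool          # e.g. MEDLINE, EMBASE
    family_or_practitioner: bool     # resource for families or practitioners
    has_prevalence: bool             # population prevalence of variants
    has_association: bool            # variant-to-outcome epidemiological data
    n_genes: int                     # number of genes covered


def categorise(db: Candidate) -> str:
    """Apply the exclusion criteria first, then assign a category."""
    if not db.functional or db.general_reference or db.family_or_practitioner:
        return "excluded"
    if not (db.has_prevalence or db.has_association):
        # Genomics or proteomics only, no epidemiological information.
        return "excluded"
    if db.n_genes == 1:
        return "single-gene"
    if db.has_association:
        return "association"         # multi-gene: investigated in detail
    return "prevalence"


if __name__ == "__main__":
    example = Candidate("ExampleDB", True, False, False, True, True, 12)
    print(example.name, "->", categorise(example))
```

Only databases returned as "association" would go forward to detailed data extraction under these assumed rules.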
We developed a system of grading the database according to its potential utility within systematic reviews and meta-analyses, as a supplement to a standard search of MEDLINE. This 'Beyond-MEDLINE utility grade' runs from grade 1 for a database that includes only material available in MEDLINE (and therefore would be identified by searching MEDLINE alone) to grade 5 for a database making unpublished data available to the user. The grade definitions are as follows:

1 Nothing novel

Database entries are equivalent to/links to MEDLINE records;

2 Novel information

Database entries are based on MEDLINE records, but with additional qualitative information, or otherwise available data (e.g. a specifically written summary, or results extracted from the cited paper);

3 Novel data

Database entries are based on MEDLINE records, but with additional quantitative information otherwise unavailable (e.g. updated results or unpublished association data);

4 Novel studies

Database enables identification of association studies not mentioned in MEDLINE records (e.g. non-MEDLINE-indexed report of an association study);

5 Novel studies and data

Database enables identification of association studies not mentioned in MEDLINE records AND includes association data from such studies (e.g. grouped data or individual patient data).
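The five grade definitions above form an ordered scale, which can be sketched as an enumeration plus a function that returns the highest grade a database qualifies for. The boolean flag names and the "highest applicable grade wins" rule are assumptions for illustration; the paper does not describe the grading as an algorithm.

```python
from enum import IntEnum


class UtilityGrade(IntEnum):
    """Beyond-MEDLINE utility grade, as defined in the list above."""
    NOTHING_NOVEL = 1
    NOVEL_INFORMATION = 2
    NOVEL_DATA = 3
    NOVEL_STUDIES = 4
    NOVEL_STUDIES_AND_DATA = 5


def assign_grade(
    extra_qualitative: bool,      # e.g. a specifically written summary
    extra_quantitative: bool,     # e.g. updated or unpublished results
    non_medline_studies: bool,    # studies not mentioned in MEDLINE records
    non_medline_study_data: bool, # association data from those studies
) -> UtilityGrade:
    """Return the highest grade whose criteria are met (assumed rule)."""
    if non_medline_studies and non_medline_study_data:
        return UtilityGrade.NOVEL_STUDIES_AND_DATA
    if non_medline_studies:
        return UtilityGrade.NOVEL_STUDIES
    if extra_quantitative:
        return UtilityGrade.NOVEL_DATA
    if extra_qualitative:
        return UtilityGrade.NOVEL_INFORMATION
    return UtilityGrade.NOTHING_NOVEL


if __name__ == "__main__":
    grade = assign_grade(True, False, False, False)
    print(int(grade), grade.name)
```

Using an `IntEnum` keeps the grades comparable, so a threshold such as "grade 3 or above" can be written directly as `grade >= 3`.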

Results

A total of 448 websites were investigated, excluding duplicates. Of these, 257 were excluded, 111 were classed as containing prevalence data, 67 were classed as specific to a single gene, and the remaining 13 were classed as containing information from genetic association studies on more than one gene. These 13 were examined in more detail. Lists of all databases, by category, are available on our website [12]. The prevalence databases contained information on the frequency of genetic variation in multiple genes, often in more than one population. If a database contained information relevant to only a single gene, it was placed in the gene-specific subcategory. The majority of databases in the gene-specific subcategory contained only prevalence data, but some contained information about gene-disease associations, though these were often limited to the rather older field of single gene disorders. Databases containing information on only a single gene were excluded from the utility grade analysis. Thirteen databases contained information on genetic association studies in more than a single gene (Table 1 and Additional file 1). The majority of the extracted databases are freely available to the scientific community, although three (Asthma Gene Database, MedGene and PharmGKB) require users to register in order to use the website. Most databases had entries that were specifically linked to MEDLINE citations, and added little to the information available in the relevant MEDLINE record beyond a summary of key findings. Five databases contained summary results for unpublished data, indications that a particular gene had been analysed, or (in the case of PharmGKB) access to the genotype and phenotype data enabling further analysis.
These five databases of greatest utility in systematic reviews and meta-analyses are, however, restricted to the disease areas of Alzheimer's disease, cardiovascular disease, hereditary inflammation and fever, pharmacogenetics and type 1 diabetes.
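As a quick consistency check on the screening numbers reported above, the four outcome categories should sum to the 448 websites investigated. The short sketch below uses only the counts given in the text; the variable names are chosen for this example.

```python
# Counts taken from the Results text; names are illustrative only.
SCREENED = 448
OUTCOMES = {
    "excluded": 257,
    "prevalence": 111,
    "single-gene": 67,
    "association (more than one gene)": 13,
}

# Every screened website should fall into exactly one category.
assert sum(OUTCOMES.values()) == SCREENED

# Websites retained in any of the three database lists.
included = SCREENED - OUTCOMES["excluded"]  # 191

if __name__ == "__main__":
    for label, n in OUTCOMES.items():
        print(f"{label}: {n} ({n / SCREENED:.1%} of screened)")
    print("retained:", included)
```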
Table 1

A table summarising the key information from the databases identified as containing information on genetic association studies. Further information is available in the Supplementary information section. "No. of Entries" refers to the approximate number of different study reports contained within the specified database.

| Name of Website | Website URL | Brief Description | Host/Funding | No. of Entries | Date of last update | Beyond-MEDLINE Utility Grade | Accessibility | Reference |
|---|---|---|---|---|---|---|---|---|
| Alzgene | | Regularly updated collection of published genetic association studies performed on Alzheimer Disease phenotypes, from database searches and journals' contents lists. Case and control data presented. Performs crude meta-analysis of odds ratios on request. | Various | >1000 | Mar 2006 | 5 | Freely available | [14, 20] |
| Asthma and Allergy Gene Database* | | Database of published studies for phenotypes related to asthmatic disease. Databases of linkage and mutation information include results extracted from peer-reviewed publications. | Institut für Epidemiologie, Munchen | >20,000 | Dec 2003 | 3 | Registered access only | [21, 22] |
| Cytokine Gene Polymorphism in Human Disease | | Regularly updated database with Medline-based records from a systematic review of cytokine gene polymorphisms associated with human disease. Data extracted from two publications about the study. | University of Bristol/Genes and Immunity | >100 | Mar 2002 | 2 | Freely available | [23-25] |
| GDPinfo† | | Extensive information system including large published literature database (currently only Medline records), HuGE reviews, books and various types of reports. | CDC | >15,000 | Mar 2006 | 4 | Freely available | [13, 26] |
| GenAtlas | | Regularly updated database of genes, phenotypes and references. Among numerous databases are brief sections on disorders associated with genes, with lists of citations. May be biased towards statistically significant results. | Université René Descartes, Paris | >60,000 | Feb 2006 | 2 | Freely available | [27] |
| GeneCanvas | | Database of cardiovascular candidate genes and their polymorphisms investigated at INSERM (Paris, France). Data include gene frequencies and linkage disequilibrium statistics. | Inserm | >750 | Oct 2005 | 4 | Freely available | |
| Genetic Association Database | | Database of human genetic association studies of complex diseases and disorders, based on Medline records. Data extracted from publications. | NIH/National Institute on Aging and Center for Information Technology | >8000 | - | 2 | Freely available | [28] |
| Human Obesity Gene Map Database | | Database of obesity-related genes, including P values for association and references. Biased in favour of statistically significant results. | Pennington Biomedical Research Centre, Louisiana State University | >100 | Mar 2005 | 2 | Freely available | [29-34] |
| Infevers | | Database of genetic associations in hereditary inflammatory disorders, with voluntarily submitted entries. Submissions are validated by an editorial board member. | Institut de Génétique Humaine, CNRS/European Union 5th framework | >400 | Aug 2005 | 4 | Freely available | [35] |
| MedGene | | Automated database of gene-disease association studies in Medline. | Harvard Medical School | Unknown | Apr 2005 | 1 | Registered access only | [36] |
| OMIM | | Database of human genes and genetic disorders, containing textual information with links to MEDLINE and sequence records in the Entrez system, and links to additional related resources at NCBI and elsewhere. | Johns Hopkins University | >16000 | Mar 2006 | 2 | Freely available | [37-39] |
| PharmGKB | | Database of genomic data and clinical information from participants in pharmacogenetics research studies. Welcomes submission of primary data. | Stanford University/NIH | >700 | - | 5 | Registered access only | [40] |
| T1DBase | | Database of type 1 diabetes data, including information from collaborating laboratories. Some indication given of unpublished data. | Institute for Systems Biology/JDRF International/JDRF/WT Diabetes and Inflammation laboratory | >200 | Mar 2006 | 4 | Freely available | [41] |

* No longer being updated due to lack of financial support.

† Reports included in the database may contain structured or unstructured data that are not from MEDLINE-indexed papers.
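The utility grades and access conditions in Table 1 can be tallied programmatically, for example to list the databases at grade 3 or above. The sketch below transcribes the grade and accessibility columns of Table 1; the dictionary and variable names are chosen for this example.

```python
from collections import Counter

# Beyond-MEDLINE utility grade for each database, transcribed from Table 1.
TABLE1_GRADES = {
    "Alzgene": 5,
    "Asthma and Allergy Gene Database": 3,
    "Cytokine Gene Polymorphism in Human Disease": 2,
    "GDPinfo": 4,
    "GenAtlas": 2,
    "GeneCanvas": 4,
    "Genetic Association Database": 2,
    "Human Obesity Gene Map Database": 2,
    "Infevers": 4,
    "MedGene": 1,
    "OMIM": 2,
    "PharmGKB": 5,
    "T1DBase": 4,
}

# Databases listed in Table 1 as "Registered access only".
REGISTERED_ONLY = {"Asthma and Allergy Gene Database", "MedGene", "PharmGKB"}

grade_counts = Counter(TABLE1_GRADES.values())
novel_data = sorted(name for name, g in TABLE1_GRADES.items() if g >= 3)
freely_available = sorted(set(TABLE1_GRADES) - REGISTERED_ONLY)

if __name__ == "__main__":
    for grade in sorted(grade_counts):
        print(f"grade {grade}: {grade_counts[grade]} database(s)")
    print("grade >= 3:", ", ".join(novel_data))
```

A reviewer planning a search could filter the same mapping by disease area or accessibility before deciding which resources to query in addition to MEDLINE.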


Discussion

Our study aimed to identify, via a systematic search, the readily identifiable databases that have been set up to disseminate genetic epidemiology information over and above that available via MEDLINE to the scientific community. While many databases have been set up to house information on prevalence of genetic variation, with some notable exceptions little progress has been made in the field of gene-disease association data. In the 13 databases we identified on gene-disease association, all but one provided at least some extra information unavailable via a MEDLINE search alone. However, the seven databases among these that gave access to previously unavailable data (i.e. a utility grade of ≥ 3) clearly include only a small minority of the genetic association studies that exist (for example, Lin et al [13] found over 15,000 articles). The most useful of the databases, i.e. those providing the most previously unavailable information, were considered excellent examples of resources potentially useful in systematic reviews and meta-analyses, but were targeted to particular fields, such as type 1 diabetes, hereditary fever, Alzheimer's disease or pharmacogenetics. The utility of one such database for meta-analyses is demonstrated by a recent paper on Alzheimer's disease [14]. Many of the genetic epidemiology databases cited in the 2000 paper [8] are no longer updated or no longer exist, owing to a lack of financial support. Efforts and funding are needed to facilitate the further development of online repositories that enable the dissemination of all findings into the public domain. Any new repositories will need to provide some assurance of suitable quality control. The Human Genome Epidemiology Network (HuGENet) maintains the Published Literature Database [13], which is currently based on MEDLINE records alone.
We would be keen to see this developed into a more comprehensive resource in the way that the Cochrane Central Register of Controlled Trials attempts to include all clinical trials [15]. Neither database is currently structured to link together reports from the same study. In the wake of the Human Genome Project, with the advent of high throughput genotyping technology, the HapMap project, and now in the era of whole genome association studies, many thousands of genotypes and other data will be generated from epidemiological studies. Only a small minority of these will be reported in traditional journals, and the published literature will continue to provide a potentially biased resource of only the most exciting findings [16]. The Human Genome Epidemiology Network (HuGENet) is committed to encouraging the dissemination of negative findings into the public domain via collaborating with existing journals and setting up on-line journals that will make this process easier. The 'Journal of Negative Results in Biomedicine' published online by BioMed Central [17] has already published several sets of null results of genetic associations and other journals have dedicated subsections for the reporting of null results [18]. We would strongly encourage individual study investigators, and especially consortia of investigators such as those in the HuGENet network of networks [6], to assemble and maintain lists of studies and data repositories. To enable the latter, an approach similar to that of the microarray research community could be adopted for gene-disease association studies: the MIAME (Minimum Information About a Microarray Experiment) guidelines encourage provision of sufficient detail about a microarray experiment for it to be replicated, and offer a format for data to be held in public repositories.
Until such developments, it will continue to be difficult to interpret findings from genetic epidemiological studies easily and to fully include them in rigorous and regularly updated meta-analyses. Since the completion of this study, the National Center for Biotechnology Information (NCBI) have announced a new database called dbGaP specifically to host genotype-phenotype studies [19]. This database appears to be an ideal example of the sort of database for which we were searching and will hopefully, in time if adequately utilised, form an essential resource for those preparing systematic reviews and meta-analyses of gene-disease association studies.

Conclusion

As a result of our systematic search for online repositories of genetic epidemiology data, we found 13 databases containing information on genetic association on more than one gene. On grading each of these with respect to the amount and type of extra data contained compared with a search of MEDLINE, we found seven that contained completely novel data that were previously unavailable (i.e. utility grade ≥ 3). This suggests that systematic reviews and meta-analyses based on published reports could be usefully supplemented with searches of some of these resources. However, the yield of information on the World Wide Web was still disappointingly low, and neither published literature nor online databases appear adequate to find all relevant evidence for inclusion in a comprehensive meta-analysis. We encourage study investigators to make their published and unpublished data available in suitable online repositories. A single resource providing structured data from genetic association studies covering multiple diseases would be invaluable.

Abbreviations

CDC: Centers for Disease Control and Prevention

HuGENet: Human Genome Epidemiology Network

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

AJF participated in the design of the study, carried it out, and drafted the manuscript. JPTH conceived of the study, participated in its design and coordination and helped draft the manuscript. Both authors approved the final manuscript.

Pre-publication history

The pre-publication history for this paper can be accessed here:

Additional file 1

This file lists all 13 extracted databases and gives a more detailed description of each.
References

1.  Initial sequencing and analysis of the human genome.

Authors:  E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; 
H Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal:  Nature       Date:  2001-02-15       Impact factor: 49.962

Review 2.  Cytokine gene polymorphism in human disease: on-line databases.

Authors:  J Bidwell; L Keen; G Gallagher; R Kimberly; T Huizinga; M F McDermott; J Oksenberg; J McNicholl; F Pociot; C Hardt; S D'Alfonso
Journal:  Genes Immun       Date:  1999-09       Impact factor: 2.676

3.  PharmGKB: the Pharmacogenetics Knowledge Base.

Authors:  Micheal Hewett; Diane E Oliver; Daniel L Rubin; Katrina L Easton; Joshua M Stuart; Russ B Altman; Teri E Klein
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

4.  Replication validity of genetic association studies.

Authors:  J P Ioannidis; E E Ntzani; T A Trikalinos; D G Contopoulos-Ioannidis
Journal:  Nat Genet       Date:  2001-11       Impact factor: 38.330

Review 5.  Cytokine gene polymorphism in human disease: on-line databases, supplement 1.

Authors:  J Bidwell; L Keen; G Gallagher; R Kimberly; T Huizinga; M F McDermott; J Oksenberg; J McNicholl; F Pociot; C Hardt; S D'Alfonso
Journal:  Genes Immun       Date:  2001-04       Impact factor: 2.676

6.  Publication bias is a scientific problem with adverse ethical outcomes: the case for a section for null results.

Authors:  P G Shields
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2000-08       Impact factor: 4.254

7.  Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database.

Authors:  Lars Bertram; Matthew B McQueen; Kristina Mullin; Deborah Blacker; Rudolph E Tanzi
Journal:  Nat Genet       Date:  2007-01       Impact factor: 38.330

8.  Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders.

Authors:  Ada Hamosh; Alan F Scott; Joanna Amberger; Carol Bocchini; David Valle; Victor A McKusick
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

9.  Online Mendelian Inheritance in Man (OMIM).

Authors:  A Hamosh; A F Scott; J Amberger; D Valle; V A McKusick
Journal:  Hum Mutat       Date:  2000       Impact factor: 4.878

Review 10.  The human obesity gene map: the 2000 update.

Authors:  L Pérusse; Y C Chagnon; S J Weisnagel; T Rankinen; E Snyder; J Sands; C Bouchard
Journal:  Obes Res       Date:  2001-02