Abstract
BACKGROUND: Although molecular pathway information and data from the International HapMap Project can help biomedical researchers investigate the aetiology of complex diseases more effectively, such information is missing or insufficient in current genetic association databases. In addition, only a few environmental risk factors are included as gene-environment interactions, and the risk measures of associations are not indexed in any association database.

DESCRIPTION: We have developed a published association database (PADB; http://www.medclue.com/padb) that includes both the genetic associations and the environmental risk factors available in the PubMed database. Each genetic risk factor is linked to a molecular pathway database and to the HapMap database through human gene symbols identified in the abstracts. In addition, risk measures such as odds ratios and hazard ratios are extracted automatically from the abstracts when available. Users can therefore review the association data sorted by risk measure, and genetic associations can be grouped by human gene or molecular pathway. Search results can also be saved as tab-delimited text files for further sorting or analysis. Currently, PADB indexes more than 1,500,000 PubMed abstracts that include 3442 human genes, 461 molecular pathways and about 190,000 risk measures ranging from 0.00001 to 4878.9.
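The abstract mentions sorting associations by risk measure and saving search results as tab-delimited text. The following Python sketch shows one way to do both with the standard library; the `Association` record layout, its field names and the placeholder identifiers are hypothetical and are not taken from PADB itself.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Association:
    """One extracted association (hypothetical layout, not PADB's schema)."""
    pmid: str
    gene: str
    measure: str  # e.g. "odds ratio", "hazard ratio"
    value: float


def to_tsv(records: list[Association]) -> str:
    """Sort records by risk measure (highest first) and return them as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["pmid", "gene", "measure", "value"])
    for r in sorted(records, key=lambda rec: rec.value, reverse=True):
        writer.writerow([r.pmid, r.gene, r.measure, r.value])
    return buf.getvalue()


if __name__ == "__main__":
    demo = [
        Association("PMID_1", "GENE_A", "odds ratio", 0.6),
        Association("PMID_2", "GENE_B", "hazard ratio", 3.1),
        Association("PMID_3", "GENE_A", "odds ratio", 2.5),
    ]
    print(to_tsv(demo), end="")
```

Sorting on the raw value places strong protective associations (values near 0) at the bottom; sorting on the absolute value of the logarithm would instead rank risk and protective associations together by strength.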
Year: 2007 PMID: 17877839 PMCID: PMC2039752 DOI: 10.1186/1471-2105-8-348
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Sorting associations by risk measure. PADB automatically extracts odds ratio, hazard ratio, risk ratio and relative risk data when they are reported in a sentence. When a single sentence reports multiple associations, each association is indexed as a separate record.
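Figure 1 describes extracting odds ratios, hazard ratios, risk ratios and relative risks from sentences, with one record per association. A minimal regular-expression sketch of that idea follows. The pattern, the accepted abbreviations and the decision to read a bare "RR" as relative risk are all assumptions for illustration, not the extraction rules PADB actually uses.

```python
import re

# Hypothetical pattern (not PADB's extractor): a risk-measure name (case-insensitive)
# or an upper-case abbreviation, an optional "(OR)"-style abbreviation,
# an optional separator, then the point estimate.
_MEASURE = re.compile(
    r"\b((?i:odds ratio|hazard ratio|risk ratio|relative risk)|OR|HR|RR)\b"
    r"(?:\s*\((?:OR|HR|RR)\))?"
    r"\s*[=:,]?\s*"
    r"(\d+(?:\.\d+)?)"
)

# Map every matched label to one canonical name; reading "RR" as relative risk is an assumption.
_CANONICAL = {
    "odds ratio": "odds ratio", "or": "odds ratio",
    "hazard ratio": "hazard ratio", "hr": "hazard ratio",
    "risk ratio": "risk ratio",
    "relative risk": "relative risk", "rr": "relative risk",
}


def extract_risk_measures(sentence: str) -> list[tuple[str, float]]:
    """Return one (measure, value) record per risk measure found in a sentence."""
    return [
        (_CANONICAL[m.group(1).lower()], float(m.group(2)))
        for m in _MEASURE.finditer(sentence)
    ]
```

Keeping the abbreviations case-sensitive avoids reading the conjunction "or" as an odds ratio; a practical extractor would also need to handle confidence intervals, ranges and many phrasings this pattern ignores.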
Figure 2. Linking genetic risks to molecular pathway and HapMap information. PADB can help biomedical researchers to review and interpret genetic risk factors more effectively along with molecular pathway and HapMap information.
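Figure 2 relies on linking abstracts to genes, and genes to pathways, through human gene symbols. A simplified Python sketch of that linkage follows: it matches whole tokens against a gene-symbol set and groups abstracts by pathway. The symbol set and gene-to-pathway mapping are hypothetical placeholders standing in for the official symbol list and pathway database a real system would load.

```python
import re
from collections import defaultdict

# Hypothetical placeholder tables; a real system would load an official
# gene-symbol list and a pathway database instead of these made-up entries.
GENE_SYMBOLS = {"GENE1", "GENE2", "GENE3"}
GENE_TO_PATHWAYS = {
    "GENE1": ["pathway_A"],
    "GENE2": ["pathway_A", "pathway_B"],
    "GENE3": ["pathway_B"],
}

# Alphanumeric tokens, allowing internal hyphens (e.g. "ABC-1").
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def genes_in(text: str) -> set[str]:
    """Gene symbols from the lookup table that occur as whole tokens in the text."""
    return {tok for tok in _TOKEN.findall(text) if tok in GENE_SYMBOLS}


def group_by_pathway(abstracts: dict[str, str]) -> dict[str, list[str]]:
    """Map each pathway to the sorted IDs of abstracts that mention one of its genes."""
    groups: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in abstracts.items():
        for gene in genes_in(text):
            for pathway in GENE_TO_PATHWAYS.get(gene, []):
                groups[pathway].add(doc_id)
    return {pathway: sorted(ids) for pathway, ids in sorted(groups.items())}
```

Exact token matching is deliberately naive: real gene symbols often collide with ordinary words and abbreviations, so a practical system would need additional filtering.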
Figure 3. Median risk measures of 'risk' and 'protective' associations over the past 25 years. (A) The median strength of risk associations remains stable at around 2.5 (about 0.4 after log10 transformation) despite the exponential increase in published association data over the past 25 years. (B) The median strength of protective associations also remains stable, at around 0.6 (about -0.2 after log10 transformation).
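The log-transformed values in the caption are consistent with base-10 logarithms: log10(2.5) ≈ 0.40 and log10(0.6) ≈ -0.22. The sketch below splits measures into risk (> 1) and protective (< 1) groups per year and reports the log10 of each yearly median. Whether the original analysis took the median before or after the log transform is not stated; this sketch assumes median first, which differs from the median of logged values only when the two middle values of an even-sized group are averaged.

```python
import math
import statistics
from collections import defaultdict


def yearly_log10_medians(records):
    """Median risk (>1) and protective (<1) measures per year, on a log10 scale.

    records: iterable of (year, value) pairs. Values equal to 1 (no association)
    and non-positive values are skipped.
    Returns {year: (log10 median risk or None, log10 median protective or None)}.
    """
    risk, protective = defaultdict(list), defaultdict(list)
    for year, value in records:
        if value <= 0:
            continue  # ratios must be positive to take a logarithm
        if value > 1:
            risk[year].append(value)
        elif value < 1:
            protective[year].append(value)
    result = {}
    for year in sorted(set(risk) | set(protective)):
        r = risk.get(year)
        p = protective.get(year)
        result[year] = (
            math.log10(statistics.median(r)) if r else None,
            math.log10(statistics.median(p)) if p else None,
        )
    return result
```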
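PubMed search results for query terms related to association studies. Each combined-query column gives the number of abstracts retrieved by the initial query that do not contain the column's term.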
| Initial query | Results | Combined: NOT (odds ratio*) | Combined: NOT (relative risk*) | Combined: NOT (hazard ratio*) | Combined: NOT (risk ratio*) | Combined: NOT (case-control) | Combined: NOT (cohort*) | Combined: NOT (associa*) |
|---|---|---|---|---|---|---|---|---|
| odds ratio* | 67,815 | - | 65,047 | 66,998 | 67,238 | 50,804 | 56,612 | 24,427 |
| relative risk* | 32,280 | 29,512 | - | 31,944 | 31,830 | 27,878 | 24,374 | 14,907 |
| hazard ratio* | 8,927 | 8,110 | 8,591 | - | 8,868 | 8,598 | 5,848 | 3,548 |
| risk ratio* | 3,694 | 3,117 | 3,244 | 3,535 | - | 3,377 | 2,814 | 1,841 |
| case-control | 92,264 | 75,253 | 87,862 | 91,935 | 91,947 | - | 83,109 | 50,155 |
| cohort* | 129,128 | 117,925 | 121,222 | 126,049 | 128,248 | 119,973 | - | 75,875 |
| associa* | 1,591,541 | 1,548,153 | 1,574,168 | 1,586,162 | 1,589,688 | 1,549,432 | 1,538,288 | - |
PubMed searches for different keywords related to association studies return largely non-overlapping sets of abstracts. The search for 'odds ratio*' retrieved 67,815 abstracts when it was run on 30 November 2006. Among these, 65,047 (95.9%), 66,998 (98.7%) and 67,238 (99.2%) abstracts did not contain 'relative risk*', 'hazard ratio*' or 'risk ratio*', respectively. Moreover, 50,804 (74.9%) and 56,612 (83.5%) abstracts did not contain 'case-control' or 'cohort*', respectively. By comparison, 75,253 (81.6%) of the 92,264 abstracts containing 'case-control' and 117,925 (91.3%) of the 129,128 abstracts containing 'cohort*' did not contain 'odds ratio*'.
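Each percentage above is a ratio of two table cells, and the overlap between two terms can be computed from either of their rows: |A AND B| = |A| - |A NOT B| = |B| - |B NOT A|. The Python sketch below applies this identity to three of the query terms as a quick consistency check; the dictionary layout and function names are illustrative only.

```python
from itertools import combinations

# Totals and "A NOT B" counts for three query terms, copied from the table above.
TOTAL = {"odds ratio*": 67_815, "case-control": 92_264, "cohort*": 129_128}
A_NOT_B = {
    ("odds ratio*", "case-control"): 50_804,
    ("odds ratio*", "cohort*"): 56_612,
    ("case-control", "odds ratio*"): 75_253,
    ("case-control", "cohort*"): 83_109,
    ("cohort*", "odds ratio*"): 117_925,
    ("cohort*", "case-control"): 119_973,
}


def overlap(a: str, b: str) -> int:
    """Abstracts matching both terms: |A AND B| = |A| - |A NOT B|."""
    return TOTAL[a] - A_NOT_B[(a, b)]


def pct_without(a: str, b: str) -> float:
    """Percentage of A's abstracts that do not contain B, rounded to one decimal."""
    return round(100 * A_NOT_B[(a, b)] / TOTAL[a], 1)


def inconsistent_pairs() -> list[tuple[str, str]]:
    """Pairs whose overlap differs depending on which table row it is computed from."""
    return [(a, b) for a, b in combinations(TOTAL, 2) if overlap(a, b) != overlap(b, a)]
```

For example, 'odds ratio*' and 'case-control' share 67,815 - 50,804 = 17,011 abstracts, and the 'case-control' row gives the same figure (92,264 - 75,253).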
Comparison of association databases (O, supported; X, not supported; Partial, partially supported)
| Database | Coverage: genetic risk factors | Coverage: environmental risk factors | Coverage: all research areas | Search: free text | Search: controlled vocabulary | Content: sample size information | Content: systematic analysis results | Presentation: sorting by risk measures | Presentation: link to HapMap database | Presentation: link to pathway database |
|---|---|---|---|---|---|---|---|---|---|---|
| PADB | O | O | O | O | X | X | X | O | O | O |
| GAD | O | Partial | O | O | X | Partial | X | X | O | O |
| HPLD | O | Partial | O | Partial | O | Partial | X | X | X | X |
| Cochrane | Partial | O | O | O | X | O | O | X | X | X |
| AlzGene | O | X | X | O | X | O | O | X | X | X |
| T1DBase | O | X | X | O | X | X | X | X | X | O |
| PharmGKB | O | X | X | O | Partial | X | X | X | X | O |
| CGEMS | O | X | X | X | X | O | O | X | X | X |
| NINDS | O | X | X | X | X | O | O | X | X | X |
| dbGaP | O | X | X | X | X | O | O | X | X | X |
PADB was compared with other association databases, including those for general associations (GAD, HPLD and the Cochrane Reviews), genetic associations for specific diseases (AlzGene, T1DBase and PharmGKB) and genome-wide associations (CGEMS, NINDS and dbGaP).
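The comparison table can also be queried programmatically. As an illustrative sketch, the Python snippet below encodes the three data-presentation columns, reading "O" as supported and "X" as not supported, and lists the databases that support every requested feature; the names `FEATURES`, `TABLE` and `supporting` are hypothetical.

```python
# Data-presentation columns of the comparison table above ("O" = supported, "X" = not).
FEATURES = ["sorting by risk measures", "link to HapMap database", "link to pathway database"]
TABLE = {
    "PADB":     ["O", "O", "O"],
    "GAD":      ["X", "O", "O"],
    "HPLD":     ["X", "X", "X"],
    "Cochrane": ["X", "X", "X"],
    "AlzGene":  ["X", "X", "X"],
    "T1DBase":  ["X", "X", "O"],
    "PharmGKB": ["X", "X", "O"],
    "CGEMS":    ["X", "X", "X"],
    "NINDS":    ["X", "X", "X"],
    "dbGaP":    ["X", "X", "X"],
}


def supporting(*features: str) -> list[str]:
    """Databases (in table order) marked "O" for every requested feature."""
    idx = [FEATURES.index(f) for f in features]
    return [db for db, row in TABLE.items() if all(row[i] == "O" for i in idx)]
```

For instance, asking for both the HapMap and pathway links returns only PADB and GAD.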