Matthias Samwald, Anja Jentzsch, Christopher Bouton, Claus Stie Kallesøe, Egon Willighagen, Janos Hajagos, M Scott Marshall, Eric Prud'hommeaux, Oktie Hassenzadeh, Elgar Pichler, Susie Stephens.
Abstract
There is an abundance of information about drugs available on the Web. Data sources range from medicinal chemistry results, through the impact of drugs on gene expression, to the outcomes of drugs in clinical trials. These data are typically not connected, which makes it harder to gain insights from them. Linking Open Drug Data (LODD) is a task force within the World Wide Web Consortium's (W3C) Health Care and Life Sciences Interest Group (HCLS IG). LODD has surveyed publicly available data about drugs, created Linked Data representations of the data sets, and identified interesting scientific and business questions that can be answered once the data sets are connected. The task force provides recommendations for best practices in exposing data in a Linked Data representation. In this paper, we present past and ongoing work of LODD and discuss the growing importance of Linked Data as a foundation for pharmaceutical R&D data sharing.
Year: 2011 PMID: 21575203 PMCID: PMC3121711 DOI: 10.1186/1758-2946-3-19
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
The current LODD datasets. Further information about the content and accessibility (URIs, SPARQL endpoints) of these linked datasets can be found online at [27].
| Name | Short Description | Size and coverage (rounded) | Sources | Provider (1. original dataset, 2. RDF version of dataset) |
|---|---|---|---|---|
| DrugBank | Chemical, pharmacological and pharmaceutical drug data; data about drug targets (e.g., sequences, structure, pathways) | 767,000 triples; 4,800 drugs, 2,500 protein sequences | Aggregated from various biomedical and pharmaceutical databases | 1. University of Alberta |
| ClinicalTrials.gov/LinkedCT | Information about clinical trials | 9.8 million triples; 80,000 trials | Data submitted by study sponsors or their representatives | 1. US National Institutes of Health |
| DailyMed | Information about approved prescription drugs, including FDA-approved labels (package inserts) | 164,000 triples; 4,000 drugs | Package inserts, data from the US Food and Drug Administration (FDA) | 1. US National Library of Medicine |
| ChEMBL | Information on drugs, e.g., activity against drug targets such as proteins, chemical properties. Linked to primary literature | 24 million triples; 8,000 drug targets, 660,000 compounds | Aggregated from various biomedical and pharmaceutical databases | 1. European Bioinformatics Institute |
| Diseasome | Characteristics of disorders and disease genes linked by known disease-gene associations | 91,000 triples; 2,600 genes | Generated from data in | 1. Consortium of several labs |
| TCMGeneDIT/RDF-TCM | Gene-disease-drug associations mined from literature about Chinese medicine | 117,000 triples | Mined from research articles | 1. National Taiwan University |
| RxNorm | Prescription drugs, their ingredients, and national drug codes | 7.7 million triples; 166,000 unique drugs and ingredients | FDA databases | 1. US National Library of Medicine |
| UMLS | Unified Medical Language System (UMLS) sources available without restrictions | 55 million triples | Ontologies created by third parties | 1. US National Library of Medicine |
| SIDER | Reported adverse effects of marketed drugs | 193,000 triples; 63,000 adverse effect reports | Mined package inserts | 1. European Molecular Biology Laboratory, Heidelberg |
| STITCH | Molecular interactions between chemicals and proteins | 7.5 million chemicals, 500,000 proteins, 370 organisms | Aggregated from various biomedical and pharmaceutical databases | 1. European Molecular Biology Laboratory, Heidelberg |
| Medicare | The Medicare formulary | 44,500 triples; 6,800 drugs | Primary data | 1. US Government |
| WHO Global Health Observatory | Data and statistics for infectious diseases at country, regional, and global levels. | 354,000 triples | Primary data collected by the World Health Organization | 1. World Health Organization |
Statistics about size and coverage were last checked on March 24, 2011.
Figure 1. A graph of some of the LODD datasets (dark grey), related biomedical datasets (light grey), related general-purpose datasets (white) and their interconnections. Line weights correspond to the number of links. The direction of an arrow indicates the dataset that contains the links, e.g., an arrow from A to B means that dataset A contains RDF triples that use identifiers from B. Bidirectional arrows usually indicate that the links are mirrored in both datasets.
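The arrow convention in Figure 1 can be illustrated with a minimal sketch. It assumes each dataset is available as a list of (subject, predicate, object) triples and that each dataset's identifiers share a known URI prefix. The `example.org` namespaces, the predicate names and the sample triples are hypothetical placeholders, not the actual LODD URIs or data.

```python
from collections import Counter

# Hypothetical namespace prefixes (assumption); real LODD URIs differ.
NAMESPACES = {
    "http://example.org/drugbank/": "DrugBank",
    "http://example.org/sider/": "SIDER",
    "http://example.org/diseasome/": "Diseasome",
}


def dataset_of(uri):
    """Return the dataset whose namespace prefixes the URI, or None."""
    for prefix, name in NAMESPACES.items():
        if uri.startswith(prefix):
            return name
    return None


def count_links(triples_by_dataset):
    """Count directed links between datasets.

    A key (A, B) means dataset A contains triples whose object is an
    identifier from dataset B, i.e. an arrow from A to B. The count
    corresponds to the line weight.
    """
    links = Counter()
    for source, triples in triples_by_dataset.items():
        for _subject, _predicate, obj in triples:
            target = dataset_of(obj)
            if target is not None and target != source:
                links[(source, target)] += 1
    return links


def bidirectional_pairs(links):
    """Return the dataset pairs that are linked in both directions."""
    return {frozenset(pair) for pair in links if (pair[1], pair[0]) in links}


# Made-up sample triples, for illustration only.
EXAMPLE = {
    "DrugBank": [
        ("http://example.org/drugbank/D1", "owl:sameAs",
         "http://example.org/sider/S1"),
        ("http://example.org/drugbank/D2", "owl:sameAs",
         "http://example.org/sider/S2"),
        ("http://example.org/drugbank/D1", "ex:possibleDiseaseTarget",
         "http://example.org/diseasome/X1"),
    ],
    "SIDER": [
        ("http://example.org/sider/S1", "owl:sameAs",
         "http://example.org/drugbank/D1"),
    ],
}

links = count_links(EXAMPLE)
for (source, target), weight in sorted(links.items()):
    print(f"{source} -> {target}: {weight} link(s)")
```

In this sample, DrugBank and SIDER link to each other and would be drawn with a bidirectional arrow, while the DrugBank to Diseasome link is one-way because the Diseasome sample contains no triples pointing back.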
Figure 2. TripleMap (www.triplemap.com) is a web-based application that provides a rich, dynamic, visual interface to integrated RDF datasets such as the LODD. On the left-hand side of the application, a researcher uses an icon-based menu representing biomedical entities such as compounds, diseases and assays to search for entities and view their associations. Entities can be dragged and dropped from the icon menu into the application's zoomable workspace. In the middle of the application, the user navigates maps of entities and their associations in the zoomable workspace, much as users of Google Maps scan and zoom into and out of geographic maps. On the right-hand side of the application, the user can view an integrated set of all available properties for a selected entity. As entities are added to the workspace, the system automatically generates semantically tagged edges between associated entities.