Abstract
BACKGROUND: Drug ontologies could help pharmaceutical researchers overcome information overload and speed the pace of drug discovery, thus benefiting the industry and patients alike. Drug-disease relations, specifically drug-indication relations, are a prime candidate for representation in ontologies. There is a wealth of available drug-indication information, but structuring and integrating it is challenging.
Keywords: Drug indications; Drug information integration; Drug ontologies; Drug-disease relations; UMLS; WHO-ATC
Year: 2017 PMID: 28069052 PMCID: PMC5223332 DOI: 10.1186/s13326-016-0110-0
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 Example USAN raw data
Source contributions of drug-indication data
| source abbrev | source name or description | subset if any | version/date | drug-indication pairs (initial) | drug-indication pairs (filtered) | drug-indication pairs (parsed) |
|---|---|---|---|---|---|---|
| ChEBI | Chemical Entities of Biological Interest Ontology | has_role relations | 104/June 1, 2013 | 16,415 | 8,598 | 8,598 |
| CTD | Comparative Toxicogenomics Database | Chemicals-Diseases Associations, “direct evidence” subset | May 2, 2014 | 82,000 | 81,214 | 81,214 |
| DailyMed | NLM’s database of FDA package inserts | single component title (product name) & Indications sections with tractable text length (<540) | March 20, 2011 | 15,834 | 1,612 | 3,840 |
| DrugBank | U. Alberta open access DB of drug target and other info | title (drug name) & Indications sections | 3.0/2011 | 1,599 | 1,595 | 6,004 |
| MeSH PA | Medical Subject Headings Pharmacologic Action relations | | 2013/Dec. 3, 2012 | 26,293 | 25,847 | 25,908 |
| NDFRT | National Drug File Reference Terminology | may_treat & may_prevent relations | 2009AA (UMLS) | 50,775 | 5,294 | 5,294 |
| PDR | Physicians’ Desk Reference | Section | 2006 | 3,150 | 1,204 | 2,169 |
| USAN_TC | United States Adopted Names Therapeutic Claims | | March 31, 2014 (eVOC) | 6,569 | 5,954 | 7,234 |
| WHO_ATC | World Health Organization Anatomic-Therapeutic-Chemical, Defined Daily Dose index | | 2005 | 16,276 | 7,807 | 9,004 |
| WHO_DD | World Health Organization Drug Dictionary | single generic compounds with ATC codes (minus 2005 WHO-ATC overlap and herbals BNA = “9…”) | Sept. 2013 | 40,736 | 21,764 | 25,674 |
| evoc_ATC | WHO-ATC codes in Merck’s eVOC | single generic compounds with ATC codes (minus WHO-ATC & WHO-DD overlap) | May 6, 2014 | 65,552 | 16,269 | 19,093 |
The numbers refer to candidate drug-indication pairs in the initial raw data extract (initial), after filtering for internal redundancy, relevance, and/or tractability (filtered), and after parsing of free text into single concepts (parsed) as described in the main text. The “filtered” count is the number of unique pairs of raw drug name (DID column D) and indication “entire value/string” (column AQ), while the “parsed” count is the number of unique pairs of raw drug name and indication “target/substring” (column AR). evoc_eProj data are not shown
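The difference between the “filtered” and “parsed” counts can be illustrated with a minimal Python sketch. The rows, field names, and helper functions below are hypothetical and are not taken from the DID itself; the sketch only assumes that each row carries a raw drug name, the entire indication string, and one parsed target substring.

```python
from typing import NamedTuple

class Row(NamedTuple):
    drug: str               # raw drug name
    indication_full: str    # entire indication value/string
    indication_target: str  # one parsed target/substring

# Hypothetical rows: one free-text indication parsed into two targets,
# plus an exact duplicate row for drug B.
rows = [
    Row("drug A", "hypertension and angina", "hypertension"),
    Row("drug A", "hypertension and angina", "angina"),
    Row("drug B", "asthma", "asthma"),
    Row("drug B", "asthma", "asthma"),
]

def count_filtered(rows):
    """Unique (drug, entire indication string) pairs."""
    return len({(r.drug, r.indication_full) for r in rows})

def count_parsed(rows):
    """Unique (drug, indication target substring) pairs."""
    return len({(r.drug, r.indication_target) for r in rows})

print(count_filtered(rows), count_parsed(rows))
```

In this toy input the parsed count exceeds the filtered count because one free-text string yields two targets, which is the same pattern seen for the free-text sources in the table (for example DailyMed, 1,612 filtered versus 3,840 parsed).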
Comparison of sources’ coverage of unique drug names, indication terms, and drug-indication relations after normalization
| Source | % normalized drug | % normalized indication | % normalized drug-indication pairs |
|---|---|---|---|
| CTD | 33 | 49 | 49 |
| MeSH_PA | 27 | 6 | 14 |
| WHO_DD | 28 | 5 | 14 |
| evoc_ATC | 26 | 5 | 11 |
| ChEBI | 17 | 8 | 5 |
| WHO_ATC | 11 | 5 | 5 |
| DrugBank | 6 | 34 | 4 |
| USAN_TC | 23 | 18 | 4 |
| NDFRT | 6 | 16 | 3 |
| DailyMed | 4 | 23 | 2 |
| evoc_eProj | 3 | 6 | 1 |
| PDR | 3 | 5 | 1 |
Percentages are relative to total counts of 25,278 unique normalized drug names, 6,228 unique normalized indication terms, and 167,087 unique normalized drug-indication relations
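As a rough sketch of how such coverage percentages could be computed, the Python snippet below takes each source's share of the union of all unique normalized items. The source names and items are hypothetical, and the rounding to whole percentages is an assumption based on how the table is printed.

```python
def coverage_percent(source_items, all_items):
    """Share of all unique items that a single source covers, in whole percent."""
    return round(100 * len(source_items & all_items) / len(all_items))

# Hypothetical normalized drug names per source.
sources = {
    "S1": {"aspirin", "ibuprofen", "metformin"},
    "S2": {"metformin", "warfarin"},
}

# The denominator is the union of unique items across all sources.
all_items = set().union(*sources.values())

percents = {name: coverage_percent(items, all_items) for name, items in sources.items()}
print(percents)
```

Because sources overlap, the percentages need not sum to 100. In the table above, for instance, the drug column sums to 187.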
Comparison of sources’ overlapping coverage of unique drug names, indication terms, and drug-indication relations before and after normalization
| source | raw drug | raw indic (target) | raw drug-indic pairs | normalized drug | normalized indic | normalized drug-indic pairs | change drug | change indic | change drug-indic pairs |
|---|---|---|---|---|---|---|---|---|---|
| All | 1.64 | 1.30 | 1.02 | 1.87 | 1.80 | 1.14 | 0.23 | 0.50 | 0.12 |
| evoc_ATC | 1.62 | 3.40 | 1.03 | 2.09 | 6.77 | 1.38 | 0.47 | 3.37 | 0.35 |
| WHO_ATC | 3.81 | 3.37 | 1.07 | 4.66 | 6.67 | 1.96 | 0.85 | 3.30 | 0.89 |
| WHO_DD | 1.91 | 3.29 | 1.03 | 2.59 | 6.54 | 1.40 | 0.68 | 3.25 | 0.37 |
| MeSH_PA | 2.81 | 1.33 | 1.03 | 3.08 | 4.54 | 1.43 | 0.27 | 3.21 | 0.40 |
| PDR | 5.56 | 1.60 | 1.17 | 6.14 | 4.68 | 2.32 | 0.58 | 3.08 | 1.15 |
| evoc_eProj | 1.00 | 1.80 | 1.00 | 2.21 | 4.17 | 1.44 | 1.21 | 2.37 | 0.44 |
| ChEBI | 2.51 | 1.14 | 1.01 | 3.11 | 3.40 | 1.72 | 0.60 | 2.26 | 0.71 |
| USAN_TC | 3.03 | 1.47 | 1.02 | 3.34 | 3.30 | 1.81 | 0.31 | 1.83 | 0.79 |
| DailyMed | 4.68 | 1.60 | 1.15 | 5.18 | 2.79 | 1.48 | 0.50 | 1.19 | 0.33 |
| NDFRT | 5.46 | 2.45 | 1.41 | 6.24 | 3.61 | 1.76 | 0.78 | 1.16 | 0.35 |
| DrugBank | 5.63 | 1.49 | 1.20 | 6.24 | 2.62 | 1.70 | 0.61 | 1.13 | 0.50 |
| CTD | 2.41 | 1.53 | 1.03 | 2.62 | 2.06 | 1.08 | 0.21 | 0.53 | 0.05 |
Numbers represent the average number of sources sharing each term or term pair, computed within each source’s coverage. For example, the low outlier raw drug name score of 1.00 for evoc_eProj means that this system shares its raw drug names only with itself, reflecting the use of Merck company codes as preferred terms in its internal data systems. When these are normalized, as much as possible, to public domain generic names, the score rises to 2.21; that is, these generic names representing evoc_eProj content are shared with enough other DID normalized content to push the non-self average from zero up to 1.21 (=2.21–1.00), even though some company codes do not yet have public domain generic names. Data are sorted in descending order of the change indication scores (column 9; “change/indic”) for the individual sources.
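One plausible reading of this score can be sketched in Python: count how many sources contain each term, then average those counts over the terms that a given source covers. The source names and terms below are hypothetical, and the exact computation used for the table is not given in this excerpt, so this should be read as an illustration under those assumptions.

```python
from collections import Counter

def sharing_scores(sources):
    """Average number of sources sharing each term, per source and overall."""
    # How many sources contain each term (a source always counts itself).
    counts = Counter(term for terms in sources.values() for term in terms)
    scores = {
        name: sum(counts[t] for t in terms) / len(terms)
        for name, terms in sources.items()
        if terms
    }
    # "All": average over every unique term in the combined data.
    scores["All"] = sum(counts.values()) / len(counts)
    return scores

# Hypothetical data: source X uses internal codes before normalization.
raw = {
    "X": {"code-1", "code-2"},
    "Y": {"aspirin", "ibuprofen"},
    "Z": {"aspirin"},
}
normalized = {
    "X": {"aspirin", "drug-q"},
    "Y": {"aspirin", "ibuprofen"},
    "Z": {"aspirin"},
}

raw_scores = sharing_scores(raw)
norm_scores = sharing_scores(normalized)
change = {name: norm_scores[name] - raw_scores[name] for name in raw_scores}
print(raw_scores["X"], norm_scores["X"], change["X"])
```

In this toy input, source X scores 1.0 on raw names because its codes appear nowhere else, and rises to 2.0 once one code is mapped to a shared generic name, mirroring the evoc_eProj example in the caption.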
Fig. 2 Zipf distributions of sources’ overlapping coverage of unique drug names, indication terms, and drug-indication relations before and after normalization