Guoqian Jiang, Sunghwan Sohn, Michael T Zimmermann, Chen Wang, Hongfang Liu, Christopher G Chute.
Abstract
Heterogeneous representation of drug data across druggable genome knowledge resources and datasets delays effective cancer therapeutic target discovery within the broad scientific community. This paper describes the challenges and lessons learned from developing and evaluating a standards-based drug normalization framework for cancer druggable genome datasets. Our findings suggest that mechanisms are needed to handle spelling errors and irregularities when normalizing clinical drug data in The Cancer Genome Atlas (TCGA), and that annotations from the NCI Thesaurus (NCIt) and PubChem provide two layers of normalization that could bridge clinical phenotypes and druggable genome knowledge for effective cancer therapeutic target discovery.
Year: 2015 PMID: 26306243 PMCID: PMC4525232
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Examples of multiple drug names mapped to a single NCIt code.
| Dataset | Drug name in dataset | UMLS CUI | NCIt code |
|---|---|---|---|
| DGIdb | SODIUM PHENYLBUTYRATE | C0718066 | 440 |
| | PHENYLBUTYRATE | | |
| | MS-275 | C1510480 | C1863 |
| | ENTINOSTAT | | |
| | SNDX-275 | | |
| | VATALANIB | C0912586 | C1868 |
| | PTK787/ZK 222584 | | |
| TCGA [GBM] | EMD 121974 | C0971473 | C1834 |
| | Cilengitide | | |
| | Bevacizumab | C0796392 | C2039 |
| | Avastin | | |
| | BCNU | C0007257 | C349 |
| | Carmustine | | |
| | Gliadel | | |
| | Carmustin | | |
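
The many-to-one mapping in the table can be sketched as a lookup keyed on a case-folded drug name, with a fuzzy fallback for misspellings such as "Carmustin". This is a minimal illustration and not the framework's actual implementation: the dictionary holds only the TCGA GBM rows shown above, and the helper name `normalize_drug`, the 0.85 similarity cutoff, and the use of Python's `difflib` are all assumptions made for the example.

```python
import difflib

# Synonym -> NCIt code, copied from the TCGA GBM rows of the table.
# Only correctly spelled names are listed; "Carmustin" is omitted on purpose
# so that it exercises the fuzzy fallback below.
NCIT_BY_NAME = {
    "emd 121974": "C1834",
    "cilengitide": "C1834",
    "bevacizumab": "C2039",
    "avastin": "C2039",
    "bcnu": "C349",
    "carmustine": "C349",
    "gliadel": "C349",
}


def normalize_drug(name, cutoff=0.85):
    """Return the NCIt code for a drug name, or None if nothing matches.

    Hypothetical helper: exact case-insensitive lookup first, then the
    closest known synonym whose similarity ratio is at least `cutoff`.
    """
    key = name.strip().lower()
    if key in NCIT_BY_NAME:
        return NCIT_BY_NAME[key]
    close = difflib.get_close_matches(key, list(NCIT_BY_NAME), n=1, cutoff=cutoff)
    return NCIT_BY_NAME[close[0]] if close else None
```

With this sketch, `normalize_drug("Avastin")` and `normalize_drug("Bevacizumab")` resolve to the same code, and `normalize_drug("Carmustin")` falls through to the fuzzy match and resolves to the code for carmustine. A high cutoff is chosen so that unrelated drug names return `None` rather than a wrong code.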
Evaluation results on both DGIdb (for 1835 matched drug names) and TCGA GBM datasets.
| Drug data | precision | recall | F-measure |
|---|---|---|---|
| DGIdb | 100% | – | – |
| TCGA GBM | 99.0% | 51.3% | 67.6% |
| TCGA GBM (after spelling correction) | 99.0% | 98.0% | 98.5% |
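
The F-measure column is consistent with the harmonic mean of precision and recall (the F1 score). The source does not state which formula was used, so treating it as F1 is an assumption; the sketch below simply checks that the reported TCGA GBM figures satisfy it.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1), both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for TCGA GBM, taken from the table above.
before = f_measure(0.990, 0.513)  # before spelling correction
after = f_measure(0.990, 0.980)   # after spelling correction

print(f"{before:.1%}  {after:.1%}")  # prints "67.6%  98.5%"
```

Precision is unchanged by spelling correction, so the gain in F-measure comes entirely from the rise in recall.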