A S M Ashique Mahmood, Shruti Rao, Peter McGarvey, Cathy Wu, Subha Madhavan, K Vijay-Shanker.
Abstract
Tumor molecular profiling plays an integral role in identifying genomic anomalies, which may help in personalizing cancer treatments, improving patient outcomes and minimizing the risks associated with different therapies. However, critical information about the evidence for the clinical utility of such anomalies is largely buried in the biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of information, especially publications that describe the therapeutic implications of biomarkers and are therefore relevant for treatment selection. To improve and speed up the process of manually reviewing and extracting relevant information from the literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). The system relies on the syntactic structure of sentences, coupled with various textual features, to extract relations between genomic anomalies and drug response from MEDLINE abstracts. It achieved precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracts information that helps determine the confidence level of each extraction, to support prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for 'best-fit' therapies and to readily generate hypotheses for new clinical trials.
Year: 2017 PMID: 29261751 PMCID: PMC5738129 DOI: 10.1371/journal.pone.0189663
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Schematic diagram of the developed system.
List of tools used for different entity detections.
| Entity | Tool |
|---|---|
| Gene | Pubtator + PGenN |
| Disease | Pubtator |
| Drug | Pubtator |
| Mutation | DiMeX |
| Copy Number Variation | eGARD |
| Gene expression | eGARD |
| Drug response | eGARD |
Performance of our system in finding the association of a genomic anomaly with drug responses (
| TP | FP | FN | Precision | Recall | F-measure |
|---|---|---|---|---|---|
| 128 | 7 | 21 | 0.95 | 0.86 | 0.90 |
Performance of our system in separating non-relevant abstract from relevant abstracts in InHouseSet2.
| TP | FP | FN | TN | Precision | Recall | F-measure | TNR |
|---|---|---|---|---|---|---|---|
| 28 | 2 | 10 | 60 | 0.93 | 0.74 | 0.82 | 0.97 |
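The reported scores in the two performance tables above can be reproduced from the raw counts using the standard definitions of precision, recall, F-measure and true negative rate (TNR). The following is a minimal sketch, not the authors' evaluation code; the function name `classification_metrics` and the variable names are hypothetical, and it assumes F-measure means the balanced F1 score, which is consistent with the reported values.

```python
from typing import Dict, Optional


def classification_metrics(
    tp: int, fp: int, fn: int, tn: Optional[int] = None
) -> Dict[str, float]:
    """Compute standard metrics from confusion-matrix counts.

    Assumes F-measure is the balanced F1 score (harmonic mean of
    precision and recall). TNR is only computed when tn is given.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    metrics = {"precision": precision, "recall": recall, "f_measure": f_measure}
    if tn is not None:
        # True negative rate (specificity): share of negatives correctly rejected.
        metrics["tnr"] = tn / (tn + fp)
    return metrics


# Counts copied from the anomaly-drug response association table.
association = classification_metrics(tp=128, fp=7, fn=21)
# Counts copied from the InHouseSet2 relevance table.
relevance = classification_metrics(tp=28, fp=2, fn=10, tn=60)

print({name: round(value, 2) for name, value in association.items()})
print({name: round(value, 2) for name, value in relevance.items()})
```

Rounded to two decimals, both sets of values match the table entries, so the published figures are internally consistent with the counts.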
Performance of our system in finding the association between genomic anomaly and drug responses in PharmGKB dataset.
| TP | FN | Recall |
|---|---|---|
| 59 | 17 | 0.78 |
| 50 | 15 | 0.77 |
| 109 | 32 | 0.77 |
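The row labels of the PharmGKB table did not survive, but the counts suggest that the third row pools the first two (59 + 50 = 109 true positives, 17 + 15 = 32 false negatives). The sketch below checks this reading; it is an assumption drawn from the arithmetic, not something stated in the text, and the names `subset_counts` and `recall` are hypothetical.

```python
from typing import List, Tuple


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


# (TP, FN) pairs copied from the first two rows of the PharmGKB table.
subset_counts: List[Tuple[int, int]] = [(59, 17), (50, 15)]

per_row_recall = [round(recall(tp, fn), 2) for tp, fn in subset_counts]

# Pool the counts before dividing (a micro-average), rather than
# averaging the two per-row recall values.
pooled_tp = sum(tp for tp, _ in subset_counts)
pooled_fn = sum(fn for _, fn in subset_counts)
pooled_recall = round(recall(pooled_tp, pooled_fn), 2)

print(per_row_recall, pooled_tp, pooled_fn, pooled_recall)
```

Under this assumption, the pooled counts and recall agree with the third row of the table.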
Characteristics of the extracted results from the large scale run.
The abstracts were retrieved for 50
| Characteristics | Counts |
|---|---|
| Abstracts | 35,677 |
| Abstracts with at least one association | 7,309 |
| Total anomaly-drug response associations | 20,282 |