Sunwon Lee1, Donghyeon Kim1, Kyubum Lee1, Jaehoon Choi1, Seongsoon Kim1, Minji Jeon1, Sangrak Lim1, Donghee Choi1, Sunkyu Kim1, Aik-Choon Tan2, Jaewoo Kang1.
Abstract
As the volume of publications rapidly increases, searching for relevant information from the literature becomes more challenging. To complement standard search engines such as PubMed, it is desirable to have an advanced search tool that directly returns relevant biomedical entities such as targets, drugs, and mutations rather than a long list of articles. Some existing tools submit a query to PubMed and process retrieved abstracts to extract information at query time, resulting in a slow response time and limited coverage of only a fraction of the PubMed corpus. Other tools preprocess the PubMed corpus to speed up the response time; however, they are not constantly updated, and thus produce outdated results. Further, most existing tools cannot process sophisticated queries such as searches for mutations that co-occur with query terms in the literature. To address these problems, we introduce BEST, a biomedical entity search tool. BEST returns, as a result, a list of 10 different types of biomedical entities including genes, diseases, drugs, targets, transcription factors, miRNAs, and mutations that are relevant to a user's query. To the best of our knowledge, BEST is the only system that processes free text queries and returns up-to-date results in real time including mutation information in the results. BEST is freely accessible at http://best.korea.ac.kr.
Year: 2016 PMID: 27760149 PMCID: PMC5070740 DOI: 10.1371/journal.pone.0164680
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Overview of the BEST System.
The BEST system consists of two main parts: Indexing and Searching. (1) “Indexing” represents the indexing subsystem of BEST. For every document, BEST extracts all biomedical entities (1-a) and makes a paired posting (1-b). The basic structure of BEST’s index is similar to that of the inverted index of conventional search engines. However, BEST uses a different indexing unit, paired posting, which is a pair of a document ID and a list of entities that appear in the document. (2) “Searching” represents the search subsystem of BEST. All retrieved paired postings are aggregated to rank the entities (2-a). Ranking scores are computed using four subcomponents described in “Searching and Scoring” in the Methods section (2-b).
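
The paired-posting idea described above can be illustrated with a minimal sketch. This is not the BEST implementation: the toy corpus `DOCS`, the whitespace tokenizer, and the count-based ranking are all assumptions made for illustration, and BEST's actual score uses the four subcomponents described in its Methods section, which are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy corpus. Each document is assumed to be already annotated
# with the entities found in it (entity extraction is out of scope here).
DOCS = {
    1: {"text": "imatinib resistance caused by abl1 t315i",
        "entities": ["ABL1", "T315I", "imatinib"]},
    2: {"text": "abl1 e255k confers imatinib resistance",
        "entities": ["ABL1", "E255K", "imatinib"]},
    3: {"text": "t315i abl1 resistance to imatinib and dasatinib",
        "entities": ["ABL1", "T315I", "imatinib", "dasatinib"]},
    4: {"text": "map2k1 mutation in melanoma",
        "entities": ["MAP2K1", "melanoma"]},
}


def build_index(docs):
    """Map each term to a list of paired postings: (doc_id, entities_in_doc)."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        posting = (doc_id, tuple(doc["entities"]))
        for term in set(doc["text"].lower().split()):
            index[term].append(posting)
    return index


def search(index, query):
    """Rank entities by the number of matching documents that mention them.

    A document matches when it contains every query term. Counting documents
    is a stand-in for the real scoring function.
    """
    terms = query.lower().split()
    if not terms:
        return []
    matched = None
    for term in terms:
        postings = dict(index.get(term, []))  # doc_id -> entities
        if matched is None:
            matched = postings
        else:
            matched = {d: e for d, e in matched.items() if d in postings}
    counts = defaultdict(int)
    for entities in matched.values():
        for entity in set(entities):
            counts[entity] += 1
    # Highest count first; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    idx = build_index(DOCS)
    for entity, n_docs in search(idx, "imatinib resistance"):
        print(entity, n_docs)
```

Because each posting already carries the entity list of its document, ranking needs only one pass over the retrieved postings and no second lookup per document, which is the property the caption attributes to the paired-posting unit.
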
Fig 2. BEST Interface.
Users can pose queries in the query window (a) and select result entity types either by using the drop down box (b) or by clicking on the entity-type filter tab (c). BEST returns a list of entities that are relevant to a user’s query. For each entity in the list, BEST shows a description (d), an interaction network (e), enriched GO terms (f) of the entity, and top 3 abstracts (g) in which the query terms and the entity co-occur.
Search result of query "imatinib resistance ABL1" with type filter "mutations."
| Rank | BEST result | BEST score | ABL1 mutation | Known imatinib binding site in ABL1 | Acquired mutations found in CML patients resistant to imatinib |
|---|---|---|---|---|---|
| 1 | 24.520 | Yes | Yes | Yes | |
| 7.744 | Yes | Yes | Yes | ||
| 5.602 | Yes | Yes | |||
| 2.871 | Yes | Yes | |||
| 2.757 | Yes | Yes | |||
| 2.048 | Yes | Yes | |||
| 1.430 | Yes | Yes | |||
| 1.349 | Yes | Yes | |||
| 1.328 | Yes | Yes | |||
| 1.218 | Yes | Yes | Yes |
Fig 3. BEST’s result of "MAP2K1" with type filter "genes."
Top 10 drugs returned for query "chronic myeloid leukemia."
| | BEST | PolySearch2 | FACTA+ | FDA approved drugs for CML | |
|---|---|---|---|---|---|
| Query response time | 0.024 s | 30 s | 0.01 s | ||
| 1 | |||||
| 2 | |||||
| 3 | |||||
| 4 | Interferon alpha | ||||
| 5 | |||||
| 6 | Bortezomib | ||||
| 7 | Valproic Acid | ||||
| 8 | Glutathione | ||||
| 9 | |||||
| 10 | Fludarabine | Flavopiridol | |||
| Number of retrieved FDA approved drugs for CML | |||||
*Note: Source: http://www.cancer.gov/about-cancer/treatment/drugs/leukemia#7
Accuracy and response time comparison of BEST, PolySearch2, and FACTA+.
| Query | BEST Precision@10 | BEST response time | PolySearch2 Precision@10 | PolySearch2 response time | FACTA+ Precision@10 | FACTA+ response time |
|---|---|---|---|---|---|---|
| “chronic myeloid leukemia” | | 0.024 s | 0.5 | 30 s | 0.3 | 0.01 s |
| “lung cancer” | | 0.116 s | 0.9 | 30 s | 0.4 | 0.09 s |
| “melanoma” | | 0.058 s | 0.1 | 28 s | 0.1 | 0.07 s |
| “tyrosine kinase inhibitor” | | 0.067 s | 0.8 | 45 s | 0.2 | 0.03 s |
| Average | | 0.066 s | 0.58 | 33.25 s | 0.23 | 0.05 s |
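
For readers unfamiliar with the metric, Precision@10 is the fraction of the top 10 returned items that are judged relevant. The sketch below shows the standard computation. The function name, the example lists, and the choice of relevance set are assumptions for illustration, since this record does not state how relevance was judged for each query.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked items that appear in the relevant set.

    The denominator is always k, so a system that returns fewer than k
    results is penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


if __name__ == "__main__":
    # Hypothetical ranking and relevance judgments, not data from the paper.
    ranked = ["drug_a", "drug_b", "drug_c", "drug_d"]
    relevant = {"drug_a", "drug_c"}
    print(precision_at_k(ranked, relevant, k=4))
```
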
Fig 4. Recency evaluation of BEST using "(chronic myeloid leukemia) AND (year:[*—YYYY])" with result type filter “drug.”
Search results of drugs when more weight is given to the recency factor.
| Rank | Power of recency: 0 | Power of recency: 1.0 | Power of recency: 2.0 | Power of recency: 4.0 |
|---|---|---|---|---|
| 1 | Imatinib | Imatinib | Imatinib | Imatinib |
| 2 | Dasatinib | Dasatinib | Dasatinib | Dasatinib |
| 3 | Nilotinib | Nilotinib | Nilotinib | Nilotinib |
| 4 | Interferon α | Interferon α | Bosutinib | Bosutinib |
| 5 | Hydroxyurea | Hydroxyurea | Hydroxyurea | Ponatinib |
| 6 | Busulfan | Busulfan | Ponatinib | Hydroxyurea |
| 7 | Cytarabine | Cyclophosphamide | Busulfan | Cyclophosphamide |
| 8 | Cyclophosphamide | Cytarabine | Fludarabine | Fludarabine |
| 9 | Fludarabine | Bosutinib | Cyclophosphamide | Busulfan |
| 10 | Methotrexate | Fludarabine | Interferon α | Homoharringtonine |
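
One way to picture how a "power of recency" setting could reorder results is to raise a per-document recency factor to that power before summing document contributions per entity. The sketch below is an assumption about the general shape of such a weighting, not the scoring formula used by BEST. The linear recency factor, the `oldest_year` and `current_year` defaults, and the placeholder entity names are all hypothetical.

```python
from collections import defaultdict


def rank_with_recency(hits, power, current_year=2016, oldest_year=1966):
    """Rank entities by summed document scores, each weighted by recency**power.

    hits: iterable of (entity, publication_year, base_score) tuples.
    power = 0 disables the recency weighting, because x ** 0 == 1.
    """
    span = current_year - oldest_year
    scores = defaultdict(float)
    for entity, year, base_score in hits:
        # Hypothetical recency factor: linear in publication year, in [0, 1].
        recency = min(1.0, max(0.0, (year - oldest_year) / span))
        scores[entity] += base_score * recency ** power
    return sorted(scores, key=lambda e: (-scores[e], e))


if __name__ == "__main__":
    # Placeholder hits: one entity with more but older mentions,
    # one with fewer but newer mentions.
    hits = [
        ("older_drug", 1980, 1.0),
        ("older_drug", 1985, 1.0),
        ("older_drug", 1990, 1.0),
        ("newer_drug", 2014, 1.0),
        ("newer_drug", 2015, 1.0),
    ]
    for power in (0, 1.0, 2.0, 4.0):
        print(power, rank_with_recency(hits, power))
```

Under this assumed weighting, entities mentioned mostly in older documents lose ground as the power increases, which matches the qualitative pattern in the table: newer drugs such as Bosutinib and Ponatinib move up while Interferon α moves down.
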
Source databases for BEST dictionary.
| Entity Type | Source Databases | Entity Group |
|---|---|---|
| Gene/Protein | NCBI Entrez Gene | Gene |
| Target | DrugBank | Gene |
| | T3DB | |
| Transcription Factor | Animal TFDB | Gene |
| | Therapeutic Target Database | |
| miRNA | miRBase | Gene |
| Chemical Compound | PubChem | Chem |
| Drug | DrugBank | Chem |
| | US FDA Approved drugs | |
| Toxin | T3DB | Chem |
| Disease | MeSH | Disease |
| Pathway | KEGG Pathway | Pathway |