Yifeng Liu, Yongjie Liang, David Wishart.
Abstract
PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To answer such queries, PolySearch2 searches for associations across comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation.
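As a rough illustration of the 'Given X, find all associated Ys' query pattern, the toy sketch below counts sentence-level co-occurrences between a query term and thesaurus entries, then ranks the candidates and keeps the supporting sentences. Everything in it is an assumption made for illustration: the mini-thesaurus, the example sentences, the function names and the count-based ranking are hypothetical and are not PolySearch2's actual thesaurus, corpus, API or relevancy scoring.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus (canonical term -> synonyms); illustrative only.
THESAURUS = {
    "diabetes": ["diabetes", "diabetes mellitus"],
    "obesity": ["obesity", "obese"],
    "asthma": ["asthma"],
}


def mentions(sentence, synonyms):
    """Return True if any synonym occurs as a whole word or phrase."""
    text = sentence.lower()
    return any(
        re.search(r"\b" + re.escape(s.lower()) + r"\b", text) for s in synonyms
    )


def find_associations(query_synonyms, documents, thesaurus):
    """Rank thesaurus terms by how many sentences also mention the query.

    This plain co-occurrence count is a stand-in, NOT PolySearch2's scoring.
    """
    counts = Counter()
    evidence = {}
    for doc in documents:
        # Naive sentence splitter: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            if not mentions(sentence, query_synonyms):
                continue
            for term, synonyms in thesaurus.items():
                if mentions(sentence, synonyms):
                    counts[term] += 1
                    evidence.setdefault(term, []).append(sentence)
    return counts.most_common(), evidence


# Made-up example text, not taken from any real abstract.
DOCS = [
    "Bisphenol A exposure was linked to obesity in adults. "
    "BPA may also relate to diabetes.",
    "Obese children showed higher bisphenol A levels. Asthma was not assessed.",
]

ranked, evidence = find_associations(["bisphenol a", "bpa"], DOCS, THESAURUS)
for term, hits in ranked:
    print(term, hits, evidence[term][0])
```

Here the synonym lists play the role of a thesaurus, and the stored sentences stand in for the highlighted key sentences mentioned in the abstract.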
Year: 2015 PMID: 25925572 PMCID: PMC4489268 DOI: 10.1093/nar/gkv383
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A screenshot montage of PolySearch2's query interface and result display showing (A) the PolySearch2 query submission form, (B) the advanced option page for further query refinement, (C) the PolySearch2 result overview table and (D) the detailed result page showing the supporting evidence for a single association.
Figure 2. PolySearch2's system overview showing the architecture of PolySearch2 web server, API and the underlying search engine.
Performance evaluation and feature comparison of PolySearch2 versus PolySearch
| | PolySearch | | | | PolySearch2 | | | |
|---|---|---|---|---|---|---|---|---|
| Prediction accuracy | P | R | F | Accu. | P | R | F | Accu. |
| #1 Disease/gene | 0.6533 | 1.0000 | 0.7903 | 0.6533 | 0.8708 | 0.9091 | 0.8895 | 0.8525 |
| #2 Drug/gene | 0.7490 | 1.0000 | 0.8565 | 0.7490 | 0.9701 | 0.8351 | 0.8975 | 0.8571 |
| #3 Protein/protein | 0.8396 | 1.0000 | 0.9128 | 0.8396 | 0.9432 | 0.9326 | 0.9379 | 0.8962 |
| #4 Metabolite/gene | 0.7834 | 1.0000 | 0.8785 | 0.7834 | 0.9579 | 0.8619 | 0.9074 | 0.8614 |
| #5 Drug/adverse effects | - | - | - | - | 0.9233 | 0.8022 | 0.8585 | 0.7737 |
| #6 Toxin/disease | - | - | - | - | 0.9054 | 0.7864 | 0.8417 | 0.7810 |
| #7 Toxin/adverse effects | - | - | - | - | 0.8808 | 0.6822 | 0.7689 | 0.7854 |
| Thesaurus categories | Nine categories | | | | 20 categories | | | |
| Thesaurus terms | 57 706 terms with 353 862 synonyms | | | | 1 131 328 terms with 2 848 936 synonyms | | | |
| Filter words | 7011 | | | | 29 718 | | | |
| Database numbers | One free-text collection and six databases | | | | Six free-text collections and 14 databases | | | |
| Number of search types | 66 query combinations | | | | 308 query combinations | | | |
| Analysis speed | 6.5 documents per second | | | | 165 documents per second | | | |
| Mobile friendly? | No | | | | Yes | | | |
P stands for precision, R for recall, F for F-measure and Accu. for accuracy. All evaluation datasets are available on PolySearch2's download page. Evaluation #1 assesses PolySearch2's ability to identify disease–gene associations. Evaluation #2 assesses its ability to identify drug–gene/protein associations. Evaluation #3 assesses its ability to identify protein–protein interactions. Evaluation #4 assesses its ability to identify metabolite–gene associations. Evaluation #5 assesses its ability to identify drugs with significant adverse effects, or ‘dangerous drugs’. Evaluation #6 assesses its ability to identify toxin–disease associations. Finally, evaluation #7 assesses its ability to identify toxin–adverse effect associations. Analysis speed is calculated based on multiple runs on a query with 10 000 relevant documents.
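The relationship between the P, R and F columns can be checked with a short script. The sketch below assumes that the F-measure is the balanced F1 score, F = 2PR / (P + R). The source does not state which variant was used, but the reported values agree with this formula to within rounding. The function and variable names are illustrative only.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1), assumed here: harmonic mean of P and R."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) copied from the PolySearch2 columns.
POLYSEARCH2_ROWS = {
    "#1 Disease/gene": (0.8708, 0.9091, 0.8895),
    "#2 Drug/gene": (0.9701, 0.8351, 0.8975),
    "#3 Protein/protein": (0.9432, 0.9326, 0.9379),
    "#4 Metabolite/gene": (0.9579, 0.8619, 0.9074),
    "#5 Drug/adverse effects": (0.9233, 0.8022, 0.8585),
    "#6 Toxin/disease": (0.9054, 0.7864, 0.8417),
    "#7 Toxin/adverse effects": (0.8808, 0.6822, 0.7689),
}

# Largest gap between the recomputed and the reported F value.
max_diff = 0.0
for name, (p, r, f_reported) in POLYSEARCH2_ROWS.items():
    f = f_measure(p, r)
    max_diff = max(max_diff, abs(f - f_reported))
    print(f"{name}: computed F = {f:.4f}, reported F = {f_reported:.4f}")

# Throughput ratio from the 'Analysis speed' row (documents per second).
speedup = 165 / 6.5
print(f"PolySearch2 processes about {speedup:.1f}x more documents per second")
```

The same function reproduces the PolySearch columns: with R = 1.0000 and P = 0.6533, for example, it gives 0.7903, as in the table.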