Finn Kuusisto1, John Steill1, Zhaobin Kuang2, James Thomson1,2, David Page2, Ron Stewart1.
Abstract
We present a simple text mining method that is easy to implement, requires minimal data collection and preparation, and is easy to use for proposing ranked associations between a list of target terms and a key phrase. We call this method KinderMiner, and apply it to two biomedical applications. The first application is to identify relevant transcription factors for cell reprogramming, and the second is to identify potential drugs for investigation in drug repositioning. We compare the results from our algorithm to existing data and state-of-the-art algorithms, demonstrating compelling results for both application areas. While we apply the algorithm here for biomedical applications, we argue that the method is generalizable to any available corpus of sufficient size.
Year: 2017 PMID: 28815126 PMCID: PMC5543342
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1: Visual example of KinderMiner, with contingency table and associated Fisher’s Exact Test (FET) analysis of the key phrase “embryonic stem cell” and the target term “NANOG.” Target terms are filtered by significance of co-occurrence with the key phrase and then sorted by the co-occurrence ratio.
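The filter-then-sort procedure described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that do not come from the text above: the test is taken to be one-sided (enrichment only), the co-occurrence ratio is taken to be the fraction of a target term's documents that also mention the key phrase, the significance cutoff `alpha` is an arbitrary placeholder, and the function names, term names, and document counts are hypothetical toy values.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 contingency table

                    key phrase   no key phrase
        target          a             b
        no target       c             d

    Returns the probability of observing at least `a` co-occurrences
    if the term and the phrase were independent (hypergeometric tail).
    """
    row1 = a + b          # documents mentioning the target term
    row2 = c + d          # documents not mentioning it
    col1 = a + c          # documents mentioning the key phrase
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(row2, col1 - x)
               for x in range(a, upper + 1))
    return tail / denom


def rank_targets(counts, key_total, corpus_size, alpha=1e-5):
    """Filter target terms by significance, then sort by co-occurrence ratio.

    counts maps term -> (documents with term, documents with term AND phrase).
    alpha is an assumed cutoff, chosen here only for illustration.
    """
    hits = []
    for term, (term_total, both) in counts.items():
        if term_total == 0:
            continue  # a term that never appears cannot be scored
        a = both
        b = term_total - both
        c = key_total - both
        d = corpus_size - term_total - c
        p_value = fisher_exact_greater(a, b, c, d)
        if p_value < alpha:
            ratio = both / term_total  # assumed definition of the ratio
            hits.append((term, ratio, p_value))
    # Highest co-occurrence ratio first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Hypothetical toy counts: 1000 documents, 50 of which mention the phrase.
    toy_counts = {
        "TERM_A": (40, 30),
        "TERM_B": (200, 10),
        "TERM_C": (60, 20),
    }
    for term, ratio, p_value in rank_targets(toy_counts, 50, 1000):
        print(f"{term}\tratio={ratio:.3f}\tp={p_value:.3g}")
```

The exact tail is computed with integer binomial coefficients from the standard library so the sketch has no third-party dependencies; for a large corpus, a statistics library's Fisher's exact test would likely be a more practical choice.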
Top 20 hits for each of our cell reprogramming queries. Hits that match the landmark papers are highlighted in gray, and hits that match transcription factors predicted by Mogrify are marked with *.
Top 50 drug and device hits for our drug repositioning query against the key phrase “hypoglycemia.” Hits that are diabetes drugs are not highlighted. Hits that are not diabetes drugs, but which are known to decrease blood sugar are highlighted in green, and hits that increase blood sugar are highlighted in red. Hits that are not diabetes drugs, but were also not in our labeled list, are highlighted in gray. Hits that are exact matches to those in Kuang et al. [18] are marked with *.