Finn Kuusisto, Daniel Ng, John Steill, Ian Ross, Miron Livny, James Thomson, David Page, Ron Stewart.
Abstract
Many important scientific discoveries require lengthy experimental processes of trial and error and could benefit from intelligent prioritization based on deep domain understanding. While exponential growth in the scientific literature makes it difficult to keep current in even a single domain, that same rapid growth also presents an opportunity for automated extraction of knowledge via text mining. We have developed a web application that implements the KinderMiner algorithm for proposing ranked associations between a list of target terms and a key phrase. Any key phrase and target term list can be used for biomedical inquiry. We built the web application around a text index derived from PubMed. It is the first publicly available implementation of the algorithm, is fast and easy to use, and includes an interactive analysis tool. The KinderMiner web application is a public resource that offers scientists a cohesive summary of what is currently known about a particular topic within the literature and helps them prioritize experiments around that topic. It performs comparably to or better than similar state-of-the-art text mining tools, is more flexible, and can be applied to any biomedical topic of interest. It is also continually improving through quarterly updates to the underlying text index and through response to suggestions from the community. The web application is available at https://www.kinderminer.org.
Keywords: KinderMiner; Text mining; web application
Year: 2020 PMID: 35083039 PMCID: PMC8756297 DOI: 10.12688/f1000research.25523.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. A diagrammatic example of KinderMiner for the key phrase “embryonic stem cell” and target term “NANOG.”
Figure 2. Users enter a search for a particular key phrase and list of target terms.
Figure 3. Users can dynamically filter the results for a query using the p-value slider.
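The ranking and p-value filtering described in the abstract and in Figures 1 and 3 can be illustrated with a minimal sketch. The record above does not spell out the statistic, so the sketch assumes a one-sided Fisher's exact test on article co-occurrence counts, followed by ranking the surviving targets by their co-occurrence ratio. The function names, the default threshold, and the counts in the usage note are hypothetical and are not taken from the web application.

```python
from math import comb


def fisher_right_tail(both, target_only, key_only, neither):
    """One-sided Fisher's exact test on a 2x2 table of article counts.

    Returns P(co-occurrence >= observed) under the hypergeometric null.
    Assumed statistic; the record above does not name the test used.
    """
    target_total = both + target_only   # articles mentioning the target term
    key_total = both + key_only         # articles mentioning the key phrase
    total = target_total + key_only + neither
    upper = min(target_total, key_total)
    tail = sum(
        comb(target_total, k) * comb(total - target_total, key_total - k)
        for k in range(both, upper + 1)
    )
    return tail / comb(total, key_total)


def rank_targets(counts, key_count, total_articles, p_threshold=1e-5):
    """Filter targets by p-value, then rank by co-occurrence ratio.

    counts maps target term -> (articles with target, articles with both).
    The threshold plays the role of the p-value slider (hypothetical default).
    """
    hits = []
    for term, (target_count, both) in counts.items():
        if target_count == 0:
            continue  # a term that never appears cannot be ranked
        p = fisher_right_tail(
            both,
            target_count - both,
            key_count - both,
            total_articles - target_count - key_count + both,
        )
        if p <= p_threshold:
            hits.append((term, both / target_count, p))
    # Highest fraction of the target's articles that also mention the key phrase first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

As a usage example with made-up counts, `rank_targets({"A": (20, 15), "B": (100, 5)}, key_count=50, total_articles=1000)` keeps only target `A`, because `B` co-occurs with the key phrase about as often as chance would predict. Loosening `p_threshold` corresponds to moving the slider so that weaker associations are shown.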
iPS cell transcription factor search.
Landmark factors are highlighted in blue (duplicates in orange) and the bottom row shows Recall@20. All methods find a sufficient set of factors (POU5F1 and SOX2). Note that KinderMiner and BEST have been censored to articles published through 2004, whereas the other methods have no such censoring, giving them the advantage of access to the landmark papers and more.
| KM-2004 | BEST-2004 | FACTA+ | Polysearch2 |
|---|---|---|---|
| NANOG | POU5F1 | Oct4 [POU5F1] | ESCS |
| UTF1 | LBX1 | OCT4 | OCT3 [POU5F1] |
| POU5F1 | TP53 | Nanog | |
| TCF7 | TBX1 | histone | |
| FOXD3 | GATA1 | insulin | DAZ homolog |
| DNMT3L | FOS | SOX2 | Bladder cancer related protein XHL |
| SOX2 | MYC | alkaline phosphatase | Acetyl-CoA carboxylase biotin holoenzyme |
| PITX3 | STAT3 | NANOG | BMP-2B |
| MYF6 | RUNX1 | collagen | JARID-2 |
| HIF1A | JUN | p53 | FOXD-3 |
| SOX1 | HOXB4 | nestin | E2A/HLF fusion gene |
| PDX1 | HIF1A | CD34 | LIN-41 |
| PAX4 | MSC | cytokine | Epithelial zinc finger protein EZF [KLF4] |
| HOXB3 | PAX3 | leukemia inhibitory factor | APRF |
| HMGA1 | MYF5 | osteogenic | MIRN410 |
| LMO2 | NEUROD1 | catenin | HRIHFB2060 |
| OLIG2 | SOX2 | gut | ERG associated protein with SET domain |
| DNMT1 | PDX1 | erythroid | DMTase |
| RUNX1 | SPI1 | c-Myc | BIG-3 |
| HOXB4 | SP1 | Leukemia inhibitory factor | ER71 |
Cardiomyocyte transcription factor search.
Landmark factors are highlighted in blue and the bottom row shows Recall@20. Note that KinderMiner and BEST have been censored to articles published through 2008, whereas the other methods have no such censoring, giving them the advantage of access to the landmark papers and more.
| KM-2008 | BEST-2008 | FACTA+ | Polysearch2 |
|---|---|---|---|
| GATA4 | HLHS2 | caspase-3 | Adenovirus E4 gene transcription |
| NKX2-5 | NFKB1 | collagen | Apopain |
| TBX18 | AR | angiotensin II | FNDC-5 |
| HDAC9 | JUN | Bcl-2 | ADCAD-1 |
| TBX20 | MSC | ATP | BAG family molecular chaperone |
| NFATC4 | TLX2 | insulin | Cytoplasmic nuclear factor of |
| GATA5 | GATA4 | p38 | APRF |
| TBX5 | TP53 | Ang II | GGF-2 |
| ISL1 | STAT3 | sarcomeric | GATA binding factor 4 |
| HAND2 | PPARA | cardiac muscle | FK506 binding protein 12 rapamycin |
| MEF2C | FOS | cytokine | T box 20 |
| NFATC3 | NR3C2 | natriuretic peptide | 5’-AMP-activated protein kinase |
| HDAC5 | HIF1A | ERK1 | KKLF |
| FOXO3A | IRF6 | myosin heavy chain | T box 5 |
| GATA6 | MEF2A | lactate dehydrogenase | Antigen NY-CO-9 |
| MEF2A | FOSB | endoplasmic reticulum | HMOX-1 |
| ILK | SRF | atrial natriuretic peptide | CASZ-1 |
| SRF | POU5F1 | MAPK | AMPH-2 |
| STAT3 | TBX5 | ATPase | DMDL |
| MSC | PPARG | tumor necrosis factor | NAD-dependent deacetylase sirtuin |
Hepatocyte transcription factor search.
Landmark factors are highlighted in blue and the bottom row shows Recall@20. Note that KinderMiner and BEST have been censored to articles published through 2009, whereas the other methods have no such censoring, giving them the advantage of access to the landmark papers and more.
| KM-2009 | BEST-2009 | FACTA+ | Polysearch2 |
|---|---|---|---|
| HNF4A | NFKB1 | hepatocyte growth factor | Acetyl-CoA carboxylase biotin |
| HNF1A | IRF6 | albumin | HNF-4 |
| HNF1B | TP53 | insulin | ABC16 |
| TCF2 | HNF4A | cytokine | F TCF |
| TCF1 | MYC | c-Met | ABC30 |
| FOXA3 | JUN | collagen | EGF receptor |
| NR1I3 | PPARA | HGF | 5’-AMP-activated protein kinase |
| NR0B2 | ESR1 | epidermal growth factor | AQP-7 |
| FOXA2 | HNF1A | VEGF | APRF |
| NR1I2 | STAT3 | cytochrome P450 | FABP-1 |
| NR1H4 | NR3C1 | alanine aminotransferase | ACT2 |
| IPF1 | FOSB | tumor necrosis factor | HAMP |
| FOXA1 | NR1I2 | scatter factor | Apopain |
| FOXF1 | AHR | endoplasmic reticulum | HGF receptor |
| PBX2 | FOS | Met | C8FW |
| NEUROD1 | PPARG | MET | NR1C1 |
| PROX1 | MBD2 | aspartate aminotransferase | CPE-1 |
| ALF | ONECUT1 | ATP | NTCP |
| PAX4 | HNF1B | IL-6 | KLHL-1 |
| FOXO1A | FKHL16 | caspase-3 | SREBF-1 |
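
The table captions report Recall@20, the fraction of landmark factors that appear among a method's top 20 results. A minimal sketch of that metric follows. The function name and the case-insensitive matching are assumptions, and the landmark set in the example is only the "sufficient set" named in the iPS caption (POU5F1 and SOX2), used for illustration rather than as the full landmark list of the study.

```python
def recall_at_k(ranked, landmarks, k=20):
    """Fraction of landmark terms found in the first k ranked results.

    Matching is exact after upper-casing (an assumption); synonyms such as
    "Oct4 [POU5F1]" would need separate normalization before scoring.
    """
    wanted = {name.upper() for name in landmarks}
    if not wanted:
        raise ValueError("landmarks must not be empty")
    top = {name.upper() for name in ranked[:k]}
    return len(top & wanted) / len(wanted)


# KM-2004 column of the iPS cell table above, in ranked order.
km_2004 = [
    "NANOG", "UTF1", "POU5F1", "TCF7", "FOXD3", "DNMT3L", "SOX2",
    "PITX3", "MYF6", "HIF1A", "SOX1", "PDX1", "PAX4", "HOXB3",
    "HMGA1", "LMO2", "OLIG2", "DNMT1", "RUNX1", "HOXB4",
]

# Illustrative landmark set only: the "sufficient set" from the caption.
illustrative_landmarks = {"POU5F1", "SOX2"}

score = recall_at_k(km_2004, illustrative_landmarks)
```

Because the score depends entirely on which factors are counted as landmarks, a value computed with this illustrative set is not comparable to the Recall@20 figures of the original tables.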