A Simple Text Mining Approach for Ranking Pairwise Associations in Biomedical Applications.

Finn Kuusisto¹, John Steill¹, Zhaobin Kuang², James Thomson¹,², David Page², Ron Stewart¹.

Abstract

We present a simple text mining method that is easy to implement, requires minimal data collection and preparation, and is easy to use for proposing ranked associations between a list of target terms and a key phrase. We call this method KinderMiner, and apply it to two biomedical applications. The first application is to identify relevant transcription factors for cell reprogramming, and the second is to identify potential drugs for investigation in drug repositioning. We compare the results from our algorithm to existing data and state-of-the-art algorithms, demonstrating compelling results for both application areas. While we apply the algorithm here for biomedical applications, we argue that the method is generalizable to any available corpus of sufficient size.

Year: 2017 PMID: 28815126 PMCID: PMC5543342

Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc

Introduction

Many scientific discoveries are often subject to lengthy processes of trial and error before important and meaningful results are found. For example: In biology, determining a set of defined transcription factors for differentiating or reprogramming cell types requires trying numerous combinations from lists of factors. The combinatorial growth of the search space quickly leads to intractability. In medicine, discovering off-label uses of approved drugs can take years of collecting observational data and running post-approval trials. Once again, the search becomes time-consuming due to the enormous number of pairs of drugs and effects. Similarly in medicine, detecting adverse drug events can require extensive observational data to detect potential correlations between drugs and events. Because the search spaces are so large, proper prioritization of research directions in these cases is essential to reaching novel discoveries quickly, but this requires both extensive breadth and depth of knowledge within the domain. Furthermore, due to exponential growth in scientific literature, [1,2] it is becoming continually more challenging to keep up with current knowledge in any particular domain. We present a general text mining approach to address this prioritization problem by ranking a list of target terms (e.g. transcription factors or drugs) by their association with a key phrase (e.g. “embryonic stem cell” or “hypoglycemia”). This list provides researchers with a starting point for entering the literature domain and prioritizing potential research directions, thereby accelerating the discovery process. Our method is easy to implement, requires minimal data collection and preparation, and is easy to use. To produce our ranked list of target terms associated with a key phrase, we leverage the vast collective knowledge available within the published scientific and medical literature. 
We use simple keyword matching and document counting to automatically identify significant correlations and rank them by their co-occurrence proportion. Owing to its simplicity, we call our method KinderMiner. While we can imagine several applications of our approach, we focus our attention on the two former examples given above: determining important transcription factors for cell reprogramming and discovering off-label uses of approved drugs. To assess our approach, we compare rankings produced by our approach with three cell reprogramming tasks that have experimentally proven sets of defined factors from landmark publications. For fairness, we censor the literature in our experiments to publications from roughly two years prior to the relevant landmark publications. We also apply our approach to the task of discovering drugs that may be repurposed for reducing blood glucose. In both cases, we show that our method is able to reproduce sufficient sets of defined factors and many relevant drugs within the top hits, suggesting that our method will likely be useful in accelerating the discovery process.

The KinderMiner Algorithm

Algorithm 1 breaks KinderMiner down step-by-step. At a high level, KinderMiner ranks a list of target terms by their association with a specified key phrase. It does this via keyword matching and document counting within a specified, relevant, searchable text corpus.

Algorithm 1:

The KinderMiner algorithm.

First, KinderMiner requires a large corpus of documents for querying. While we focus on corpora of scientific literature, the corpus could also be a collection of plain text patient records taken from an electronic health record, a Twitter feed, blog posts, or any other large indexed collection of plain text documents. The corpus must be queryable for document counts with exact matching of words and phrases. For evaluation purposes, it is also useful if the document queries can be date censored, reducing counts of documents to only those that have been published within a specified date range. This is not required, however.

Second, the user must specify a list of target terms to be ranked by their association with a specified key phrase. For example, for one of our cell reprogramming applications, we specify a list of transcription factors and rank them by their association with the key phrase “embryonic stem cell.” The goal of this query is to identify the factors necessary for inducing an embryonic stem cell-like state. See Figure 1 for a more visual representation of this set of queries.

Figure 1:

Visual example of KinderMiner, with contingency table and associated Fisher’s Exact Test (FET) analysis of the key phrase “embryonic stem cell” and the target term “NANOG.” Target terms are filtered by significance of co-occurrence with the key phrase and then sorted by the co-occurrence ratio.

Next, for each target term, KinderMiner queries the corpus for documents that contain both, either, and neither the target term and the key phrase, producing a contingency table of document counts. KinderMiner then performs a one-sided Fisher’s exact test on the resultant contingency table, and filters out pairs of target term and key phrase that do not meet a prespecified significance level. KinderMiner uses the one-sided Fisher’s exact test to assess significance only in the direction that there are more articles that contain both key phrase and target. Finally, the selected target terms are ranked by the ratio of documents containing both the target term and the key phrase, over the total of those containing the target term; that is, they are ranked by the proportion of documents containing the target term that also contain the key phrase.

A great deal of work has been devoted to mining the biomedical literature. Our simple approach is related to prior work on co-occurrence statistics and relationship extraction, [3,4] which often constrains search to particular types of relationships or relies on more sophisticated techniques such as part-of-speech tagging and named entity recognition. KinderMiner simply constrains the search space by relying on exact text matches to an input key phrase and target terms. Of course, KinderMiner could almost certainly benefit from NLP techniques such as text normalization and named entity recognition. Nevertheless, our goal with this work is to address whole literature information extraction using the simplest approach we can imagine to rank potential associations, using readily available tools and sources of data, and requiring little to no data annotation or processing. Despite its lack of sophistication, we find that our approach performs well when presented with a large corpus.
In the next two sections, we motivate two different applications, cell reprogramming and drug repositioning respectively, and evaluate the KinderMiner algorithm in the context of these applications. We selected these particular applications not only for their significance to science and medicine, but also because of the availability of reasonable ground truth against which we can compare KinderMiner’s findings.
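
The counting, filtering, and ranking steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `fisher_one_sided`, `make_counter`, and `kinderminer` are hypothetical, the in-memory substring counter stands in for a real indexed corpus, and the one-sided Fisher's exact test is computed directly from the hypergeometric tail using only the standard library.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_one_sided(both, term_only, phrase_only, neither):
    """One-sided Fisher's exact test p-value for a 2x2 table of document counts.

    Returns the probability of seeing at least `both` co-occurring documents
    if the target term and key phrase were independent (hypergeometric tail).
    """
    term_total = both + term_only
    phrase_total = both + phrase_only
    total = both + term_only + phrase_only + neither
    log_denominator = log_comb(total, phrase_total)
    p_value = 0.0
    for k in range(both, min(term_total, phrase_total) + 1):
        p_value += exp(
            log_comb(term_total, k)
            + log_comb(total - term_total, phrase_total - k)
            - log_denominator
        )
    return min(p_value, 1.0)


def make_counter(corpus):
    """Toy stand-in for a searchable corpus (assumption: a list of strings).

    The returned function counts documents containing every given term;
    called with no terms, it returns the corpus size.
    """
    def count_docs(*terms):
        return sum(all(t.lower() in doc.lower() for t in terms) for doc in corpus)
    return count_docs


def kinderminer(targets, key_phrase, count_docs, alpha=1e-5):
    """Filter target terms by significance, then rank by co-occurrence ratio."""
    total = count_docs()
    phrase_total = count_docs(key_phrase)
    hits = []
    for term in targets:
        term_total = count_docs(term)
        if term_total == 0:
            continue  # ratio is undefined for terms that never appear
        both = count_docs(term, key_phrase)
        p_value = fisher_one_sided(
            both,
            term_total - both,
            phrase_total - both,
            total - term_total - phrase_total + both,
        )
        if p_value <= alpha:
            # Proportion of documents with the term that also contain the phrase.
            hits.append((term, both / term_total, p_value))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

In practice `count_docs` would wrap queries to an indexed corpus rather than scan strings, and a statistics library could replace the hand-written test; the structure of the loop would stay the same.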

Cell Reprogramming Applications

An increasingly common task in modern biology is the process of taking cells of one type and reprogramming them to exhibit the characteristics of another cell type. Reprogramming in this case often involves introducing a set of transcription factors that put the source cells on track to behave like a different target cell type. A particularly important example of reprogramming is that of somatic cells to an induced pluripotent stem (iPS) cell as iPS cells behave like embryonic stem cells, wherein they have the potential to differentiate into nearly all fetal or adult cell types. [5,6,7,8] Reprogramming can also be accomplished through transdifferentiation, which is when one somatic cell type is directly converted into another somatic cell type. [9] Reprogramming is important because researchers often need particular cell types to create models, study the effects of disease, develop therapies, or perform basic science, but primary cells of certain types are not always available in abundant quantities, if at all. Altering the expression of transcription factors is also useful in the maturation of cells. For instance, methods exist for differentiating and culturing immature hepatocytes, the main cells of the liver responsible for metabolism of drugs and toxins, but these cells are difficult to mature. Immature hepatocytes cannot serve as reasonable surrogates for hepatocyte function, drug toxicity, or metabolism. Recent publications [10,11] describe methods for partial maturation of hepatocytes using transcription factors. For similar reasons, having methods for differentiating cardiomyocytes, muscle cells of the heart, is useful, and transcription factor sets for differentiating cells into cardiomyocytes have recently been described. [12,13] Determining a set of important transcription factors for converting one cell type into another is, however, a challenging task that involves a great deal of domain expertise as well as trial and error. 
There are roughly 2,000 transcription factors to choose from, [14] and researchers must rely on their reading of the literature and intuition to decide which combinations to try and in what order. This search is time consuming, and we propose that our algorithm can assist researchers by accelerating the trial and error process. Instead of trying combinations from the entire list of transcription factors based on intuition, researchers can prioritize their experiments by exploring a much smaller number of possible combinations from only the top ranked factors provided by our algorithm. To demonstrate our algorithm in this domain, we refer to three well-established sets of factors for reprogramming. The first is for creating induced pluripotent stem cells (iPS cells), the second is for creating cardiomyocytes, and the third is for the maturation of hepatocytes. We use our algorithm to mine scientific and medical literature and rank a list of transcription factors by correlation with the key phrases “embryonic stem cell,” “cardiomyocyte,” and “hepatocyte.” We then compare the top hits in each list with the experimentally determined factors known to produce cells representative of these cell states. For fairness, we censor the literature available to our algorithm by roughly two years in advance of the earliest publications that demonstrate these conversions.

Drug Repurposing Application

Despite increases in R&D spending, the biopharmaceutical industry has struggled to improve cost and throughput of de novo drug discovery. [15] Due to advances in key technologies and the increasing availability of data, drug repositioning, the detection of new uses for existing drugs, has become more feasible. [16] Furthermore, repositioned drugs do not require a costly development process and can reach clinical trials much faster than traditionally developed drugs. These advantages have led repositioned drugs to constitute approximately 30% of drugs and vaccines newly approved by the US Food and Drug Administration. [17] There have been several computational drug repositioning (CDR) approaches proposed. Computational methods often rely on heterogeneous data sources containing genetic and phenotypic information, drug molecular structure, electronic health records, or plain-text literature as we do here. [16,18,19] We propose that our algorithm is a useful addition to the CDR toolbox, despite being far simpler than other methods. To demonstrate our algorithm in this domain, we focus on the task of identifying drugs that may reduce blood glucose. We use our algorithm to mine the literature and rank a list of drugs and devices by correlation with the key phrase “hypoglycemia” (i.e. low blood sugar). We manually assess how well our method is able to identify drugs and devices that are specifically used to treat diabetes in the top hits, and then assess the potential of those top hits that are not specifically for treatment of diabetes. We do not censor the date for this task.

Materials and Methods

For our experiments, we used the Europe PMC (EPMC) corpus. [20] We implemented our queries with EPMC’s RESTful API, using the profile search module with counts taken from the ALL publication type. We form our queries using quoted, exact matches for both the target terms and key phrases, and we use the FIRST_PDATE parameter to censor publication year from 1900 through the specified year. For example, a query for co-occurrence of the term NANOG and key phrase “embryonic stem cell,” censored to the end of 2004, would appear as follows:

"NANOG" AND "embryonic stem cell" AND (FIRST_PDATE:[1900-01-01 TO 2004-12-31])

At time of writing, the EPMC corpus contains a total of approximately 27.5 million publications. Approximately 20 million of the articles were published during or before 2008 and 17 million were published during or before 2004.

For our cell reprogramming applications, we query our lab’s list of 2,243 transcription factors against the key phrases “embryonic stem cell,” “cardiomyocyte,” and “hepatocyte.” We use a one-sided FET p-value threshold of 1 × 10⁻⁵. We collect the top 20 transcription factors from each of these queries and use two standards for comparison. First, we search our top factors for factors from landmark publications that have previously been shown experimentally to reprogram somatic cells to iPS cells, cardiomyocytes, and to partially mature hepatocytes. Specifically, the relevant factors we consider for iPS cells are MYC, KLF4, LIN28, NANOG, POU5F1, and SOX2. [6,7,8] The relevant factors we consider for cardiomyocytes are GATA4, HAND2, MEF2C, NKX2-5, and TBX5. [12,13] The relevant factors for hepatocyte maturation are GATA4, HNF1A, FOXA3, FOXA2, HNF4A, CEBPB, and MYC. [10,11] Second, we identify our top selected transcription factors that are also indicated as being relevant by the Mogrify algorithm, a state-of-the-art algorithm to predict transcription factors for reprogramming between several cell types. [21]

Mogrify starts from gene expression data to score differentially expressed genes between cell types of interest and background expression levels. It then combines these differential expression scores with regulatory network information to rank transcription factors in each cell type by regulatory influence. Finally, Mogrify selects optimal sets of transcription factors with the greatest regulatory influence over differentially expressed genes in a given target cell type in comparison to a given starting cell type. Importantly, Mogrify requires a large amount of processed data that may not be readily available and would be costly and time-prohibitive to collect. For the Mogrify comparison, we collect the complete lists of predicted transcription factors from http://www.mogrify.net. For the iPS cell comparison, we use the conversion between dermal fibroblast and H9 embryonic stem cells. For the cardiomyocyte comparison, we use the conversion between dermal fibroblast and heart - adult. For the hepatocyte comparison, we use the conversion between dermal fibroblast and liver - adult.

For the iPS cell queries, we censor the publication date range through the end of the year 2004. This time frame roughly corresponds to two years prior to the first publications on direct reprogramming in mouse cells. [6] For the cardiomyocyte queries, we censor the publication date range through the end of the year 2008, which also corresponds to two years prior to the first major publications on cardiomyocyte reprogramming in mice. [12] We censor to the year 2009 for the hepatocyte applications as it corresponds to roughly two years prior to the first major publication on induction of functional hepatocytes from mouse fibroblasts. [10]

To evaluate our algorithm on the drug repositioning application, we query the same list of 2,609 drugs and devices used by Kuang et al. [18] against the key phrase “hypoglycemia” (low blood glucose). Again, we use a p-value threshold of 1 × 10⁻⁵.
Note that we use an exact match of drug names in this case (e.g. Glucotrol and Glucotrol XL are treated as different) even though there may be multiple names for the same drug. To evaluate our method, we first manually annotate the top 50 hits as either advertised specifically to treat diabetes or not. We then compare those that were not identified as diabetes drugs to a curated list of drugs[22] that are known to cause hypoglycemia, hyperglycemia, or both, reporting those correctly and incorrectly identified as reducing blood glucose. Finally, we mark the top hits that also match hits in the full list of drugs and devices predicted to reduce blood glucose by the state-of-the-art approach proposed by Kuang et al. using electronic health records. [18] Kuang et al. extend the self-controlled case series model[23] to handle continuous numeric responses. The self-controlled case series, which has been widely used for detecting adverse drug events, divides patient time-course data into control and risk periods corresponding to periods before and after exposure to a drug. Patients thus serve as their own control cases and relative incidence of adverse events can be measured in the control and risk periods. Importantly, this approach requires a large amount of time-course electronic health record data, which is difficult to acquire.
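
The quoted, date-censored co-occurrence queries described in this section can be assembled programmatically. The sketch below is illustrative only: `build_query` and `build_url` are hypothetical helper names, and the URL targets Europe PMC's general `search` endpoint with `format` and `pageSize` parameters as an assumption, whereas the experiments here used the profile module. It builds strings only and performs no network requests.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; the experiments above used the profile module.
EPMC_SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(terms, last_year=None, first_year=1900):
    """Join exact-match (quoted) terms with AND, optionally date-censored.

    `last_year` censors publications to first_year-01-01 through
    last_year-12-31 using the FIRST_PDATE field.
    """
    parts = [f'"{term}"' for term in terms]
    if last_year is not None:
        parts.append(f"(FIRST_PDATE:[{first_year}-01-01 TO {last_year}-12-31])")
    return " AND ".join(parts)


def build_url(query):
    """URL-encode a query for a JSON response with a minimal page size."""
    return EPMC_SEARCH_URL + "?" + urlencode(
        {"query": query, "format": "json", "pageSize": 1}
    )


# Example: co-occurrence of NANOG and the key phrase, censored through 2004.
example_query = build_query(["NANOG", "embryonic stem cell"], last_year=2004)
example_url = build_url(example_query)
```

Each target term needs three such queries (term alone, key phrase alone, and both together) plus one query for the censored corpus size; together these fill the contingency table.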

Results

Table 1(a) shows the top 20 ranked transcription factors from our query using a list of 2,243 transcription factors and the key phrase “embryonic stem cell,” censored to a publication date range through 2004. Factors that match the landmark papers for producing iPS cells are highlighted gray, and factors that match Mogrify’s list of predicted factors are marked with *. Note that our naive approach is able to reproduce a sufficient list of factors (NANOG, POU5F1, and SOX2) for direct reprogramming [24] in the top 12 hits. Additionally, five of the top 20 match the list of 70 factors produced by Mogrify.

Table 1:

Top 20 hits for each of our cell reprogramming queries. Hits that match the landmark papers are highlighted in gray, and hits that match transcription factors predicted by Mogrify are marked with *.

Table 1(b) shows the top 20 ranked hits from our query using a list of transcription factors and the key phrase “cardiomyocyte,” censored to a publication date range through 2008. Again, factors that match the landmark papers for direct reprogramming to cardiomyocytes are highlighted in gray, and factors that match Mogrify’s list of predicted factors are marked with *. Similar to the iPS cell query, our approach reproduces the complete list of early published transcription factors in the first nine hits, and nine of the top 20 hits match the list of 57 factors predicted by Mogrify. Table 1(c) shows the top 20 ranked hits from our query using a list of transcription factors and the key phrase “hepatocyte,” censored to a publication date range through 2009. Again, factors that match the landmark papers for direct reprogramming to hepatocytes are highlighted in gray, and factors that match Mogrify’s list of predicted factors are marked with *. KinderMiner successfully reproduces four of the six factors for maturation from the landmark literature, and nine of the top 20 hits match the 27 predicted by Mogrify. Table 2 shows the top 50 drugs and devices ranked by our method as being relevant to hypoglycemia (low blood sugar). Drugs that are advertised specifically to treat diabetes are not highlighted. The highlighted drugs are not specifically advertised to treat diabetes. Drugs highlighted green are labeled as drugs that may reduce blood sugar, drugs highlighted red may increase blood sugar, and drugs highlighted gray are not present in our labeled list. [22]

Table 2:

Top 50 drug and device hits for our drug repositioning query against the key phrase “hypoglycemia.” Hits that are diabetes drugs are not highlighted. Hits that are not diabetes drugs, but which are known to decrease blood sugar are highlighted in green, and hits that increase blood sugar are highlighted in red. Hits that are not diabetes drugs, but were also not in our labeled list, are highlighted in gray. Hits that are exact matches to those in Kuang et al. [18] are marked with *.

Perhaps unsurprisingly, 43 of our top 50 hits are specifically for treatment of diabetes, due in part to the abundance of diabetes drugs and various brand names thereof. We note that the hit premeal is likely a result of correlation to premeal insulin. These 43 hits are a positive result as they suggest that our method successfully finds relevant correlations, but the more interesting hits are those that are not diabetes drugs as our goal is to reposition drugs. Of the seven hits that are not specifically diabetes drugs, Zestoretic, Avalide, and Demadex have been shown to potentially increase blood glucose, whereas Zebeta, Tiazac, and Calan SR have been shown to potentially decrease blood glucose. Tequin is not in our labeled list. It is an antibiotic that has been shown to increase patient risk of dysglycemia (either hypoglycemia or hyperglycemia).[25] Overall, in all of our evaluation tasks, our method finds numerous relevant hits and demonstrates overlap with the results of far more sophisticated methods designed specifically for the separate tasks presented.

Conclusions and Future Work

In this work, we present a simple and general text mining method for predicting pairwise associations between a key phrase and target terms. We demonstrate the use of this method for identifying transcription factors that are important for three cell reprogramming tasks and for discovering candidate drugs for alternative uses. In both of our application domains, we find that KinderMiner identifies numerous relevant hits and overlaps with state-of-the-art methods designed specifically for each domain. In historically censored searches of factors for reprogramming cell states, KinderMiner highly ranks transcription factors that would, years later, be shown to be important for reprogramming to cell states of interest, thus providing a short, ordered list of candidates for biologists that would have greatly simplified the challenging combinatorial task they faced. Importantly, the domain-specific approaches require domain-specific data, whereas KinderMiner only requires an indexed text corpus. We argue that our method is a valuable new tool that can be used to help prioritize research directions despite its naiveté. Furthermore, we anticipate that our method may prove valuable in domains other than biomedicine by mining other large plain text corpora. We view the simplicity of KinderMiner as a strength, but this simplicity also leads to limitations. For example, KinderMiner does not explicitly implement any actual natural language processing. Thus, terms like the transcription factor T (Brachyury) are likely to match many articles that do not reference the T gene, but may in fact be matches to middle initials or similar. We do not observe this particular phenomenon in our lists of top 20 hits presented here, but we anticipate that this may be a problem for other queries.
While we believe there is value in the simplicity of our method, we expect that the addition of techniques such as text normalization and named entity recognition may help alleviate this issue and, therefore, propose it as future work. Furthermore, we observe that some of our queries have low total counts of articles for sorting by ratio. For example, THRAP1 in Table 1(b) counts a total of 15 articles that contain the term, four of which contain both the term and key phrase. This may pose a greater challenge when using smaller corpora, or when querying terms or key phrases that are relatively new within the literature. A query that counts a total of four articles, three of which have both term and key phrase may be ranked well by ratio, but is unlikely to actually represent compelling evidence of association. In general, there will always be a horizon of discovery defined by the quantity of published literature for particular key phrases and target terms, but we will explore the use of thresholding, pseudocounts, and other Bayesian approaches to modulate the rank of such cases in future work. Finally, we note that constructing a search engine around large corpora is non-trivial. We were fortunate with our applications in that Europe PMC offers a web API on which we built KinderMiner, but not all corpora will afford such convenience. We do not propose any specific suggestions for how to address this issue, but instead expect that time will assist with the continued democratization of search tools (e.g. Apache Lucene and SOLR). We anticipate that the availability of easy-to-use software packages will continue to grow, and we propose evaluating applications of KinderMiner using such software on open data as future work.
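
To make the low-count concern concrete, the following sketch shows one way thresholding and pseudocounts could temper the ranking ratio. It is purely illustrative and not a method evaluated in this work: the function names and the default values (`min_docs=10`, one prior co-occurrence, nine prior non-co-occurrences) are arbitrary assumptions chosen for the example.

```python
def passes_threshold(term_total, min_docs=10):
    """Keep only target terms that appear in at least `min_docs` documents."""
    return term_total >= min_docs


def smoothed_ratio(both, term_total, prior_hits=1.0, prior_misses=9.0):
    """Co-occurrence ratio with additive pseudocounts.

    The pseudocounts act like imaginary prior documents, pulling ratios
    based on very few articles toward prior_hits / (prior_hits + prior_misses).
    """
    return (both + prior_hits) / (term_total + prior_hits + prior_misses)


# Three of four articles and 300 of 400 articles share the raw ratio 0.75,
# but the smoothed ratio favors the pair with more supporting documents.
sparse = smoothed_ratio(3, 4)
well_supported = smoothed_ratio(300, 400)
```

With these assumed defaults, a term seen in only four articles would also be dropped by `passes_threshold`, while the THRAP1 example above (15 articles) would be retained with a reduced ratio.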

22 in total

Review 1. A survey of current work in biomedical text mining.

Authors: Aaron M Cohen; William R Hersh
Journal: Brief Bioinform Date: 2005-03 Impact factor: 11.622

Review 2. Frontiers of biomedical text mining: current progress.

Authors: Pierre Zweigenbaum; Dina Demner-Fushman; Hong Yu; Kevin B Cohen
Journal: Brief Bioinform Date: 2007-10-30 Impact factor: 11.622

Review 3. Forcing cells to change lineages.

Authors: Thomas Graf; Tariq Enver
Journal: Nature Date: 2009-12-03 Impact factor: 49.962

Review 4. Literature mining, ontologies and information visualization for drug repurposing.

Authors: Christos Andronis; Anuj Sharma; Vassilis Virvilis; Spyros Deftereos; Aris Persidis
Journal: Brief Bioinform Date: 2011-06-28 Impact factor: 11.622

Review 5. A survey of current trends in computational drug repositioning.

Authors: Jiao Li; Si Zheng; Bin Chen; Atul J Butte; S Joshua Swamidass; Zhiyong Lu
Journal: Brief Bioinform Date: 2015-03-31 Impact factor: 11.622

6. Computational Drug Repositioning Using Continuous Self-Controlled Case Series.

Authors: Zhaobin Kuang; James Thomson; Michael Caldwell; Peggy Peissig; Ron Stewart; David Page
Journal: KDD Date: 2016-08

Review 7. Totipotency, pluripotency and nuclear reprogramming.

Authors: Shoukhrat Mitalipov; Don Wolf
Journal: Adv Biochem Eng Biotechnol Date: 2009 Impact factor: 2.635

8. Outpatient gatifloxacin therapy and dysglycemia in older adults.

Authors: Laura Y Park-Wyllie; David N Juurlink; Alexander Kopp; Baiju R Shah; Therese A Stukel; Carmine Stumpo; Linda Dresser; Donald E Low; Muhammad M Mamdani
Journal: N Engl J Med Date: 2006-03-01 Impact factor: 91.245

9. A predictive computational framework for direct reprogramming between human cell types.

Authors: Owen J L Rackham; Jaber Firas; Hai Fang; Matt E Oates; Melissa L Holmes; Anja S Knaupp; Harukazu Suzuki; Christian M Nefzger; Carsten O Daub; Jay W Shin; Enrico Petretto; Alistair R R Forrest; Yoshihide Hayashizaki; Jose M Polo; Julian Gough
Journal: Nat Genet Date: 2016-01-18 Impact factor: 38.330

10. Europe PMC: a full-text literature database for the life sciences and platform for innovation.

Authors:
Journal: Nucleic Acids Res Date: 2014-11-06 Impact factor: 16.971
