
TIN-X: target importance and novelty explorer.

Daniel C Cannon¹, Jeremy J Yang¹, Stephen L Mathias¹, Oleg Ursu¹, Subramani Mani¹, Anna Waller², Stephan C Schürer³, Lars Juhl Jensen⁴, Larry A Sklar²,⁵, Cristian G Bologa¹, Tudor I Oprea¹.

Abstract

MOTIVATION: The increasing number of peer-reviewed manuscripts requires the development of dedicated mining tools to facilitate the visual exploration of evidence linking diseases and proteins.
RESULTS: We developed TIN-X, the Target Importance and Novelty eXplorer, to visualize the association between proteins and diseases, based on text mining data processed from scientific literature. In the current implementation, TIN-X supports exploration of data for G-protein coupled receptors, kinases, ion channels, and nuclear receptors. TIN-X supports browsing and navigating across proteins and diseases based on ontology classes, and displays a scatter plot with two proposed new bibliometric statistics: Importance and Novelty.
AVAILABILITY AND IMPLEMENTATION: http://www.newdrugtargets.org. CONTACT: cbologa@salud.unm.edu.

Year: 2017 PMID: 28398460 PMCID: PMC5870731 DOI: 10.1093/bioinformatics/btx200

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Science builds upon past discoveries, traditionally communicated through the scientific literature. However, the growth of traditional and alternative publication modes has exceeded the limits of human processing (Hunter and Cohen, 2006), creating the need for computer-assisted methods. Text mining, ontologies and interactive visualization are among the emerging technologies that can alleviate this information overload. We present a new method and software tool that uses these technologies to provide biomedical scientists with rankings and visualizations linking proteins and diseases. The information is derived from PubMed abstracts, text mined with previously published tools for named entity recognition (NER) of gene/protein and disease names (Pletscher-Frankild et al.). We propose two new bibliometric indices based on NER mentions, specifically devised for research-planning use cases. Our goal is to enable scientists to identify research subjects that have sufficient evidence of importance but are not yet well studied, with a focus on target prioritization for drug discovery. Novelty estimates the scarcity of publications about a protein target (see also Equation 1). Importance estimates the strength of the association between that protein target and a specific disease (see also Equation 2). By visualizing Novelty and Importance on a 2D plot, users can readily examine the strength of the evidence linking protein targets and diseases of interest. TIN-X is a web application that allows users to navigate targets via the Drug Target Ontology (DTO, http://drugtargetontology.org) and diseases via the Disease Ontology (DO) (Kibbe et al.). In its current implementation, TIN-X allows users to query, browse and display disease-target associations for the following protein families: ion channels, G-protein coupled receptors (GPCRs), nuclear receptors (NRs) and kinases.
These plots are interactive, with built-in drill-down and link-out functionality for in-depth examination of selected targets. TIN-X was inspired by and developed for the Illuminating the Druggable Genome (IDG) Knowledge Management Center (KMC, http://targetcentral.ws), which aggregates and integrates protein-centric information, and seeks to identify and prioritize understudied genes and proteins for further investigation and validation as potentially novel drug targets.

2 Methods and implementation

TIN-X is implemented in Scala using the Lift web framework (http://liftweb.net). The back end uses a PostgreSQL database derived from the IDG-KMC knowledge base known as the Target Central Research Database (TCRD) (Nguyen et al.). TCRD compiles data from over 50 genomic, proteomic and drug-centric sources. It relies on the Disease Ontology and on tools developed within the IDG-KMC network, namely DTO and disease-gene/protein mapping performed by a highly efficient dictionary-based NER application (Pafilis et al.) with dictionaries from the DISEASES database (Pletscher-Frankild et al.). Data are extracted, transformed, aggregated and loaded into TCRD, which is updated quarterly. All PubMed abstracts are downloaded from the NCBI FTP site. Bibliometric statistics and the derived Novelty (N) and Importance (I) scores are pre-computed for runtime performance, using the following formulae:

$$N_i = \left( \sum_{k:\, i \in k} \frac{1}{T_k} \right)^{-1} \qquad (1)$$

$$I_{ij} = \sum_{k:\, i, j \in k} \frac{1}{T_k D_k} \qquad (2)$$

where T_k and D_k are the numbers of targets and diseases mentioned in abstract k, respectively; the summation runs over all publications that mention target i and, for Importance, also disease j. Fractional counts are employed to reflect the strength of association. For example, if an abstract mentions three targets once, each target receives a fractional count of 1/T_k = ⅓.
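As a minimal sketch of how such fractional counts can be accumulated, the Python code below computes Novelty as the reciprocal of a target's summed counts 1/T_k over the abstracts that mention it, and Importance as the sum of 1/(T_k·D_k) over the abstracts that mention both the target and the disease. It assumes, hypothetically, that each abstract has already been reduced to its sets of distinct target and disease identifiers; the `Abstract` record and `tinx_scores` function are illustrative names, not part of TIN-X or the TCRD schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Abstract:
    """Hypothetical record of NER hits for one PubMed abstract k."""
    pmid: int
    targets: FrozenSet[str]   # distinct targets in abstract k; T_k is its size
    diseases: FrozenSet[str]  # distinct diseases in abstract k; D_k is its size


def tinx_scores(
    abstracts: Iterable[Abstract],
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Return (novelty, importance) computed from fractional counts."""
    frac_count = defaultdict(float)  # target i -> sum over k of 1/T_k
    importance = defaultdict(float)  # (target i, disease j) -> sum over k of 1/(T_k * D_k)
    for a in abstracts:
        t, d = len(a.targets), len(a.diseases)
        for i in a.targets:  # body never runs when t == 0
            frac_count[i] += 1.0 / t
            for j in a.diseases:  # never runs when d == 0, so no division by zero
                importance[(i, j)] += 1.0 / (t * d)
    # Novelty is the reciprocal of the summed fractional counts: fewer papers, higher N.
    novelty = {i: 1.0 / s for i, s in frac_count.items()}
    return novelty, dict(importance)
```

With two toy abstracts, one mentioning targets T1, T2 and T3 with disease D1 and one mentioning only T1 with diseases D1 and D2, target T1 gets a Novelty of 1/(1/3 + 1) = 0.75, and the pair (T1, D1) gets an Importance of 1/3 + 1/2 = 5/6.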

3 Application example

In Figure 1, TIN-X displays targets for a selected disease, ‘glucose intolerance’. The DO hierarchy can be navigated in the left panel, and the targets are plotted on log–log Importance–Novelty axes. Searching by disease name and by target name (with auto-suggest) is supported. In the current layout, targets with stronger associations appear in the upper part of the plot, while targets with more publications appear on the left side. In general, the most interesting associations lie on the upper-right boundary of the plot, where data points represent non-dominated solutions to the multi-objective optimization problem of maximizing both Importance and Novelty. Targets can be filtered by Target Development Level, with Tclin representing mode-of-action drug targets (Santos et al.) and Tdark representing understudied proteins (Nguyen et al.). Targets can also be filtered by protein superfamily (e.g. GPCRs). Links out to Pharos (targets) and DO (diseases) are provided for each association. Figure 2 illustrates the result of mouse-over actions for the target from Figure 1, ‘Chemokine-like receptor 1’ (CMKLR1), which links to 6 publications associating CMKLR1 and glucose intolerance. Titles and citations are displayed. Clicking on each article reveals the full abstract and a link out to the complete PubMed record, where users can examine the available evidence.
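The upper-right boundary can be made concrete with a standard Pareto (non-dominance) test. The sketch below, using the hypothetical function names `dominates` and `non_dominated`, keeps every target whose (Novelty, Importance) pair is not beaten by another target on both axes. It is a simple quadratic check written for clarity; it is not necessarily how TIN-X computes or highlights this boundary.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, float]  # (novelty, importance), both to be maximized


def dominates(q: Point, p: Point) -> bool:
    """True if q is at least as good as p on both axes and strictly better on one."""
    return q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])


def non_dominated(points: Dict[str, Point]) -> Set[str]:
    """Return the targets on the Pareto front when maximizing Novelty and Importance."""
    return {
        name
        for name, p in points.items()
        # A point never dominates itself, so no self-exclusion is needed;
        # exact duplicates do not dominate each other and are both kept.
        if not any(dominates(q, p) for q in points.values())
    }
```

Because the logarithm is monotonically increasing, applying it to both axes does not change which points are non-dominated, so the same set forms the boundary on the log–log plot.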

Fig. 1. Screenshot of TIN-X. The disease, glucose intolerance, was queried. Mouse-over displays information about a specific data point, in this case, CMKLR1 (Color version of this figure is available at Bioinformatics online.)

Fig. 2. Screenshot of TIN-X. Clicking on a selected point shows the papers in which the target protein (CMKLR1) might be relevant to the disease (glucose intolerance).

4 Conclusions and future directions

TIN-X provides an interactive visualization, ranking and prioritization platform for scientists interested in exploring potentially novel drug targets and in examining the relationships between diseases, disease categories, proteins and protein classes, using automated text mining of the biomedical literature. As with all current text-mining approaches, TIN-X cannot replace expert human readers and curators; yet it is increasingly clear that automated bibliometry is essential given the accelerating pace and volume of publication. TIN-X is an important interactive tool in the IDG project, and future plans include adding functionality related to known drugs and drug classes, as well as including additional protein classes. More information about Tclin, Tdark and other target development levels can be found at http://www.nature.com/nrd/posters/druggablegenome/

Funding

This work was supported by the National Institutes of Health [U54 CA189205-01], and the Novo Nordisk Foundation [NNF14CC0001]. Conflict of Interest: none declared.

References

1. Biomedical language processing: what's beyond PubMed?

Authors: Lawrence Hunter; K Bretonnel Cohen
Journal: Mol Cell Date: 2006-03-03 Impact factor: 17.970

2. DISEASES: text mining and data integration of disease-gene associations.

Authors: Sune Pletscher-Frankild; Albert Pallejà; Kalliopi Tsafou; Janos X Binder; Lars Juhl Jensen
Journal: Methods Date: 2014-12-05 Impact factor: 3.608

3. The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text.

Authors: Evangelos Pafilis; Sune P Frankild; Lucia Fanini; Sarah Faulwetter; Christina Pavloudi; Aikaterini Vasileiadou; Christos Arvanitidis; Lars Juhl Jensen
Journal: PLoS One Date: 2013-06-18 Impact factor: 3.240

4. Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data.

Authors: Warren A Kibbe; Cesar Arze; Victor Felix; Elvira Mitraka; Evan Bolton; Gang Fu; Christopher J Mungall; Janos X Binder; James Malone; Drashtti Vasant; Helen Parkinson; Lynn M Schriml
Journal: Nucleic Acids Res Date: 2014-10-27 Impact factor: 16.971

5. Pharos: Collating protein information to shed light on the druggable genome.

Authors: Dac-Trung Nguyen; Stephen Mathias; Cristian Bologa; Soren Brunak; Nicolas Fernandez; Anna Gaulton; Anne Hersey; Jayme Holmes; Lars Juhl Jensen; Anneli Karlsson; Guixia Liu; Avi Ma'ayan; Geetha Mandava; Subramani Mani; Saurabh Mehta; John Overington; Juhee Patel; Andrew D Rouillard; Stephan Schürer; Timothy Sheils; Anton Simeonov; Larry A Sklar; Noel Southall; Oleg Ursu; Dusica Vidovic; Anna Waller; Jeremy Yang; Ajit Jadhav; Tudor I Oprea; Rajarshi Guha
Journal: Nucleic Acids Res Date: 2016-11-29 Impact factor: 16.971

6. A comprehensive map of molecular drug targets.

Authors: Rita Santos; Oleg Ursu; Anna Gaulton; A Patrícia Bento; Ramesh S Donadi; Cristian G Bologa; Anneli Karlsson; Bissan Al-Lazikani; Anne Hersey; Tudor I Oprea; John P Overington
Journal: Nat Rev Drug Discov Date: 2016-12-02 Impact factor: 84.694
