Literature evidence in open targets - a target validation platform.

Şenay Kafkas1,2, Ian Dunham3,4, Johanna McEntyre3,4.   

Abstract

BACKGROUND: We present the Europe PMC literature component of Open Targets, a target validation platform that integrates various evidence to aid drug target identification and validation. The component identifies target-disease associations in documents from the Europe PMC literature database and ranks the documents by confidence, using rules that draw on expert-provided heuristic information. The confidence score of a given document represents how valuable the document is in the scope of target validation for a given target-disease association, taking into account the credibility of the association based on the properties of the text. The component has served the platform regularly with up-to-date data since December 2015.
RESULTS: Currently, a total of 1,168,365 distinct target-disease associations have been text mined from >26 million PubMed abstracts and >1.2 million Open Access full text articles. Our comparative analyses of the evidence data currently available in the platform revealed that 850,179 of these associations are identified exclusively by literature mining.
CONCLUSIONS: This component helps the platform's users by providing the most relevant literature hits for a given target and disease. The text mining evidence, along with the other types of evidence, can be explored visually through https://www.targetvalidation.org and all the evidence data is available for download in JSON format from https://www.targetvalidation.org/downloads/data .

Keywords:  Document ranking; Information retrieval; Target validation; Target-disease associations; Text mining

Year:  2017        PMID: 28587637      PMCID: PMC5461726          DOI: 10.1186/s13326-017-0131-3

Source DB:  PubMed          Journal:  J Biomed Semantics


Background

Understanding the underlying mechanisms of diseases is crucial in translational research. Discovering associations between drug targets and diseases has become a main focus for scientists, since it is key to developing new drugs or repurposing existing ones. Scientists gather various evidence representing different aspects of target-disease associations, such as gene expression changes and the role of genetic variations, to increase understanding. Such evidence can be stored in structured databases and requires integration to obtain complete and comprehensive knowledge in target validation studies. Motivated by this, the Target Validation Platform (https://targetvalidation.org) [1] integrates different evidence from various resources with the aim of assisting scientists to identify and prioritise drug targets (proteins and their genes) associated with diseases and phenotypes. The evidence includes common disease genetic evidence based on GWAS results from the GWAS Catalog [2], rare Mendelian disease evidence based on ClinVar [3] clinical variant information from EVA, and text mined target-disease associations from the Europe PMC (https://europepmc.org/) literature database [4] (see Table 3 for a complete list of evidence types).
Table 3

Comparison of the target-disease association data in the Target Validation Platform (release 1.2)

| Evidence type | Total number of distinct target-disease associations | Overlap: Gene Expression | Overlap: Genetic Associations | Overlap: Affected Pathways | Overlap: Animal Models | Overlap: Somatic Mutations | Overlap: Known Drugs | Total number of exclusively identified associations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Literature Mining | 1,168,365 | 197,943 | 56,228 | 2506 | 99,836 | 19,801 | 19,811 | 850,179 |
| Gene Expression | 909,960 | X | 18,945 | 901 | 35,616 | 32,795 | 9913 | 669,330 |
| Genetic Associations | 129,826 | X | X | 1912 | 26,504 | 3626 | 2133 | 62,999 |
| Affected Pathways | 3613 | X | X | X | 1045 | 310 | 163 | 714 |
| Animal Models | 602,995 | X | X | X | X | 2965 | 4421 | 486,167 |
| Somatic Mutations | 58,941 | X | X | X | X | X | 1845 | 16,197 |
| Known Drugs | 57,319 | X | X | X | X | X | X | 33,005 |

Total number of distinct target-disease associations in the platform is 2,485,000

Europe PMC contains over 33 million records and expands at a rate of over a million articles per year, roughly two articles every minute, as scientists publish their findings continuously. Text mining target-disease associations is crucial for an integrated platform like the Target Validation Platform, since it provides a high volume of up-to-date data that complements the other types of evidence; otherwise this knowledge would stay hidden in millions of documents. In this study, we present the Europe PMC Open Targets literature component, which identifies target-disease associations in documents and ranks the documents according to their confidence based on rules utilising expert-provided heuristic information. Our main aim is to provide a scalable, robust and continuous text-mining service to the community for a real-world and very important application: target validation. Many previous studies focused on extracting gene-disease associations from the literature [5-7]. However, only a few of them specifically focused on developing methods for integrated resources; DisGeNET [8] and DISEASES [9], for example, cover various types of evidence for target validation. These two systems provide confidence scores for target-disease associations extracted from Medline abstracts for a given disease or target and do not provide very regular updates to the data. In DisGeNET, the target-disease text mining method is based on a machine learning approach, while in DISEASES, target-disease associations are extracted by scoring their co-occurrences according to their confidence. In comparison to DisGeNET and DISEASES, our system operates on full text articles in addition to abstracts, and ranks documents according to the confidence for a given target-disease association rather than ranking the associations extracted from the whole set of Medline abstracts.
More specifically, we calculate a document confidence score for each given (article, target, disease) triple, which represents how valuable the document is in the scope of target validation for the given target-disease association (see "Document scoring" section). The confidence score of a given target-disease association, in contrast, is handled at the platform level and calculated from all the evidence data in the platform by using a harmonic sum approach (see [1] for the details). This confidence score at the association level represents the overall credibility of the evidence for a given target-disease association. Our approach to target-disease extraction differs from these systems, and probably from many other traditional text-mining studies, in that we rely on heuristic information from experts/users for developing the system. The platform was first launched in December 2015 and is publicly available at https://targetvalidation.org. Since then, our system has served the platform regularly (monthly) with up-to-date data.

Implementation

Resources used

The literature source used in this study is the Europe PMC database. Europe PMC is one of the largest biomedical literature databases in the world, providing public access to >30.4 million abstracts and >3.3 million full text articles from PubMed and PubMed Central. In our analyses, we used the latest version of the Open Access full text articles (http://europepmc.org/ftp/archive/v.2016.06/) (~1.2 million) and all of the PubMed abstracts (~26 million) from the database. Two comprehensive resources, UniProt and the Experimental Factor Ontology (EFO), are used to identify target and disease names in text, respectively. These two resources were chosen as the reference resources by Open Targets. The data providers of the platform are asked to ground their target and disease entities to these reference resources so that the evidence can be integrated in the platform. Therefore, before applying text mining, two dictionaries were generated and refined from the human part of the SwissProt database (the annotated part of UniProt, Release 2015_10) (http://www.uniprot.org/) and the disease and phenotype parts of EFO (http://www.ebi.ac.uk/efo/) (Release 2.74). In the refining process, we filtered out terms likely to introduce very high numbers of false positives: terms with a character length < 3 (e.g. “A” is a gene name) and terms that are ambiguous with common English words (e.g. “Large” is also a protein name). In addition, we generated term variations by replacing the spelled-out Greek letters widely used in gene/disease names with their symbols (e.g. replacing “alpha” with “α”). The final target and disease dictionaries consisted of 104,434 and 29,846 terms respectively. These dictionaries are available from ftp://ftp.ebi.ac.uk/pub/databases/pmc/otar/.
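The refinement steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the tiny common-word list and the Greek letter map are assumptions made for illustration, and the real dictionaries and word lists are far larger.

```python
import re

# Assumption: a small illustrative map; the text only says "widely used Greek letters".
GREEK_SYMBOLS = {"alpha": "α", "beta": "β", "gamma": "γ", "delta": "δ", "kappa": "κ"}

# Assumption: stand-in for a real list of common English words.
COMMON_ENGLISH_WORDS = {"large", "was", "for", "not", "impact"}


def refine_terms(terms, min_length=3, common_words=COMMON_ENGLISH_WORDS):
    """Drop terms shorter than min_length and terms that are common English words."""
    kept = []
    for term in terms:
        if len(term) < min_length:
            continue  # e.g. the gene name "A"
        if term.lower() in common_words:
            continue  # e.g. the protein name "Large"
        kept.append(term)
    return kept


def greek_variant(term):
    """Return the term with spelled-out Greek letters replaced by their symbols."""
    variant = term
    for name, symbol in GREEK_SYMBOLS.items():
        # Whole-word match only, so "alphabet" is left untouched.
        variant = re.sub(rf"\b{name}\b", symbol, variant, flags=re.IGNORECASE)
    return variant


if __name__ == "__main__":
    print(refine_terms(["A", "Large", "TNF-alpha", "BRCA1"]))
    print(greek_variant("TNF-alpha"))
```

In this sketch the variant is added alongside the original term rather than replacing it, which matches the description of "term variations" above.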

Target and disease name annotation

We used the Europe PMC text-mining pipeline, which is based on Whatizit [10], to annotate target and disease names in text with the two dictionaries described above. Although the dictionary refinement process removes a great deal of ambiguity before the documents are text mined, some target and disease name abbreviations can still be ambiguous with other names. For example, ALS, an abbreviation used for “Amyotrophic Lateral Sclerosis”, is ambiguous with “Advanced Life Support” in some articles (e.g. see PMID:26811420). Therefore, we implemented a disease and target name abbreviation filter to screen out potential false positive abbreviations introduced during the annotation process. Our tool differs from the available abbreviation finders, such as [11], since it acts as a filter specifically for potential false positive target and disease name abbreviations annotated with our dictionaries. The abbreviation filter operates on several rules using heuristic information. Regular expressions are used to identify text sequences of the form “X….. Y…. Z…. (XYZ)”. The text in parentheses (i.e. (XYZ)) is identified as a gene/disease name abbreviation candidate if it is in uppercase, has length <6 (the length was decided by manually analysing a random subset of the UniProt and EFO dictionaries) and is annotated by the system as either a disease or a gene name, whereas the text located immediately before the parentheses is identified as the potential long form. For example, in the following sentence from the article with PMID:26811420, “The guidelines form the basis for all levels of resuscitation training, now from first aid to advanced life support (ALS),” the text “advanced life support (ALS)” matches the pattern defined above. “ALS” would be the abbreviation candidate and “advanced life support” would be the potential long form.
Documents matching the pattern above were analysed manually by an expert to derive heuristics for filtering ambiguous abbreviations. Abbreviation candidates satisfying at least one of the following rules are kept as true target/disease abbreviations; otherwise, they are filtered out.

For disease name abbreviation candidates:
- If any of the EFO long forms of the abbreviation candidate exists in the document.
- If the long form extracted from the text contains any of the keywords (disease, disorder, syndrome, defect, etc.) that can be used to describe a disease.

For gene or protein name abbreviation candidates:
- If (XYZ) appears more than 3 times in the document body (this rule applies to OA full text documents only).
- If the long form matches any of the terms from SwissProt or Enzymes (http://enzyme.expasy.org/).
- If the long form ends with (-ase/-ases) or contains any of the keywords (factor, receptor, gene, protein, etc.) that can be used to describe a target name.
- If at least 3 sentences for full text, or at least 2 sentences for abstracts, contain one of the keywords “mutation, SNP, variation, gene, inhibit, variant, polymorphism, mutant, isoform, protein, enzyme, activate, antibody, transcription, tumor suppressor, express, overexpress, regulator, receptor, oncogene” along with the protein name abbreviation candidate and a disease name.
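As an illustration of the pattern matching and the two disease rules, the following Python sketch may help. It is a simplified reading rather than the authors' implementation: the regular expression, the one-word-per-letter long form heuristic and all function names are assumptions, and the check that the candidate was annotated by the dictionaries is omitted.

```python
import re

# Assumption: an uppercase abbreviation of 2 to 5 letters in parentheses (length < 6).
ABBREVIATION = re.compile(r"\(([A-Z]{2,5})\)")

# Keywords named in the text; the original list ends with "etc.".
DISEASE_KEYWORDS = ("disease", "disorder", "syndrome", "defect")


def find_candidates(sentence):
    """Return (abbreviation, potential long form) pairs for 'X... Y... Z... (XYZ)'."""
    candidates = []
    for match in ABBREVIATION.finditer(sentence):
        abbreviation = match.group(1)
        preceding_words = sentence[:match.start()].split()
        # Assumption: the long form has one word per abbreviation letter.
        long_form_words = preceding_words[-len(abbreviation):]
        initials = "".join(word[0].upper() for word in long_form_words)
        if initials == abbreviation:
            candidates.append((abbreviation, " ".join(long_form_words)))
    return candidates


def keep_disease_abbreviation(abbreviation, long_form, document_text, efo_long_forms):
    """Apply the two disease rules: a known EFO long form occurs in the document,
    or the extracted long form contains a disease keyword."""
    text = document_text.lower()
    known_forms = efo_long_forms.get(abbreviation, [])
    if any(form.lower() in text for form in known_forms):
        return True
    return any(keyword in long_form.lower() for keyword in DISEASE_KEYWORDS)


if __name__ == "__main__":
    sentence = ("The guidelines form the basis for all levels of resuscitation "
                "training, now from first aid to advanced life support (ALS),")
    efo = {"ALS": ["amyotrophic lateral sclerosis"]}  # hypothetical dictionary excerpt
    for abbreviation, long_form in find_candidates(sentence):
        print(abbreviation, long_form,
              keep_disease_abbreviation(abbreviation, long_form, sentence, efo))
```

With the example sentence from PMID:26811420, the candidate “ALS” is rejected as a disease abbreviation because neither rule is satisfied.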

Target-disease association identification

Our association extraction method is based on identifying target-disease co-occurrences at the sentence level and applying several filtering rules to reduce the noise introduced by the high-sensitivity, low-specificity co-occurrence method. Our filtering rules utilise heuristic information from a careful manual analysis of the text data to filter out potential false positive associations. More specifically, the manual analyses were conducted iteratively by analysing a randomly selected set of results and identifying the reasons behind the false positives, so that we could formulate them as filtering rules to tune our system. The system applies the following filtering rules:

1. Filter out all types of articles (e.g. Reviews, Case Reports) except “Research” articles.
2. Filter out target-disease associations appearing in the following sections: Methods, References, Acknowledgement & Funding, Competing Interests, Author Contribution and Supplementary Material.
3. Filter out target-disease associations that appear only once in the body of a given article but not in the article's title or abstract.

Sections of a given document are identified by using our Section Tagger [12] tool, which we developed previously.
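The three filtering rules can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the system's code: the `CoOccurrence` record, the section labels and the choice to count mentions after the section filter has been applied are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

EXCLUDED_SECTIONS = {
    "Methods", "References", "Acknowledgement & Funding",
    "Competing Interests", "Author Contribution", "Supplementary Material",
}


@dataclass(frozen=True)
class CoOccurrence:
    """One sentence-level target-disease co-occurrence (hypothetical record)."""
    target: str
    disease: str
    section: str  # e.g. "Title", "Abstract", "Results"


def filter_associations(article_type, co_occurrences):
    # Rule 1: only "Research" articles are considered.
    if article_type != "Research":
        return []
    # Rule 2: drop co-occurrences found in excluded sections.
    kept = [c for c in co_occurrences if c.section not in EXCLUDED_SECTIONS]
    # Rule 3: drop pairs seen only once, unless that mention is in the title or abstract.
    counts = Counter((c.target, c.disease) for c in kept)
    highlighted = {(c.target, c.disease) for c in kept
                   if c.section in {"Title", "Abstract"}}
    return [c for c in kept
            if counts[(c.target, c.disease)] > 1
            or (c.target, c.disease) in highlighted]


if __name__ == "__main__":
    hits = [
        CoOccurrence("NOD2", "inflammatory bowel disease", "Results"),
        CoOccurrence("NOD2", "inflammatory bowel disease", "Discussion"),
        CoOccurrence("IGF1", "diabetes", "Introduction"),  # single body mention
        CoOccurrence("CTGF", "male breast carcinoma", "Abstract"),
        CoOccurrence("TP53", "carcinoma", "Methods"),  # excluded section
    ]
    for hit in filter_associations("Research", hits):
        print(hit)
```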

Document scoring

A document scoring algorithm is implemented and integrated into the system to assign each document a confidence score for a given target-disease association. The document confidence score for a given target-disease association represents how valuable the document is in the scope of target validation, taking the credibility of the given association into account. Document confidence scores are used to rank all the documents relevant to a given target-disease association. The algorithm is based on weighting sentence locations in abstracts and document sections in full text articles (see Table 1 and Table 2, respectively). The weighting approach is often used in text mining tasks for assigning confidence scores. For example, in [9] different weights are assigned to different features for calculating the confidence scores of the identified associations. In our study, we assign weights in the range 1 to 10, which is wide enough to pick different weights for different sections based on their potential confidence. Two formulas, CS1 and CS2, are used to calculate the confidence scores for abstracts and full text articles respectively, using the weights in Table 1 and Table 2.
Table 1

Sentence location weights in abstracts

| Sentence location | Weight |
| --- | --- |
| First or second | 2 |
| Last | 5 |
| Other | 3 |

Table 2

Section weights in full text articles

| Section | Weight |
| --- | --- |
| Title | 10 |
| Abstract | See Table 1 |
| Results, Figure, Table | 5 |
| Discussion, Conclusion | 2 |
| Introduction, Case Study, Appendix, Other | 1 |
The weights are selected based on heuristic information; our goal is to identify associations that are the subject of the given paper, rather than instances that review prior knowledge. Therefore, we assign the highest weight (10) to “Title”, since an article title is likely to contain the most confident information and highlight the main finding of the study. The lowest weight (1) is assigned to “Introduction”, since well-known associations are often reported there, while a higher weight (5) is assigned to the “Results”, “Figure” and “Table” sections, where new findings are generally reported. The sentence location weights used for abstract scoring are determined from a sentence-level concept analysis using CoreSC [13]. CoreSC is a text-mining tool which assigns each sentence one of its 11 pre-defined concepts, such as “Results” and “Background”. Our concept analysis of 360 randomly selected MEDLINE abstracts revealed that, most of the time, the last sentence of a given abstract is a “Results” sentence, while the first or second one is generally an introductory (“Background”) sentence (CoreSC analysis results are available at ftp://ftp.ebi.ac.uk/pub/databases/pmc/otar/). We further verified this finding by manually checking some of the abstracts from this set. Hence, we assign the highest weight (5) to the last sentence and lower weights to the first/second and other sentences accordingly.
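To make the weighting concrete, the Python sketch below applies the weights of Table 1 and Table 2 to counts of co-occurrence sentences. The CS1 and CS2 formulas themselves are not reproduced in this text, so the sketch assumes a plain additive scheme (each co-occurring sentence contributes the weight of its location or section); the actual formulas may normalise or combine the weights differently. The function names, and the choice that "last" takes precedence in very short abstracts, are also assumptions.

```python
# Weights taken from Table 2; sections not listed fall back to "Other" (weight 1).
SECTION_WEIGHTS = {
    "Title": 10,
    "Results": 5, "Figure": 5, "Table": 5,
    "Discussion": 2, "Conclusion": 2,
    "Introduction": 1, "Case Study": 1, "Appendix": 1,
}
OTHER_SECTION_WEIGHT = 1


def abstract_sentence_weight(index, n_sentences):
    """Weight from Table 1 for the sentence at a zero-based index."""
    if index == n_sentences - 1:
        return 5  # last sentence (assumed to take precedence in short abstracts)
    if index in (0, 1):
        return 2  # first or second sentence
    return 3      # any other sentence


def score_abstract(hit_indices, n_sentences):
    """Assumed additive score over abstract sentences containing the co-occurrence."""
    return sum(abstract_sentence_weight(i, n_sentences) for i in hit_indices)


def score_full_text(section_hits, abstract_hit_indices=(), n_abstract_sentences=0):
    """Assumed additive score: section weight times number of co-occurring sentences,
    plus the abstract contribution scored with the Table 1 weights."""
    body = sum(SECTION_WEIGHTS.get(section, OTHER_SECTION_WEIGHT) * count
               for section, count in section_hits.items())
    return body + score_abstract(abstract_hit_indices, n_abstract_sentences)


if __name__ == "__main__":
    # Co-occurrence in the first and last sentence of a five-sentence abstract.
    print(score_abstract([0, 4], 5))
    # One title hit, two Results hits, one Introduction hit, plus the last abstract sentence.
    print(score_full_text({"Title": 1, "Results": 2, "Introduction": 1}, [4], 5))
```

Under this additive assumption, a document that mentions the association in its title and results ranks well above one that mentions it only in the introduction, which is the behaviour the weights are meant to produce.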

Results & discussion

Performance evaluation

The ultimate goal of this study is to provide a scalable, robust and continuous service to the biomedical community for target validation, by using text mining methods. Therefore, we took a different approach from many traditional text mining studies and benchmarked the system from an expert perspective: expert satisfaction and feedback are the most valuable parameters for us in judging the system’s performance. Our service has been up and running since December 2015 and we continuously improve our algorithms as we receive user feedback. Nevertheless, as a case study, we estimated the overall performance of the system on two randomly selected samples by using Mean Average Precision (MAP), which is a commonly used metric for evaluating ranking systems. MAP takes into account the relative order of the documents retrieved by the system and gives more weight to the documents returned at higher ranks [14]. We manually estimated the MAP as 89% for abstracts only and as 90% for full text articles on the top 25 documents of two randomly selected gene-disease associations, IGF1-Diabetes and NOD2-Inflammatory Bowel Disease. We also estimated the correlation coefficients between the abstract-only and full text article scores as 0.82 and 0.94 for IGF1-Diabetes and NOD2-Inflammatory Bowel Disease respectively. The nearly identical MAP values for abstracts only and full text articles, together with the high correlation coefficients between the scores, are promising for our heuristic score adjustment. The individual performances of the components used are as follows. The target and disease names are identified with Whatizit by using SwissProt and EFO as terminological resources. The target (gene/protein) name tagging method of Whatizit was previously compared against other existing methods on different gold standard datasets [15].
The results reveal that Whatizit delivers gene/protein name annotations (grounded in SwissProt) at the state-of-the-art level (F-scores of ~60% are obtained on different gold standard corpora). They also show that there is still room for improving the performance, and in future we will explore expanding our SwissProt dictionary with other available resources (e.g. the Entrez Gene database). We manually evaluated our EFO-based disease name tagger on 50 randomly selected abstracts. The tool achieves a recall of 83.67%, a precision of 97.61% and an F-score of 90.10%. The lower recall shows that there is still room for improving EFO’s coverage. Indeed, EFO’s coverage is one of the considerations of Open Targets, as EFO is used as the reference dataset for diseases/phenotypes in the platform. Therefore, we previously analysed the coverage of EFO against five other major disease/phenotype resources (ORDO, UMLS, MP, HDO and HP) [16]. Based on that finding, which is in line with the current one, Open Targets is working on methods to expand EFO’s coverage. The performance of the abbreviation filter alone was estimated as an F-score of 92.3% by evaluating 50 randomly selected sentences reporting target-disease associations from the Open Access articles. The Section Tagger’s performance was previously estimated manually on 100 full text articles as an F-score of 98.02% [12]. In the near future, we plan to organise a hackathon that would allow us to form a gold standard dataset and conduct extensive usability tests. The gold standard dataset, together with the user feedback, would allow us to carry out extensive evaluations of our design strategies and improve them if necessary.
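For readers unfamiliar with the metrics, the short Python sketch below shows one common definition of average precision and MAP, together with the F-score as the harmonic mean of precision and recall. The normalisation used here (dividing by the number of relevant documents found in the ranked list) is an assumption; the evaluation above may have used a different variant. The relevance judgements in the example are invented for illustration only.

```python
def average_precision(relevance):
    """Average precision for one ranked list of True/False relevance judgements.

    Precision is taken at each rank where a relevant document appears and
    averaged over the relevant documents found in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the average precision values over several ranked lists."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented judgements for two short ranked lists.
    print(mean_average_precision([[True, False, True], [True, True, False]]))
    # Disease tagger figures reported above: precision 97.61%, recall 83.67%.
    print(round(f_score(97.61, 83.67), 2))
```

Applied to the reported precision and recall of the disease name tagger, the harmonic mean reproduces the stated F-score of 90.10%.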

User experience

Since the first release of the Europe PMC Open Targets component, we have iteratively improved our text mining algorithm and the visualisation of the text mining evidence in the Target Validation Platform based on user feedback. Initial user testing showed that incorporating the text mining evidence into the platform filled perceived gaps in evidence caused by limitations in the coverage of the other direct evidence sources. The users also valued the reinforcement of other evidence when complementary text mining evidence was available. User feedback on incorrect associations, which arose predominantly from false positive entity recognition, helped us to improve our filters.

Added value from the literature mined target-disease associations

The Target Validation Platform currently covers evidence from literature mining, genetic associations, somatic mutations, known drugs, gene expression, affected pathways and animal models. (Please refer to [1] for further information about how the other types of evidence data are gathered.) In the current release (release 1.2) of the platform, there are a total of 2,485,000 distinct target-disease associations. Table 3 shows a comparison of the target-disease association data currently available in the platform. The literature evidence constitutes the largest amount of data compared to the other types of evidence (such as gene expression and animal models). Currently, there are more than 1.1 million distinct target-disease associations (47% of the whole evidence data) extracted from ~26 million PubMed abstracts and ~1.2 million open access full text articles. Other large amounts of evidence data are provided by the gene expression (~900 K) and animal models (~600 K) sources. The analysis shows that 21.75% (197,943) of gene expression, 43.31% (56,228) of genetic associations, 69.36% (2506) of affected pathways, 16.55% (99,836) of animal models, 33.59% (19,801) of somatic mutations and 34.56% (19,811) of known drugs evidence data overlap with the literature mining data. The largest share of the distinct associations in the platform is identified exclusively through literature mining (~850 K, 34.21%), showing the added value of text mining.

The discrepancy between the literature mining data and the other types of evidence data arises because each type of evidence is gathered using different methods and resources. For example, gene expression data is gathered from Expression Atlas (https://www.ebi.ac.uk/gxa/home), the scope of which is microarray or RNA-Seq experiments. Other evidence data, such as genetic associations and known drugs, are gathered through manual curation of the literature by experts and from DailyMed (https://dailymed.nlm.nih.gov/dailymed/). Our approach is based on computationally extracting evidence data from the literature. In many of the curated studies, which may report associations between many targets and several diseases, it is unusual to highlight the individual association results in a way that is detectable by the sentence co-occurrence approach, and often these associations are confined to a supplementary data table. Indeed, previous studies focusing on text mining supplementary material revealed that supplementary material contains many more data than abstracts and full text [17, 18]. Although text mining and manual curation both use the biomedical literature as a resource, the coverage of the two methods is different and complementary. In fact, in our early work with users the text-mining approach was highly valued precisely because it accesses evidence from papers that do not contribute to the curated databases. One further reason for the discrepancy is the licensing restrictions on the reuse of full text content. We can only text mine the full text of Open Access publications (and all MEDLINE abstracts), while experts can curate evidence from non-open access publications, accessed for reading via journal subscriptions.

We further analysed the contribution of text mining based on the associations by disease and the associations by target in Table 4 and Table 5 respectively. Table 4 shows a comparison of the associations by disease in the platform. Currently, there are a total of 9426 associations by disease in the platform. Most of these diseases are provided by genetic associations (5912), literature mining (5801) and animal models (4942). Our analysis shows that 56.02% (405) of gene expression, 59.98% (3546) of genetic associations, 88.89% (504) of affected pathways, 68.86% (3403) of animal models, 53.75% (494) of somatic mutations and 82.72% (1489) of known drugs provided target associated diseases overlap with the literature mining data. The largest numbers of associations by disease identified exclusively by a single evidence type come from genetic associations (1336, 14.17%) and literature mining (1304, 13.83%).
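The percentages quoted in this section are simple ratios of the counts in Tables 3 and 4. As a worked check, the following Python sketch recomputes three of them from the figures given in the text; the helper name `share` is hypothetical.

```python
def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    # Gene expression associations that overlap with literature mining (Table 3).
    print(share(197_943, 909_960))
    # Associations identified exclusively by literature mining, out of all
    # distinct target-disease associations in the platform (Table 3).
    print(share(850_179, 2_485_000))
    # Gene expression diseases that overlap with literature mining (Table 4).
    print(share(405, 723))
```

Note that the overlap percentages are relative to the total of the other evidence type (e.g. 909,960 gene expression associations), whereas the "exclusively identified" percentages are relative to the platform-wide total.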
Table 4

Comparison of the associations by disease in the Target Validation Platform (release 1.2)

| Evidence type | Total number of distinct associations by disease | Overlap: Gene Expression | Overlap: Genetic Associations | Overlap: Affected Pathways | Overlap: Animal Models | Overlap: Somatic Mutations | Overlap: Known Drugs | Total number of exclusively identified associations by disease |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Literature Mining | 5801 | 405 | 3546 | 504 | 3403 | 494 | 1489 | 1304 |
| Gene Expression | 723 | X | 520 | 196 | 309 | 460 | 328 | 25 |
| Genetic Associations | 5912 | X | X | 527 | 3725 | 530 | 1193 | 1336 |
| Affected Pathways | 567 | X | X | X | 443 | 168 | 310 | 9 |
| Animal Models | 4942 | X | X | X | X | 281 | 752 | 811 |
| Somatic Mutations | 919 | X | X | X | X | X | 354 | 113 |
| Known Drugs | 1800 | X | X | X | X | X | X | 179 |

Total number of distinct associations by disease in the platform is 9426

Table 5

Comparison of the associations by target data in the Target Validation Platform (release 1.2)

| Evidence type | Total number of associations by target | Overlap: Gene Expression | Overlap: Genetic Associations | Overlap: Affected Pathways | Overlap: Animal Models | Overlap: Somatic Mutations | Overlap: Known Drugs | Total number of exclusively identified associations by target |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Literature Mining | 14,728 | 14,217 | 8670 | 664 | 5187 | 3903 | 736 | 321 |
| Gene Expression | 29,842 | X | 9817 | 671 | 5449 | 4125 | 743 | 14,148 |
| Genetic Associations | 10,200 | X | X | 561 | 4072 | 3165 | 569 | 217 |
| Affected Pathways | 690 | X | X | X | 379 | 324 | 70 | 4 |
| Animal Models | 5497 | X | X | X | X | 3744 | 484 | 8 |
| Somatic Mutations | 4138 | X | X | X | X | X | 330 | 2 |
| Known Drugs | 756 | X | X | X | X | X | X | 1 |

Total number of distinct associations by target in the platform is 30,592

Table 5 shows a comparison of the associations by target in the platform. Currently, there are a total of 30,592 associations by target in the platform. Most of these targets are provided by gene expression (29,842), literature mining (14,728) and genetic associations (10,200). Our analysis shows that 47.64% (14,217) of gene expression, 85% (8670) of genetic associations, 96.23% (664) of affected pathways, 94.36% (5187) of animal models, 94.32% (3903) of somatic mutations and 97.35% (736) of known drugs provided disease associated targets overlap with the literature mining data. The largest share of the distinct associations by target in the platform is identified exclusively through gene expression (14,148, 46.25%), which is understandable given the comprehensive gene coverage of gene expression experiments such as RNA-seq. Altogether, our analysis shows that literature mining suggests many more new target-disease associations (850,179, see Table 3) than new diseases (1304, see Table 4) or targets (321, see Table 5) involved in associations.

Examples of target-disease associations exclusively identified by literature mining

Our analysis reveals that there are a total number of 850,179 target-disease associations exclusively identified by literature mining. One such example is the CTGF gene and male breast carcinoma association (Fig. 1) (https://www.targetvalidation.org/evidence/ENSG00000118523/EFO_0006861). Currently, there is evidence for the association of 101 different targets with male breast carcinoma. All of these targets are identified through literature mining and only 4 of them are also supported by the known drugs evidence.
Fig. 1

The CTGF and male breast carcinoma association

Another example is the ST3GAL4 and diabetes mellitus association (Fig. 2) (https://www.targetvalidation.org/evidence/ENSG00000110080/EFO_0000400). There are 1572 different publications potentially reporting this association.
Fig. 2

The ST3GAL4 and diabetes mellitus association

Currently, there is evidence for the association of 5017 different targets with diabetes mellitus, and 3670 of these targets are identified through literature mining.

Conclusions

Here, we present the Europe PMC Open Targets component, a new service for analysing and visualising target-disease associations from the literature within Open Targets. The aim of this component is to help users by providing the most relevant literature hits for a given target and disease. The platform users reported that the text mining evidence helped Open Targets to become more complete, and that a given association is more credible when it is supported not only by text mining but also by the other types of evidence. Our text mining algorithm and the visualisation of the text mining evidence are improved iteratively based on user feedback. Currently, we are analysing the EFO coverage by comparing it against other existing disease/phenotype resources such as the Disease Ontology (http://disease-ontology.org/) and the Unified Medical Language System (https://www.nlm.nih.gov/research/umls/). In future, we plan to expand EFO’s coverage based on our findings. We are also working on classifying articles based on the evidence types available in the platform, such as genetic variations and RNA expression. This would give users a better understanding of, and more insight into, the weight of individual target-disease associations.

Availability and requirements

All target-disease data is available for download from https://www.targetvalidation.org/downloads/data as compressed JSON files. The compiled target and disease dictionaries, as well as the dataset used in the MAP estimation, are available for download from ftp://ftp.ebi.ac.uk/pub/databases/pmc/otar/. The source code is available from the contact author upon request. The code runs on Linux systems.
  15 in total

1.  A text-mining technique for extracting gene-disease associations from the biomedical literature.

Authors:  Hisham Al-Mubaid; Rajit K Singh
Journal:  Int J Bioinform Res Appl       Date:  2010

2.  DISEASES: text mining and data integration of disease-gene associations.

Authors:  Sune Pletscher-Frankild; Albert Pallejà; Kalliopi Tsafou; Janos X Binder; Lars Juhl Jensen
Journal:  Methods       Date:  2014-12-05       Impact factor: 3.608

3.  Automatic recognition of conceptualization zones in scientific articles and two life science applications.

Authors:  Maria Liakata; Shyamasree Saha; Simon Dobnik; Colin Batchelor; Dietrich Rebholz-Schuhmann
Journal:  Bioinformatics       Date:  2012-02-08       Impact factor: 6.937

4.  Europe PMC: a full-text literature database for the life sciences and platform for innovation.

Authors: 
Journal:  Nucleic Acids Res       Date:  2014-11-06       Impact factor: 16.971

5.  DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes.

Authors:  Janet Piñero; Núria Queralt-Rosinach; Àlex Bravo; Jordi Deu-Pons; Anna Bauer-Mehren; Martin Baron; Ferran Sanz; Laura I Furlong
Journal:  Database (Oxford)       Date:  2015-04-15       Impact factor: 3.451

6.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Authors:  Danielle Welter; Jacqueline MacArthur; Joannella Morales; Tony Burdett; Peggy Hall; Heather Junkins; Alan Klemm; Paul Flicek; Teri Manolio; Lucia Hindorff; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

7.  Section level search functionality in Europe PMC.

Authors:  Şenay Kafkas; Xingjun Pi; Nikos Marinos; Francesco Talo'; Andrew Morrison; Johanna R McEntyre
Journal:  J Biomed Semantics       Date:  2015-03-10

8.  Database citation in supplementary data linked to Europe PubMed Central full text biomedical articles.

Authors:  Şenay Kafkas; Jee-Hyub Kim; Xingjun Pi; Johanna R McEntyre
Journal:  J Biomed Semantics       Date:  2015-01-05

9.  ClinVar: public archive of relationships among sequence variation and human phenotype.

Authors:  Melissa J Landrum; Jennifer M Lee; George R Riley; Wonhee Jang; Wendy S Rubinstein; Deanna M Church; Donna R Maglott
Journal:  Nucleic Acids Res       Date:  2013-11-14       Impact factor: 16.971

10.  Finding abbreviations in biomedical literature: three BioC-compatible modules and four BioC-formatted corpora.

Authors:  Rezarta Islamaj Doğan; Donald C Comeau; Lana Yeganova; W John Wilbur
Journal:  Database (Oxford)       Date:  2014-06-09       Impact factor: 3.451

  6 in total

1.  Open Targets: a platform for therapeutic target identification and validation.

Authors:  Gautier Koscielny; Peter An; Denise Carvalho-Silva; Jennifer A Cham; Luca Fumis; Rippa Gasparyan; Samiul Hasan; Nikiforos Karamanis; Michael Maguire; Eliseo Papa; Andrea Pierleoni; Miguel Pignatelli; Theo Platt; Francis Rowland; Priyanka Wankar; A Patrícia Bento; Tony Burdett; Antonio Fabregat; Simon Forbes; Anna Gaulton; Cristina Yenyxe Gonzalez; Henning Hermjakob; Anne Hersey; Steven Jupe; Şenay Kafkas; Maria Keays; Catherine Leroy; Francisco-Javier Lopez; Maria Paula Magarinos; James Malone; Johanna McEntyre; Alfonso Munoz-Pomer Fuentes; Claire O'Donovan; Irene Papatheodorou; Helen Parkinson; Barbara Palka; Justin Paschall; Robert Petryszak; Naruemon Pratanwanich; Sirarat Sarntivijal; Gary Saunders; Konstantinos Sidiropoulos; Thomas Smith; Zbyslaw Sondka; Oliver Stegle; Y Amy Tang; Edward Turner; Brendan Vaughan; Olga Vrousgou; Xavier Watkins; Maria-Jesus Martin; Philippe Sanseau; Jessica Vamathevan; Ewan Birney; Jeffrey Barrett; Ian Dunham
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

2.  SciLite: a platform for displaying text-mined annotations as a means to link research articles with biological data.

Authors:  Aravind Venkatesan; Jee-Hyub Kim; Francesco Talo; Michele Ide-Smith; Julien Gobeill; Jacob Carter; Riza Batista-Navarro; Sophia Ananiadou; Patrick Ruch; Johanna McEntyre
Journal:  Wellcome Open Res       Date:  2017-07-10

3.  Ontology based mining of pathogen-disease associations from literature.

Authors:  Şenay Kafkas; Robert Hoehndorf
Journal:  J Biomed Semantics       Date:  2019-09-18

4.  Combining lexical and context features for automatic ontology extension.

Authors:  Sara Althubaiti; Şenay Kafkas; Marwa Abdelhakim; Robert Hoehndorf
Journal:  J Biomed Semantics       Date:  2020-01-13

5.  Uncovering new disease indications for G-protein coupled receptors and their endogenous ligands.

Authors:  Johannes M Freudenberg; Ian Dunham; Philippe Sanseau; Deepak K Rajpal
Journal:  BMC Bioinformatics       Date:  2018-10-01       Impact factor: 3.169

6.  A Systematic Pharmacology and In Vitro Study to Identify the Role of the Active Compounds of Achyranthes bidentata in the Treatment of Osteoarthritis.

Authors:  Zhenyuan Chen; Guangwen Wu; Ruoxi Zheng
Journal:  Med Sci Monit       Date:  2020-09-14