
TPX: Biomedical literature search made easy.

Thomas Joseph, Vangala G Saipradeep, Ganesh Sekar Venkat Raghavan, Rajgopal Srinivasan, Aditya Rao, Sujatha Kotte, Naveen Sivadasan.

Abstract

TPX is a web-based PubMed search enhancement tool that speeds up article searching through analysis and exploration features. These features include identification of relevant biomedical concepts in search results with linkouts to source databases, concept-based article categorization, concept-assisted search and filtering, and query refinement. A distinguishing feature is the ability to add user-defined concept names and/or concept types for named entity recognition. The tool allows contextual exploration of knowledge sources by providing concept association maps derived from the MEDLINE repository. It also has a full-text search mode that can be configured on request to access local text repositories, incorporating entity co-occurrence search at sentence/paragraph levels. Local text files can also be analyzed on-the-fly. AVAILABILITY: http://tpx.atc.tcs.com


Keywords:  PubMed search; biomedical literature; concept association; concept identification; concept-assisted search; ontology based dictionaries; text-mining

Year:  2012        PMID: 22829734      PMCID: PMC3398782          DOI: 10.6026/97320630008578

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

PubMed, the most popular publicly available life science literature retrieval tool, is generally used to retrieve specific information from MEDLINE rather than as an exploratory medium. Several other retrieval tools for searching MEDLINE, such as GoPubMed [1] and EBIMed [2], have been designed to provide improved retrieval from literature and other related information sources. However, it is desirable to have a mechanism which (a) utilizes the strengths of PubMed, (b) enables users to search, explore and manage the literature more effectively, and (c) enables integration with structured knowledge sources. We have developed TPX (TCS Pubmed eXplorer), a web-based tool that supports concept-assisted search and navigation and relies on PubMed as the underlying search engine for the MEDLINE database.

In addition, many users have local collections of free or purchased articles, as well as other text documents related to their specific areas of research in the biomedical domain. Exploring these collections and retrieving specific information from them remains a challenge. TPX can be configured on request to let users explore their own locally available collections of articles. To demonstrate this option, TPX has been configured to access the Open Access subset of PMC articles using the full-text search mode. While searching through local repositories, users can also require co-occurring terms to appear within the same sentence, the same paragraph or anywhere in the article.

Tool features

Concept Identification:

TPX is equipped with a dictionary-based Named Entity Recognition (NER) system that identifies various biological concepts in an article using approximate string matching rules. The NER system identifies a wide range of concept categories such as genes, proteins, diseases, symptoms, chemicals and drugs, processes, functions, localization, experimental methods and cell lines. The dictionaries are derived from ontologies like MeSH [3] and Gene Ontology [4], and from other sources like Entrez Gene [5], UniProt [6], Expasy Enzyme [7], NCBI Taxonomy [8] and HyperCLDB [9]. The dictionaries are modular, so specified types can be used selectively and dictionaries can be added or removed on request. Local abbreviation handling is also integral to the tool [10]. Individual users can further include custom categories and concepts to augment the provided dictionaries. These custom concepts are recognized in articles and are also used for identifying co-occurrences in the full-text mode. SNPs and other mutation mentions are currently tagged and highlighted on-the-fly using a pattern-based NER system.
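The dictionary lookup described above can be illustrated with a minimal sketch. It is not the TPX implementation: the toy dictionaries, the `normalize` rule (case folding plus removal of hyphens and spaces as a crude form of approximate matching) and the greedy longest-match strategy are all assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Mention(NamedTuple):
    start: int
    end: int
    text: str
    category: str


# Hypothetical toy dictionaries standing in for MeSH, Entrez Gene, etc.
DICTIONARIES = {
    "gene": ["BRCA1", "TP53"],
    "disease": ["breast cancer"],
}


def normalize(term):
    """Lower-case and drop hyphens/whitespace so 'BRCA-1' matches 'BRCA1'."""
    return re.sub(r"[\s\-]+", "", term.lower())


def build_index(dictionaries):
    """Map each normalized dictionary term to its category."""
    index = {}
    for category, terms in dictionaries.items():
        for term in terms:
            index[normalize(term)] = category
    return index


def tag(text, index, max_words=3):
    """Greedy longest-match tagging over runs of up to max_words tokens."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9\-]+", text)]
    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            category = index.get(normalize(text[start:end]))
            if category:
                mentions.append(Mention(start, end, text[start:end], category))
                i += n
                break
        else:
            i += 1
    return mentions
```

In this sketch, a user-defined category would simply be another entry in the dictionary mapping passed to `build_index`, which mirrors the modular design described in the text without claiming to reproduce it.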

Tagged Abstracts:

The abstracts retrieved in PubMed/full-text mode are displayed with the identified entities highlighted with predefined colors (Figure 1).
Figure 1: Screenshot of the tool.

The properties of each identified entity can be viewed by clicking on it. The properties include external links to the respective information sources, a facility to explore associated concepts, feedback, etc. The tool also lets users suggest entities that the NER system has missed. Through user preferences, highlighting can be restricted to concepts belonging to categories of interest.

Concept Assisted Search and Navigation:

The concepts identified in the analyzed articles of the search results are ranked on-the-fly according to their relevance to the article collection. The ranked concepts are grouped under their respective categories, allowing the user to explore the articles with respect to the categorized concepts. These concepts can be exported and saved to a file in various formats. Users can select concepts of interest, and this selection can be used either to filter the results or to refine the query. Entity co-occurrences (at sentence or paragraph level) in full-text articles can be searched in the full-text mode, using terms in the system dictionaries as well as user-created terms.
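As a rough illustration of co-occurrence search at different granularities, the sketch below returns the text units (sentences, paragraphs or the whole article) that contain every query term. The function names, the blank-line paragraph convention and the naive sentence splitter are assumptions for this example and are not taken from TPX.

```python
import re


def split_units(text, level):
    """Split text into paragraphs, sentences, or one article-wide unit."""
    if level == "article":
        return [text]
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if level == "paragraph":
        return paragraphs
    if level == "sentence":
        # Naive splitter: break after '.', '!' or '?' followed by whitespace.
        return [s for p in paragraphs
                for s in re.split(r"(?<=[.!?])\s+", p) if s]
    raise ValueError(f"unknown level: {level}")


def cooccurrences(text, terms, level="sentence"):
    """Return the units in which every term occurs (case-insensitive)."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
                for t in terms]
    return [unit for unit in split_units(text, level)
            if all(p.search(unit) for p in patterns)]
```

A production system would more plausibly match on tagged entities (including synonyms) rather than on raw strings, and would use a proper sentence segmenter; the sketch only shows how the sentence/paragraph/article constraint changes which hits are returned.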

Concept Association Map:

The tool incorporates a pairwise concept association map. The associations are pre-computed and ranked according to their relevance across the whole tagged MEDLINE corpus. The user can explore the associations starting from a concept of interest by selecting it in the tagged abstract. For each association, a set of relevant article abstracts is displayed. In addition, through the Concept Panel or Association Viewer, users can retrieve sub-maps of the association map, such as associations between proteins and chemicals related to a disease like breast cancer. These sub-maps can be exported and saved to a file as a list of associations.
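The text does not state how associations are scored. One common choice, used here purely as an assumption, is pointwise mutual information (PMI) over abstract-level co-occurrence counts. The sketch below pre-computes pairwise scores from a small tagged corpus and then lists the ranked partners of a chosen concept, optionally restricted to one category, which is a simplified stand-in for retrieving a sub-map. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_association_map(tagged_docs):
    """Score concept pairs by PMI.

    tagged_docs: list of sets of (concept, category) tuples, one set per abstract.
    Returns {(concept_a, concept_b): pmi} with each pair stored in sorted order.
    """
    n_docs = len(tagged_docs)
    single = Counter()
    pair = Counter()
    for concepts in tagged_docs:
        single.update(concepts)
        pair.update(combinations(sorted(concepts), 2))
    scores = {}
    for (a, b), n_ab in pair.items():
        # PMI = log( P(a, b) / (P(a) * P(b)) ), with document-level probabilities.
        scores[(a, b)] = math.log(n_ab * n_docs / (single[a] * single[b]))
    return scores


def neighbours(scores, concept, category=None):
    """Partners of `concept`, highest score first, optionally of one category."""
    hits = []
    for (a, b), score in scores.items():
        if concept in (a, b):
            other = b if a == concept else a
            if category is None or other[1] == category:
                hits.append((other, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For example, calling `neighbours(scores, ("breast cancer", "disease"), category="chemical")` would list only the chemicals associated with that disease in the toy corpus. PMI is known to favour rare pairs, so a real system would likely add frequency thresholds or a different statistic.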

Notes, Comments, Labels and Pinned Abstracts:

Users can store personal notes and comments for each abstract. Notes are specific to a user, while comments are visible to all users. Users can bookmark articles of interest using custom labels. Alternatively, the ‘article pinning’ option allows users to bookmark important articles without attaching specific labels to them. All such data can be viewed under the user's Briefcase.
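The visibility rule above (notes private to their author, comments shared with everyone) and the two bookmarking styles can be captured in a small data model. This is a sketch only: the class and field names are hypothetical, and nothing here reflects how TPX actually stores this data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Annotation:
    pmid: str
    author: str
    text: str
    public: bool  # False = personal note, True = comment visible to all users


@dataclass
class Briefcase:
    user: str
    labels: Dict[str, Set[str]] = field(default_factory=dict)  # label -> PMIDs
    pinned: Set[str] = field(default_factory=set)              # unlabeled bookmarks

    def label(self, pmid, name):
        """Bookmark an article under a custom label."""
        self.labels.setdefault(name, set()).add(pmid)

    def pin(self, pmid):
        """Bookmark an article without attaching a label."""
        self.pinned.add(pmid)


def visible_annotations(annotations: List[Annotation], viewer: str, pmid: str):
    """Comments on the abstract from anyone, plus the viewer's own notes."""
    return [a for a in annotations
            if a.pmid == pmid and (a.public or a.author == viewer)]
```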

Conclusion

We have developed TPX, a search analysis and exploration tool whose features support faster information retrieval as well as enhanced exploration and management of search results from PubMed or a local repository. The user terms/types feature makes TPX an ideal “personal information assistant” for scientists and biomedical literature curators. We believe that relying on PubMed as the basic search/indexing engine, combined with an advanced search analysis tool that performs customized off-line and on-the-fly analysis of the search results, is an effective way of integrating biomedical literature with in-house and other external knowledge sources. TPX, with its ability to point to external websites in addition to local repositories, is a step towards faster, personalized biomedical literature analysis.
  9 in total

1.  The ENZYME database in 2000.

Authors:  A Bairoch
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

3.  ALICE: an algorithm to extract abbreviations from MEDLINE.

Authors:  Hiroko Ao; Toshihisa Takagi
Journal:  J Am Med Inform Assoc       Date:  2005-05-19       Impact factor: 4.497

4.  EBIMed--text crunching to gather facts for proteins from Medline.

Authors:  Dietrich Rebholz-Schuhmann; Harald Kirsch; Miguel Arregui; Sylvain Gaudan; Mark Riethoven; Peter Stoehr
Journal:  Bioinformatics       Date:  2007-01-15       Impact factor: 6.937

5.  Ongoing and future developments at the Universal Protein Resource.

Authors: 
Journal:  Nucleic Acids Res       Date:  2010-11-04       Impact factor: 16.971

6.  Database resources of the National Center for Biotechnology Information.

Authors:  Eric W Sayers; Tanya Barrett; Dennis A Benson; Evan Bolton; Stephen H Bryant; Kathi Canese; Vyacheslav Chetvernin; Deanna M Church; Michael DiCuccio; Scott Federhen; Michael Feolo; Ian M Fingerman; Lewis Y Geer; Wolfgang Helmberg; Yuri Kapustin; David Landsman; David J Lipman; Zhiyong Lu; Thomas L Madden; Tom Madej; Donna R Maglott; Aron Marchler-Bauer; Vadim Miller; Ilene Mizrachi; James Ostell; Anna Panchenko; Lon Phan; Kim D Pruitt; Gregory D Schuler; Edwin Sequeira; Stephen T Sherry; Martin Shumway; Karl Sirotkin; Douglas Slotta; Alexandre Souvorov; Grigory Starchenko; Tatiana A Tatusova; Lukas Wagner; Yanli Wang; W John Wilbur; Eugene Yaschenko; Jian Ye
Journal:  Nucleic Acids Res       Date:  2010-11-21       Impact factor: 16.971

7.  Entrez Gene: gene-centered information at NCBI.

Authors:  Donna Maglott; Jim Ostell; Kim D Pruitt; Tatiana Tatusova
Journal:  Nucleic Acids Res       Date:  2010-11-28       Impact factor: 16.971

8.  GoPubMed: exploring PubMed with the Gene Ontology.

Authors:  Andreas Doms; Michael Schroeder
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

9.  Cell Line Data Base: structure and recent improvements towards molecular authentication of human cell lines.

Authors:  Paolo Romano; Assunta Manniello; Ottavia Aresu; Massimiliano Armento; Michela Cesaro; Barbara Parodi
Journal:  Nucleic Acids Res       Date:  2008-10-15       Impact factor: 16.971

  3 in total

1.  Electronic biomedical literature search for budding researcher.

Authors:  Subhash B Thakre; Sushama S Thakre S; Amol D Thakre
Journal:  J Clin Diagn Res       Date:  2013-09-10

2.  A pipeline to extract drug-adverse event pairs from multiple data sources.

Authors:  Srijyothsna Yeleswarapu; Aditya Rao; Thomas Joseph; Vangala Govindakrishnan Saipradeep; Rajgopal Srinivasan
Journal:  BMC Med Inform Decis Mak       Date:  2014-02-24       Impact factor: 2.796

3.  PRIORI-T: A tool for rare disease gene prioritization using MEDLINE.

Authors:  Aditya Rao; Thomas Joseph; Vangala G Saipradeep; Sujatha Kotte; Naveen Sivadasan; Rajgopal Srinivasan
Journal:  PLoS One       Date:  2020-04-21       Impact factor: 3.240

