Mary Rajathei David, Selvaraj Samuel.
Abstract
Literature search is a process in which external developers provide alternative representations for efficient data mining of biomedical literature, such as ranking search results, displaying summarized knowledge of semantics, and clustering results into topics. In clustering search results, prominent vocabularies such as GO (Gene Ontology) and MeSH (Medical Subject Headings), as well as frequent terms extracted from retrieved PubMed abstracts, have been used as topics for grouping. In this study, we propose the FNeTD (Frequent Nearer Terms of the Domain) method for clustering PubMed abstracts. This is achieved through a two-step process: i) identifying frequent words or phrases in the abstracts with the frequent multi-word extraction algorithm, and ii) identifying nearer terms of the domain among the extracted frequent phrases with a nearest neighbors search. The efficiency of clustering PubMed abstracts using nearer terms of the domain was measured using the F-score. The present study suggests that nearer terms of the domain can be used for clustering search results.
Keywords: PubMed abstracts; clustering; domain knowledge; nearer term; nearest neighbors search
Year: 2012 PMID: 22359430 PMCID: PMC3282271 DOI: 10.6026/97320630008020
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. System overview of clustering of PubMed abstracts using nearer terms of the domain.
Figure 2. Frequent multi-word term extraction algorithm. The flowchart explains the steps involved in extracting multi-word terms from each abstract. The computational steps involve comparing two abstracts to identify a single word match, extending the word match, and storing the commonly occurring multi-word terms in Database S.
Figure 3. Snapshot overview of nearer terms of p53, shown as a hierarchical tree and as a hyper tree view.
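The steps named in the Figure 2 caption (compare two abstracts, find a single word match, extend the match, store the shared multi-word term) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `tokenize`, `shared_phrases` and `frequent_multiword_terms` are hypothetical, a `Counter` stands in for "Database S", and counting a term once per pair of abstracts that share it is an assumption made here for simplicity.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shared_phrases(a, b, min_len=2):
    """Return the maximal word runs (at least min_len words) shared by token lists a and b."""
    found = set()
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue  # no single-word match at this position
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue  # continuation of a run that was already extended
            # Extend the single-word match for as long as the next words agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                found.add(" ".join(a[i:i + k]))
    return found


def frequent_multiword_terms(abstracts):
    """Count, for every shared multi-word term, the number of abstract pairs containing it."""
    tokens = [tokenize(t) for t in abstracts]
    store = Counter()  # assumption: a stand-in for "Database S"
    for x, y in combinations(tokens, 2):
        store.update(shared_phrases(x, y))
    return store


abstracts = [
    "p53 tumor suppressor regulates cell cycle arrest",
    "Mutations in the p53 tumor suppressor are common",
    "Cell cycle arrest follows DNA damage",
]
terms = frequent_multiword_terms(abstracts)
for term, pairs in sorted(terms.items()):
    print(term, pairs)
```

The pairwise comparison is quadratic in both the number of abstracts and their length, which is acceptable for a sketch but would likely need indexing for a large set of search results.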