A thematic analysis of the AIDS literature.

W John Wilbur.

Abstract

When a large collection of objects must be made comprehensible to people, a time-honored approach is to cluster the objects into groups of closely related objects. Individual groups are then summarized in some convenient manner to provide a more manageable view of the data. Such methods have been applied to document collections with mixed results. If a hard clustering of the data into mutually exclusive clusters is performed, documents are frequently forced into one cluster even though they contain important information that would also make them appropriate candidates for other clusters. If a soft clustering is used, there still remains the problem of how to provide a useful summary of the data in a cluster. Here we introduce a new algorithm that produces a soft clustering of document collections and is based on the concept of a theme. A theme is conceptually a subject area that is discussed by multiple documents in the database. A theme has two potential representations that may be viewed as dual to each other. First, it is represented by the set of documents that discuss the subject or theme; second, it is represented by the set of key terms that are typically used to discuss the theme. Our algorithm is an EM algorithm in which the term representation and the document representation are explicit components, and each is used to refine the other in an alternating fashion. Upon convergence, the term representation provides a natural summary of the document representation (the cluster). We describe how to optimize the themes produced by this process and give the results of applying the method to a database of over fifty thousand PubMed documents dealing with the subject of AIDS. We also discuss how themes may improve access to a document collection.
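
The alternating refinement described in the abstract can be illustrated with a toy sketch. This is not the paper's EM algorithm: the abstract gives no formulas, so the smoothed log-odds term weighting, the hard membership threshold (half of the total key-term weight), and the names `refine_theme`, `n_terms`, and `max_iter` are all assumptions made for illustration only. It shows just the general idea that a document set and a key-term set can each be used to update the other until neither changes.

```python
import math


def refine_theme(docs, seed_docs, n_terms=3, max_iter=20):
    """Alternate between a term view and a document view of one theme.

    docs      -- list of sets of terms, one set per document
    seed_docs -- set of document indices used to start the theme
    Returns (theme_docs, theme_terms).
    Illustrative only: weighting and threshold are assumptions.
    """
    theme_docs = set(seed_docs)
    vocab = set().union(*docs)
    theme_terms = []
    for _ in range(max_iter):
        n_in = len(theme_docs)
        n_out = len(docs) - n_in

        # Term step: smoothed log-odds of a term appearing in theme
        # documents versus documents outside the theme.
        weights = {}
        for t in vocab:
            k_in = sum(1 for i in theme_docs if t in docs[i])
            k_out = sum(
                1 for i, d in enumerate(docs)
                if i not in theme_docs and t in d
            )
            p = (k_in + 0.5) / (n_in + 1.0)
            q = (k_out + 0.5) / (n_out + 1.0)
            weights[t] = math.log(p / (1 - p)) - math.log(q / (1 - q))

        # Keep the highest-weighted terms with positive weight
        # (ties broken alphabetically so the result is deterministic).
        ranked = sorted(vocab, key=lambda t: (-weights[t], t))
        theme_terms = [t for t in ranked if weights[t] > 0][:n_terms]

        # Document step: a document belongs to the theme if the key terms
        # it contains carry at least half of the total key-term weight.
        total = sum(weights[t] for t in theme_terms)
        new_docs = {
            i for i, d in enumerate(docs)
            if theme_terms
            and sum(weights[t] for t in theme_terms if t in d) >= 0.5 * total
        }

        if new_docs == theme_docs:  # neither view changed: stop
            break
        theme_docs = new_docs
    return theme_docs, theme_terms
```

Running `refine_theme` from several different seeds yields one theme per seed. Because each theme is refined independently, a document may end up in more than one theme, which is the sense in which such a procedure produces overlapping (soft) groupings, and the returned term list serves as a short summary of each document set.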

Year:  2002        PMID: 11928492

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  8 in total

1.  Retro: concept-based clustering of biomedical topical sets.

Authors:  Lana Yeganova; Won Kim; Sun Kim; W John Wilbur
Journal:  Bioinformatics       Date:  2014-07-29       Impact factor: 6.937

2.  The biomedical discourse relation bank.

Authors:  Rashmi Prasad; Susan McRoy; Nadya Frid; Aravind Joshi; Hong Yu
Journal:  BMC Bioinformatics       Date:  2011-05-23       Impact factor: 3.169

3.  Mapping interdisciplinary fields: efficiencies, gaps and redundancies in HIV/AIDS research.

Authors:  Jimi Adams; Ryan Light
Journal:  PLoS One       Date:  2014-12-15       Impact factor: 3.240

4.  Thematic clustering of text documents using an EM-based approach.

Authors:  Sun Kim; W John Wilbur
Journal:  J Biomed Semantics       Date:  2012-10-05

5.  Exploring supervised and unsupervised methods to detect topics in biomedical text.

Authors:  Minsuk Lee; Weiqing Wang; Hong Yu
Journal:  BMC Bioinformatics       Date:  2006-03-16       Impact factor: 3.169

6.  Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.

Authors:  Sun Kim; Lana Yeganova; W John Wilbur
Journal:  Bioinformatics       Date:  2016-06-10       Impact factor: 6.937

7.  Discovering themes in biomedical literature using a projection-based algorithm.

Authors:  Lana Yeganova; Sun Kim; Grigory Balasanov; W John Wilbur
Journal:  BMC Bioinformatics       Date:  2018-07-16       Impact factor: 3.169

8.  Prioritising references for systematic reviews with RobotAnalyst: A user study.

Authors:  Piotr Przybyła; Austin J Brockmeier; Georgios Kontonatsios; Marie-Annick Le Pogam; John McNaught; Erik von Elm; Kay Nolan; Sophia Ananiadou
Journal:  Res Synth Methods       Date:  2018-07-30       Impact factor: 5.273
