
Finding scientific topics.

Thomas L Griffiths, Mark Steyvers.

Abstract

A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying "hot topics" by examining temporal dynamics and tagging abstracts to illustrate semantic content.
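The generative process described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the symmetric Dirichlet prior on each document's topic distribution follows the usual formulation of the Blei, Ng, and Jordan model, while the function names, the toy `topic_word` table, and the values of `alpha` and `length` are hypothetical choices made only for this example.

```python
import random

def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) of dimension `size`."""
    # Normalized independent Gamma(alpha, 1) draws are Dirichlet distributed.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, alpha, length, rng):
    """Generate one document as a list of (topic, word_id) pairs.

    topic_word[k] is the word distribution of topic k (assumed given here).
    """
    num_topics = len(topic_word)
    vocab = range(len(topic_word[0]))
    # Step 1: choose this document's distribution over topics.
    theta = sample_dirichlet(alpha, num_topics, rng)
    doc = []
    for _ in range(length):
        # Step 2: select a topic according to theta ...
        z = rng.choices(range(num_topics), weights=theta)[0]
        # ... then choose the word from that topic's word distribution.
        w = rng.choices(vocab, weights=topic_word[z])[0]
        doc.append((z, w))
    return doc

rng = random.Random(0)
# Two toy topics over a four-word vocabulary (illustrative values only).
topic_word = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
doc = generate_document(topic_word, alpha=0.5, length=20, rng=rng)
```

Inference runs this process in reverse: given only the observed words, a Markov chain Monte Carlo sampler estimates which topic produced each word, and from those assignments the per-topic word distributions and per-document topic distributions.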

Year:  2004        PMID: 14872004      PMCID: PMC387300          DOI: 10.1073/pnas.0307752101

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


  2 in total

1.  Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.

Authors:  S Geman; D Geman
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  1984-06       Impact factor: 6.226

2.  Fundamental theorem of natural selection under gene-culture transmission.

Authors:  C S Findlay
Journal:  Proc Natl Acad Sci U S A       Date:  1991-06-01       Impact factor: 11.205

  193 in total

1.  Mapping subsets of scholarly information.

Authors:  Paul Ginsparg; Paul Houle; Thorsten Joachims; Jae-Hoon Sul
Journal:  Proc Natl Acad Sci U S A       Date:  2004-02-06       Impact factor: 11.205

2.  From paragraph to graph: latent semantic analysis for information visualization.

Authors:  Thomas K Landauer; Darrell Laham; Marcia Derr
Journal:  Proc Natl Acad Sci U S A       Date:  2004-03-22       Impact factor: 11.205

3.  Mixed-membership models of scientific publications.

Authors:  Elena Erosheva; Stephen Fienberg; John Lafferty
Journal:  Proc Natl Acad Sci U S A       Date:  2004-03-12       Impact factor: 11.205

4.  Mapping knowledge domains: characterizing PNAS.

Authors:  Kevin W Boyack
Journal:  Proc Natl Acad Sci U S A       Date:  2004-02-12       Impact factor: 11.205

5.  Mapping annotations with textual evidence using an scLDA model.

Authors:  Bo Jin; Vicky Chen; Lujia Chen; Xinghua Lu
Journal:  AMIA Annu Symp Proc       Date:  2011-10-22

6.  A LDA-based approach to promoting ranking diversity for genomics information retrieval.

Authors:  Yan Chen; Xiaoshi Yin; Zhoujun Li; Xiaohua Hu; Jimmy Xiangji Huang
Journal:  BMC Genomics       Date:  2012-06-11       Impact factor: 3.969

7.  Reconceptualizing the classification of PNAS articles.

Authors:  Edoardo M Airoldi; Elena A Erosheva; Stephen E Fienberg; Cyrille Joutard; Tanzy Love; Suyash Shringarpure
Journal:  Proc Natl Acad Sci U S A       Date:  2010-11-15       Impact factor: 11.205

8.  Effects of event knowledge in processing verbal arguments.

Authors:  Klinton Bicknell; Jeffrey L Elman; Mary Hare; Ken McRae; Marta Kutas
Journal:  J Mem Lang       Date:  2010-11-01       Impact factor: 3.059

9.  Pairwise Latent Semantic Association for Similarity Computation in Medical Imaging.

Authors:  Fan Zhang; Yang Song; Weidong Cai; Sidong Liu; Siqi Liu; Sonia Pujol; Ron Kikinis; Yong Xia; Michael J Fulham; David Dagan Feng
Journal:  IEEE Trans Biomed Eng       Date:  2015-09-10       Impact factor: 4.538

10.  Structured Correspondence Topic Models for Mining Captioned Figures in Biological Literature.

Authors:  Amr Ahmed; Eric P Xing; William W Cohen; Robert F Murphy
Journal:  KDD       Date:  2009
