
Supervised and traditional term weighting methods for automatic text categorization.

Man Lan, Chew Lim Tan, Jian Su, Yue Lu.

Abstract

In the vector space model (VSM), text representation is the task of transforming the content of a textual document into a vector in the term space so that the document can be recognized and classified by a computer or a classifier. Different terms (i.e., words, phrases, or any other indexing units used to identify the contents of a text) have different importance in a text. Term weighting methods assign appropriate weights to the terms to improve the performance of text categorization. In this study, we investigate several widely used unsupervised (traditional) and supervised term weighting methods on benchmark data collections in combination with SVM and kNN algorithms. Taking into account the distribution of relevant documents in the collection, we propose a new, simple supervised term weighting method, tf.rf, to improve the terms' discriminating power for the text categorization task. In the controlled experiments, the supervised term weighting methods show mixed performance. Specifically, our proposed method, tf.rf, performs consistently better than the other term weighting methods, while the other supervised term weighting methods, based on information theory or statistical metrics, perform the worst in all experiments. On the other hand, the popular tf.idf method does not show uniformly good performance across the different data sets.
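The abstract names tf.rf without giving its formula. The following is a minimal sketch, assuming the relevance frequency definition commonly stated for this method, rf = log2(2 + a / max(1, c)), where a is the number of positive-category documents containing the term and c is the number of negative-category documents containing it. The function name `tf_rf_weights`, its parameters, and the toy corpus are hypothetical and serve only as illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def tf_rf_weights(docs, labels, positive_label):
    """Return (rf, weighted_docs) for one target category.

    docs: list of token lists; labels: parallel list of category labels.
    Assumed formula (not given in the abstract):
        rf(t) = log2(2 + a / max(1, c))
    a = positive-category documents containing t,
    c = negative-category documents containing t.
    """
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Count each term once per document (document frequency).
        target = pos_df if label == positive_label else neg_df
        target.update(set(tokens))

    vocabulary = set(pos_df) | set(neg_df)
    rf = {t: math.log2(2 + pos_df[t] / max(1, neg_df[t])) for t in vocabulary}

    weighted_docs = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within the document
        weighted_docs.append({t: tf[t] * rf[t] for t in tf})
    return rf, weighted_docs


# Hypothetical toy corpus with "grain" as the positive category.
docs = [
    ["wheat", "price"],
    ["wheat", "wheat", "farm"],
    ["stock", "price"],
    ["stock", "market"],
]
labels = ["grain", "grain", "finance", "finance"]
rf, weighted_docs = tf_rf_weights(docs, labels, positive_label="grain")
```

In this sketch a term concentrated in the positive category ("wheat") receives a larger rf than one spread evenly across both categories ("price"), which is the discriminating behavior the abstract attributes to supervised weighting. Unlike idf, rf depends on the category labels, so the weights are recomputed for each target category.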


Year:  2009        PMID: 19229086     DOI: 10.1109/TPAMI.2008.110

Source DB:  PubMed          Journal:  IEEE Trans Pattern Anal Mach Intell        ISSN: 0162-8828            Impact factor:   6.226


