Hongfang Zhou, Jie Guo, Yinghui Wang, Minghua Zhao.
Abstract
Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all significantly affect classification results. We therefore propose a feature selection approach, IIRCT, based on the interclass and intraclass relative contributions of terms. The proposed algorithm jointly considers three critical factors: term frequency, the interclass relative contribution of terms, and the intraclass relative contribution of terms. Experiments with a kNN classifier on the 20 NewsGroup and SougouCS corpora show that IIRCT achieves better performance than the DF, t-Test, and CMFS algorithms.
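To make the DF baseline named above concrete, the following is a minimal sketch of document-frequency feature selection, assuming documents are already tokenized into lists of terms. The function name `select_by_document_frequency`, the alphabetical tie-breaking rule, and the toy documents are illustrative assumptions, not taken from the paper; the sketch does not reproduce IIRCT, whose scoring formula is not given in this abstract.

```python
from collections import Counter


def select_by_document_frequency(docs, k):
    """Return the k terms that occur in the largest number of documents.

    Hypothetical helper for illustration; `docs` is a list of token lists.
    """
    df = Counter()
    for tokens in docs:
        # Count each term at most once per document.
        df.update(set(tokens))
    # Rank by descending document frequency; break ties alphabetically
    # (an assumption made here only to keep the output deterministic).
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy documents, invented for this example.
docs = [
    ["engine", "car", "wheel"],
    ["car", "price", "loan"],
    ["loan", "bank", "price", "price"],
]
top_terms = select_by_document_frequency(docs, 3)
print(top_terms)
```

DF ignores class labels entirely, which is the kind of limitation that class-aware criteria such as interclass and intraclass contributions are meant to address.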
Year: 2016 PMID: 27579032 PMCID: PMC4992800 DOI: 10.1155/2016/1715780
Source DB: PubMed Journal: Comput Intell Neurosci
20 NewsGroup corpus.
| Category number | Category name |
|---|---|
| 1 | alt.atheism |
| 2 | comp.graphics |
| 3 | comp.os.ms-windows.misc |
| 4 | comp.sys.ibm.pc.hardware |
| 5 | comp.sys.mac.hardware |
| 6 | comp.windows.x |
| 7 | misc.forsale |
| 8 | rec.autos |
| 9 | rec.motorcycles |
| 10 | rec.sport.baseball |
| 11 | rec.sport.hockey |
| 12 | sci.crypt |
| 13 | sci.electronics |
| 14 | sci.med |
| 15 | sci.space |
| 16 | soc.religion.christian |
| 17 | talk.politics.guns |
| 18 | talk.politics.mideast |
| 19 | talk.politics.misc |
| 20 | talk.religion.misc |
SougouCS corpus.
| Category number | Category name |
|---|---|
| 1 | Car |
| 2 | Finance |
| 3 | IT |
| 4 | Health |
| 5 | Sports |
| 6 | Tourism |
| 7 | Education |
| 8 | Culture |
| 9 | Military |
| 10 | Housing |
| 11 | Entertainment |
| 12 | Fashion |
Figure 1: Precision and recall performance on the 20 NewsGroup corpus.
Figure 2: Macro-F1 performance on the 20 NewsGroup corpus.
Figure 3: Precision and recall performance on the SougouCS corpus.
Figure 4: Macro-F1 performance on the SougouCS corpus.
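The figures report precision, recall, and macro-F1. As a reminder of how such a score can be computed, here is a minimal sketch that takes macro-F1 as the unweighted mean of per-class F1 scores. This is one common definition and is an assumption here; the paper may instead derive macro-F1 from macro-averaged precision and recall. The function name `macro_f1` and the toy labels are illustrative only and are not results from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Treat undefined precision or recall as 0 (an assumed convention).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


# Toy labels, invented for this example.
y_true = ["Car", "Car", "IT", "IT"]
y_pred = ["Car", "IT", "IT", "IT"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because every class contributes equally to the mean, macro-F1 weights small categories as heavily as large ones.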