A framework for semisupervised feature generation and its applications in biomedical literature mining.

Yanpeng Li, Xiaohua Hu, Hongfei Lin, Zhihao Yang.

Abstract

Feature representation is essential to machine learning and text mining. In this paper, we present a feature coupling generalization (FCG) framework for generating new features from unlabeled data. It selects two special types of features, example-distinguishing features (EDFs) and class-distinguishing features (CDFs), from the original feature set, and then generalizes EDFs into higher-level features based on their coupling degrees with CDFs in unlabeled data. The advantage is that EDFs that are extremely sparse in labeled data can be enriched by their co-occurrences with CDFs in unlabeled data, so that the performance of these low-frequency features can be greatly boosted and new information from unlabeled data can be incorporated. We apply this approach to three tasks in biomedical literature mining: gene named entity recognition (NER), protein-protein interaction extraction (PPIE), and text classification (TC) for gene ontology (GO) annotation. New features are generated from over 20 GB of unlabeled PubMed abstracts. The experimental results on BioCreative 2, the AIMED corpus, and the TREC 2005 Genomics Track show that 1) FCG can make good use of the sparse features ignored by supervised learning; 2) it improves the performance of supervised baselines by 7.8 percent, 5.0 percent, and 5.8 percent, respectively, in the three tasks; and 3) our methods achieve 89.1 F-score, 64.5 F-score, and 60.1 normalized utility on the three benchmark data sets.
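
The generalization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the coupling degree is computed, so the sketch assumes a pointwise-mutual-information score over document-level co-occurrence, and it assumes one generated feature per CDF obtained by summing the scores of the EDFs present in an example. The function names `coupling_degrees` and `generalize` and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def coupling_degrees(unlabeled_docs, edfs, cdfs):
    """Score each co-occurring (EDF, CDF) pair over unlabeled documents.

    Assumption: the coupling degree is PMI, log(p(e, c) / (p(e) * p(c))),
    estimated from document-level co-occurrence counts.
    """
    edfs, cdfs = set(edfs), set(cdfs)
    n_docs = len(unlabeled_docs)
    single = Counter()  # number of documents containing each feature
    joint = Counter()   # number of documents containing both e and c
    for doc in unlabeled_docs:
        feats = set(doc)
        present_edfs = feats & edfs
        present_cdfs = feats & cdfs
        single.update(present_edfs | present_cdfs)
        joint.update(product(present_edfs, present_cdfs))
    return {
        (e, c): math.log(n_ec * n_docs / (single[e] * single[c]))
        for (e, c), n_ec in joint.items()
    }


def generalize(example_feats, degrees, cdfs):
    """Turn an example's EDFs into one higher-level feature per CDF.

    Assumption: the value for a CDF is the sum of the coupling degrees
    of every EDF that occurs in the example.
    """
    example_feats = set(example_feats)
    new_feats = {c: 0.0 for c in cdfs}
    for (e, c), degree in degrees.items():
        if e in example_feats:
            new_feats[c] += degree
    return new_feats


# Toy unlabeled corpus: each document is a bag of features.
unlabeled = [
    ["BRCA1", "gene"],
    ["BRCA1", "gene"],
    ["Paris", "city"],
    ["Paris", "city"],
]
degrees = coupling_degrees(unlabeled, edfs=["BRCA1", "Paris"], cdfs=["gene", "city"])
new_features = generalize(["BRCA1", "expressed"], degrees, cdfs=["gene", "city"])
```

In this toy run the rare token "BRCA1" contributes a positive value to the "gene" feature and nothing to the "city" feature, which is the sense in which a sparse EDF is enriched by its co-occurrence with CDFs in unlabeled data.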

Year:  2011        PMID: 20876938     DOI: 10.1109/TCBB.2010.99

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  4 in total

1.  Semi-supervised method for biomedical event extraction.

Authors:  Jian Wang; Qian Xu; Hongfei Lin; Zhihao Yang; Yanpeng Li
Journal:  Proteome Sci       Date:  2013-11-07       Impact factor: 2.480

2.  A robust data-driven approach for gene ontology annotation.

Authors:  Yanpeng Li; Hong Yu
Journal:  Database (Oxford)       Date:  2014-11-25       Impact factor: 3.451

3.  Overview of the gene ontology task at BioCreative IV.

Authors:  Yuqing Mao; Kimberly Van Auken; Donghui Li; Cecilia N Arighi; Peter McQuilton; G Thomas Hayman; Susan Tweedie; Mary L Schaeffer; Stanley J F Laulederkind; Shur-Jen Wang; Julien Gobeill; Patrick Ruch; Anh Tuan Luu; Jung-Jae Kim; Jung-Hsien Chiang; Yu-De Chen; Chia-Jung Yang; Hongfang Liu; Dongqing Zhu; Yanpeng Li; Hong Yu; Ehsan Emadzadeh; Graciela Gonzalez; Jian-Ming Chen; Hong-Jie Dai; Zhiyong Lu
Journal:  Database (Oxford)       Date:  2014-08-25       Impact factor: 3.451

4.  Learning Semantic Tags from Big Data for Clinical Text Representation.

Authors:  Yanpeng Li; Hongfang Liu
Journal:  AMIA Jt Summits Transl Sci Proc       Date:  2015-03-25