Biological sequence classification utilizing positive and unlabeled data.

Abstract

MOTIVATION: In the genomics setting, an increasingly common data configuration consists of a small set of sequences possessing a targeted property (positive instances) amongst a large set of sequences for which class membership is unknown (unlabeled instances). Traditional two-class classification methods do not effectively handle such data.
RESULTS: Here, we develop a novel method, likely positive-iterative classification (LP-IC), for this problem, and contrast its performance with the few existing methods, most of which were devised and utilized in the text classification context. LP-IC employs an iterative classification scheme and introduces a class dispersion measure, adopted from unsupervised clustering approaches, to monitor the model selection process. Using two case studies (prediction of HLA binding, and alternative splicing conservation between human and mouse), we show that LP-IC provides superior performance to existing methodologies in terms of: (i) combined accuracy and precision in positive identification from the unlabeled set; and (ii) predictive performance of the resultant classifiers on independent test data.
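
The general idea of iterative classification on positive and unlabeled data, monitored by a dispersion score, can be illustrated with a minimal sketch. This is not the authors' LP-IC implementation: the abstract does not specify the base classifier or the exact dispersion measure, so the nearest-centroid classifier, the `dispersion` formula, and all function names below are assumptions chosen for illustration. Sequences are assumed to have already been encoded as numeric feature vectors.

```python
import math

def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dispersion(pos, neg):
    """Hypothetical dispersion score (an assumption, not the paper's measure):
    mean distance of points to their own class centroid, divided by the
    distance between the two centroids. Lower means tighter, better-separated
    classes."""
    cp, cn = centroid(pos), centroid(neg)
    within = sum(dist(p, cp) for p in pos) + sum(dist(q, cn) for q in neg)
    within /= len(pos) + len(neg)
    between = dist(cp, cn)
    return within / between if between > 0 else float("inf")

def iterative_pu(positives, unlabeled, max_iter=20):
    """Iteratively relabel unlabeled points with a nearest-centroid rule and
    return the (score, likely_pos, likely_neg) split with the lowest score."""
    likely_pos = []                  # unlabeled points currently called positive
    likely_neg = list(unlabeled)     # start by treating all unlabeled as negative
    best = (dispersion(positives, likely_neg), likely_pos, likely_neg)
    for _ in range(max_iter):
        cp = centroid(positives + likely_pos)
        cn = centroid(likely_neg)
        new_pos = [u for u in unlabeled if dist(u, cp) < dist(u, cn)]
        new_neg = [u for u in unlabeled if dist(u, cp) >= dist(u, cn)]
        # Stop when the labels no longer change or the negative class empties.
        if not new_neg or new_pos == likely_pos:
            break
        likely_pos, likely_neg = new_pos, new_neg
        score = dispersion(positives + likely_pos, likely_neg)
        if score < best[0]:          # keep the split with the lowest dispersion
            best = (score, likely_pos, likely_neg)
    return best

# Toy data: three known positives; the unlabeled set hides two more positives
# among four negatives.
positives = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.3]]
unlabeled = [[0.1, -0.2], [0.3, 0.2],
             [5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [5.1, 5.3]]
score, likely_pos, likely_neg = iterative_pu(positives, unlabeled)
```

In this sketch the dispersion score plays the model-selection role: the loop keeps the labeling from whichever iteration gave the tightest, best-separated classes rather than simply taking the last one.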

Entities: Species

Substances: Genetic Markers

Year: 2008 PMID: 18344247 DOI: 10.1093/bioinformatics/btn089

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

1 in total

1. Towards site-based protein functional annotations.

Authors: Seak Fei Lei; Jun Huan
Journal: Int J Data Min Bioinform Date: 2010 Impact factor: 0.667