| Literature DB >> 26761857 |
Abstract
Mutual information (MI) is a powerful concept for correlation-centric applications. It has been used for feature selection from microarray gene expression data in many works. One of the merits of MI is that, unlike many other heuristic methods, it is based on a mature theoretic foundation. When applied to microarray data, however, it faces some challenges. First, due to the large number of features (i.e., genes) present in microarray data, the true distributions for the expression values of some genes may be distorted by noise. Second, evaluating inter-group mutual information requires estimating multi-variate distributions, which is quite difficult if not impossible. To address these problems, in this paper, we propose a new MI-based feature selection approach for microarray data. Our approach relies on two strategies: one is relevance boosting, which requires a desirable feature to show substantially additional relevance with class labeling beyond the already selected features, the other is feature interaction enhancing, which probabilistically compensates for feature interaction missing from simple aggregation-based evaluation. We justify our approach from both theoretical perspective and experimental results. We use a synthetic dataset to show the statistical significance of the proposed strategies, and real-life datasets to show the improved performance of our approach over the existing methods.Entities:
Mesh:
Substances:
Year: 2016 PMID: 26761857 DOI: 10.1109/TCBB.2016.2515582
Source DB: PubMed Journal: IEEE/ACM Trans Comput Biol Bioinform ISSN: 1545-5963 Impact factor: 3.710