Alexander Gordon, Linlin Chen, Galina Glazko, Andrei Yakovlev.
Abstract
A new procedure is proposed to balance type I and type II errors in significance testing for differential expression of individual genes. Suppose that a collection F(k) of k lists of selected genes is available, each list approximating the true set of differentially expressed genes. For example, such lists can be generated by a subsampling counterpart of the delete-d jackknife, with the per-comparison error rate controlled within each subsample. A final list of candidate genes, denoted S(*), is then composed so that its contents are, in a sense defined next, closest to all the lists thus generated. To measure the "closeness" of gene lists, we introduce an asymmetric distance between sets; its asymmetry arises from the generally unequal costs assigned to type I and type II errors committed during gene selection. The optimal set S(*) is defined as the minimizer of the average asymmetric distance from an arbitrary set S to all sets in the collection F(k). This minimization problem has an explicit solution, which yields a frequency criterion for including each gene in the final set. The proposed method is tested by resampling from real microarray gene expression data in which artificial shifts are introduced into the expression levels of pre-defined genes, thereby mimicking their differential expression.
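The frequency criterion can be illustrated with a short Python sketch. It assumes (this is not taken from the paper) that the asymmetric distance from a candidate set S to a list A in F(k) is a weighted symmetric difference, d(S, A) = c1·|S \ A| + c2·|A \ S|, where c1 is the cost of a false positive (type I error) and c2 the cost of a false negative (type II error); the paper's exact definition and normalization may differ. Under that assumption the average distance over F(k) splits into a sum over genes: a gene appearing in a fraction f of the k lists contributes c1·(1 - f) if it is included and c2·f if it is excluded, so the minimizer keeps exactly the genes with f > c1/(c1 + c2). The function names (asym_distance, avg_distance, optimal_set, brute_force_min) and the toy gene lists are hypothetical, and the family F(k) is taken as given rather than generated by subsampling.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence, Set


def asym_distance(s: Set[Hashable], a: Set[Hashable], c1: float, c2: float) -> float:
    """Assumed asymmetric distance from candidate set s to reference list a.

    c1 is charged for each gene in s but not in a (false positive, type I);
    c2 is charged for each gene in a but not in s (false negative, type II).
    With c1 != c2 the distance is not symmetric in (s, a).
    """
    return c1 * len(s - a) + c2 * len(a - s)


def avg_distance(s: Set[Hashable], family: Sequence[Set[Hashable]],
                 c1: float, c2: float) -> float:
    """Average asymmetric distance from s to every list in the family F(k)."""
    return sum(asym_distance(s, a, c1, c2) for a in family) / len(family)


def optimal_set(family: Sequence[Set[Hashable]], c1: float, c2: float) -> Set[Hashable]:
    """Minimize avg_distance gene by gene (frequency criterion).

    For a gene selected in n of the k lists, including it costs c1 * (k - n) / k
    and excluding it costs c2 * n / k, so it is kept iff c1 * (k - n) < c2 * n,
    i.e. iff its frequency n / k exceeds c1 / (c1 + c2). At a tie either choice
    is optimal; this sketch excludes the gene. Assumes c1 > 0 and c2 > 0.
    """
    k = len(family)
    counts = Counter(g for a in family for g in a)  # n for every gene seen
    return {g for g, n in counts.items() if c1 * (k - n) < c2 * n}


def brute_force_min(family: Sequence[Set[Hashable]], c1: float, c2: float) -> float:
    """Smallest avg_distance over all subsets of the genes seen in the family."""
    universe = sorted(set().union(*family))
    return min(
        avg_distance(set(combo), family, c1, c2)
        for r in range(len(universe) + 1)
        for combo in combinations(universe, r)
    )


if __name__ == "__main__":
    # Toy family of k = 4 selected-gene lists (made-up gene names).
    family = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g4"}, {"g1", "g2", "g5"}]
    for c1, c2 in [(1, 1), (3, 1), (1, 4)]:
        s = optimal_set(family, c1, c2)
        print(c1, c2, sorted(s),
              avg_distance(s, family, c1, c2), brute_force_min(family, c1, c2))
```

The brute-force search over all subsets serves only to check the closed-form rule on a small example; optimal_set itself needs a single pass over the k lists. Raising c1 relative to c2 raises the frequency threshold and shrinks S(*), trading fewer type I errors for more type II errors.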
Year: 2009 PMID: 20161303 PMCID: PMC2699298 DOI: 10.1016/j.csda.2008.04.010
Source DB: PubMed Journal: Comput Stat Data Anal ISSN: 0167-9473 Impact factor: 1.681