Xiaoyu Dai, Nan Lin, Daofeng Li, Ting Wang.
Abstract
In the analysis of next-generation sequencing technology, massive discrete data are generated from short read counts with varying biological coverage. Conducting conditional hypothesis testing such as Fisher's Exact Test at every genomic region of interest thus leads to a heterogeneous multiple discrete testing problem. However, most existing multiple testing procedures for controlling the false discovery rate (FDR) assume that test statistics are continuous and become conservative for discrete tests. To overcome the conservativeness, in this article, we propose a novel multiple testing procedure for better FDR control on heterogeneous discrete tests. Our procedure makes decisions based on the marginal critical function (MCF) of randomized tests, which enables achieving a powerful and non-randomized multiple testing procedure. We provide upper bounds of the positive FDR (pFDR) and the positive false non-discovery rate (pFNR) corresponding to our procedure. We also prove that the set of detections made by our method contains every detection made by a naive application of the widely-used q-value method. We further demonstrate the improvement of our method over other existing multiple testing procedures by simulations and a real example of differentially methylated region (DMR) detection using whole-genome bisulfite sequencing (WGBS) data.
Keywords: Discrete P-value; differentially methylated regions; marginal critical function; multiple testing; randomized test; whole-genome bisulfite sequencing
Year: 2019 PMID: 30387496 PMCID: PMC6565503 DOI: 10.1111/biom.12996
Source DB: PubMed Journal: Biometrics ISSN: 0006-341X Impact factor: 2.571
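
Illustration: The marginal critical function named in the abstract can be illustrated for a single one-sided Fisher's Exact Test. The sketch below is not the authors' procedure. It assumes the common textbook definition: the randomized p-value is uniform on the interval (p_minus, p), where p = P(A >= a) and p_minus = P(A > a) under the hypergeometric null, and the MCF at threshold lam is the probability that this randomized p-value is at most lam. The function names (hypergeom_pmf, pvalue_interval, mcf) and the example table are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, row1, row2, col1):
    """P(A = k) for the top-left cell of a 2x2 table with fixed margins."""
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    if k < lo or k > hi:
        return 0.0
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)


def pvalue_interval(a, b, c, d):
    """Return (p_minus, p) for the one-sided (greater) Fisher's Exact Test.

    p       = P(A >= a): the usual, conservative discrete p-value.
    p_minus = P(A >  a): the largest attainable p-value below p (0 if none).
    """
    row1, row2, col1 = a + b, c + d, a + c
    hi = min(row1, col1)
    p_minus = sum(hypergeom_pmf(k, row1, row2, col1) for k in range(a + 1, hi + 1))
    p = p_minus + hypergeom_pmf(a, row1, row2, col1)
    return p_minus, p


def mcf(p_minus, p, lam):
    """Assumed MCF: P(U <= lam) for U uniform on (p_minus, p)."""
    if lam >= p:
        return 1.0
    if lam <= p_minus:
        return 0.0
    return (lam - p_minus) / (p - p_minus)


# Hypothetical 2x2 table [[3, 1], [1, 3]], e.g. counts in two samples.
p_minus, p = pvalue_interval(3, 1, 1, 3)
r = mcf(p_minus, p, 0.05)
print(p_minus, p, r)
```

For this table the ordinary p-value is 17/70, so a non-randomized test at level 0.05 never rejects, while the MCF assigns the region a rejection weight strictly between 0 and 1. That partial weight is the information a discrete p-value alone discards, which is one way to see why continuous-data FDR procedures become conservative on discrete tests.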