Matthew M Parks, Benjamin J Raphael, Charles E Lawrence.
Abstract
BACKGROUND: Procedures for controlling the false discovery rate (FDR) are widely applied as a solution to the multiple comparisons problem of high-dimensional statistics. Current FDR-controlling procedures require accurately calculated p-values and rely on extrapolation into the unknown and unobserved tails of the null distribution. Both of these intermediate steps are challenging and can compromise the reliability of the results.
Keywords: Big data; False discovery rate (FDR); High dimensional inference; Hypothesis testing
Year: 2018 PMID: 30217148 PMCID: PMC6137876 DOI: 10.1186/s12859-018-2356-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. Empirical probability density functions of the observed read depth ratios for the test and control data. Both density functions were obtained by kernel density estimation with a Normal kernel. The vertical black line indicates y = 1.
Number of test data points that are significant (FDR < 0.05) according to various strategies for controlling the FDR. “Control data” indicates the control data-based local FDR strategy described in the present work. All other rows give the assumed parametric form of the null distribution, whose parameters are estimated via Efron’s semi-parametric local FDR method. Results are shown for a representative individual.
| null distribution form | number of significant calls |
|---|---|
| control data | 47 |
| lognormal | 106 |
| 2-mix | 118 |
| 3-mix | 119 |
| 4-mix | 123 |
| normal | 106 |
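
As a rough illustration of how a control data-based local FDR count like the one in the table could be computed, the sketch below uses the standard local FDR definition, fdr(y) = pi0 * f0(y) / f(y), with the null density f0 estimated from control data and the mixture density f estimated from test data, both by Normal-kernel density estimation. It is not the authors' implementation. The function names, the shared rule-of-thumb bandwidth, the choice pi0 = 1 (a conservative upper bound), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def kde(samples, points, bandwidth):
    """Normal-kernel density estimate of `samples`, evaluated at `points`."""
    samples = np.asarray(samples, dtype=float)
    points = np.asarray(points, dtype=float)
    z = (points[:, None] - samples[None, :]) / bandwidth
    norm = samples.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def local_fdr(test, control, pi0=1.0):
    """Local FDR pi0 * f0(y) / f(y) at each test point, clipped to [0, 1]."""
    test = np.asarray(test, dtype=float)
    control = np.asarray(control, dtype=float)
    # Assumption: normal-reference rule-of-thumb bandwidth taken from the
    # control data and shared by both density estimates.
    h = 1.06 * control.std(ddof=1) * control.size ** (-0.2)
    f0 = kde(control, test, h)  # null density, estimated from control data
    f = kde(test, test, h)      # mixture density, estimated from test data
    return np.clip(pi0 * f0 / f, 0.0, 1.0)

# Synthetic illustration only; these are not the paper's data.
rng = np.random.default_rng(0)
control = rng.lognormal(mean=0.0, sigma=0.1, size=2000)
test = np.concatenate([
    rng.lognormal(mean=0.0, sigma=0.1, size=1900),  # null-like points
    rng.lognormal(mean=0.7, sigma=0.1, size=100),   # shifted points
])
fdr = local_fdr(test, control)
n_significant = int(np.sum(fdr < 0.05))
```

Because f0 comes directly from observed control values, this sketch needs no parametric form for the null. The estimate is still noisy wherever the control data are sparse, so calls in the extreme tails should be read with caution.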
Fig. 2. Probability density functions for the test distribution, mode-shifted control distribution, and 1-, 2-, 3-, and 4-component Gaussian mixtures fitted to the central region of the test data. The vertical dotted black line indicates the mode of the test data. The vertical solid black lines indicate the boundaries of the half-height region.
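
The mode and half-height boundaries marked in Fig. 2 can be located from a density evaluated on a grid. The sketch below assumes that "half-height region" means the contiguous interval around the mode where the density is at least half its peak value; that reading, and the hypothetical function name `half_height_region`, are assumptions and are not taken from the source.

```python
import numpy as np

def half_height_region(grid, density):
    """Return (mode, left, right): the grid point of peak density and the
    ends of the contiguous run around it where density >= half the peak."""
    grid = np.asarray(grid, dtype=float)
    density = np.asarray(density, dtype=float)
    i_mode = int(np.argmax(density))
    half = 0.5 * density[i_mode]
    left = i_mode
    while left > 0 and density[left - 1] >= half:
        left -= 1
    right = i_mode
    while right < grid.size - 1 and density[right + 1] >= half:
        right += 1
    return grid[i_mode], grid[left], grid[right]

# Illustration on a standard normal density (not the paper's data).
grid = np.linspace(-5.0, 5.0, 10001)
density = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
mode, left, right = half_height_region(grid, density)
```

For a Gaussian density the width of this interval equals the full width at half maximum, 2 * sqrt(2 ln 2) standard deviations, which gives a quick sanity check on the result.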