Abstract
BACKGROUND: The q-value is a widely used statistical method for estimating the false discovery rate (FDR), a conventional significance measure in the analysis of genome-wide expression data. Because the q-value is a random variable, it may underestimate the FDR in practice. An underestimated FDR can lead to unexpected false discoveries in follow-up validation experiments. This issue has not been well addressed in the literature, especially when a permutation procedure is necessary for p-value calculation.
Keywords: Conservative adjustment; False discovery rate; q-value
Year: 2017 PMID: 28361675 PMCID: PMC5374657 DOI: 10.1186/s12859-017-1474-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
A summary in the situation of multiple hypothesis testing
| | True null | False null | Total |
|---|---|---|---|
| Negative | | | |
| Positive | | | |
| Total | | | |
This table shows the numbers of true/false negatives/positives in the situation of multiple hypothesis testing. The details are described in the Methods section
Fig. 1 Simulation results for four scenarios. a Relatively weak differential expression and relatively small proportion of differential expression. b Relatively strong differential expression but relatively small proportion of differential expression. c Relatively weak differential expression but relatively large proportion of differential expression. d Relatively strong differential expression and relatively large proportion of differential expression. The simulation details are described in the Results section
Fig. 2 Simulation results for a typical scenario. Moderate differential expression and moderate proportion of differential expression. The simulation details are described in the Results section
Fig. 3 A simulation example for an artificial illustration. The theoretical true false discovery rate (FDR) is compared to the related estimate by q-value. This is a scenario with relatively weak differential expression and relatively small proportion of differential expression. Dark circles represent original (unadjusted) q-values and dark triangles represent conservatively adjusted q-values. The simulation details are described in the Results section
Fig. 4 Three applications to experimental genome-wide expression data. a A microarray data set collected for a type 2 diabetes study. b An RNA sequencing (RNA-seq) data set collected for a prostate cancer study in The Cancer Genome Atlas (TCGA) project. c A microarray data set collected for a pancreatic islet study. The curves represent the q-value (as estimated FDR) vs. the related number of identified genes. In each application, the dark solid curve represents original (unadjusted) q-values and the dark dashed curve represents conservatively adjusted q-values