Sora Yoon, Seon-Young Kim, Dougu Nam.
Abstract
Deregulated pathways identified from transcriptome data of two sample groups have played a key role in many genomic studies. Gene-set enrichment analysis (GSEA) has been commonly used for pathway or functional analysis of microarray data, and it is also being applied to RNA-seq data. However, most RNA-seq datasets so far have only a small number of replicates. This forces the use of the gene-permuting GSEA method (or preranked GSEA), which produces a large number of false positives because of the inter-gene correlation within each gene-set. We demonstrate that incorporating the absolute gene statistic in one-tailed GSEA considerably improves the false-positive control and the overall discriminatory ability of the gene-permuting GSEA methods for RNA-seq data. To test the performance, we developed a new simulation method that generates correlated read counts within a gene-set, and compared a dozen currently available RNA-seq enrichment analysis methods; the proposed methods outperformed the others that do not account for the inter-gene correlation. Analysis of real RNA-seq data also supported the proposed methods in terms of false-positive control, ranks of true positives and biological relevance. An efficient R package (AbsFilterGSEA) coded with C++ (Rcpp) is available from CRAN.
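The idea of combining absolute gene statistics with a one-tailed, gene-permuting test can be illustrated with a minimal Python sketch. It is not the AbsFilterGSEA implementation: the function names `enrichment_score` and `absolute_gsea_gp`, the weighted running-sum score borrowed from standard GSEA, and the toy data are all assumptions made for illustration.

```python
import random


def enrichment_score(ranked_genes, gene_set, abs_scores):
    """One-tailed score: maximum positive deviation of a weighted running sum.

    Assumption: a standard GSEA-style running sum, weighted by |score|.
    """
    hit_total = sum(abs_scores[g] for g in ranked_genes if g in gene_set)
    n_miss = sum(1 for g in ranked_genes if g not in gene_set)
    if hit_total == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for g in ranked_genes:
        if g in gene_set:
            running += abs_scores[g] / hit_total  # step up on a set member
        else:
            running -= 1.0 / n_miss               # step down otherwise
        best = max(best, running)                 # keep only the positive tail
    return best


def absolute_gsea_gp(scores, gene_set, n_perm=1000, seed=0):
    """Gene-permuting test on absolute gene statistics (hypothetical helper)."""
    abs_scores = {g: abs(s) for g, s in scores.items()}
    ranked = sorted(abs_scores, key=abs_scores.get, reverse=True)
    members = set(gene_set) & set(abs_scores)
    observed = enrichment_score(ranked, members, abs_scores)

    rng = random.Random(seed)
    genes = list(abs_scores)
    exceed = 0
    for _ in range(n_perm):
        # Null model: a random gene set of the same size.
        null_set = set(rng.sample(genes, len(members)))
        if enrichment_score(ranked, null_set, abs_scores) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Toy data: 90 background genes with small scores and a 10-gene set whose
# members change strongly in both directions (+3 or -3).
rng = random.Random(1)
scores = {f"bg{i}": rng.uniform(-1.0, 1.0) for i in range(90)}
toy_set = [f"set{i}" for i in range(10)]
for i, g in enumerate(toy_set):
    scores[g] = 3.0 if i % 2 == 0 else -3.0

es, p_value = absolute_gsea_gp(scores, toy_set)
print(es, p_value)
```

In the toy data the set members move in opposite directions, so a signed ranking would scatter them across both ends of the list, while the absolute ranking places all of them at the top. The sketch shows only the testing mechanics; it does not model inter-gene correlation.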
Year: 2016 PMID: 27829002 PMCID: PMC5102490 DOI: 10.1371/journal.pone.0165919
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The relationship between the mixing coefficient (alpha) and the average inter-gene correlation.
Fig 2. Performance comparison of gene-permuting GSEA methods for simulated read counts.
GSEA-GP methods combined with eight gene statistics (moderated t-statistic, SNR, Ranksum, logFC and their absolute versions), Camera combined with voom normalization, RNA-Enrich, and two preranked GSEA methods for edgeR p-values and FCs were compared for false positive rate, true positive rate and area under the receiver operating characteristic curve using simulated read count data with three (A-C) and five (D-F) replicates.
Fig 3. Average receiver operating characteristic (ROC) curves.
The average ROC curves (20 repetitions) of the twelve gene-permuting GSEA methods applied to simulation data with the inter-gene correlation of 0.3 for (A) three and (B) five replicate cases.
Significant gene-sets detected by the absolute GSEA-GP filtering (FDR<0.1) with the mod-t score (DHT-treated and control LNCaP cell line).
| 2.79 × 10⁻⁴ | 1.78 | |
| 2.27 × 10⁻² | 1.94 | |
| 3.02 × 10⁻² | 1.42 | |
| 5.12 × 10⁻² | 1.17 | |
| 4.40 × 10⁻² | 1.53 | |