Kaumadi Wijesooriya, Sameer A Jadaan, Kaushalya L Perera, Tanuveer Kaur, Mark Ziemann.
Abstract
Gene set enrichment tests (a.k.a. functional enrichment analysis) are among the most frequently used methods in computational biology. Despite this popularity, there are concerns that these methods are being applied incorrectly and the results of some peer-reviewed publications are unreliable. These problems include the use of inappropriate background gene lists, lack of false discovery rate correction and lack of methodological detail. To ascertain the frequency of these issues in the literature, we performed a screen of 186 open-access research articles describing functional enrichment results. We find that 95% of analyses using over-representation tests did not implement an appropriate background gene list or did not describe this in the methods. Failure to perform p-value correction for multiple tests was identified in 43% of analyses. Many studies lacked detail in the methods section about the tools and gene sets used. An extension of this survey showed that these problems are not associated with journal or article level bibliometrics. Using seven independent RNA-seq datasets, we show misuse of enrichment tools alters results substantially. In conclusion, most published functional enrichment studies suffered from one or more major flaws, highlighting the need for stronger standards for enrichment analysis.
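To illustrate why the missing false discovery rate correction noted in the abstract matters, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The p-values are hypothetical and are not taken from the study; the function name `benjamini_hochberg` is likewise an illustrative choice, and the abstract does not state which correction procedure each surveyed analysis should have used.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five gene sets (not from the paper).
raw = [0.001, 0.008, 0.039, 0.041, 0.27]
adj = benjamini_hochberg(raw)

print("significant before correction:", sum(p < 0.05 for p in raw))
print("significant after correction: ", sum(p < 0.05 for p in adj))
```

In this made-up example, fewer gene sets pass the 0.05 threshold after adjustment than before, which is the kind of difference an uncorrected analysis would hide.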
Year: 2022 PMID: 35263338 PMCID: PMC8936487 DOI: 10.1371/journal.pcbi.1009935
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Seven independent RNA-seq experiments used for functional enrichment analysis.
Detection threshold is an average of 10 reads per sample. Differentially expressed genes are defined as FDR<0.05 using DESeq2.
| SRA accession and citation | Control datasets | Case datasets | Genes detected | Genes differentially expressed |
|---|---|---|---|---|
| SRP128998 | GSM2932797 GSM2932798 GSM2932799 | GSM2932791 GSM2932792 GSM2932793 | 15635 | 3472 |
| SRP038101 | GSM1329862 GSM1329863 GSM1329864 | GSM1329859 GSM1329860 GSM1329861 | 13926 | 3589 |
| SRP037718 | GSM1326472 GSM1326473 GSM1326474 | GSM1326469 GSM1326470 GSM1326471 | 15477 | 9488 |
| SRP096177 | GSM2448985 GSM2448986 GSM2448987 | GSM2448982 GSM2448983 GSM2448984 | 15607 | 5150 |
| SRP247621 | GSM4300737 GSM4300738 GSM4300739 | GSM4300731 GSM4300732 GSM4300733 | 14288 | 230 |
| SRP253951 | GSM4462339 GSM4462340 GSM4462341 | GSM4462336 GSM4462337 GSM4462338 | 15182 | 8588 |
| SRP068733 | GSM2044431 GSM2044432 GSM2044433 | GSM2044428 GSM2044429 GSM2044430 | 14255 | 7365 |
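The effect of the background list on an over-representation test can be sketched with a hypergeometric upper-tail probability. The example below takes the detected and differentially expressed gene counts for SRP128998 from the table above; the gene set size (200), the overlap (60) and the whole-genome background size (20000) are hypothetical values chosen for illustration, and the gene set size is held fixed across both backgrounds as a simplification. This is not the authors' pipeline, only a sketch of the general technique.

```python
from math import comb


def ora_pvalue(overlap, background_size, set_size, n_selected):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background_size: genes in the background list
    set_size:        gene set members present in the background
    n_selected:      differentially expressed genes
    """
    upper = min(set_size, n_selected)
    total = comb(background_size, n_selected)
    # Exact integer arithmetic; convert to float only in the final division.
    tail = sum(
        comb(set_size, i) * comb(background_size - set_size, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


detected = 15635      # genes detected in SRP128998 (from the table)
de_genes = 3472       # differentially expressed genes (from the table)
set_size = 200        # hypothetical gene set size
overlap = 60          # hypothetical number of DE genes in the set
whole_genome = 20000  # hypothetical whole-genome background size

p_detected = ora_pvalue(overlap, detected, set_size, de_genes)
p_whole = ora_pvalue(overlap, whole_genome, set_size, de_genes)

print(f"background = detected genes: p = {p_detected:.3g}")
print(f"background = whole genome:   p = {p_whole:.3g}")
```

With the same overlap, the larger background lowers the expected overlap and therefore produces a smaller p-value, which is how an inappropriate background can inflate apparent enrichment.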
Scoring schema.
| 1 point deducted | 1 point awarded |
|---|---|
| Gene set library origin not stated | Code made available |
| Gene set library version not stated | Gene profile data provided |
| Statistical test not stated | |
| No statistical test conducted | |
| No FDR correction conducted | |
| App used not stated | |
| App version not stated | |
| Background list not defined | |
| Inappropriate background list used | |
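The scoring schema above can be expressed as a simple lookup, sketched here under the assumption that each criterion applies at most once per analysis and that the score is the sum of awarded and deducted points. The names `POINTS` and `score_analysis` are hypothetical, and the example observations are invented rather than drawn from any surveyed article.

```python
# Points per criterion, transcribed from the scoring schema table.
POINTS = {
    "Gene set library origin not stated": -1,
    "Gene set library version not stated": -1,
    "Statistical test not stated": -1,
    "No statistical test conducted": -1,
    "No FDR correction conducted": -1,
    "App used not stated": -1,
    "App version not stated": -1,
    "Background list not defined": -1,
    "Inappropriate background list used": -1,
    "Code made available": 1,
    "Gene profile data provided": 1,
}


def score_analysis(observations):
    """Sum the points for the criteria observed in one analysis."""
    observed = set(observations)  # each criterion counts at most once
    unknown = observed - POINTS.keys()
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return sum(POINTS[name] for name in observed)


# Hypothetical analysis: two problems and one good practice.
example = [
    "No FDR correction conducted",
    "Background list not defined",
    "Code made available",
]
print(score_analysis(example))
```

An analysis with none of the listed problems and neither good practice scores 0 under this reading of the schema.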