Ozan Ozisik, Morgane Térézol, Anaïs Baudot.
Abstract
BACKGROUND: Enrichment analyses are widely applied to investigate lists of genes of interest. However, such analyses often result in long lists of annotation terms with high redundancy, which makes interpretation and reporting difficult. Long annotation lists and redundancy also complicate the comparison of results obtained from different enrichment analyses. One approach to overcoming these issues is to use down-sized annotation collections composed of non-redundant terms. However, down-sized collections are generic, and their level of detail may not fit the user's study. Other available approaches include clustering and filtering tools, which are based on similarity measures and thresholds that can be complicated to comprehend and set.

RESULTS: We propose orsum, a Python package to filter enrichment results. orsum can filter multiple enrichment results collectively and highlight common and specific annotation terms. Filtering in orsum is based on a simple principle: a term is discarded if there is a more significant term that annotates at least the same genes; the remaining more significant term becomes the representative term for the discarded term. This principle ensures that the main biological information is preserved in the filtered results while reducing redundancy. In addition, as the representative terms are selected from the original enrichment results, orsum outputs filtered terms tailored to the study. As a use case, we applied orsum to the enrichment analyses of four lists of genes, each associated with a neurodegenerative disease.
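The filtering principle stated above can be illustrated with a minimal sketch. This is not orsum's actual implementation or API: the function name `filter_terms`, the term identifiers, and the gene identifiers are hypothetical, and the sketch assumes a single enrichment result in which terms are already ordered from most to least significant and each term is mapped to the set of genes it annotates.

```python
from typing import Dict, List, Set


def filter_terms(
    ranked_terms: List[str],
    term_genes: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each representative term to the list of terms it represents.

    ranked_terms: term IDs ordered from most to least significant (assumed).
    term_genes:   term ID -> set of genes annotated by that term (assumed).
    """
    represented: Dict[str, List[str]] = {}
    for term in ranked_terms:
        genes = term_genes[term]
        # Look for an already-kept, more significant term whose gene set
        # contains at least the same genes as the current term.
        for rep in represented:
            if term_genes[rep] >= genes:  # superset-or-equal test on sets
                represented[rep].append(term)  # current term is discarded
                break
        else:
            # No more significant term covers it: keep it as a representative.
            represented[term] = []
    return represented


# Toy input with hypothetical term and gene identifiers.
ranked = ["T1", "T2", "T3", "T4"]
genes = {
    "T1": {"g1", "g2", "g3"},
    "T2": {"g1", "g2"},
    "T3": {"g3", "g4"},
    "T4": {"g4"},
}
result = filter_terms(ranked, genes)
print(result)
```

In this toy input, T2 is covered by the more significant T1 and T4 by the more significant T3, so only T1 and T3 remain as representative terms. Because set inclusion is transitive, comparing each term only against the terms kept so far is enough in this simplified setting.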
Keywords: Enrichment analysis; Filtering; Neurodegenerative diseases; Over-representation analysis
Year: 2022 PMID: 35870894 PMCID: PMC9308244 DOI: 10.1186/s12859-022-04828-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Venn diagrams presenting (a) the overlap of genes associated with each disease, (b) the overlap of GO Biological Process terms enriched in genes associated with each disease, and (c) the overlap of GO Biological Process terms enriched in genes associated with each disease after collective filtering by orsum
Fig. 2 Top 20 representative terms and the quartiles their ranks belong to according to each input enrichment result
Fig. 3 Top 20 representative terms and the number of terms they represent
Numbers of annotation terms in the enrichment analysis results of four neurodegenerative disease gene lists, in total and after filtering by orsum and REVIGO
| Disease | Total number of terms | Number of terms after orsum filtering | Number of terms after REVIGO filtering |
|---|---|---|---|
| AD | 991 | 79 | 540 |
| ALS | 59 | 34 | 51 |
| HD | 22 | 12 | 17 |
| PD | 714 | 104 | 415 |
Fig. 4 Numbers of representative terms resulting from orsum and REVIGO applied to the enrichment analyses of artificially generated gene lists. Each point corresponds to an enrichment analysis result obtained for one of the 100 artificially generated gene lists. The size and color of a point indicate the number of terms in the original enrichment analysis result. The red line shows the coordinates where the numbers of representative terms from orsum and REVIGO are equal