Pathway size matters: the influence of pathway granularity on over-representation (enrichment analysis) statistics.

Peter D Karp, Peter E Midford, Ron Caspi, Arkady Khodursky.

Abstract

BACKGROUND: Enrichment or over-representation analysis is a common method used in bioinformatics studies of transcriptomics, metabolomics, and microbiome datasets. The key idea behind enrichment analysis is: given a set of significantly expressed genes (or metabolites), use that set to infer a smaller set of perturbed biological pathways or processes, in which those genes (or metabolites) play a role. Enrichment computations rely on collections of defined biological pathways and/or processes, which are usually drawn from pathway databases. Although practitioners of enrichment analysis take great care to employ statistical corrections (e.g., for multiple testing), they appear unaware that enrichment results are quite sensitive to the pathway definitions that the calculation uses.
RESULTS: We show that alternative pathway definitions can alter enrichment p-values by up to nine orders of magnitude, whereas statistical corrections typically alter enrichment p-values by only two orders of magnitude. We present multiple examples where the smaller pathway definitions used in the EcoCyc database produce stronger enrichment p-values than the much larger pathway definitions used in the KEGG database; we demonstrate that to attain a given enrichment p-value, KEGG-based enrichment analyses require 1.3-2.0 times as many significantly expressed genes as do EcoCyc-based enrichment analyses. The large pathways in KEGG are problematic for another reason: they blur together multiple (as many as 21) biological processes. When such a KEGG pathway receives a strong enrichment p-value, it is unclear which of its component processes is perturbed, and thus the biological conclusions drawn from enrichment of large pathways are also in question.
CONCLUSIONS: The choice of pathway database used in enrichment analyses can have a much stronger effect on the enrichment results than the statistical corrections used in these analyses.
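
The size effect described in the abstract can be illustrated with a minimal sketch of an over-representation test. It assumes the common upper-tail hypergeometric formulation, which may differ in detail from the statistic used in the paper. The function name `enrichment_p_value` and every number below (genome size, gene counts, pathway sizes, number of pathways tested) are hypothetical values chosen for illustration, not data from the study.

```python
from math import comb


def enrichment_p_value(hits, pathway_size, selected, universe):
    """Upper-tail hypergeometric probability P(X >= hits).

    universe     -- total number of genes considered
    selected     -- number of significantly expressed genes
    pathway_size -- number of genes annotated to the pathway
    hits         -- selected genes that belong to the pathway
    """
    max_hits = min(pathway_size, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, selected - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical inputs (assumptions for illustration only).
UNIVERSE = 4000    # genes in the genome
SELECTED = 100     # significantly expressed genes
HITS = 8           # selected genes found in the pathway
N_PATHWAYS = 300   # pathways tested, used for the Bonferroni correction

# Same number of hits, two pathway granularities.
p_small = enrichment_p_value(HITS, 10, SELECTED, UNIVERSE)    # fine-grained pathway
p_large = enrichment_p_value(HITS, 100, SELECTED, UNIVERSE)   # coarse-grained pathway

# A Bonferroni correction multiplies the p-value by the number of tests,
# i.e. it shifts it by a fixed factor regardless of pathway size.
p_small_corrected = min(1.0, p_small * N_PATHWAYS)

print(f"pathway of 10 genes:   p = {p_small:.3e}")
print(f"pathway of 100 genes:  p = {p_large:.3e}")
print(f"10 genes, Bonferroni:  p = {p_small_corrected:.3e}")
```

With these assumed inputs, the same eight hits give a far smaller p-value in the 10-gene pathway than in the 100-gene pathway, while the Bonferroni correction changes the p-value only by the factor `N_PATHWAYS`.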

Keywords: BioCyc; EcoCyc; Enrichment analysis; KEGG; Metabolomics; Over-representation analysis; Pathway size; Pathways

Year: 2021; PMID: 33726670; PMCID: PMC7967953; DOI: 10.1186/s12864-021-07502-8

Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969

References (16 in total)

1. Systematic determination of genetic network architecture.

Authors: S Tavazoie; J D Hughes; M J Campbell; R J Cho; G M Church
Journal: Nat Genet Date: 1999-07 Impact factor: 38.330

2. Enrichment or depletion of a GO category within a class of genes: which test?

Authors: Isabelle Rivals; Léon Personnaz; Lieng Taing; Marie-Claude Potier
Journal: Bioinformatics Date: 2006-12-20 Impact factor: 6.937

3. The BioCyc collection of microbial genomes and metabolic pathways.

Authors: Peter D Karp; Richard Billington; Ron Caspi; Carol A Fulcher; Mario Latendresse; Anamika Kothari; Ingrid M Keseler; Markus Krummenacker; Peter E Midford; Quang Ong; Wai Kit Ong; Suzanne M Paley; Pallavi Subhraveti
Journal: Brief Bioinform Date: 2019-07-19 Impact factor: 11.622

4. Ten years of pathway analysis: current approaches and outstanding challenges. (Review)

Authors: Purvesh Khatri; Marina Sirota; Atul J Butte
Journal: PLoS Comput Biol Date: 2012-02-23 Impact factor: 4.475

5. Critical assessment of human metabolic pathway databases: a stepping stone for future integration.

Authors: Miranda D Stobbe; Sander M Houten; Gerbert A Jansen; Antoine H C van Kampen; Perry D Moerland
Journal: BMC Syst Biol Date: 2011-10-14

6. The outcomes of pathway database computations depend on pathway ontology.

Authors: M L Green; P D Karp
Journal: Nucleic Acids Res Date: 2006-08-07 Impact factor: 16.971

7. KEGG: new perspectives on genomes, pathways, diseases and drugs.

Authors: Minoru Kanehisa; Miho Furumichi; Mao Tanabe; Yoko Sato; Kanae Morishima
Journal: Nucleic Acids Res Date: 2016-11-28 Impact factor: 16.971

8. The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.

Authors: Ingrid M Keseler; Amanda Mackie; Alberto Santos-Zavaleta; Richard Billington; César Bonavides-Martínez; Ron Caspi; Carol Fulcher; Socorro Gama-Castro; Anamika Kothari; Markus Krummenacker; Mario Latendresse; Luis Muñiz-Rascado; Quang Ong; Suzanne Paley; Martin Peralta-Gil; Pallavi Subhraveti; David A Velázquez-Ramírez; Daniel Weaver; Julio Collado-Vides; Ian Paulsen; Peter D Karp
Journal: Nucleic Acids Res Date: 2016-11-28 Impact factor: 16.971

9. GAGE: generally applicable gene set enrichment for pathway analysis.

Authors: Weijun Luo; Michael S Friedman; Kerby Shedden; Kurt D Hankenson; Peter J Woolf
Journal: BMC Bioinformatics Date: 2009-05-27 Impact factor: 3.169

10. The MetaCyc database of metabolic pathways and enzymes - a 2019 update.

Authors: Ron Caspi; Richard Billington; Ingrid M Keseler; Anamika Kothari; Markus Krummenacker; Peter E Midford; Wai Kit Ong; Suzanne Paley; Pallavi Subhraveti; Peter D Karp
Journal: Nucleic Acids Res Date: 2020-01-08 Impact factor: 16.971

Cited by (6 in total)

1. Pathway analysis in metabolomics: Recommendations for the use of over-representation analysis.

Authors: Cecilia Wieder; Clément Frainay; Nathalie Poupin; Pablo Rodríguez-Mier; Florence Vinson; Juliette Cooke; Rachel Pj Lai; Jacob G Bundy; Fabien Jourdan; Timothy Ebbels
Journal: PLoS Comput Biol Date: 2021-09-07 Impact factor: 4.475

2. Urgent need for consistent standards in functional enrichment analysis.

Authors: Kaumadi Wijesooriya; Sameer A Jadaan; Kaushalya L Perera; Tanuveer Kaur; Mark Ziemann
Journal: PLoS Comput Biol Date: 2022-03-09 Impact factor: 4.475

3. Understanding signaling and metabolic paths using semantified and harmonized information about biological interactions.

Authors: Ryan A Miller; Martina Kutmon; Anwesha Bohler; Andra Waagmeester; Chris T Evelo; Egon L Willighagen
Journal: PLoS One Date: 2022-04-18 Impact factor: 3.752