
Pathway size matters: the influence of pathway granularity on over-representation (enrichment analysis) statistics.

Peter D Karp, Peter E Midford, Ron Caspi, Arkady Khodursky.

Abstract

BACKGROUND: Enrichment or over-representation analysis is a common method used in bioinformatics studies of transcriptomics, metabolomics, and microbiome datasets. The key idea behind enrichment analysis is: given a set of significantly expressed genes (or metabolites), use that set to infer a smaller set of perturbed biological pathways or processes, in which those genes (or metabolites) play a role. Enrichment computations rely on collections of defined biological pathways and/or processes, which are usually drawn from pathway databases. Although practitioners of enrichment analysis take great care to employ statistical corrections (e.g., for multiple testing), they appear unaware that enrichment results are quite sensitive to the pathway definitions that the calculation uses.
RESULTS: We show that alternative pathway definitions can alter enrichment p-values by up to nine orders of magnitude, whereas statistical corrections typically alter enrichment p-values by only two orders of magnitude. We present multiple examples where the smaller pathway definitions used in the EcoCyc database produce stronger enrichment p-values than the much larger pathway definitions used in the KEGG database; we demonstrate that to attain a given enrichment p-value, KEGG-based enrichment analyses require 1.3-2.0 times as many significantly expressed genes as do EcoCyc-based enrichment analyses. The large pathways in KEGG are problematic for another reason: they blur together multiple (as many as 21) biological processes. When such a KEGG pathway receives a strong enrichment p-value, it is unclear which of its component processes is perturbed, and thus the biological conclusions drawn from enrichment of large pathways are also in question.
CONCLUSIONS: The choice of pathway database used in enrichment analyses can have a much stronger effect on the enrichment results than the statistical corrections used in these analyses.
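
The sensitivity to pathway size described above can be illustrated with a minimal sketch. It assumes the enrichment p-value is computed as an upper-tail hypergeometric probability, which is one common choice for over-representation analysis; the abstract does not state which test the authors used. All numbers below (genome size, number of significant genes, pathway sizes, number of pathways tested) are hypothetical and are not taken from EcoCyc or KEGG, and the function names are illustrative only.

```python
from math import comb


def enrichment_p_value(pathway_hits, pathway_size, selected_genes, total_genes):
    """Upper-tail hypergeometric probability P(X >= pathway_hits).

    X is the number of pathway genes found among `selected_genes` genes
    drawn without replacement from `total_genes` genes, of which
    `pathway_size` belong to the pathway.
    """
    max_hits = min(pathway_size, selected_genes)
    denominator = comb(total_genes, selected_genes)
    numerator = sum(
        comb(pathway_size, k) * comb(total_genes - pathway_size, selected_genes - k)
        for k in range(pathway_hits, max_hits + 1)
    )
    return numerator / denominator


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1.0."""
    return min(1.0, p_value * n_tests)


# Hypothetical setting: 4000 genes, 200 of them significant, and the same
# 8 significant genes falling either in a small or in a large pathway.
TOTAL_GENES = 4000
SELECTED_GENES = 200
HITS = 8

p_small = enrichment_p_value(HITS, 10, SELECTED_GENES, TOTAL_GENES)   # 10-gene pathway
p_large = enrichment_p_value(HITS, 100, SELECTED_GENES, TOTAL_GENES)  # 100-gene pathway

# Hypothetical number of pathways tested, used only for the correction.
N_PATHWAYS = 300

print(f"small pathway: raw p = {p_small:.3g}, Bonferroni p = {bonferroni(p_small, N_PATHWAYS):.3g}")
print(f"large pathway: raw p = {p_large:.3g}, Bonferroni p = {bonferroni(p_large, N_PATHWAYS):.3g}")
```

With the same eight significant genes, the smaller pathway definition yields a far smaller p-value than the larger one, while a Bonferroni correction over a few hundred pathways shifts either value by roughly two orders of magnitude at most.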

Keywords:  BioCyc; EcoCyc; Enrichment analysis; KEGG; Metabolomics; Over-representation analysis; Pathway size; Pathways

Year:  2021        PMID: 33726670      PMCID: PMC7967953          DOI: 10.1186/s12864-021-07502-8

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


