Gene set analysis: limitations in popular existing methods and proposed improvements.

Pashupati Mishra, Petri Törönen, Yrjö Leino, Liisa Holm.

Abstract

MOTIVATION: Gene set analysis is the analysis of a set of genes that collectively contribute to a biological process. Most popular gene set analysis methods are based on empirical P-values, which require a large number of permutations. Despite the numerous gene set analysis methods developed in the past decade, the most popular methods still suffer from serious limitations.
RESULTS: We present a gene set analysis method (mGSZ) based on the Gene Set Z-scoring function (GSZ) and asymptotic P-values. Asymptotic P-value calculation requires fewer permutations and thus speeds up the gene set analysis process. We compare the GSZ scoring function with seven popular gene set scoring functions and show that GSZ stands out as the best scoring function. In addition, we show improved performance of the GSA method when the max-mean statistic is replaced by the GSZ scoring function. We demonstrate the importance of both gene and sample permutations by showing the consequences of omitting one or the other. A comparison of asymptotic and empirical methods of P-value estimation demonstrates a clear advantage of asymptotic over empirical P-values. We show, in two different evaluations, that mGSZ outperforms the state-of-the-art methods. We compare mGSZ results obtained with permutation and rotation tests and show that rotation does not improve our asymptotic P-values. We also propose well-known asymptotic distribution models for three of the compared methods.
AVAILABILITY AND IMPLEMENTATION: mGSZ is available as an R package from cran.r-project.org.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
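
The contrast between empirical and asymptotic P-values drawn in the abstract can be illustrated with a minimal Python sketch. It is not the mGSZ implementation: the function names, the simulated null scores and, in particular, the choice of a normal distribution as the asymptotic model are assumptions made only for illustration, since the abstract does not state which distribution mGSZ fits. The sketch shows why an empirical P-value cannot fall below 1/(N+1) for N permutations, whereas a fitted distribution can assign a smaller tail probability from the same N scores.

```python
import random
from statistics import NormalDist, mean, stdev


def empirical_p(observed, null_scores):
    """Share of permutation scores at least as large as the observed score.

    The +1 correction keeps the estimate strictly positive, so with N
    permutations the smallest attainable value is 1 / (N + 1).
    """
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)


def asymptotic_p(observed, null_scores):
    """Upper-tail probability under a distribution fitted to the null scores.

    A normal distribution is assumed here purely for illustration.
    """
    fitted = NormalDist(mean(null_scores), stdev(null_scores))
    return 1.0 - fitted.cdf(observed)


# Simulated stand-in for gene set scores obtained from 100 permutations.
rng = random.Random(0)
null_scores = [rng.gauss(0.0, 1.0) for _ in range(100)]
observed = 5.0  # hypothetical observed gene set score

p_emp = empirical_p(observed, null_scores)
p_asym = asymptotic_p(observed, null_scores)
print(f"empirical P-value:  {p_emp:.4g}")
print(f"asymptotic P-value: {p_asym:.4g}")
```

Under these assumptions, resolving a small P-value empirically would need many more permutations, while the fitted model extrapolates into the tail; the extrapolation is only as reliable as the assumed distribution.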

Year:  2014        PMID: 24903419     DOI: 10.1093/bioinformatics/btu374

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  Robust multi-group gene set analysis with few replicates.

Authors:  Pashupati P Mishra; Alan Medlar; Liisa Holm; Petri Törönen
Journal:  BMC Bioinformatics       Date:  2016-12-09       Impact factor: 3.169

2.  Mlh1 deficiency in normal mouse colon mucosa associates with chromosomally unstable colon cancer.

Authors:  Marjaana Pussila; Petri Törönen; Elisabet Einarsdottir; Shintaro Katayama; Kaarel Krjutškov; Liisa Holm; Juha Kere; Päivi Peltomäki; Markus J Mäkinen; Jere Linden; Minna Nyström
Journal:  Carcinogenesis       Date:  2018-05-28       Impact factor: 4.944

3.  Epigenome-450K-wide methylation signatures of active cigarette smoking: The Young Finns Study.

Authors:  Pashupati P Mishra; Ismo Hänninen; Emma Raitoharju; Saara Marttila; Binisha H Mishra; Nina Mononen; Mika Kähönen; Mikko Hurme; Olli Raitakari; Petri Törönen; Liisa Holm; Terho Lehtimäki
Journal:  Biosci Rep       Date:  2020-07-31       Impact factor: 3.840

Review 4.  Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges.

Authors:  Samarendra Das; Craig J McClain; Shesh N Rai
Journal:  Entropy (Basel)       Date:  2020-04-10       Impact factor: 2.524

5.  Methylation status of nc886 epiallele reflects periconceptional conditions and is associated with glucose metabolism through nc886 RNAs.

Authors:  Saara Marttila; Leena E Viiri; Pashupati P Mishra; Brigitte Kühnel; Pamela R Matias-Garcia; Leo-Pekka Lyytikäinen; Tiina Ceder; Nina Mononen; Wolfgang Rathmann; Juliane Winkelmann; Annette Peters; Mika Kähönen; Nina Hutri-Kähönen; Markus Juonala; Katriina Aalto-Setälä; Olli Raitakari; Terho Lehtimäki; Melanie Waldenberger; Emma Raitoharju
Journal:  Clin Epigenetics       Date:  2021-07-22       Impact factor: 6.551
