MOTIVATION: Gene set analysis is the analysis of a set of genes that collectively contribute to a biological process. Most popular gene set analysis methods rely on empirical P-values, which require a large number of permutations. Despite the numerous gene set analysis methods developed in the past decade, the most popular methods still suffer from serious limitations.
RESULTS: We present a gene set analysis method (mGSZ) based on the Gene Set Z-scoring function (GSZ) and asymptotic P-values. Asymptotic P-value calculation requires fewer permutations and thus speeds up the gene set analysis process. We compare the GSZ scoring function with seven popular gene set scoring functions and show that GSZ stands out as the best scoring function. In addition, we show improved performance of the GSA method when the max-mean statistic is replaced by the GSZ scoring function. We demonstrate the importance of both gene and sample permutations by showing the consequences of omitting one or the other. A comparison of asymptotic and empirical methods of P-value estimation demonstrates a clear advantage of asymptotic P-values over empirical P-values. We show that mGSZ outperforms the state-of-the-art methods in two different evaluations. We compare mGSZ results with permutation and rotation tests and show that rotation does not improve our asymptotic P-values. We also propose well-known asymptotic distribution models for three of the compared methods.
AVAILABILITY AND IMPLEMENTATION: mGSZ is available as an R package from cran.r-project.org.
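The difference between empirical and asymptotic P-values can be illustrated with a small sketch. It is not the mGSZ implementation: the function names `empirical_p` and `asymptotic_p` are hypothetical, the null scores are a deterministic stand-in for scores obtained from permutations, and the normal distribution fitted to them is an assumption made for illustration only (the abstract does not state which asymptotic distribution mGSZ uses).

```python
import math
import statistics

def empirical_p(observed, null_scores):
    """Empirical P-value: share of null scores at least as large as observed.

    With the add-one correction it can never fall below 1 / (n + 1).
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)

def asymptotic_p(observed, null_scores):
    """Asymptotic P-value under an ASSUMED normal null distribution.

    The mean and standard deviation are estimated from the null scores,
    then the upper tail probability is read off the fitted distribution.
    """
    mu = statistics.fmean(null_scores)
    sd = statistics.stdev(null_scores)
    z = (observed - mu) / sd
    # Upper tail of the standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Stand-in for 200 permutation scores: evenly spaced standard normal quantiles.
n_perm = 200
std_normal = statistics.NormalDist()
null_scores = [std_normal.inv_cdf((i + 0.5) / n_perm) for i in range(n_perm)]

observed = 4.5  # hypothetical gene set score far in the upper tail

p_emp = empirical_p(observed, null_scores)
p_asy = asymptotic_p(observed, null_scores)
print(f"empirical P-value:  {p_emp:.6f}")
print(f"asymptotic P-value: {p_asy:.3e}")
```

With only 200 null scores, the empirical estimate is stuck at its floor of 1/201, whereas the fitted distribution can assign a much smaller tail probability. This is the sense in which an asymptotic approach can manage with fewer permutations, provided the assumed distribution actually fits the null scores.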
Authors: Marjaana Pussila; Petri Törönen; Elisabet Einarsdottir; Shintaro Katayama; Kaarel Krjutškov; Liisa Holm; Juha Kere; Päivi Peltomäki; Markus J Mäkinen; Jere Linden; Minna Nyström Journal: Carcinogenesis Date: 2018-05-28 Impact factor: 4.944
Authors: Pashupati P Mishra; Ismo Hänninen; Emma Raitoharju; Saara Marttila; Binisha H Mishra; Nina Mononen; Mika Kähönen; Mikko Hurme; Olli Raitakari; Petri Törönen; Liisa Holm; Terho Lehtimäki Journal: Biosci Rep Date: 2020-07-31 Impact factor: 3.840
Authors: Saara Marttila; Leena E Viiri; Pashupati P Mishra; Brigitte Kühnel; Pamela R Matias-Garcia; Leo-Pekka Lyytikäinen; Tiina Ceder; Nina Mononen; Wolfgang Rathmann; Juliane Winkelmann; Annette Peters; Mika Kähönen; Nina Hutri-Kähönen; Markus Juonala; Katriina Aalto-Setälä; Olli Raitakari; Terho Lehtimäki; Melanie Waldenberger; Emma Raitoharju Journal: Clin Epigenetics Date: 2021-07-22 Impact factor: 6.551