Yang Shi (1,2,3,4), Mengqiao Wang (2), Weiping Shi (5), Ji-Hyun Lee (6), Huining Kang (3), Hui Jiang (4,7,8).
1. Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, Georgia, USA.
2. Department of Epidemiology and Biostatistics, West China School of Public Health, Sichuan University, Chengdu, Sichuan, China.
3. Biostatistics Shared Resource, University of New Mexico Comprehensive Cancer Center and Department of Internal Medicine, University of New Mexico, Albuquerque, New Mexico, USA.
4. Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
5. College of Mathematics, Jilin University, Changchun, Jilin, China.
6. Division of Quantitative Sciences, University of Florida Health Cancer Center and Department of Biostatistics, University of Florida, Gainesville, Florida, USA.
7. Center for Computational Medicine and Bioinformatics.
8. University of Michigan Rogel Cancer Center, University of Michigan, Ann Arbor, Michigan, USA.
Abstract
MOTIVATION: Large-scale genomic studies often require accurate estimates of small P-values, both for multiple-testing adjustment and for ranking genomic features by statistical significance. For complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually perform poorly at small P-values because of limited accuracy or computational constraints. We propose a general approach, based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques, for accurately and efficiently estimating small P-values for a broad range of complicated test statistics.
RESULTS: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. 10^-6 to 10^-100). The proposed algorithm can help improve some existing test procedures and develop new test procedures in genomic studies.
AVAILABILITY AND IMPLEMENTATION: R programs for implementing the algorithm and reproducing the results are available at: https://github.com/shilab2017/MCMC-CE-codes.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
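To make the cross-entropy idea concrete, the following is a minimal sketch of textbook multilevel cross-entropy importance sampling on a toy problem: the upper tail probability of a standard normal statistic, where the exact answer is known. It is not the authors' algorithm (their R programs are at the URL above). The Markov chain Monte Carlo component is omitted entirely, and the function names, the N(mu, 1) proposal family, and the settings `n`, `rho`, `max_iter` and `seed` are all assumptions made for this illustration.

```python
import math
import random


def normal_tail_exact(q):
    """Exact P(Z > q) for Z ~ N(0, 1); used only as a reference value."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def ce_tail_probability(q, n=20000, rho=0.1, max_iter=50, seed=1):
    """Estimate P(Z > q) by cross-entropy tuned importance sampling.

    Illustrative assumption: the proposal family is N(mu, 1) and only
    the mean mu is adapted.
    """
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(max_iter):
        xs = [rng.gauss(mu, 1.0) for _ in range(n)]
        xs_sorted = sorted(xs)
        # Intermediate level: the (1 - rho) sample quantile, capped at q.
        level = min(q, xs_sorted[int((1.0 - rho) * n)])
        # Likelihood ratio N(0, 1) / N(mu, 1) for samples at or above the level.
        elite = [(x, math.exp(-mu * x + 0.5 * mu * mu)) for x in xs if x >= level]
        # CE update for a Gaussian mean: weighted average of the elite samples.
        mu = sum(w * x for x, w in elite) / sum(w for _, w in elite)
        if level >= q:
            break
    # Final importance-sampling estimate using the tuned proposal.
    xs = [rng.gauss(mu, 1.0) for _ in range(n)]
    total = sum(math.exp(-mu * x + 0.5 * mu * mu) for x in xs if x > q)
    return total / n, mu


if __name__ == "__main__":
    for q in (5.0, 10.0, 20.0):
        estimate, mu = ce_tail_probability(q)
        print(q, estimate, normal_tail_exact(q), mu)
```

Naive Monte Carlo would need on the order of 1/P draws to observe even one exceedance, which is infeasible for P-values such as 10^-100. Shifting the sampling distribution toward the rejection region and reweighting each draw by its likelihood ratio keeps the number of draws moderate, which is the general motivation behind importance-sampling approaches of this kind.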