
Accurate and efficient estimation of small P-values with the cross-entropy method: applications in genomic data analysis.

Yang Shi, Mengqiao Wang, Weiping Shi, Ji-Hyun Lee, Huining Kang, Hui Jiang.

Abstract

MOTIVATION: Small P-values are often required to be accurately estimated in large-scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For those complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small P-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently estimating small P-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques.
RESULTS: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. 10^-6 to 10^-100). The proposed algorithm is helpful for the improvement of some existing test procedures and the development of new test procedures in genomic studies.
AVAILABILITY AND IMPLEMENTATION: R programs for implementing the algorithm and reproducing the results are available at: https://github.com/shilab2017/MCMC-CE-codes.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
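
ILLUSTRATION: The abstract describes estimating small P-values with the cross-entropy method. The sketch below is not the authors' MCMC-based algorithm (their R programs are at the URL above); it only illustrates the general cross-entropy idea on a toy problem, assuming a standard normal test statistic and a proposal family N(mu, 1). The function names, the threshold of 6, the sample sizes and the elite fraction rho are all hypothetical choices made for this example.

```python
import math
import random


def normal_tail_exact(c):
    """Reference value P(Z > c) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))


def likelihood_ratio(x, mu):
    """Density ratio N(0, 1) / N(mu, 1) evaluated at x."""
    return math.exp(-mu * x + 0.5 * mu * mu)


def ce_tail_probability(c, n_pilot=2000, n_final=20000, rho=0.1, seed=1, max_iter=50):
    """Estimate P(Z > c) by importance sampling with a cross-entropy-tuned proposal."""
    rng = random.Random(seed)
    mu = 0.0  # start from the null distribution itself

    # Adaptive stage: raise the level step by step until it reaches c,
    # re-fitting the proposal mean to the weighted elite samples each time.
    for _ in range(max_iter):
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n_pilot))
        # Level: the (1 - rho) sample quantile, capped at the target threshold.
        level = min(c, xs[int((1.0 - rho) * n_pilot)])
        elite = [x for x in xs if x >= level]
        weights = [likelihood_ratio(x, mu) for x in elite]
        mu = sum(w * x for w, x in zip(weights, elite)) / sum(weights)
        if level >= c:
            break

    # Final stage: plain importance sampling from the tuned proposal N(mu, 1).
    total = 0.0
    for _ in range(n_final):
        x = rng.gauss(mu, 1.0)
        if x > c:
            total += likelihood_ratio(x, mu)
    return total / n_final, mu


estimate, tilt = ce_tail_probability(6.0)
exact = normal_tail_exact(6.0)
print(f"CE estimate: {estimate:.3e}, exact: {exact:.3e}, proposal mean: {tilt:.2f}")
```

Naive Monte Carlo would need on the order of billions of draws to observe even a few exceedances of this threshold, whereas the tuned proposal places most of its draws in the tail and corrects for the shift through the likelihood ratio.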


Year:  2019        PMID: 30521030      PMCID: PMC6612894          DOI: 10.1093/bioinformatics/bty1005

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


