
How Accurate are the Extremely Small P-values Used in Genomic Research: An Evaluation of Numerical Libraries.

Sai Santosh Bangalore, Jelai Wang, David B Allison.

Abstract

In the fields of genomics and high dimensional biology (HDB), massive multiple testing prompts the use of extremely small significance levels. Because hypothesis testing relies on tail areas of statistical distributions, the accuracy of these areas matters for making scientific judgments with confidence. Previous work on accuracy focused primarily on evaluating professionally written statistical software, like SAS, on the Statistical Reference Datasets (StRD) provided by the National Institute of Standards and Technology (NIST) and on the accuracy of tail areas in statistical distributions. The goal of this paper is to provide guidance to investigators who are developing their own custom scientific software built upon numerical libraries written by others. Specifically, we evaluate the accuracy of small tail areas from cumulative distribution functions (CDF) of the chi-square and t-distributions by comparing several open-source, free, or commercially licensed numerical libraries in Java, C, and R to widely accepted standards of comparison like ELV and DCDFLIB. In our evaluation, the C libraries and R functions are consistently accurate up to six significant digits. Among the evaluated Java libraries, Colt is the most accurate. These languages and libraries are popular choices among programmers developing scientific software, so the results herein can be useful to programmers choosing libraries for CDF accuracy.
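To make the accuracy concern concrete, the following is a minimal Python sketch of why small tail areas are numerically delicate. It is not the evaluation procedure used in the paper, and it does not call ELV, DCDFLIB, Colt, or any of the libraries the paper compares. It assumes only the standard identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)). The function names and the digits-of-agreement measure are hypothetical, chosen for illustration.

```python
import math


def chi2_sf_1df_naive(x: float) -> float:
    """Upper tail of chi-square (1 df) computed as 1 - CDF.

    Illustrative only: once the CDF rounds to 1.0 in double precision,
    the subtraction returns 0.0 and every significant digit is lost.
    """
    return 1.0 - math.erf(math.sqrt(x / 2.0))


def chi2_sf_1df_direct(x: float) -> float:
    """Upper tail of chi-square (1 df) computed directly with erfc,
    which avoids the subtraction from 1."""
    return math.erfc(math.sqrt(x / 2.0))


def log_relative_error(computed: float, reference: float) -> float:
    """Rough count of agreeing significant digits (hypothetical helper).

    Assumes reference != 0. Returns infinity on exact agreement.
    """
    if computed == reference:
        return math.inf
    return -math.log10(abs(computed - reference) / abs(reference))


if __name__ == "__main__":
    # Assumption: the direct erfc value is treated as the reference here.
    for x in (10.0, 30.0, 60.0, 100.0):
        naive = chi2_sf_1df_naive(x)
        direct = chi2_sf_1df_direct(x)
        digits = log_relative_error(naive, direct)
        print(f"x={x:6.1f}  naive={naive:.6e}  direct={direct:.6e}  "
              f"agreeing digits ~ {digits:.1f}")
```

Running the loop shows the naive form losing agreement with the direct form as x grows, which is the kind of degradation a library comparison at genomic significance levels would need to detect.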


Year:  2009        PMID: 20161126      PMCID: PMC2742983          DOI: 10.1016/j.csda.2008.11.028

Source DB:  PubMed          Journal:  Comput Stat Data Anal        ISSN: 0167-9473            Impact factor:   1.681


