Yi-Hui Zhou, Gregory Mayhew, Zhibin Sun, Xiaolin Xu, Fei Zou, Fred A Wright.
Abstract
The Mantel and Knox space-time clustering statistics are popular tools to establish transmissibility of a disease and detect outbreaks. The most commonly used null distributional approximations may provide poor fits, and researchers often resort to direct sampling from the permutation distribution. However, the exact first four moments for these statistics are available, and Pearson distributional approximations are often effective. Thus, our first goal is to clarify the literature and to make these tools more widely available. In addition, by rewriting terms in the statistics, we obtain the exact first four permutation moments for the most commonly used quadratic form statistics, which need not be positive definite. The extension of this work to quadratic forms greatly expands the utility of density approximations for these problems, including for high-dimensional applications, where the statistics must be extreme in order to exceed stringent testing thresholds. We demonstrate the methods using examples from the investigation of disease transmission in cattle, the association of a gene expression pathway with breast cancer survival, regional genetic association with cystic fibrosis lung disease, and hypothesis testing for smoothed local linear regression.
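As a rough illustration of the "direct sampling from the permutation distribution" mentioned above, the following sketch computes a Mantel-type statistic (sum over case pairs of spatial distance times temporal distance) and a Knox-type statistic (count of pairs close in both space and time), then estimates an upper-tail p-value by randomly relabeling the cases. It is not the authors' code and does not implement the moment-based Pearson approximation the abstract proposes; the function names, the thresholds, and the choice of an upper-tail test are assumptions made for the example.

```python
import math
import random


def distance_matrices(points, times):
    """Return (spatial, temporal) pairwise distance matrices as nested lists."""
    n = len(points)
    d_space = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    d_time = [[abs(times[i] - times[j]) for j in range(n)] for i in range(n)]
    return d_space, d_time


def knox_indicators(d_space, d_time, space_cut, time_cut):
    """Threshold both distance matrices into 0/1 closeness indicators.

    space_cut and time_cut are hypothetical user-chosen thresholds.
    """
    a = [[1 if v <= space_cut else 0 for v in row] for row in d_space]
    b = [[1 if v <= time_cut else 0 for v in row] for row in d_time]
    return a, b


def pair_sum(a, b, perm=None):
    """Sum of a[i][j] * b[perm[i]][perm[j]] over all pairs i < j."""
    n = len(a)
    if perm is None:
        perm = list(range(n))
    return sum(
        a[i][j] * b[perm[i]][perm[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )


def permutation_pvalue(a, b, n_perm=999, seed=0):
    """Monte Carlo upper-tail p-value obtained by relabeling the cases in b."""
    rng = random.Random(seed)
    observed = pair_sum(a, b)
    perm = list(range(len(a)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if pair_sum(a, b, perm) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Passing the raw distance matrices to `permutation_pvalue` gives a Mantel-type test, and passing the output of `knox_indicators` gives a Knox-type test. The smallest p-value this sampler can report is 1 / (n_perm + 1), which is why stringent thresholds make brute-force sampling expensive and motivate density approximations.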
Keywords: Exact testing; Resampling; Statistical Computing
Year: 2013 PMID: 25210205 PMCID: PMC4157666 DOI: 10.1002/sta4.37
Source DB: PubMed Journal: Stat ISSN: 0038-9986
Figure 1. Performance of the proposed approach for space–time clustering analysis of the cattle data. The left panel shows a histogram of S_Mantel and a q–q plot of observed approximating p-values versus expected for 10^6 permutations. The right panel shows the analogous results for S_Knox for 10^6 permutations, along with density fits based on the Barton–David and Poisson approximations, as well as our proposed density fit. The inset shows the true permutation p-values for all possible outcomes, compared to that of the approximation.
Figure 2. Example 2. Results for S_self (left panel) and S_compet (right panel) for the Miller breast cancer data, pathway GO:0000184 (n = 236, 44 genes in pathway).
Figure 3. The left panel shows −log10 p-values for S_assoc1 and S_assoc2 for the CF dataset. Each p-value is computed for a moving window of ±10 SNPs around the center SNP. The two q–q plots for a fixed interval show that the proposed approximating p-values are approximately uniform under 10^6 permutations.
Figure 4. The application of the quadratic form approximation to the test statistics for local linear regression. Left panel: fitted curve and no-effect reference band. The triangles denote the fitted values for observed depth, obtained from the smoothing matrix as My. Middle panel: significance trace showing permutation p-values (dots) and the proposed approximation (line) as a function of h. Right panel: q–q plot for approximating p-values under permutation for h = 5.