Space–Time Clustering and the Permutation Moments of Quadratic Forms.

Yi-Hui Zhou¹, Gregory Mayhew², Zhibin Sun³, Xiaolin Xu⁴, Fei Zou², Fred A Wright¹.

Abstract

The Mantel and Knox space-time clustering statistics are popular tools to establish transmissibility of a disease and detect outbreaks. The most commonly used null distributional approximations may provide poor fits, and researchers often resort to direct sampling from the permutation distribution. However, the exact first four moments for these statistics are available, and Pearson distributional approximations are often effective. Thus, our first goal is to clarify the literature and to make these tools more widely available. In addition, by rewriting terms in the statistics we obtain the exact first four permutation moments for the most commonly used quadratic form statistics, which need not be positive definite. The extension of this work to quadratic forms greatly expands the utility of density approximations for these problems, including for high-dimensional applications, where the statistics must be extreme in order to exceed stringent testing thresholds. We demonstrate the methods using examples from the investigation of disease transmission in cattle, the association of a gene expression pathway with breast cancer survival, regional genetic association with cystic fibrosis lung disease, and hypothesis testing for smoothed local linear regression.

Keywords: Exact testing; Resampling; Statistical Computing

Year: 2013 PMID: 25210205 PMCID: PMC4157666 DOI: 10.1002/sta4.37

Source DB: PubMed Journal: Stat ISSN: 0038-9986

1. Introduction

Mantel (1967) proposed an approach to detect space–time clustering of events, that is, association between the locations of events and their times of occurrence, by regressing a function of geographic distance on a function of distance in time. The prototypical application is to evaluate the evidence for communicable disease transmission, in contrast to sporadic occurrences that show no clustering. The approach has proven hugely popular, with more than 5200 citations in the Science Citation Index as of 2013 and approximately 450 citations in each recent year. Briefly, we let $l_i$ and $t_i$ represent the geographic location (space) and time of occurrence for the ith location–time sample, i = 1, …, n. For samples i and j, we denote measures of location and time distances as $c_{ij} = f(l_i, l_j)$ and $d_{ij} = g(t_i, t_j)$, and these elements populate the matrices C and D, respectively. The final “regression” statistic is $S = \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij} d_{ij}$ (1), for which high values are evidence of location–time clustering, and Mantel considered the power of various choices of f and g. He also noted that the Knox statistic, which records whether two locations or time points are closer than predefined thresholds, is a special case. In addition, the paper solved for the mean and variance of S under permutation of the sample labels for the location and time points. This permutation is equivalent to simultaneous permutation of the rows and columns of one of the matrices C or D. Much of Mantel's (1967) and subsequent work is concerned with finding powerful choices of f and g; here, we assume that the statistic has been chosen, and our goal is to provide accurate testing. For many datasets, a normal approximation to S is inadequate because of strong dependencies among the matrix elements. For the Knox statistic, p-values based on a Poisson approximation (Knox, 1964) or a normal approximation (David & Barton, 1966) have been used.
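The following sketch uses Python rather than the authors' R code. It computes the Mantel statistic $S = \sum_i\sum_j c_{ij} d_{ij}$ from hypothetical toy locations and times and builds Knox indicator matrices with made-up thresholds. It then checks the permutation mean of S against brute-force enumeration of all n! relabelings. The closed-form mean relies on one fact: for i ≠ j, the pair (Π[i], Π[j]) is uniform over the n(n − 1) ordered pairs of distinct indices. All function names, data, and thresholds are illustrative assumptions, not part of any published package.

```python
import itertools
import math


def mantel_statistic(C, D):
    """S = sum_i sum_j c_ij * d_ij over all ordered pairs (i, j)."""
    n = len(C)
    return sum(C[i][j] * D[i][j] for i in range(n) for j in range(n))


def permuted_statistic(C, D, perm):
    """S_pi: rows and columns of D permuted simultaneously by perm."""
    n = len(C)
    return sum(C[i][j] * D[perm[i]][perm[j]] for i in range(n) for j in range(n))


def exact_permutation_mean(C, D):
    """Mean of S_Pi for symmetric C and D with zero diagonals.

    For i != j, (Pi[i], Pi[j]) is uniform over the n(n - 1) ordered pairs of
    distinct indices, so E[d_{Pi[i]Pi[j]}] is the average off-diagonal d.
    """
    n = len(C)
    sum_c = sum(C[i][j] for i in range(n) for j in range(n) if i != j)
    sum_d = sum(D[i][j] for i in range(n) for j in range(n) if i != j)
    return sum_c * sum_d / (n * (n - 1))


def knox_matrices(locs, times, space_cut, time_cut):
    """Knox indicators; diagonals are forced to zero (c_ii = d_ii = 0)."""
    n = len(locs)
    C = [[int(i != j and math.dist(locs[i], locs[j]) < space_cut)
          for j in range(n)] for i in range(n)]
    D = [[int(i != j and abs(times[i] - times[j]) < time_cut)
          for j in range(n)] for i in range(n)]
    return C, D


# Hypothetical toy data: 2-D coordinates (km) and outbreak days.
locs = [(0.0, 0.0), (1.0, 0.5), (4.0, 4.0), (4.5, 3.5), (9.0, 1.0)]
times = [0, 10, 40, 45, 100]
C, D = knox_matrices(locs, times, space_cut=2.0, time_cut=15)

s_knox = mantel_statistic(C, D)  # each close pair counted as (i, j) and (j, i)
perms = list(itertools.permutations(range(len(locs))))
mean_enum = sum(permuted_statistic(C, D, p) for p in perms) / len(perms)
p_value = sum(permuted_statistic(C, D, p) >= s_knox for p in perms) / len(perms)
print(s_knox, mean_enum, exact_permutation_mean(C, D), p_value)
```

Brute-force enumeration is only practical for very small n, which is exactly why closed-form moments are valuable.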
Several papers proposed improvements to the Mantel and Knox tests for space–time interaction that do not require such strong assumptions on the spatial and temporal scales of clustering (Kulldorff & Hjalmars, 1999; Diggle et al., 1995; Jacquez, 1996; Baker, 1996). In general, however, direct sampling from the permutation distribution has often been thought necessary, because enumerating all n! outcomes is infeasible for most datasets. An alternative is to use moment-based density approximations, but for these the skewness and kurtosis are important for tail accuracy. Siemiatycki (1978) provided the first four moments of S under permutation for the most commonly encountered situation, in which C and D are symmetric with zero diagonals. Siemiatycki described graphical patterns to aid in computing expectations of product terms; for example, in $c_{i_1 j_1} c_{i_2 j_2} c_{i_3 j_3} c_{i_4 j_4}$ there are 23 distinct patterns of equality and inequality among the eight subscripts. The moments of S were expressed as linear combinations of products of terms of varying order from C and D, and the terms for the fourth moment involve nearly 150 non-zero coefficients. Although the bookkeeping is tedious, these operations reduce the computational complexity from a naive $O(n^8)$ to $O(n^3)$. With this reduction, density approximations become feasible for computing p-values, with reasonable accuracy even for stringent testing thresholds. The space–time clustering statistic can easily be seen to resemble a quadratic form $y^\top A y$, where y is an n × 1 vector with elements $y_i$ and A is a symmetric n × n matrix with elements $a_{ij}$. This can be seen by rewriting $y^\top A y = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} y_i y_j$, which is similar to (1), with $a_{ij}$ and $y_i y_j$ serving the roles of $c_{ij}$ and $d_{ij}$. A key difference, however, lies in the diagonals: $a_{ii}$ and $y_i^2$ are not generally zero.
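The role of the diagonals can be checked numerically. The hypothetical Python sketch below uses made-up A and y. It splits $y^\top A y$ into its diagonal part $\sum_i a_{ii} y_i^2$ and its Mantel-like off-diagonal part. It then compares the permutation mean obtained by enumeration with the closed form $E[S_\Pi] = \mathrm{tr}(A)\sum_k y_k^2/n + (\sum_{i \ne j} a_{ij})(\sum_{k \ne l} y_k y_l)/\{n(n-1)\}$. That formula follows from $E[y_{\Pi[i]}^2] = \sum_k y_k^2/n$ and, for i ≠ j, $E[y_{\Pi[i]} y_{\Pi[j]}] = \sum_{k \ne l} y_k y_l/\{n(n-1)\}$.

```python
import itertools


def quad_form(A, y):
    """S = y^T A y = sum_i sum_j a_ij y_i y_j."""
    n = len(y)
    return sum(A[i][j] * y[i] * y[j] for i in range(n) for j in range(n))


def diagonal_and_offdiagonal(A, y):
    """Split S into sum_i a_ii y_i^2 and the Mantel-like sum_{i != j} a_ij y_i y_j."""
    n = len(y)
    diag = sum(A[i][i] * y[i] ** 2 for i in range(n))
    off = sum(A[i][j] * y[i] * y[j] for i in range(n) for j in range(n) if i != j)
    return diag, off


def exact_permutation_mean_quad(A, y):
    """E[S_Pi] under a uniform random permutation of y (diagonals included)."""
    n = len(y)
    trace_a = sum(A[i][i] for i in range(n))
    off_a = sum(A[i][j] for i in range(n) for j in range(n) if i != j)
    sum_y2 = sum(v * v for v in y)
    off_yy = sum(y) ** 2 - sum_y2  # equals sum_{k != l} y_k y_l
    return trace_a * sum_y2 / n + off_a * off_yy / (n * (n - 1))


# Hypothetical symmetric A with a non-zero diagonal, and data y.
A = [[2.0, -1.0, 0.5, 0.0],
     [-1.0, 1.0, 0.0, 0.3],
     [0.5, 0.0, 3.0, -0.7],
     [0.0, 0.3, -0.7, 0.5]]
y = [1.0, -2.0, 0.5, 3.0]
diag, off = diagonal_and_offdiagonal(A, y)
perms = list(itertools.permutations(range(len(y))))
mean_enum = sum(quad_form(A, [y[k] for k in p]) for p in perms) / len(perms)
print(diag, off, mean_enum, exact_permutation_mean_quad(A, y))
```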
Quadratic forms have been used for location–time clustering (Tango, 1984), but we are not aware that a direct equivalence has been described between the Mantel statistic and a quadratic form under permutation. For quadratic forms under permutation, to our knowledge, only the first two exact moments have been reported (Commenges, 2003). Quadratic forms arise in a number of disciplines, including epidemiology, genomics, and economics. Computing exact moments enables robust analysis while avoiding the additional computational cost of direct permutation. Despite the popularity of the Knox–Mantel and related location–time clustering statistics, software has not been available to compute the four moments or to obtain approximate p-values from them, even though a number of packages are devoted to location–time surveillance (Robertson & Nelson, 2010). Similarly, quadratic forms are increasingly used, for example, in genomics problems (Tong et al., 2010). However, standard results for normal quadratic forms may not apply, for example, to binary disease traits. The application of quadratic forms to non-normal data is often justified by appeal to asymptotics (Wu et al.), but exact methods may be preferable. We have developed R code to compute the first four exact moments for the location–time statistic and for centered quadratic forms, and to approximate the exact permutation p-values using Pearson density approximations. We believe that the software and methods are useful additions to the statistician's toolkit.

2. Methods

2.1. The location–time statistic S

For symmetric C and D with zero diagonals, we have implemented the Siemiatycki moment computation. The permutation approach involves simultaneous permutation of the rows and columns of one of the matrices (say D), which is equivalent to permuting the location versus time observations (Mantel, 1967). We use π = 1, …, n! as a subscript to represent a permutation of the n objects, with reordered indexes π[1], …, π[n]. A random permutation is denoted Π, and our task is to compute the first four moments of $S_\Pi = \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij} d_{\Pi[i]\Pi[j]}$. The key computations are shown in the Appendix, expressed in matrix form to exploit linear algebra routines in R. Approximate p-values are obtained by matching the exact moments to the Pearson family of distributions using the R package PearsonDS, which automatically chooses the best-fitting type within the Pearson family.
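For small n, the exact permutation moments can also be obtained by brute force, which gives a useful check on any implementation of the Siemiatycki formulas. The sketch below is an assumed Python helper, not the authors' R code. It enumerates all n! permutations of D to obtain the exact mean, variance, skewness $\sqrt{\beta_1}$ and kurtosis $\beta_2$ of $S_\Pi$, the quantities matched to the Pearson family, and compares them with a Monte Carlo estimate. The Pearson fit itself is not reproduced. The toy locations and times are hypothetical.

```python
import itertools
import math
import random


def permuted_statistic(C, D, perm):
    """S_pi = sum_i sum_j c_ij * d_{pi[i] pi[j]}."""
    n = len(C)
    return sum(C[i][j] * D[perm[i]][perm[j]] for i in range(n) for j in range(n))


def four_moments(values):
    """Mean, variance, skewness sqrt(beta1) and kurtosis beta2 of equally
    weighted values (population formulas; assumes non-zero variance)."""
    m = len(values)
    mean = math.fsum(values) / m
    dev = [v - mean for v in values]
    var = math.fsum(d * d for d in dev) / m
    m3 = math.fsum(d ** 3 for d in dev) / m
    m4 = math.fsum(d ** 4 for d in dev) / m
    return mean, var, m3 / var ** 1.5, m4 / var ** 2


def exact_moments(C, D):
    """Exact permutation moments by enumerating all n! outcomes (small n only)."""
    n = len(C)
    return four_moments([permuted_statistic(C, D, p)
                         for p in itertools.permutations(range(n))])


def sampled_moments(C, D, n_perm, seed=0):
    """Monte Carlo estimate of the same moments from random permutations."""
    rng = random.Random(seed)
    perm = list(range(len(C)))
    values = []
    for _ in range(n_perm):
        rng.shuffle(perm)
        values.append(permuted_statistic(C, D, perm))
    return four_moments(values)


# Hypothetical 1-D locations and times; C and D are absolute differences.
loc = [0, 1, 3, 6, 10, 15]
day = [2, 0, 7, 5, 11, 9]
C = [[abs(a - b) for b in loc] for a in loc]
D = [[abs(a - b) for b in day] for a in day]
exact = exact_moments(C, D)
approx = sampled_moments(C, D, n_perm=20000)
print("exact:", exact)
print("sampled:", approx)
```

Any valid set of moments satisfies $\beta_2 \ge \beta_1 + 1$, which gives a cheap sanity check before a Pearson type is selected.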

2.2. Equivalence of the quadratic form statistic S

Here, the statistic is $S = y^\top A y$ for symmetric A, with corresponding permutation random variable $S_\Pi = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} y_{\Pi[i]} y_{\Pi[j]}$. In many useful applications, A is centered, that is, its rows and columns each sum to a constant μ. We assume μ = 0, essentially without loss of generality: replacing A by A − μI gives zero row and column sums and changes $S_\Pi$ only by the constant $\mu y^\top y$, which does not depend on Π. Standard normal-theory results typically assume that A is positive definite, and the assumption is necessary for standard χ² distributional approximations. However, relaxing this assumption considerably increases the variety of problems for which accurate p-values can be obtained. For example, in a genomic context, Zhou & Wright (2013) provided motivation for useful quadratic forms with eigenvalues summing to zero. Kuonen (1999) summarized a number of previous studies of quadratic form approximations, including those that are not positive definite, and described saddlepoint approximations applicable only to normally distributed y. The moments computed by Siemiatycki were considerably simplified by assuming zero diagonals for C and D. Here, we describe a simple construction that maps the quadratic form to the Mantel statistic. First, we define C = A − diag(A), that is, $c_{ij} = a_{ij}$ for i ≠ j and $c_{ii} = 0$. Then we define D as the matrix with entries $d_{ij} = y_i y_j - (y_i^2 + y_j^2)/2 = -(y_i - y_j)^2/2$, so that, by this definition, each $d_{ii} = 0$. Our claim is that, for any π, $\sum_{i}\sum_{j} c_{ij} d_{\pi[i]\pi[j]} = \sum_{i}\sum_{j} a_{ij} y_{\pi[i]} y_{\pi[j]} = S_\pi$.

Proof

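By the constraint, $\sum_{j} a_{ij} = 0$ for every i, and therefore, for any fixed π, we have $\sum_{i}\sum_{j} a_{ij} y_{\pi[i]}^2 = \sum_{i} y_{\pi[i]}^2 \sum_{j} a_{ij} = 0$; by the same reasoning applied to the column sums, $\sum_{i}\sum_{j} a_{ij} y_{\pi[j]}^2 = 0$. We have $\sum_{i}\sum_{j} c_{ij} d_{\pi[i]\pi[j]} = \sum_{i}\sum_{j} a_{ij} d_{\pi[i]\pi[j]}$ because each $d_{ii} = 0$, so the diagonal terms $a_{ii} d_{\pi[i]\pi[i]}$ vanish. Expanding the right-hand side gives $\sum_{i}\sum_{j} a_{ij} y_{\pi[i]} y_{\pi[j]} - \frac{1}{2}\sum_{i}\sum_{j} a_{ij} y_{\pi[i]}^2 - \frac{1}{2}\sum_{i}\sum_{j} a_{ij} y_{\pi[j]}^2$, for which the last two terms are zero. Thus, $\sum_{i}\sum_{j} c_{ij} d_{\pi[i]\pi[j]} = S_\pi$.

As with the location–time statistic, we use Pearson family approximations to compute p-values. Because of the row/column constraint, several moment terms can be further simplified to lower order $O(n^2)$ (Appendix), which may be useful in applications with very large n.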
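The construction and the proof can be checked numerically on a small example. In this hypothetical Python sketch, an arbitrary symmetric B is double-centered to give A with zero row and column sums. C and D are then formed as $C = A - \mathrm{diag}(A)$ and $d_{ij} = -(y_i - y_j)^2/2$, and the two sums are compared for all 5! permutations. A second check confirms that adding μI to A shifts every $S_\pi$ by $\mu y^\top y$. The matrix and vector values are illustrative only.

```python
import itertools


def double_center(B):
    """A = H B H with H = I - 11^T/n: symmetric B gives symmetric A whose
    rows and columns sum to zero."""
    n = len(B)
    row = [sum(B[i]) / n for i in range(n)]
    col = [sum(B[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[B[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def quadratic_to_mantel(A, y):
    """C = A - diag(A); d_ij = y_i y_j - (y_i^2 + y_j^2)/2 = -(y_i - y_j)^2 / 2."""
    n = len(y)
    C = [[A[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    D = [[-0.5 * (y[i] - y[j]) ** 2 for j in range(n)] for i in range(n)]
    return C, D


def quad_perm(A, y, perm):
    """S_pi = sum_i sum_j a_ij y_{pi[i]} y_{pi[j]}."""
    n = len(y)
    return sum(A[i][j] * y[perm[i]] * y[perm[j]] for i in range(n) for j in range(n))


def mantel_perm(C, D, perm):
    """sum_i sum_j c_ij d_{pi[i] pi[j]}."""
    n = len(C)
    return sum(C[i][j] * D[perm[i]][perm[j]] for i in range(n) for j in range(n))


# Arbitrary symmetric matrix and data vector (hypothetical values).
B = [[4.0, 1.0, -2.0, 0.5, 3.0],
     [1.0, 2.0, 0.0, -1.5, 1.0],
     [-2.0, 0.0, 5.0, 2.0, -1.0],
     [0.5, -1.5, 2.0, 1.0, 0.0],
     [3.0, 1.0, -1.0, 0.0, 3.5]]
y = [0.3, -1.2, 2.0, 0.7, -0.4]
A = double_center(B)
C, D = quadratic_to_mantel(A, y)
perms = list(itertools.permutations(range(len(y))))
max_gap = max(abs(quad_perm(A, y, p) - mantel_perm(C, D, p)) for p in perms)

# Offset check: A + mu*I has row sums mu and shifts every S_pi by mu * y^T y.
mu = 2.5
n = len(y)
A_mu = [[A[i][j] + (mu if i == j else 0.0) for j in range(n)] for i in range(n)]
yty = sum(v * v for v in y)
max_offset_gap = max(abs(quad_perm(A_mu, y, p) - quad_perm(A, y, p) - mu * yty)
                     for p in perms)
print(max_gap, max_offset_gap)
```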

2.3. Permutation versus normal quadratic forms

Our motivation here is to approximate exact inference, and our procedures require only the assumption that the observed y are exchangeable, applying equally well to discrete or continuous data. For normal quadratic forms, in which the elements of y are drawn iid from a normal density, the null distribution is that of a weighted sum of independent $\chi^2_1$ random variables, with weights proportional to the eigenvalues of A. This distribution may be computed using the method of Imhof (1961) or the saddlepoint approximation of Kuonen (1999), for example, as implemented in the survey package in R. A common technique in genomics and other disciplines is to perform robust analysis by transforming data to discrete normal scores using rank-based inverse normal transformations. For example, if $r(y_i)$ is the rank of the ith observation, the transformed value is of the form $\Phi^{-1}\{(r(y_i) - k)/(n - 2k + 1)\}$, where Φ is the standard normal distribution function and k is a small offset constant. The use of normal scores in genetics was discussed and extensively critiqued by Beasley et al. (2009). An underlying theme in the application of normal scores appears to be a presumption that permutation of the scores is nearly equivalent to unconditional normal random sampling. For individual association tests, this assumption may be reasonable. For example, the permutation variance of the Pearson correlation coefficient between fixed vectors x and y is 1/(n − 1), which is identical to its variance if y is drawn iid normal. However, permutation of y inherently creates negative correlation among the sampled elements. This dependence is slight for individual elements of y and decreases with n, but it remains highly consequential for S because there are $n^2$ correlation terms among the elements. This effect of without-replacement sampling is especially strong if the eigenvalues of A do not contain a few dominant values (Zhou & Wright, 2013). The permutation dependency phenomenon is illustrated in four panels in Supplementary Figure 1. For each panel, a single initial m × n matrix X was generated with elements drawn iid N(0,1) and row-centered, with m ∈ {10, 1000} and n ∈ {50, 500}. We then let $A = X^\top X$ and compared the distribution of the unconditional normal quadratic form with that under permutation of normal scores. The figure illustrates that the variability under permutation is markedly less than under unconditional sampling, except when n ≫ m. Thus, even if an investigator transforms y to normal scores, the normal quadratic form null distribution cannot be used for permutation testing, and the methods described here remain relevant.
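The variance reduction under permutation can be reproduced in a small simulation. This is a hypothetical pure-Python sketch with toy sizes (m = 200, n = 20), not the settings of Supplementary Figure 1. It forms $A = X^\top X$ from a row-centered Gaussian X. It uses van der Waerden scores $\Phi^{-1}\{r/(n+1)\}$ rescaled to a sum of squares of n, which is an assumed choice of normal scores. It then compares the variance of $y^\top A y$ under random permutation of the scores with that under iid N(0, 1) sampling of y.

```python
import random
import statistics
from statistics import NormalDist


def centered_gram(m, n, rng):
    """A = X^T X (n x n) for an m x n matrix X with iid N(0, 1) entries,
    each row centered to mean zero (so A has zero row and column sums)."""
    X = []
    for _ in range(m):
        row = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mu = sum(row) / n
        X.append([v - mu for v in row])
    return [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]


def quad_form(A, y):
    n = len(y)
    return sum(A[i][j] * y[i] * y[j] for i in range(n) for j in range(n))


def normal_scores(n):
    """van der Waerden scores Phi^{-1}(r / (n + 1)), rescaled so that
    sum(z^2) = n (assumed choice, to match E[sum y^2] under N(0, 1))."""
    z = [NormalDist().inv_cdf(r / (n + 1)) for r in range(1, n + 1)]
    scale = (n / sum(v * v for v in z)) ** 0.5
    return [v * scale for v in z]


def compare_variances(m, n, reps, seed=1):
    """Variance of y^T A y under permutation of normal scores vs iid N(0, 1) y."""
    rng = random.Random(seed)
    A = centered_gram(m, n, rng)
    scores = normal_scores(n)
    perm_vals, norm_vals = [], []
    for _ in range(reps):
        y = scores[:]
        rng.shuffle(y)
        perm_vals.append(quad_form(A, y))
        norm_vals.append(quad_form(A, [rng.gauss(0.0, 1.0) for _ in range(n)]))
    return statistics.variance(perm_vals), statistics.variance(norm_vals)


var_perm, var_norm = compare_variances(m=200, n=20, reps=1000)
print(f"permutation variance {var_perm:.1f}, normal variance {var_norm:.1f}")
```

With m much larger than n, the non-zero eigenvalues of A are of broadly similar size, so the permutation variance should be a small fraction of the normal-theory variance $2\,\mathrm{tr}(A^2)$.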

Figure 1

Performance of the proposed approach for space–time clustering analysis of the cattle data. The left panel shows a histogram of $S_{\mathrm{Mantel}}$ and a q–q plot of observed approximating p-values versus expected values for $10^6$ permutations. The right panel shows the analogous results for $S_{\mathrm{Knox}}$ for $10^6$ permutations, along with density fits based on the Barton–David and Poisson approximations, as well as our proposed density fit. The inset compares the true permutation p-values for all possible outcomes with the approximating p-values.

2.4. Example datasets

We illustrate our methods with four published examples, and for each of the first three examples we use two different S statistics. The statistics are those proposed by the original authors or are otherwise well motivated within the context of the problem. For each example and choice of statistic, the analyst need only specify C and D, or y and A, as appropriate to the problem. These examples are useful not only for the observed statistics and p-values but also for assessing the adequacy of the fit over the entire permutation distribution, and thus they illustrate the performance of our approximation in a variety of settings.

Example 1

In White et al. (1989), space–time clustering was used to investigate the evidence for transmissibility of dysentery in cattle for 37 outbreaks on farms in rural New York. Both the Mantel and Knox statistics were used, which we denote $S_{\mathrm{Mantel}}$ and $S_{\mathrm{Knox}}$. Following the authors' implementation of the Mantel statistic, for f we calculated the straight-line distance in kilometres between locations, and for g we used the unsigned difference in days between outbreaks. The resulting matrices C and D were then used to calculate $S_{\mathrm{Mantel}}$. The Knox statistic is the number of outbreak pairs that are close in both space and time. Thresholds for defining closeness are required, and we used the thresholds of 5.5 km and 30 days chosen by White et al. (1989). In other words, for the Knox statistic, $c_{ij} = 1$ if $f(l_i, l_j) < 5.5$ km and 0 otherwise. Similarly, $d_{ij}$ is an indicator that $g(t_i, t_j) < 30$ days, and $c_{ii} = d_{ii} = 0$. The resulting matrices C and D were then used to calculate $S_{\mathrm{Knox}}$. This is twice the statistic proposed by Kulldorff & Hjalmars (1999), because the double sum over all i and j counts each unordered pair twice. Although our moment calculations are exact, for an observed statistic s, density approximations to p-values tend to be closer to the mid p-value, $P(S > s) + \frac{1}{2}P(S = s)$, than to the p-value $P(S \ge s)$. For most of the examples in this paper, the difference between the two is trivial and need not be considered. In this example, however, $S_{\mathrm{Knox}}$ can assume only the 25 even values 0, 2, …, 48. We therefore apply a continuity correction by evaluating the Pearson density approximation at s − 1 instead of s.
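The distinction between the p-value, the mid p-value, and the continuity-corrected approximation can be made concrete with a tiny Knox-type example. In the hypothetical sketch below, the close pairs in space and time are invented, and the exact null distribution is enumerated over all 7! permutations. A normal density with the exact mean and variance stands in for the Pearson fit, which is an assumed simplification rather than the method used in the paper. Because every outcome is even, evaluating the approximation at s − 1 places the cut halfway between s − 2 and s.

```python
import itertools
import statistics
from statistics import NormalDist


def indicator_matrix(n, close_pairs):
    """Symmetric 0/1 matrix with zero diagonal marking the given pairs."""
    M = [[0] * n for _ in range(n)]
    for i, j in close_pairs:
        M[i][j] = M[j][i] = 1
    return M


def null_values(C, D):
    """Exact permutation distribution of S (all n! relabelings of D)."""
    n = len(C)
    return [sum(C[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))
            for p in itertools.permutations(range(n))]


def exact_and_mid_p(values, s):
    """Exact P(S >= s) and mid p-value P(S > s) + P(S = s) / 2."""
    m = len(values)
    p_ge = sum(v >= s for v in values) / m
    p_eq = sum(v == s for v in values) / m
    return p_ge, p_ge - 0.5 * p_eq


def normal_tail(values, x):
    """Stand-in density approximation: normal with the exact mean and variance."""
    return 1.0 - NormalDist(statistics.fmean(values), statistics.pstdev(values)).cdf(x)


# Hypothetical Knox-type example: invented close pairs in space and in time.
n = 7
C = indicator_matrix(n, [(0, 1), (2, 3), (4, 5), (1, 2)])
D = indicator_matrix(n, [(0, 1), (2, 3), (5, 6)])
values = null_values(C, D)
s_obs = sum(C[i][j] * D[i][j] for i in range(n) for j in range(n))
p_exact, p_mid = exact_and_mid_p(values, s_obs)
approx_plain = normal_tail(values, s_obs)   # tends to track the mid p-value
approx_cc = normal_tail(values, s_obs - 1)  # continuity-corrected at s - 1
print(s_obs, p_exact, p_mid, approx_plain, approx_cc)
```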

Example 2

For pathway analysis of gene expression data, the data are typically divided into two matrices. $X_{\mathrm{path}}$ is the $m_{\mathrm{path}} \times n$ matrix of expression values for the $m_{\mathrm{path}}$ genes belonging to a pathway, and $X_{\mathrm{comp}}$ is the remaining $m_{\mathrm{comp}} \times n$ complementary matrix of genes not in the pathway. We assume that both matrices are row centered and scaled. Gene expression is then compared with a clinical or experimental outcome y in one of two ways. Self-contained testing examines the association of y with genes within the pathway only; competitive testing contrasts the association with genes in the pathway against that with genes in the complement. Zhou & Wright (2013) proposed the corresponding quadratic form statistics $S_{\mathrm{self}} = y^\top X_{\mathrm{path}}^\top X_{\mathrm{path}} y$ and $S_{\mathrm{compet}} = y^\top (X_{\mathrm{path}}^\top X_{\mathrm{path}}/m_{\mathrm{path}} - X_{\mathrm{comp}}^\top X_{\mathrm{comp}}/m_{\mathrm{comp}}) y$. They obtained p-values for these statistics using a weighted beta density approximation. However, for that approximation, only the first two moments are exact. $S_{\mathrm{compet}}$ has eigenvalues summing to zero, and for some datasets it can have a negative skew, making χ² density approximations ineffective. We use the breast cancer data of Miller et al. (2005). For these data, Zhou & Wright (2013) used the pathway GO:0000184, “nuclear-transcribed mRNA catabolic process” (44 genes, n = 236 samples), as an example of tests of association with survival. Here, y is the vector of martingale residuals for survival time, X is the gene expression data, and both have been residualized for p53 mutation status.
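A minimal sketch of how the two pathway matrices might be formed is given below, using simulated expression values rather than the Miller data. It builds $A_{\mathrm{self}} = X_{\mathrm{path}}^\top X_{\mathrm{path}}$ and $A_{\mathrm{compet}} = X_{\mathrm{path}}^\top X_{\mathrm{path}}/m_{\mathrm{path}} - X_{\mathrm{comp}}^\top X_{\mathrm{comp}}/m_{\mathrm{comp}}$. It assumes that each gene row is centered and scaled to unit sum of squares. Under that assumption, $\mathrm{tr}(X^\top X)$ equals the number of genes, so the competitive matrix has trace, and hence eigenvalue sum, equal to zero. The per-gene normalization and all data here are assumptions of the sketch.

```python
import random


def standardize_rows(X):
    """Center each gene (row) and scale it to unit sum of squares (assumed)."""
    out = []
    for row in X:
        mu = sum(row) / len(row)
        centered = [v - mu for v in row]
        norm = sum(v * v for v in centered) ** 0.5
        out.append([v / norm for v in centered])
    return out


def gram(X):
    """X^T X for a genes-by-samples matrix X (result is samples-by-samples)."""
    n = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]


def pathway_matrices(X_path, X_comp):
    """Return A_self and A_compet as defined in the lead-in."""
    G_path, G_comp = gram(X_path), gram(X_comp)
    m_path, m_comp, n = len(X_path), len(X_comp), len(G_path)
    A_compet = [[G_path[i][j] / m_path - G_comp[i][j] / m_comp for j in range(n)]
                for i in range(n)]
    return G_path, A_compet


def quad_form(A, y):
    n = len(y)
    return sum(A[i][j] * y[i] * y[j] for i in range(n) for j in range(n))


# Simulated (hypothetical) data: 5 pathway genes, 40 complement genes, 12 samples.
rng = random.Random(7)
n = 12
X_path = standardize_rows([[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)])
X_comp = standardize_rows([[rng.gauss(0, 1) for _ in range(n)] for _ in range(40)])
y = [rng.gauss(0, 1) for _ in range(n)]

A_self, A_compet = pathway_matrices(X_path, X_comp)
s_self, s_compet = quad_form(A_self, y), quad_form(A_compet, y)
trace_compet = sum(A_compet[i][i] for i in range(n))  # sum of eigenvalues
s_self_direct = sum(sum(x * v for x, v in zip(row, y)) ** 2 for row in X_path)
print(s_self, s_compet, trace_compet)
```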

Example 3

Wright et al. (2011) described a genome-wide association analysis of lung function among 1,978 cystic fibrosis (CF) patients, identifying the interval between the genes EHF and APIP on chromosome 11 as of interest. For an interval consisting of several genetic markers, we perform regional genetic analysis rather than testing individual markers. The approach compares similarity in the lung function phenotype between all pairs of individuals with a correlation-based measure of similarity in regional genotypes. The resulting statistic (which we call $S_{\mathrm{assoc1}}$) is similar in spirit to a Mantel statistic, except that the individual elements represent similarity rather than distance. Specifically, we let $y_j$ denote the phenotype for the jth individual. To simplify the subsequent description, we assume that y has been centered ($\sum_j y_j = 0$) and scaled. We use $d_{ij} = y_i y_j$ for i ≠ j and $d_{ii} = 0$, following suggestions that the product $y_i y_j$ should be powerful in tests of phenotypic versus genotypic relatedness (Elston et al., 2000). For m genetic markers in a region, with genotypes measured on the n individuals, we have an m × n genotype matrix G, which has been centered and column scaled. For i ≠ j, we use $c_{ij} = \mathrm{corr}(g_i, g_j)$, where $g_i$ is the ith column of G and “corr” is the Pearson correlation. A closely related quadratic form statistic ($S_{\mathrm{assoc2}}$) is the sum of squared score statistics across the markers. It is similar to $S_{\mathrm{assoc1}}$ but uses slightly different genotype scaling and has non-zero diagonals in the corresponding matrices. We use X to denote the m × n matrix of genotypes, which have been row centered ($\sum_j x_{ij} = 0$ for each marker i) and scaled. The score statistic for the ith marker is $U_i = \sum_j x_{ij} y_j$, and $S_{\mathrm{assoc2}} = \sum_i U_i^2$, which can be shown to equal $y^\top A y$, where $A = X^\top X$.
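The two regional statistics can be sketched on a toy genotype matrix. In the hypothetical Python example below, the genotypes and phenotypes are invented. For $S_{\mathrm{assoc1}}$, the sketch assumes that the markers (rows of G) are centered before Pearson correlations are taken between individuals' columns. For $S_{\mathrm{assoc2}}$, each row of X is centered and scaled to unit sum of squares, which is also an assumed scaling. The sketch checks that the sum of squared marker scores $\sum_i U_i^2$ equals $y^\top X^\top X y$.

```python
def center_rows(M):
    """Subtract each row's mean (markers centered across individuals)."""
    out = []
    for row in M:
        mu = sum(row) / len(row)
        out.append([v - mu for v in row])
    return out


def scale_rows(M):
    """Scale each row to unit sum of squares (assumed scaling)."""
    out = []
    for row in M:
        norm = sum(v * v for v in row) ** 0.5
        out.append([v / norm for v in row])
    return out


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [v - ma for v in a], [v - mb for v in b]
    num = sum(x * z for x, z in zip(da, db))
    return num / (sum(x * x for x in da) * sum(z * z for z in db)) ** 0.5


# Hypothetical genotypes: m = 4 markers (rows) by n = 5 individuals (columns).
G = [[0, 1, 2, 1, 0], [1, 1, 2, 0, 0], [0, 2, 2, 1, 1], [2, 1, 0, 1, 2]]
y_raw = [1.3, -0.2, 2.1, 0.4, -1.1]
n = len(y_raw)
y = [v - sum(y_raw) / n for v in y_raw]  # centered phenotype

# S_assoc1: Mantel-type, c_ij = corr(g_i, g_j), d_ij = y_i y_j, zero diagonals.
Gc = center_rows(G)
cols = [[row[j] for row in Gc] for j in range(n)]
C = [[pearson(cols[i], cols[j]) if i != j else 0.0 for j in range(n)]
     for i in range(n)]
s_assoc1 = sum(C[i][j] * y[i] * y[j] for i in range(n) for j in range(n) if i != j)

# S_assoc2: sum of squared marker scores U_i = sum_j x_ij y_j.
X = scale_rows(center_rows(G))
U = [sum(x * v for x, v in zip(row, y)) for row in X]
s_assoc2 = sum(u * u for u in U)
A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
s_assoc2_quad = sum(A[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
print(s_assoc1, s_assoc2, s_assoc2_quad)
```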

Example 4

Bowman & Azzalini (1997, pp. 86–90) described a dataset resulting from sampling aquatic life on a coral reef, with 42 observations of catch score, summarized as a log weight across numerous species, versus depth. The dataset has been used by these authors and others to demonstrate local linear regression using a normal smoothing kernel. A standard test statistic for local linear regression can be expressed as a quadratic form, as follows. The derivation applies to the Nadaraya–Watson estimator (Nadaraya, 1964; Watson, 1964) with kernel function w, for the regression model E(Y | x) = m(x). The fitted values can be obtained using a smoothing matrix M, which depends on the bandwidth h, such that $\hat{y} = My$. As shown by Bowman & Azzalini (1997), an F-like statistic can be obtained as the ratio $F = y^\top U y / y^\top V y$, with $U = I - \mathbf{1}\mathbf{1}^\top/n - (I - M)^\top (I - M)$ and $V = (I - M)^\top (I - M)$. The p-value is $P(F > F_{\mathrm{obs}})$, which can be rewritten as $P\{y^\top (U - F_{\mathrm{obs}} V) y > 0\}$, and so we use, finally, $A = U - F_{\mathrm{obs}} V$ in the quadratic form. Using the fact that the rows of M sum to one, it is easy to show that A is symmetric with row/column sums of zero. Bowman & Azzalini (1997) obtained p-values using moments from a normal quadratic form and a scaled chi-square density approximation, while acknowledging that the data included some non-normal features, such as truncation. They describe permutation analysis as an alternative approach but did not pursue it further. For the same normal quadratic form, Kuonen (1999) reported p-values using a saddlepoint approximation. Here, we report p-values based on direct permutation and compare them with results from our moment-based density approximation.
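The construction of A can be sketched for any linear smoother whose rows sum to one. The hypothetical example below uses a Nadaraya–Watson smoother with a normal kernel on synthetic data, not the coral reef data, and builds U, V, and $A = U - F_{\mathrm{obs}} V$. It estimates $P(F > F_{\mathrm{obs}})$ by counting random permutations with $y_\Pi^\top A y_\Pi > 0$. The function names, bandwidth, and data are illustrative assumptions.

```python
import math
import random


def nw_smoother(x, h):
    """Nadaraya-Watson smoothing matrix with a normal kernel; rows sum to one."""
    M = []
    for xi in x:
        w = [math.exp(-0.5 * ((xj - xi) / h) ** 2) for xj in x]
        total = sum(w)
        M.append([wj / total for wj in w])
    return M


def quad_form(A, y):
    n = len(y)
    return sum(A[i][j] * y[i] * y[j] for i in range(n) for j in range(n))


def f_test_matrix(x, y, h):
    """Return F_obs and A = U - F_obs V, with V = (I - M)^T (I - M) and
    U = I - 11^T / n - V."""
    n = len(x)
    M = nw_smoother(x, h)
    R = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(n)] for i in range(n)]
    V = [[sum(R[k][i] * R[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    U = [[(1.0 if i == j else 0.0) - 1.0 / n - V[i][j] for j in range(n)]
         for i in range(n)]
    f_obs = quad_form(U, y) / quad_form(V, y)
    A = [[U[i][j] - f_obs * V[i][j] for j in range(n)] for i in range(n)]
    return f_obs, A


def permutation_p_value(A, y, n_perm, seed=0):
    """Monte Carlo estimate of P(F_Pi > F_obs) = P(y_Pi^T A y_Pi > 0)."""
    rng = random.Random(seed)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        hits += quad_form(A, y_perm) > 0
    return hits / n_perm


# Synthetic data (hypothetical, not the coral reef data).
x = [float(i) for i in range(15)]
y = [math.sin(0.4 * xi) + (0.2 if i % 3 == 0 else -0.1) for i, xi in enumerate(x)]
f_obs, A = f_test_matrix(x, y, h=2.0)
p_perm = permutation_p_value(A, y, n_perm=2000)
print(f"F_obs = {f_obs:.3f}, permutation p = {p_perm:.3f}")
```

Because the observed y gives $y^\top A y = 0$ by construction, the same A can be passed to a moment-based approximation in place of the Monte Carlo loop.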

3. Results

Example 1

Figure 1 (left panel) shows a histogram of the $S_{\mathrm{Mantel}}$ statistic, overlaid with the normal density approximation. Although the normal approximation is based on exact moments, the skewness of the permutation distribution produces a poor tail fit. In contrast, the approximation from our proposed method, which uses four moments and a Pearson type IV fit, is highly accurate. An observed versus expected q–q plot of the p-values from our proposed procedure, applied to $10^6$ permutations, shows that the p-values are nearly uniform. The actual data show only marginal evidence of location–time clustering, with permutation-based p = 0.0703 and density approximation p = 0.0699. The results are similar for $S_{\mathrm{Knox}}$ (right panel). The proposed approximation (type IV) is accurate. In contrast, the two competing approximations in common use, that of David & Barton (1966) and a Poisson approximation, are noticeably less accurate. For the actual data, $S_{\mathrm{Knox}}$ has permutation-based p = 0.0681 and approximating p = 0.0696. The tail probabilities do not degrade in accuracy, as shown by a comparison of true versus approximating p-values over the entire range of possible outcomes (inset of Figure 1, right panel).

Example 2

Figure 2 shows histograms and q–q plots for $S_{\mathrm{self}}$ and $S_{\mathrm{compet}}$ for the Miller breast cancer data and pathway GO:0000184. Here again, the fits (type VI for $S_{\mathrm{self}}$ and type IV for $S_{\mathrm{compet}}$) are accurate, with slight conservativeness of the approximating p-values in the extreme right tail for $S_{\mathrm{compet}}$. For the observed data and $S_{\mathrm{self}}$, the permutation-based p = 0.080 and the Pearson distributional approximation p = 0.081. For $S_{\mathrm{compet}}$, the respective values are p = 0.822 and p = 0.817.

Figure 2

Example 2. Results for $S_{\mathrm{self}}$ (left panel) and $S_{\mathrm{compet}}$ (right panel) for the Miller breast cancer data, pathway GO:0000184 (n = 236, 44 genes in pathway).

Example 3

Figure 3 shows $-\log_{10}(p)$ for $S_{\mathrm{assoc1}}$ and $S_{\mathrm{assoc2}}$ for the CF data, where each statistic is plotted at the middle single-nucleotide polymorphism (SNP) of each 21-SNP window. q–q plots, produced for the interval showing the greatest evidence in the original data rs, again support the accuracy of the approximating p-values (type VI for all windows). The most highly significant region is in the interval between EHF and APIP, which is also supported by the single-SNP analysis. However, the evidence is much stronger for $S_{\mathrm{assoc1}}$ and $S_{\mathrm{assoc2}}$ than for the single-SNP analysis, and it is clearly significant in a genome-wide scan of ∼570,000 SNPs. We attribute the greater evidence from these statistics to the potential presence of multiple causal SNPs in the region, as proposed by Wright et al. (2011), because the moving window can capture the combined evidence from multiple SNPs. In fact, the relative genome-wide evidence may be even stronger for the regional methods. Regional statistics have higher serial correlation than single-SNP statistics and thus incur a smaller multiple-testing penalty. The use of $S_{\mathrm{assoc1}}$ and $S_{\mathrm{assoc2}}$ in this context is very similar to the sequence kernel association test (Wu et al.), which is designed for regional analysis and rare-variant testing of genetic association. However, such methods were formally designed for normal or binary phenotypes, and our use of exact moments adds considerable flexibility in handling the actual phenotype distribution.

Figure 3

The left panel shows $-\log_{10}$ p-values for $S_{\mathrm{assoc1}}$ and $S_{\mathrm{assoc2}}$ for the CF dataset. Each p-value is computed for a moving window of ±10 SNPs around the center SNP. The two q–q plots for a fixed interval show that the proposed approximating p-values are approximately uniform under $10^6$ permutations.

Example 4

Figure 4 (left panel) plots the coral reef data, along with the local linear regression fit and the reference band for the no-effect model, produced by the sm package in R for a kernel smoothing bandwidth of h = 5. The fitted values at the extremes are clearly outside the reference band for the no-effect model. A “significance trace” (middle panel) shows p-values as a function of h; the permutation-based p-values (dots) and the type IV approximation (line) are nearly indistinguishable. The permutation-based p-values are generally lower than those obtained from the normal quadratic form, which were reported for these data by Kuonen (1999). For example, for h = 5, the permutation-based p = 0.058, compared with 0.063 for the normal quadratic form.

Figure 4

The application of the quadratic form approximation to the test statistic for local linear regression. Left panel: fitted curve and no-effect reference band. The triangles denote the fitted values for observed depth, obtained from the smoothing matrix as $My$. Middle panel: significance trace showing permutation p-values (dots) and the proposed approximation (line) as a function of h. Right panel: q–q plot of approximating p-values under permutation for h = 5.

4. Discussion

For standard space–time clustering statistics, our contribution has been to provide software for the moments and approximate p-values. Quadratic forms are used in a wide variety of settings, and exact permutation moments have often been overlooked as an alternative to direct permutation. One setting where these approaches may be useful is SNP association pathway analysis. There, the effects of sets of SNPs are aggregated, and direct permutation has been considered cumbersome, leading to alternative resampling proposals (Schaid et al.). Applying our $S_{\mathrm{compet}}$ statistic to genome-wide SNP association data would enable true competitive testing for association pathway analysis. Such competitive testing had been viewed as infeasible, because a naive approach involves performing a full genome scan for each permutation. Another consideration is whether direct sampling from the permutation distribution might still be preferable, as it provides an unbiased estimate of the permutation p-value. In high-throughput settings, however, extreme thresholds may be required to declare significance, and here our approximation may be especially useful. In Zhou & Wright (2013), even adaptive permutation (performing only as many permutations as necessary for high relative accuracy) was ∼250 times slower than a moment-based analytic approximation. For the genome-scan setting of Example 3, p-values on the order of $10^{-8}$ are necessary to declare significance. Moreover, the EHF/APIP region in the CF example was initially identified by screening individual SNPs. The investigator might therefore compute the quadratic form p-value only in regions identified as being of potential interest, avoiding the computational burden of a genome-wide scan with the quadratic form statistic.
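The cost of direct sampling at such thresholds follows from simple binomial arithmetic. With B random permutations, the Monte Carlo estimate of a p-value p has standard error $\sqrt{p(1-p)/B}$, so a relative standard error of ε requires roughly $B \approx (1-p)/(p\varepsilon^2)$ permutations. The short sketch below is an illustrative calculation, not taken from the original analyses. For $p = 10^{-8}$ and ε = 0.1, it gives about $10^{10}$ permutations.

```python
import math


def permutations_needed(p, rel_se):
    """Number of random permutations B such that the binomial standard error
    sqrt(p (1 - p) / B) of a Monte Carlo p-value is at most rel_se * p."""
    return math.ceil((1.0 - p) / (p * rel_se ** 2))


for p in (0.05, 1e-4, 1e-8):
    print(f"p = {p:g}: about {permutations_needed(p, 0.1):,} permutations")
```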

References

1. Haseman and Elston revisited.

Authors: R C Elston; S Buxbaum; K B Jacobs; J M Olson
Journal: Genet Epidemiol Date: 2000-07

2. Efficient calculation of P-value and power for quadratic form statistics in multilocus association testing.

Authors: Liping Tong; Jie Yang; Richard S Cooper
Journal: Ann Hum Genet Date: 2010-05

3. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival.

Authors: Lance D Miller; Johanna Smeds; Joshy George; Vinsensius B Vega; Liza Vergara; Alexander Ploner; Yudi Pawitan; Per Hall; Sigrid Klaar; Edison T Liu; Jonas Bergh
Journal: Proc Natl Acad Sci U S A Date: 2005-09-02

4. The detection of disease clustering in time.

Authors: T Tango
Journal: Biometrics Date: 1984-03

5. The detection of disease clustering and a generalized regression approach.

Authors: N Mantel
Journal: Cancer Res Date: 1967-02

6. Empirical pathway analysis, without permutation.

Authors: Yi-Hui Zhou; William T Barry; Fred A Wright
Journal: Biostatistics Date: 2013-02-20

7. Review of software for space-time disease surveillance.

Authors: Colin Robertson; Trisalyn A Nelson
Journal: Int J Health Geogr Date: 2010-03-12

8. Space-time clustering of, and risk factors for, farmer-diagnosed winter dysentery in dairy cattle.

Authors: M E White; Y H Schukken; B Tanksley
Journal: Can Vet J Date: 1989-12

9. Rank-based inverse normal transformations are increasingly used, but are they merited?

Authors: T Mark Beasley; Stephen Erickson; David B Allison
Journal: Behav Genet Date: 2009-06-14

10. Genome-wide association and linkage identify modifier loci of lung disease severity in cystic fibrosis at 11p13 and 20q13.2.

Authors: Fred A Wright; Lisa J Strug; Vishal K Doshi; Clayton W Commander; Scott M Blackman; Lei Sun; Yves Berthiaume; David Cutler; Andreea Cojocaru; J Michael Collaco; Mary Corey; Ruslan Dorfman; Katrina Goddard; Deanna Green; Jack W Kent; Ethan M Lange; Seunggeun Lee; Weili Li; Jingchun Luo; Gregory M Mayhew; Kathleen M Naughton; Rhonda G Pace; Peter Paré; Johanna M Rommens; Andrew Sandford; Jaclyn R Stonebraker; Wei Sun; Chelsea Taylor; Lori L Vanscoy; Fei Zou; John Blangero; Julian Zielenski; Wanda K O'Neal; Mitchell L Drumm; Peter R Durie; Michael R Knowles; Garry R Cutting
Journal: Nat Genet Date: 2011-05-22
