Modeling nematode population dynamics using a multivariate poisson model with spike and slab variable selection.

Gill Giese¹, Dayna P Saldaña Zepeda², Jacquelin Beacham³, Ciro Velasco Cruz⁴.

Abstract

Model-based learning of organism dynamics is challenging, particularly when modeling correlated count data. In this paper, we adapt the multivariate Poisson distribution to model nematode dynamics. This distribution relaxes the mean-equal-variance property of the univariate Poisson distribution and allows recovery of the correlation among nematode genera. An observational dataset with 68 soil samples, 11 nematode genera, and 12 soil parameters is analyzed. The Spike and Slab Variable Selection procedure is adapted to obtain parsimonious models for the nematode occurrence. Genus-to-genus interaction is assessed through the correlation matrix of the model. A simulation study validated the model's implementation. As a result, the model determined the most important covariates for each nematode and classified pairs of nematodes as sympathetic, antagonistic or neutral, based on their estimated correlations. The model is useful for researchers and practitioners interested in studying population dynamics. In particular, the current results are important inputs when planning strategies for improving or managing soil health regarding nematodes.

Keywords: NUTS algorithm; Nematode correlation; multivariate Poisson lognormal; organism–habitat relationship; parsimonious models

Year: 2021 PMID: 36035606 PMCID: PMC9415552 DOI: 10.1080/02664763.2021.1935800

Source DB: PubMed Journal: J Appl Stat ISSN: 0266-4763 Impact factor: 1.416

Introduction

The multivariate Poisson Lognormal model of Aitchison and Ho [2] is a flexible and robust model for multivariate count data. Modeling correlated count data, although numerically challenging, can help us understand the population dynamics of organisms such as soil-inhabiting nematodes. Two types of relationships intrigue nematologists: the relationship between organisms, and the relationship between the organisms and their environment. The first is captured by the correlation between nematodes and lets scientists learn how the organisms interact; the second describes the organism–habitat interaction and can be studied through the mean function of the model. With the multivariate Poisson Lognormal, one can model both relationships simultaneously to develop a more comprehensive understanding of the organisms' behavior and to guide research and management decisions.

Nematodes are microscopic, multicellular, non-segmented roundworms that are common to most soils and rarely occur as a single-species community [16]. Some feed on plant roots (non-beneficial, plant-parasitic, herbivorous), while others feed on bacteria and fungi (beneficial, bacterivorous/fungivorous). Bacterivorous and fungivorous nematodes benefit the soil ecosystem through their role in the decomposition of organic matter: they feed on bacteria and fungi and subsequently release the minerals and nutrients bound in bacterial and fungal bodies back to the soil. Plant-parasitic nematodes cause about $100 billion in damage to agriculture worldwide each year. Nematodes are economic pests of many crops, including Vitis vinifera grape vineyards [6,10,11]. Premature vineyard decline, reduced vine vigor, increased fungal infection and virus transmission, and yield losses (>60%) have all been attributed to plant-parasitic nematode infestations [23]. Due to the substantial and long-term investment necessary for grape production, researchers and practitioners seek to understand the abiotic and biotic nature of soil microbial ecosystems and possible relationships between different nematode genera within the soil.

Nematodes have been a recurrent topic of interest in agricultural research. Murray et al. [17] and Zbylut et al. [24] analyzed the abundance of nematodes under several models: the Hurdle, Generalized Poisson, Negative Binomial, and Zero Inflated models [1]. However, abundance is a univariate variable, which prevents studying the genus-to-genus interaction together with the genus-to-environment relationship. To learn about nematode dynamics, we implement the multivariate Poisson model as an alternative that studies organism dynamics more comprehensively and generalizes the univariate approaches of other studies.

Parsimonious models are preferred over complex ones to explain nematode–environment relationships. Indeed, within the regression analysis context, this could be considered a universal rule. George and McCulloch [5] pointed out that parsimonious models simplify interpretations, inferences and conclusions. However, when several covariates are available, arriving at a parsimonious model is not straightforward, and a variable selection technique is needed to objectively identify the most important covariates. In this paper, our proposed model combines the multivariate Poisson distribution with the Spike and Slab variable selection technique [13] to select the most important soil-related covariates that explain variation in the mean number of nematodes. Additionally, we use the Huang and Wand [12] prior distribution to model the covariance matrix, the component that explains the interaction between nematode genera.

The dataset is from an observational study with soil samples collected at 25 different New Mexico vineyards. Each sample included the occurrence and frequency of the nematodes present, identified to the genus level. Soil parameters, such as pH and the amounts of various mineral nutrients, were also measured.

This paper is organized as follows. Section 2 gives a brief description of the dataset and a detailed introduction to the model, including the likelihood and prior distributions. Section 3 presents the details and results of a simple simulation study designed to validate the model's implementation. The analysis and results for the nematode dataset are presented in Section 4, and conclusions are given in Section 5.

Materials and methods

Data

To study the interrelationship among different species of nematodes, soil samples were collected from 25 New Mexico vineyards during 2018. A composite sample consisting of ten 3.2 cm diameter × 45 cm deep soil cores was collected from within 40 cm of 10 individual vine trunks within each of three randomly selected zones per vineyard. Nematodes were extracted from a 100 cm³ subsample of soil from each vineyard zone by elutriation, wet sieving and centrifugal flotation [4,14]. A 1 mL aliquot of each 10 mL soil extract was placed on a chambered counting slide and examined on an inverted Olympus IX51 compound microscope. Observed nematodes were counted and categorized into trophic groups as bacterivores, fungivores, omnivores, carnivores or herbivores. Herbivorous nematodes were further classified to genus. The nematode quantification revealed 11 different nematode genera present in the soil samples: 7 were Herbivorous or Non-Beneficial, and 4 were Beneficial. As mentioned above, the Non-Beneficial nematodes were classified to the genus level, while the Beneficial ones were classified to trophic group. However, for all purposes in this study, the nematodes listed in Table 1 are hereafter referred to as genera. Not all genera were found in every soil sample; however, at least two genera were present in every sample. For example, in the first soil sample only Helicotylenchus, Tylenchus and Omnivores Dorylamoid were found. This led us to hypothesize that a relationship among the nematodes exists that may explain why some genera coexist more often than others.

Table 1.

List of nematode genera in the composite soil samples (B: Beneficial, H: Herbivorous or non-beneficial) collected from New Mexico vineyards, 2018.

Type	Nematode	Type	Nematode
H	Pratylenchus	H	Paratylenchus
H	Tylenchorhynchus	B	Tylenchus
H	Meloidogyne	B	Bacterivores
H	Mesocriconema	B	Fungivores
H	Helicotylenchus	B	Omnivores Dorylamoid
H	Xiphe americanum

The dataset includes 12 soil parameters (Table 2) that provide a partial description of the environment occupied by the nematodes. These soil parameters might influence the presence of some types of nematodes. For example, soil pH might impact a given nematode genus's presence. Therefore, the proposed model may provide new insight about the nematode–environment interaction, explain some of the interaction among nematodes and help understand the organismal dynamics.

Table 2.

List of soil parameters analyzed from New Mexico soil samples, 2018.

Soil parameters
Nitrate	Sulfate	Zinc	Manganese
Iron	Copper	Boron	Chloride
pH	SAR	Moisture	Saturation

Model

The Poisson distribution for the ith soil sample is

$$p(y_i \mid \lambda_i) = \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots \tag{1}$$

where $y_i$ is the number of nematodes of one kind or genus, and $\lambda_i$ is the mean parameter of the distribution. It is well known that the Poisson distribution's mean equals its variance, a strong condition that is not commonly met in practice. To relax this condition and make the Poisson an appropriate model for a wide variety of data, some modifications are typically applied. For example, one could include a mixing parameter, so that $y_i$ marginally follows a Negative Binomial distribution. In this paper, we assume a multivariate Poisson distribution that, as shown in Section 2.3, relaxes the mean-equal-variance condition and, at the same time, helps to model the correlations between pairs of nematodes, as explained in the next section.

Modeling interrelationship among organisms

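In order to learn about the relationship of nematodes with themselves and their surroundings, we adapt the Poisson Lognormal model proposed by Aitchison and Ho [2], which generalizes (1) into a multivariate distribution to model an m-dimensional response vector. Let $\mathbf{y}_i = (y_{i1}, \ldots, y_{im})^\top$ be the response vector for the ith sample. In the model of [2], the corresponding m-dimensional vector of parameters, $\boldsymbol{\lambda}_i = (\lambda_{i1}, \ldots, \lambda_{im})^\top$, is assumed random with a Log-normal distribution. Equivalently, we define a random vector $\boldsymbol{\theta}_i$ so that $\lambda_{ij} = \exp(\theta_{ij})$. The elements of the response vector are conditionally independent given $\boldsymbol{\theta}_i$, as follows

$$y_{ij} \mid \theta_{ij} \sim \text{Poisson}(\lambda_{ij}), \quad \lambda_{ij} = \exp(\theta_{ij}), \quad j = 1, \ldots, m, \tag{2}$$

and

$$\boldsymbol{\theta}_i \sim N_m(\alpha + \boldsymbol{\mu}_i, \Sigma), \tag{3}$$

where Σ is the variance–covariance matrix of size $m \times m$. Both parameters, the intercept (α) and $\boldsymbol{\mu}_i$, define the mean function of the distribution, as in a regular regression analysis. The definition of $\boldsymbol{\mu}_i$ is application dependent, typically $\mu_{ij} = \tilde{\mathbf{x}}_i^\top \boldsymbol{\beta}_j$, where $\tilde{\mathbf{x}}_i^\top$ is the ith row of $\tilde{X}$, the design matrix with transformed columns, and $\boldsymbol{\beta}_j$ is a vector of regression coefficients of size q. More detailed information on how $\boldsymbol{\mu}_i$ is constructed for this analysis is given in the Appendix. Thus, the likelihood function or sampling distribution of a random sample of size n, provided that Cov$(\mathbf{y}_i, \mathbf{y}_{i'}) = 0$ for $i \neq i'$, is as follows

$$p(\mathbf{y}_1, \ldots, \mathbf{y}_n \mid \boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_n) = \prod_{i=1}^{n} \prod_{j=1}^{m} \frac{\lambda_{ij}^{y_{ij}} e^{-\lambda_{ij}}}{y_{ij}!}, \tag{4}$$

where $\lambda_{ij}$ is given in (2).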
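As a rough illustration of the hierarchical structure described above, the following Python sketch simulates counts for two genera: a latent bivariate normal vector is drawn first, and each count is then drawn from a Poisson whose mean is the exponential of the corresponding latent value. The function names, the latent means and standard deviations, the latent correlation of -0.5, and the seed are illustrative assumptions rather than values from the study, and the Poisson sampler is a simple textbook method, not the authors' implementation.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_pln_pair(mean, sd, rho, rng):
    """Draw counts (y1, y2) from a bivariate Poisson lognormal.

    mean, sd: latent means and standard deviations (hypothetical values);
    rho: latent correlation between the two genera.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # Correlate the latent values with a hand-written 2x2 Cholesky factor.
    theta1 = mean[0] + sd[0] * z1
    theta2 = mean[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return (sample_poisson(math.exp(theta1), rng),
            sample_poisson(math.exp(theta2), rng))


rng = random.Random(2021)
draws = [sample_pln_pair((1.0, 0.5), (0.6, 0.8), -0.5, rng) for _ in range(5000)]
y1 = [d[0] for d in draws]
mean_y1 = statistics.fmean(y1)
var_y1 = statistics.variance(y1)
print(f"genus 1: mean={mean_y1:.2f}, variance={var_y1:.2f}")
```

Under these assumed settings the simulated counts should show a sample variance noticeably larger than the sample mean, which is the overdispersion that a plain Poisson cannot produce.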

Marginal moments

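The model in (4) is appropriate for correlated Poisson data and, to an extent, is mathematically straightforward, allowing the marginal moments of $y_{ij}$ to be derived analytically. For instance, the basic marginal moments such as mean, variance, covariance, and correlation have the following expressions:

$$\begin{aligned}
E(y_{ij}) &= \eta_{ij},\\
\mathrm{Var}(y_{ij}) &= \eta_{ij} + \eta_{ij}^2\left[\exp(\sigma_{jj}) - 1\right],\\
\mathrm{Cov}(y_{ij}, y_{ik}) &= \eta_{ij}\,\eta_{ik}\left[\exp(\sigma_{jk}) - 1\right],\\
\mathrm{Corr}(y_{ij}, y_{ik}) &= \frac{\mathrm{Cov}(y_{ij}, y_{ik})}{\sqrt{\mathrm{Var}(y_{ij})\,\mathrm{Var}(y_{ik})}},
\end{aligned} \tag{5}$$

where $\sigma_{jk}$ is the $(j,k)$ element of Σ, $j \neq k$ in the covariance and correlation, and $\eta_{ij} = \exp(\alpha + \mu_{ij} + \sigma_{jj}/2)$. The marginal model derived from (2) makes the variance and mean different. In fact, $\mathrm{Var}(y_{ij}) > E(y_{ij})$ shows that this model relaxes the constraint of the univariate Poisson distribution, as is often desired. Notice that because $|\mathrm{Corr}(y_{ij}, y_{ik})| < |\mathrm{Corr}(\theta_{ij}, \theta_{ik})|$, the range of possible correlation values is not as wide as that of the corresponding Normal distribution [2]. The equations in (5) allow the computation of marginal moments. However, unlike the normal distribution case, the variance and covariance are sample specific because both depend on $\boldsymbol{\mu}_i$, and are not equal for all i. Thus, the multiple values of variance and covariance lead to a cumbersome marginal analysis. Instead, it is more convenient to base inferences on the latent-scale parameters, that is, the mean function and Σ.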
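The marginal moments of the Poisson Lognormal have well-known closed forms (Aitchison and Ho [2]): the mean is exp(m + s/2), the variance adds the term mean² × (exp(s) − 1), and the covariance is the product of the two means times (exp(s12) − 1), where m is a latent mean and s, s12 are latent variances and covariance. The sketch below evaluates them for two genera in one sample. The function name and all numeric inputs are hypothetical and chosen only for illustration.

```python
import math


def pln_marginal_moments(m1, m2, s11, s22, s12):
    """Closed-form marginal moments of a bivariate Poisson lognormal.

    m1, m2: latent means; s11, s22: latent variances; s12: latent covariance.
    """
    e1 = math.exp(m1 + s11 / 2.0)
    e2 = math.exp(m2 + s22 / 2.0)
    v1 = e1 + e1 ** 2 * (math.exp(s11) - 1.0)
    v2 = e2 + e2 ** 2 * (math.exp(s22) - 1.0)
    cov = e1 * e2 * (math.exp(s12) - 1.0)
    return {
        "mean": (e1, e2),
        "var": (v1, v2),
        "cov": cov,
        "corr": cov / math.sqrt(v1 * v2),
    }


# Hypothetical latent parameters for one soil sample.
mom = pln_marginal_moments(1.0, 0.5, 0.36, 0.64, -0.24)
latent_corr = -0.24 / math.sqrt(0.36 * 0.64)
print(mom["mean"], mom["var"], mom["corr"], latent_corr)
```

For these inputs the marginal variance exceeds the marginal mean for both genera, and the correlation between counts is smaller in magnitude than the latent correlation, which is the attenuation the paragraph above describes.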

Simpler models

We evaluate two types of simpler models. In both types, the nematode correlation is relaxed; however, one follows the univariate version of the [2] model, while the others are traditional models for count data.

Independent model

A simplified model of interest results if Σ, the covariance matrix, is set equal to the identity matrix. This simplification implies that the nematodes do not interact with each other, an assumption that contradicts our intuition about the social behavior of nematode populations. However, if it proved more appropriate for the data, this model would greatly simplify interpretation. Additionally, this model provides an opportunity to evaluate the significance of the effect of the nematodes' interaction, when compared with the model given in (3). In particular, the simplification implies that

$$\boldsymbol{\theta}_i^{NI} \sim N_m(\alpha^{NI} + \boldsymbol{\mu}_i^{NI}, I_m). \tag{6}$$

The mean structure in (6) is preserved as in (3), but the covariance structure changes, as it is set equal to the identity matrix, $I_m$. The superscripts in (6) are labels that denote ‘no nematode interaction’.

Univariate models

It is of interest to evaluate the ability of three univariate models to fit the nematode dataset. In essence, they are similar to the independent model above; however, their sampling distributions are Poisson, Generalized Poisson, and Negative Binomial. The first two are defined as in [7], while the latter is defined in [8]. Given their definitions, each sampling distribution is completely specified by a vector of parameters: for instance, a single mean parameter for the Poisson distribution, with additional dispersion-type parameters for the Negative Binomial and the Generalized Poisson.

In practice, a large number of covariates are often available. Models with several covariates, although useful, are not easy to use for inference or interpretation. Practitioners are typically more interested in parsimonious models, with only the most important covariates. To determine the subset of most important soil parameters to explain the mean number of nematodes, we adapt the Spike and Slab variable selection procedure [13].

Variable selection

It is of interest to have a simple mean model for each nematode genus (i.e. a parsimonious mean function, $\mu_{ij}$, for $j = 1, \ldots, m$). Such a model will facilitate inference, interpretation and understanding of the relationship between nematodes and soil parameters. We adapt the Spike and Slab (SS) variable selection procedure of [13] to find the subset of the most important covariates to be in $\mu_{ij}$. The SS procedure assumes that the kth regression coefficient follows a normal distribution conditioned on a binary variable, $\gamma_k$, that drives the significance of the coefficient through a mixture distribution, as follows:

$$\beta_k \mid \gamma_k, \tau_k^2 \sim \gamma_k\, N(0, \tau_k^2) + (1 - \gamma_k)\, N(0, \upsilon_0 \tau_k^2). \tag{7}$$

The parameter $\gamma_k \sim \text{Bernoulli}(\xi)$, and $\upsilon_0$ is a user-defined small quantity that serves as a reference value for the variance of the non-significant regression coefficients; $\tau_k^2$ is the variance parameter. Two prior distributions for $\beta_k$ are in play according to (7). The two distributions in the mixture have the same mean but different variances. When the kth covariate is important to explain the response variable, then with high probability $\gamma_k = 1$ and $\beta_k \sim N(0, \tau_k^2)$; the profile of this density resembles a slab if $\tau_k^2$ is large. However, when the covariate is not important, $\gamma_k = 0$ and the prior distribution for $\beta_k$ collapses at its mean value, resembling a spike at 0.

As a topic closely related to the variable selection step, we note that a simple transformation of the columns of the design matrix, $X$, is numerically convenient. Let $\tilde{X}$ be the transformed design matrix. Given that $\boldsymbol{\mu}_i$ is a function of $\tilde{X}$, the posterior distributions for the regression coefficients are for the transformed covariates, and the corresponding inverse transformation recovers the regression coefficients for the untransformed covariates.

To learn about the unknown parameters involved in the multivariate Poisson and the SS models presented above, prior distributions are needed. In the next section, we give the priors for these parameters, together with the numerical algorithms needed to estimate their posterior distributions.
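To make the spike and the slab concrete, the following Python sketch draws coefficients from a two-component normal mixture of the kind described above. It assumes the spike variance is the slab variance scaled by a very small constant (one common form of the Spike and Slab prior), and it reuses the 5e-15 value quoted in the priors section on the assumption that it is that constant. The inclusion probability, the slab variance, the seed, and the function name are hypothetical.

```python
import random
import statistics


def draw_beta(gamma_k, tau2, v0, rng):
    """Draw one coefficient from the mixture prior.

    gamma_k = 1 selects the slab N(0, tau2);
    gamma_k = 0 selects the spike N(0, v0 * tau2) (assumed form).
    """
    variance = tau2 if gamma_k == 1 else v0 * tau2
    return rng.gauss(0.0, variance ** 0.5)


rng = random.Random(7)
xi = 0.3      # hypothetical prior inclusion probability
tau2 = 4.0    # hypothetical slab variance
v0 = 5e-15    # small constant, assumed to be the spike scaling value

gammas = [1 if rng.random() < xi else 0 for _ in range(2000)]
betas = [draw_beta(g, tau2, v0, rng) for g in gammas]

spike = [abs(b) for g, b in zip(gammas, betas) if g == 0]
slab = [abs(b) for g, b in zip(gammas, betas) if g == 1]
print(f"largest |beta| under the spike: {max(spike):.2e}")
print(f"average |beta| under the slab:  {statistics.fmean(slab):.2f}")
```

Draws under the spike are numerically indistinguishable from zero, while draws under the slab spread over a wide range, so the indicator effectively switches a covariate in or out of the mean function.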

Prior distributions definition and algorithms for posterior distributions estimation

Learning about the unknown parameters in equations (4) and (3) requires prior distributions and numerical algorithms, such as MCMC and Gibbs sampling, to estimate their posterior distributions. Given the prior for $\boldsymbol{\theta}_i$ in (3), its full conditional distribution is analytically intractable, so the NUTS algorithm is used to efficiently simulate samples from it. The Gibbs sampling algorithm is used to learn about Σ, whose prior and full conditional distributions are given in [12]. The hyper-parameters of the prior and hyper-prior distributions involved in the full conditional for Σ are defined as follows: an inverse Wishart is included as an intermediate distribution, with the degrees of freedom parameter set equal to 2 and a varying scale matrix. The main diagonal elements of the scale matrix follow independent Inverse Gamma distributions, which are further intermediate distributions, with rate parameter equal to the square of 1e6. Huang and Wand [12] show that, marginally, the standard deviations of Σ follow Half-t distributions and the correlations follow Uniform distributions on the interval $(-1, 1)$. See Huang and Wand [12] for more detailed information about the prior distribution for Σ.

Gibbs sampling is also used to learn about α and $\boldsymbol{\beta}$. The prior distribution for α is non-informative, of the form $p(\alpha) \propto 1$, and its full conditional is given in the Appendix. The distribution for $\boldsymbol{\beta}$ is the Spike and Slab prior given in (7), with a Uniform(0,1) prior distribution for ξ, a Gamma(shape = 5, rate = 50) prior distribution for $\tau_k^{-2}$, and $\upsilon_0$ set equal to 5e-15.

Simulation study

To evaluate the implementation of the model, a simulation study is carried out under the following conditions. A dataset is simulated using Equations (2) and (3). The simulated sample size is n = 80, m = 4, and Σ is given in (8). For each of the m = 4 simulated genera, the vector of regression coefficients, of size q = 13, is equal to . Components of the data matrix, , are simulated from a , .

Sensitivity study

The effect of sample size on the parameter estimates is evaluated in a sensitivity study. Five sample sizes (n) are evaluated. The mean square error (MSE) is computed for the regression coefficients and for the elements of the correlation matrix as the summarizing statistic. The MSE of the regression coefficients is defined as

$$\mathrm{MSE}_{\beta} = \frac{1}{mq} \sum_{j=1}^{m} \sum_{k=1}^{q} \left(\tilde{\beta}_{jk} - \beta_{jk}\right)^2,$$

where $\tilde{\beta}_{jk}$ is the median of the samples generated from the parameter's posterior distribution. In order to account for variability in the simulation process, we repeat the computation of the MSE R times. The average MSE, computed as $\overline{\mathrm{MSE}} = \frac{1}{R}\sum_{r=1}^{R} \mathrm{MSE}_r$, and the empirical 68% interval of the R values are reported. Similar summaries are computed for the correlation matrix.
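The summary used in the sensitivity study can be sketched in a few lines of Python: compute the MSE between posterior medians and true values for each replicate, then report the average and an empirical 68% interval across replicates. Taking the 16th and 84th percentiles as the interval end points is an assumption of this sketch, as are the function names and the toy numbers.

```python
import statistics


def mse(estimates, truth):
    """Mean square error between point estimates and the simulated true values."""
    return statistics.fmean([(e - t) ** 2 for e, t in zip(estimates, truth)])


def quantile(values, p):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def summarize(mse_values):
    """Average MSE and an empirical 68% interval (16th to 84th percentile, assumed)."""
    return (statistics.fmean(mse_values),
            quantile(mse_values, 0.16),
            quantile(mse_values, 0.84))


# Toy check of the MSE itself, then a summary over R = 5 made-up replicate values.
single = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
average, lower, upper = summarize([1.0, 2.0, 3.0, 4.0, 5.0])
print(single, average, lower, upper)
```

With only a handful of replicates the percentile end points are coarse, so an interval built this way is best read as a rough indication of run-to-run variability.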

Simulation results

The algorithm used for model fit was coded in Armadillo C++ [21] and RStudio [19]. The following summaries are based on the last 200,000 of the 400,000 iterations, keeping every 10th sample. Convergence of the chains was verified graphically with trace plots, which are the most common convergence diagnostic method according to [20]. However, for brevity, these graphics are not shown (available upon request). Table 3 shows the 95% Credible Intervals (CI) for the unique elements of the estimated Σ. Every unique element of Σ is within its CI. Also, the maximum a posteriori (MaP) of each entry is close to its corresponding element in Σ (shown as the last column in the table).
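The burn-in and thinning scheme mentioned above can be written down explicitly. The sketch below reads the description as "discard the first 200,000 of 400,000 iterations, then keep every 10th", which is an interpretation of the text; the function name is hypothetical.

```python
def retained_iterations(n_iter, n_last, thin):
    """Indices of the iterations kept after burn-in and thinning."""
    start = n_iter - n_last  # everything before this index is treated as burn-in
    return range(start, n_iter, thin)


kept = retained_iterations(n_iter=400_000, n_last=200_000, thin=10)
print(len(kept), kept[0], kept[-1])
```

Under this reading, the posterior summaries rest on 20,000 retained draws per parameter.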

Table 3.

95% Credible intervals and maximum a posteriori (MaP) for the distinct entries of the correlation matrix Σ of the simulation study.

	Entry	2.5%	MaP	97.5%	Σ(i,j)
1	(1,2)	−0.8286	−0.6233	−0.3073	−0.5674
2	(1,3)	0.0407	0.4195	0.6952	0.1385
3	(1,4)	−0.5919	−0.2617	0.1222	−0.1829
4	(2,3)	−0.0174	0.3501	0.6379	0.3629
5	(2,4)	−0.483	−0.1493	0.2223	−0.0669
6	(3,4)	0.3002	0.643	0.8525	0.7986

Note: All the entries are within their CIs.

The biplot [3] displays multivariate relationships in two dimensions. Figure 1 presents the biplot of the MaP values of the correlation matrix. This graphic shows items ‘Nema4’ and ‘Nema3’ as positively correlated (their arrows point in the same direction), while items ‘Nema1’ and ‘Nema2’ are negatively correlated (their arrows point in opposite directions). Thus, the items' relationships can be assessed visually, effectively and easily, using biplots, which can prompt further research questions; for instance, one can study ‘Nema4’ and ‘Nema3’, or ‘Nema1’ and ‘Nema2’, in more detail to explore biological reasons for those correlation values.

Figure 1.

Biplot representation of the MaPs of the estimated correlation matrix Σ.

Estimates of the regression coefficients are shown in Figure 2. The asterisks indicate the actual values used to simulate the data. The vertical lines are the 95% credible intervals (CIs), and the dots are the MaPs (maximum a posteriori). In red (visible only in the electronic version) are the CIs that do not touch the reference line at zero. The CIs of Var7 of Nema2, Var2 of Nema3, and Var2 of Nema4 narrowly miss their corresponding simulated values.

Figure 2.

95% Credible intervals (CIs) for the simulated regression coefficients. The asterisks represent the simulated values. Filled dots are the maximum a posteriori estimates.

Sensitivity study results

Figure 3 shows the MSEs of the regression coefficients (Reg Coeff) and of the elements of the correlation matrix (Correl Matrix) for different sample sizes, n. For each n, the average and the interval of the MSE are displayed in the figure, based on R = 5. The MSE of the regression coefficients decreases as n increases, while that of the correlation matrix remains fairly constant. The minimum MSEs are reached when n = 100. Although they tend to decrease as n increases, the regression coefficient MSE values for sample sizes 70, 80 and 90 do not differ much from each other.

Figure 3.

MSE average and its empirical 68% interval, for different sample sizes, of the regression coefficients and correlation matrix.

The results from the simulation study validate the model and its implementation. In the following section, we present the results of the nematode data analysis.

Nematode data analysis results

The nematode dataset described in Section 2.1 has 12 soil parameters as covariates, listed in Table 2. The response variable is the number of nematodes found in the soil samples. The count of any nematode genus not found in a particular soil sample was set equal to zero. Eleven genera of nematodes were found in the samples. To jointly model the nematodes, we use the Multivariate Poisson model given in (4).

Nematode genera occurrence tends to be selective and shows a degree of exclusivity, i.e. certain genera are likely to occur together, while others are seldom found together. One of this study's objectives is to investigate the nematode genus-to-genus relationships. To achieve this objective, we estimate the correlations via a simple transformation of Σ in (3). With the correlations, nematodes are classified as follows: if two genera have a positive correlation, they are described as sympathetic; if their correlation is negative, they are described as antagonistic; and if their correlation is around zero, they are described as neutral.

Another objective of this study is to describe the nematode–habitat relationship, which can be achieved by selecting the subset of covariates in the mean function of the model that best explains the mean number of nematodes. Thus, the variable selection procedure presented in Section 2.5 is implemented for the nematode dataset. The results of this objective are particularly important because learning about the subset of soil parameters that explains nematode presence can help to develop and implement strategic plans or practices to increase or decrease nematode populations, in order to provide better growing conditions for a particular crop or to improve soil health in general. The response vector that represents the actual counts of the eleven genera of nematodes found in the ith soil sample is denoted by $\mathbf{y}_i$, and the genera are listed in Table 1.
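The sympathetic, antagonistic and neutral labels described above can be sketched in Python as two steps: rescale a covariance matrix to a correlation matrix, then label a pair from posterior draws of its correlation. Treating "around zero" as "the credible interval contains zero" is an assumption of this sketch, not a rule stated in the text, and the function names and toy inputs are hypothetical.

```python
import math


def cov_to_corr(sigma):
    """Rescale a covariance matrix (list of lists) to a correlation matrix."""
    sd = [math.sqrt(sigma[j][j]) for j in range(len(sigma))]
    return [[sigma[j][k] / (sd[j] * sd[k]) for k in range(len(sigma))]
            for j in range(len(sigma))]


def quantile(values, p):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def classify_pair(corr_draws, level=0.90):
    """Label a genus pair from posterior draws of its correlation.

    Assumed rule: interval above zero -> sympathetic, below zero -> antagonistic,
    otherwise neutral.
    """
    tail = (1.0 - level) / 2.0
    lower = quantile(corr_draws, tail)
    upper = quantile(corr_draws, 1.0 - tail)
    if lower > 0.0:
        return "sympathetic"
    if upper < 0.0:
        return "antagonistic"
    return "neutral"


corr = cov_to_corr([[4.0, 2.0], [2.0, 9.0]])
labels = [
    classify_pair([0.5, 0.6, 0.7, 0.55, 0.65]),
    classify_pair([-0.7, -0.6, -0.5, -0.65, -0.55]),
    classify_pair([-0.2, 0.1, 0.0, 0.3, -0.1]),
]
print(corr[0][1], labels)
```

A cutoff on the magnitude of the point estimate would be an equally reasonable way to define "neutral"; the interval rule is used here only because credible intervals are already reported for every pair.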

Model comparison

In order to compare models, we computed the Deviance Information Criterion [22] as the statistic that summarizes model fit. This criterion assists in selecting one model as best if its DIC is smallest among evaluated models. We are particularly interested in evaluating two models: one that includes the correlation of nematodes (Full model, (3)), and one that assumes that the nematodes do not interact with each other (Independent model, (6)). However, we are also interested in comparing the three univariate models, typically used to model count data: Poisson, Generalized Poisson, and Negative Binomial. Based on the DIC shown in Table 4, the worst possible model is the Poisson, while the best is the Full model, though slightly more complex than the Independent model (since it has a greater pD). Thus, our proposed model given in (4) better explains the nematode dataset. This result supports the assumption that the nematodes interact with each other, and modeling that interaction results in a much more comprehensive and richer model. Incidentally, within the evaluated class of univariate models, the Generalized Poisson is the best for the nematode data.

Table 4.

Deviance Information Criterion (DIC) and model complexity (pD) of five models.

	DIC	pD
Full model	2143.6	398.8
Independent model	2391.9	394.8
Poisson	12763.7	775.0
Generalized Poisson	3082.1	153.7
Negative Binomial	3129.0	205.7

Notes: The Full model includes the nematodes interaction. The Independent model assumes no-interaction of nematodes. The Poisson, Negative Binomial and Generalized Poisson are traditional models for count data.
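For readers less familiar with the criterion, DIC is commonly computed as the posterior mean deviance plus the effective number of parameters pD, where pD is the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters. The Python sketch below shows that arithmetic on made-up deviance values and then ranks the models using the DIC and pD values copied from Table 4. The function name and the toy deviance draws are hypothetical.

```python
import statistics


def dic_from_deviance(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD) from posterior deviance draws.

    pD  = mean deviance - deviance at the posterior mean of the parameters
    DIC = mean deviance + pD
    """
    d_bar = statistics.fmean(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


# Toy deviance values, only to show the arithmetic.
toy_dic, toy_pd = dic_from_deviance([10.0, 12.0, 14.0], 9.0)

# (DIC, pD) pairs as reported in Table 4.
table4 = {
    "Full model": (2143.6, 398.8),
    "Independent model": (2391.9, 394.8),
    "Poisson": (12763.7, 775.0),
    "Generalized Poisson": (3082.1, 153.7),
    "Negative Binomial": (3129.0, 205.7),
}
ranking = sorted(table4, key=lambda name: table4[name][0])
best = ranking[0]
print(toy_dic, toy_pd, ranking)
```

Sorting by DIC reproduces the ordering discussed in the text: the Full model first, the Poisson last, and the Generalized Poisson ahead of the other univariate models.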

In the following sections, we present summary results of the analysis of the nematode data under the Full model.

Correlation between pairs of nematodes

Figure 4 shows the MaPs of the posterior distributions of the elements of the correlation matrix (derived from Σ), and their 90% credible intervals. There is a large negative correlation worth highlighting: the correlation between the B_Omnivores_Dorylamoid and B_Bacterivores genera is the largest negative value among all correlation values (see Figure 4). This suggests that these two organisms are somewhat antagonistic, even though they are both ‘Beneficial’. On the other hand, the correlation between B_Tylenchus and H_Tylenchorhynchus is 0.64, and that between B_Tylenchus and B_Bacterivores is 0.71; both values suggest sympathy. There are other examples of antagonistic and sympathetic correlations evident in the figure. However, the association can be better observed in Figure 5, which shows the biplot of the point estimates (MaPs) of the correlation matrix. It suggests that B_Bacterivores and B_Tylenchus are ‘sympathetic’, because their arrows point in the same direction. Likewise, H_Paratylenchus and B_Omnivores_Dorylamoid are somewhat sympathetic. However, the arrows of these two pairs point in opposite directions from each other, so we categorize the two pairs as ‘antagonistic’ to one another. Finally, H_Mesocriconema, H_Helicotylenchus and H_Pratylenchus are examples of ‘neutral’ nematodes, due to their relatively short arrows in the plot.

Figure 4.

Nematode correlation estimates. The number at the top in each cell is the maximum a posteriori (MaP), and at the bottom, the 90% credible interval (in parentheses).

Figure 5.

Biplot representation of the MaPs of the estimated correlation matrix, of the nematode dataset.

In Figure 6, the regression coefficient point estimates (MaPs) and CIs are plotted with a subplot for each genus. The x-axis lists all the covariates given in Table 2, and the y-axis shows the estimates of the regression coefficients and their 90% credible intervals. The vertical segments in red (visible only in the electronic version) do not intersect the horizontal reference line at 0, which is interpreted as the corresponding covariates having a significant effect on the mean count. For instance, for the H_Pratylenchus genus, ‘SAR’ (negative effect) is the single most important covariate to explain the presence of this nematode. For the B_Tylenchus nematode, ‘manganese’ (positive effect), ‘pH’ (negative effect), and ‘SAR’ (negative effect) are the important covariates to explain the mean count of this genus. Meanwhile, B_Omnivores_Dorylamoid's important covariates are ‘manganese’ (positive effect), ‘chloride’ (small positive effect), ‘pH’ (positive effect), and ‘moisture’ (negative effect). Similar assessments can be derived for other genera, based on the figure.

Figure 6.

The effect size of the covariates by genus. The maximum a posteriori (filled dots) and 90% credible interval (vertical bars). In red are the CIs that do not include zero.

It is also worth noticing that some covariates have point estimates that are far from zero; for instance, the MaP value of pH in the H_Mesocriconema genus shows a negative effect on the mean number of nematodes; however, its large variability makes its CI intersect the reference line. A closer study of this covariate, and others with a similar pattern, may reveal more information about their effects.

Model goodness of fit

A graphical depiction of the model's goodness of fit is given in Figure 7. Each panel shows the fitted mean count versus the observed count values, and the 90% credible interval of the fit, by genus. It is clear that the width of the CI increases as the observed values increase, an expected pattern because (a) the dataset does not contain many large values, and (b) as the values increase in magnitude, so does their variability. For instance, with respect to point (a), genus H_Paratylenchus has only one observation greater than ninety. Thus, from the information presented in the figure, we are confident that the model adequately explains the variability of the data.

Figure 7.

Fitted mean counts versus observed counts of nematodes by genus (filled dots), and the 90% credible intervals (solid lines).

To quantitatively assess the goodness of fit of the model, we computed p-values of the Chi-squared goodness of fit test and the paired t-test by nematode genus. Both tests use the observed counts and the MaPs of the fitted mean counts as inputs, hereafter referred to as variables. The Chi-squared test computes the total distance between these two variables to evaluate the null hypothesis of no difference between the two variables, while the paired t-test computes the distance between the averages of these two variables to evaluate the null hypothesis that their mean difference is zero. Table 5 shows the p-values that result from the tests. The Chi-squared p-values are all equal to one, and the p-values of the paired t-test are all greater than 5%. Thus, we do not reject the null hypotheses, which we take to mean that the model explains the data adequately.

Table 5.

Estimated p-values of the goodness of fit test, based on the Chi-squared and paired t tests, by genus.

	p-values
Genus	Chi-squared	Paired t
H_Paratylenchus	1.00	0.95
B_Tylenchus	1.00	0.67
H_Mesocriconema	1.00	0.92
H_Pratylenchus	1.00	0.89
H_Helicotylenchus	1.00	0.87
H_Meloidogyne	1.00	0.94
H_Tylenchorhynchus	1.00	0.82
B_Bacterivores	1.00	0.79
H_Xiphe_americanum	1.00	0.52
B_Omnivores_Dorylamoid	1.00	0.54
B_Fungivores	1.00	0.44

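The two tests summarized in Table 5 could be computed along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: the Chi-squared distance is taken to be the Pearson form with n - 1 degrees of freedom (the paper does not state either), the p-values use normal approximations (Wilson-Hilferty for the Chi-squared tail, a standard normal in place of the t distribution) so that only the Python standard library is needed, and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def chi_squared_p_value(observed, fitted):
    """Upper-tail p-value of the Pearson distance between counts and fitted means."""
    stat = sum((o - f) ** 2 / f for o, f in zip(observed, fitted))
    df = len(observed) - 1  # assumed degrees of freedom
    # Wilson-Hilferty: (stat / df) ** (1/3) is approximately normal.
    z = ((stat / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 1.0 - NormalDist().cdf(z)

def paired_t_p_value(observed, fitted):
    """Two-sided p-value for a zero mean difference (normal approximation)."""
    diffs = [o - f for o, f in zip(observed, fitted)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

# Hypothetical data for one genus: 68 counts and fitted means that track them closely.
offsets = [0.5, -0.3, 0.1, -0.4]
observed = [(7 * i) % 23 + 1 for i in range(68)]
fitted = [o + offsets[i % 4] for i, o in enumerate(observed)]

chi_p = chi_squared_p_value(observed, fitted)
t_p = paired_t_p_value(observed, fitted)
print(f"Chi-squared p-value: {chi_p:.2f}")
print(f"Paired t p-value:    {t_p:.2f}")
```

When the fitted means follow the observed counts this closely, the Pearson distance is small relative to its degrees of freedom, so the upper-tail p-value is pushed toward one, which is the pattern seen in the Chi-squared column of Table 5.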

Concluding remarks

The model that combines the multivariate Poisson model with the Spike and Slab (SS) variable selection technique identified the most important soil parameters that explain the mean count of nematodes and estimated the correlations among eleven nematode genera. Typically, the correlation between nematodes is not modeled when analyzing the abundance of nematodes in soil samples. However, we found in Section 4.1 that the univariate analyses were outperformed by our proposed model, an approach that explicitly models the correlation of nematodes. These results support our hypothesis that nematodes are social organisms and that modeling this interaction improves model fit.

The estimated nematode correlations are useful for further inferences. For instance, one can classify pairs of nematodes as antagonistic (negative correlation), neutral (correlation around zero), or sympathetic (positive correlation). Such information could aid researchers in testing, planning, and applying management strategies to control destructive herbivorous nematodes, or to improve soil health in general.

The SS is a flexible and efficient variable selection technique when applied to the analysis of multivariate Poisson data. It selected the most important soil parameters that explain the mean count of nematodes, specific to each genus. This information, together with the correlations of nematodes, can be of broad practical importance as a design component of strategic plans to regenerate the health of long-exploited soils.

The simulation study shows that the CIs of the elements of the correlation matrix are rather wide (see Table 3). In future work, we will evaluate other prior distributions for the correlation matrix in order to narrow the CIs. We chose the prior distribution of Huang and Wand [12] because of its simplicity and robustness; however, the procedure described by Lewandowski et al. [15] is another interesting option. Finally, given its generality, we want to adapt the proposed model to analyze experimental data, in particular data from an ongoing experiment designed to study nematode population growth under different conditions. The adaptation will incorporate the effect of time on the nematode populations while accounting for correlations among nematode genera.
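The three-way classification of genus pairs mentioned in these remarks can be illustrated with a short sketch. The decision rule is an assumption made here for illustration: a pair is called neutral when the credible interval of its correlation contains zero. The paper only states that the classification is based on the estimated correlations, and the genus labels and intervals below are hypothetical.

```python
def classify_pair(lower, upper):
    """Label a genus pair from the credible interval of its correlation."""
    if lower > 0.0:
        return "sympathetic"   # interval entirely above zero
    if upper < 0.0:
        return "antagonistic"  # interval entirely below zero
    return "neutral"           # interval contains zero

# Hypothetical 90% credible intervals of pairwise correlations.
intervals = {
    ("genus_A", "genus_B"): (0.21, 0.74),
    ("genus_A", "genus_C"): (-0.68, -0.12),
    ("genus_B", "genus_C"): (-0.30, 0.35),
}

labels = {pair: classify_pair(*ci) for pair, ci in intervals.items()}
for pair, label in labels.items():
    print(pair, label)
```

Because the simulation study found the correlation CIs to be rather wide, a rule of this kind would label many pairs as neutral; narrower intervals from an alternative prior would make the classification more decisive.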
