
Prevalence dependence in model goodness measures with special emphasis on true skill statistics.

Imelda Somodi, Nikolett Lepesi, Zoltán Botta-Dukát

Abstract

It has long been a concern that performance measures of species distribution models react to attributes of the modeled entity arising from the input data structure rather than to model performance. Thus, the study of Allouche et al. (Journal of Applied Ecology, 43, 1223, 2006) identifying the true skill statistics (TSS) as being independent of prevalence had a great impact. However, empirical experience questioned the validity of the statement. We searched for technical reasons behind these observations. We explored possible sources of prevalence dependence in TSS including sampling constraints and species characteristics, which influence the calculation of TSS. We also examined whether the widespread solution of using the maximum of TSS for comparison among species introduces a prevalence effect. We found that the design of Allouche et al. (Journal of Applied Ecology, 43, 1223, 2006) was flawed, but TSS is indeed independent of prevalence if model predictions are binary and under the strict set of assumptions methodological studies usually apply. However, if we take realistic sources of prevalence dependence, effects appear even in binary calculations. Furthermore, in the widespread approach of using maximum TSS for continuous predictions, the use of the maximum alone induces prevalence dependence for small, but realistic samples. Thus, prevalence differences need to be taken into account when model comparisons are carried out based on discrimination capacity. The sources we identified can serve as a checklist to safely control comparisons, so that true discrimination capacity is compared as opposed to artefacts arising from data structure, species characteristics, or the calculation of the comparison measure (here TSS).

Keywords:  Cohen's kappa; model performance; predictive models; sample size; species distribution models

Year:  2017        PMID: 28168023      PMCID: PMC5288248          DOI: 10.1002/ece3.2654

Source DB:  PubMed          Journal:  Ecol Evol        ISSN: 2045-7758            Impact factor:   2.912


Introduction

Measuring model performance (goodness) is a central issue in species distribution modeling (SDM, Guisan & Zimmermann, 2000) and predictive vegetation modeling (PVM, Franklin, 1995). There are three major tasks performance measures are used for: 1) comparing modeling techniques, typically using one dataset and the same species with each technique (e.g., Jones, Acker, & Halpern, 2010; Zurell et al., 2012), 2) comparing the performance of models of different species with one or more modeling techniques using one dataset (e.g., Coetzee, Robertson, Erasmus, Van Rensburg, & Thuiller, 2009; Engler et al., 2013; Pliscoff, Luebert, Hilger, & Guisan, 2014), and 3) testing models of the same species on different datasets (e.g., Randin et al., 2006; Ribeiro, Somodi, & Čarni, 2016). In the first case, data properties are fixed and thus of less importance; the actual prevalence in the data has no effect on the outcome of comparisons. On the contrary, when different species or predictions on different datasets are compared, characteristics of the data (including prevalence) may influence model performance. Why is prevalence dependence a problem? If model goodness measures are used for tasks two and three, the intention is to compare how well the models reflect the species' environmental requirements (Elith & Graham, 2009; Robertson, Peter, Villet, & Ripley, 2003). Species with more distinct environmental requirements are expected to be modeled better (assuming that relevant predictors were included) compared to species with wide tolerance. If we want to assess the degree to which the models reflect true environmental requirements (as many have aimed to do), we do not want rarity to interfere. For example, if we have a model of a species and we test its discrimination capacity on test site A and test site B (task 3), we expect to receive similar discrimination levels.
If the two sites differ in prevalence and a prevalence‐dependent measure is used, it will seem as if the model itself had changed. The situation is similar when rating different species' models (task 2). In fact, improving models of rare species, so that they reflect the environmental background better, has been a central issue lately (Lomba et al., 2010; Williams et al., 2009; Zimmermann, Edwards, Moisen, Frescino, & Blackard, 2007). We admit that there is a tendency for species with narrower tolerance to also be less frequent, but it is not an absolute rule (Flather & Sieg, 2013; Kunin & Gaston, 1993). Besides autecological reasons, human activities may also account for a lower observed prevalence of a potentially common species. Prevalence of different species may differ for two basic reasons: Either sampling points are fixed, but different species occur with different frequency, or presence information of species is independent because of a presence‐only collection scheme, which is often true for datasets originating from museum collections (Elith & Leathwick, 2007). It is difficult to imagine a project with real data where each species has the same prevalence, unless common species are resampled to low prevalence. The latter would however mean information reduction, which would be unnecessary if measures did not depend on prevalence. Model goodness measures relate to calibration and discrimination ability (Lawson, Hodgson, Wilson, & Richards, 2014). While calibration measures the model's ability to match input data, discrimination reflects how well occurrences versus absences are found in independent data. Indices for discrimination ability include one truly threshold-independent option (AUC, Hanley & McNeil, 1982) and several threshold-dependent ones, where the basic idea is to find a threshold for the calculation of the index (kappa, Cohen, 1960; true skill statistics [TSS], Allouche, Tsoar, & Kadmon, 2006; F score, Powers, 2011).
The values of the index are then compared either at a threshold corresponding to the maximum or according to an equality criterion (e.g., TPR = TNR, also called the ROC‐based approach; Cantor, Sun, Tortolero‐Luna, Richards‐Kortum, & Follen, 1999). Although AUC is widely applied, many agree that it tends to be overoptimistic (Lobo, Jiménez‐Valverde, & Real, 2008; Shabani, Kumar, & Ahmadi, 2016), and therefore, it is often complemented by another measure of model goodness. This second measure used to be maximum kappa (Araújo & Luoto, 2007; Davidson, Hamilton, Boyer, Brown, & Ceballos, 2009; Guo & Liu, 2010). However, worries have been voiced about kappa being prevalence dependent and thus potentially providing misleading information (McPherson, Jetz, & Rogers, 2004; Pontius & Millones, 2011). Lately, TSS has been applied instead (also in prestigious packages such as BIOMOD, Thuiller, Lafourcade, Engler, & Araújo, 2009), as Allouche et al. (2006) claimed that it is insensitive to prevalence differences. Nonetheless, a reaction of TSS to prevalence differences has been observed in actual studies (Allouche et al., 2006; Hanspach, Kühn, Pompe, & Klotz, 2010). Some other threshold-dependent measures (F score, Drake, Randin, & Guisan, 2006; Powers, 2011) are also available, but their use is much more restricted than that of TSS. Please note that TSS exists under a wide variety of synonyms, typically used outside ecology (see also Wilks, 2011): Youden index (Youden, 1950), Peirce Skill Score (Peirce, 1884), Kuipers Skill Score (Murphy & Daan, 1985), and Sum of Sensitivity and Specificity (SSS, Liu, White, & Newell, 2013), the last of which is used within ecology. It is also noteworthy that TSS is most often applied in the form of maximum TSS over all possible probability cutoffs (e.g., in the BIOMOD package) and is advocated in this form in reviews (Liu, Berry, Dawson, & Pearson, 2005; Liu et al., 2016).
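The contrast between kappa and TSS under these idealized conditions is easy to demonstrate numerically. The following Python sketch (our illustration, not code from any of the cited studies; all function names are ours) builds a confusion matrix from fixed error rates and evaluates both indices at two prevalences:

```python
def confusion(N, prevalence, e1, e2):
    """Confusion-matrix cells for fixed false-negative (e1) and
    false-positive (e2) error rates at a given prevalence."""
    P = prevalence * N
    tp, fn = (1 - e1) * P, e1 * P
    fp, tn = e2 * (N - P), (1 - e2) * (N - P)
    return tp, fn, fp, tn

def tss(tp, fn, fp, tn):
    # TSS = sensitivity + specificity - 1
    return tp / (tp + fn) + tn / (tn + fp) - 1

def kappa(tp, fn, fp, tn):
    # Cohen's kappa: observed vs. chance-expected agreement
    n = tp + fn + fp + tn
    po = (tp + tn) / n
    pe = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (po - pe) / (1 - pe)

# Identical error rates, very different prevalences:
common = confusion(1000, 0.5, e1=0.2, e2=0.2)
rare = confusion(1000, 0.1, e1=0.2, e2=0.2)
```

With these numbers, TSS stays at 0.6 in both settings, while kappa falls from 0.6 at prevalence 0.5 to about 0.35 at prevalence 0.1, illustrating the prevalence sensitivity attributed to kappa (McPherson et al., 2004; Pontius & Millones, 2011).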
Motivated by the observed prevalence effects in TSS, we aimed at finding reasons why such a pattern may arise. We specifically set the following aims:
1) revisit whether the arguments of Allouche et al. (2006), theoretical or simulation-based, appropriately prove that TSS is independent of prevalence;
2) explore possible manifestations of prevalence dependence in theory;
3) determine whether and how TSS is prevalence dependent;
4) search for the source of the prevalence dependence of TSS experienced in practice.

Theoretical considerations

A critique to the design of Allouche et al. (2006)

The true skill statistics is defined based on the components of the standard confusion matrix representing matches and mismatches between observations and predictions (Fielding & Bell, 1997; Table 1).
Table 1

Confusion matrix of matches and mismatches of predictions and observations

Observation | Prediction: 1 | Prediction: 0 | Σ
1 | True positives (TP) | False negatives (FN) | No. positive observations (P = πN)
0 | False positives (FP) | True negatives (TN) | N − P
Σ | No. positive predictions (S) | N − S | Total number of observations (N)
True skill statistics is defined as

TSS = TPR + TNR − 1,

where

TPR = TP/(TP + FN) and TNR = TN/(TN + FP).

The literature refers to TPR as true-positive rate or sensitivity, and to TNR as true-negative rate or specificity (Fielding & Bell, 1997). In the rare case when predictions are binary, computation of the confusion matrix is straightforward. If predictions are probabilistic, a goodness measure relying on the contingency table is calculated by converting the probabilities into presence and absence predictions, usually at evenly spaced values of the probability spectrum (e.g., 0.1, 0.2, …, 0.9). These values are termed cutoffs or thresholds. Allouche et al. (2006) claim to have randomized their models; however, they only randomized their confusion matrix and held the values of TPR and TNR constant. If TPR and TNR, or their sum, are held constant, TSS cannot vary by construction. Allouche et al. (2006) set TPR = TNR = 0.8, or TPR = 0.7 and TNR = 0.9 (or vice versa). Thus, TSS = TPR + TNR − 1 = 0.8 + 0.8 − 1 = 0.6, or TSS = 0.7 + 0.9 − 1 = 0.6. Therefore, whatever the prevalence, the result is 0.6, as is also clear from Figure 1 in Allouche et al. (2006). The low-level variation in the TSS value in their figure is due to the constraint that the cells of the contingency table (including true-positive and true-negative cases) must contain integers; thus, actual TPR/TNR may slightly differ from the theoretical values.
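The binarize-and-evaluate procedure just described can be sketched in a few lines. This is an illustrative Python implementation under our own naming (the study's own computations were carried out in R):

```python
import numpy as np

def tss(obs, pred_bin):
    """True skill statistic from binary observations and predictions:
    TSS = TPR + TNR - 1 (sensitivity + specificity - 1)."""
    obs = np.asarray(obs, dtype=bool)
    pred_bin = np.asarray(pred_bin, dtype=bool)
    tp = np.sum(obs & pred_bin)
    fn = np.sum(obs & ~pred_bin)
    tn = np.sum(~obs & ~pred_bin)
    fp = np.sum(~obs & pred_bin)
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    return tpr + tnr - 1

def max_tss(obs, pred_prob, cutoffs=np.arange(0.05, 1.0, 0.05)):
    """Maximum TSS over evenly spaced probability cutoffs (thresholds)."""
    pred_prob = np.asarray(pred_prob)
    return max(tss(obs, pred_prob >= c) for c in cutoffs)
```

`max_tss` scans evenly spaced cutoffs, binarizes the probabilistic predictions at each, and returns the highest TSS obtained.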
Figure 1

Subcases of the beta distribution with parameters defined in Table 7. The sampling of probability values for presence “observations” is carried out according to these curves in our simulations. The individual predicted probability values appear in our simulated predictions with such densities. Lines represent: a) quadratic, b) linear, c) square root, d) 1/16th power curve

Table 6

Is there prevalence dependence in TSS? Answers for cases examined in our study

Species occupy suitable sites only, and model goodness changes:
  – Binary predictions: No
  – Continuous predictions: Yes for small sample sizes, No for large sample sizes
Species occupy unsuitable sites also, and model goodness is fixed (for our analysis; binary predictions considered only), by source of species' distribution difference:
  – Missed presence: Yes
  – Fallacious absence: Yes
  – Fallacious presence: Yes, except if the rate of fallacious presences is proportional to the number of unsuitable sites

Redefinition of prevalence dependence

As Allouche et al. (2006) did not appropriately prove that TSS is independent of prevalence and empirical experience indicates such an effect, there is a need to revisit prevalence dependence in TSS. The usual interpretation of prevalence dependence in distribution modeling (Lawson et al., 2014; Manel, Williams, & Ormerod, 2001; McPherson et al., 2004; Santika, 2011) is that the value of the index should be constant over prevalence ranges if model goodness is constant. We follow this definition, but it should be mentioned that alternative definitions of prevalence dependence could be developed. For example, an index could be regarded as prevalence independent if its range (i.e., maximum and minimum values) does not depend on prevalence (cf. independence of beta‐diversity from alpha‐ and gamma‐diversity; Jost, 2007). The problem is how to measure model goodness exclusively, without the confounding effects arising from data structure and especially prevalence differences. Lawson et al. (2014) pointed out that there is a distinction whether a performance measure quantifies model calibration or discrimination. In line with their opinion and taking into account that TSS measures discrimination capacity, we are targeting this model feature in our considerations. Thus, we consider two models equally good if they are characterized by the same rates of discrimination errors (error rates of FP and FN). We examine two types of influences on TSS: the discrimination capacity of the model (1 − e) and prevalence (π = P/N) in the data. In all our calculations, we fixed the total sample size (N); therefore, the ratio of presence observations to the total number of observations (prevalence, π = P/N) depends only on the absolute number of presence observations (P = N*π). Therefore, if P is present in any equation leading to TSS, it also indicates prevalence dependence.
While the representation of e in the equations is thus desirable (TSS was designed to reflect that), if P or π is in the equation, then prevalence also matters and can confound discrimination effectiveness. The majority of the currently available model goodness measures, and especially kappa and TSS, rely on a dichotomous representation of site occupancy. Therefore, they actually reduce the problem to a dichotomous representation of habitat suitability: Each of the locations is either suitable or unsuitable for the organism. The fact that we have no actual information on this suitability has not been taken into account yet, even though many of the predictive models are targeting the mapping of this suitability. Nonetheless, all estimations have errors, which can arise because 1) the model does not precisely predict suitability (for example, because not all relevant variables were measured). This kind of error is the most commonly considered error type (Guisan & Zimmermann, 2000; Pearce & Ferrier 2000). Discrimination capacity measures are expected to reflect the degree of this error and this error only. However, as Hirzel and Le Lay (2008) have introduced, there is another possible source of error: 2) the observed pattern does not fully reflect the suitability pattern, for example, due to sink populations or other components of metapopulation dynamics. We assume that the two kinds of errors do not cancel each other out (or would do so only under very specific conditions); therefore, we examine their cases separately.

Binary considerations

Although less frequent in practice, we first examine the case when not only observations, but also predictions are binary. If the model goodness measure appears independent of prevalence in such a case, the second step is to examine whether any prevalence dependence appears when continuous predictions are considered. We take the strategy of proceeding from simple cases toward complex ones. We assume that if prevalence dependence appears in a simple case, it is unlikely that it disappears in the corresponding more complex cases. In case 1), we assume that the observed pattern coincides with the suitability. In such a case, the contingency table takes the form presented in Table 2.
Table 2

Confusion matrix of matches and mismatches of predictions and observations assuming different rates of false‐negative and false‐positive errors, e1 and e2

Observation | Prediction: 1 | Prediction: 0 | Σ
1 | TP = (1 − e1)P | FN = e1P | P = πN
0 | FP = e2(N − P) | TN = (1 − e2)(N − P) | N − P
Σ | S | N − S | N
From Table 2,

TPR = TP/(TP + FN) = 1 − e1 and TNR = TN/(TN + FP) = 1 − e2,

so that

TSS = TPR + TNR − 1 = 1 − e1 − e2.

Applying our definition of model goodness (i.e., the opposite of the level of error rates) to these equations, TSS is prevalence independent, as its value can be calculated from the two error rates (e1 and e2) without using the prevalence value. This is the form of prevalence dependence usually considered and tested (Manel et al., 2001; McPherson et al., 2004; Santika, 2011; Lawson et al., 2014). Prevalence dependence of kappa has been proven for this case with equal error rates (i.e., e1 = e2) by McPherson et al. (2004). Let us now examine case 2, where we disregard potential weaknesses of the models but allow misleading observations, that is, allow the observed distribution pattern to differ from the habitat suitability pattern. Such a situation can arise, for example, from intense metapopulation dynamics, sink subpopulations, or a transient animal being difficult to spot in the habitat. Differences between suitability and observations can appear as a) missed presences, b) fallacious absences, and c) fallacious presences (Hirzel & Le Lay, 2008). The first two contribute to false‐positive predictions, while the last one appears as false negative, although this may be mitigated by missed presences. Firstly, we examine the case when there are missed presences only; that is, some of the presences are not detected even though the place is suitable and the species lives there. In the simplest case, the sampling error (i.e., the rate of missed presences, denoted by e; Table 3) is constant, and thus this error itself is independent of prevalence. (We do not miss more presences if the species is rare.)
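A numerical spot check of the algebra for the Table 2 matrix (an illustrative Python sketch; the function name is ours) confirms that prevalence cancels out of the result:

```python
def tss_case1(N, prevalence, e1, e2):
    """TSS computed cell by cell from the Table 2 confusion matrix:
    species occupy suitable sites only; e1 = false-negative rate,
    e2 = false-positive rate."""
    P = prevalence * N
    tp, fn = (1 - e1) * P, e1 * P
    fp, tn = e2 * (N - P), (1 - e2) * (N - P)
    tpr = tp / (tp + fn)  # P cancels: equals 1 - e1
    tnr = tn / (tn + fp)  # N - P cancels: equals 1 - e2
    return tpr + tnr - 1

# The value equals 1 - e1 - e2 = 0.7 at every prevalence tried:
values = [tss_case1(1000, pi, 0.2, 0.1) for pi in (0.05, 0.25, 0.5, 0.75, 0.95)]
```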
Table 3

Contingency table when the model is assumed to be perfect, but there are missed presences in the observations. “e” denotes the rate of missed presences

Observation | Prediction: 1 | Prediction: 0 | Σ
1 | TP = (1 − e)S | FN = 0 | P = πN
0 | FP = eS | TN = N − S | N − P
Σ | S = P/(1 − e) | N − S | N
From Table 3,

TPR = TP/(TP + FN) = 1 and TNR = TN/(TN + FP) = (N − S)/(N − P),

so, substituting S = P/(1 − e),

TSS = (N − P/(1 − e))/(N − P) = (1 − e − π)/((1 − e)(1 − π)).

Even though the level of error does not depend on prevalence directly, TSS does depend on prevalence (π) according to the equations above. Therefore, TSS differences may arise for species with different π even though we fixed the rate of missed presences (constant e) and did not allow any other error source. Secondly, let us consider fallacious absences (i.e., the species is not present even though the habitat is suitable) as the only source of error. As in metapopulation dynamics, we can assume that the number of false‐positive cases is proportional to the number of suitable sites (i.e., the error rate is constant; Table 4). From a mathematical point of view, this case is equivalent to the previous one.
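A quick numerical check of the Table 3 formula (illustrative Python; the function name is ours):

```python
def tss_missed_presences(prevalence, e):
    """TSS from the Table 3 matrix: perfect model, constant rate e of
    missed presences; equals (1 - e - pi) / ((1 - e) * (1 - pi))."""
    pi = prevalence
    return (1 - e - pi) / ((1 - e) * (1 - pi))

# Constant sampling error e = 0.1, yet TSS falls as prevalence rises:
values = [tss_missed_presences(pi, 0.1) for pi in (0.1, 0.5, 0.9)]
```

With e = 0.1, TSS falls from roughly 0.99 at π = 0.1 through about 0.89 at π = 0.5 to 0 at π = 0.9 (where S = P/(1 − e) exhausts all N sites), although the model and the sampling error are unchanged.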
Table 4

Contingency table when the model is assumed to be perfect, but there are fallacious absences in the observations

Observation | Prediction: 1 | Prediction: 0 | Σ
1 | TP = P | FN = 0 | P
0 | FP = eS | TN = N − S | N − P
Σ | S = P/(1 − e) | N − S | N
Thirdly, let us examine the case when fallacious presences occur and there is no other source of error. There are two reasonable alternative assumptions regarding error rates: (i) Some proportion of the presences is fallacious. This is equivalent to case 1 with e2 = 0, for which we have proven that TSS is prevalence independent. (ii) The number of fallacious presences is proportional to the number of unsuitable sites (Table 5). In this case, TSS is prevalence dependent:
Table 5

Contingency table when the model is assumed to be perfect, but there are fallacious presences and their amount is proportional to the number of unsuitable sites in the observations

Observation | Prediction: 1 | Prediction: 0 | Σ
1 | TP = P − e(N − S) = S | FN = e(N − S) | P
0 | FP = 0 | TN = N − P | N − P
Σ | S | N − S | N
From Table 5, it follows that

TPR = TP/(TP + FN) = S/P = 1 − e(N − S)/P and TNR = TN/(TN + FP) = 1.

Thus,

TSS = TPR + TNR − 1 = 1 − e(N − S)/P,

which depends on the number of presence observations P and hence on prevalence. Our findings regarding the prevalence dependence of TSS are summarized in Table 6.
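The Table 5 case can be checked numerically in the same spirit (illustrative Python; the function name is ours):

```python
def tss_fallacious_presences(N, S, e):
    """TSS from the Table 5 matrix: a perfect model predicts the S
    suitable sites, but the species also occupies a fraction e of the
    N - S unsuitable sites (fallacious presences)."""
    P = S + e * (N - S)  # observed presences, incl. fallacious ones
    tpr = S / P          # = 1 - e*(N - S)/P
    tnr = 1.0            # no false positives in this case
    return tpr + tnr - 1

# Same error rate e, different numbers of suitable sites (hence prevalence):
values = [tss_fallacious_presences(1000, S, 0.1) for S in (100, 500, 900)]
```

With e fixed at 0.1 and N = 1000, TSS rises from about 0.53 for a species with 100 suitable sites to about 0.99 for one with 900, purely through the prevalence-dependent term e(N − S)/P.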

The case of continuous predictions

Having explored prevalence dependence for binary predictions, we examine whether binarization has any influence on prevalence dependence. First of all, there is no need to examine cases where prevalence dependence has already been discovered in the binary case, as continuous predictions are reduced to binary cases at regular cutoffs to provide a distribution of goodness values, from which usually the maximum is chosen. If there is already prevalence dependence in the binary case, it is unlikely that repeated application of the same principle would eliminate the effect. It was case 1 (when the species is assumed to occupy suitable sites only), in fact the most popular interpretation of prevalence dependence, which showed no prevalence dependence. However, as detailed in the Introduction, empirical prevalence dependence has been observed. Therefore, we examine whether binarization induces such an effect. We can formulate TPR and TNR as conditional probabilities given a binary prediction, as already pointed out by Lawson et al. (2014):

TPR = P(x = 1 | presence) and TNR = P(x = 0 | absence),

where x denotes the predicted value. If we have continuous probabilities as predictions, the equations become

TPR = P(x > x_c | presence) and TNR = P(x ≤ x_c | absence),

where x_c refers to the cutoff value corresponding to maximum TSS. Let F1 and F0 denote the distribution functions of predicted values conditional on the presence and absence of the species; subscripts refer to presence (1) and absence (0), respectively. The expected values of TPR, TNR, and TSS are then

E(TPR) = 1 − F1(x_c), E(TNR) = F0(x_c), and E(TSS) = F0(x_c) − F1(x_c).

As usually the cutoff corresponding to the maximum value of TSS is used, we inspect the prevalence dependence of this measure. The maximum of the expected value of TSS is where its derivative is 0. The derivatives of F0 and F1, that is, the density functions, will be referred to as f0 and f1; thus, TSS is maximal where

f0(x_c) = f1(x_c).

If the theoretical curves were known and the cutoff was based on them, or any other a priori threshold-setting method was chosen, TSS would indeed be prevalence independent.
In practice, however, the cutoff is determined from the data. Due to this, the mean of TSS maxima will be higher than the expected value, because we only choose a maximum value other than the one corresponding to the theoretical cutoff if the former is higher. Thus, the mean of TSS maxima is a biased estimate of the theoretical TSS. The bias is due to the cumulative frequency distribution being different from the theoretical distribution function. We have two theoretical distribution functions with two corresponding cumulative frequency functions. The theoretical distribution function and the cumulative frequency function increasingly resemble each other with increasing sample size. If the sample size is fixed but prevalence changes, the fit of the cumulative frequency function to the theoretical distribution function improves for one of the conditional distributions but deteriorates for the other. If the improvement/deterioration depends on prevalence in a nonlinear manner, they do not cancel each other out, which may result in the prevalence dependence observed. We tested this effect with numerical simulations.
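The bias mechanism described above, choosing the empirical maximum over cutoffs, can be probed with a small simulation in the spirit of the next section (written here in Python rather than the authors' R; the Beta parameters correspond to the "linear" scenario of Table 7, and all names are ours):

```python
import numpy as np

def mean_max_tss(n, prevalence, a=2.0, reps=1000, seed=0):
    """Mean of the maximum TSS over 19 evenly spaced cutoffs when
    presence predictions ~ Beta(a, 1) and absence predictions ~ Beta(1, a)."""
    rng = np.random.default_rng(seed)
    cutoffs = np.arange(0.05, 1.0, 0.05)
    p = int(round(n * prevalence))
    best = np.empty(reps)
    for r in range(reps):
        pres = rng.beta(a, 1.0, size=p)        # predictions at presences
        absn = rng.beta(1.0, a, size=n - p)    # predictions at absences
        tss = [np.mean(pres >= c) + np.mean(absn < c) - 1 for c in cutoffs]
        best[r] = max(tss)                     # empirical maximum over cutoffs
    return best.mean()

# Theoretical maximum for a = 2 is F0(0.5) - F1(0.5) = 0.5; the empirical
# mean maximum exceeds it, and more so at extreme prevalence:
biased_rare = mean_max_tss(100, 0.05)
biased_even = mean_max_tss(100, 0.5)
```

Both means exceed the theoretical 0.5, and the excess is larger in the rare-species setting, reproducing the small-sample, extreme-prevalence inflation reported in the Results.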

Simulation methods and results

Methods

We constructed model scenarios in which two aspects varied: discrimination capacity and prevalence. We varied prevalence, the proportion of presences in the observations, from 0.05 to 0.95 in increments of 0.05. This corresponds to the approach of Allouche et al. (2006) and other papers studying the effect of prevalence on kappa (Manel et al., 2001; McPherson et al., 2004). To observe the effect of sample size, the following sizes were applied: 100, 1,000, 10,000. Presence or absence was allocated to this number of observations so as to produce the desired prevalence. Predicted probability values were randomly chosen from the beta distribution with parameters given in Table 7, representing different model goodness scenarios. Density functions of predicted probabilities for presences (f1) and absences (f0) were defined by the following general formula:

f(x; α, β) = x^(α−1) (1 − x)^(β−1) / B(α, β),

where x corresponds to possible values of the suitability estimate, α and β are the parameters of the distribution, and B(α, β) is the beta function. Parameters were chosen so that f0 is the mirror image of f1 (f0(x) = f1(1 − x)), and it always holds that higher predicted probabilities are chosen more frequently than lower ones for presences, while the opposite is true for absences. We will refer to the models according to the function in the numerator of f1(x): a) quadratic, b) linear, c) square root, d) a power of 1/16 (Figure 1). The steepness of the function in the numerator of f1(x) represents the discrimination power. Steepness patterns were selected so as to produce contrasting distributions of predictions and thus to represent different discrimination powers. The quadratic curve corresponds to the best model, where low probabilities are disproportionately more often assigned to absences and high probabilities to presences.
The linear curve represents medium model performance, while the application of the square root function results in a weakly discriminating model, where medium probabilities are assigned both to presences and absences in most cases. The 1/16th power corresponds to extremely weak discrimination. TSS was calculated at 19 cutoffs (thresholds) equally spaced along the probability spectrum (0.05–0.95) for each prevalence ratio and model goodness scenario. This was repeated 1,000 times for each combination to assess variation. The mean of the maximum TSS values was calculated for each combination of model goodness scenario and prevalence value. Calculations were carried out in the R Statistical Environment (R Core Team 2014).
Table 7

The f0 and f1 functions used in our simulations are specific cases of the beta distribution with α = 1 or β = 1. The table shows the corresponding other parameter of the beta distribution producing the probability function of selecting a certain probability value for presence observations. Selections for absence observations follow the opposite trend. The rbeta function in R generates random numbers with such distributions (Appendix S1)

Curve type | f1 | f0
Quadratic | α = 3, β = 1 | α = 1, β = 3
Linear | α = 2, β = 1 | α = 1, β = 2
Square root | α = 1.5, β = 1 | α = 1, β = 1.5
1/16th power | α = 17/16, β = 1 | α = 1, β = 17/16
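Under these parameter choices, the theoretical maximum of the expected TSS has a closed form: with f1 ~ Beta(α, 1) and f0 ~ Beta(1, α), the conditional distribution functions are F1(x) = x^α and F0(x) = 1 − (1 − x)^α; since f0(x) = f1(1 − x), the densities cross at x = 0.5, so max TSS = F0(0.5) − F1(0.5) = 1 − 2·0.5^α. A short check in our own notation (a derivation consistent with Table 7, not taken from the paper):

```python
# Theoretical maximum of expected TSS for each Table 7 scenario:
# F1(x) = x**alpha (presences), F0(x) = 1 - (1 - x)**alpha (absences);
# f0(x) = f1(1 - x), so F0 - F1 peaks at the crossing point x = 0.5.
scenarios = {"quadratic": 3.0, "linear": 2.0,
             "square root": 1.5, "1/16th power": 17 / 16}
theoretical_max_tss = {name: 1 - 2 * 0.5 ** a for name, a in scenarios.items()}
# quadratic: 0.75, linear: 0.5, square root: ~0.29, 1/16th power: ~0.04
```

These theoretical ceilings are the values the mean maximum TSS should converge to as sample size grows, which is what the Results below report.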

Results

We found a response of the maximum value of TSS to prevalence changes for small sample sizes (Figures 2 and 3), which, however, decreased with increasing sample size and approached the theoretically expected value. A sample size of 10,000 eliminated any TSS bias even for the worst model, even at the lowest prevalence (corresponding to 500 presences). A sample size of 1,000 (with 50 presences at the lowest prevalence) showed prevalence dependence for the worst model only; thus, this sample size can already be applied with confidence for reasonably performing models.
Figure 2

Demonstration of the dependence of the maximum value of TSS on prevalence. The ratio of presences and absences in the observations (prevalence) was varied from 0.05 to 0.95 in increments of 0.05. Average maximum values from 1,000 simulations are shown for four model scenarios (a)–(d). For details, see Fig 1

Figure 3

Decrease of prevalence effect on maximum TSS with increasing sample size. Sample size equals to (a) 100, (b) 1,000, (c) 10,000. The pattern is similar for all model goodness scenarios; here, sampling according to the square root function (medium model quality) is used as an example

The dependence at low sample sizes had a U-shaped form, implying that the same model goodness can result in a higher maximum TSS solely due to a low or high prevalence (corresponding to a rare or common species) if the sample size is low (Figure 2). The dependence on prevalence increased with decreasing model quality at constant sample sizes.

Discussion

We found that prevalence dependence is absent in TSS only under strict assumptions and at large sample sizes. This contrasts with the actual use of TSS, where these assumptions are often violated. Allouche et al. (2006) used a flawed design; therefore, their results are not relevant. However, under their assumptions, TSS is indeed not prevalence dependent. Nonetheless, prevalence dependence has repeatedly been observed in TSS (Allouche et al., 2006; Lawson et al., 2014; McPherson et al., 2004). Causes of prevalence dependence could be traced in our study either to 1) a lack of ideal association of species with suitable sites or 2) the use of the maximum value of TSS for cutoff selection, especially at small sample sizes. Previous considerations of prevalence dependence generally assumed that species occupy all suitable sites and suitable sites only. This is often not the case (Hirzel & Le Lay, 2008). This narrow assumption had no significance regarding the prevalence dependence of the previously more common kappa, as kappa proved to be prevalence dependent even under those idealistic assumptions (McPherson et al., 2004). If species behavior does not follow that assumption, prevalence dependence is unlikely to diminish. However, we found that in the binary case (which is also equivalent to a predetermined cutoff), TSS is indeed not prevalence dependent (although not for the reason Allouche et al., 2006 gave). Nonetheless, this only holds if a species closely follows the suitability pattern. Ideally, we want to evaluate the capacity of a model to trace the suitability pattern, and when we compare species, we want to compare this property.
However, we found that if species are differently detectable (differ in the proportion of missed presences), tend to leave suitable space unoccupied (fallacious absences), or tend to occur at unsuitable places (fallacious presences) to differing degrees, these features may mix with model discrimination capacity and lead to artefacts in comparisons. There is abundant evidence against species closely following suitability patterns, including metapopulation theory (Hanski, 1991), extinction debt (Tilman, May, Lehman, & Nowak, 1994), and other considerations (Gu & Swihart, 2004). Such mechanisms may lie behind the “species characteristics” influencing model performance in Hernandez, Graham, Master, and Albert (2006) and Hanspach et al. (2010), and may also account for the prevalence dependence seen in Figure 2 of Allouche et al. (2006). We offer no solution yet; our aim here is to draw attention to the fact that these aspects need to be considered when making comparisons. The tendency for missed presences is related to the life strategy of the species, so it might not be a problem if models of similar species are compared (e.g., several species of trees: Zimmermann et al., 2009), but in comparisons between very different species (butterflies vs. plants; e.g., Hanspach et al., 2014) the effect may become significant. The tendency for fallacious absences and presences is likely connected with the degree of involvement of metapopulation dynamics in the species' distribution. Fallacious absences reflect a population structure where empty suitable patches form a constant proportion of the landscape (cf. Levin's model, Pásztor, Botta‐Dukát, Magyar, Meszena, & Czárán, 2016; Husband & Barrett, 1998), while fallacious presences can reflect sink populations (e.g., Ficetola, Thuiller, & Padoa‐Schioppa, 2009).
The sample size effect has been observed in relation to the use of maximum TSS, which is the most widespread way of applying TSS in the literature (a few recent examples: Zurell et al., 2012; Gallardo & Aldridge, 2013; Baross et al., 2015). It is also one of the default measures in BIOMOD (Thuiller et al., 2009), one of the most widespread SDM tools, and is also propagated in reviews (Liu, 2005; Liu, Newell, & White, 2016). While users of maximum TSS still assume that they use a prevalence-independent measure, we observed differences of almost 0.2 in the average maximum TSS due to differences in prevalence alone, even in “good models”, at the lowest sample size. Differences in maximum TSS as small as 0.001 and 0.06 have been interpreted as the model with the higher TSS being superior to the one with the lower maximum value (Coetzee et al., 2009 and Zurell et al., 2012, respectively). Therefore, the level of influence of prevalence detected for low sample sizes has practical implications, too. One could argue that the lower sample sizes used in our simulations (100 observations with 5–95 presences) are extreme, but several similar examples can be found (e.g., Hernandez et al., 2006; Papeş & Gaubert, 2007; Williams et al., 2009; Wisz et al., 2008). Species distribution models of rare plants are frequent targets of research (Engler, Guisan, & Rechsteiner, 2004; Guisan et al., 2006; Zimmermann et al., 2007; Williams et al., 2009), where both extreme prevalence and small sample sizes occur. According to our results, the effect of data structure may be particularly severe in such cases, and therefore automatically applying maximum TSS for across‐species or across‐sites comparison may lead to erroneous conclusions. We agree with Lobo et al. (2008) that in such cases, indices should be adjusted to the case studied, taking into account the potential effect of prevalence on the indices.
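The inflation of maximum TSS at small sample sizes can be illustrated with a toy simulation (the Gaussian prediction scores with identical discrimination in every setting are our own assumption, not the study's design): maximizing TSS over all cutoffs on a small sample overfits the cutoff, and more strongly so at extreme prevalence:

```python
import numpy as np

rng = np.random.default_rng(2)

def max_tss(scores_pres, scores_abs):
    """Maximum of TSS over all cutoffs on the continuous prediction."""
    cuts = np.unique(np.concatenate([scores_pres, scores_abs]))
    sens = (scores_pres[None, :] >= cuts[:, None]).mean(axis=1)
    spec = (scores_abs[None, :] < cuts[:, None]).mean(axis=1)
    return float(np.max(sens + spec - 1))

n, reps = 100, 500
for n_pres in (5, 20, 50):  # prevalence 0.05, 0.20, 0.50
    vals = []
    for _ in range(reps):
        pres = rng.normal(1.0, 1.0, n_pres)      # scores at presence sites
        absn = rng.normal(0.0, 1.0, n - n_pres)  # scores at absence sites
        vals.append(max_tss(pres, absn))
    print(f"prevalence={n_pres/n:.2f}  mean max TSS={np.mean(vals):.3f}")
# The underlying discrimination is identical in all three settings, yet
# the mean maximum TSS is inflated at the extreme prevalence.
```

With only five presences, the cutoff can always be tuned to the handful of presence scores actually drawn, so the maximum systematically overestimates the population-level TSS; the bias shrinks as the minority class grows.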
It is also worth noting that prevalence dependence does not affect the comparison of different models of a single species built from a single dataset. Thus, our finding does not affect model-type comparisons for one species with one dataset, such as the ensemble modeling approach in BIOMOD, which relies heavily on TSS (Thuiller et al., 2009).

Conclusions

The redefinition of prevalence dependence has brought a wider range of interpretations and explanations to attention. Sources of prevalence dependence have to be considered when evaluating models of different objects (while they are no concern when different models of the same object with the same data points are compared). We found three sources of prevalence dependence not yet considered, arising from an incomplete reflection of habitat suitability in species' distributions: differing degrees of missed presences, fallacious absences, and fallacious presences across species. Another source of potential prevalence dependence was the use of the maximum value over the predicted probability continuum for comparisons (maximum TSS). We found three risk factors for prevalence dependence even when species perfectly mirror suitability, if maximum TSS is used for across-species comparisons: rare or very common species, small sample sizes, and weak models.

Conflict of Interest

None declared.
References (13 in total; first 10 shown)

1. Cantor, S. B., Sun, C. C., Tortolero-Luna, G., Richards-Kortum, R., & Follen, M. (1999). A comparison of C/B ratios from studies using receiver operating characteristic curve analysis. J Clin Epidemiol.
2. Jones, C. C., Acker, S. A., & Halpern, C. B. (2010). Combining local- and large-scale models to predict the distributions of invasive plant species. Ecol Appl.
3. Guisan, A., Broennimann, O., Engler, R., Vust, M., Yoccoz, N. G., Lehmann, A., & Zimmermann, N. E. (2006). Using niche-based models to improve the sampling of rare species. Conserv Biol.
4. Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology.
5. Peirce, C. S. (1884). The numerical measure of the success of predictions. Science.
6. Youden, W. J. (1950). Index for rating diagnostic tests. Cancer.
7. Zimmermann, N. E., Yoccoz, N. G., Edwards, T. C., Meier, E. S., Thuiller, W., Guisan, A., Schmatz, D. R., & Pearman, P. B. (2009). Climatic extremes improve predictions of spatial patterns of tree species. Proc Natl Acad Sci U S A.
8. Kunin, W. E., & Gaston, K. J. (1993). The biology of rarity: Patterns, causes and consequences. Trends Ecol Evol.
9. Davidson, A. D., Hamilton, M. J., Boyer, A. G., Brown, J. H., & Ceballos, G. (2009). Multiple ecological pathways to extinction in mammals. Proc Natl Acad Sci U S A.
10. Zimmermann, N. E., Edwards, T. C., Moisen, G. G., Frescino, T. S., & Blackard, J. A. (2007). Remote sensing-based predictors improve distribution models of rare, early successional and broadleaf tree species in Utah. J Appl Ecol.
