On estimating probability of presence from use-availability or presence-background data.

Steven J Phillips, Jane Elith.

Abstract

A fundamental ecological modeling task is to estimate the probability that a species is present in (or uses) a site, conditional on environmental variables. For many species, available data consist of "presence" data (locations where the species [or evidence of it] has been observed), together with "background" data, a random sample of available environmental conditions. Recently published papers disagree on whether probability of presence is identifiable from such presence-background data alone. This paper aims to resolve the disagreement, demonstrating that additional information is required. We defined seven simulated species representing various simple shapes of response to environmental variables (constant, linear, convex, unimodal, S-shaped) and ran five logistic model-fitting methods using 1000 presence samples and 10 000 background samples; the simulations were repeated 100 times. The experiment revealed a stark contrast between two groups of methods: those based on a strong assumption that species' true probability of presence exactly matches a given parametric form had highly variable predictions and much larger RMS error than methods that take population prevalence (the fraction of sites in which the species is present) as an additional parameter. For six species, the former group grossly under- or overestimated probability of presence. The cause was not model structure or choice of link function, because all methods were logistic with linear and, where necessary, quadratic terms. Rather, the experiment demonstrates that an estimate of prevalence is not just helpful, but is necessary (except in special cases) for identifying probability of presence. We therefore advise against use of methods that rely on the strong assumption, due to Lele and Keim (recently advocated by Royle et al.) and Lancaster and Imbens. The methods are fragile, and their strong assumption is unlikely to be true in practice. 
We emphasize, however, that we are not arguing against standard statistical methods such as logistic regression, generalized linear models, and so forth, none of which requires the strong assumption. If probability of presence is required for a given application, there is no panacea for lack of data. Presence-background data must be augmented with an additional datum, e.g., species' prevalence, to reliably estimate absolute (rather than relative) probability of presence.
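The role of prevalence can be illustrated with a small simulation. The sketch below is not one of the five logistic methods compared in the paper; it swaps in a simple histogram estimator to keep the example short. It relies only on Bayes' rule: if f1(x) is the density of the environmental variable at presence sites and f(x) is its density over the background, then P(present | x) = prevalence * f1(x) / f(x). Presence-background data determine the ratio f1(x) / f(x), and the prevalence has to come from elsewhere. The linear response curve, the uniform landscape, and all function names are assumptions made for illustration.

```python
import numpy as np


def simulate(n_presence=1000, n_background=10_000, seed=0):
    """Draw presence and background samples for a hypothetical species."""
    rng = np.random.default_rng(seed)
    # Assumed landscape: one environmental variable, uniform on [0, 1).
    landscape = rng.uniform(0.0, 1.0, size=200_000)
    # Assumed true probability of presence: a linear response.
    p_true = 0.1 + 0.6 * landscape
    present = rng.random(landscape.size) < p_true
    presence = rng.choice(landscape[present], size=n_presence, replace=False)
    background = rng.choice(landscape, size=n_background, replace=False)
    prevalence = present.mean()  # fraction of sites that are occupied
    return presence, background, prevalence


def relative_suitability(presence, background, bins):
    """Histogram estimate of f1(x) / f(x) in each bin (relative only)."""
    p_counts, _ = np.histogram(presence, bins=bins)
    b_counts, _ = np.histogram(background, bins=bins)
    return (p_counts / presence.size) / (b_counts / background.size)


def absolute_probability(ratio, prevalence):
    """Scale the density ratio by prevalence to get P(present | x)."""
    return prevalence * ratio


if __name__ == "__main__":
    presence, background, prevalence = simulate()
    bins = np.linspace(0.0, 1.0, 6)
    ratio = relative_suitability(presence, background, bins)
    centers = 0.5 * (bins[:-1] + bins[1:])
    print("bin centers:         ", centers)
    print("true probability:    ", 0.1 + 0.6 * centers)
    print("with true prevalence:", absolute_probability(ratio, prevalence))
    # A wrong prevalence rescales every estimate by the same factor.
    print("with half prevalence:", absolute_probability(ratio, prevalence / 2))
```

Halving the assumed prevalence halves every estimate, yet the density ratio computed from the presence and background samples is unchanged. This is the sense in which such data alone fix relative, but not absolute, probability of presence.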

Year: 2013    PMID: 23923504    DOI: 10.1890/12-1520.1

Source DB: PubMed    Journal: Ecology    ISSN: 0012-9658    Impact factor: 5.499

