Comparing spatial regression to random forests for large environmental data sets.

Eric W Fox, Jay M Ver Hoef, Anthony R Olsen.

Abstract

Environmental data may be "large" because of the number of records, the number of covariates, or both. Random forests has a reputation for good predictive performance when there are many covariates with nonlinear relationships, whereas spatial regression fitted with reduced rank methods has a reputation for good predictive performance when there are many spatially autocorrelated records. In this study, we compare the two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. A primary application is mapping MMI predictions and prediction errors at 1.1 million perennial stream reaches across the conterminous United States. For the spatial regression model, we develop a novel transformation procedure that estimates Box-Cox transformations to linearize covariate relationships and handles possibly zero-inflated covariates. We find that the spatial regression model with transformations, followed by selection of significant covariates, has cross-validation performance comparable to random forests. We also find that prediction interval coverage is close to nominal for each method, but that spatial regression prediction intervals tend to be narrower and less variable than quantile regression forest prediction intervals. A simulation study is used to generalize the results and clarify the advantages of each modeling approach.
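The abstract does not say how the Box-Cox transformations are estimated or how zero-inflated covariates are handled. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: it assumes λ is picked from a grid to maximize the squared correlation between the transformed covariate and the response (one way to "linearize" a covariate relationship), and it assumes zeros are shifted by half the smallest positive value and flagged with a 0/1 indicator. The function names (`box_cox`, `choose_lambda`, `interval_coverage`) and the λ grid are hypothetical. The last function computes the empirical prediction interval coverage that is compared against the nominal level.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a single positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return math.log(x)  # limiting case lam -> 0
    return (x ** lam - 1.0) / lam


def pearson_r(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / math.sqrt(saa * sbb)


def choose_lambda(x, y, grid=None):
    """Choose the lambda whose transformed covariate is most linearly related to y.

    Assumption (hypothetical): zeros are shifted by half the smallest positive
    value before transforming, and a 0/1 zero indicator is returned so it can
    enter the regression as an extra covariate.
    """
    if grid is None:
        grid = [i / 4 for i in range(-8, 9)]  # -2.0 to 2.0 in steps of 0.25
    if any(v < 0 for v in x):
        raise ValueError("covariate must be non-negative")
    positives = [v for v in x if v > 0]
    if not positives:
        raise ValueError("covariate has no positive values")
    # Shift only when zeros are present (zero-inflated covariate).
    offset = min(positives) / 2 if len(positives) < len(x) else 0.0
    shifted = [v + offset for v in x]
    zero_indicator = [1 if v == 0 else 0 for v in x]

    # Grid search: keep the lambda with the largest squared correlation with y.
    best_lam, best_r2 = None, -1.0
    for lam in grid:
        r2 = pearson_r([box_cox(v, lam) for v in shifted], y) ** 2
        if r2 > best_r2:
            best_lam, best_r2 = lam, r2
    transformed = [box_cox(v, best_lam) for v in shifted]
    return best_lam, transformed, zero_indicator


def interval_coverage(y_obs, lower, upper):
    """Fraction of observations that fall inside their prediction intervals."""
    inside = sum(1 for y, lo, hi in zip(y_obs, lower, upper) if lo <= y <= hi)
    return inside / len(y_obs)
```

For example, with covariate values `[1, 2, 4, 8, 16]` and a response equal to their natural logarithms, `choose_lambda` selects λ = 0 (the log transform), because the response is exactly linear in log x. A grid search is used here only because it is simple to read; a likelihood-based estimate of λ would be a reasonable alternative.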

Year:  2020        PMID: 32203555      PMCID: PMC7089425          DOI: 10.1371/journal.pone.0229509

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240
