John L Schnase, Mark L Carroll.
Abstract
MERRA/Max provides a feature selection approach to dimensionality reduction that enables direct use of global climate model outputs in ecological niche modeling. The system accomplishes this reduction through a Monte Carlo optimization in which many independent MaxEnt runs, each operating on a species occurrence file and a small set of variables selected at random from a large collection, converge on an estimate of the top contributing predictors in that collection. These top predictors can be viewed as candidates for the variable selection step of the ecological niche modeling process. MERRA/Max's Monte Carlo algorithm operates on files stored in the underlying filesystem, making it scalable to large data sets. Its software components can run as parallel processes in a high-performance cloud computing environment to yield near real-time performance. In tests using Cassin's Sparrow (Peucaea cassinii) as the target species, MERRA/Max selected a set of predictors from Worldclim's Bioclim collection of 19 environmental variables that have been shown to be important determinants of the species' bioclimatic niche. It also selected biologically and ecologically plausible predictors from a more diverse set of 86 environmental variables derived from NASA's Modern-Era Retrospective Analysis for Research and Applications Version 2 (MERRA-2) reanalysis, an output product of the Goddard Earth Observing System Version 5 (GEOS-5) modeling system. We believe these results point to a technological approach that could expand the use of global climate model outputs in ecological niche modeling, foster exploratory experimentation with otherwise difficult-to-use climate data sets, streamline the modeling process, and, eventually, enable automated bioclimatic modeling as a practical, readily accessible, low-cost, commercial cloud service.
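The Monte Carlo screening described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_model` is a hypothetical stand-in for a MaxEnt run that returns a percent contribution per variable, and the names `screen_variables`, `vars_per_run`, and `samples_per_var`, the placeholder variable names, and the choice of two variables per run are all assumptions introduced for illustration.

```python
import random
from collections import defaultdict

# Hypothetical collection: 19 placeholder variables, three of them "informative".
VARIABLES = [f"v{i:02d}" for i in range(1, 20)]
INFORMATIVE = {"v03", "v11", "v17"}


def toy_model(subset):
    """Stand-in for one MaxEnt run (an assumption, not the real tool).

    Returns a percent contribution for each variable in the subset,
    giving informative variables five times the weight of the others.
    """
    weights = {v: 5.0 if v in INFORMATIVE else 1.0 for v in subset}
    total = sum(weights.values())
    return {v: 100.0 * w / total for v, w in weights.items()}


def screen_variables(variables, run_model, vars_per_run=2,
                     samples_per_var=20, top_k=6, seed=0):
    """Average each variable's contribution over many small random runs."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Enough runs that each variable is drawn about samples_per_var times.
    n_runs = samples_per_var * len(variables) // vars_per_run
    for _ in range(n_runs):
        subset = rng.sample(variables, vars_per_run)
        for name, value in run_model(subset).items():
            totals[name] += value
            counts[name] += 1
    mean_contribution = {v: totals[v] / counts[v] for v in counts}
    ranked = sorted(mean_contribution, key=mean_contribution.get, reverse=True)
    return ranked[:top_k]


top = screen_variables(VARIABLES, toy_model, top_k=3)
print(top)
```

Because every run in the loop is independent of the others, the same structure could in principle be distributed across many cores, which is the property the abstract relies on for parallel execution.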
Year: 2022 PMID: 35061658 PMCID: PMC8782318 DOI: 10.1371/journal.pone.0257502
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. MERRA/Max architecture.
Conceptual diagram showing the major hardware and software components of the MERRA/Max prototype. The study’s testbed consisted of 10 virtual machines (VMs) within NASA’s ADAPT science cloud, with each VM contributing 10 processing cores to the testbed. Numbered arrows indicate the system’s processing workflow.
Bioclim variables.
| bio01 | Annual mean temperature |
| bio02 | Mean diurnal range (mean of monthly (max temp - min temp)) |
| bio03 | Isothermality (bio2/bio7) (×100) |
| bio04 | Temperature seasonality (standard deviation ×100) |
| bio05 | Maximum temperature of warmest month |
| bio06 | Minimum temperature of coldest month |
| bio07 | Temperature annual range (bio5-bio6) |
| bio08 | Mean temperature of wettest quarter |
| bio09 | Mean temperature of driest quarter |
| bio10 | Mean temperature of warmest quarter |
| bio11 | Mean temperature of coldest quarter |
| bio12 | Annual precipitation |
| bio13 | Precipitation of wettest month |
| bio14 | Precipitation of driest month |
| bio15 | Precipitation seasonality (coefficient of variation) |
| bio16 | Precipitation of wettest quarter |
| bio17 | Precipitation of driest quarter |
| bio18 | Precipitation of warmest quarter |
| bio19 | Precipitation of coldest quarter |
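Several Bioclim variables are defined in terms of others, as the table notes for bio03 and bio07. The short Python sketch below works through those two definitions. The monthly temperatures are made-up illustrative values, not data from the study, and the variable names are chosen only for this example.

```python
# Illustrative monthly maximum and minimum temperatures in degrees C (made-up values).
tmax = [10, 12, 16, 20, 25, 30, 33, 32, 28, 22, 15, 11]
tmin = [-2, 0, 3, 7, 12, 17, 20, 19, 15, 9, 3, -1]

# bio02: mean of the monthly (max temp - min temp) differences.
bio02 = sum(hi - lo for hi, lo in zip(tmax, tmin)) / len(tmax)

# bio05 and bio06: maximum temperature of the warmest month and
# minimum temperature of the coldest month.
bio05 = max(tmax)
bio06 = min(tmin)

# bio07: temperature annual range (bio5 - bio6).
bio07 = bio05 - bio06

# bio03: isothermality, (bio2 / bio7) x 100.
bio03 = bio02 / bio07 * 100

print(f"bio02={bio02:.2f} bio07={bio07} bio03={bio03:.2f}")
```

Derived variables of this kind are correlated with their inputs by construction, which is one reason a collinearity check is useful after screening.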
MERRA-2 variables.
| M01 | PS | Time averaged surface pressure |
| M02 | U850 | Eastward wind at 850 hPa |
| M03 | V850 | Northward wind at 850 hPa |
| M04 | T850 | Temperature at 850 hPa |
| M05 | Q850 | Specific humidity at 850 hPa |
| M06 | H1000 | Height at 1000 hPa |
| M07 | TS | Surface skin temperature |
| M08 | QV2M | 2-meter specific humidity |
| M09 | QV10M | 10-meter specific humidity |
| M10 | T2M | 2-meter air temperature |
| M11 | T10M | 10-meter air temperature |
| M12 | U2M | 2-meter eastward wind |
| M13 | U10M | 10-meter eastward wind |
| M14 | U50M | Eastward wind at 50 meters |
| M15 | V2M | 2-meter northward wind |
| M16 | V10M | 10-meter northward wind |
| M17 | V50M | Northward wind at 50 meters |
| M18 | EFLUX | Latent heat flux (positive upward) |
| M19 | HFLUX | Sensible heat flux (positive upward) |
| M20 | TAUX | Eastward surface wind stress |
| M21 | TAUY | Northward surface wind stress |
| M22 | RHOA | Surface air density |
| M23 | TSH | Effective turbulence skin temperature |
| M24 | QSH | Effective turbulence skin humidity |
| M25 | PGENTOT | Total generation of precipitation |
| M26 | PREVTOT | Total re-evaporation of precipitation |
| M27 | EMIS | Surface emissivity |
| M28 | ALBEDO | Surface albedo |
| M29 | LWGEM | Emitted longwave at the surface |
| M30 | LWGAB | Surface absorbed longwave |
| M31 | LWGABCLR | Surface absorbed longwave assuming clear sky |
| M32 | LWGABCLRCLN | Surface absorbed longwave assuming clear clean sky |
| M33 | LWGNT | Surface net downward longwave flux |
| M34 | LWGNTCLR | Surface net downward longwave flux assuming clear sky |
| M35 | LWGNTCLRCLN | Surface net downward longwave flux assuming clear clean sky |
| M36 | SWGDN | Surface incident shortwave flux |
| M37 | SWGDNCLR | Surface incident shortwave flux assuming clear sky |
| M38 | SWGNT | Surface net downward shortwave flux |
| M39 | SWGNTCLR | Surface net downward shortwave flux assuming clear sky |
| M40 | SWGNTCLN | Surface net downward shortwave flux assuming clean sky |
| M41 | SWGNTCLRCLN | Surface net downward shortwave flux assuming clear clean sky |
| M42 | TAUTOT | Optical thickness of all clouds |
| M43 | CLDTOT | Total cloud fraction |
| M44 | GRN | Vegetation greenness fraction (LAI-weighted) |
| M45 | LAI | Leaf area index |
| M46 | GWETPROF | Total profile soil wetness |
| M47 | GWETROOT | Root zone soil wetness |
| M48 | GWETTOP | Top soil layer wetness |
| M49 | TSURF | Mean land surface temperature (incl. snow) |
| M50 | TPSNOW | Top snow layer temperature |
| M51 | TUNST | Surface temperature of unsaturated (but non-wilting) zone |
| M52 | TSAT | Surface temperature of saturated zone |
| M53 | TWLT | Surface temperature of wilting zone |
| M54 | SNODP | Snow depth |
| M55 | RUNOFF | Overland runoff |
| M56 | BASEFLOW | Baseflow |
| M57 | QINFIL | Soil water infiltration rate |
| M58 | FRUNST | Fractional unsaturated (but non-wilting) area |
| M59 | FRSAT | Fractional saturated area |
| M60 | FRSNO | Fractional snow-covered area |
| M61 | FRWLT | Fractional wilting area |
| M62 | PARDFLAND | Surface downward photosynthetically active radiation diffuse flux |
| M63 | PARDRLAND | Surface downward photosynthetically active radiation beam flux |
| M64 | SHLAND | Sensible heat flux from land |
| M65 | LHLAND | Latent heat flux from land |
| M66 | LWLAND | Net downward longwave flux over land |
| M67 | SWLAND | Net downward shortwave flux over land |
| M68 | GHLAND | Downward heat flux into top soil layer |
| M69 | TWLAND | Total water stored in land reservoirs |
| M70 | TELAND | Energy stored in all land |
| M71 | WCHANGE | Total land water change per unit time |
| M72 | ECHANGE | Total land energy change per unit time |
| M73 | SPLAND | Spurious land energy source |
| M74 | SPWATR | Spurious land water source |
| M75 | SPSNOW | Spurious snow energy source |
| M76 | PRMC | Total profile soil moisture content |
| M77 | RZMC | Root zone soil moisture content |
| M78 | SFMC | Top soil layer soil moisture content |
| M79 | PRECTOT | Total surface precipitation |
| M80 | SNOMAS | Snow mass |
| M81 | EVPSOIL | Bare soil evaporation |
| M82 | EVPTRNS | Transpiration |
| M83 | EVPINTR | Interception loss |
| M84 | EVPSBLN | Sublimation |
| M85 | SMLAND | Snowmelt over land |
| M86 | EVLAND | Evaporation from land |
Fig 2. MERRA/Max run-time performance and scaling properties.
Figure shows the relationship between the amount of time it takes MERRA/Max to complete a screening run (T) (shown by the left Y axis and the colored lines labeled A, B, and C), the number of variables in the collection being scanned (N), the average number of random samples taken of each variable in the collection during the screening process (S), and the number of processor cores available in the compute environment (C) (shown by the colored vertical bars and right Y axis). MERRA/Max’s parallel implementation scales linearly with respect to S, and, for any given collection of size N and sample size S, the estimated minimum possible run time (Tmin) (shown in parentheses) can be achieved when enough cores are available for a completely parallel screening of the collection.
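The relationship among T, N, S, and C in this caption can be approximated with a back-of-the-envelope model. The Python sketch below is an assumption made for illustration, not the authors' timing model: it supposes that each run draws `vars_per_run` variables, that every run takes the same hypothetical `seconds_per_run`, and that runs execute in waves of at most C at a time.

```python
import math


def estimate_run_time(n_vars, samples_per_var, cores,
                      vars_per_run=2, seconds_per_run=30.0):
    """Rough screening time in seconds under the stated assumptions."""
    # Number of runs needed so each variable is sampled about S times.
    runs = math.ceil(n_vars * samples_per_var / vars_per_run)
    # With C cores, runs execute in waves of at most C at a time.
    waves = math.ceil(runs / cores)
    return waves * seconds_per_run


# 86 variables, S = 10, on a 100-core testbed (10 VMs x 10 cores).
print(estimate_run_time(86, 10, 100))
# Same screening with one core per run: a single wave, i.e. the minimum time.
print(estimate_run_time(86, 10, 430))
```

Under these assumptions the run time grows roughly in proportion to S for a fixed number of cores, and it bottoms out at the duration of a single run once the core count reaches the number of runs.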
Fig 3. MERRA/Max use case scenarios.
Figure shows the results of two use cases involving Cassin’s Sparrow observational data and predictor data sets of contrasting size and complexity: the Bioclim collection with N = 19 variables (A) and a MERRA-2 reanalysis test collection comprising N = 86 variables (B). A Variable Screening step was used in each scenario to select the top six contributing variables in the underlying collection. Correlated variables (indicated with red text and yellow highlight) were identified in a Predictor Refinement step and thinned to reduce collinearities. In a third step, Model Calibration and a Final Model Run were performed with the remaining non-correlated variables (green highlight). AICc is Akaike’s information criterion corrected for small sample size, AUC is area under the receiver operating characteristic curve, PCC is percent correctly classified, TSS is True Skill Statistic, Parameters is MaxEnt’s measure of model complexity, r is Pearson’s correlation coefficient, r² is the coefficient of determination, and VIF is variance inflation factor. The estimated minimum run time (Tmin) for a completely parallel screening is shown in parentheses. Maps created by the authors show MaxEnt logistic output, which can be interpreted as an estimate of habitat suitability between 0 and 1 with warmer colors indicating better predicted conditions for the species.
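Three of the statistics named in this caption have simple closed forms, sketched below in Python using their standard textbook definitions. The function names and the example counts are illustrative and do not come from the study. The VIF shown is the form based on a single coefficient of determination, such as the r² between a pair of predictors.

```python
def variance_inflation_factor(r_squared):
    """VIF = 1 / (1 - r^2) for a given coefficient of determination."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def percent_correctly_classified(tp, fn, tn, fp):
    """PCC = share of all test points classified correctly, as a percentage."""
    return 100.0 * (tp + tn) / (tp + fn + tn + fp)


# Illustrative values only: r = 0.9 between two predictors, and a
# confusion matrix of 40 TP, 10 FN, 45 TN, 5 FP.
print(round(variance_inflation_factor(0.9 ** 2), 2))
print(round(true_skill_statistic(40, 10, 45, 5), 2))
print(percent_correctly_classified(40, 10, 45, 5))
```

A high pairwise r, and therefore a high VIF, is the signal for thinning one of the two predictors in a refinement step like the one the caption describes.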
Fig 4. Cassin’s Sparrow baseline model and maps.
Figure shows results from a MaxEnt run that builds on the Cassin’s Sparrow bioclimatic modeling work of Salas et al. [75] and reflects a more traditional approach to ENM (A) and Cassin’s Sparrow’s range map based on observational data (B). Highlighted variables indicate those that were also selected by MERRA/Max in the Bioclim use case. Range map provided by eBird (www.ebird.org), created 28 July 2020, and reprinted from [83] under a CC BY license, with permission from the Cornell Lab of Ornithology.
Fig 5. Ecological niche modeling (ENM) process.
Schematic description of the ENM process. Color bars under each step reflect an approximate amount of time that may be needed, ranging from low (blue) to high (red). The use of MERRA/Max to prescreen a large collection of predictors could support variable selection in the data cleaning step. Image provided by [92] and adapted for use here under a CC-BY license.