Xun Liu, Daji Wu, Gebreab K Zewdie, Lakitha Wijerante, Christopher I Timms, Alexander Riley, Estelle Levetin, David J Lary.
Abstract
This article describes an example of using machine learning to estimate the abundance of airborne Ambrosia pollen for Tulsa, OK. Twenty-seven years of historical pollen observations were used. These pollen observations were combined with machine learning and a very complete meteorological and land surface context of 85 variables to estimate the daily Ambrosia abundance. The machine learning algorithms employed were Least Absolute Shrinkage and Selection Operator (LASSO), neural networks, and random forests. The best performance was obtained using random forests. The physical insights provided by the random forest are also discussed.
Keywords: Pollen; machine learning
Year: 2017 PMID: 28469446 PMCID: PMC5392111 DOI: 10.1177/1178630217699399
Source DB: PubMed Journal: Environ Health Insights ISSN: 1178-6302
Figure 1. A schematic showing the Ambrosia life cycle.
Figure 2. Correlation of the model-predicted pollen concentrations with observed validation data for 2013. Plotted based on equation (1), using data from Howard and Levetin [14] and Rienecker et al. [16].
Figure 3. Annual pollen data from 1986 to 1988.
Figure 4. Averaged 1986–2014 annual pollen data.
Variable names, abbreviations and units.
| Variable | Description | Units |
|---|---|---|
| EFLUX | Latent heat flux (positive upward) | |
| EVAP | Surface evaporation | |
| HFLUX | Sensible heat flux (positive upward) | |
| TAUX | Eastward surface wind stress | |
| TAUY | Northward surface wind stress | |
| TAUGWX | Eastward gravity wave surface stress | |
| TAUGWY | Northward gravity wave surface stress | |
| PBLH | Planetary boundary layer height | m |
| DISPH | Displacement height | m |
| BSTAR | Surface buoyancy scale | |
| USTAR | Surface velocity scale | |
| TSTAR | Surface temperature scale | K |
| QSTAR | Surface humidity scale | kg |
| RI | Surface Richardson number | non-dimensional |
| ZOH | Roughness length, sensible heat | m |
| ZOM | Roughness length, momentum | m |
| HLML | Height of center of lowest model layer | m |
| TLML | Temperature of lowest model layer | K |
| QLML | Specific humidity of lowest model layer | kg |
| ULML | Eastward wind of lowest model layer | |
| VLML | Northward wind of lowest model layer | |
| RHOA | Surface air density | |
| SPEED | Three-dimensional wind speed for surface fluxes | |
| CDH | Surface exchange coefficient for heat | |
| CDQ | Surface exchange coefficient for moisture | |
| CDM | Surface exchange coefficient for momentum | |
| CN | Surface neutral drag coefficient | non-dimensional |
| TSH | Effective turbulence skin temperature | K |
| QSH | Effective turbulence skin humidity | kg |
| FRSEAICE | Fraction of sea-ice | fraction |
| PRECANV | Surface precipitation flux from anvils | |
| PRECCON | Surface precipitation flux from convection | |
| PRECLSC | Surface precipitation flux from large-scale | |
| PRECSNO | Surface snowfall flux | |
| PRECTOT | Total surface precipitation flux | |
| PGENTOT | Total generation of precipitation | |
| PREVTOT | Total re-evaporation of precipitation | |
| GRN | Vegetation greenness fraction | fraction |
| LAI | Leaf area index | |
| GWETROOT | Root zone soil wetness | fraction |
| GWETTOP | Top soil layer wetness | fraction |
| TPSNOW | Top snow layer temperature | K |
| TUNST | Surface temperature of unsaturated zone | K |
| TSAT | Surface temperature of saturated zone | K |
| TWLT | Surface temperature of wilted zone | K |
| PRECSNO | Surface snowfall | |
| PRECTOT | Total surface precipitation | |
| SNOMAS | Snow mass | |
| SNODP | Snow depth | m |
| EVPSOIL | Bare soil evaporation | |
| EVPTRNS | Transpiration | |
| EVPINTR | Interception loss | |
| EVPSBLN | Sublimation | |
| RUNOFF | Overland runoff | |
| BASEFLOW | Baseflow | |
| SMLAND | Snowmelt | |
| FRUNST | Fractional unsaturated area | fraction |
| FRSAT | Fractional saturated area | fraction |
| FRSNO | Fractional snow-covered area | fraction |
| FRWLT | Fractional wilting area | fraction |
| PARDF | Surface downward PAR diffuse flux | |
| PARDR | Surface downward PAR beam flux | |
| SHLAND | Sensible heat flux from land | |
| LHLAND | Latent heat flux from land | |
| EVLAND | Evaporation from land | |
| LWLAND | Net downward longwave flux over land | |
| SWLAND | Net downward shortwave flux over land | |
| GHLAND | Downward heat flux at base of top soil layer | |
| TWLAND | Total water store in land reservoirs | |
| TELAND | Energy store in all land reservoirs | |
| WCHANGE | Total land water change per unit time | |
| ECHANGE | Total land energy change per unit time | |
| SPLAND | Spurious land energy source | |
| SPWATR | Spurious land water source | |
| SPSNOW | Spurious snow source | |
| PM2.5 | Airborne particulate | |
| Soil | Soil type | non-dimensional |
| Lithology | Lithology | non-dimensional |
| Topography | Topography | m |
| PopulationDensity | Population density | |
| Type | Surface type | non-dimensional |
| AlbedoWSABand1 | Surface reflectivity at 470 nm | non-dimensional |
| AlbedoWSABand2 | Surface reflectivity at 555 nm | non-dimensional |
| AlbedoWSABand3 | Surface reflectivity at 670 nm | non-dimensional |
| AlbedoWSABand4 | Surface reflectivity at 858 nm | non-dimensional |
| AlbedoWSABand5 | Surface reflectivity at 1,240 nm | non-dimensional |
| AlbedoWSABand6 | Surface reflectivity at 1,640 nm | non-dimensional |
| AlbedoWSABand7 | Surface reflectivity at 2,130 nm | non-dimensional |
Figure 6. Summary of the random forest results. (a), (b) Verification scatter diagrams, with the x-axis showing the observed amount of pollen and the y-axis showing the estimated amount of pollen; the error bars show the estimated uncertainty. Note that these estimates do not require the phenology to be specified. Panel (a) shows the scatter diagram for the first iteration and panel (b) shows the much-improved scatter diagram after the last iteration. (c) The relative importance of the 20 most important variables for estimating the pollen count. (d) Histogram of the residuals between the observed and estimated pollen counts. (e) Variation of the error as a function of the number of trees in the random forest. (f) The correlation coefficient for the training and independent validation datasets as a function of iteration.
Correlation coefficients for the various machine learning approaches used in this study, with the best performing approach listed first. One correlation coefficient is reported for the training dataset and one for the totally independent validation dataset.
| Machine learning approach without phenology | Correlation coefficient, training | Correlation coefficient, validation |
|---|---|---|
| Random forest | 1 | 0.98 |
| NN | 0.91 | 0.61 |
| LASSO | 0.53 | 0.56 |
| Prior multi-linear study with phenology | 0.68 | |
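The table compares approaches by correlation coefficient on training and validation data. As a minimal sketch of how such a pair of numbers could be computed, assuming the Pearson correlation between observed and estimated values is what is meant, the snippet below uses only the Python standard library. The function name `pearson_r` and all numbers are hypothetical and are not data from the study.

```python
import math


def pearson_r(observed, estimated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_o == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_e)


# Made-up pollen counts, purely illustrative (NOT data from the study).
train_obs = [0.0, 12.0, 85.0, 240.0, 31.0]
train_est = [1.0, 10.0, 90.0, 230.0, 35.0]
valid_obs = [5.0, 60.0, 150.0, 20.0]
valid_est = [9.0, 48.0, 170.0, 12.0]

r_train = pearson_r(train_obs, train_est)
r_valid = pearson_r(valid_obs, valid_est)
print(f"training r = {r_train:.2f}, validation r = {r_valid:.2f}")
```

A large gap between the training and validation values, as in the NN row, is the usual sign that a model fits the training data better than it generalises.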
Figure 5. Schematic of a random forest. A random forest is an ensemble of decision trees.
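To make the "ensemble of decision trees" idea concrete, here is a toy, random-forest-style sketch in plain Python. It is deliberately simplified and is not the implementation used in the study: each tree is a depth-1 regression tree (a stump) fitted on a bootstrap sample using one randomly chosen feature, and the ensemble prediction is the average over trees. All function names and the synthetic data are assumptions made for illustration.

```python
import random


def fit_stump(X, y, feature):
    """Fit a depth-1 regression tree on one feature by minimising squared error."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2  # candidate split between two distinct values
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # feature is constant in this sample: always predict the mean
        mean = sum(y) / len(y)
        return (feature, float("inf"), mean, mean)
    _, thr, ml, mr = best
    return (feature, thr, ml, mr)


def predict_stump(stump, row):
    feature, thr, ml, mr = stump
    return ml if row[feature] <= thr else mr


def fit_forest(X, y, n_trees, rng):
    """Bagging: each tree sees a bootstrap sample and one random feature."""
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Ensemble prediction is the mean of the individual tree predictions."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Synthetic data (an assumption): feature 0 drives the target, feature 1 is noise.
rng = random.Random(0)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(200)]
y = [(10.0 if row[0] > 0.5 else 0.0) + rng.gauss(0, 1) for row in X]

forest = fit_forest(X, y, n_trees=50, rng=rng)
high = predict_forest(forest, [0.9, 0.5])
low = predict_forest(forest, [0.1, 0.5])
print(high, low)
```

A production random forest grows deeper trees and considers a random subset of features at every split; the averaging step, however, is the same.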
Figure 7. Schematic of a single hidden layer, feed-forward NN. Each arrow corresponds to a real-valued parameter, or a weight, of the network. The values of these parameters are tuned in the network training. Here b are the biases, w are the weights, and σ is the activation function.
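The forward pass of such a network can be sketched in a few lines. The example below assumes a logistic sigmoid for σ and a linear output unit, neither of which is stated in the caption; the names `forward`, `W1`, `b1`, `w2`, and `b2` and all numeric values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation, one common choice for sigma (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, w2, b2, activation=sigmoid):
    """Single hidden layer: output = w2 . sigma(W1 x + b1) + b2."""
    # One hidden unit per row of W1: weighted sum of inputs plus bias, then sigma.
    hidden = [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    # Linear output unit, suitable for a regression target.
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Two inputs, three hidden units, one output; values are arbitrary.
x = [0.5, -1.0]
W1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
b1 = [0.0, 0.1, -0.2]
w2 = [1.0, -2.0, 0.5]
b2 = 0.3

print(forward(x, W1, b1, w2, b2))
```

Training adjusts every entry of `W1`, `b1`, `w2`, and `b2`; the sketch covers only evaluation of a network whose parameters are already fixed.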
Figure 8. Scatter diagram for the airborne pollen estimates made using a NN.
Figure 9. Scatter diagram for the airborne pollen estimates made using the LASSO approach.
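For readers unfamiliar with LASSO, the sketch below shows one standard way to fit it: cyclic coordinate descent with soft-thresholding, minimising (1/2n)·||y − Xw||² + λ·||w||₁. This particular scaling of the objective, the absence of an intercept, and the synthetic data are assumptions made for illustration; the source does not say which solver or settings were used.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for row, target in zip(X, y):
                pred_without_j = sum(row[k] * w[k] for k in range(p) if k != j)
                rho += row[j] * (target - pred_without_j)
                z += row[j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Synthetic data (an assumption): only feature 0 matters.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]

w = lasso_coordinate_descent(X, y, lam=0.5)
print(w)
```

The L1 penalty both shrinks the retained coefficient and sets the coefficients of uninformative features to exactly zero, which is the "selection" part of the method's name.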