Ilan Havinga1, Diego Marcos2, Patrick W Bogaart3, Lars Hein4, Devis Tuia2,5. 1. Environmental Systems Analysis Group, Wageningen University, Wageningen, 6708 PB, The Netherlands. ilan.havinga@wur.nl. 2. Laboratory of Geo-Information Science and Remote Sensing, Wageningen University, Wageningen, 6708 PB, The Netherlands. 3. National Accounts Department, Statistics Netherlands, The Hague, 2492 JP, The Netherlands. 4. Environmental Systems Analysis Group, Wageningen University, Wageningen, 6708 PB, The Netherlands. 5. Environmental Computational Science and Earth Observation Laboratory, Ecole Polytechnique Fédérale de Lausanne, Industrie 17, Sion, Switzerland.
Abstract
People's recreation and well-being are closely related to their aesthetic enjoyment of the landscape. Ecosystem service (ES) assessments record the aesthetic contributions of landscapes to people's well-being in support of sustainable policy goals. However, the survey methods available to measure these contributions restrict modelling at large scales. As a result, most studies rely on environmental indicator models, but these do not incorporate people's actual use of the landscape. Now, social media has emerged as a rich new source of information to understand human-nature interactions, while advances in deep learning have enabled large-scale analysis of the imagery uploaded to these platforms. In this study, we test the accuracy of Flickr and deep learning-based models of landscape quality using a crowdsourced survey in Great Britain. We find that this novel modelling approach generates a strong level of accuracy comparable to that of an indicator model and, in combination with it, captures additional aesthetic information. At the same time, social media provides a direct measure of individuals' aesthetic enjoyment, a point of view inaccessible to indicator models, as well as a greater independence of the scale of measurement and insights into how people's appreciation of the landscape changes over time. Our results show how social media and deep learning can support significant advances in modelling the aesthetic contributions of ecosystems for ES assessments.
Scenicness predictions using Flickr images and deep learning. An example of a Flickr and deep learning-based prediction for a single 5 × 5 km grid cell is shown in Fig. 1. Individual Flickr images (Fig. 1a) are passed through the Places365-ResNet-50 model to generate a grid cell mean for 365 scene classes (Fig. 1b) and 102 SUN image attribute scores (Fig. 1c), while image scenicness scores generated by the SoN ResNet are used to produce a normalised rating distribution between 1 and 10 (Fig. 1d). The scene class and image attribute scores show that, on average, the Places365-ResNet-50 model scored the images in the grid cell the highest for the “lagoon”, “tundra” and “islet” scenes, and the lowest for “atrium”, “shopping mall” and “living room”. In terms of attributes, the images were scored the highest for “natural light”, “open area” and “natural”, while “enclosed area”, “praying” and “indoor lighting” received the lowest scores. A full list of image attributes and scene classes is available in Supplementary Tables S1 and S2 online. The normalised rating distribution shows that most images were rated 7 and above by the SoN ResNet. The predictions produced by the two deep learning models were then used as individual variables in a random forest model, which predicted a final scenicness score of 6.9 for the grid cell (Fig. 1e).
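The aggregation step described above can be illustrated with a minimal sketch. It is not the authors' code (the study reports using R); it uses Python's standard library only, and the record layout (`scores`, `rating`), the function name `cell_features` and the choice to bin ratings by rounding to the nearest integer are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def cell_features(images, n_bins=10):
    """Aggregate per-image model outputs into grid-cell variables.

    `images` is a list of dicts with two hypothetical keys:
      "scores": scene class / attribute name -> confidence score
      "rating": predicted image scenicness between 1 and n_bins
    """
    names = sorted({name for img in images for name in img["scores"]})
    # Grid-cell mean of every scene class / attribute score.
    means = {
        name: mean(img["scores"].get(name, 0.0) for img in images)
        for name in names
    }
    # Normalised rating distribution over integer bins 1..n_bins
    # (binning by rounding is an assumption, not taken from the source).
    bins = Counter(min(n_bins, max(1, round(img["rating"]))) for img in images)
    distribution = [bins.get(b, 0) / len(images) for b in range(1, n_bins + 1)]
    return means, distribution


# Toy grid cell with four images (values are made up).
example = [
    {"scores": {"lagoon": 0.6, "atrium": 0.0}, "rating": 7.2},
    {"scores": {"lagoon": 0.2, "atrium": 0.1}, "rating": 8.4},
    {"scores": {"lagoon": 0.4, "atrium": 0.2}, "rating": 6.9},
    {"scores": {"lagoon": 0.4, "atrium": 0.1}, "rating": 3.1},
]
class_means, rating_distribution = cell_features(example)
```

The resulting means and distribution bins would then serve as the per-cell input variables of a regression model such as the random forest mentioned in the text.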
Comparison of Flickr, environmental indicator and combined models. The accuracy of the random forest models using the Flickr and deep learning-based variables, the environmental indicators, and different combinations of the two, within a 20% hold-out test area, is shown in Table 1. Accuracy is reported using several metrics, including root mean squared error (RMSE) and Kendall’s τ, a rank correlation coefficient between −1 (inverse correlation) and 1 (absolute correlation). Using Kendall’s τ to rank the models, the best-performing Flickr model used the Places365 scene classes and SUN attributes as variables. The model achieved a τ of 0.683 versus 0.730 achieved by the indicator model. Model performance was maximised when the environmental indicator variables and the scenic rating distribution were combined, producing a τ of 0.739.
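For readers less familiar with these accuracy measures, the following sketch computes RMSE and a rank correlation on a handful of made-up grid-cell values. The source does not state which variant of Kendall's coefficient was used; the tau-a form below (ties counted as neither concordant nor discordant) is an assumption, and the function names are illustrative.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    balance = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie.
            balance += (product > 0) - (product < 0)
    return balance / (n * (n - 1) / 2)


# Made-up observed and predicted scenicness for four grid cells.
observed = [2.1, 4.5, 6.9, 8.0]
predicted = [2.5, 4.0, 7.5, 7.0]
print(rmse(observed, predicted), kendall_tau(observed, predicted))
```

Because the coefficient depends only on the ordering of grid cells, it rewards a model that ranks places correctly even when the absolute scores are offset, which is why it complements an error measure such as RMSE.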
Table 1
Scenicness model accuracy results on the gridded test set at 5 km resolution, derived from the SoN database.
The spatial predictions generated by the best-performing Flickr model and the indicator model for the whole of Great Britain at 5 km grid cell resolution are shown in Fig. 1f. The two model types produced very similar spatial predictions. Areas of particularly high aesthetic value are captured well by both models, such as Snowdonia National Park in Wales, the Lake District in England and the Scottish Highlands. Similarly, urban areas of lower scenic quality, such as London in England and Glasgow in Scotland, are also clearly visible. Figure 2 shows a more detailed comparison of the model predictions at 500 m resolution versus the observed values. In both the Greater London area (Fig. 2a) and the Lake District (Fig. 2b), the Flickr model produces more nuanced predictions, while the indicator model produces more extreme values and sharp boundaries. For example, in Greater London, the indicator model predicts Richmond Park and Heathrow Airport as very scenic areas in contrast to some of the neighbouring areas, while the predictions of the Flickr model are much more muted and in line with the observed values. In the Lake District, the indicator model also produces more extreme values in the unscenic areas, while the Flickr model again behaves more conservatively. Overall, the Flickr model predictions in both areas show more consistency with the observed values, although the least scenic areas in the Lake District are less visible.
Figure 2
Observed scenicness versus the spatial predictions generated by the best-performing Flickr model and indicator model at 500 m resolution in (a) the Greater London area and (b) the Lake District national park. The arrows within the Greater London indicator model map point to Heathrow Airport (left) and Richmond Park (right). The observed versus predicted grid cell values are shown in Supplementary Fig. S1 online. Drawn using R 3.6.3 (https://www.r-project.org/) with the ggplot2 3.3.5 (https://ggplot2.tidyverse.org) and cowplot 1.1.0 (https://cran.r-project.org/package=cowplot) packages.
Variable importance for the Flickr, environmental indicator and combination models at 5 km resolution is shown in Fig. 3. The best-performing Flickr model, which used the Places365 scene classes and SUN attributes as variables, mainly drew on “climbing” and “rugged scene” in making its predictions. Natural scenes and attributes closely related to landscape aesthetics, such as “valley”, “mountain” and “natural”, were also prominent, as were other recreation-related attributes such as “hiking”. The indicator model relied heavily on the presence of arable land and market gardens (I1), relief, and the presence of buildings (J1 and J2) to generate a scenicness prediction. This was followed by the presence of natural ecosystems, including grasslands (E2), mires/bogs (D1), heathland (F3s, F4s and F3), and inland scree/bare surfaces (H3s). The complexity indices SDI and PDI did not constitute important variables. The best-performing combined model, incorporating the scenic rating distribution (model 13, Table 1), drew on a similar set of indicator variables and the more extreme scenic ratings, focusing on the distributions across rating bins 2, 3, 7 and 8.
Figure 3
Most important variables for (a) the best-performing Flickr model, (b) environmental indicator model, and (c) best-performing combination model at 5 km resolution. “(s)” denotes an ecosystem in surrounding area variable. Supplementary Table S3 online contains a full list of ecosystem type codes and class descriptions. Drawn using R 3.6.3 (https://www.r-project.org/) and the ggplot2 3.3.5 (https://ggplot2.tidyverse.org) package.
Limiting Flickr user activity. For ES modelling purposes at national level, it is important to capture a representative measure of ecosystem contributions to human well-being. In the case of the Flickr models, accuracy results are reported after limiting individual Flickr users to one image per day per grid cell. We applied the limitation after finding large geographic disparities in images per user (Supplementary Fig. S2 online). After applying the limitation, model accuracy improved versus a non-filtered dataset (Supplementary Table S4 online). Figure 4 shows the largest resulting changes in image attribute confidence scores. A key change is a decrease in the prevalence of images related to sport. For example, “playing”, “competing”, “sports”, and “exercise” all saw notable decreases. This suggests that the filtering removed a large number of images associated with sporting events, which are less relevant for measuring landscape aesthetics. This in turn appears to have increased the prevalence of landscape-focused imagery, indicated by the increase in confidence scores for the “clouds”, “far-away horizon”, “ocean” and “natural” attributes.
Figure 4
The largest differences in image attribute scores after limiting Flickr user contributions. To calculate the difference, a single image per user per day per grid cell was randomly selected ten times and the mean attribute scores were calculated per grid cell. The median difference versus the unfiltered dataset is shown here with summary statistics available in Supplementary Table S5 online. Drawn using R 3.6.3 (https://www.r-project.org/) and the ggplot2 3.3.5 (https://ggplot2.tidyverse.org) package.
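The user limitation and the comparison summarised in Fig. 4 can be sketched as follows. This is an illustrative reading of the caption rather than the authors' implementation: the record keys (`user`, `day`, `cell`, `scores`), the function names and the order of averaging (mean over the ten draws per grid cell, then the median difference across grid cells) are assumptions.

```python
import random
from collections import defaultdict
from statistics import mean, median


def limit_users(images, rng):
    """Keep one randomly chosen image per (user, day, grid cell)."""
    groups = defaultdict(list)
    for img in images:
        groups[(img["user"], img["day"], img["cell"])].append(img)
    return [rng.choice(group) for group in groups.values()]


def cell_means(images, attribute):
    """Mean score of one attribute per grid cell."""
    per_cell = defaultdict(list)
    for img in images:
        per_cell[img["cell"]].append(img["scores"][attribute])
    return {cell: mean(values) for cell, values in per_cell.items()}


def median_difference(images, attribute, repeats=10, seed=0):
    """Median over grid cells of (filtered mean - unfiltered mean).

    The filtered mean of each cell is averaged over `repeats` random draws.
    """
    rng = random.Random(seed)
    unfiltered = cell_means(images, attribute)
    draws = [cell_means(limit_users(images, rng), attribute)
             for _ in range(repeats)]
    differences = [mean(draw[cell] for draw in draws) - unfiltered[cell]
                   for cell in unfiltered]
    return median(differences)
```

In a toy case where one user uploads three sports photographs of the same grid cell on the same day and a second user uploads one landscape photograph, the filter keeps two images, so the mean “sports” score of the cell drops. This is the kind of shift away from sporting imagery that the text reports.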
Measuring changes in aesthetic enjoyment over time. Deep learning-based variables generated using social media can also support measures of landscape aesthetics over time. This can support more frequent updates to national ES assessments, and tell us more about how the landscape is contributing to people’s well-being. In an additional experiment aimed at studying the temporal dynamics of people’s aesthetic enjoyment through their interactions with the landscape, we analysed how scenicness evolves over time in national park areas. Figure 5 shows the contributions of a selected group of image attributes over a ten-year period within the 15 national parks of Great Britain. These contain some of the most valuable natural areas in Britain, such as the Peak and Lake Districts in England, the Pembrokeshire coast in Wales, and the Cairngorms in Scotland.
Figure 5
The influence of “snow” and other image attributes on aesthetic enjoyment over time. The (a) average monthly prevalence of “snow” attribute scores is shown between 2009 and 2019. The average monthly scores show a strong Pearson correlation with (b) remotely-sensed snow cover data. The (c) average prevalence per weekday of “snow” versus “asphalt” is also shown. A constrained linear model was trained using the scores of a selected group of attributes including “snow” to (d) predict scenicness at the image-level. The changing monthly contributions of some of these attributes can be observed, with “snow” contributing in the winter months. Contributions represent individual image attribute scores multiplied by the model coefficients and averaged on a monthly basis. Drawn using R 3.6.3 (https://www.r-project.org/) with the ggplot2 3.3.5 (https://ggplot2.tidyverse.org) and cowplot 1.1.0 (https://cran.r-project.org/package=cowplot) packages.
The contributions of aesthetic-related image attributes in these national parks change according to the season. We focus on the “snow” attribute as a specific example of how these contributions change over time. Figure 5a shows how the prevalence of “snow”, the average score accounting only for images with a score higher than 0.5, increases in the winter months. The winter of 2009/2010 reveals itself as a particularly snowy period. The prevalence of snow in user images correlates strongly with remote sensing-based measurements of snow cover using MODIS satellite data, shown in Fig. 5b. In Fig. 5c, we also see how the prevalence of “snow” increases around the weekend, when people are more likely to visit snowy landscapes, whilst the prevalence of “asphalt” in images remains relatively constant throughout the week. This shows that social media-based data provide a combination of information about the state of the environment and how people interact with it.
In a direct connection to aesthetic landscape quality, when the selected group of image attributes shown in Fig. 5d, including “snow”, is used to predict the image ratings generated by the SoN ResNet, we again see how the contributions change over time. For example, the contributions of “snow” appear between December and April, reaching a peak in the winter month of February, before disappearing again. In contrast, the contributions of “vegetation” grow to their highest between June and August, reflecting the positive influence of deciduous growth on landscape aesthetics in the summer. Although smaller in size, the contributions of “ocean” also grow in the summer, suggesting an increase in user posts of coastal images to Flickr in these warmer months. It is also notable that the contribution of “rugged scene” to scenicness increases in the rainy months of spring.
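The two monthly quantities discussed here, attribute prevalence and attribute contributions, can be sketched as below. This is a hedged illustration rather than the authors' procedure: the record keys (`month`, `scores`), the function names and the coefficient values are hypothetical. The prevalence function takes one possible reading of the definition in the text, in which scores at or below the 0.5 threshold count as zero and the average is taken over all images of the month.

```python
from collections import defaultdict
from statistics import mean


def monthly_prevalence(images, attribute, threshold=0.5):
    """Per-month mean of an attribute score, counting only scores above
    `threshold` (other images contribute zero; this is an assumed reading)."""
    per_month = defaultdict(list)
    for img in images:
        score = img["scores"][attribute]
        per_month[img["month"]].append(score if score > threshold else 0.0)
    return {month: mean(values) for month, values in sorted(per_month.items())}


def monthly_contributions(images, coefficients):
    """Per-month mean of (attribute score x model coefficient)."""
    per_month = defaultdict(lambda: defaultdict(list))
    for img in images:
        for attribute, coefficient in coefficients.items():
            per_month[img["month"]][attribute].append(
                img["scores"][attribute] * coefficient)
    return {month: {a: mean(v) for a, v in attributes.items()}
            for month, attributes in sorted(per_month.items())}


# Toy data: two January images and one July image (values are made up).
example = [
    {"month": 1, "scores": {"snow": 0.8, "vegetation": 0.1}},
    {"month": 1, "scores": {"snow": 0.2, "vegetation": 0.3}},
    {"month": 7, "scores": {"snow": 0.0, "vegetation": 0.9}},
]
hypothetical_coefficients = {"snow": 2.0, "vegetation": 1.0}
prevalence = monthly_prevalence(example, "snow")
contributions = monthly_contributions(example, hypothetical_coefficients)
```

In practice the coefficients would come from the linear model fitted to the image-level scenicness ratings; here they are passed in so that the sketch makes no claim about how that model was constrained.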
Discussion
The potential of social media and deep learning to capture people’s interactions with the landscape has yet to be fully confirmed. In an ES context, social media provides a rich new source of data to capture the cultural contributions of ecosystems to human well-being but its use is rarely validated[46]. In the ES community, deep learning applications also remain limited and those that do exist tend to limit their analysis to using the objects detected in images as proxies for cultural ES[25,36,50]. We have demonstrated that deep learning-based variables which consider the overall semantic meaning of an image can accurately capture the aesthetic quality of the British landscape. Crucially, these techniques also incorporate people’s actual interactions with the environment, a key methodological requirement from an ES perspective.
Nevertheless, our study highlights the relevance of traditional environmental indicator models in capturing landscape quality in the absence of survey data. The visual concepts put forward in the landscape aesthetics literature serve well to capture the spatial variation in scenicness provided by the SoN database. The especially strong influence of unnatural, man-made environments on aesthetics is reflected in the high variable importance of arable land and buildings[51]. At the same time, the importance of highly valued and unique natural environments, such as bog and heathland ecosystems, as well as the importance of relief, are also accurately identified by the random forest model[52-54]. Surprisingly, the SDI and PDI, normally key indicators for measuring landscape aesthetics[55] and relevant to Britain[56], did not constitute important variables in our results. The variety of ecosystem type indicators and their interaction in the non-linear model space may have offered enough opportunities to capture landscape complexity[57]. Alternatively, visibility modelling of the landscape could produce a more accurate set of indicators[21,58,59].
Theoretically, these could capture more of the aesthetic quality of the landscape by providing a 3D perspective using the location of Flickr images. However, the challenge with visibility modelling at very large scales is the computational resources needed for the geo-spatial calculations[60]. For example, in our case, the sightlines from 9.8 million images would need to be calculated using a Digital Elevation Model (DEM) for a 210,000 km² area. On the other hand, in the case of our Flickr model, the presence of image attributes including “far-away horizon” and scene classes such as “mountain” gives the model a lot of indirect information on the 3D characteristics of an area.
The inclusion of individual spatial interactions offered by the Flickr and deep learning-based approach also makes it a more attractive method for ES modelling purposes. The comparable model accuracy versus the indicator model shows that this key methodological requirement from an ES perspective can be incorporated without significant losses in accuracy. The results also show that this individual perspective produces a finer-grained view which captures highly-valued and unique landscape elements such as rock or water features[18]. One example is the highly aesthetic view of Achmelvich Bay in Scotland, shown in Fig. 1. This is in contrast to the indicator model, which uses variables measured with remote sensing data at 25 m resolution and above. At the same time, important negative environmental contexts, such as Heathrow Airport in London (Fig. 2), are also better captured by the Flickr model. Figure 2 also shows how the Flickr model stays relevant at different scales while simultaneously highlighting the scaling issues common to indicator models[61].
While the indicator model is heavily constrained by the scale of measurement, producing more extreme differences linked to land cover, the Flickr model is able to reproduce a more consistent view of the landscape using the images available to it (see also Supplementary Fig. S1 online). At a national level, it appears that explicitly capturing this more nuanced view of the landscape through the scenic rating distribution, in combination with the strong overall predictive power of the indicator model, produces the highest level of model accuracy in our study.
In contrast to the static nature of the indicator approach, the granularity of the Flickr data also enables a detailed examination of aesthetics over time. The time-series analysis illustrated in Fig. 5 shows how the aesthetic contributions of landscapes change over the course of a year in the national parks of Britain. The influence of seasonality on landscape quality, defined as ‘ephemera’ in the landscape aesthetics literature[17], is notably captured. Such granularity can greatly benefit ES assessments requiring regular updates, such as those performed for the purposes of ecosystem accounting in the context of national annual accounts of economic production[8]. These results also show how the contributions of specific landscape characteristics to people’s aesthetic enjoyment can be accurately captured using a social media and deep learning-based approach. The large prevalence of snow in images during the 2009/2010 winter is consistent with one of the last great snowfall events in Britain[62]. The consistency with remote sensing data further supports the reliability of the data.
Understanding how ecosystems in the landscape contribute to individuals’ aesthetic enjoyment of the landscape, and accurately tracking these contributions over time, can help policy-makers manage and protect the most valuable natural areas for people’s recreation and well-being.
Although the Flickr and deep learning approach has its advantages, some biases in the method should still be taken into account. By using the SoN database for training purposes, the models have largely learnt a British representation of aesthetic quality. For applications in other cultural and topographical contexts, additional fine-tuning will most likely be required. Challenges also lie in obtaining an ES measure that is demographically representative of the entire population. Flickr has been found to be most popular with 40 to 60-year-old males[63] and user contributions, as in our study, are usually skewed by small, highly active user groups[64]. At the same time, image content varies greatly and not all images are relevant for measuring landscape aesthetics. However, in this respect, the user limitation in our study appears to have shifted the overall image content away from sporting scenes and more towards landscape images, improving model accuracy versus the SoN database. Notably, the agreement between the Flickr-based models, SoN and the environmental indicators shows that there is a strong consistency between the preferences captured by each dataset. This consistency is also promising for applications in other European contexts as the aesthetic concepts used to develop the environmental indicators have already been successfully applied in a number of European settings[65].
In conclusion, landscape aesthetics are an important source of cultural value but large-scale measurement for ES assessments is difficult due to a lack of survey data.
Now, social media offers the opportunity to measure the aesthetic contributions of ecosystems whilst integrating people’s actual interactions with the environment, and tracking changes over time. In this study, we have demonstrated that models using Flickr images and deep learning enable a highly accurate measure of aesthetic landscape quality, with independence of the scale of measurement. This supports ES measures based on the revealed preferences of individuals rather than a set of broad theoretical concepts. Small gains in accuracy are also achieved when an explicit, deep learning-based measure of aesthetics in the form of an image rating distribution is combined with environmental indicator variables. Changes in the aesthetic contributions of landscapes over time can also be measured. Our results advance ES modelling to better capture the cultural contributions of nature to human well-being.
Methods
Study design. The research focused on comparing Flickr and deep learning-based models with an environmental indicator-based model, as well as different combinations of the two (Fig. 6). Conceptually, we considered the aesthetic quality of the landscape equivalent to the concept of scenicness, and that scenicness constituted an integral factor determining the overall flow of aesthetic ES[66]. We made our comparisons using a 5 km grid covering the entire terrestrial area of Great Britain and at 500 m resolution in Greater London and the Lake District. As a ground truth, we calculated a mean scenicness rating per grid cell using the image scenicness ratings of intersecting SoN images. Each image has a collection of volunteer ratings between 1 (not scenic) and 10 (very scenic). We used the average of these ratings. For training, we used the 5 km × 5 km grid. To reduce spatial autocorrelation, a larger 50 × 50 km grid was then overlaid onto this grid to create sample groups, of which 70% were randomly allocated for training, 10% for validation and 20% for testing (Supplementary Fig. S3 online). Random forests were used to model scenicness at the 5 km and 500 m grid level using both the environmental indicator and Flickr-based variables. All spatial analyses were done using the R 3.6.3 programming language (https://www.r-project.org/) including the raster 3.0–12 (https://cran.r-project.org/package=raster), sf 1.0-1 (https://cran.r-project.org/package=sf), caret 6.0–86 (https://cran.r-project.org/package=caret) and tidyverse 1.3.1 (https://cran.r-project.org/package=tidyverse) packages. caret was used to automatically select the random forest hyperparameter settings mtry, min node size, and extratrees.
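The ground-truth calculation and the blocked train/validation/test allocation can be sketched as follows. The study itself was carried out in R; this Python sketch is only an illustration under stated assumptions: image locations are taken to be projected coordinates in metres, grid cells are identified by integer indices, and the function names and default seed are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def cell_of(x, y, size):
    """Index of the square grid cell of side `size` (metres) containing (x, y)."""
    return (int(x // size), int(y // size))


def mean_rating_per_cell(images, size=5_000):
    """Ground truth: mean scenicness rating of the images falling in each cell.

    `images` is an iterable of (x, y, rating) tuples.
    """
    ratings = defaultdict(list)
    for x, y, rating in images:
        ratings[cell_of(x, y, size)].append(rating)
    return {cell: mean(values) for cell, values in ratings.items()}


def block_split(cells, cell_size=5_000, block_size=50_000,
                fractions=(0.7, 0.1, 0.2), seed=0):
    """Allocate whole blocks of grid cells to train, validation and test."""
    ratio = block_size // cell_size
    blocks = defaultdict(list)
    for cx, cy in cells:
        blocks[(cx // ratio, cy // ratio)].append((cx, cy))
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    n_train = round(fractions[0] * len(keys))
    n_validation = round(fractions[1] * len(keys))
    split = {"train": [], "validation": [], "test": []}
    for i, key in enumerate(keys):
        if i < n_train:
            name = "train"
        elif i < n_train + n_validation:
            name = "validation"
        else:
            name = "test"
        # All cells of a block go to the same subset.
        split[name].extend(blocks[key])
    return split
```

Allocating entire 50 km blocks, rather than individual 5 km cells, keeps neighbouring cells in the same subset, so that spatially autocorrelated neighbours of training cells are less likely to appear in the test set.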