Robert McLeman, Clara Grieg, George Heath, Colin Robertson.
Abstract
Machine learning techniques have to date not been widely used in population-environment research, but represent a promising tool for identifying relationships between environmental variables and population outcomes. They may be particularly useful for instances where the nature of the relationship is not obvious or not easily detected using other methods, or where the relationship potentially varies across spatial scales within a given study unit. Machine learning techniques may also help the researcher identify the relative strength of influence of specific variables within a larger set of interacting ones, and so provide a useful methodological approach for exploratory research. In this study, we use machine learning techniques in the form of random forest and regression tree analyses to look for possible connections between drought and rural population loss on the North American Great Plains between 1970 and 2010. In doing so, we analyzed four decades of population count data (at county-size spatial scales), monthly climate data, and Palmer Drought Severity Index scores for Canada and the USA at multiple spatial scales (regional, sub-regional, national, and county/census division levels), along with county level irrigation data. We found that in some parts of Saskatchewan and the Dakotas - particularly those areas that fall within more temperate/less arid ecological sub-regions - drought conditions in the middle years of the 1970s had a significant association with rural population losses. A similar but weaker association was identified in a small cluster of North Dakota counties in the 1990s. Our models detected few links between drought and rural population loss in other decades or in other parts of the Great Plains.
Based on R-squared results, models for US portions of the Plains generally exhibited stronger drought-population loss associations than did Canadian portions, and temperate ecological sub-regions exhibited stronger associations than did more arid sub-regions. Irrigation rates showed no significant influence on population loss. This article focuses on describing the methodological steps, considerations, and benefits of employing this type of machine learning approach to investigating connections between drought and rural population change. Supplementary Information: The online version contains supplementary material available at 10.1007/s11111-022-00399-9.
Keywords: Drought; Machine learning analysis; North American Great Plains
Year: 2022 PMID: 35572742 PMCID: PMC9085368 DOI: 10.1007/s11111-022-00399-9
Source DB: PubMed Journal: Popul Environ ISSN: 0199-0039
Fig. 1 Study area: Great Plains and ecological sub-regions, based on Omernik and Griffith (2014)
County-level Rural–Urban Continuum Codes used by the US Department of Agriculture
| Code | Description |
| 1 | Metro counties with urban population > 1,000,000 |
| 2 | Metro counties with urban population between 250,000 and 1,000,000 |
| 3 | Metro counties with urban population < 250,000 |
| 4 | Non-metro counties with urban population > 20,000 and adjacent to a metro county |
| 5 | Non-metro counties with urban population > 20,000 and not adjacent to a metro county |
| 6 | Non-metro counties with urban population between 2,500 and 20,000 and adjacent to a metro county |
| 7 | Non-metro counties with urban population between 2,500 and 20,000 and not adjacent to a metro county |
| 8 | Non-metro counties with urban population < 2,500 and adjacent to a metro county |
| 9 | Non-metro counties with urban population < 2,500 and not adjacent to a metro county |
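The continuum codes can be read as a simple decision rule. The following is a minimal sketch, assuming the three inputs shown (metro status, urban population, adjacency) are available per county; the function name `rucc_code` is hypothetical, and the treatment of populations that fall exactly on a boundary (250,000; 20,000; 2,500) is an assumption, since the table does not say which side they belong to.

```python
def rucc_code(metro: bool, urban_pop: int, adjacent: bool = False) -> int:
    """Return a Rural-Urban Continuum Code (1-9) for one county.

    Follows the table of codes; exact-boundary handling is an assumption.
    `adjacent` (adjacent to a metro county) is only used for non-metro counties.
    """
    if metro:
        if urban_pop > 1_000_000:
            return 1
        if urban_pop >= 250_000:  # assumption: 250,000 itself falls in code 2
            return 2
        return 3
    # Non-metro counties: choose the size class, then adjust for adjacency.
    if urban_pop > 20_000:
        base = 4
    elif urban_pop >= 2_500:  # assumption: 2,500 itself falls in codes 6/7
        base = 6
    else:
        base = 8
    return base if adjacent else base + 1


# Illustrative inputs only, not counties from the study.
print(rucc_code(metro=True, urban_pop=2_000_000))
print(rucc_code(metro=False, urban_pop=1_000, adjacent=False))
```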
List of models, with random forest models returning R² values > 0.25 in bold
| 1. Entire Great Plains | 1981–1990 | |||
| 2. Great Plains − Canadian portion | 1981–1990 | 1991–2000 | 2001–2010 | |
| 3. Great Plains − US portion | 1971–1980 | 1981–1990 | 1991–2000 | 2001–2010 |
| 4. Temperate ecoregion | 1981–1990 | |||
| 5. Temperate ecoregion − Canadian portion | 1981–1990 | 1991–2000 | 2001–2010 | |
| 6. Temperate ecoregion − US portion | 1981–1990 | 2001–2010 | ||
| 7. West-central semiarid ecoregion | 1981–1990 | 1991–2000 | 2001–2010 | |
| 8. West-central semiarid ecoregion − Canadian portion | 1971–1980 | 1981–1990 | 1991–2000 | 2001–2010 |
| 9. West-central semiarid ecoregion − US portion | 1971–1980 | 1981–1990 | 1991–2000 | 2001–2010 |
| 10. South-central semiarid ecoregion | 1971–1980 | 1981–1990 | 1991–2000 | 2001–2010 |
Decadal random forest models returning R² values > 0.25 are shown in bold text
Fig. 2 Sample regression tree output: splitting of county-level rural population loss data for the temperate ecoregion of the USA between 1970 and 1980, using summer precipitation for 1974. Explanatory note: the figure depicts regression tree output for counties in the temperate ecoregion of the US Great Plains for the decade 1970 to 1980. Node 1 (the root node) shows the average rural county-level population change during the decade (i.e., −6%), the sample size (i.e., n = 153 counties), and the percent of the sample accounted for at this node (i.e., 100%). The sum of precipitation between April and August in 1974 (i.e., PrSum_04_08_1974) is used to split the data in node 1 into leaf nodes 2 and 3 using ANOVA. The key threshold for calculating the split was 316.731 mm, with 24 of the 153 counties being sent to node 2, which then has an average population loss of 12%. The remaining 129 counties from the dataset are sent to node 3, which then has an average population loss of 4.8%. The boxplots illustrate variations within each node and between nodes 2 and 3
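The single split described in the Fig. 2 caption can be illustrated with a small sketch of the ANOVA criterion: try every candidate threshold on the predictor and keep the one that minimises the combined within-node sum of squares of the response. This is not the authors' code. The data below are synthetic, constructed only to mimic the node sizes (24 and 129 counties) and node means (−12% and −4.8%) reported in the caption, and the names `sse` and `best_split` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (cost, threshold, left_y, right_y) for the threshold on x that
    minimises the summed within-node SSE of y, or None if no split exists."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, left, right)
    return best


# Synthetic, illustrative data (NOT the study data): 24 "dry" counties with
# low April-August precipitation and 129 "wet" counties with higher totals.
dry_x = [200 + 4 * i for i in range(24)]
dry_y = [-12.0 + ((i % 3) - 1) for i in range(24)]
wet_x = [340 + 2 * j for j in range(129)]
wet_y = [-4.8 + ((j % 3) - 1) for j in range(129)]

x = dry_x + wet_x   # precipitation in mm
y = dry_y + wet_y   # decadal population change in percent

cost, threshold, left, right = best_split(x, y)
root_mean = sum(y) / len(y)
print(f"root mean {root_mean:.1f}%, split at {threshold} mm, "
      f"node sizes {len(left)} and {len(right)}")
```

With these synthetic inputs the root mean works out to roughly −5.9%, which is the count-weighted average of the two leaf means, the same arithmetic that links the −6%, −12%, and −4.8% values in the caption.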
Results of random forest/regression tree analysis for Great Plains, both countries
| Entire Great Plains region | ||||
| 1981 to 1991 | 0.19 | 0.19 | Pr1987 (20%), Pr1981 (14%), Pr1986 (14%), HD1983 (14%), Pr1984 (13%), PDSI1988 (11%), pfirg1990 (4%), DPI1988 (3%), Pr1985 (3%), HD1982 (2%), DPI1987 (1%), HD1987 (1%) | |
| West-central semiarid ecoregion | ||||
| 1981 to 1991 | 0.08 | 0.17 | pfirg1990 (27%), DPI1988 (20%), DPI1984 (9%), Pr1988 (9%), Pr1981 (9%), Pr1984 (8%), DPI1983 (6%), HD1987 (3%), HD1988 (3%), HD1990 (3%), Pr1989 (3%) | |
| 1991 to 2001 | 0.20 | 0.19 | Pr1997 (19%), HD1991 (18%), HD1994 (17%), HD1995 (16%), Pr1992 (16%), HD1998 (15%) | |
| 2001 to 2011 | 0.18 | 0.17 | DPI2003 (20%), PDSI2003 (18%), HD2008 (16%), HD2005 (15%), HD2010 (15%), HD2003 (15%) | |
| Temperate ecoregion | ||||
| 1981 to 1991 | 0.20 | 0.23 | Pr1987 (18%), Pr1985 (16%), Pr1986 (15%), HD1986 (14%), Pr1982 (14%), Pr1981 (14%), PDSI1983 (5%), DPI1987 (2%), HD1987 (1%), DPI1989 (1%) | |
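The variable-importance cells in these tables list individual predictors with a percentage share each. As a minimal sketch of how such a cell might be summarised, the snippet below parses the entry for the entire Great Plains region (1981 to 1991) and totals the shares by variable prefix. The grouping by alphabetic prefix (for example `Pr`, `PDSI`, `pfirg`) is an assumption made for illustration, and the helper name `importance_by_prefix` is hypothetical.

```python
import re
from collections import defaultdict

# Importance cell copied from the "Entire Great Plains region, 1981 to 1991" row.
ROW = ("Pr1987 (20%), Pr1981 (14%), Pr1986 (14%), HD1983 (14%), Pr1984 (13%), "
       "PDSI1988 (11%), pfirg1990 (4%), DPI1988 (3%), Pr1985 (3%), HD1982 (2%), "
       "DPI1987 (1%), HD1987 (1%)")

# One item looks like "Pr1987 (20%)": letters, digits, then a percentage.
ITEM = re.compile(r"([A-Za-z]+)(\d+)\s*\((\d+)%\)")


def importance_by_prefix(cell):
    """Sum the percentage shares in one table cell by variable-name prefix."""
    totals = defaultdict(int)
    for prefix, _year, pct in ITEM.findall(cell):
        totals[prefix] += int(pct)
    return dict(totals)


totals = importance_by_prefix(ROW)
for prefix, share in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{prefix}: {share}%")
```

For this row the `Pr`-prefixed variables account for 64 of the 100 percentage points, which makes the dominance of a single variable family easier to see than in the raw list.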
Results of random forest/regression tree analysis for US portion of Great Plains
| Entire US Great Plains region | 1971–1981 | 0.14 | 0.31 | Pr1973 (19%), Pr1972 (14%), Pr1975 (10%), Pr1977 (9%), Pr1978 (9%), Pr1974 (8%), Pr1976 (8%), PDSI1978 (6%), DPI1974 (4%), DPI1979 (4%), PDSI1972 (4%), PDSI1975 (1%) |
| 1981–1991 | 0.14 | 0.15 | Pr1987 (17%), Pr1981 (15%), Pr1982 (14%), Pr1985 (11%), Pr1983 (10%), DPI1990 (10%), DPI1988 (8%), DPI1987 (7%), DPI1984 (3%), HD1987 (2%), pfirgchng19811991 (1%), PDSI1984 (1%) | |
| 1991–2001 | 0.20 | 0.21 | Pr1997 (22%), Pr1994 (14%), Pr1992 (12%), Pr1999 (12%), Pr1996 (11%), Pr2000 (8%), Pr1995 (8%), Pr1998 (5%), pfirg2000 (5%), Pr1991 (4%) | |
| 2001–2011 | 0.11 | 0.10 | Pr2004 (24%), Pr2002 (16%), Pr2007 (16%), Pr2006 (16%), Pr2003 (15%), Pr2001 (14%) | |
| US Northwest semi-arid ecoregion | 1971–1981 | 0.13 | 0.35 | Pr1976 (28%), pfirg1980 (18%), HD1971 (15%), DPI1976 (14%), PDSI1978 (8%), PDSI1972 (5%), pfirgchng19711981 (5%), DPI1978 (4%), PDSI1974 (3%) |
| 1981–1991 | 0 | 0.12 | PDSI1990 (43%), DPI1990 (19%), HD1982 (17%), HD1983 (12%), HD1987 (10%) | |
| 1991–2001 | 0 | 0.28 | DPI1991 (20%), HD1998 (18%), PDSI1991 (18%), HD1994 (11%), HD1995 (11%), HD1999 (10%), PDSI1992 (6%), DPI2000 (5%), pfirg2000 (3%) | |
| 2001–2011 | 0 | 0.18 | DPI2001 (30%), PDSI2001 (20%), DPI2005 (14%), HD2003 (12%), HD2004 (12%), HD2008 (12%) | |
| US South semi-arid ecoregion | 1971–1981 | −0.17 | 0.20 | pfirgchng19711981 (23%), pfirg1980 (23%), DPI1974 (16%), Pr1977 (7%), HD1974 (6%), DPI1977 (5%), Pr1976 (5%), HD1977 (4%), PDSI1978 (4%), Pr1978 (3%), Pr1972 (2%), PDSI1979 (2%) |
| 1981–1991 | 0.20 | 0.27 | PDSI1987 (13%), Pr1987 (12%), Pr1982 (12%), Pr1985 (12%), Pr1983 (11%), Pr1986 (9%), DPI1987 (9%), Pr1981 (5%), HD1987 (5%), DPI1985 (5%), DPI1990 (3%), DPI1988 (2%), DPI1981 (1%) | |
| 1991–2001 | 0.02 | 0.24 | Pr1999 (20%), DPI1993 (17%), DPI1994 (14%), DPI1999 (14%), PDSI1993 (14%), DPI2000 (11%), PDSI1997 (8%), DPI1997 (2%) | |
| 2001–2011 | 0.03 | 0.11 | DPI2007 (21%), PDSI2005 (18%), PDSI2007 (15%), DPI2002 (11%), PDSI2006 (11%), PDSI2003 (11%), Pr2007 (7%), Pr2004 (6%) | |
| US Temperate ecoregion | ||||
| 1981–1991 | 0.15 | 0.13 | Pr1983 (26%), Pr1987 (17%), Pr1982 (15%), Pr1990 (15%), DPI1988 (14%), Pr1988 (13%) | |
| 2001–2011 | 0.19 | 0.25 | Pr2001 (23%), Pr2006 (17%), Pr2004 (17%), Pr2008 (16%), Pr2010 (15%), Pr2003 (13%) |
Results of random forest/regression tree analysis for Canadian portion of Great Plains
| Entire Canadian portion of Great Plains | ||||
| 1981 to 1991 | 0.06 | 0.12 | Pr1987 (30%), PDSI1983 (21%), Pr1984 (10%), DPI1983 (9%), Pr1986 (9%), Pr1989 (6%), DPI1981 (6%), PDSI1981 (4%), DPI1984 (2%), pfirg1990 (2%), DPI1987 (1%) | |
| 1991 to 2001 | 0.15 | 0.20 | PDSI1998 (18%), PDSI1995 (18%), DPI1998 (14%), PDSI1991 (12%), pfirg2000 (9%), DPI1995 (8%), DPI1997 (7%), Pr1997 (6%), Pr1999 (6%), DPI1993 (2%) | |
| 2001 to 2011 | 0.13 | 0.17 | Pr2008 (17%), PDSI2006 (14%), PDSI2004 (13%), Pr2003 (12%), PDSI2009 (11%), Pr2001 (11%), Pr2002 (4%), PDSI2007 (3%) | |
| Canadian west-central semiarid ecoregion | 1971 to 1981 | 0.20 | 0.40 | DPI1975 (15%), HD1980 (15%), pfirg1980 (10%), DPI1980 (8%), PDSI1980 (8%), PDSI1974 (8%), pfirgchng19711981 (6%), PDSI1973 (5%), Pr1974 (5%), DPI1979 (5%), Pr1973 (4%), Pr1979 (4%), Pr1980 (3%), Pr1977 (2%) |
| 1981 to 1991 | 0.03 | 0.11 | PDSI1981 (30%), DPI1981 (27%), HD1988 (11%), PDSI1983 (11%), PrSum1984 (11%), DPI1988 (9%) | |
| 1991 to 2001 | 0.04 | 0.16 | pfirg2000 (27%), HD1998 (26%), pfirgchng19912001 (11%), HD1993 (10%), PrSum1999 (9%), DPI1997 (7%), DPI1993 (6%), HD1995 (3%) | |
| 2001 to 2011 | 0.01 | 0.07 | PDSI2004 (37%), DPI2008 (13%), DPI2010 (13%), PDSI2007 (13%), Pr2005 (13%), Pr2008 (13%) | |
| Canadian temperate ecoregion | ||||
| 1981 to 1991 | 0.07 | 0.12 | DPI1987 (31%), DPI1989 (20%), Pr1987 (16%), PDSI1989 (13%), Pr1985 (11%), Pr1981 (10%) | |
| 1991 to 2001 | 0.15 | 0.18 | Pr1995 (34%), DPI1995 (17%), PDSI1996 (17%), DPI1996 (13%), PDSI1993 (10%), HD1992 (9%) | |
| 2001 to 2011 | 0.18 | 0.24 | Pr2010 (29%), Pr2007 (16%), Pr2006 (15%), HD2009 (10%), HD2010 (10%), PDSI2007 (8%), DPI2007 (8%), DPI2008 (3%), DPI2003 (2%) |
Fig. 3 Counties/rural census units showing R² values > 0.25 for models of the temperate sub-region for the 1970s and 1990s. Counties in Canada and the US are distinguished by color (red and blue for USA, yellow for Canada)
Fig. 4 Residual error of random forest model for US temperate ecoregion for the 1970s