Swiss Municipal Data Merger Tool: Open-source Software for the Compilation of Longitudinal Municipal-level Data.

Abstract

The Swiss Municipal Data Merger Tool (Swiss MDMT) offers a solution to a frequent data management problem encountered when compiling longitudinal datasets with Swiss municipalities as the observational units. Due to municipal mergers, the number of municipalities in Switzerland declined from 3,095 in 1960 to 2,202 in 2020. As a consequence, manually securing the correct spatial reference when merging historical cross-sectional data is tedious and time-consuming. To facilitate this operation, the Swiss MDMT takes mutations at the municipal level into account and maps the municipalities that existed at a first point in time to those that exist at a second point in time, based on information provided by the Swiss Federal Statistical Office's municipality inventory. The tool is distributed as an open-source R package and is freely available on CRAN.

Keywords: Data management; historical data; municipal research; open‐source software; spatial reference

Year: 2021 PMID: 35937615 PMCID: PMC9344967 DOI: 10.1111/spsr.12487

Source DB: PubMed Journal: Schweiz Z Polit ISSN: 1424-7755

INTRODUCTION

Data at the municipal level constitute an important input for statistical analyses that investigate societal phenomena expected to differ across space. For applied statistical work, changes in geographical units over time pose a major challenge if data are to be aggregated or compared longitudinally. In the case of Switzerland, mergers of jurisdictions at the municipal level are of first‐order importance. The number of municipalities in Switzerland has decreased significantly in recent years; while there were still 2,899 municipalities in 2000, the corresponding number declined to 2,202 by 2020. This research note introduces the Swiss Municipal Data Merger Tool (Swiss MDMT), an open‐source R package that allows researchers to merge municipal level information from different points in time and to appropriately adjust the spatial references (German: Raumbezüge) so that a phenomenon is measured for the same spatial units over time. The software makes use of the Swiss Federal Statistical Office’s (SFSO) municipality inventory (German: Historisiertes Gemeindeverzeichnis), which is a database that contains standardized and up‐to‐date information about all mutations at the municipal level in Switzerland starting from January 1, 1960 (Verein eCH, 2014; Bundesamt für Statistik, 2017).

Longitudinal datasets are often provided with a unified spatial reference by the data owner. In contrast, the issue of differing spatial references often occurs when an analysis involves data from different data sources. Cross‐sectional data in particular are normally provided with the spatial reference of the point in time of collection, as the primary objective is to capture a particular phenomenon within the spatial structure at that time. For instance, municipal statistics about citizens' voting behavior or government finances are recorded for all the existing municipalities at a specific point in time.
However, in secondary data analysis, researchers often aim to study developments over time, for example, the evolution of issue‐specific voting behavior or local public finances. In fact, there is huge potential for the analysis of local contexts based on the Swiss municipal structure (Horber‐Papazian, 2007; Ladner, 1991; Steiner et al., 2014). This is further strengthened by the freely accessible information on municipal‐level politics from Andreas Ladner’s Communal Secretary Survey that started in 1988 (see, e.g., Haus & Ladner, 2020; Ladner & Steiner, 2003). For analyses of developments at the municipal level over time, however, the appropriate and comparable spatial reference first has to be secured.

One practice is to restrict the data sample to only those municipalities that were not involved in a mutation over the entire observation period. The spatial units are thus clearly comparable. However, with this approach, a possible selection effect is accepted because the subsample of municipalities that were not involved in a merger might be systematically different from the subsample of municipalities that underwent a transformation. In particular, the latter municipalities often tend to be smaller (and with regard to the aforementioned examples, might also be more rural and more likely to face fiscal difficulties). Another approach is to neglect the differing spatial references and merge municipalities in a “naïve” way by using the SFSO municipality number available at any given point in time, thereby accepting, first, structural breaks in the data series for the municipalities that emerged from a merger and, second, the exclusion of data from entities that no longer exist. A more appropriate approach could be to combine those historical spatial units that belong to a given spatial unit in the last observational period. This approach works for municipal level datasets for which the spatial aggregation is technically feasible.
This holds, for example, for results of popular votes. Corresponding datasets contain information on the number of yes and no votes for all the municipalities that exist at the time of the vote. For example, let us assume that there are two municipalities A and B at time t0 (i.e., the historical spatial units) and they merge to form municipality C at some later point in time t1 (i.e., the spatial unit in the last observational period). Furthermore, let us suppose that we are interested in the yes‐share in this new municipality C of a given popular vote which took place at t0. The yes‐share in C is then the sum of the yes votes of A and B divided by the total sum of the yes and the no votes of A and B. In other words, the yes‐share in C is the weighted average of the yes‐shares of A and B, with the number of yes and no votes cast in each municipality as weights.

In sum, we suggest a programmatic solution and offer a corresponding tool that automates this latter approach, as it is tedious and time‐consuming if performed manually. In the end, however, which of these approaches is best suited depends not only on the technical feasibility but also on the given research question.

The development of the Swiss MDMT relies on an official and well‐documented data standard (called eCH‐0071 Datenstandard Historisiertes Gemeindeverzeichnis der Schweiz) for the municipality inventory (see Verein eCH 2014). The existence of this e‐government standard influences the tool development in two ways: On the one hand, it allows for a straightforward implementation of the mapping algorithm, and thus facilitates tool development. On the other, the mere existence of this data standard suggests a reasonable degree of continuity with regard to the data model utilized, which is a decisive factor in spurring additional initiatives. Our tool fits and supports the more general enterprise in data science to make more and more data from an increasing range of sources publicly available.
For quantitative analyses, the stylized data pipeline commonly consists of, first, getting hold of the raw data, second, accessing the data sources and third, preprocessing data from potentially different data sources into the final dataset, ready for the actual analysis (for a discussion of this concept, see Matter, 2019 or Sebei et al., 2018). Along this data pipeline, various cost‐cutting developments can be observed: firstly, availability and straightforward accessibility of various data sources are enhanced by open data initiatives like the Swiss Open Government Data platform (for a general account, see Bundesblatt, 2019 and Bürgi‐Schmelz, 2019). Secondly, accessing public databases is facilitated by open source interfaces (OSI) which provide high‐level access to data from a web service by generating a flat dataset ready for statistical analyses from structured data available at that web service (Matter & Stutzer, 2015; Matter, 2018). One example of an existing OSI is the R package pvsR (Matter, 2014), which is an OSI for accessing data made available by Vote Smart, a research organization that gathers data about various topics in U.S. politics. Thirdly, a tool like the Swiss MDMT helps to reduce effort during the preprocessing stage. To summarize, all these diverse developments along the data pipeline help facilitate empirical research. In the next section, we describe the Swiss MDMT, which is available as an R package via CRAN (see Knechtl, 2020) and show a code example. The advantages of this approach compared to a naive merge are briefly illustrated in the subsequent section. The final section offers concluding remarks.

PROGRAM FUNCTIONALITY

In this section, the Swiss MDMT’s core functionality is presented. The program automatically detects and maps municipalities that existed at a given point in time in the past to municipalities that exist at a later point in time. The program interface simply requires the dates of these two points in time. In response, the program returns a mapping table, i.e., a table which links the former set of municipalities to the latter set of municipalities. The underlying mapping algorithm exploits the information available in the municipality inventory. Based on an example, we illustrate how the Swiss MDMT's mapping table is used in a complete merge.

The example addresses the question of how the electoral support for the facilitated naturalization of third generation immigrants in Switzerland evolved over time at the municipal level. To empirically assess this phenomenon, we use data from two referendums on this topic that took place in 1983 and 2017. Naturally, the spatial reference of these datasets on the vote outcome at the municipal level differs, as each dataset is based on the municipal structure of the respective point in time. The first dataset (vote 1) has the spatial reference capturing the situation as of December 4, 1983, and the second dataset (vote 2) the one as of February 12, 2017. To calculate the change in the yes‐share over time and per municipality, the following three steps are necessary: firstly, the spatial reference of the earlier dataset needs to be adjusted to that of the later one using the Swiss MDMT's mapping table. Secondly, on this basis, the yes‐share of vote 1 is calculated for the same spatial reference as vote 2. Thirdly, the two datasets with the yes‐share are merged and the changes in the yes‐share are calculated. The following code examples in R are presented so that they can be easily understood by researchers who have some basic knowledge of R.
If the tool is being used for the first time, the package needs to be installed from CRAN. Furthermore, in a new R session, the package needs to be loaded. If the municipality inventory database is not yet available locally, it is obtained with the following command, which fetches the database and stores it in the current working directory of the R process. This command only needs to be executed once as it makes the municipality inventory locally accessible. Once the R package is installed and the municipality inventory downloaded, the import function imports the database as an R object into the R environment. The mutations object is then used considering the two dates to obtain the mapping table. Table 1 displays an excerpt of the program output. The first record, i.e. Wiliberg, shows an unaltered municipality. Records two and three capture municipalities that were part of a merger, i.e., the municipality of Zofingen incorporated the municipality of Mühlethal in 2002, which, in turn, no longer exists.

TABLE 1

An excerpt of the mapping table which is the main output of the Swiss MDMT

New state (February 12, 2017)		Old state (December 4, 1983)
SFSO number	Municipality name	SFSO number	Municipality name
⁝	⁝	⁝	⁝
4288	Wiliberg	4288	Wiliberg
4289	Zofingen	4289	Zofingen
4289	Zofingen	4278	Mühlethal
⁝	⁝	⁝	⁝
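
As a minimal sketch of the structure of this table, the three records of the excerpt can be written down by hand as an R data frame. The object below is typed in manually and is not package output. The columns bfs_nr_new and bfs_nr_old follow the naming used for the vote datasets in this section, whereas name_new and name_old are assumed column names chosen for illustration.

```r
# Hand-built copy of the three records shown in Table 1 (not package output).
# name_new and name_old are assumed column names used for illustration only.
mapping_table <- data.frame(
  bfs_nr_new = c(4288, 4289, 4289),
  name_new   = c("Wiliberg", "Zofingen", "Zofingen"),
  bfs_nr_old = c(4288, 4289, 4278),
  name_old   = c("Wiliberg", "Zofingen", "M\u00fchlethal"),
  stringsAsFactors = FALSE
)

# Count how many old municipalities map to each new municipality.
n_old_units <- table(mapping_table$bfs_nr_new)

# New municipalities that are linked to more than one old municipality.
merged_new <- as.integer(names(n_old_units[n_old_units > 1]))

# Old SFSO numbers that are no longer in use in the new state.
vanished_old <- setdiff(mapping_table$bfs_nr_old, mapping_table$bfs_nr_new)

print(merged_new)
print(vanished_old)
```

In this excerpt, a new SFSO number that occurs in more than one row marks a municipality that resulted from a merger, and an old number that is absent from the new state marks a municipality that ceased to exist.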

This mapping table is the key element in performing a complete merge of the datasets in question. To exhibit the steps involved, let us assume that both popular vote datasets are available in the R environment and named as follows: vote_1_1983 and vote_2_2017. The two tables contain the following columns: yes_votes and no_votes. Furthermore, the former dataset contains a column bfs_nr_old, whereas the latter contains a column bfs_nr_new. Note that for this code example, we use the functionality of further R packages which are thus loaded as well. To aggregate the data of the earlier vote, the following steps are required: first, municipalities that merged are grouped together. The mapping table contains the relevant information about which municipalities existing on December 4, 1983 belong together on February 12, 2017. Second, the grouped yes and no votes are added up. Third, the yes‐share of vote 1 is calculated with the same spatial reference as vote 2. For the second dataset, which already has the proper spatial reference, the yes‐share is calculated in a straightforward way. As the two datasets now share a common spatial reference, they can be merged by a join command, achieving a complete merge. In a final step, the changes in the yes‐share can be calculated. The complete_merge object now contains the changes in electoral support for a facilitated naturalization of third generation immigrants across Swiss municipalities. This final dataset could now be used for further analyses of how citizens' preferences on naturalization changed over time.
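
The steps of the complete merge can be sketched as follows. This is an illustration under stated assumptions and not the original listing: it uses base R only instead of additional packages, the mapping table is typed in by hand from the excerpt in Table 1, and all vote counts are invented. The object and column names (vote_1_1983, vote_2_2017, bfs_nr_old, bfs_nr_new, yes_votes, no_votes, complete_merge) are those named in the text; yes_share_1, yes_share_2 and change_yes_share are hypothetical names.

```r
# Hand-built mapping table (excerpt of Table 1) and INVENTED vote counts.
mapping_table <- data.frame(
  bfs_nr_new = c(4288, 4289, 4289),
  bfs_nr_old = c(4288, 4289, 4278)
)
vote_1_1983 <- data.frame(
  bfs_nr_old = c(4288, 4289, 4278),
  yes_votes  = c(20, 1500, 10),
  no_votes   = c(30, 1500, 40)
)
vote_2_2017 <- data.frame(
  bfs_nr_new = c(4288, 4289),
  yes_votes  = c(40, 2400),
  no_votes   = c(40, 1600)
)

# Step 1: attach the 2017 municipality number to every 1983 record.
vote_1_mapped <- merge(vote_1_1983, mapping_table, by = "bfs_nr_old")

# Step 2: add up yes and no votes within each 2017 municipality.
vote_1_agg <- aggregate(cbind(yes_votes, no_votes) ~ bfs_nr_new,
                        data = vote_1_mapped, FUN = sum)

# Step 3: yes-share of vote 1 in the spatial reference of vote 2.
vote_1_agg$yes_share_1 <- vote_1_agg$yes_votes /
  (vote_1_agg$yes_votes + vote_1_agg$no_votes)

# Vote 2 already has the proper spatial reference.
vote_2_2017$yes_share_2 <- vote_2_2017$yes_votes /
  (vote_2_2017$yes_votes + vote_2_2017$no_votes)

# Join on the common spatial reference and compute the change.
complete_merge <- merge(vote_1_agg[, c("bfs_nr_new", "yes_share_1")],
                        vote_2_2017[, c("bfs_nr_new", "yes_share_2")],
                        by = "bfs_nr_new")
complete_merge$change_yes_share <-
  complete_merge$yes_share_2 - complete_merge$yes_share_1

# Check: the aggregated share equals the vote-weighted average of the
# yes-shares of the old municipalities that form municipality 4289.
old <- vote_1_mapped[vote_1_mapped$bfs_nr_new == 4289, ]
old_total <- old$yes_votes + old$no_votes
weighted_avg <- weighted.mean(old$yes_votes / old_total, w = old_total)

print(complete_merge)
```

The final check illustrates why summing the counts before dividing is the appropriate aggregation here: an unweighted mean of the two old yes‐shares would give the small municipality the same influence as the large one.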

COMPARISON OF THE COMPLETE MERGE WITH THE NAIVE MERGE STRATEGY

The advantage of using the complete merge strategy rather than a naive merge strategy in the example presented above is twofold. Firstly, observations are lost if the naive merge strategy is employed because the municipality numbers do not match (for example, if a new SFSO number was assigned after a merger of municipalities). In the example above, this is the case for 138 out of 2,216 municipalities in the later period. Secondly, a bias is introduced in 122 cases because the same SFSO number no longer refers to the same spatial unit (and thus parts of the past municipality level information are ignored, for example, from Mühlethal, see Table 1). For our example, Figure 1 illustrates the resulting differences in the changes in the yes‐shares between the two data merge strategies. In total, 260 of the 2,216 municipalities, i.e., 11.7% of the municipal data (regarding the stock in the later period), are either not considered or erroneous.
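
Both problems can be reproduced on a small scale. The following sketch uses invented vote counts and base R only; the SFSO numbers 9001, 9002 and 9100 are hypothetical and stand for two municipalities that merged under a newly assigned number, while 4288, 4289 and 4278 are taken from Table 1.

```r
# INVENTED vote counts; 9001, 9002 and 9100 are hypothetical SFSO numbers.
vote_1_1983 <- data.frame(
  bfs_nr    = c(4288, 4289, 4278, 9001, 9002),
  yes_votes = c(20, 1500, 10, 100, 300),
  no_votes  = c(30, 1500, 40, 100, 100)
)
vote_2_2017 <- data.frame(
  bfs_nr    = c(4288, 4289, 9100),
  yes_votes = c(40, 2400, 500),
  no_votes  = c(40, 1600, 300)
)

yes_share <- function(d) d$yes_votes / (d$yes_votes + d$no_votes)
vote_1_1983$yes_share_1 <- yes_share(vote_1_1983)
vote_2_2017$yes_share_2 <- yes_share(vote_2_2017)

# Naive merge: inner join on the SFSO number, ignoring all mutations.
naive_merge <- merge(vote_1_1983[, c("bfs_nr", "yes_share_1")],
                     vote_2_2017[, c("bfs_nr", "yes_share_2")],
                     by = "bfs_nr")
naive_merge$change <- naive_merge$yes_share_2 - naive_merge$yes_share_1

# Problem 1: municipalities of the later period without a match are lost.
lost_new <- setdiff(vote_2_2017$bfs_nr, naive_merge$bfs_nr)

# Problem 2: for 4289 the 1983 votes of 4278 are ignored in the naive merge.
parts <- vote_1_1983[vote_1_1983$bfs_nr %in% c(4289, 4278), ]
complete_share_1 <- sum(parts$yes_votes) /
  sum(parts$yes_votes + parts$no_votes)
complete_change <- vote_2_2017$yes_share_2[vote_2_2017$bfs_nr == 4289] -
  complete_share_1
naive_change <- naive_merge$change[naive_merge$bfs_nr == 4289]

# Negative deviation: the naive strategy underestimates the change.
deviation <- naive_change - complete_change

print(lost_new)
print(deviation)
```

With these invented numbers the municipality with the new number drops out of the naive merge entirely, and the deviation for 4289 is small because the absorbed municipality is small relative to the absorbing one.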

FIGURE 1

Deviations from the actual changes in the yes‐shares. Notes: This figure shows the deviations from the actual changes in yes‐shares over time between the complete merge and the naive merge strategy for the 122 mutated cases for which the SFSO municipality number of one of the merging municipalities was carried over for the new entity. A positive/negative deviation signifies that the naive strategy overestimates/underestimates the change in the yes‐share.

The deviation between the two strategies can be characterized along another dimension, i.e., the fraction of data points from original municipalities in the first period of time that is not considered when a naive merge is conducted. Interestingly, this fraction is not distributed uniformly across Swiss cantons because municipality mutations have occurred more often in certain regions of Switzerland than in others. Figure 2 depicts the fraction of municipalities as of January 1, 1960 that are considered in a naive merge with municipal data as of January 1, 2020 by canton. The canton of Glarus marks the extreme case: no observations would be included in a merged dataset. But also for the cantons of Neuchâtel, Fribourg, Ticino, Graubünden, and Thurgau, information from less than 50% of the original municipalities would be considered.

FIGURE 2

Selection of municipalities included by canton when performing a naive merge between January 1, 1960 and January 1, 2020. Notes: The percentage of original municipalities included if a naive merging strategy is adopted varies across Swiss cantons. For cantons without mutations, 100% of the municipalities are considered in a naive merge, whereas the ratio of considered municipalities is below 100% for cantons where mutations occurred between January 1, 1960 and January 1, 2020.
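
A per-canton share of this kind could be computed along the following lines. The sketch rests on an assumption about the definition: an original municipality counts as considered if its SFSO number is still in use in the later state. The cantons X, Y and Z, all SFSO numbers, and the column names canton and considered are hypothetical.

```r
# HYPOTHETICAL old state: cantons X, Y, Z with invented SFSO numbers.
old_state <- data.frame(
  canton     = c("X", "X", "X", "Y", "Y", "Z", "Z"),
  bfs_nr_old = c(1, 2, 3, 11, 12, 21, 22)
)

# HYPOTHETICAL SFSO numbers in use in the new state:
# 3 was absorbed by 1, and 21 and 22 merged under the new number 30.
bfs_nr_new <- c(1, 2, 11, 12, 30)

# Assumed definition: considered if the old number still exists.
old_state$considered <- old_state$bfs_nr_old %in% bfs_nr_new

# Share of original municipalities considered in a naive merge, by canton.
share_by_canton <- aggregate(considered ~ canton, data = old_state, FUN = mean)

print(share_by_canton)
```

Canton Z mirrors the extreme case described above: when every merger results in a newly assigned number, no original municipality finds a match.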

CONCLUDING REMARKS

The Swiss MDMT automates the process of securing identical spatial references for different datasets as illustrated in the example above. The tool is meant to improve the quality of merged datasets compared to that of alternative procedures. Nevertheless, the appropriate strategy for the selection of municipalities that are considered in a longitudinal analysis depends on the research question. The same holds for how municipal level information is aggregated. If municipal mergers might affect the outcome variable per se, a separate selection and analysis of municipalities that were involved and not involved in mergers might be warranted. For example, consider the case of the municipal tax rate. The weighted mean tax rate might be a poor representation of a local tax situation in the past if the study is about tax competition. For tax competition, the variation in tax rates across spatial units is crucial and some weighted average tax rate might not easily serve as a counterfactual tax rate for the situation that would have existed if the municipalities had already been merged in the past.

Regarding the limits to aggregation, there might be institutional municipality characteristics that cannot be meaningfully aggregated. For example, if the financial auditing in municipalities (German: Rechnungsprüfungskommission) that merged was organized differently, it is not readily possible to define and calculate a variable representative of these institutional conditions. Similar limitations might also arise even for characteristics that can be cardinally measured, like the size of the municipal council.

In the future, the Swiss MDMT could be extended with analytical functionality regarding the evolution of municipalities in Switzerland, such as one that lists all mutations for a given municipality within a given time period. Meanwhile, the tool will hopefully make life easier for researchers studying questions in the cosmos of Swiss local federalism.

Open research badges

This article has earned an Open Materials badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available at https://cran.r-project.org/package=SMMT.

REFERENCES

Matter, U., & Stutzer, A. (2015). pvsR: An Open Source Interface to Big Data on the American Political Sphere. PLoS One.