Literature DB >> 28398288

Spatiotemporal historical datasets at micro-level for geocoded individuals in five Swedish parishes, 1813-1914.

Finn Hedefalk1,2, Patrick Svensson2, Lars Harrie1.   

Abstract

This paper presents datasets that enable historical longitudinal studies of micro-level geographic factors in a rural setting. These types of datasets are new, as historical demography studies have generally failed to properly include the micro-level geographic factors. Our datasets describe the geography over five Swedish rural parishes, and by linking them to a longitudinal demographic database, we obtain a geocoded population (at the property unit level) for this area for the period 1813-1914. The population is a subset of the Scanian Economic Demographic Database (SEDD). The geographic information includes the following feature types: property units, wetlands, buildings, roads and railroads. The property units and wetlands are stored in object-lifeline time representations (information about creation, changes and ends of objects are recorded in time), whereas the other feature types are stored as snapshots in time. Thus, the datasets present one of the first opportunities to study historical spatio-temporal patterns at the micro-level.

Entities:  

Year:  2017        PMID: 28398288      PMCID: PMC5387924          DOI: 10.1038/sdata.2017.46

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

Studying how micro-level geographic factors have influenced human living conditions over long time periods provides many new and important insights. However, in the field of historical demography studies have primarily been limited to the macro level due to a lack of individual-level and longitudinal geocoded historical databases to enable studies at the micro-level. Currently, only a few historical longitudinal demographic databases have geocoded individuals to more precise locations, such as streets or property units[1]. In most longitudinal historical populations, individuals are linked to a less precise position such as villages, parishes or other larger administrative units. Most historical demographic databases that are geocoded on a more detailed level (e.g., ref. 2) only cover a snapshot in time, which is incompatible with longitudinal analyses. The datasets presented here adds geographic information to the Scanian Economic Demographic Database (SEDD). The SEDD covers an area of five rural parishes (approximately 130 km2) in southern Sweden (Fig. 1). The SEDD includes detailed demographic and economic information from 1646 to the present and is the result of project collaborations between the Centre for Economic Demography (CED) and the Regional Archives in Lund[3]. The SEDD has been extensively used in historical research (e.g., refs 4–6). All individuals in the SEDD are followed from birth or in-migration to death or out-migration. The database contains continuous information on the family and household structure, and on dates of births, marriages, and deaths occurring in the parishes. Characteristics such as socioeconomic status, as well as causes of death, are known for the period 1813–1914. The primary sources for the SEDD are vital registers, catechetical examination registers and annual poll-tax registers. However, the SEDD population had not been previously geocoded on a detailed level. In a recent study, we geocoded a subset of SEDD with approximately 53,000 individuals for the time period 1813–1914 (ref. 7). This geocoding enabled the first longitudinal studies of geographic factors at the micro-level[8].
Figure 1

The five parishes (Hög, Kävlinge, Sireköpinge, Halmstad and Kågeröd) covered in the datasets.

Three types of datasets for the five rural parishes are provided in this paper: 1) property units (which the individuals in the SEDD are linked to); 2) historical wetlands; and 3) additional historical geographic data, such as infrastructure and buildings. The primary sources of the geographic data are historical maps and cadastral dossiers that encompass the five parishes for the period 1757–1914, as well as textual sources, such as annual poll-tax registers. The property units and wetlands are stored in temporal representations of longitudinal object-lifelines[9]. That is, for each property unit, we know when it was created, reformed, and ceased to exist; for the wetlands, we know when they were drained. The reason for storing wetlands in object-lifelines is to enable accurate proximity analyses for the whole study period. Proximity to wetlands and still water may indicate exposure to various diseases such as water-borne diseases and malaria (until the 20th century, malaria was a problem in Europe, and wetlands likely provided habitats for mosquitoes that transmit malaria[10]). Therefore, the possibility of estimating the exposure to wetlands is necessary for researchers to understand causes of mortality in historical societies. These wetlands were gradually drained during the 19th and 20th centuries; thus, their lifelines were estimated to enable estimations of dynamic geographic variables. Besides, the drainage of wetlands substantially changed the landscape and likely increased the agricultural productivity of the land. Thus, it is possible to study how these interventions affected the individual-level economic conditions of the farmers. Finally, the additional historical geographic data (roads, buildings and water) were digitized from several series of historical maps as temporal snapshots at single points in time. Consequently, we have annual information about the shape of each property unit, which enable us to accurately trace the residential histories of individuals in time and space. Combined with the historical geographic data and other modern geographic data, such as elevation and soil conditions, the datasets can be used for spatio-temporal analysis, as well as the inclusion of geographic factors in longitudinal analyses of historical demographic data. Thus, they present opportunities to address novel research questions not only in the fields of historical demography and economic and agrarian history but also in many other fields, such as epidemiology, medicine and geography.

Methods

The creation of the datasets consisted of three process steps: (1) creating geographic snapshot data by georeferencing and digitizing historical maps; (2) transforming the snapshot property unit and wetland data into an object-lifeline representation using Supplementary data; and (3) linking individuals in the SEDD to the property units in which they lived. The digitized buildings, communications and streams remained in their snapshot representations.

Data sources

The maps were obtained in digital format from the Lantmäteriet (the Swedish mapping, cadastral, and land registration authority). The historical maps originate from four map series: land survey maps (LSMs), military topographical survey maps (MTSMs), topographic maps (TMs), and economic maps (EMs). The 150 digitized cadastral dossiers (CDs) were primarily used to document changes in the property units and to create an object-lifeline representation. As shown in Table 1 and Fig. 2, the maps were created at different scales and for different purposes. The LSM and CD maps primarily focused on documenting juridical borders, such as property units, but also documented buildings and land use, communications and the natural landscape. The MTSM and TM were constructed for military purposes and thus, focused on the natural landscape, land use, topography, communications and physical objects, such as buildings. The EMs documented property unit borders and the natural landscape, land use, communications and buildings.
Table 1

Number of map documents digitized from historical maps.

Map seriesYearsNo. of map documentsScaleDigitized objects
Land Survey Maps (LSMs)1757–1863391:4,000–1:8,000Property units, buildings
Military Topographical Survey Maps (MTSMs)1812–1820111:20,000Roads, buildings, streams, wetlands, lakes
Topographic Maps (TMs)1860–186521:100,000Roads, buildings
Economic Maps (EMs)1910–191571:20,000Property units, buildings, roads, railroads,
Cadastral Dossiers (CDs)1757–1914Approximately 1501:1,000–1:8,000Property units
Figure 2

Four maps that show Videröra mansion at the border of Halmstad parish.

(a) LSM from 1831. (b) MTSM from 1812–1820. (c) TM from 1860. (d) EM from 1910–1915. (Source: ref. 17).

For supplemental textual data, we used the Swedish poll-tax registers that cover the study period, which contain the addresses, taxation value (of the property unit), and household and owner information of each person who had to pay taxes. These individual taxation values can be used to detect possible geometric changes in the property units and estimate the lifelines of the property units.

Georeferencing and digitizing historical maps

We georeferenced and digitized the historical maps and cadastral dossiers in Table 1. In the georeferencing process, we used orthophotos from 1940 and modern topographic web maps to identify common points in the historical maps, such as churches, terrain features, crossroads, and river junctions. The historical maps were transformed using a piecewise interpolation method (spline). Geographic objects, such as property units, buildings, roads, railroads and streams, were digitized as temporal snapshots from the maps. The georeferencing and digitalization steps were performed using ArcGIS Desktop 10.3 (http://www.esri.com/software/arcgis/arcgis-for-desktop). Historical and digitized wetlands from the MTSM series were received from the Swedish County Administrative Board of Skane.

Creating object-lifeline representation of the property units and geocoding individuals

In this step, we transformed the snapshot data into an object-lifeline model and linked individuals to the property units in which they lived. This method is described in ref. 7. To create an object life-line representation of the property units we combined the digitized property units from the LSMs, CDs, and EMs with information from the poll-tax registers and cadastral dossiers. With this information, we learned when the property units were created, changed, and ceased to exist. The main aim of the geocoding was to establish links between the records in the poll-tax registers and the digitized property units, which was simultaneously performed with the estimation of the property units’ lifelines. For the geocoding, we primarily used information from the Swedish poll-tax registers, which contains the addresses (property units) of each person who had to pay taxes. The Swedish poll-tax registers are on a household level, which indicates that only the head of the household is noted with a full name. However, each individual in the SEDD is linked to the families and households to which they belong, and most of these families and households have a correspondence in the poll-tax registers. In this project, we used a subset of the SEDD from 1813 to 1914, which contains approximately 53,000 individuals, 10,000 households and 184,500 annual poll-tax records, in which 90.4% of the individuals experiencing an event in the demographic data are linked to the poll-tax registers (96.2% for the period 1829–1914 when continuous population registers exist for all five parishes). The subset contains all the individuals living in the five parishes for the period 1813–1914. We began the geocoding in 1,813 as information about the individuals’ migration within and between parishes from this year and onwards is available. We stopped in 1914 due to personal integrity constraints. The geocoding and creation of object-lifelines required extensive investigation for several reasons. For instance, before the land reforms (conducted between 1757 and 1849 in the parishes), all individuals lived in small villages and cultivated nearby scattered plots. After the land reforms, the self-owned farmers received a cohesive piece of land, to which they also moved. We denote these lands as property units (Fig. 3). Although the property units were usually devoted to agriculture, some of the units also contained forest lands. Throughout the study period, several of the property units were subdivided or partitioned into smaller units (consistent with rapid population growth). However, the property units did not receive new addresses; thus, multiple property units often share addresses. We denote the set of such units as an address unit (Fig. 3). With some exceptions, they are located close and adjacent to each other. The number of property units per address unit, as well as the total number of property units, was constantly increasing throughout the study period (cf. Fig. 4).
Figure 3

Property units and address units in 1871 in Halmstad parish.

Property units that share an address (identical colour in the map) constitute one address unit. For example, the two property units with the address Hasslebacken 07 constitute one address unit. Only the address units are labelled.

Figure 4

Changes of address units and property units for the period 1813–1914.

(a–d) Percentage of address units constituted by a given number of property units for the years 1813, 1850, 1880 and 1914; the values on the bars represent number of address units with the specific amount of property units. (e) Total number of property units for the years 1813, 1850, 1880 and 1914.

The geocoding on the address unit level was more straightforward as the poll-tax register contains annual information about the address unit for the head of the family. However, we also aimed to geocode on the property unit level. To mitigate the problem of units that share addresses, we used taxation values in the poll-tax registers combined with textual sources in the maps and cadastral dossiers to separate the units that share addresses. We also traced the owner histories of the property units when property units shared both taxation values and addresses[7]. For several records, extensive manual research was required to achieve a substantial number of reliable links.

Creating object-lifeline representation of wetlands

To create an object-lifeline representation of the wetlands in the study area, we combined information about 1) digitized historical wetlands from the MTSMs from 1812–1820 (provided by the Swedish County Administrative Board of Skane); 2) modern wetland and water data; 3) soil type data; 4) the EM from 1910–1915; and 5) data regarding registered joint drainage units before 1920 within the five parishes. For each digitized wetland from the 1812–1820 MTSMs, the end date was estimated using overlapping joint drainage units. These drainage units were joint initiatives by several property unit owners with the purpose of draining a specific area and sharing the cost. These drained areas were registered during the 19th century and have been scanned and digitized[11]. Thus, a drainage unit that covers a wetland indicates that the wetland was drained. Observations of wetlands and water bodies in the EMs from 1910–1915 were also compared with the digitized wetlands, the modern digitized wetlands and other water bodies. A soil type map[12], in which soil types such as peat and muddy sediments indicate historical wet areas, was also used. Because of the uncertainty in several of the start and end dates of the wetlands, we also used intervals representing the minimum and maximum start and end date. For example, if a digitized wetland from the 1812–1820 MTSM was not observed in the 1910–1915 EM nor had any overlapping joint drainage unit, its maximum end date was set to 1909. Moreover, if a wetland was observed as being more similar in shape and size to a modern wetland, the maximum start date of the digitized modern wetland was set to 1910. Figure 5 shows the property units and wetlands in the object-lifeline representation of the parishes Sireköpinge, Halmstad and Kågeröd. In the figure, the temporal dimension is displayed on the z-axis, and the height of each object represents its existence in time. Note that several wetlands are not visible in the figure because they are covered by property units (i.e., they disappeared before 1914). Figure 5b shows an example of the property unit Bångstorp 01, which existed for the period 1801–1885. Thereafter, it was subdivided, and a new property unit (also with the address Bångstorp 01) was created in 1885. The maximum end time of the property units is 1914, whereas the maximum end time for the wetlands is 2007.
Figure 5

Property units and wetlands (blue) in object-lifeline representations.

(a) The parishes Sireköpinge, Halmstad and Kågeröd (b) Enlarged area of some property units (including Bångstorp 01).

Data Records

The data records contain eight datasets on ESRI Shapefile format (Table 2); each contains four files with the extensions.shp,.dbf,.shx, and.prj. The datasets are stored in the Harvard Dataverse (Data Citation 1). The spatial reference for all geographic data is the official Swedish geodetic reference system Sweref 99 TM (EPSG: 3,006), which is a UTM 33N projection of the Swedish realization of the European Terrestrial Reference System 1989 (ETRS89). The attributes for each of the datasets are described in Tables 3–7. Empty cell values in the datasets represent missing information. In addition to the information in Tables 3–7, the data records contain a codebook (Codebook_SEDDGeo.pdf) that lists the classifications used in the categorical attributes.
Table 2

Overview of the datasets.

LayerDataset nameGeometryTime representationNo. of objects
‘No. of objects’ represents the number of geographic objects such as building points or road segments.    
Property Unitsproperty_units_SEDD.shpPolygonObject-lifelines1,165
WetlandsWetlands.shpPolygonObject-lifelinesMTSM: 364Modern: 195
BuildingsBuildings_polygons.shpPolygonSnapshotsLSM: 385MTSM: 438
BuildingsBuildings_points.shpPointSnapshotsTM: 638EM: 4,216
RoadsRoads.shpLineSnapshotsMTSM: 839TM: 433EM: 2,060
RailroadsRailroads.shpLineSnapshotsEM: 150
StreamsStreams.shpLineSnapshotsMTSM: 99
LakesLakes.shpPolygonSnapshotsMTSM: 40
Table 3

Property Units (property_units_SEDD.shp).

AttributesTypeDescription
FIDIntegerShapefile automatic identifier
ShapeGeometryGeometry of the object (polygon).
PolygonIdStringUnique identifier of the polygon. Created by the authors.
PuIdStringProperty unit identifier created by the authors. This identifier establishes links on the property unit level between the geographic data and the individuals in the SEDD (cf. section ‘Links to the Scanian Economic Demographic Database (SEDD)’.
AuIdStringAddress unit identifier created by the authors. This identifier establishes links on the address unit level between the geographic data and the individuals in the SEDD (cf. section ‘Links to the Scanian Economic Demographic Database (SEDD)’.
ParishStringThe parish that the property unit belongs to.
addrNameStringAddress name (e.g., ‘Hög’).
addrCodeStringAddress number (e.g., 04).
addrLetterStringAddress part (e.g., ‘B’ or ‘1’).
sDateStringStart date of the object (usually only the year).
eDateStringEnd date of the object (usually only the year).
sDateMinStringThe earliest start date (usually only the year), which represents an uncertainty interval of the object’s lifeline. That is, the object may exist as early as indicated in this attribute.
eDateMaxStringThe latest end date (usually only the year), which represents an uncertainty interval of the object’s lifeline. That is, the object may exist as late as indicated in this attribute.
obsDateStringDate of observation (=date of the map from which the object was digitized) (usually only the year).
mapCodeStringCode of the historical map from which the object was digitized. From the Lantmäteriet historical map archive[17].
mapNameStringName of the historical map from which the object was digitized. From the Lantmäteriet historical map archive[17].
mapSeriesStringName of the map series of the historical map from which the object was digitized.
taxValueFloatTaxation value of the property unit (Swedish: Mantal). A ratio value indicating the productivity of the unit.
propFormStringType of property formation that created or altered the property unit (In Swedish; e.g., Hemmansklyvning).
propFormEnStringEnglish translation of the propForm attribute (e.g., subdivision).
ownerFNObsStringFirst name of the owner of the property unit when observed in the map.
ownerLNObsStringLast name of the owner of the property unit when observed in the map.
SubtypeStringType of property unit (e.g., croft).
notePUStringComments made about the property unit when it was digitized (sometimes a transcription of a text part in the map document or cadastral dossier). This information is primarily relevant if the property units were to be improved or changed.
noteTimEstStringComments about the lifeline estimation of the property unit/object. This information is primarily relevant if the lifeline estimation were to be improved.
Shape_LengFloatCircumference of the property unit polygon (meters).
Shape_AreaFloatArea of the property unit polygon (square meters).
parishCodeStringNational parish codes from Statistics Sweden (SCB).
Table 4

Wetlands in object-lifelines (Wetlands.shp).

AttributesTypeDescription
FIDIntegerShapefile automatic identifier.
ShapeGeometryGeometry of the object (polygon).
wetlandIdStringUnique identifier of the wetland. The same identifier is assigned to those overlapping historical and modern wetland polygons that represent the same wetland (but with different geometric shapes because of, e.g., drainage). Created by the authors.
sDateMinIntegerThe earliest start date (usually only the year), which represents an uncertainty interval of the object’s lifeline. That is, the object may exist as early as indicated in this attribute.
sDateIntegerSpecific start date (usually only the year). Almost always used when a joint drainage unit explicitly determines the start date of a modern wetland. E.g., a wetland was partially drained in 1889, and an overlapping, smaller, wetland can be observed in the modern data, this wetland is assigned the start date of 1890.That is, we assume that at this point in time, the modern wetland got its present geometric shape.The value ‘9999’ indicates that the wetland start within the interval that is defined by the sDateMin and sDateMax.
sDateAvgIntegerAn approximate start date for the wetlands observed in the modern data (usually only the year). Calculated as sDateMin+(eDateMin−sDateMin)/2. E.g., if a wetland observed in the modern data can earliest start in 1821, and earliest end at 2007 (the year it was digitized), its sDateAvg=1914.
sDateMaxIntegerThe latest possible start date. Most often the year when the wetland was observed in the map.
eDateMinIntegerThe earliest possible end date (usually only the year), which represents an uncertainty interval of the object’s lifeline. That is, the object exists to at least the date indicated in this attribute. This date correspond often to the observation in the map. E.g., if a wetland was digitized from the MTSM map in 1820, eDateMin is set to 1820.
eDateIntegerA specific date which most often indicate the year when a joint drainage was carried out on the wetland (usually only the year). The value ‘9999’ means that no joint drainage has been observed and that the wetland therefore stops to exist within the interval eDateMin and eDateMax.
eDateAvgIntegerUsed mostly for the historical wetlands observed in the 1820 MTSM map (usually only the year). Calculated as: sDateMax+(eDateMin−sDateMax)/2. If eDate is a specific date, eDateAvg is assigned the value of eDate.
eDateMaxIntegerThe latest possible end date (usually only the year). The value ‘9999’ is used for the wetlands observed in the modern data and means ‘and onwards’ (i.e., it still exist and we do not know when it will stop existing).
obsDateIntegerDate of observation (=date of the map from which the wetland was digitized) (usually only the year).
SourceStringThe source of the wetland object.
wLandTypeStringType of wetland (only for wetlands observed in the modern data). Marshy or Very marshy.
NoteStringNote about the wetland object. This information is mainly relevant if the lifeline estimation were to be improved.
drainNameStringName of the joint drainage related to the wetland.
Table 5

Buildings (Buildings_polygons.shp, Buildings_points.shp).

AttributesTypeDescription
FIDIntegerShapefile automatic identifier.
ShapeGeometryGeometry of the object (polygon/point).
ParishStringThe parish that the building is located in.
noteBuildStringNote about the building.
addrNameStringAddress name (e.g., ‘Hög’, or ‘Kågeröd church’).
addrCodeStringAddress number (e.g., 04).
addrLetterStringAddress part (e.g., ‘B’ or ‘1’).
obsDateStringDate of observation (=date of the map from which the object was digitized) (usually only the year).
mapCodeStringCode of the historical map from which the building was digitized. From the Lantmäteriet historical map archive[17].
mapNameStringName of the historical map from which the building was digitized. From the Lantmäteriet historical map archive[17].
mapSeriesStringName of the map series of the historical map from which the building was digitized.
subTypeStringType of building (e.g., church).
nameOnMapStringName or number on the historical map.
buildingIdStringUnique identifier of the building. Created by the authors.
parishCodeStringNational parish codes from Statistics Sweden (SCB).
Table 6

Roads and railroads (Roads.shp, Railroads.shp).

AttributesTypeDescription
FIDIntegerShapefile automatic identifier.
ShapeGeometryGeometry of the object (line).
obsDateStringDate of observation (=date of the map from which the object was digitized) (usually only the year).
mapCodeStringCode of the historical map from which the object was digitized. From the Lantmäteriet historical map archive[17].
mapNameStringName of the historical map from which the object was digitized. From the Lantmäteriet historical map archive[17].
mapSeriesStringName of the map series of the historical map from which the object was digitized.
subTypeStringType of road (e.g., passage way, regional way) or railroad (undefined).
roadId/railIdStringUnique identifier of the road/railroad segment. Created by the authors.
noteRoad/noteRailStringNote about the road/railroad.
Table 7

Streams and lakes (Streams.shp, Lakes.shp).

AttributesTypeDescription
FIDIntegerShapefile automatic identifier.
ShapeGeometryGeometry of the object (line or polygon).
lakeId/streamIdStringUnique identifier of the lake or stream. Created by the authors.
obsDateStringDate of observation (=date of the map from which the object was digitized) (usually only the year).
NameStringName of the object.
mapCodeStringCode of the historical map from which the object was digitized. From the Lantmäteriet historical map archive[17].
mapNameStringName of the historical map from which the object was digitized. From the Lantmäteriet historical map archive[17].
mapSeriesStringName of the map series from which the object was digitized.
subTypeStringType of lake/stream.
noteStream/noteLakeStringNote about the stream/lake.
Shape_LengFloatLength of the stream segment/circumference of the lake polygon (in meters).
Shape_AreaFloatArea of the lake polygon (in square meters) (only for Lake.shp).

Links to the scanian economic demographic database (SEDD)

The geocoded demographic information from the SEDD is freely obtained from http://www.ed.lu.se/databases/sedd/sedd-public-access[3]. To access the dataset, the users need to send an e-mail request to the Centre for Economic Demography (for reporting data use). The SEDD dataset is longitudinal and contains individual event information on the total population within the five parishes for the period 1813–1914 (cf. ref. 13). The structure of the dataset is a table in which each row contains the identifier of one unique individual and supplied attributes of the individual. The table also contains events of specific occurrences, such as birth, marriage, outmigration and death. Each individual has one or more episodes, which is the period of time when all variables connected to the individual are constant. These episodes are defined by start- and end-dates. Each time a variable changes a value; e.g., if a person moves to another household, a new row with a new episode is included in the table. The links to the geographic data are specified in the PropID attribute, and changes each time the individual is moving to another location. In addition, the PropLevel attribute specifies whether the links are on address or property unit level. The links in the geographic data are specified by the attributes puId and auId in the Property Units dataset. By using these link attributes in combination with date we can easily link individuals to the digitized property units, as well as link geographic context variables to the individuals in the SEDD.

Technical Validation

Positional accuracy

The evaluation of the positional accuracy of the digitized property units is based on a subset (7.5%) of all property units. The selection was made based on the criteria that the property unit borders have not changed until now. This selection enables us to use modern property unit borders obtained from the Lantmäteriet (the Swedish mapping, cadastral, and land registration authority) as reference data. This subset should be a representative selection in terms of positional accuracy. In the evaluation, two measures of positional accuracy are used. The first measure of positional accuracy is based on the buffer-overlay-statistics (BOS)[14] method. The basic idea with this method is to create buffers around a line to be evaluated and another buffer around a reference line, and then base the quality evaluation on intersections between the buffers (and their set complement). In our case, we created buffers around the historical property unit boundaries (HB) as well as buffers around the modern boundaries (RB). Then the following measures were estimated: That is, Q is defined as the relative complement of RB in HB; and Q is defined as the relative complement of HB in RB. The quality parameters are computed for a set of buffer sizes to provide the buffer overlay statistics (Fig. 6). A high positional accuracy is indicated by high QBOS values and low QA and QB values. When using a buffer size of 80 meters, we include approximately 90% of the reference boundary lines.
Figure 6

Mean BOS accuracy results for the historical property units (HB) compared to the modern property units (RB).

The error bars represent the s.d.

The second measure for estimating the positional accuracy is based on the Euclidean distance between the historical property unit centroid and the centroid of the corresponding modern property unit. The non-normal error distribution of the property unit centroids may limit some of the usefulness of the statistics used in Table 8, such as s.d., mean, and the root mean square error (RMSE). The 95% RMSE was calculated after removing 5% of the outliers, which is a more representative measure for non-normal distributions[15].
Table 8

Positional accuracy—property unit centroids.

RMSE95% RMSEMedian errors.d.MinMax95th perc
16.714.410.311.41.059.440.1
Based on the result of the two quality measures, we conclude that the positional accuracy of the property unit centroids is approximately 10–15 meters, whereas the accuracy for the property unit boundaries is slightly lower. It was more difficult to determine the positional accuracy of the wetlands. The RMSE of the georeferenced 1812–1820 MTSMs, from which most of the wetlands had been digitized, was estimated to be 30 meters. This estimation is based on a comparison of unchanged objects, such as churches, crossroads and residential buildings in the geocoded MTSM, with the reference data World Imagery web map[16] (geometrical resolution between 30–60 cm). Apart from the geometric uncertainty in these well-defined unchanged objects, there are additional uncertainties in the mapping of wetlands; for example it is difficult to define borders of wetlands and also to define what a wetland is.

Match rate and geocoding level

Match rates for the geocoding on the property unit and address levels for the five parishes are presented in Table 9. The rate is measured in percent person-years for the individuals in SEDD. The geocoding period is split into two periods to show the differences in the linking levels before and after the land reforms had been conducted within all parishes. Note that the geocoding match rate has been updated and improved in this paper compared with a previous study[7].
Table 9

Match rates for each geocoding level and parish.

Geocoding levelPeriodPersonsPerson-yearsMatch rate (%)
 
Property unitAddress
The match rate represents the percent of geocoded person-years. Persons represents the number of individuals who have lived within the specified parish.
     
Hög1813–18481,3989,10887.897.5
 1849–19143,40833,03583.897.9
Kävlinge1813–18481,89811,27271.596.3
 1849–19149,50766,48663.291.7
Sireköpinge1813–18482,38818,3370*86.9*
 1849–191410,35380,50062.596.4
Halmstad1813–18482,42718,83358.875.21
 1849–19146,80656,39754.894.9
Kågeröd1813–18485,16555,99727.131.2
 1849–191410,85010,317576.994.2
All parishes1813–184812,826113,54837.345.4
 1849–191439,350339,59566.692.2

*The land reforms had not been conducted in Sireköpinge for the period 1813–1848. The individuals can be geocoded on the address level with the given match rate because most individuals lived within the village and did not live in their property units.

†The total number of individuals in this dataset for the period 1813–1914 is approximately 45,000, which is less than the 53,000 individuals mentioned in ref. 7. The reason is that the latter number includes all individuals with at least one event registered, whereas the former and publicly open dataset excluded individuals with only one event registered (individuals with only a birth, a death or a migration registered).

‡These match rates are low because some of the land reforms had not been implemented in this period. Thus, people still lived within the villages and cultivated nearby scattered plots.

Spatio-temporal topology validation (verification of overlapping polygons in space and time and verification of empty spaces)

A spatio-temporal topology validator for the property units is implemented in Python (https://www.python.org/), version 2.7, using the ArcPy package for ArcGIS Desktop 10.3 (http://desktop.arcgis.com/en/arcmap/10.3/analyze/arcpy/what-is-arcpy-.htm). One main requirement is that property unit polygons should not overlap in space and time, that is, two polygons that cover the same area should not simultaneously exist. A pseudo code for this topology check is as follows: For polygon i1, polygon i2 Check IF (i1 overlaps i2 OR i1 contains i2 OR i1 is within i2) AND i1.eDate >= i2.sDate AND i1.sDate <= i2.eDate The ‘sDate’ and ‘eDate’ represent the start and end year; thus, the topology is evaluated for each year. Polygons that overlap are corrected by either re-evaluating their object-lifelines or adjusting their borders. The former is performed when errors are observed in the estimations of the start and end dates of the property units; the latter is conducted when minor overlaps are caused by non-perfect geometries. The study area should not contain empty spaces, which appear when one polygon ceases to exist before another overlapping polygon starts to exist. To fill these empty spaces when no digitized geometries are available for a specific time period, new polygons are created by a subtraction method[7]. A total of 107 of the 1,159 property unit polygons were created in this manner.

Additional Information

How to cite this article: Hedefalk, F. et al. Spatiotemporal historical datasets at micro-level for geocoded individuals in five Swedish parishes, 1813–1914. Sci. Data 4:170046 doi: 10.1038/sdata.2017.46 (2017). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
  4 in total

1.  The social mobility of servants in rural Sweden, 1740-1894.

Authors:  C Lundh
Journal:  Contin Chang       Date:  1999

2.  Early life effects across the life course: the impact of individually defined exogenous measures of disease exposure on mortality by sex in 19th- and 20th-century Southern Sweden.

Authors:  Luciana Quaranta
Journal:  Soc Sci Med       Date:  2014-05-15       Impact factor: 4.634

3.  Airborne infectious diseases during infancy and mortality in later life in southern Sweden, 1766-1894.

Authors:  Tommy Bengtsson; Martin Lindström
Journal:  Int J Epidemiol       Date:  2003-04       Impact factor: 7.196

4.  Influence of geocoding quality on environmental exposure assessment of children living near high traffic roads.

Authors:  Paul A Zandbergen
Journal:  BMC Public Health       Date:  2007-03-16       Impact factor: 3.295

  4 in total
  3 in total

1.  Historical dataset of administrative units with social-economic attributes for Austrian Silesia 1837-1910.

Authors:  Krzysztof Ostafin; Dominik Kaim; Tadeusz Siwek; Anna Miklar
Journal:  Sci Data       Date:  2020-06-30       Impact factor: 6.444

2.  MIReAD, a minimum information standard for reporting arthropod abundance data.

Authors:  Samuel S C Rund; Kyle Braak; Lauren Cator; Kyle Copas; Scott J Emrich; Gloria I Giraldo-Calderón; Michael A Johansson; Naveed Heydari; Donald Hobern; Sarah A Kelly; Daniel Lawson; Cynthia Lord; Robert M MacCallum; Dominique G Roche; Sadie J Ryan; Dmitry Schigel; Kurt Vandegrift; Matthew Watts; Jennifer M Zaspel; Samraat Pawar
Journal:  Sci Data       Date:  2019-04-25       Impact factor: 6.444

3.  Fine-grained, spatiotemporal datasets measuring 200 years of land development in the United States.

Authors:  Johannes H Uhl; Stefan Leyk; Caitlin M McShane; Anna E Braswell; Dylan S Connor; Deborah Balk
Journal:  Earth Syst Sci Data       Date:  2021-01-27       Impact factor: 11.333

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.