George Alter, Kees Mandemakers.
Abstract
The Intermediate Data Structure (IDS) is a standard data format that has been adopted by several large longitudinal databases on historical populations. Since the publication of the first version in Historical Social Research in 2009, two improved and extended versions have been published in the Collaboratory Historical Life Courses. In this publication we present version 4, which is the latest 'official' standard of the IDS. Discussions with users over the last four years resulted in important changes, such as the inclusion of a new table defining the hierarchical relationships among 'contexts,' decision schemes for recording relationships, additional fields in the metadata table, rules for handling stillbirths, a reciprocal model for relationships, guidance for linking IDS data with geospatial information, and the introduction of an extended IDS for computed variables.
Keywords: Comparative Research; Data Model; Demography; Entity Attribute Value Model; Historical Demography; History; IDS; Intermediate Data Structure; Life Courses; Social History
Year: 2014 PMID: 30505919 PMCID: PMC6261464
Source DB: PubMed Journal: Hist Life Course Stud ISSN: 2352-6343
Figure 1. Strategy for collecting data for scientific research from historical longitudinal databases by way of an intermediate structure.
Records in the table INDIVIDUAL (excluding timestamp variables)
| Id | Id_D | Id_I | Source | Type | Value | Value_Id_C |
|---|---|---|---|---|---|---|
| 1 | DDB_release_2012.01 | 1 | Population register | Last_Name | Johansson | |
| 2 | DDB_release_2012.01 | 1 | Population register | First_Name | Christiaan | |
| 3 | DDB_release_2012.01 | 1 | Population register | Birth_Date | <time stamp> | |
| 4 | DDB_release_2012.01 | 1 | Population register | Birth_Location | <time stamp> | |
| 5 | DDB_release_2012.01 | 1 | Population register | Death_Date | 1029 | |
| 6 | DDB_release_2012.01 | 1 | Marriage certificate | Marriage_Date | <time stamp> | |
| 7 | DDB_release_2012.01 | 1 | Population register | Observation | <time stamp> | |
| 8 | DDB_release_2012.01 | 1 | Income tax register | Occupation | Timmerman | |
| 9 | DDB_release_2012.01 | 1 | Income tax register | Occupation_Eng | Carpenter | |
| 10 | DDB_release_2012.01 | 1 | Income tax register | Occupation_HISCO | 95410 | |
| 11 | DDB_release_2012.01 | 1 | Population register | Civil Status | Married | |
| 12 | DDB_release_2012.01 | 1 | Population register | Sex | Male | |
| 13 | DDB_release_2012.01 | 1 | Vaccination register | Vaccination | Vaccinated | |
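The INDIVIDUAL table follows an entity-attribute-value layout: each record holds one attribute (Type) and its Value for one person (Id_I). As a minimal sketch, assuming the records have already been read into Python tuples (the tuple layout and the helper name `pivot_individuals` are illustrative, not part of the IDS), the attributes of each person can be gathered like this:

```python
from collections import defaultdict

# Part of the INDIVIDUAL example: (Id_I, Source, Type, Value); timestamps omitted.
INDIVIDUAL = [
    (1, "Population register", "Last_Name", "Johansson"),
    (1, "Population register", "First_Name", "Christiaan"),
    (1, "Income tax register", "Occupation", "Timmerman"),
    (1, "Income tax register", "Occupation_Eng", "Carpenter"),
    (1, "Income tax register", "Occupation_HISCO", "95410"),
    (1, "Population register", "Civil Status", "Married"),
    (1, "Population register", "Sex", "Male"),
]

def pivot_individuals(records):
    """Group values per person and attribute: {Id_I: {Type: [Value, ...]}}.

    Lists are used because one person can have several records of the same
    Type, e.g. occupations taken from different sources or dates.
    """
    people = defaultdict(lambda: defaultdict(list))
    for id_i, _source, type_, value in records:
        people[id_i][type_].append(value)
    return people

people = pivot_individuals(INDIVIDUAL)
print(people[1]["Occupation_Eng"])  # ['Carpenter']
```

Keeping the values in lists preserves the EAV property that an attribute may be recorded more than once, each time with its own source and timestamp.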
Figure 2. ERD diagram of the tables of individual data
Explanation: The relations are described by means of Entity Relationship Diagramming. Here, every individual may have one or more relationships with other individuals, but every relationship in the INDIV_INDIV table must refer to two individuals (see Beaumont 2007 for more information about Entity Relationship Diagramming).
Records in the table INDIV_INDIV (excluding timestamp variable)
| Id | Id_D | Id_I_1 | Id_I_2 | Source | Relation |
|---|---|---|---|---|---|
| 1 | HSN_release_2010.02 | 1 | 2 | Birth certificate | Wife |
| 2 | HSN_release_2010.02 | 2 | 1 | Population register | Husband |
| 3 | HSN_release_2010.02 | 1 | 22 | Birth certificate | Mother |
| 4 | HSN_release_2010.02 | 22 | 1 | Birth certificate | Child |
| 5 | HSN_release_2010.02 | 2 | 22 | Population register | Father |
| 6 | HSN_release_2010.02 | 22 | 2 | Marriage certificate | Child |
| 7 | HSN_release_2010.02 | 2 | 23 | Population register | Householder |
| 8 | HSN_release_2010.02 | 23 | 2 | Population register | Maid |
| 9 | HSN_release_2010.02 | 2 | 8493 | Population register | Master |
| 10 | HSN_release_2010.02 | 8493 | 2 | Population register | Servant |
| 11 | HSN_release_2010.02 | 823 | 824 | Population register | Sibling |
| 12 | HSN_release_2010.02 | 824 | 823 | Population register | Sibling |
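Because the relationship model is reciprocal, every record in INDIV_INDIV should have a matching reverse record. The sketch below checks this for the example records. The `RECIPROCAL` lookup is a hypothetical vocabulary covering only the relation labels used above, and the check assumes at most one relation per ordered pair of individuals.

```python
# INDIV_INDIV example: (Id_I_1, Id_I_2, Relation); Id_I_1 is <Relation> of Id_I_2.
INDIV_INDIV = [
    (1, 2, "Wife"), (2, 1, "Husband"),
    (1, 22, "Mother"), (22, 1, "Child"),
    (2, 22, "Father"), (22, 2, "Child"),
    (2, 23, "Householder"), (23, 2, "Maid"),
    (2, 8493, "Master"), (8493, 2, "Servant"),
    (823, 824, "Sibling"), (824, 823, "Sibling"),
]

# Hypothetical: labels allowed on the reverse record (example labels only).
RECIPROCAL = {
    "Wife": {"Husband"}, "Husband": {"Wife"},
    "Mother": {"Child"}, "Father": {"Child"}, "Child": {"Mother", "Father"},
    "Householder": {"Maid", "Servant"}, "Maid": {"Householder"},
    "Master": {"Servant"}, "Servant": {"Master", "Householder"},
    "Sibling": {"Sibling"},
}

def missing_reciprocals(records):
    """Return records whose reverse record is absent or has an unexpected label."""
    # Assumes at most one relation per ordered pair (later records overwrite).
    index = {(a, b): rel for a, b, rel in records}
    problems = []
    for a, b, rel in records:
        reverse = index.get((b, a))
        if reverse is None or reverse not in RECIPROCAL.get(rel, set()):
            problems.append((a, b, rel))
    return problems

print(missing_reciprocals(INDIV_INDIV))  # []
```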
Figure 3. Example of hierarchical layering of contextual information
Figure 4. ERD diagram of the contextual data
Explanation: see figure 2
Records in the table CONTEXT (excluding source and timestamp variables)
| Id | Id_D | Id_C | Type | Value |
|---|---|---|---|---|
| 1 | Utah_release_2011.01 | 115023 | Street_id | 3929 |
| 2 | Utah_release_2011.01 | 115023 | Streetname | Mainstreet |
| 3 | Utah_release_2011.01 | 115023 | Streetnumber | 12 |
| 4 | Utah_release_2011.01 | 115023 | Long_Centroid | 233.838 |
| 5 | Utah_release_2011.01 | 115023 | Latit_Centroid | 193.933 |
| 6 | Utah_release_2011.01 | 115023 | Level | Address |
| 7 | Utah_release_2011.01 | 9022 | Name | Salt Lake Harbour |
| 8 | Utah_release_2011.01 | 9022 | Number_inhab | 230 |
| 9 | Utah_release_2011.01 | 9022 | Long_Centroid | 233.838 |
| 10 | Utah_release_2011.01 | 9022 | Latit_Centroid | 193.933 |
| 11 | Utah_release_2011.01 | 9022 | Level | Neighbourhood |
| 12 | Utah_release_2011.01 | 10345 | Name | Salt Lake City |
| 13 | Utah_release_2011.01 | 10345 | Number_inhab | 23455 |
| 14 | Utah_release_2011.01 | 10345 | Long_Centroid | 233.921 |
| 15 | Utah_release_2011.01 | 10345 | Latit_Centroid | 193.888 |
| 16 | Utah_release_2011.01 | 10345 | Level | Municipality |
| 17 | Utah_release_2011.01 | 115029 | Street_id | 2932 |
| 18 | Utah_release_2011.01 | 115029 | Streetname | Smallstreet |
| 19 | Utah_release_2011.01 | 115029 | Streetnumber | 212 |
| 20 | Utah_release_2011.01 | 115029 | Longitude | 233.847 |
| 21 | Utah_release_2011.01 | 115029 | Latitude | 193.899 |
| 22 | Utah_release_2011.01 | 115029 | Level | Address |
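The example CONTEXT records store coordinates under two different Type names: Long_Centroid/Latit_Centroid for most contexts, but Longitude/Latitude for context 115029. A minimal sketch of reading the coordinates of a context that accepts both spellings; the helper names are illustrative, and the numbers are simply the example values from the table.

```python
# CONTEXT example: (Id_C, Type, Value); all values are stored as strings.
CONTEXT = [
    (115023, "Long_Centroid", "233.838"), (115023, "Latit_Centroid", "193.933"),
    (115023, "Level", "Address"),
    (9022, "Name", "Salt Lake Harbour"), (9022, "Level", "Neighbourhood"),
    (9022, "Long_Centroid", "233.838"), (9022, "Latit_Centroid", "193.933"),
    (10345, "Name", "Salt Lake City"), (10345, "Level", "Municipality"),
    (10345, "Long_Centroid", "233.921"), (10345, "Latit_Centroid", "193.888"),
    (115029, "Longitude", "233.847"), (115029, "Latitude", "193.899"),
    (115029, "Level", "Address"),
]

# Both naming conventions found in the example are accepted, in this order.
LON_TYPES = ("Long_Centroid", "Longitude")
LAT_TYPES = ("Latit_Centroid", "Latitude")

def context_attributes(records, id_c):
    """Return {Type: Value} for one context."""
    return {type_: value for cid, type_, value in records if cid == id_c}

def coordinates(records, id_c):
    """Return (longitude, latitude) as floats, or None if either is missing."""
    attrs = context_attributes(records, id_c)
    lon = next((attrs[t] for t in LON_TYPES if t in attrs), None)
    lat = next((attrs[t] for t in LAT_TYPES if t in attrs), None)
    if lon is None or lat is None:
        return None
    return float(lon), float(lat)

print(coordinates(CONTEXT, 115029))  # (233.847, 193.899)
```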
Guidelines for defining relationships of persons in the INDIV_CONTEXT table and including records in other tables
| Step | Question or action |
|---|---|
| 0 | Every individual has at least one record in the INDIV_CONTEXT table for each context in which the individual has been recorded. Do we need records in other tables as well? |
| 1 | Is the relationship independent of the context, such as a biological or marital relationship (parent/child, husband/wife)? |
| Yes | There must be a record in the INDIV_INDIV table. A definition of the relationship on the record in the INDIV_CONTEXT table is usually not necessary. |
| No | Go to step 2 |
| 2 | Does the person have a specific relationship with one or more individuals in the specific context? |
| Yes | There must be records in the INDIV_INDIV table for these relationships. For example, when the source explicitly lists each person's relation to the head of household, each of those relationships should be recorded in the INDIV_INDIV table. |
| No | Go to step 3 |
| 3 | Does the relationship include an occupational title (like servant, maid)? This includes titles describing a status, like ‘gentleman’, ‘student’ or ‘orphan’. |
| Yes | Occupations are recorded in the INDIVIDUAL table. |
| No | Go to step 4 |
| 4 | Does the relationship have a meaning that is tied to the context in some way? Examples are: servant, lodger, boarder, boarding house keeper. |
| Yes | Include the value in the field |
| No | Keep the |
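The decision scheme above can be read as a cascade of yes/no tests. The sketch below encodes steps 0 to 4 as a Python function. The parameter names and the target label "INDIV_CONTEXT.Relation" are hypothetical, and the step 4 action is an assumption: as the INDIV_CONTEXT example suggests (Householder, Servant), a context-tied value is taken to go into the Relation field of INDIV_CONTEXT.

```python
def tables_for_relationship(context_independent: bool,
                            tied_to_individuals_in_context: bool,
                            is_occupational_title: bool,
                            meaning_tied_to_context: bool) -> list[str]:
    """Walk steps 0-4 of the scheme and list where records are needed."""
    # Step 0: an INDIV_CONTEXT record exists for every context of the person.
    targets = ["INDIV_CONTEXT"]
    if context_independent:                  # step 1: husband/wife, parent/child
        targets.append("INDIV_INDIV")
    elif tied_to_individuals_in_context:     # step 2: e.g. relation to household head
        targets.append("INDIV_INDIV")
    elif is_occupational_title:              # step 3: servant, maid, student, orphan
        targets.append("INDIVIDUAL")
    elif meaning_tied_to_context:            # step 4: lodger, boarder (assumed target)
        targets.append("INDIV_CONTEXT.Relation")
    return targets

# A lodger with no explicit relation to other household members:
print(tables_for_relationship(False, False, False, True))
# ['INDIV_CONTEXT', 'INDIV_CONTEXT.Relation']
```

When every answer is no, the sketch returns only the INDIV_CONTEXT record required by step 0.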
Figure 5. Defining relations between individuals in the INDIV_INDIV or INDIV_CONTEXT tables
Example of records in the table INDIV_CONTEXT (without the field Source and part of the timestamp)
| Id | Id_D | Id_I | Id_C | Relation | Start day | Start month | Start year | End day | End month | End year |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Utah_release_2011.01 | 1001 | 115023 | Householder | 21 | 2 | 1879 | 2 | 6 | 1880 |
| 2 | Utah_release_2011.01 | 1001 | 115029 | Householder | 3 | 6 | 1880 | 30 | 11 | 1882 |
| 3 | Utah_release_2011.01 | 2009 | 115029 | Servant | 15 | 8 | 1879 | 5 | 8 | 1882 |
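Since each INDIV_CONTEXT record carries a start and an end date, the table can answer questions such as who was linked to an address on a given day. A minimal sketch over the three example records, assuming day/month/year order in the timestamp columns and inclusive period boundaries; `present_on` is a hypothetical helper name.

```python
from datetime import date

# INDIV_CONTEXT example: (Id_I, Id_C, Relation, start, end).
INDIV_CONTEXT = [
    (1001, 115023, "Householder", date(1879, 2, 21), date(1880, 6, 2)),
    (1001, 115029, "Householder", date(1880, 6, 3), date(1882, 11, 30)),
    (2009, 115029, "Servant", date(1879, 8, 15), date(1882, 8, 5)),
]

def present_on(records, id_c, day):
    """Return (Id_I, Relation) for everyone linked to context id_c on `day`."""
    # Both ends of the period are treated as inclusive.
    return [(id_i, rel) for id_i, cid, rel, start, end in records
            if cid == id_c and start <= day <= end]

print(present_on(INDIV_CONTEXT, 115029, date(1881, 1, 1)))
# [(1001, 'Householder'), (2009, 'Servant')]
```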
Figure 6. ERD diagram of the Intermediate Data Structure including the METADATA table
Explanation: see figure 2
Records in the table METADATA
| Id | Id_D | Type_T | Type | Value | New | Extract | Id_D_Explanation | Explanation |
|---|---|---|---|---|---|---|---|---|
| 1 | STANDARD | INDIVIDUAL | DEATH | DEFINITION | 1.0 | | STANDARD | Date of occurrence of death. |
| 2 | STANDARD | INDIVIDUAL | DEATH | DEFINITION | 1.0 | | HSN | Date of occurrence of death; three sources are used, in the following order of preference: 1 civil certificate, 2 population register, 3 Red Cross. We use ‘Red Cross’ as the source when dates are estimated on the basis of circumstantial information but must be considered quite accurate, e.g. the date of death in German extermination camps like Sobibor, which was estimated on the basis of the date of deportation from The Netherlands. |
| 3 | DDB | INDIVIDUAL | CHILDBIRTH ASSISTANT | DEFINITION | 3.0 | | DDB | Indicates whether the child was delivered by a trained midwife. |
| 4 | DDB | INDIVIDUAL | CHILDBIRTH ASSISTANT | Delivery with an unexamined assistant | 3.0 | | DDB | The child was delivered with help from an untrained assistant. |
| 5 | DDB | INDIVIDUAL | CHILDBIRTH ASSISTANT | Midwife delivery | 3.0 | | DDB | The child was delivered with help from a trained midwife. |
| 6 | DDB | INDIVIDUAL | CHILDBIRTH ASSISTANT | Midwife delivery with instruments | 3.0 | | DDB | The child was delivered with help from a trained midwife and instruments were used. |
| 7 | DDB | INDIVIDUAL | CHILDBIRTH ASSISTANT | Unknown | 3.0 | | DDB | The way the child was delivered is unknown. |
| 8 | STANDARD | CONTEXT | HOUSEHOLD SIZE | DEFINITION | 4.0 | H_Size__SEDD_2013_01 | STANDARD | Total number of household members. |
| 9 | HSN | INDIVIDUAL | MUNICIPAL_INCOME_TAX | DEFINITION | X | | HSN | Value of the municipal income tax; the year of the tax is defined by the timestamp and the name of the municipality by way of the context. |
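Records 1 and 2 show how a database can attach its own explanation to a STANDARD variable through Id_D_Explanation. Below is a sketch of a lookup that prefers a database's own explanation and falls back to the STANDARD one. This precedence rule and the helper `explain` are assumptions for illustration, and the explanation texts are shortened.

```python
# Shortened METADATA records:
# (Id_D, Type_T, Type, Value, Id_D_Explanation, Explanation)
METADATA = [
    ("STANDARD", "INDIVIDUAL", "DEATH", "DEFINITION", "STANDARD",
     "Date of occurrence of death."),
    ("STANDARD", "INDIVIDUAL", "DEATH", "DEFINITION", "HSN",
     "Date of occurrence of death; preferred sources: civil certificate, "
     "population register, Red Cross."),
    ("HSN", "INDIVIDUAL", "MUNICIPAL_INCOME_TAX", "DEFINITION", "HSN",
     "Value of the municipal income tax."),
]

def explain(records, type_t, type_, value, database):
    """Return the explanation that `database` should use, or None.

    Assumed rule: an explanation written by the database itself
    (Id_D_Explanation == database) wins over the STANDARD explanation.
    """
    candidates = {
        expl_db: text
        for _id_d, tt, t, v, expl_db, text in records
        if (tt, t, v) == (type_t, type_, value)
    }
    return candidates.get(database, candidates.get("STANDARD"))

print(explain(METADATA, "INDIVIDUAL", "DEATH", "DEFINITION", "DDB"))
# Date of occurrence of death.
```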
Practical guidelines for defining variables in the Metadata table
| Step | Question or action |
|---|---|
| 1 | Is your variable/value completely described in the field |
| Yes | You are using the standard explanation and you have nothing to add to the content of the |
| | STANDARD |
| | STANDARD |
| No | Go to step 2 |
| 2 | Is the STANDARD explanation applicable but incomplete? For example, do you need more explanation about the construction of the variable? |
| Yes | Make a new record with your own explanation in the |
| | STANDARD |
| | Acronym of your database |
| No | Go to step 3 |
| 3 | Your variable does not fit in the existing STANDARD scheme, and you think it is a good candidate for a new STANDARD variable. |
| Yes | Make a record with the explanation of your proposal for the new STANDARD variable or values and send the proposal to the Clearing Committee; while waiting for approval, proceed with step 4 as a temporary solution. |
| No | Go to step 4 |
| 4 | Your variable does not fit in the existing STANDARD scheme, and you must make metadata that will function within the IDS of your own database (example record 9 in |
| Yes | Acronym of your database |
| | Acronym of your database |
| | Fill with an ‘X’; once the variable has been cleared, the version number of the IDS will replace the ‘X’. |
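The four steps of the metadata scheme above can be sketched as a function that returns the key fields of the METADATA record to create. The field assignments are an assumption inferred from records 2 and 9 of the METADATA example (a STANDARD variable explained by HSN, and an HSN-specific variable with New set to 'X'); the function name and its parameters are hypothetical.

```python
def metadata_fields(fully_described, standard_incomplete, propose_new_standard, acronym):
    """Return (fields of the METADATA record to add or None, send proposal?)."""
    if fully_described:
        # Step 1: the STANDARD record already says everything; add nothing.
        return None, False
    if standard_incomplete:
        # Step 2: keep the STANDARD variable, add your own explanation (cf. record 2).
        return {"Id_D": "STANDARD", "Id_D_Explanation": acronym}, False
    # Steps 3 and 4: database-specific variable (cf. record 9). New stays 'X'
    # until the variable is cleared; step 3 also sends a proposal to the
    # Clearing Committee.
    record = {"Id_D": acronym, "Id_D_Explanation": acronym, "New": "X"}
    return record, propose_new_standard

print(metadata_fields(False, False, True, "HSN"))
# ({'Id_D': 'HSN', 'Id_D_Explanation': 'HSN', 'New': 'X'}, True)
```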
Practical guidelines for defining dates and periods in the Timestamp
| Step | Question or action |
|---|---|
| 0 | Is the date or period assigned by the database administrator rather than derived from the sources? For example, periods of observation which are not directly given in the sources are ‘assigned’ dates. |
| Yes | Assigned |
| | All possible values; in case of estimation of a date, the range in which the date is estimated may be given in the period fields. |
| No | Go to step 1 |
| 1 | Does the date describe an event, observed at the moment of the event itself (like the date of a divorce in a divorce certificate or the date of a birth in a birth certificate)? |
| Yes | In this case no estimation of a date is allowed: |
| | Event |
| | Exact |
| | Go to step 4 |
| No | Go to step 2 |
| 2 | Does the date describe an event from an earlier time, reported in a source compiled after the event occurred (like the date of a marriage in a certificate of divorce or a marriage date in a population register)? |
| Yes | Is the date an exact date? |
| Yes | Reported |
| | Exact |
| No | Reported |
| | All possible values; in the period fields provide the range in which the date is estimated. |
| | Go to step 4 |
| No | Go to step 3 |
| 3 | Is the date a moment in time at which a certain attribute is valid (like the status ‘married’ or some occupational title)? In this case, you do not know when the attribute took this value. |
| Yes | Is the date an exact date? |
| Yes | Declared |
| | Exact |
| No | Declared |
| | All possible values; in the period fields you have to include the range in which the date is estimated. |
| 4 | Does the source (or combination of sources) implicitly or explicitly include a second date for the same attribute? Implicit dates may be related to the end of observation in a source, like a population register which is valid for a period of time. |
| Yes | Follow steps 1–3 above to create a second record in which the timestamp includes the beginning/end of the period. Choose the appropriate |
| No | End |
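Steps 0 to 3 of the timestamp scheme can be sketched as a classification function. In this sketch, 'estimated' is a hypothetical placeholder for whichever non-exact estimation value applies (the estimated range then goes into the period fields), the parameter names are illustrative, and step 4 (creating a second record for an implied begin or end date) is left to the caller.

```python
def classify_date(assigned: bool, observed_at_event: bool,
                  reported_later: bool, exact: bool) -> tuple[str, str]:
    """Return (date type, estimation) following steps 0-3 of the scheme."""
    if assigned:                      # step 0: e.g. assigned periods of observation
        return "Assigned", "Exact" if exact else "estimated"
    if observed_at_event:             # step 1: e.g. birth date in a birth certificate
        return "Event", "Exact"       # no estimation allowed
    if reported_later:                # step 2: e.g. marriage date in a register
        return "Reported", "Exact" if exact else "estimated"
    # Step 3: a moment at which an attribute (civil status, occupation) is valid.
    return "Declared", "Exact" if exact else "estimated"

# A marriage date reported in a population register, known only approximately:
print(classify_date(False, False, True, False))  # ('Reported', 'estimated')
```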
| Version | Change |
|---|---|
| 2.0 | New approach of the CONTEXT, introduction of CONTEXT_CONTEXT table, see |
| 2.0 | Making the field |
| 2.0 | Including all examples of records in the tables in the text itself (instead of the Appendix). |
| 2.0 | Removing the first, more theoretical part of the original IDS article. |
| 2.0 | Including decision schemes on relations (what to put in which table), metadata and the timestamp. |
| 2.0 | Introduction of the field |
| 2.0 | Introduction of the field |
| 3.0 | Including two new fields in the METADATA table ( |
| 3.0 | Including a graphic form of the decision scheme on INDIV_CONTEXT. |
| 3.0 | Introducing a new value for the field |
| 3.0 | Including a solution for geometric data whose strings are too long to be included in the IDS. |
| 3.0 | The database identifying field |
| 3.0 | Checking the text for inconsistencies and improving it (especially the tables with examples of records). |
| 3.0 | The description of the METADATA table (par. 3.4) has been improved and the format of the table itself has been brought into line with the description. This includes two new fields: |
| 4.0 | Inclusion of a type for stillbirths (and discussion of the peculiar nature of this type). |
| 4.0 | Including a new section 3.2.2 for the handling of the start date and end date of observations. |
| 4.0 | Explanation why locations and dates have two records in the case of events (instead of one). |
| 4.0 | Introduction of the extended IDS and of a new field in the METADATA table, Extract, for the name of the extraction software (extension of the section on metadata). |
Records in the table CONTEXT_CONTEXT (without the Source field and part of the timestamp)
| Id | Id_D | Id_C_1 | Id_C_2 | Relation | Start day | Start month | Start year | End day | End month | End year |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Utah_release_2011.01 | 115023 | 10345 | Address and Municipality | 21 | 2 | 1879 | 2 | 6 | 1882 |
| 2 | Utah_release_2011.01 | 115029 | 9022 | Address and Neighborhood | 21 | 2 | 1879 | 2 | 6 | 1882 |
| 3 | Utah_release_2011.01 | 9022 | 10345 | Neighborhood and Municipality | 21 | 2 | 1879 | 2 | 6 | 1882 |
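Because CONTEXT_CONTEXT links each context to the one above it for a given period, the chain from an address up to its municipality can be found by repeatedly following Id_C_1 to Id_C_2. A minimal sketch over the example records, assuming day/month/year order in the timestamp columns and at most one valid parent per context on a given day; `ancestors` is a hypothetical helper.

```python
from datetime import date

# CONTEXT_CONTEXT example: (Id_C_1 lower level, Id_C_2 higher level, Relation, start, end).
CONTEXT_CONTEXT = [
    (115023, 10345, "Address and Municipality", date(1879, 2, 21), date(1882, 6, 2)),
    (115029, 9022, "Address and Neighborhood", date(1879, 2, 21), date(1882, 6, 2)),
    (9022, 10345, "Neighborhood and Municipality", date(1879, 2, 21), date(1882, 6, 2)),
]

def ancestors(records, id_c, day):
    """Return the chain of higher-level contexts of id_c valid on `day`."""
    chain, seen, current = [], {id_c}, id_c
    while True:
        parent = next((p for c, p, _rel, start, end in records
                       if c == current and start <= day <= end), None)
        if parent is None or parent in seen:  # top reached, or a cycle in the data
            return chain
        chain.append(parent)
        seen.add(parent)
        current = parent

print(ancestors(CONTEXT_CONTEXT, 115029, date(1880, 1, 1)))  # [9022, 10345]
```

Address 115029 reaches the municipality through its neighbourhood, while address 115023 is linked to the municipality directly; the walk handles both cases without knowing the Level of each context.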