Literature DB >> 30505920

IDS Transposer: A Users Guide.

Emily Klancher Merchant1, George Alter2, Jane Wang3, Ashok Bhargav2.   

Abstract

The Intermediate Data Structure (IDS) provides a standard format for storing and sharing individual-level longitudinal life-course data (Alter and Mandemakers 2014; Alter, Mandemakers and Gutmann 2009). Once the data are in the IDS format, a standard set of programs can be used to extract data for analysis, facilitating the analysis of data across multiple databases. Currently, life-course databases store information in a variety of formats, and the process of translating data into IDS can be long and tedious. The IDS Transposer is a software tool that automates this process for source data in any format, allowing database administrators to specify how their datasets are to be represented in IDS. This article describes how the IDS Transposer works, first by going through an example step-by-step, and then by discussing each part of the process and potential options and exceptions in detail.

Entities:  

Keywords:  Data management; Event history data; Family reconstitution; Historical demography; IDS Transposer; Intermediate Data Structure; Longitudinal life-course data

Year:  2017        PMID: 30505920      PMCID: PMC6261465     

Source DB:  PubMed          Journal:  Hist Life Course Stud        ISSN: 2352-6343


1 INTRODUCTION

It is important to note at the outset that there is no single correct way to represent a given dataset in IDS.[1] The process of translating a dataset from its native format into the IDS standard involves numerous decisions about how the particular dataset will be represented in IDS. Such decisions include designating Contexts, choosing Types and Relationships, and assigning dates in the Timestamp. The IDS Transposer retains all of the flexibility inherent in the IDS standard; it allows the user to specify exactly how a given dataset should be represented in IDS. For that reason, the user of the IDS Transposer must decide in advance how the original dataset will be represented in IDS. The Transposer does not make those decisions; it simply provides users with a simple tool for implementing them.

Step-by-Step Example

In the following step by step example, we will refer to several tables which are included in the Appendix. We will work with three datasets: a) the “original” dataset, b) the output IDS dataset, and c) a modified “input” dataset with modifications to the “original” dataset required for IDS. The data-set to be translated to IDS, the “original dataset” and its modified counterpart the “input dataset”. may include any number of tables. The resulting IDS tables (INDIVIDUAL, CONTEXT, INDIV_INDIV, INDIV_CONTEXT, CONTEXT_CONTEXT) will be referred to collectively as the “IDS dataset.” The IDS Transposer takes two types of files, all in .csv (comma-separated values) or tab-delimited format: the input data files, which are created from but not necessarily identical to the tables of the original dataset; and two mapping files, titled ENTITY and RELATIONSHIP, which the user prepares to indicate how each element of the input data files is to be represented in the resulting IDS dataset. Following the IDS standard, file names, dataset names, and table names are in all uppercase (e.g. INDIVIDUAL), and field (column) names are in title case (e.g. Type). The contents of a cell within a table is given in quotes in the text (e.g. “Jones” or “45”), but not in the tables. The contents of the Type variable are all uppercase (e.g. “LAST_NAME”); the contents of other variables are in title case (e.g. “Jones”). Fields (columns) in the two mapping files used by the IDS Transposer (ENTITY and RELATIONSHIP) are italicized (e.g. EntityType). Table names and fields/columns in IDS tables are in bold (e.g. Id_D). For this example, we will use a synthetic dataset titled FAMREC, which was created to resemble datasets produced through family reconstitution. FAMREC includes information about the families of fictional couples born between 1700 and 1799 in the present-day neighborhoods of Los Angeles. This information is arrayed in two tables: the PARENTS table contains parents’ first and last names, dates and locations of birth, marriage, and death, and occupations; the CHILDREN table contains children’s first names and dates and locations of birth. Couples in the PARENTS table are identified by a unique identification code (Parid); children in the CHILDREN table are linked to their parents by the couple’s identification code (Parid). Table 1a shows the information for the first five couples listed in the PARENTS table; Table 1b shows the information for their children in the CHILDREN table. Variable names and explanations for this example are given in Table 2, but in actual use variables may be given any names. Place names found in FAMREC are collected in the PLACES table shown in Table 1c.
Table 1
a Original data file PARENTS
ParidHfirstHlastWfirstWlastHbirthWbirthMarriageHdeathWdeathHoccWoccHbirthlocWbirthlocMarrlocHdeathlocWdeathloc
1000FranklinEdwardsNellKim1760- 03-281757- 01-161786- 03-271798- 11-251810- 09-21SalespersonEye DoctorGlassell ParkBoyle HeightsWattsWattsWatts
1001TristramHernandezMarinaKennedy1752- 08-021753- 03-051778- 05-241796- 10-251817- 05-23PoliceBankerChatsworthReynierTarzanaArletaArleta
1002BradfordFisherCynthiaVasquez1744- 02-041743- 11-281759- 06-271804- 10-021822- 03-24HistorianFilmWestwoodLarchmontGlassell ParkGlassell ParkGlassell Park
1003OliverPetersonAntoniaPorter1717- 02-071714- 08-141737- 08-291806- 09-081761- 03-20DesignerMaitreWattsBrooksideMontecitoMontecitoMontecito
1004MorganLeeAdelineWoods1773- 10-151773- 11-241807- 11-141819- 10-071840- 10-06EnglishScienceEdendaleCahuengaHollywoodHollywoodHollywood
Table 2
a Variable Names in PARENTS Table
Variable nameDescription
paridIdentification number of couple
hfirstHusband’s first name
hlastHusband’s last name
wfirstWife’s first name
wlastWife’s last name
hbirthHusband’s date of birth
wbirthWife’s date of birth
marriageMarriage date
hdeathHusband’s date of death
wdeathWife’s date of death
hoccHusband’s occupation
woccWife’s occupation
hbirthlocHusband’s place of birth
wbirthlocWife’s place of birth
marrlocPlace of marriage
hdeathlocHusband’s place of death
wdeathlocWife’s place of death
The first couple in this example is Franklin Edwards, born March 28, 1760 in Glassell Park, and Nell Kim, born January 16, 1757 in Boyle Heights. They married in Watts on March 27, 1786. At the time, Nell was an eye doctor and Franklin was a salesperson. They had one child, Mona, born May 22, 1788 in Watts. Both spouses died in Watts, Franklin on November 25, 1798 and Nell on September 21, 1810. The second couple, Tristram Hernandez and Marina Kennedy, had seven children, Lora, Cameron, Roman, Judith, Roger, Stacy, and Bertha.

2 DESIRED IDS OUTPUT

The IDS Transposer requires the user to decide in advance how the original dataset will be represented in IDS. Tables 3a-e show how we will represent FAMREC in IDS for the purpose of this example, including all data for the first family in FAMREC.
Table 3
a INDIVIDUAL Table
Id_DId_ISourceTypeValueValue_ Id_CDate_ TypeEstimationMissingYearMonthDayStart_ YearStart_ MonthStart_ DayEnd_ YearEnd_ MonthEnd_ Day
FAMREC10001PARENTSFIRST_NAMEFranklinDeclaredExact1786327
FAMREC10001PARENTSLAST_NAMEEdwardsDeclaredExact1786327
FAMREC10001PARENTSSEXMaleTime_Invariant
FAMREC10001PARENTSOCCUPATIONSalespersonDeclaredExact1786327
FAMREC10001PARENTSBIRTH_DATEEventExact1760328
FAMREC10001PARENTSBIRTH_LOCATIONGlassell101EventExact1760328
FAMREC10001PARENTSMARRIAGE_ DATEEventExact1786327
FAMREC10001PARENTSMARRIAGE_ LOCATIONWatts104EventExact1786327
FAMREC10001PARENTSDEATH_DATEEventExact17981125
FAMREC10001PARENTSDEATH_LOCATIONWatts104EventExact17981125
FAMREC10002PARENTSFIRST_NAMENellExact1786327
FAMREC10002PARENTSLAST_NAMEKimExact1786327
FAMREC10002PARENTSSEXFemaleTime_Invariant
FAMREC10002PARENTSOCCUPATIONEye DoctorDeclaredExact1786327
FAMREC10002PARENTSBIRTH_DATEEventExact1757116
FAMREC10002PARENTSBIRTH_LOCATIONBoyle136EventExact1757116
FAMREC10002PARENTSMARRIAGE_ DATEEventExact1786327
FAMREC10002PARENTSMARRIAGE_ LOCATIONWatts104EventExact1786327
FAMREC10002PARENTSDEATH_DATEEventExact1810921
FAMREC10002PARENTSDEATH_LOCATIONWatts104EventExact1810921
FAMREC100001CHILDRENFIRST_NAMEMonaDeclaredExact1788522
FAMREC100001CHILDRENLAST_NAMEEdwardsDeclaredExact1788522
FAMREC100001CHILDRENBIRTH_DATEEventExact1788522
FAMREC100001CHILDRENBIRTH_LOCATIONWatts104EventExact1788522
In the INDIVIDUAL table (Table 3a), places are given with both names (Value) and context identifi-cation codes (Value_Id_C). Since Value_Id_C points directly to a row in the CONTEXT table, we do not need to put the place name in the Value column, but the place name is included to make the data easier to read. IDS recognizes four kinds of Date_Type in the Timestamp attached to every attribute. “Event” is used when the value of an attribute, such as marital status, is known to have changed on a specific date. We use “Declared” when the value of an attribute is recorded in a document created on a specific date, but we do not know when that value was attained. When the timing of an event is provided in a document composed at a later date, we set Date_Type to “Reported.” For example, it is common for marriage certificates to “report” birth dates for brides and grooms. “Assigned” is used when the database manager has imputed a date to an attribute. “Marriage” is an “Event,” when the date comes from the marriage record because the CIVIL_STATUS of the bride and groom transition from “Unmarried” to “Married” on that date. We know the OCCUPATIONs of the bride and groom on the day of the marriage from the marriage certificate, so the Date_Type for each occupation is “Declared”. BIRTH_DATE appears in both “Event” and “Reported” Date_Types. The birth dates of the bride and groom are “Reported” retrospectively in the marriage certificate, but the birth dates of children come from birth certificates, recorded as each “Event” occurred. All of the dates in this dataset are known exactly (i.e. day, month, and year are present), so Estimation is set to “Exact”. We included a row for parents’ SEX, which is not given in the original dataset, but can be inferred, assuming that all husbands are male and all wives female. FAMREC locates events in two CONTEXTs (Table 3b): Unions and Neighborhoods. “Union” is not part of the IDS Standard, but we use it to illustrate the operation of the IDS Transposer. In this example “Union” refers to the marital union of a husband and wife, and it allows us to locate the conjugal family geographically over time. We generate one “Declared” record in the CONTEXT table for the Neighborhood in which the Union was located at the time of the marriage, at the birth of every child, and at the death of each spouse. Note that we are assuming that neither spouse remarries. If remarriage was possible, we would include only the earliest date of death of a spouse. We do not associate the locations and dates of children’s marriages and deaths with the Union, because they may have left their parents’ homes before those events occurred. Contexts in IDS may be hierarchical, so we identify them with the attribute Type “LEVEL” (Table 3b). Unions have only one attribute Type (“LEVEL”) while places also have “NAME”. To illustrate nested contexts, we show the context hierarchy “Neighborhood”, “Municipality”, and “State”. We have designated our places as “Time_Invariant”, but start and end dates may be given in datasets when place names or boundaries change during the period observed in the data. Another alternative is to assign a known date to a context with Date_Type set to “Declared”, which indicates that it was valid on that date but start and end dates are not known. The INDIV_INDIV table (Table 3c) specifies all relationships between family members, in both directions (e.g., 10001 is the Husband of 10002 and 10002 is the wife of 10001). The start date of this relationship is the date of the marriage. We could have included the date of the first spouse to die as the end date of the relationship, but this date often needs to be computed. We listed child/parent relationships as “Time_Invariant”, because parent-child relationships do not change once children are born. The INDIV_CONTEXT table (Table 3d) places individuals in the lowest level context of the dataset, in this case the union (because unions are fully nested in neighborhoods). The Relation field specifies the relationship between the person and the context. We use marriage and birth dates as “declarations”. However, a database administrator could assign start and end dates for the time that each person spends living in the family. Finally, the CONTEXT_CONTEXT table (Table 3e) specifies the relationships among levels of context. We have sorted the rows to show the hierarchy of state-municipality-neighborhood. The “Municipality and State” relationship only needs to be entered once to be applied to all neighborhoods within the city. Union 1000 (Franklin Edwards and Nell Kim) generates four rows in CONTEXT_CONTEXT (table 3e), all of which place the union in Watts (ID_C=104). We see records in 1786 when the couple married, 1788 when their child was born, 1798 when the husband died, and 1810 when the wife died. There are ten observations for union 1001 (Tristram Hernandez and Marina Kennedy). This couple was married and had four children in Tarzana (ID_C=128). Between 1786 and 1788 they moved to Arletta (ID_C=107) where their last three children were born. They were still in Arletta when the husband and wife died. We assign the Date_Type “Declared” to these rows, because we only know that these families were in those neighborhoods on specific days. As this example demonstrates, there are many decisions to be made when translating a dataset into IDS. The ENTITY and RELATIONSHIP mapping files allow the user to specify to the IDS Transposer how the original dataset should be represented in IDS.

3 DATA PREPARATION

The desired output in our example IDS tables (Tables 3a–3e) includes information that is not in our original dataset. For example, Table 1a shows an identification number for the marriage (parid), but the husband and wife do not have IDs as individuals. For this example, we have kept the additional information to the minimum: we added unique identification codes for people and places, and we have added the sex of the parents. We use Parid, which is the identifier for a marital union, as the context identifier (ID_C) for the union. Tables 4a–4c show the input data files in which these additional fields have been added to the original data (PARENTS_INPUT, CHILDREN_INPUT, PLACES_INPUT).
Table 4
a Variables in the PARENTS_INPUT table
Variable nameDescriptionSource for computed variables
paridIdentification number of couple
hfirstHusband’s first name
hlastHusband’s last name
wfirstWife’s first name
wlastWife’s last name
hbirthHusband’s date of birth
wbirthWife’s date of birth
marriageMarriage date
hdeathHusband’s date of death
wdeathWife’s date of death
hoccHusband’s occupation
woccWife’s occupation
hbirthlocHusband’s place of birth
wbirthlocWife’s place of birth
marrlocPlace of marriage
hdeathlocHusband’s place of death
wdeathlocWife’s place of death
hidIdentification number of husband1000*parid+1
widIdentification number of wife1000*parid+2
source_tableData sourceSet to “PARENTS”
hbirth_dayHusband’s date of birth: DayDay from hbirth
hbirth_monthHusband’s date of birth: MonthMonth from hbirth
hbirth_yearHusband’s date of birth: YearYear from hbirth
wbirth_dayWife’s date of birth: DayDay from wbirth
wbirth_monthWife’s date of birth: MonthMonth from wbirth
wbirth_yearWife’s date of birth: YearYear from wbirth
marriage_dayDate of marriage: DayDay from marriage
marriage_monthDate of marriage: MonthMonth from marriage
marriage_yearDate of marriage: YearYear from marriage
hdeath_dayHusband’s date of death: DayDay from hdeath
hdeath_monthHusband’s date of death: MonthMonth from hdeath
hdeath_yearHusband’s date of death: YearYear from hdeath
wdeath_dayWife’s date of death: DayDay from wdeath
wdeath_monthWife’s date of death: MonthMonth from wdeath
wdeath_yearWife’s date of death: YearYear from wdeath
hbirthloc_idIdentification number for husband’s birth locationId_C from CONTEXT table for hbirth-loc
wbirthloc_idIdentification number for wife’s birth locationId_C from CONTEXT table for wbirth-loc
marrloc_idIdentification number for marriage locationId_C from CONTEXT table for marrloc
hdeathloc_idIdentification number for husband’s death locationId_C from CONTEXT table for hdeath- loc
wdeathloc_idIdentification number for wife’s death locationId_C from CONTEXT table for wdeath-loc
To prepare the input data files, we must do the following: Create a table of place names and context identification codes. Add context identification codes to matching place names. Create unique identification codes for husbands, wives, and children. Attach husband and wife identification codes to their children. Split date variables into their components: day, month, year. Create a variable indicating the source of data for each row. If the source of the data is not given in the dataset, we specify the name of the file in the original dataset (PARENTS or CHILDREN). The Appendix gives R code to create the table of place names, which we call PLACES_INPUT, and to add the requisite columns to the PARENTS and CHILDREN tables; we call the resulting tables PARENTS_INPUT and CHILDREN_INPUT. These tables are now our input data files for the IDS Transposer. If we had decided to include additional information, such as end of relationship dates or children’s last names, we would also have added columns for that information to the input files. Note that we did not add columns for parents’ sex: this information will be supplied in the mapping files.

4 MAPPING FILES

The IDS Transposer uses two “mapping” tables to control the transfer of information from the input data files to the output IDS tables. There are two mapping tables: ENTITY for the IDS INDIVIDUAL and CONTEXT tables and RELATIONSHIP for INDIV_INDIV, INDIV_CONTEXT, and CONTEXT_CONTEXT in IDS. Mapping is a complex process, because the structure of IDS (entity-attribute-value) is so different from the format in which most data are collected. As we illustrated above, a single row in the PARENTS file contributes many rows in the IDS INDIVIDUAL table. In the following sections we offer a strategy for creating mapping files. It begins by manually converting a small sample of the original dataset into IDS. This sample IDS serves as a template for creating the mapping files. First, we identify the unique IDS data types found in the sample IDS and place them in the tabular format used by the mapping files. We then insert the fields in the input data files that will be required to populate the IDS tables. The steps in this mapping process are illustrated in the following four sets of tables: A) Our sample IDS tables are shown in Tables 3a–e. These tables include a unique example of every IDS entry that can be created from the input tables. For example, we only need to convert one row in the CHILDREN file to IDS to show how all children will be mapped. B) Tables (5 and 8) show how content is transposed between fields in the input tables and fields in the IDS output files. Table 5 illustrates the transpositions for the ENTITY tables (INDIVIDUAL and CONTEXT). Table 8 illustrates the transpositions for the RELATIONSHIP tables (INDIV_INDIV, INDIV_CONTEXT, and CONTEXT_CONTEXT).
Table 5

Fields in IDS INDIVIDUAL and CONTEXT Tables and IDS Transposer ENTITY Mapping File

IDS INDIVIDUAL and CONTEXT TablesIDS Transposer ENTITY Mapping File
Field nameDescriptionField nameUse
IdPrimary keyAssigned automatically by IDS Trans-poser
TableNameName of data file for input. “PARENTS” is interpreted as “PARENTS. csv”
EntityTypeINDIVIDUAL or CONTEXT
Id_DIdentifier of the database or parts of the database from which the data are extracted.DatabaseIDShort text describing the database
Id_IIdentifying number of each in- dividual in the databaseEntityIDName of a column in the input file with an ID assigned to this individual or context.
SourceSpecification of the source.SourceShort text describing the source of the data within the database.
TypeType of attributeTypeAn IDS attribute type.
ValueThe value of the attribute.VariableNameName of a column in the input file with a value for each row
ValueShort text to be assigned to every row
Value_Id_CIdentifier to the CONTEXT-ta- ble for values of a contextual nature.Value_ID_CName of a column in the input file with the ID of a context.
Date_typeEvent, Reported, Declared, As- signedDateTypeAn IDS date_type: Event, Reported, Declared, Assigned
EstimationType of estimate of the date or period (e.g. “Age_based,” “Middling”)DateEstimationTypeWhen this field is blank, IDS Trans- poser will compute a value using the date information provided (“Exact,” “Month_year”, “Year”). The user may also specify a column in the input file giving the estimation method for each row.
DayDay numberDayName of a column in the input file
MonthMonth numberMonthName of a column in the input file
YearYear numberYearName of a column in the input file
Start_dayStart day numberStartDayName of a column in the input file
Start_monthStart month numberStartMonthName of a column in the input file
Start_yearStart year numberStartYearName of a column in the input file
End_dayEnd day numberEndDayName of a column in the input file
End_monthEnd month numberEndMonthName of a column in the input file
End_yearEnd year numberEndYearName of a column in the input file
MissingThis field explains why a date or part of a date is missingDateMissingTypeShort text explaining why the date is missing, e.g. “Unavailable,” “Time_ Invariant”
Table 8

Fields in IDS INDIV_INDIV, INDIV_CONTEXT and CONTEXT_CONTEXT Tables and IDS Transposer RELATIONSHIP Mapping File

IDS INDIV_INDIV, INDIV_CONTEXT and CONTEXT_CONTEXT TablesIDS Transposer RELATIONSHIP Mapping File

Field nameDescriptionField nameUse

IdPrimary keyAssigned automatically by IDS Transposer

TableNameName of data file for input. “PARENTS” is interpreted as “PARENTS.csv”

RelationshipTypeINDIV_INDIV, INDIV_CONTEXT, or CONTEXT_ CONTEXT

Id_DIdentifier of the database or parts of the database from which the data are extracted.DatabaseIDShort text describing the database

Id_I_1For INDIV_INDIV: Identifying number of the first individual in the relationshipFromEntityIDName of a column in the input file with an ID as- signed to this individual or context.
Id_IFor INDIV_CONTEXT: Identifying number of an individual
ID_C_1For CONTEXT_CONTEXT Identifying number of the less inclusive context in the relationship,

Id_I_2For INDIV_INDIV: Identifying number of the second individual in the relationshipToEntityIDName of a column in the input file with an ID as- signed to this individual or context.
Id_CFor INDIV_CONTEXT: Identifying number of a context
ID_C_2For CONTEXT_CONTEXT: Identifying number of the more inclusive context in the relationship

SourceSpecification of the source.SourceShort text describing the source of the data within the database.

RelationFor INDIV_INDIV: Type of rela- tionship of the first person to the second person. For example, per- son 1 is the “father” of person 2.RelationShort text describing relationship between first ID and second ID to be assigned to every row

RelationVariableName of a column in the input file with a relation value for each row
For INDIV_CONTEXT: Description of the relationship between the context layers, e.g. neighborhood and municipality
For CONTEXT_CONTEXT: The type of the relationship between individual and context, e.g. Boarder or Servant

Date_typeEvent, Reported, Declared, As- signedDateTypeAn IDS date_type: Event, Reported, Declared, As- signed

EstimationType of estimate of the date or period (e.g. “Age_based,” “Mid- dling”)DateEstimation- TypeWhen this field is blank, IDS Transposer will com- pute a value using the date information provided (“Exact,” “Month_year”, “Year”). The user may also specify a column in the input file giving the estimation method for each row.

DayDay numberDayName of a column in the input file

MonthMonth numberMonthName of a column in the input file

YearYear numberYearName of a column in the input file

Start_dayStart day numberStartDayName of a column in the input file

Start_monthStart month numberStartMonthName of a column in the input file

Start_yearStart year numberStartYearName of a column in the input file

End_dayEnd day numberEndDayName of a column in the input file

End_monthEnd month numberEndMonthName of a column in the input file

End_yearEnd year numberEndYearName of a column in the input file

MissingThis field explains why a date or part of a date is missingDateMissingTypeShort text explaining why the date is missing, e.g. “Unavailable,” “Time_Invariant”
C) Tables 6a–b and 9a–c serve as templates for the mapping files, ENTITY and RELATIONSHIP. These templates are constructed by copying the IDS control columns (e.g. Type, Date_type) from the sample IDS (Tables 3a–e) and ignoring the specific information from the input files that appears in our sample IDS. In Table 6a we list all of the IDS data types that will be derived for each type of person in the input data. The PARENTS file describes two “entities”: husband and wife. Each row in the CHILDREN file describes a child. In Tables 9a–b we show RELATIONSHIP information derived from three sources: PARENTS, CHILDREN, and the PLACES file.
Table 6
a INDIVIDUAL Table as Template for ENTITY File
Table 9
a INDIV_INDIV Table as Template for RELATIONSHIP File
D) In Tables 7 and 10 we complete the mapping process by adding field names from the input files.
Table 7

Mapping File: ENTITY

Table- NameEntity TypeData- base IDEntity- IDSourceTypeVariable NameValueValue_ ID_CDate- TypeDateEstimation TypeDayMonthYearStart DayStart MonthStart YearEnd- DayEnd MonthEnd YearDate Missing -Type
PARENTS_ INPUTINDIVIDUALFAM- REChidsource_ tableFIRST_ NAMEhfirstDeclaredmarriage_ daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIVIDUALFAM- REChidsource_ tableLAST_ NAMEhlastDeclaredmarriage_ daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIVIDUALFAM- REChidsource_ tableSEXMaleTime_ Invariant
PARENTS_ INPUTINDIVIDUALFAM- REChidsource_ tableOCCU- PATIONhoccDeclaredmarriage_ daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIVIDUALFAM- REChidsource_ tableBIRTH_ DATEEventhbirth_ dayhbirth_ monthhbirth_ year
PARENTS_ INPUTINDIVIDUALFAM- REChidsource_ tableBIRTH_ LOCATIONhbirthlochbirth- loc_idEventhbirth_ dayhbirth_ monthhbirth_ year
PARENTS_ INPUTINDIVIDUALFAM- REChidsource_ tablemarriage_ DATEEventmarriage_ daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIVIDUALFAM- REChidsource_ tablemarriage_ LOCATIONmarrlocmarr- loc_idEventmarriage_ daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIVIDUALFAM- REChidsource_ tableDEATH_ DATEEventhdeath_ dayhdeath_ monthhdeath_ year
PARENTS_ INPUTINDIVIDUALFAM- REChidsource_ tableDEATH_ LOCATIONhdeath- lochdeath- loc_idEventhdeath_ dayhdeath_ monthhdeath_ year
PARENTS_ INPUTINDIVIDUALFAM- RECwidsource_ tableFIRST_ NAMEwfirstDeclaredmarriage_ daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIVIDUALFAM- RECwidsource_ tableLAST_ NAMEwlastDeclaredmarriage_ daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIVIDUALFAM- RECwidsource_ tableSEXFemaleTime_ Invariant
PARENTS_ INPUTINDIVIDUALFAM- RECwidsource_ tableOCCUPATIONwoccDeclaredmarriage_ daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIVIDUALFAM- RECwidsource_ tableBIRTH_ DATEEventwbirth_ daywbirth_ monthwbirth_ year
PARENTS_ INPUTINDIVIDUALFAM- RECwidsource_ tableBIRTH_ LOCATIONwbirthlocwbirth- loc_idEventwbirth_ daywbirth_ monthwbirth_ year
PARENTS_ INPUTINDIVIDUALFAM- RECwidsource_ tablemarriage_ DATEEventmarriage_ daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIVIDUALFAM- RECwidsource_ tablemarriage_ LOCATIONMarrlocmarr- loc_idEventmarriage_ daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIVIDUALFAM- RECwidsource_ tableDEATH_ DATEEventwdeath_ daywdeath_ monthwdeath_ year
PARENTS_ INPUTINDIVIDUALFAM- RECwidsource_ tableDEATH_ LOCATIONwdeath- locwdeath- loc_idEventwdeath_ daywdeath_ monthwdeath_ year
CHILDREN_INPUTINDIVIDUALFAM- RECkididsource_ tableFIRST_ NAMEnameDeclaredbirth_ daybirth_ monthbirth_year
CHILDREN_INPUTINDIVIDUALFAM- RECkididsource_ tableLAST_ NAMElastnameDeclaredbirth_ daybirth_ monthbirth_year
CHILDREN_INPUTINDIVIDUALFAM- RECkididsource_ tableBIRTH_ DATEEventbirth_ daybirth_ monthbirth_year
CHILDREN_INPUTINDIVIDUALFAM- RECkididsource_ tableBIRTH_ LOCATIONbirthlocbirth- loc_idEventbirth_ daybirth_ monthbirth_year
PARENTS_ INPUTCONTEXTFAM- RECparidsource_ tableLEVELUnionDeclaredmarriage_ daymarriage_ monthmarriage_ year
PLACES_ INPUTCONTEXTFAM- RECpla- ceidLEVELlevelTime_ Invariant
PLACES_ INPUTCONTEXTFAM- RECpla- ceidNAMEplaceTime_ Invariant
Table 10

Mapping file: RELATIONSHIP

TableNameRelationshipTypeDatabaseIDFromEntityIDToEntity- IDSourceRelationRelation- VariableDate- TypeDateEstimation- TypeDayMonthYear- Start DayStart- MonthStart YearEnd- DayEnd- MonthEnd- YearDateMiss- ingType
PARENTS_ INPUTINDIV_INDIVFAMREChidwidsource_ tableHusbandDeclaredmarriage_daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIV_INDIVFAMRECwidhidsource_ tableWifeDeclaredmarriage_daymarriage_ monthmarriage_ year
CHILDREN_ INPUTINDIV_INDIVFAMRECdadidkididsource_ tableFatherDeclaredTime_Invariant
CHILDREN_ INPUTINDIV_INDIVFAMRECkididdadidsource_ tableChildDeclaredTime_Invariant
CHILDREN_ INPUTINDIV_INDIVFAMRECmomidkididsource_ tableMotherDeclaredTime_Invariant
CHILDREN_ INPUTINDIV_INDIVFAMRECkididmomidsource_ tableChildDeclaredTime_Invariant
PARENTS_ INPUTINDIV_ CONTEXTFAMREChidparidsource_ tableHusbandDeclaredmarriage_daymarriage_ monthmarriage_ year
PARENTS_ INPUTINDIV_ CONTEXTFAMRECwidparidsource_ tableWifeDeclaredmarriage_daymarriage_ monthmarriage_ year
CHILDREN_ INPUTINDIV_ CONTEXTFAMRECkididparidsource_ tableChildDeclaredbirth_daybirth_ monthbirth_ year
PLACES_INPUTCONTEXT_ CONTEXTFAMRECplaceidnested_inrelvarTime_Invariant
PARENTS_ INPUTCONTEXT_ CONTEXTFAMRECparidmarr- loc_idsource_ tableUnion and NeighborhoodDeclaredmarriage_daymarriage_ monthmarriage_ year
PARENTS_ INPUTCONTEXT_ CONTEXTFAMRECparidhdeath- loc_idsource_ tableUnion and NeighborhoodDeclaredhdeath_ dayhdeath_ monthhdeath_ year
PARENTS_ INPUTCONTEXT_ CONTEXTFAMRECparidwdeath- loc_idsource_ tableUnion and Neighbor- hoodDeclaredwdeath_ daywdeath_ monthwdeath_ year
CHILDREN_ INPUTCONTEXT_ CONTEXTFAMRECparidbirth- loc_idsource_ tableUnion and Neighbor- hoodDeclaredbirth_daybirth_ monthbirth_ year
In the following two sections we will explain at greater length how the transposing process is completed, first for entities and then for the relationships.

4.1 ENTITY MAPPING FILE

Our example IDS tables in Table 3 provide a template for the mapping files, as shown in Table 6. We suppose here that Table 3 was created by hand from one row in each input data file (Table 1). Tables 6a–b show how our example INDIVIDUAL table can be directly converted to rows in the ENTITY file: field names in the INDIVIDUAL and CONTEXT tables are given in boldface above the corresponding fields in the ENTITY file. Content of the cells in the ENTITY file that are identical to their values in the INDIVIDUAL or CONTEXT table are filled in; other cells are left blank. Note that the number of rows in ENTITY will be equivalent to the total number of rows for one individual of each role in the INDIVIDUAL table (10 rows for husband, 10 rows for wife, 4 rows for children in Table 6a) plus the number of rows in the CONTEXT table for one context of each level (1 row for union, 2 rows for place in Table 6b). ENTITY does not include the IDS Id field, which is made automatically by the IDS Transposer. ENTITY requires two fields that link input files to IDS tables: tableName tells the IDS Transposer the source of the data (PARENTS_INPUT, CHILDREN_INPUT, and PLACES_INPUT); entityType tells the IDS Transposer to which IDS table the output will be added (INDIVIDUAL or CONTEXT). The Timestamp fields in the mapping file correspond directly to the Timestamp fields in the IDS tables: values for the IDS fields Date_type, Estimation, and Missing are given directly in the DateType, DateEstimationType, and DateMissingType fields of the mapping table; values of the other Timestamp fields come from the input data files. Table 7 shows the completed ENTITY mapping file. Note that rows 1-10 indicate how the IDS Transposer is to create rows in the IDS INDIVIDUAL table for each husband in PARENTS_INPUT; rows 11-20 indicate how the IDS Transposer is to create rows in the IDS INDIVIDUAL table for each wife in PARENTS_INPUT; rows 21-24 indicate how the IDS Transposer is to create rows in the IDS INDIVIDUAL table for each child in CHILDREN_INPUT; row 25 indicates how the IDS Transposer is to create rows in the IDS CONTEXT table for each union in PARENTS_INPUT, and rows 26-27 indicate how the IDS Transposer is to create rows in the IDS CONTEXT table for each place in PLACES_INPUT. Because rows 1-20 and 25 of the ENTITY mapping file come from the PARENTS_INPUT file, the field TableName takes the value “PARENTS_INPUT” for those rows. TableName takes the value “CHILDREN_INPUT” for rows 21-24, and “PLACES_INPUT” for rows 26-27. The field EntityType takes the value of “INDIVIDUAL” for rows 1-24, which specify how information should be displayed in the IDS INDIVIDUAL table, and “CONTEXT” for rows 25-27, which specify information destined for the CONTEXT table. EntityID in the mapping file specifies the field in the input file giving an ID code for the relevant person. Since rows 1-10 create information about husbands, EntityID for those rows takes the value “hid,” the name of the variable we created for the husband’s identification code (Table 4a). Similarly, EntityID for rows 11-20 takes the value “wid” and EntityID for rows 21-24 takes the value “kidid.” For the final three rows, the EntityID field takes the identification code for the relevant context: since row 25 refers to a union, its identification code is “parid” from PARENTS_INPUT; rows 26-27 refer to places, so EntityID takes the value of “placeid,” the identification code we gave to places in PLACES_INPUT. The Source field in an IDS table may be used to identify different source documents within a database. The IDS Transposer expects a Source column in the input data to provide this information. In this example, we created a variable called Source_table in all input data files. If the name of this variable varied across input files, Source could take different values, as in the example given below (see 5. An Alternative Data Structure). The type field in ENTITY is identical to the Type field in the IDS tables. Note that types are listed separately for each role in the original dataset (e.g., FIRST_NAME is listed once for husbands, once for wives, and once for children). Information that will go into the Value field in the IDS table is listed either in VariableName or Value or Value_ID_C in the mapping file. Either the VariableName field or the Value field will contain information; the others will be blank. For types that refer only to a date, such as BIRTH_DATE, all three fields will be blank because the relevant information is a date described by the ‘time stamp’. If the information for the IDS Value field is in the input data file, the name of the relevant variable/column is specified in the VariableName field in the mapping file. For example, the value of FIRST_NAME for husbands is contained in the variable Hfirst, so we list that in the Variable- Name column of the first row of the mapping file. Note that in row 11, which produces the FIRST_NAME row for all wives, we specify “wfirst,” the variable containing wives’ first names in PARENTS_INPUT, in the VariableName field of the mapping file. If the information for the IDS Value field is not in the input data file, but is the same for all individuals in a role (that is, all husbands, all wives, or all children), the value is specified directly in the Value field of the mapping table. In our example, the sex of parents is not given in PARENTS_INPUT. Assuming that all husbands are male and all wives female, we enter “Male” in the Value field of the mapping file for row 3 and “Female” in the Value field of the mapping file for row 13. The text given in the Value column is assigned to the person identified in the entityID column: “hid” for husbands and “wid” for wives. If the IDS value for a given Type is a context identification code (Value_Id_C), the field in the input data set containing that code is listed in the Value_ID_C field of the mapping file. In our example, husband’s birth location is a context, given in the variable Hbirthloc_id in PARENTS_ INPUT, so we list “Hbirthloc_id” in the Value_ID_C field. For rows 25-27, which produce the CONTEXT table, the context identification code goes in the En-tityID field of the mapping file, as described above, and the Value_ID_C field is blank. The value for LEVEL is specified directly in the Value field of the mapping file; the names of places are contained in the Place field in PLACES_INPUT, so “Place” is listed in the VariableName field of the mapping file. For the Timestamp, the values of the IDS fields Date_type and Missing are specified directly in the corresponding fields of the entity mapping file (DateType and DateMissingType). Consequently, these fields will be the same for every IDS row created from that ENTITY mapping. Thus, all rows describing the husband’s FIRST_NAME in IDS will get the Date_type and Missing provided in the first row of the ENTITY table. Births, marriages, and deaths are listed as “Event” in the DateType field, because they represent transitions occurring on a specific day. Occupations are assigned to DateType “Declared,” because we do not know when an individual changed occupation. The value of the Missing field in the IDS tables is specified directly only if it is the same for the given combination of Type and role. For example, “Time_Invariant” is specified in row 3 for SEX, because we assume that all husbands are male at all times. The DateEstimationType field is used to indicate that a date or period in the timestamp has been estimated by the database provider. When a field/column in the input file is specified in DateEstimationType, the IDS Transposer uses values from this field for all rows of this type. For example, setting DateEstimationType to “est_var” tells IDS Transposer to take values for the Estimation field from that column in the input file. If no field/column is listed in DateEstimationType, the IDS Transposer will create this variable from the date fields. IDS Transposer uses “Exact” if all components of the date are present in the data, “Year_Month” if the day is missing, and “Year” if only year is given. These are specific to individuals, so if one person has only the year of his birth listed in the input data file and another has the full date, Estimation in the resulting IDS INDIVIDUAL table will be “Year” for the first and “Exact” for the second. The fields Day, Month, Year, Startday, Startmonth, Startyear, Endday, Endmonth, and Endyear in the mapping file take the names of the fields in the input data file containing the relevant day, month, and year. Date variables are to be listed for each relevant combination of Type and structural role. In our example, “Hbirth_day” is given in the Day field of the mapping file for the husband’s BIRTH_DATE and the husband’s BIRTH_LOCATION. Because we assume that information about occupation was collected from the marriage certificate, we list the marriage date fields in the OCCUPATION rows. Note that we use the date of the marriage (Marriage_day, Marriage_month, Marriage_year) to create a record in the CONTEXT table for each marital “union.” In this case we put the marriage date in IDS fields Day, Month, Year, which we identify as an “event.” Marriage is the starting date for a marital union, but we did not use the starting date fields (Startday, Startmonth, Startyear), because we have not computed ending dates for each union. IDS expects a starting date to have a corresponding ending date. Periods that are not completely defined may cause problems for programs designed to extract data from IDS.[2]

4.2 RELATIONSHIP MAPPING FILE

As in the ENTITY file, each row in the RELATIONSHIP file creates a row in an IDS table for every row with relevant information in the input data files. Table 8 shows the relationship between fields in the RELATIONSHIP file and the fields that they fill in the INDIV_INDIV, INDIV_CONTEXT, and CONTEXT_CONTEXT files. We fill templates for the RELATIONSHIP table by selecting the minimum number of rows to fill the IDS tables in Tables 3c, 3d, and 3e. Templates for the RELATIONSHIP table are shown in Table 9. Table 9a shows six rows destined for the INDIV_INDIV table. There are three dyadic pairs (husband-wife, father-child, mother-child), and each pair enters the table twice (e.g. “husband” and “wife”). Relationships between individuals are listed in a reciprocal way, once for the relationship of the first person to the second and again for the relationship of the second person to the first (e.g., spousal relationships will be listed twice, once with “Husband” as the Relation and once with “Wife” as the Relation). The marriages described in the PARENTS table yield one relationship pair, husband-wife, and the births described in the CHILDREN table provide two pairs, mother-child and father-child. Table 9b shows how we place each type of person in the INDIV_CONTEXT table. The three rows in this table represent the roles in a marital union: husband, wife, child. Relationships between individuals and contexts are listed once, with the Relation field referring to the relationship of the individual to the context. In Table 9c we represent five rows with CONTEXT_CONTEXT relationships. Relationships between contexts are listed once, with the lower level given first and the Relation field referring to the relationship of the lower level context to the higher level context. The template has one row for the hierarchy of places (Neighborhood/Municipality/State) that is described in the PLACES_INPUT file (see below). CONTEXT_CONTEXT information placing the family (marital union) in a neighborhood is recorded at four key events: marriage, husband’s death, wife’s death, and the births of children. Each of these events generates a “Union and Neighborhood” row in CONTEXT_CONTEXT. The templates in Table 9 are completed in the RELATIONSHIP table shown in Table 10. In the RELATIONSHIP table rows 1-2 refer to the spousal relationship, rows 3-6 refer to relationships between children and parents. Rows 7-8 place husbands and wives in marital unions, and row 9 associates children with the unions of their parents. Row 10 is used to read the hierarchy contexts from the PLACE_INPUT file. Note that we added three columns to PLACES_INPUT that were not in the original PLACES file: Level, Nested_in, and Relvar. The Nested_in column is the ID of the next higher level in the context hierarchy, and Relvar is the relationship of this level to the next higher level. For example, “Los Angeles” is a “Municipality”, which is nested in the “State” of “California”. The relationship between “Los Angeles” and “California” is “Municipality and State”. Since the relationship is read from PLACES_INPUT, we put Relvar in the Relation_Variable column of the RELATIONSHIP file. Locations of “Union in Neighborhood” are derived from the PARENTS_INPUT and CHILDREN_INPUT files in Rows 11-14. Row 11 in the RELATIONSHIP table identifies the location of the union at the time of the marriage. Rows 12 and 13 show where the family was living when the husband and wife died. Row 14 will add a row to CONTEXT_CONTEXT each time that a child is born. We use DateType “Declared” for these rows, because we do not know if and when the location of the union changed. Note that we are locating individuals in places by associating them with unions. We could also choose to associate individuals directly with places, by using their IDs in the INDIV_CONTEXT table, but the approach shown here is more efficient. Since individuals are linked to unions, each “Union in Neigh-borhood” row in the CONTEXT_CONTEXT table automatically associates every person in the family with the new neighborhood. Again, the TableName field is the name of the input data file that contains the information: PARENTS_INPUT for rows 1-2, 7-8, and 10-13 and CHILDREN_INPUT for rows 3-6 and 14. The Relationship-Type field indicates the IDS table to which the information will be written: INDIV_INDIV for the spousal and parent-child relationships in rows 1-6, INDIV_CONTEXT for rows 7-9, and CONTEXT_CONTEXT for rows 10-14. These rows comprise the full RELATIONSHIP file, so all relationships of a given type are created in the same way. The DatabaseID, Source, and Timestamp fields in the RELATIONSHIP file are exactly the same as those in the ENTITY file and will not be discussed here. The FromEntityID field in the RELATIONSHIP file points the IDS Transposer to the variable in the input data file that contains the identification code that will be the value of the Id_I_1 field in the INDIV_IN-DIV table, the Id_I field in the INDIV_CONTEXT table, or the Id_C_1 field in the CONTEXT_CONTEXT table. The ToEntityID field points to the variable in the input data file that contains the identification code that will be the value of the Id_I_2 field in the INDIV_INDIV table, the Id_C field in the INDIV_ CONTEXT table, or the Id_C_2 field in the CONTEXT_CONTEXT table. Note that the ID variables shown in FromEntityID and ToEntityID fields are reversed as we move from “Husband” to “Wife” in rows 1 and 2. The RELATIONSHIP file includes two fields that provide information for the Relation field in the resulting IDS tables: Relation and RelationVariable. Just as with the VariableName and Value fields in the ENTITY file, only one of these two fields may have a value in any given row. In our example, all but one of the relationships are specified directly in the mapping file through the Relation field. But in row 10 we use a RelationVariable to read from the Relvar column in the PLACES_INPUT file, which contains “Municipality and State” or “Neighborhood and Municipality”.

5 IDS TRANSPOSER

We now have all of the files necessary to use the IDS Transposer. Our input data files are the PARENTS_INPUT, CHILDREN_INPUT, and PLACE_INPUT tables described above in Step 2. Our mapping files are ENTITY and RELATIONSHIP shown in Table 7 and Table 10. Note that these tables include all of the rows necessary for mapping the FAMREL dataset in the format we created in Step 2. All files must be in .csv (comma-separated values) format. The IDS Transposer allows users to map to these files on their computer or server and emails the resulting IDS tables to users. The IDS Transposer is found at this URL: http://www.icpsr.umich.edu/icpsrweb/ICPSR/idsTransposer/idsTransposer Figure 1 shows a screen shot of the data entry page, and Figure 2 shows the result of a successful execution.
Figure 1

IDS Transposer File Selection Screen

Figure 2

IDS Transposer File Selection Screen

6 AN ALTERNATIVE DATA STRUCTURE

The example described above started with a file structure typically found in family reconstitutions studies. All information about a married couple is contained in one file, and their children are described in a second file. Here we show the same data in an alternative file structure where the data are arranged by source document: marriages, births, and deaths. Table 11 shows the input data files used in this example. Notice that the husband and wife in each marriage appear in all three files.
Table 11
a Example 2: MARRIAGES_INPUT
paridhfirsthlastwfirstwlasthoccwocchidwidMarriage_dayMarriage_monthMarriage_yearmarrlocmarrloc_idsource
1000FranklinEdwardsNellKimSalespersonEye Doctor10001100022731786Watts104Marriages
1001TristramHernandezMarinaKennedyPoliceBanker10011100122451778Tarzana128Marriages
1002BradfordFisherCynthiaVasquezHistorianFilm10021100222761759Glassell101Marriages
1003OliverPetersonAntoniaPorterDesignerMaitre10031100322981737Montecito137Marriages
1004MorganLeeAdelineWoodsEnglishScience100411004214111807Hollywood166Marriages
Table 12 shows the ENTITY mapping file for the data in Table 11. An important difference from the previous example is that names are given in all three input files. This is normally the case in the original sources, and it is common for names to differ between sources. For example, “William” in one source may be “Bill” in another. The IDS format described in Table 12 captures these variations, because names are taken from each input table. In IDS each version of a name is “Declared” with a date and a Source (“Births”, “Marriages”, or “Deaths”).
Table 12

Example 2: ENTITY Mapping File

Table- NameEntity- TypeDataBaseIDEntityIDSourceTypeVariable- NameValueValue_ ID_CDate - TypeDateEstimation- TypeDayMonthYearStart DayStart- MonthStart YearEnd - DayEnd- MonthEnd YearDateMiss- ingType
marriag- es_inputINDIVIDUALFAMREChidsourceFIRST_ NAMEhfirstDeclaredmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIVIDUALFAMREChidsourceLAST_ NAMEhlastDeclaredmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIVIDUALFAMREChidsourceSEXMaleTime_In- variant
marriag- es_inputINDIVIDUALFAMREChidsourceOCCUPATIONhoccDeclaredmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIVIDUALFAMREChidsourcemarriage_ DATEEventmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIVIDUALFAMREChidsourcemarriage_ LOCATIONmarrlocmarr- loc_idDeclaredmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIVIDUALFAMRECwidsourceFIRST_ NAMEwfirstDeclaredmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIVIDUALFAMRECwidsourceLAST_ NAMEwlastDeclaredmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIVIDUALFAMRECwidsourceSEXFemaleDeclaredTime_In- variant
marriag- es_inputINDIVIDUALFAMRECwidsourceOCCUPATIONwoccDeclaredmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIVIDUALFAMRECwidsourcemarriage_ DATEEventmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIVIDUALFAMRECwidsourcemarriage_ LOCATIONmarrlocmarr- loc_idDeclaredmarriage_ daymarriage_ monthmarriage_ year
deaths_ inputINDIVIDUALFAMRECidisourceFIRST_ NAMEfnameDeclareddeath_ daydeath_ monthdeath_ year
deaths_ inputINDIVIDUALFAMRECidisourceLAST_ NAMElnameDeclareddeath_ daydeath_ monthdeath_ year
deaths_ inputINDIVIDUALFAMRECidisourceDEATH_ DATEEventdeath_ daydeath_ monthdeath_ year
deaths_ inputINDIVIDUALFAMRECidisourceDEATH_ LOCATIONdeath- locdeath- loc_idDeclareddeath_ daydeath_ monthdeath_ year
births_inputINDIVIDUALFAMRECkididFIRST_ NAMEfnameFIRST_ NAMEDeclaredbirth_ daybirth_ monthbirth_ year
births_inputINDIVIDUALFAMRECkididLAST_ NAMElnameLAST_ NAMEDeclaredbirth_ daybirth_ monthbirth_ year
births_inputINDIVIDUALFAMRECkididsourceBIRTH_ DATEEventbirth_ daybirth_ monthbirth_ year
births_inputINDIVIDUALFAMRECkididsourceBIRTH_ LOCATIONbirthlocbirth- loc_idDeclaredbirth_ daybirth_ monthbirth_ year
plac- es_inputCONTEXTFAMRECpla- ceidLEVELlevelTime_In- variant
plac- es_inputCONTEXTFAMRECpla- ceidNAMEplaceTime_In- variant
The RELATIONSHIP mapping file in Table 13 differs from the previous example in its handling of contexts. In this example, we do not consider the marital union a context. Instead, we associate individuals with contexts on the date of each event through the INDIV_CONTEXT table.
Table 13

Example 2: RELATIONSHIP Mapping File

Table- NameRelation- shipTypeDataba- seIDFro- mEntityIDToEntityIDSourceRelationRelation - VariableDate - TypeDateEstimationTypeDayMonthYearStart- dayStart- monthStart- yearEnd- dayEnd- monthEnd- yearDateMiss- ingType
marriag- es_inputINDIV_INDIVFAMREChidwidsourceHusbandDeclaredmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIV_INDIVFAMRECwidhidsourceWifeDeclaredmarriage_ daymarriage_ monthmarriage_ year
births_inputINDIV_INDIVFAMRECdadidkididsourceFatherDeclaredbirth_ daybirth_ monthbirth_ year
births_inputINDIV_INDIVFAMRECkididdadidsourceChildDeclaredbirth_ daybirth_ monthbirth_ year
births_inputINDIV_INDIVFAMRECmomidkididsourceMotherDeclaredbirth_ daybirth_ monthbirth_ year
births_inputINDIV_INDIVFAMRECkididmomidsourceChildDeclaredbirth_ daybirth_ monthbirth_ year
places_inputCONTEXT_ CONTEXTFAMRECplaceidnested_inrelvar
marriag- es_inputINDIV_ CONTEXTFAMREChidmarrloc_idsourceDeclaredmarriage_ daymarriage_ monthmarriage_ year
marriag- es_inputINDIV_ CONTEXTFAMRECwidmarrloc_idsourceDeclaredmarriage_ daymarriage_ monthmarriage_ year
births_inputINDIV_ CONTEXTFAMRECkididbirthloc_idsourceDeclaredbirth_ daybirth_ monthbirth_ year
deaths_inputINDIV_ CONTEXTFAMRECidideath- loc_idsourceDeclareddeath_ daydeath_ monthdeath_ year

7 CONCLUSION

The IDS Transposer is a powerful tool for moving data into the IDS standard. It efficiently changes the structure of a dataset in which one row represents an individual, a family, or an event to the entity-attribute-value system used in IDS, where one row represents the time-stamped value of a specific attribute. Typically, IDS data files are long and narrow with many more rows than the original data source. Thus, in our first example we began with three tables with a modest number of rows – PARENTS (100 rows), CHILDREN (365 rows), PLACES (75 rows), and we ended with five much longer tables – INDIVIDUAL (3460 rows), CONTEXT (248 rows), INDIV_INDIV (1660 rows), INDIV_CONTEXT (565 rows), CONTEXT_CONTEXT (539 rows). The keys to using IDS Transposer are its two mapping files, ENTITY and RELATIONSHIP, which describe how the information in the original data files will be moved into IDS tables. We have illustrated a methodology for creating these mapping files. First, we manually transform a sample from the original dataset into IDS. Then, we use this sample IDS to make templates for the ENTITY and RELATIONSHIP mapping files and to find the minimum number of rows required in each mapping file. Finally, we complete the mapping files by adding variable names and constant values. Note that we modified the original data files to create input files for IDS Transposer. In particular, we added IDs for persons and places that did not have separate IDs in the original files; date variables were separated into day, month, and year. Clearly, preparation of the mapping files is much more time-consuming than running the IDS Trans-poser application, and we conclude here with some recommendations on creating those mappings. Use standard IDS Types and Values available from the IDS METADATA table whenever possible. Since IDS is intended to promote sharing of data management and data analysis software, following the standard will reduce the need to customize data extraction programs. Be aware that a single piece of information (especially a date) may be repeated on many rows in IDS. Be cautious in using periods (start date/end date) rather than single dates. Most of the sources used in historical demography do come from records associated with particular dates rather than periods of time. In particular, occupations and residences are usually known only at the time a document was created. If a man is recorded as a “carpenter” in his marriage certificate, we only know his occupation on that particular day. He may have been a “laborer” one year earlier and a “shopkeeper” one year later. Occupations are usually “declarations” that pertain to a specific date. Always check the IDS tables against the cases that you translated manually to be sure that you are getting the results that you expect. Run IDS validation and consistency tests to look for problems in the data or mapping files.
  1 in total

1.  The Intermediate Data Structure (IDS) for Longitudinal Historical Microdata, version 4.

Authors:  George Alter; Kees Mandemakers
Journal:  Hist Life Course Stud       Date:  2014-05-26
  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.