Arno Maetens, Robrecht De Schreye, Kristof Faes, Dirk Houttekier, Luc Deliens, Birgit Gielen, Cindy De Gendt, Patrick Lusyne, Lieven Annemans, Joachim Cohen.
Abstract
BACKGROUND: The use of full-population databases to study the use, quality and costs of end-of-life care is under-explored. Using the case of Belgium, we explored: (1) which full-population databases provide valid information about end-of-life care, (2) what procedures exist for using these databases, and (3) what is needed to integrate separate databases.
Keywords: Administrative databases; Data linkage; Disease-specific databases; End-of-life; Full-population
Year: 2016 PMID: 27756296 PMCID: PMC5069861 DOI: 10.1186/s12904-016-0159-7
Source DB: PubMed Journal: BMC Palliat Care ISSN: 1472-684X Impact factor: 3.234
Overview of population-level databases identified as relevant for end-of-life care research
| Database administrators | Database name | Population | Information provided in database |
|---|---|---|---|
| Inter Mutualistic Agency (IMA) | Population Database | Every Belgian citizen who is a member of one of the seven (compulsory) Belgian sickness funds, information in Population Database is updated twice each year from 2002 onwards | Socio-demographic characteristics (age, sex, date of death, place of residence, family composition, use of supportive measures) |
| | Pharmanet Database | | Medication supply characteristics (substance, quantity, prescriber, expenses, refunds, delivery date) |
| | Medical Claims Database | | Health and medical care use characteristics (quantity of use, reimbursement, supplier, supplier institution, length of treatment) |
| Belgian Cancer Registry | Cancer registry | Every new cancer diagnosis of Belgian residents, registered by oncological care programs and laboratories for anatomic pathology | Diagnostic characteristics (date of diagnosis, type of cancer, TNM gradation) |
| Statistics Belgium | Death certificate database | Every Belgian decedent with a registered death certificate | Direct and indirect causes of death (in ICD-10 codes), socio-demographics about the deceased, place of death |
| | Demographic dataset | Every Belgian citizen | Nationality group, household composition |
| | Socio-economic survey (SES) 2001 and Census 2011 | Every Belgian citizen, information gathered from multiple external administrative databases using social security number (Census 2011) | Highest attained education level, occupation, housing comfort |
| | IPCAL dataset | Every Belgian citizen | Net income by category |
| Identified but not used in our research | |||
| Belgian Ministry of Health | Minimal Hospital Dataset | Every hospital admission in non-psychiatric general hospitals | Medical, nursing and personnel data for in-hospital care |
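Several of these databases share no common identifier. According to Step 9 of the linkage procedure (Fig. 1), the death certificate data contain no social security numbers; Statistics Belgium attributes one to each case through a deterministic linkage with the national registry on date of birth, sex and municipality of residence. The following is a minimal Python sketch of exact-match deterministic linkage. The field names (`birth_date`, `sex`, `municipality`, `ssn`), the toy records and the rule that a key matching several registry entries is left unlinked are hypothetical assumptions, not Statistics Belgium's actual procedure.

```python
from collections import defaultdict

# Hypothetical linkage key; Step 9 names date of birth, sex and municipality of residence.
LINK_KEYS = ("birth_date", "sex", "municipality")


def deterministic_link(certificates, registry, keys=LINK_KEYS):
    """Attach an SSN to each certificate whose key values match exactly one registry entry."""
    # Index the registry by the exact combination of key values.
    index = defaultdict(list)
    for person in registry:
        index[tuple(person[k] for k in keys)].append(person["ssn"])

    linked, unlinked = [], []
    for cert in certificates:
        matches = index.get(tuple(cert[k] for k in keys), [])
        if len(matches) == 1:
            linked.append({**cert, "ssn": matches[0]})
        else:
            # No match, or several registry entries share the key (assumed: leave unlinked).
            unlinked.append(cert)
    return linked, unlinked


# Toy data with made-up identifiers.
registry = [
    {"ssn": "SSN-0001", "birth_date": "1935-04-02", "sex": "F", "municipality": "Leuven"},
    {"ssn": "SSN-0002", "birth_date": "1940-11-20", "sex": "M", "municipality": "Gent"},
    {"ssn": "SSN-0003", "birth_date": "1940-11-20", "sex": "M", "municipality": "Gent"},
]
certificates = [
    {"cert_id": 1, "birth_date": "1935-04-02", "sex": "F", "municipality": "Leuven", "cause": "C34.9"},
    {"cert_id": 2, "birth_date": "1940-11-20", "sex": "M", "municipality": "Gent", "cause": "I21.9"},
    {"cert_id": 3, "birth_date": "1950-01-01", "sex": "F", "municipality": "Brugge", "cause": "J44.9"},
]

linked, unlinked = deterministic_link(certificates, registry)
print([c["cert_id"] for c in linked], [c["cert_id"] for c in unlinked])  # [1] [2, 3]
```

Certificate 2 stays unlinked because two registry entries share its key, and certificate 3 has no registry counterpart; how such cases were actually handled is not described in the source.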
Fig. 1 Step-by-step overview of linkage procedure. IMA DWH: InterMutualistic Agency Data Warehouse; TTP VI CBSS: Trusted Third Party Crossroads Bank for Social Security; SPOC NIC: Single Point of Contact National InterMutualistic College; BCR: Belgian Cancer Registry; TTP eHealth: Trusted Third Party eHealth; StatBel BE: Statistics Belgium; SCRA: Small Cells Risk Analysis; SSN: Social Security Number; C1/C2: coding 1/2.

Explanatory note: The linkage procedure consisted of 13 steps (cf. arrows in Fig. 1).

- Step 1: All cases of Belgian decedents since January 1, 2010 are selected in the IMA databases, with their specific identifier coded (C2). These are then decoded (C1) by the TTP VI (CBSS).
- Step 2: The security officer of the National InterMutualistic College decodes the identifiers (C1) into actual social security numbers.
- Step 3.1: The IMA subset of social security numbers is sent by secure means to the separate TTP eHealth.
- Step 3.2: TTP eHealth receives the social security numbers of all cases in the Cancer Registry selected for the study (decedents since January 1, 2010).
- Steps 4.1 and 4.2: To prevent any party from having access to both the sensitive data and the social security numbers, the established principle of random transport numbers (RNs) is used. TTP eHealth assigns these RNs to the selected cases from IMA (4.1) and BCR (4.2) and provides them to both data agencies so that the sensitive data can be transmitted safely to the TTP VI (CBSS).
- Step 4.3: TTP eHealth recodes the social security numbers into a final code that can be made available to the researchers (Cproject). These codes are sent, with the RNs as a cross-reference, to the TTP VI (CBSS).
- Step 5: The selected cases and the corresponding requested data from the Cancer Registry are securely transmitted to the TTP VI (CBSS).
- Step 6: The selected IMA cases (but not yet the corresponding requested data) are securely transmitted to the TTP VI (CBSS).
- Step 7: The selected cases are transferred to the IMA data warehouse (based on C2) to allow the selection of all data corresponding to these cases.
- Step 8: The selected cases and the corresponding requested data from IMA are securely transmitted to the TTP VI (CBSS).
- Step 9: The social security numbers and corresponding RNs are transferred safely to Statistics Belgium so that the correct cases can be selected. The death certificate data do not contain social security numbers; Statistics Belgium has already attributed one to every case through a deterministic linkage between the death certificate database and the national registry database on date of birth, sex and municipality of residence.
- Step 10: Statistics Belgium sends the requested data of the selected cases to the TTP VI (CBSS), which links them with the data from IMA and the Cancer Registry using the RNs.
- Step 11: The TTP VI (CBSS) recodes all data one final time based on the Cproject coding.
- Step 12: A small cells risk analysis is performed to minimize the risk of re-identification based on a combination of variables.
- Step 13: The complete linked database is stored on a separate IMA data server, which is accessible to the researchers only through a Virtual Private Network (VPN) connection with a secure token.
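The central idea of Steps 4 to 11 is that no single party holds both the social security numbers and the sensitive data: each data agency replaces the social security number with a random transport number, the TTP VI (CBSS) joins the extracts on those numbers, and only then swaps them for the project code. The following is a minimal Python sketch of that flow under stated assumptions: in-memory dictionaries stand in for the secure transfers, the identifiers (`SSN-0001` etc.) and all function names are hypothetical, and Cproject is derived here with HMAC-SHA256 under a secret key held by TTP eHealth, which is an assumption since the source does not state how Cproject is computed.

```python
import hashlib
import hmac
import secrets


def assign_transport_numbers(ssns):
    """Steps 4.1/4.2 (TTP eHealth): one random transport number (RN) per selected case."""
    return {ssn: secrets.token_hex(8) for ssn in ssns}


def project_code(ssn, key):
    """Step 4.3 (TTP eHealth): stable pseudonym (Cproject); HMAC-SHA256 is an assumption."""
    return hmac.new(key, ssn.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def agency_export(records, rn_of):
    """Steps 5, 8 and 10 (data agency): send selected records keyed by RN, never by SSN."""
    return {rn_of[ssn]: data for ssn, data in records.items() if ssn in rn_of}


def link_and_recode(exports, rn_to_cproject):
    """Steps 10/11 (TTP VI CBSS): join the extracts on the RN, then swap the RN for Cproject."""
    linked = {}
    for source, extract in exports.items():
        for rn, data in extract.items():
            linked.setdefault(rn, {})[source] = data
    return {rn_to_cproject[rn]: row for rn, row in linked.items()}


# Toy extracts held by each data agency, keyed by made-up social security numbers.
ima = {"SSN-0001": {"drug_cost_eur": 120.5}, "SSN-0002": {"drug_cost_eur": 80.0}}
bcr = {"SSN-0001": {"cancer_type": "C34"}}
statbel = {
    "SSN-0001": {"cause_of_death": "C34.9"},
    "SSN-0002": {"cause_of_death": "I21.9"},
    "SSN-0003": {"cause_of_death": "J44.9"},  # not selected, so never exported
}

ehealth_key = secrets.token_bytes(32)  # hypothetical secret known only to TTP eHealth
rn_of = assign_transport_numbers(sorted(ima.keys() | bcr.keys()))
rn_to_cproject = {rn: project_code(ssn, ehealth_key) for ssn, rn in rn_of.items()}

exports = {
    name: agency_export(data, rn_of)
    for name, data in [("IMA", ima), ("BCR", bcr), ("StatBel", statbel)]
}
research_db = link_and_recode(exports, rn_to_cproject)
for cproject, row in research_db.items():
    print(cproject, sorted(row))  # pseudonym plus the sources linked for that case
```

Because transport numbers are drawn at random, they carry no information on their own; the keyed hash gives the same Cproject for the same person across extracts but cannot be recomputed from a social security number without the key.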
Considerations for researchers planning to link databases
| Topics | Considerations |
|---|---|
| Exploring relevant databases | Are my research questions clear and well-defined? What data are needed to answer them? |
| | What is/are my study population(s)? What data are needed to identify it? |
| | Which database(s) contain(s) the core data and could thus serve as a starting point? |
| | Once a starting database is chosen, what data are still lacking to fully address the research questions? Where can we find them? |
| | How can we establish contact with the administrators of the databases? Obtain approval in principle from all administrators (e.g. by presenting the study to the board of directors). |
| | What is the cost associated with each database? |
| Variable selection | What specific variables do we need from the selected databases to answer our research questions? |
| | Are the variables we want available and linkable between the different databases? |
| | Does the preferred selection of variables complicate the linking procedure considerably? Balance the gain in information against the increase in complexity and time. |
| | What is the required level of detail for each variable? Balance the preferred level against what is allowed in terms of data protection (e.g. through a small cells risk analysis to determine the risk of re-identification based on a combination of variables). |
| | Do we have sufficient storage capacity and analysis hardware to store and analyze all the data we want? |
| Access procedures | What ethical and privacy procedures need to be followed to link and access the selected databases? |
| | What technical procedures need to be followed to link and access the selected databases? |
| Infrastructure | How will data be stored safely? Is infrastructure provided by researchers or by database administrators? What is the cost for this infrastructure? |
| | How will data be protected? Both physical and digital protection need to be guaranteed. |
| | How can data be accessed in a safe and easy way? What hardware and software do we need to access and analyze the requested data? |
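The small cells risk analysis mentioned in the variable-selection considerations (and as Step 12 of the linkage procedure) can be approximated by counting how many records share each combination of potentially identifying variables and flagging combinations below a minimum cell size. The following is a minimal Python sketch of that counting check; the threshold `k=5`, the variable names and the 10-year age bands are hypothetical choices, not the criteria applied in the study.

```python
from collections import Counter


def age_band(age, width=10):
    """Coarsen an exact age into a band such as '80-89' (hypothetical width)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def small_cells(records, quasi_identifiers, k=5):
    """Return combinations of quasi-identifier values shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Toy records; the threshold k=5 and the variables are illustrative only.
records = [{"age": a, "sex": "F", "district": "Gent"} for a in (81, 83, 85, 87, 89)]
records.append({"age": 52, "sex": "M", "district": "Gent"})
qi = ["age", "sex", "district"]

print(len(small_cells(records, qi)))  # 6: every exact-age combination is unique
banded = [{**r, "age": age_band(r["age"])} for r in records]
print(small_cells(banded, qi))  # {('50-59', 'M', 'Gent'): 1}
```

Coarsening age into bands merges the five women aged 81 to 89 into one cell of size five, leaving only the single man aged 52 flagged; such a cell would then need further coarsening or suppression, which illustrates the trade-off between level of detail and data protection described above.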