Sebastien Cossin (1, 2), Serigne Diouf (1, 2), Romain Griffier (1, 2), Philippine Le Barrois d'Orgeval (1, 2), Gayo Diallo (2), Vianney Jouhet (1, 2)
Abstract
INTRODUCTION: Vital status is of central importance to hospital clinical research. However, hospital information systems record only deaths that occur in hospital. Recently, the French government released a publicly available dataset containing death-certificate data for over 25 million individuals. The objective of this study was to link French death certificates to Bordeaux University Hospital records to complete the vital status information of hospital patients.
Keywords: death certificates; information storage and retrieval; medical record linkage; search engine; supervised machine learning
Year: 2021 PMID: 33709061 PMCID: PMC7935495 DOI: 10.1093/jamiaopen/ooab005
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Shared attributes of the hospital and French DMF datasets. The percentage of missing values for each attribute is given in parentheses. Machine learning (ML) features were calculated by comparing the attribute values of hospital records with those of death certificates. For the first name and the last name, string distances were calculated based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance), phonetics (Soundex), and heuristic metrics (Jaro, Jaro-Winkler).
| Hospital | French DMF | Methods of comparison (features) |
|---|---|---|
| Last names (0%) | Last name (0%) | String distances |
| First name (0%) | First names (0%) | String distances |
| Birth date (0%) | Birth date (0.66%) | Equal or not for the date, year, month and day |
| Gender (0%) | Gender (0%) | Equal or not |
| Birth location (39%) | Birth location (0.54%) | Equal or not |
| Birth country (6.9%) | Birth country (0.02%) | Equal or not |
| Last registered patient address (1.7%) | Death location (0.11%) | Equal or not for department and region of death |
| Last visit date (0%) | Date of death (0%) | Time difference in days |
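To make the feature table concrete, here is a minimal Python sketch of how one hospital-record/death-certificate pair could be turned into a feature vector. It is not the authors' code. It reimplements only a subset of the listed string metrics (Levenshtein, Jaro-Winkler, q-gram Jaccard) in the standard library, plus the equality and time-difference features. The dictionary keys (`last_name`, `first_names`, `birth_location`, `last_visit`, `death_date`, ...) are hypothetical, missing values are assumed to be `None`, and comparing the hospital first name with the first listed DMF first name is an illustrative assumption.

```python
from datetime import date
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]; 1 means identical strings."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(i + window + 1, len(b))):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def jaccard_qgram_distance(a: str, b: str, q: int = 2) -> float:
    """1 - |A & B| / |A | B| over the sets of q-grams of each string."""
    qa = {a[i:i + q] for i in range(len(a) - q + 1)}
    qb = {b[i:i + q] for i in range(len(b) - q + 1)}
    union = qa | qb
    return 0.0 if not union else 1 - len(qa & qb) / len(union)


def _eq(x, y) -> Optional[bool]:
    """Equality feature; None when either value is missing."""
    return None if x is None or y is None else x == y


def _part(d: Optional[date], attr: str):
    """Year, month or day of a possibly missing date."""
    return None if d is None else getattr(d, attr)


def pair_features(hosp: Dict, cert: Dict) -> Dict:
    """Feature vector for one (hospital record, death certificate) pair."""
    # Hypothetical choice: compare with the first listed DMF first name.
    dmf_first = cert["first_names"].split()[0]
    hb, cb = hosp["birth_date"], cert["birth_date"]
    return {
        "last_name_lv": levenshtein(hosp["last_name"], cert["last_name"]),
        "last_name_jw": jaro_winkler(hosp["last_name"], cert["last_name"]),
        "first_name_jw": jaro_winkler(hosp["first_name"], dmf_first),
        "first_name_jaccard": jaccard_qgram_distance(hosp["first_name"], dmf_first),
        "birth_date_equal": _eq(hb, cb),
        "birth_year_equal": _eq(_part(hb, "year"), _part(cb, "year")),
        "birth_month_equal": _eq(_part(hb, "month"), _part(cb, "month")),
        "birth_day_equal": _eq(_part(hb, "day"), _part(cb, "day")),
        "gender_equal": _eq(hosp["gender"], cert["gender"]),
        "birth_location_equal": _eq(hosp["birth_location"], cert["birth_location"]),
        # Time difference in days between the last hospital visit and death.
        "days_visit_to_death": (cert["death_date"] - hosp["last_visit"]).days,
    }
```

Jaro-Winkler is returned here as a similarity, while Levenshtein and Jaccard are distances. A model only needs consistent inputs, but mixing conventions is worth keeping in mind when inspecting features. Keeping missing comparisons as `None`, rather than `False`, matters for attributes such as hospital birth location, which is missing in 39% of records.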
Figure 1. Overview of the record-linkage strategy. In the first step, a query is sent to Elasticsearch for each hospital record to retrieve a limited number (N) of candidate death certificates. In the second step, ML models predict the probability that the hospital record and each candidate certificate match. In the third step, each pair is classified as a nonlink, undetermined, or link according to the lower and upper probability thresholds.
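The three steps in Figure 1 can be outlined as a small Python sketch. Several assumptions are involved. `retrieve_candidates` is an in-memory stand-in for the Elasticsearch query and does not use or imitate the Elasticsearch API: it simply keeps the N certificates that share the most fields with the hospital record. `predict_proba` is any callable standing in for the trained ML model. The threshold values and the boundary convention (at or above the upper threshold is a link, at or below the lower threshold is a nonlink) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
BLOCKING_FIELDS = ("last_name", "first_name", "birth_date")  # hypothetical keys


def retrieve_candidates(hosp: Record, certs: List[Record], n: int) -> List[Record]:
    """Step 1 stand-in for the search-engine query: keep at most n certificates
    sharing at least one blocking field, best overlap first."""
    scored = [(sum(hosp[k] == c[k] for k in BLOCKING_FIELDS), i)
              for i, c in enumerate(certs)]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # ties keep input order
    return [certs[i] for _, i in scored[:n]]


def classify(prob: float, lower: float, upper: float) -> str:
    """Step 3: map a match probability to a decision using two thresholds."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("need 0 <= lower <= upper <= 1")
    if prob >= upper:
        return "link"
    if prob <= lower:
        return "nonlink"
    return "undetermined"


def link_record(hosp: Record, certs: List[Record], n: int,
                predict_proba: Callable[[Record, Record], float],
                lower: float, upper: float) -> List[Tuple[object, float, str]]:
    """Run the three steps for one hospital record."""
    results = []
    for cert in retrieve_candidates(hosp, certs, n):
        p = predict_proba(hosp, cert)  # Step 2: model score for the pair
        results.append((cert["id"], p, classify(p, lower, upper)))
    return results
```

Retrieving only N candidates per hospital record avoids scoring every record against all death certificates in the dataset. Pairs that fall between the two thresholds are left as "undetermined" rather than forced into a link or nonlink decision.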