Gisele Pinto de Oliveira, Ana Luiza de Souza Bierrenbach, Kenneth Rochel de Camargo, Cláudia Medina Coeli, Rejane Sobrino Pinheiro.
Abstract
OBJECTIVE: To analyze the accuracy of deterministic and probabilistic record linkage in identifying duplicate tuberculosis (TB) records, as well as the characteristics of discordant pairs.
Year: 2016 PMID: 27556963 PMCID: PMC4988803 DOI: 10.1590/S1518-8787.2016050006327
Source DB: PubMed Journal: Rev Saude Publica ISSN: 0034-8910 Impact factor: 2.106
Examples of rules used in the sequential algorithm.
| Rule | Patient’s name | Mother’s name | Date of birth | Address | Constraints |
|---|---|---|---|---|---|
| 1 | Exact | Exact | Exact | - | No missing values; patient’s full name at least 15 characters long |
| 2 | Same Soundex (of full name) | Exact | Exact | - | No missing values; Soundex composed of 3 or more parts; no newborns or twins |
| 3 | Same Soundex (of full name) | Same Soundex (of full name) | Same day and year | - | No missing values; only uncommon names considered |
| 4 | Same four initial characters for first and middle names + exact Soundex for surname | Same Soundex (first name and surname) | Exact | Exact | No missing values; no newborns or twins |
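The first two rules in the table can be sketched in Python as follows. This is an illustration only, not the authors' implementation: the record fields (`name`, `mother`, `dob`) are hypothetical, the standard American Soundex stands in for whichever Soundex variant the study used, and the "no newborns or twins" constraint of rule 2 is omitted because the table does not say how it was checked.

```python
# Sketch of rules 1 and 2 from the table above, applied in sequence.
# Assumptions: records are dicts with hypothetical keys "name", "mother",
# "dob"; standard American Soundex is used; the newborn/twin constraint
# of rule 2 is NOT implemented.

_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Standard four-character Soundex code of a single word."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    encoded = []
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        prev = code  # vowels reset prev, so repeated codes are kept
    return (word[0] + "".join(encoded) + "000")[:4]


def full_name_soundex(name):
    """One Soundex code per word of the full name."""
    return tuple(soundex(part) for part in name.split())


def _complete(a, b, fields=("name", "mother", "dob")):
    """True when neither record has a missing value in the fields."""
    return all(a.get(f) and b.get(f) for f in fields)


def rule_1(a, b):
    """Exact name, mother's name and date of birth; name >= 15 chars."""
    return (_complete(a, b)
            and len(a["name"]) >= 15
            and a["name"] == b["name"]
            and a["mother"] == b["mother"]
            and a["dob"] == b["dob"])


def rule_2(a, b):
    """Same Soundex of full name (3+ parts); exact mother and birth date."""
    if not _complete(a, b):
        return False
    sa, sb = full_name_soundex(a["name"]), full_name_soundex(b["name"])
    return (len(sa) >= 3 and sa == sb
            and a["mother"] == b["mother"]
            and a["dob"] == b["dob"])


def first_matching_rule(a, b):
    """Return the number of the first rule that links a and b, else None."""
    for number, rule in enumerate((rule_1, rule_2), start=1):
        if rule(a, b):
            return number
    return None
```

Because the rules are tried in order, a pair that satisfies the strict rule 1 is never passed on to the looser rule 2; a pair that differs only by a spelling variant such as SOUZA/SOUSA falls through to rule 2.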
Number and percentage of single records or by groups by deterministic and probabilistic record linkage. Sinan-TB, 2009-2011.
| Records by group (n) | Deterministic record linkage: cases (n) | Deterministic record linkage: % | Probabilistic record linkage: cases (n) | Probabilistic record linkage: % |
|---|---|---|---|---|
| 1 | 34,506 | 78.7 | 35,266 | 80.5 |
| 2 | 6,944 | 15.8 | 6,546 | 14.9 |
| 3 | 1,502 | 3.4 | 1,280 | 2.9 |
| 4 | 548 | 1.3 | 468 | 1.1 |
| 5 | 215 | 0.5 | 160 | 0.4 |
| 6 | 66 | 0.2 | 60 | 0.1 |
| 7 | 35 | 0.1 | 35 | 0.1 |
| 9 | 9 | 0 | - | - |
| 10 | - | - | 10 | 0 |
| Total | 43,825 | 100 | 43,825 | 100 |
Correlation analysis of record linkage techniques.
| Probabilistic record linkage | Deterministic non-pair (n) | Deterministic non-pair (%) | Deterministic pair (n) | Deterministic pair (%) | Total |
|---|---|---|---|---|---|
| Non-pair | 33,980 | 96.4 | 1,285 | 3.6 | 35,265 |
| Pair | 527 | 6.2 | 8,033 | 93.8 | 8,560 |
| Total | 34,507 | 78.7 | 9,318 | 21.3 | 43,825 |
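The row percentages in this cross-tabulation can be recomputed from the four cell counts. The Python sketch below does that and also derives the overall share of records on which the two techniques agree; that overall figure is not reported in the table and is included only as an illustration. The dictionary layout and function names are assumptions made for this example.

```python
# Cell counts from the cross-tabulation above, keyed as
# (probabilistic classification, deterministic classification).
counts = {
    ("non-pair", "non-pair"): 33980,
    ("non-pair", "pair"): 1285,
    ("pair", "non-pair"): 527,
    ("pair", "pair"): 8033,
}


def row_percentages(table):
    """Percentage of each cell within its probabilistic-linkage row."""
    row_totals = {}
    for (prob, _det), n in table.items():
        row_totals[prob] = row_totals.get(prob, 0) + n
    return {
        key: round(100 * n / row_totals[key[0]], 1)
        for key, n in table.items()
    }


def overall_agreement(table):
    """Share of records given the same class by both techniques (derived)."""
    total = sum(table.values())
    agree = sum(n for (prob, det), n in table.items() if prob == det)
    return round(100 * agree / total, 1)


if __name__ == "__main__":
    for key, pct in row_percentages(counts).items():
        print(key, pct)
    print("overall agreement (%):", overall_agreement(counts))
```

The two off-diagonal cells (1,285 and 527) are the discordant records whose characteristics are described in the last table.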
Sensitivity and specificity analysis of record linkage techniques.
| Standard | Total | Deterministic: pair | Deterministic: non-pair | Probabilistic: pair | Probabilistic: non-pair |
|---|---|---|---|---|---|
| Pair | 9,741 | 9,283 | 458 | 8,491 | 1,250 |
| Non-pair | 34,084 | 35 | 34,049 | 69 | 34,015 |
| Total | 43,825 | 9,318 | 34,507 | 8,560 | 35,265 |
| Sensitivity, % (95%CI) | | 95.3 (94.8–95.7) | | 87.2 (86.5–87.8) | |
| Specificity, % (95%CI) | | 99.9 (99.8–99.9) | | 99.8 (99.7–99.8) | |
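The point estimates in this table follow directly from the counts: sensitivity is the share of standard pairs that a technique also classifies as pairs, and specificity is the share of standard non-pairs that it classifies as non-pairs. A minimal Python sketch, with function and variable names chosen for this example, reproduces them. The confidence intervals are not recomputed because the source does not state which interval method was used.

```python
def sensitivity(true_pos, false_neg):
    """True pairs detected / all pairs in the standard, as a percentage."""
    return round(100 * true_pos / (true_pos + false_neg), 1)


def specificity(true_neg, false_pos):
    """True non-pairs detected / all non-pairs in the standard, as a percentage."""
    return round(100 * true_neg / (true_neg + false_pos), 1)


# Counts taken from the table above (standard in rows, technique in columns).
deterministic = {"tp": 9283, "fn": 458, "fp": 35, "tn": 34049}
probabilistic = {"tp": 8491, "fn": 1250, "fp": 69, "tn": 34015}

if __name__ == "__main__":
    for label, c in (("deterministic", deterministic),
                     ("probabilistic", probabilistic)):
        print(label,
              "sensitivity:", sensitivity(c["tp"], c["fn"]),
              "specificity:", specificity(c["tn"], c["fp"]))
```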
Characteristics of records with discordant classification by deterministic and probabilistic record linkage.
| Record characteristics | Pair by deterministic and non-pair by probabilistic linkage (N = 1,285) | | Pair by probabilistic and non-pair by deterministic linkage (N = 527) | |
|---|---|---|---|---|
| | Median | 95%CI | Median | 95%CI |
| Score | - | - | 24.2 | 20.3–32.3 |
| | n | % | n | % |
| Missing sex value | 0 | 0 | 0 | 0 |
| Missing name value | 0 | 0 | 0 | 0 |
| Missing mother’s name value | 68 | 5.3 | 76 | 14.4 |
| Missing date of birth value | 81 | 6.3 | 58 | 11.0 |
| Missing address value | 7 | 0.5 | 20 | 3.8 |
| Combined: unknown mother’s name; or unknown date of birth; or unknown address | 129 | 10.0 | 141 | 26.7 |
| Link characteristics | (N = 733) | | (N = 293) | |
| | n | % | n | % |
| Difference in sex | 115 | 15.7 | 0 | 0 |
| Similarity measure for name lower than 70.0% | 160 | 25.9 | 0 | 0 |
| Similarity measure for mother’s name lower than 70.0% | 147 | 26.6 | 36 | 16.6 |
| Similarity measure for date of birth lower than 70.0% | 45 | 8.3 | 25 | 10.6 |
| | Median | 95%CI | Median | 95%CI |
| Similarity measure for name x 100 | 81.6 | 69.4–94.4 | 100 | 95.0–100 |
| Similarity measure for mother’s name x 100 | 91.3 | 68.2–100 | 94.4 | 85.7–100 |
| Similarity measure for date of birth | 100 | 10.0–100 | 87.5 | 75.0–100 |
a Calculating the Levenshtein distance and assessing the difference in sex between records required comparing records within the group of records assigned to the same patient. Discordant records were compared as follows: (i) when the two discordant records were identified by only one of the techniques, the calculation was done between them; (ii) when the records were identified by both techniques and only one of them was missed by one of the techniques, the calculation compared the unidentified record with the highest-scoring record in the group.
b For this calculation, groups of records that were blocked by sex or had missing information for one of the variables were excluded.
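Footnote a refers to the Levenshtein distance behind the similarity measures in the table. One common way to turn that distance into a 0 to 1 similarity is to divide it by the length of the longer string and subtract the result from 1. The Python sketch below uses that normalization as an assumption; the source does not state the exact formula, and the function names and the use of 0.70 as a cut-off argument are illustrative.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


def below_threshold(a, b, threshold=0.70):
    """True when the similarity falls under the given cut-off."""
    return similarity(a, b) < threshold


if __name__ == "__main__":
    print(round(100 * similarity("MARIA SOUZA", "MARIA SOUSA"), 1))
    print(below_threshold("MARIA SOUZA", "JOANA LIMA"))
```

Multiplying the similarity by 100 puts it on the same scale as the "x 100" rows of the table.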