| Literature DB >> 27580049 |
Giuseppe Roberto¹, Ingrid Leal², Naveed Sattar³, A Katrina Loomis⁴, Paul Avillach²,⁵, Peter Egger⁶, Rients van Wijngaarden⁷, David Ansell⁸, Sulev Reisberg⁹, Mari-Liis Tammesoo¹⁰,¹¹, Helene Alavere¹⁰,¹¹, Alessandro Pasqua¹², Lars Pedersen¹³, James Cunningham¹⁴, Lara Tramontan¹⁵, Miguel A Mayer¹⁶, Ron Herings⁷, Preciosa Coloma², Francesco Lapi¹, Miriam Sturkenboom², Johan van der Lei², Martijn J Schuemie¹⁷,¹⁸, Peter Rijnbeek², Rosa Gini¹.
Abstract
Due to the heterogeneity of existing European sources of observational healthcare data, data source-tailored choices are needed to execute multi-data source, multi-national epidemiological studies. This makes transparent documentation paramount. In this proof-of-concept study, a novel standard data derivation procedure was tested in a set of heterogeneous data sources. Identification of subjects with type 2 diabetes (T2DM) was the test case. We included three primary care data sources (PCDs), three data sources based on record linkage of administrative and/or registry data (RLDs), one hospital data source, and one biobank. Overall, data from 12 million subjects from six European countries were extracted. Based on a shared event definition, sixteen standard algorithms (components) for identifying T2DM cases were generated through an iterative top-down/bottom-up approach. Each component was based on a single data domain: diagnoses, drugs, diagnostic test utilization, or laboratory results. Diagnoses-based components were subclassified by healthcare setting (primary, secondary, or inpatient care). The Unified Medical Language System was used for semantic harmonization within data domains. Individual components were extracted, and the proportion of the population identified was compared across data sources. Drug-based components performed similarly in RLDs and PCDs, unlike diagnoses-based components. Using components as building blocks, logical combinations with AND, OR, and AND NOT were tested, and local experts recommended their preferred data source-tailored combination. The proportion of the population identified by the resulting algorithms varied across data sources from 3.5% to 15.7%; however, age-specific results were fairly comparable. The impact of individual components was assessed: diagnoses-based components identified the majority of cases in PCDs (93-100%), while drug-based components were the main contributors in RLDs (81-100%).
The proposed data derivation procedure allowed the generation of data source-tailored case-finding algorithms in a standardized fashion, facilitated transparent documentation of the process and benchmarking of data sources, and provided a basis for interpreting possible inconsistencies of findings across data sources in future studies.
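The abstract describes composite algorithms built from component algorithms with the logical operators AND, OR, and AND NOT. A minimal sketch, assuming each component has already been extracted as a set of patient identifiers: the component names follow the component algorithm table, but the patient IDs and the particular combination shown are hypothetical and are not one of the data source-tailored algorithms recommended in the study. Index dates are ignored here.

```python
# Hypothetical outputs of three component algorithms: sets of patient IDs.
DIAG_T2DM_PC = {1, 2, 3, 4}
DRUG_ORAL = {3, 4, 5, 6}
DIAG_T1DM = {4, 7}


def combine_or(*components: set) -> set:
    """Patients identified by at least one component (logical OR)."""
    return set().union(*components)


def combine_and(*components: set) -> set:
    """Patients identified by every component (logical AND)."""
    return set.intersection(*components) if components else set()


def combine_and_not(included: set, excluded: set) -> set:
    """Patients in `included` who are not in `excluded` (logical AND NOT)."""
    return included - excluded


# Illustrative composite: (diagnosis OR drug) AND NOT type 1 diabetes diagnosis.
composite = combine_and_not(combine_or(DIAG_T2DM_PC, DRUG_ORAL), DIAG_T1DM)
print(sorted(composite))  # [1, 2, 3, 5, 6]
```

Representing components as sets makes each operator a single set operation, so alternative combinations can be compared cheaply once the components have been extracted.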
Year: 2016 PMID: 27580049 PMCID: PMC5006970 DOI: 10.1371/journal.pone.0160648
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Data sources’ characteristics*.
| Data source (original organization acronym) | Type of data source | Catchment area | Cumulative number of participants in the database | Average follow-up time | Diagnoses | Medication (coding system) | Diagnostic procedures/tests (coding system) | Laboratory results (coding system for measurements) |
|---|---|---|---|---|---|---|---|---|
| | Record linkage system | Tuscany (Italy) | 5 million | 9 years | • Inpatient | ATC | ICD9CM or local terminology | - |
| | Record linkage system | Northern and central Jutland (Denmark) | 2.3 million | 13 years | • Inpatient, secondary care | ATC | NOMESCO | - |
| | Record linkage system | Netherlands (certain regions, mainly south-east and north-west) | 10 million | 10 years | • Inpatient | ATC | Local terminology | Local terminology |
| | Primary care | Italy | 2.3 million | 10 years | • Primary care | ATC | Local terminology | Local terminology |
| | Primary care | United Kingdom | 12 million | 9 years | • Primary care, READ | ATC | Local terminology | Local terminology |
| | Primary care | Netherlands | 2.8 million | 3 years | • Primary care | ATC | Local terminology | Local terminology |
| | Hospital | Barcelona, three city districts (Spain) | 1.5 million | 5 years | • Admissions, outpatients, major ambulatory surgery and emergency room | Local terminology & Spanish Medicines Agency codes | ICD9CM | Local terminology |
| | Biobank | Estonia | 52,000 | Not applicable | • Primary care/self-reported | ATC | Local terminology | Local terminology |
*Information reported in the table is as of January 2013.
Fig 1. The standard procedure for data derivation.
Component algorithms description.
| Component algorithm acronym | Algorithm description | Record retrieval rules | Case’s index date |
|---|---|---|---|
| DIAG_T2DM_PC | Patients who have ≥1 diagnoses of T2DM recorded in a primary care setting | Records of (Diabetes type 2) occurs in [diagnosis fields] of [tables collected during primary care] | 1st record |
| DIAG_T2DM_SC | Patients who have ≥1 diagnoses of T2DM recorded in a secondary care setting | Records of (Diabetes type 2) occurs in [diagnosis fields] of [tables collected during secondary care] | 1st record |
| DIAG_T2DM_INP | Patients who have ≥1 diagnoses of T2DM recorded during a hospital admission | Records of (Diabetes type 2) occurs in [diagnosis fields] of [tables collected during inpatient care] | 1st record |
| DIAG_DMUNSPEC | Patients who have ≥1 diagnoses of unspecified diabetes recorded in primary, secondary, or inpatient care | Records of (Diabetes unspecified) occurs in [diagnosis fields] of [tables collected in primary, secondary, or inpatient care] | 1st record |
| DIAG_DMUNSPEC_OTH | Patients who have ≥1 diagnoses of unspecified diabetes recorded in a setting other than primary, secondary, or inpatient care | Records of (Unspecified diabetes) occurs in [diagnosis fields] of [tables collected in other settings] | 1st record |
| DIAG_T1DM | Patients who have ≥1 diagnoses of T1DM recorded in any care setting | Records of (Diabetes mellitus type I) occurs in [diagnosis fields] of [any table collecting diagnoses] | 1st record |
| DIAG_EXCL | Patients who have ≥1 diagnoses, recorded in any care setting, of conditions other than T1DM that exclude T2DM | Records of ((Metabolic problems around pregnancy) OR (Metabolic/pancreatic problems, non type 2 diabetes) OR (Polycystic Ovary Syndrome)) occurs in [diagnosis fields] of [any table collecting diagnoses] | 1st record |
| DRUG_INSULIN_ONE | Patients who have ≥1 recorded prescriptions/dispensings of insulin | Records of (Insulins and analogues) occurs in [ATC field] of [drugs tables] | 1st record |
| DRUG_INSULIN | Patients who have ≥2 recorded prescriptions/dispensings of insulin in a calendar year | Records of (Insulins and analogues) occurs in [ATC field] of [drugs tables] | 2nd record |
| DRUG_ORAL_ONE | Patients who have ≥1 recorded prescriptions/dispensings of non-insulin antidiabetic drugs | Records of (Drugs used in diabetes, excl insulin) occurs in [ATC field] of [drugs tables] | 1st record |
| DRUG_ORAL | Patients who have ≥2 recorded prescriptions/dispensings of non-insulin antidiabetic drugs in a calendar year | Records of (Drugs used in diabetes, excl insulin) occurs in [ATC field] of [drugs tables] | 2nd record |
| TEST_GLUCO5_1YR | Patients who have ≥5 records of utilization of blood glucose measurements within 1 year | Records of (Blood glucose measurement) occurs in [code of test field] of [tables collecting laboratory test results or dispensings] | 5th record |
| TEST_GLUCO2_PYEAR_5YRS | Patients who have ≥2 records of utilization of blood glucose measurements per year for 5 consecutive years | Records of (Blood glucose measurement) occurs in [code of test field] of [tables collecting laboratory test results or dispensings] | 2nd record |
| LABVAL_HbA1c | Patients who have ≥2 laboratory results recorded from a glycated hemoglobin test higher than 6.5% (48 mmol/mol) | Records of (Glycated Haemoglobin) occurs in [code of test field] of [tables collecting laboratory test results] AND [result field] of the same record is higher than 6.5% (or 48 mmol/mol, according to unit of measurement adopted in the table) | 2nd record |
| LABVAL_FAST_GLUC | Patients who have ≥2 laboratory results recorded from a fasting plasma glucose measurement higher than 126 mg/dl | Records of (Fast gluc) occurs in [code of test field] of [tables collecting laboratory test results] AND [result field] of the same record is higher than 126 mg/dl | 2nd record |
| LABVAL_LCURVE_GLUC | Patients who have ≥2 laboratory results recorded from a glucose tolerance test higher than 200 mg/dl | Records of (LcurveGLuc) occurs in [code of test field] of [tables collecting laboratory test results] AND [result field] of the same record is higher than 200 mg/dl | 2nd record |
*Codes and free text keywords corresponding to the medical concepts embedded in component algorithms (in brackets) are reported in S1 Table.
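The record retrieval rules above combine three ingredients: mapping local codes to a shared concept, counting qualifying records (optionally within one calendar year), and taking the date of the n-th record as the index date. The sketch below illustrates this for the drug-based components under stated assumptions. The ATC-prefix map is a hypothetical, simplified stand-in for S1 Table (A10A is the ATC group "Insulins and analogues"; mapping only A10B to the non-insulin concept is a simplification). `component_cases` and `concept_of` are illustrative names, and choosing the earliest qualifying year's n-th record as the index date is an assumption the table does not state.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical, simplified stand-in for the code lists in S1 Table:
# ATC code prefixes mapped to the concepts named in the component table.
CONCEPT_BY_ATC_PREFIX = {
    "A10A": "Insulins and analogues",
    "A10B": "Drugs used in diabetes, excl insulin",
}

Record = Tuple[int, date, str]  # (patient_id, record_date, atc_code)


def concept_of(atc_code: str) -> Optional[str]:
    """Map a local ATC code to a shared concept, or None if unmapped."""
    for prefix, concept in CONCEPT_BY_ATC_PREFIX.items():
        if atc_code.startswith(prefix):
            return concept
    return None


def component_cases(
    records: Iterable[Record],
    concept: str,
    min_count: int,
    same_calendar_year: bool,
) -> Dict[int, date]:
    """Return {patient_id: index_date} for patients meeting a component rule.

    A patient qualifies with >= min_count records of `concept` (all within one
    calendar year if same_calendar_year is True). The index date is the date
    of the min_count-th record; if several calendar years qualify, the
    earliest such date is used (an assumption, not stated in the table).
    """
    dates_by_patient: Dict[int, List[date]] = defaultdict(list)
    for patient_id, record_date, atc_code in records:
        if concept_of(atc_code) == concept:
            dates_by_patient[patient_id].append(record_date)

    cases: Dict[int, date] = {}
    for patient_id, dates in dates_by_patient.items():
        dates.sort()
        if same_calendar_year:
            by_year: Dict[int, List[date]] = defaultdict(list)
            for d in dates:
                by_year[d.year].append(d)
            groups = list(by_year.values())
        else:
            groups = [dates]
        # The min_count-th record (1-based) of each qualifying group.
        candidates = [g[min_count - 1] for g in groups if len(g) >= min_count]
        if candidates:
            cases[patient_id] = min(candidates)
    return cases


# Hypothetical dispensing records.
records = [
    (1, date(2010, 3, 1), "A10AB01"),   # insulin (human)
    (1, date(2010, 9, 15), "A10AE04"),  # insulin glargine
    (2, date(2010, 12, 20), "A10AB01"),
    (2, date(2011, 1, 5), "A10AB01"),   # second record falls in the next year
    (3, date(2011, 6, 1), "A10BA02"),   # metformin
]

# DRUG_INSULIN: >=2 insulin records in a calendar year, index date = 2nd record.
drug_insulin = component_cases(records, "Insulins and analogues", 2, True)
# DRUG_INSULIN_ONE: >=1 insulin record, index date = 1st record.
drug_insulin_one = component_cases(records, "Insulins and analogues", 1, False)
print(drug_insulin)      # {1: datetime.date(2010, 9, 15)}
print(drug_insulin_one)  # {1: datetime.date(2010, 3, 1), 2: datetime.date(2010, 12, 20)}
```

Patient 2 illustrates why the calendar-year constraint matters: two insulin records exist, but they fall in different years, so the patient meets DRUG_INSULIN_ONE but not DRUG_INSULIN.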
Fig 2. Comparison of results from individual component algorithms: four examples.
Fig 3. Recommended composite algorithms: age-band-specific percentages of subjects identified in the relevant total study population.
PPV: Positive Predictive Value.
Impact of extracted component algorithms on total case population identified in each participating data source through the application of the relevant recommended composite algorithm.
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| N | 3391177 | 1372883 | 1405220 | 3278013 | 992924 | 945691 | 22430 | 15713 |
| N in A | 254045 | 77616 | 57712 | 253197 | 67096 | 81658 | 779 | 2466 | |
| % of A in N | 7.5 | 5.7 | 4.1 | 7.7 | 6.8 | 8.6 | 3.5 | 15.7 | |
| N in B | n.e. | n.e. | n.e. | 253197 | 62191 | 43438 | 779 | n.e. | |
| % of B in A | - | - | - | 100.0% | 92.7% | 52.6% | 100.0% | - | |
| PRR if B added | - | - | - | +0.0% | +0.0% | +0.6% | +0.0% | - | |
| N in B | 95303 | 27887 | 13098 | n.e. | n.e. | n.e. | n.e. | 2520 | |
| % of B in A | 37.5% | 35.9% | 15.1% | - | - | - | - | 100.0% | |
| PRR if B added | +0.0% | +0.0% | +7.6% | - | - | - | - | +2.2% | |
| N in B | n.e. | 35744 | n.e. | n.e. | n.e. | n.e. | n.e. | n.e. | |
| % of B in A | - | 46.1% | - | - | - | - | - | - | |
| PRR if B added | - | +0.0% | - | - | - | - | - | - | |
| N in B | 191999 | n.e. | n.e. | n.e. | n.e. | 79035 | n.e. | n.e. | |
| % of B in A | 73.2% | - | - | - | - | 94.3% | - | - | |
| PRR if B added | +2.4% | - | - | - | - | +2.5% | - | - | |
| N in B | 149806 | n.e. | n.e. | n.e. | n.e. | n.e. | n.e. | n.e. | |
| % of B in A | 59.0% | - | - | - | - | - | - | - | |
| PRR if B added | +0.0% | - | - | - | - | - | - | - | |
| N in B | 18147 | 17896 | n.e. | n.e. | 8816 | 2050 | 164 | 78 | |
| % of B in A | 6.9% | 18.1% | - | - | 8.8% | 0.0% | 2.8% | 0.0% | |
| PRR if B added | +0.2% | +4.9% | - | - | +4.3% | +2.5% | +18.2% | +3.2% | |
| N in B | 13741 | 7895 | 2904 | n.e. | n.e. | 5782 | n.e. | 78 | |
| % of B in A | 1.1% | 1.8% | 1.5% | - | - | 0.3% | - | 1.7% | |
| PRR if B added | +4.3% | +8.3% | +3.5% | - | - | +6.8% | - | +1.5% | |
| N in B | 191999 | 43622 | 13098 | 253197 | 62191 | 79035 | 779 | 2520 | |
| % of B in A | 73.2% | 56.2% | 15.1% | 100.0% | 92.7% | 94.3% | 100.0% | 100.0% | |
| PRR if B added | +2.4% | +0.0% | +7.6% | +0.0% | +0.0% | +2.5% | +0.0% | +2.2% | |
| N in B | 45522 | 22074 | 21192 | 41019 | 15020 | 11607 | n.e. | n.e. | |
| % of B in A | 17.9% | 25.4% | 25.8% | 16.1% | 19.0% | 12.3% | - | - | |
| PRR if B added | +0.0% | +3.0% | +10.9% | +0.1% | +3.4% | +2.0% | - | - | |
| N in B | 62341 | 23319 | 0 | 0 | 17719 | 0 | 18 | 0 | |
| % of B in A | 21.2% | 26.5% | - | - | 22.0% | - | 1.5% | - | |
| PRR if B added | +3.4% | +3.6% | - | - | +4.4% | - | +0.8% | - | |
| N in B | 216338 | 57153 | 57712 | 136370 | 51589 | 45624 | - | 0 | |
| % of B in A | 85.2% | 71.0% | 100.0% | 51.7% | 76.9% | 53.0% | - | - | |
| PRR if B added | +0.0% | +2.7% | +0.0% | +2.1% | +0.0% | +2.9% | - | - | |
| N in B | 273952 | 61604 | 0 | 0 | 54181 | 62110 | 45 | 0 | |
| % of B in A | 87.5% | 72.7% | - | - | 80.8% | 70.6% | 5.8% | - | |
| PRR if B added | +20.3% | +6.7% | - | - | +0.0% | +5.4% | +0.0% | - | |
| N in B | 295676 | 70405 | 64016 | 151576 | 58355 | 65076 | 40 | 0 | |
| % of B in A | 93.0% | 81.1% | 100.0% | 57.7% | 82.6% | 73.1% | 50.0% | - | |
| PRR if B added | +23.4% | +9.6% | +10.9% | +2.2% | +4.4% | +6.6% | +4.1% | - | |
| N in B | 266940 | 16999 | 0 | 0 | 0 | 0 | 0 | 0 | |
| % of B in A | 45.8% | 21.6% | - | - | - | - | - | - | |
| PRR if B added | +59.3% | +0.3% | - | - | - | - | - | - | |
| N in B | 172784 | 28583 | 0 | 0 | 0 | 0 | 0 | 0 | |
| % of B in A | 32.6% | 36.1% | - | - | - | - | - | - | |
| PRR if B added | +35.4% | +0.7% | - | - | - | - | - | - | |
| N in B | 335466 | 34801 | 0 | 0 | 0 | 0 | 0 | 0 | |
| % of B in A | 52.8% | 44.1% | - | - | - | - | - | - | |
| PRR if B added | +79.2% | +0.8% | - | - | - | - | - | - | |
| N in B | 0 | 0 | 0 | 0 | 0 | 32153 | 0 | 0 | |
| % of B in A | - | - | - | - | - | 38.6% | - | - | |
| PRR if B added | - | - | - | - | - | +0.8% | - | - | |
| N in B | 0 | 0 | 62400 | 0 | 44271 | 20196 | 0 | 0 | |
| % of B in A | - | - | 65.1% | - | 63.6% | 24.1% | - | - | |
| PRR if B added | - | - | +43.0% | - | +2.4% | +0.7% | - | - | |
| N in B | 0 | 0 | 0 | 0 | 0 | 32 | 0 | 0 | |
| % of B in A | - | - | - | - | - | 0.0% | - | - | |
| PRR if B added | - | - | - | - | - | +0.0% | - | - | |
| N in B | 0 | 0 | 62400 | 0 | 44271 | 38764 | 0 | 0 | |
| % of B in A | - | - | 65.1% | - | 63.6% | 46.5% | - | - | |
| PRR if B added | - | - | +43.0% | - | +2.4% | +1.0% | - | - | |
Since patients can be identified by more than one component algorithm, percentages may overlap.
Grey cells correspond to component algorithms that were included in the relevant recommended composite algorithm.
NIAD: Non-Insulin Antidiabetic Drugs.
A = recommended composite algorithm.
B = tested component algorithm(s).
N = Study population.
PRR = prevalence rate ratio of “A or B” in N with respect to the percentage of A in N.
n.e. = not extracted
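The footnotes define PRR as the prevalence rate ratio of "A or B" in N with respect to the percentage of A in N. Read together with the counts, the "+x%" entries appear to be (PRR - 1) expressed as a percentage, and "% of B in A" appears to be the share of patients identified by A who are also identified by B. This is an inference from the table, not a definition given in the source. A minimal sketch under that assumption, in which `benchmark_counts` and `benchmark_sets` are illustrative names:

```python
from typing import Set, Tuple


def benchmark_counts(n_a: int, n_b: int, n_ab: int) -> Tuple[float, float]:
    """Return ('% of B in A', 'PRR if B added' as a % increase).

    n_a: patients identified by the recommended composite algorithm A
    n_b: patients identified by the tested component algorithm(s) B
    n_ab: patients identified by both A and B
    """
    if n_a <= 0:
        raise ValueError("A must identify at least one patient")
    pct_b_in_a = 100 * n_ab / n_a
    # |A or B| = |A| + |B| - |A and B|; the study population N cancels out
    # of the prevalence rate ratio (|A or B| / N) / (|A| / N).
    n_a_or_b = n_a + n_b - n_ab
    prr_increase = 100 * (n_a_or_b / n_a - 1)
    return pct_b_in_a, prr_increase


def benchmark_sets(a: Set[int], b: Set[int]) -> Tuple[float, float]:
    """Same measures computed directly from sets of patient IDs."""
    return benchmark_counts(len(a), len(b), len(a & b))


# n_a and n_b are taken from one column of the table; n_ab is back-calculated
# from the reported 52.6% and is therefore approximate.
pct, prr = benchmark_counts(n_a=81658, n_b=43438, n_ab=42952)
print(f"% of B in A: {pct:.1f}%, PRR if B added: +{prr:.1f}%")
# % of B in A: 52.6%, PRR if B added: +0.6%
```

Under this reading, a component can overlap heavily with A (high "% of B in A") and still add few new cases (PRR close to +0.0%), which is how the table separates redundant components from ones that would enlarge the case population.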