
Validation Of Cancer Diagnoses In Electronic Health Records: Results From The Information System For Research In Primary Care (SIDIAP) In Northeast Spain.

Martina Recalde1,2, Cyntia B Manzano-Salgado1,2, Yesika Díaz1, Diana Puente1,2, Maria Del Mar Garcia-Gil1, Rafael Marcos-Gragera3,4, Josefa Ribes-Puig5,6, Jaume Galceran7, Margarita Posso8, Francesc Macià8, Talita Duarte-Salles1.   

Abstract

BACKGROUND: Electronic health records are becoming an increasingly valuable resource for epidemiology but their data quality needs to be quantified. We aimed to validate twenty-five types of incident cancer cases in the Information System for Research in Primary Care (SIDIAP) in Catalonia with the population-based cancer registries of Girona and Tarragona as the gold-standard.
METHODS: We calculated the sensitivity, positive predictive values (PPV), and the time-difference between the date of diagnosis entered into the SIDIAP and into the registries. We added hospital discharge cancer diagnoses to the SIDIAP to assess sensitivity changes.
RESULTS: We identified 27,046 incident cancer diagnoses in the SIDIAP from 2009-2015 among the 949,841 residents of Girona and Tarragona. The cancer types with the highest sensitivity were breast (89%, 95% CI: 88-90%), colorectal (81%, 95% CI: 80-82%), and prostate (81%, 95% CI: 80-83%). Trachea, bronchus and lung cancers had the highest PPV (76%, 95% CI: 74-78%) followed by stomach (72%, 95% CI: 68-75%) and pancreas (71%, 95% CI: 67-75%). Most cancer diagnoses were reported with less than three months of difference between the SIDIAP and the registries. More cases were registered first in the registries than in the SIDIAP. By adding cancer diagnoses based on hospital discharge data, sensitivity increased for all cancers, especially for gallbladder and biliary tract, for which the sensitivity increased by 21 percentage points (from 29% to 50%).
CONCLUSION: The SIDIAP includes 76% of the cancer diagnoses in the cancer registries but includes a considerable number of cases that are not in the registries. The SIDIAP reports most of the cancer diagnoses within a three-month period difference from the date of diagnosis in the cancer registries. Our results support the use of the SIDIAP cancer diagnoses for epidemiological research when cancer is the outcome of interest. We recommend adding hospital discharge data to the SIDIAP to increase data quality, particularly for less frequent cancer types.
© 2019 Recalde et al.


Keywords:  cancer; electronic health records; population-based cancer registries; primary health care; validation studies

Year:  2019        PMID: 31819655      PMCID: PMC6899079          DOI: 10.2147/CLEP.S225568

Source DB:  PubMed          Journal:  Clin Epidemiol        ISSN: 1179-1349            Impact factor:   4.790


Introduction

Cancer is one of the leading causes of morbidity and mortality worldwide.1 In 2018, there were 18 million new cases and 9 million deaths.2 In Spain, cancer is a significant burden for the National Health System: cancer is the second most frequent overall cause of death and results in more than 250,000 new invasive cancer cases every year.3 Therefore, conducting research focused on understanding cancer epidemiology is important both at the national and international levels. The use of databases of routinely collected electronic health records (EHRs) is becoming more common in epidemiology and clinical research. Due to their size, amount of data availability, representativeness, and long-term follow-up, EHR databases offer a great opportunity to conduct cancer research.4 Another advantage of large health record databases is that they provide sufficient statistical power to detect uncommon outcomes such as rare cancer types.5 However, validation processes are required to quantify the correctness of the data and to increase the reliability of large health record databases for use in subsequent observational studies.6 The information recorded in EHRs by primary health care professionals in Catalonia – a region in Northeast Spain with 7.5 million inhabitants (2017) – comprises the Information System for Research in Primary Care (SIDIAP) platform.7 Since the SIDIAP aims to provide reliable information to support research in primary health care, validation studies are performed regularly.8 A previous study assessed the validity of lung, colon and rectum, prostate, breast, and cervix uteri cancers in the SIDIAP during the period 2009–2012 with sensitivities ranging from 64% (cervix uteri) to 92% (breast).9 However, this study compared SIDIAP cancer cases with those from the registry of a single hospital in Barcelona. 
Although the data collection for this hospital is rigorous for a specific area in Barcelona, this area is not representative of the general population of Catalonia. Furthermore, the hospital does not have data available for research use on hematological cancers. A study validating more cancer types and using population-based cancer registries as the gold-standard may increase the scope of the validity of cancer diagnosis in the SIDIAP as well as its use in new areas of research. The aim of this study was to validate twenty-five types of incident cancer cases in the SIDIAP using the population-based cancer registries of Girona and Tarragona as the gold-standard and to assess the time-difference in the date of diagnosis between the SIDIAP and these cancer registries.

Methods

Data Sources

We performed a cross-sectional study in the SIDIAP during the years 2009–2015, using data from the two population-based cancer registries that exist in Catalonia, the Girona and Tarragona cancer registries, as the gold-standard. The SIDIAP includes information recorded in EHRs by health professionals during routine visits at 287 primary health care centers from the Institut Català de la Salut (ICS, Catalan Health Institute).10,11 The SIDIAP has anonymized records for more than seven million people and is representative of the Catalan population in terms of age, sex, and geographic distribution.11 It includes information on disease diagnoses (International Classification for Diseases, 10th revision [ICD-10]), drug prescriptions and dispensations in the primary care setting, and clinically relevant parameters (eg, weight, blood pressure, laboratory tests). It is also linked to a hospital discharge database for patients who attend ICS hospitals (30% of the SIDIAP population).12,13 The cancer registries of Girona (created in 1994) and Tarragona (in 1980) cover 20% of the Catalan population.14,15 They collect cancer diagnoses from public and private hospitals, anatomopathological and hematological laboratories, mortality registries, and other information sources.16–18 Both cancer registries comply with the International Agency for Research on Cancer quality requirements.19

Study Population And Cancer Case Definition

In the SIDIAP, incident cancer cases were identified as the first cancer diagnosis from 2009 to 2015 among inhabitants of the provinces of Girona and Tarragona. We had the number of incident cancer cases from the cancer registries during 2005–2015 for Girona and during 2005–2013 for Tarragona available for reference. Cases registered during 2005–2008 were used to clean prevalent cases (Figure 1). The linkage between the SIDIAP and the cancer registries data was performed by a Trusted Third Party (the ICS in this study) using the unique personal identification number of patients. We obtained approval from the Clinical Research Ethics Committee of the IDIAPJGol (project code: P14/074) and the Research Ethics Committee of the Hospital Doctor Josep Trueta (project code: 2017.024).
Figure 1

Time period covered by each data source with respect to the duration of the study.

Notes: Figure adapted from Margulis, A. et al. (2017). Validation of Cancer Cases Using Primary Care, Cancer Registry, and Hospitalization Data in the UK. Epidemiology, 29(2), 1.

Abbreviation: SIDIAP, Information System for Research in Primary Care.

We used ICD-10 codes and date of diagnosis to identify the following 25 cancer types in adults (aged ≥18 years): head and neck (ICD-10 codes: C00-C14), esophagus (C15), stomach (C16), colorectal (C18–21), liver (C22), gallbladder and biliary tract (C23-24), pancreas (C25), larynx (C32), trachea, bronchus, and lung (C33-34), bone and articular cartilage (C40-C41), malignant melanoma of skin (C43), breast (C50), cervix uteri (C53), corpus uteri (C54-C55), ovary (C56), prostate (C61), testis (C62), kidney (C64), bladder (C67), brain, central nervous system, pituitary gland and pineal gland (C70-72, C75.1-C75.3), thyroid (C73), Hodgkin lymphoma (C81), non-Hodgkin lymphoma (C82-C86, C96), multiple myeloma (C90), and leukemia (C91-95).20 We excluded other and unspecified malignant neoplasm of skin (C44). Other unspecified or very low-frequency cancers (n<100) were excluded. Diagnoses in hospital discharge data were registered using ICD-9 codes.21 We mapped diagnosis codes to ICD-10 using available conversion codes eCIEMaps v3.1.9, which we have provided in .
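The case definition above can be sketched as a small lookup that assigns an ICD-10 code to one of the 25 cancer types. This is only an illustration of the grouping listed in the text: the names `CANCER_GROUPS` and `cancer_type` are hypothetical, and the handling of formatting (upper-casing, optional dot) is an assumption rather than the authors' procedure.

```python
# Illustrative grouping of ICD-10 categories into the 25 cancer types listed above.
# Each entry is (label, first C-category, last C-category), both inclusive.
CANCER_GROUPS = [
    ("head and neck", 0, 14),
    ("esophagus", 15, 15),
    ("stomach", 16, 16),
    ("colorectal", 18, 21),
    ("liver", 22, 22),
    ("gallbladder and biliary tract", 23, 24),
    ("pancreas", 25, 25),
    ("larynx", 32, 32),
    ("trachea, bronchus and lung", 33, 34),
    ("bone and articular cartilage", 40, 41),
    ("malignant melanoma of skin", 43, 43),
    ("breast", 50, 50),
    ("cervix uteri", 53, 53),
    ("corpus uteri", 54, 55),
    ("ovary", 56, 56),
    ("prostate", 61, 61),
    ("testis", 62, 62),
    ("kidney", 64, 64),
    ("bladder", 67, 67),
    ("brain and CNS", 70, 72),
    ("thyroid", 73, 73),
    ("Hodgkin lymphoma", 81, 81),
    ("non-Hodgkin lymphoma", 82, 86),
    ("multiple myeloma", 90, 90),
    ("leukemia", 91, 95),
    ("non-Hodgkin lymphoma", 96, 96),
]

# Pituitary and pineal gland codes are grouped with brain and CNS in the text.
BRAIN_SUBCODES = {"C751", "C752", "C753"}


def cancer_type(icd10_code):
    """Return the study's cancer type for an ICD-10 code, or None if excluded."""
    code = icd10_code.strip().upper().replace(".", "")
    if len(code) < 3 or code[0] != "C" or not code[1:3].isdigit():
        return None
    if code[:4] in BRAIN_SUBCODES:
        return "brain and CNS"
    category = int(code[1:3])
    for label, first, last in CANCER_GROUPS:
        if first <= category <= last:
            return label
    return None  # e.g. C44 (other skin) and unlisted sites are excluded
```

For instance, `cancer_type("C50.9")` maps to breast, while `cancer_type("C44.1")` returns `None` because other and unspecified skin neoplasms were excluded.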

Other Variables

In the SIDIAP, we had information on the primary care center to which individuals were assigned in 2016 (Girona, Tarragona), date of diagnosis, sex (women, men), age (18–35, 36–50, 51–65, ≥66), and nationality (Spanish, non-Spanish). Socioeconomic status was assessed using the “Mortalidad en áreas pequeñas Españolas y Desigualdades Socioeconómicas y Ambientales” (MEDEA) deprivation index, which we categorized into quintiles for anonymization purposes. The 1st and the 5th quintiles represent the least and most deprived levels of the urban population in Catalonia, respectively.22 We included a rural category since the MEDEA index was not available for people living in these areas.

Statistical Analysis

We performed a descriptive analysis of the overall number of cancer cases in SIDIAP and of the confirmed (ie, matched diagnoses between the SIDIAP and the cancer registries) vs non-confirmed cases (ie, in the SIDIAP but not in the cancer registries) by sex, age, nationality, MEDEA deprivation index, and year of diagnosis, in Girona and Tarragona, and we used a Chi-squared test to assess for significant differences.23 We used the Catalonia Cancer Registries (CCRs, Girona and Tarragona combined) data as the gold-standard to calculate the sensitivity and the positive predictive values (PPVs) for each cancer type (an illustration of our calculations is available in . As secondary analyses, we stratified the sensitivity and PPV analyses by province (Girona and Tarragona) to assess if there were geographical differences and by sex, nationality, age, and the MEDEA deprivation index to assess if there were differences for specific population groups. We also checked if the sensitivities improved after including cancer diagnoses from the hospital discharge database. For the confirmed cases, we calculated the time difference (months) between the date of diagnosis registered in the SIDIAP and the date registered in the CCRs. We used R version 3.5.0 for all the statistical analyses and considered p-values <0.05 to be statistically significant.
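As a rough illustration of the validity measures described above, the sketch below computes sensitivity (confirmed cases over registry cases) and PPV (confirmed cases over SIDIAP cases) with a 95% confidence interval. The authors used R and do not name their interval method; the normal-approximation (Wald) interval used here is an assumption, although it does reproduce the published head and neck estimates. The function names are hypothetical.

```python
import math


def proportion_with_ci(successes, total, z=1.96):
    """Proportion with a Wald (normal-approximation) confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def validity(confirmed, registry_cases, ehr_cases):
    """Sensitivity and PPV of EHR diagnoses against a registry gold standard."""
    return {
        "sensitivity": proportion_with_ci(confirmed, registry_cases),
        "ppv": proportion_with_ci(confirmed, ehr_cases),
    }


# Head and neck counts reported in Table 2: 650 registry, 819 SIDIAP, 332 confirmed.
head_and_neck = validity(confirmed=332, registry_cases=650, ehr_cases=819)

# Overall counts reported in the Results: 21,559 registry, 27,046 SIDIAP, 16,478 confirmed.
overall = validity(confirmed=16478, registry_cases=21559, ehr_cases=27046)
```

With the head and neck counts, this gives a sensitivity of about 51.1% (47.2 to 54.9) and a PPV of about 40.5% (37.2 to 43.9); the overall counts give roughly 76% and 61%.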

Results

Sociodemographic Characteristics Of SIDIAP And Confirmed Cases

In the SIDIAP, we identified 496,356 inhabitants of Girona in 2016, of whom 16,211 had a cancer diagnosis between 2009 and 2015, and 453,485 inhabitants of Tarragona, of whom 10,835 had a cancer diagnosis between 2009 and 2013. There were more cancer cases registered in the SIDIAP among men (55% and 56% for Girona and Tarragona, respectively), people aged 66 years or older (45%, 49%), Spanish citizens (94%, 95%), and people living in rural areas (32%, 37%) (Table 1).
Table 1

Descriptive Characteristics Of The SIDIAP Population With A Cancer Diagnosis In Girona (2009–2015) And Tarragona (2009–2013) By Confirmation Status From The Cancer Registries

SIDIAP Cancer Cases, n (%)

| Characteristics | Girona: SIDIAP Cases (N=16,211) | Girona: Confirmed Cases (N=9296) | Girona: Non-Confirmed (a) (N=6915) | Girona: p-value (b) | Tarragona: SIDIAP Cases (N=10,835) | Tarragona: Confirmed Cases (N=7182) | Tarragona: Non-Confirmed (a) (N=3653) | Tarragona: p-value (b) |
|---|---|---|---|---|---|---|---|---|
| Sex | | | | | | | | |
| Women | 7300 (45.0%) | 4207 (45.3%) | 3093 (44.7%) | 0.515 | 4732 (43.7%) | 2994 (41.7%) | 1738 (47.6%) | <0.001 |
| Men | 8911 (55.0%) | 5089 (54.7%) | 3822 (55.3%) | | 6103 (56.3%) | 4188 (58.3%) | 1915 (52.4%) | |
| Age (years) (c) | | | | | | | | |
| 18–35 | 624 (3.8%) | 298 (3.2%) | 326 (4.7%) | <0.001 | 375 (3.5%) | 168 (2.3%) | 207 (5.7%) | <0.001 |
| 36–50 | 2830 (17.5%) | 1634 (17.6%) | 1196 (17.3%) | | 1636 (15.1%) | 1058 (14.7%) | 578 (15.8%) | |
| 51–65 | 5417 (33.4%) | 3280 (35.3%) | 2137 (30.9%) | | 3471 (32.0%) | 2474 (34.5%) | 997 (27.3%) | |
| ≥66 | 7340 (45.3%) | 4084 (43.9%) | 3256 (47.1%) | | 5353 (49.4%) | 3482 (48.5%) | 1871 (51.2%) | |
| Nationality | | | | | | | | |
| Spanish | 15,182 (93.7%) | 8731 (93.9%) | 6451 (93.3%) | 0.110 | 10,328 (95.3%) | 6930 (96.5%) | 3398 (93.0%) | <0.001 |
| Non-Spanish | 1029 (6.3%) | 565 (6.1%) | 464 (6.7%) | | 507 (4.7%) | 252 (3.5%) | 255 (7.0%) | |
| MEDEA deprivation index (d) | | | | | | | | |
| Quintile 1 | 1784 (11.0%) | 1171 (12.6%) | 613 (8.9%) | <0.001 | 813 (7.5%) | 528 (7.3%) | 285 (7.8%) | 0.022 |
| Quintile 2 | 1461 (9.0%) | 978 (10.5%) | 483 (7.0%) | | 1062 (9.8%) | 716 (10.0%) | 346 (9.5%) | |
| Quintile 3 | 2180 (13.5%) | 1152 (12.4%) | 1028 (14.9%) | | 1201 (11.1%) | 784 (10.9%) | 417 (11.4%) | |
| Quintile 4 | 2540 (15.7%) | 1209 (13.0%) | 1331 (19.2%) | | 1664 (15.4%) | 1097 (15.3%) | 567 (15.5%) | |
| Quintile 5 | 2025 (12.5%) | 1029 (11.1%) | 996 (14.4%) | | 1366 (12.6%) | 936 (13.0%) | 430 (11.8%) | |
| Rural areas | 5179 (31.9%) | 3205 (34.5%) | 1974 (28.5%) | | 4001 (36.9%) | 2677 (37.3%) | 1324 (36.2%) | |
| "Missing" | 1042 (6.4%) | 552 (5.9%) | 490 (7.1%) | | 728 (6.7%) | 444 (6.2%) | 284 (7.8%) | |
| Year of diagnosis | | | | | | | | |
| 2009 | 2354 (14.5%) | 1151 (12.4%) | 1203 (17.4%) | <0.001 | 2143 (19.8%) | 1150 (16.0%) | 993 (27.2%) | <0.001 |
| 2010 | 2329 (14.4%) | 1342 (14.4%) | 987 (14.3%) | | 2154 (19.9%) | 1433 (20.0%) | 721 (19.7%) | |
| 2011 | 2310 (14.3%) | 1401 (15.1%) | 909 (13.1%) | | 2046 (18.9%) | 1443 (20.1%) | 603 (16.5%) | |
| 2012 | 2374 (14.6%) | 1438 (15.5%) | 936 (13.5%) | | 2212 (20.4%) | 1553 (21.6%) | 659 (18.1%) | |
| 2013 | 2365 (14.6%) | 1471 (15.8%) | 894 (12.9%) | | 2280 (21.0%) | 1603 (22.3%) | 677 (18.5%) | |
| 2014 | 2259 (13.9%) | 1383 (14.9%) | 876 (12.7%) | | - | - | - | |
| 2015 | 2220 (13.7%) | 1110 (11.9%) | 1110 (16.1%) | | - | - | - | |

Notes: (a) Non-confirmed cases either have different diagnoses in SIDIAP and the registry or were not available in the registry. (b) Comparison of confirmed vs non-confirmed cases using the Chi-squared test of independence. (c) Age in 2009. (d) Quintile 1 of the MEDEA Index represents the least deprived and quintile 5 represents the most deprived. Rural was included as a category since the index cannot be calculated for people living in rural areas.

Abbreviations: SIDIAP, Information System for Research in Primary Care.

We confirmed 9,296 cancer cases in Girona and 7,182 in Tarragona. Compared to non-confirmed cases, confirmed cases had a higher proportion of men in Tarragona (58% vs 52%) as well as people aged 51 to 65 in both provinces (35% vs 31% in Girona; 34% vs 27% in Tarragona) but a lower proportion of socioeconomically deprived individuals in Girona (11% vs 14%) (Table 1).
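The confirmed vs non-confirmed comparisons in Table 1 rely on a Chi-squared test of independence. A minimal sketch for the 2×2 case (sex by confirmation status) follows, using only the standard library. The continuity correction is an assumption: the paper does not mention it, but R's `chisq.test` applies it by default to 2×2 tables, and with it the sketch gives a p-value close to the 0.515 reported for Girona. The function name is hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d, yates=True):
    """Chi-squared test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction (assumed, as in R's default for 2x2 tables).
                deviation = max(0.0, deviation - 0.5)
            statistic += deviation ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: women, men. Columns: confirmed, non-confirmed (counts from Table 1).
girona_stat, girona_p = chi_squared_2x2(4207, 3093, 5089, 3822)
tarragona_stat, tarragona_p = chi_squared_2x2(2994, 1738, 4188, 1915)
```

The variables with more than two categories (age, deprivation index, year) would need the general r×c form of the test, which this sketch does not cover.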

Overall Validation

Out of the 21,559 cancer cases registered in the CCRs, 16,478 (76%) were in the SIDIAP. The cancer types with the highest sensitivities in Catalonia were breast (89%, 95% CI: 88–90%), colorectal (81%, 95% CI: 80–82%), and prostate (81%, 95% CI: 80–83%) (Table 2). Almost all cancer types had sensitivities above 60% in both provinces. The exceptions were head and neck (51%, 95% CI: 47–55%) and gallbladder and biliary tract (29%, 95% CI: 23–35%) (Table 2).
Table 2

Validity Of The ICD-10 Codes Used To Identify Incident Cancer Diagnoses Registered In The SIDIAP Database, Catalonia (a) (2009–2015) (b)

| Cancer Type (ICD-10 CM) | Cancer Cases in CCRs, n | Cancer Cases in SIDIAP, n | Confirmed Cancer Cases, n | Sensitivity, % (95% CI) | PPV, % (95% CI) |
|---|---|---|---|---|---|
| Head and neck (C00-C14) | 650 | 819 | 332 | 51.1 (47.2–54.9) | 40.5 (37.2–43.9) |
| Esophagus (C15) | 211 | 255 | 157 | 74.4 (68.5–80.3) | 61.6 (55.6–67.5) |
| Stomach (C16) | 673 | 633 | 455 | 67.6 (64.1–71.1) | 71.9 (68.4–75.4) |
| Colorectal (C18-C21) | 3743 | 4329 | 3035 | 81.1 (79.8–82.3) | 70.1 (68.7–71.5) |
| Liver (C22) | 561 | 625 | 364 | 64.9 (60.9–68.8) | 58.2 (54.4–62.1) |
| Gallbladder & biliary tract (C23-C24) | 197 | 107 | 57 | 28.9 (22.6–35.3) | 53.3 (43.8–62.7) |
| Pancreas (C25) | 578 | 590 | 419 | 72.5 (68.8–76.1) | 71.0 (67.4–74.7) |
| Larynx (C32) | 337 | 403 | 226 | 67.1 (62.0–72.1) | 56.1 (51.2–60.9) |
| Trachea, bronchus & lung (C33-C34) | 2152 | 2155 | 1631 | 75.8 (74.0–77.6) | 75.7 (73.9–77.5) |
| Bone and articular cartilage (C40-C41) | 39 | 106 | 24 | 61.5 (46.3–76.8) | 22.6 (14.7–30.6) |
| Malignant melanoma of skin (C43) | 550 | 962 | 417 | 75.8 (72.2–79.4) | 43.3 (40.2–46.5) |
| Breast (C50) | 3325 | 4456 | 2958 | 89.0 (87.9–90.0) | 66.4 (65.0–67.8) |
| Cervix uteri (C53) | 198 | 416 | 118 | 59.6 (52.8–66.4) | 28.4 (24.0–32.7) |
| Corpus uteri (C54-C55) | 576 | 661 | 424 | 73.6 (70.0–77.2) | 64.1 (60.5–67.8) |
| Ovary (C56) | 263 | 398 | 190 | 72.2 (66.8–77.7) | 47.7 (42.8–52.6) |
| Prostate (C61) | 2820 | 3596 | 2286 | 81.1 (79.6–82.5) | 63.6 (62.0–65.1) |
| Testis (C62) | 139 | 175 | 102 | 73.4 (66.0–80.7) | 58.3 (51.0–65.6) |
| Kidney (C64) | 536 | 730 | 397 | 74.1 (70.4–77.8) | 54.4 (50.8–58.0) |
| Bladder (C67) | 1456 | 2370 | 1108 | 76.1 (73.9–78.3) | 46.8 (44.7–48.8) |
| Brain and CNS (C70-C72, C75.1-C75.3) (c) | 393 | 544 | 298 | 75.8 (71.6–80.1) | 54.8 (50.6–59.0) |
| Thyroid (C73) | 395 | 432 | 264 | 66.8 (62.2–71.5) | 61.1 (56.5–65.7) |
| Hodgkin lymphoma (C81) | 144 | 142 | 92 | 63.9 (56.0–71.7) | 64.8 (56.9–72.6) |
| Non-Hodgkin lymphoma (C82-C86, C96) | 709 | 909 | 472 | 66.6 (63.1–70.0) | 51.9 (48.7–55.2) |
| Multiple myeloma (C90) | 294 | 362 | 233 | 79.3 (74.6–83.9) | 64.4 (59.4–69.3) |
| Leukemia (C91-C95) | 620 | 871 | 419 | 67.6 (63.9–71.3) | 48.1 (44.8–51.4) |

Notes: (a) Provinces of Girona and Tarragona. (b) Data from the Tarragona Cancer Registry was only available for 2009–2013. (c) Includes pituitary gland and pineal gland tumors.

Abbreviations: CI, Confidence Interval; CNS, Central Nervous System; CCRs, Catalonia Cancer Registries; ICD-10, International Classification for Diseases, 10th revision; PPV, positive predictive values; SIDIAP, Information System for Research in Primary Care.

Out of the 27,046 SIDIAP cancer cases present in Catalonia, 16,478 (61%) were also in the CCRs. The trachea, bronchus and lung cancers had the highest PPV (76%, 95% CI: 74–78%) followed by stomach (72%, 95% CI: 68–75%) and pancreas (71%, 95% CI: 67–75%) cancers (Table 2). On the other hand, bone and articular cartilage (23%, 95% CI: 15–31%) and cervix uteri (28%, 95% CI: 24–33%) cancers had the lowest PPVs (Table 2). Most cancer diagnoses were reported within less than three months of difference between the SIDIAP and the registries (Figure 2). More cases were reported first in the cancer registries than in the SIDIAP. Only kidney cancer had more than twenty-five percent of cases reported first in the SIDIAP compared to the CCRs.
Figure 2

Time-difference (months) in the date of cancer diagnosis recorded in the SIDIAP and the population-based Catalonia Cancer Registries (a) (2009–2015) (b).

Notes: (a) Population-based cancer registries from the provinces of Girona and Tarragona. (b) Data from the Tarragona Cancer Registry was only available for 2009–2013. Negative values indicate SIDIAP diagnosis before the registries’ diagnosis date. Brain and CNS include pituitary gland and pineal gland tumors.

Abbreviations: CNS, Central Nervous System; m, months; SIDIAP, Information System for Research in Primary Care.
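The sign convention in the Figure 2 notes (negative values mean the SIDIAP date precedes the registry date) can be illustrated with a short sketch. How the authors converted days to months is not stated; dividing by the mean month length is an assumption, and the function names and category labels are hypothetical.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # assumption: mean month length (365.25 / 12)


def months_between(sidiap_date, registry_date):
    """Signed lag in months; negative means SIDIAP recorded the diagnosis first."""
    return (sidiap_date - registry_date).days / DAYS_PER_MONTH


def lag_category(months):
    """Bucket a signed lag around the three-month threshold used in the text."""
    if abs(months) < 3:
        return "within 3 months"
    if months < 0:
        return "SIDIAP first by 3+ months"
    return "registry first by 3+ months"


# Hypothetical dates: registry diagnosis on 1 Jan 2012, SIDIAP entry on 1 Mar 2012.
lag = months_between(date(2012, 3, 1), date(2012, 1, 1))
category = lag_category(lag)
```

Here the SIDIAP entry comes about two months after the registry date, so the lag is positive and falls within the three-month window.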


Secondary Analyses

Overall, Girona had higher sensitivities than Tarragona, especially for cancers of the cervix uteri (68% vs 52%, for Girona and Tarragona, respectively), Hodgkin lymphoma (69% vs 56%) and head and neck (56% vs 45%). The only cancer type for which Tarragona had a higher sensitivity than Girona was bone and articular cartilage (56% vs 75%). Regarding PPVs, Tarragona had higher estimates than Girona, except for six cancer types. We observed the biggest differences for bladder (33% vs 69%, for Girona and Tarragona, respectively), colorectal (65% vs 77%) and larynx (52% vs 63%) cancers. The cancer types for which Girona's PPVs most exceeded Tarragona's were gallbladder and biliary tract (56% vs 44%) and Hodgkin lymphoma (71% vs 56%).

Overall, sensitivity estimates differed by age group, and PPV estimates differed by age, nationality and socioeconomic status. Those aged 66 years or older showed lower sensitivities than those aged between 36 and 65 years for most cancer types. Overall, PPVs were lower in those aged between 18 and 35 years than in the other age groups, in the non-Spanish than in the Spanish population, and in the most deprived compared to the least deprived MEDEA quintiles. Apart from these differences, we did not observe any other change in sensitivity or PPV according to sex, age, nationality, or socioeconomic status, with the exception of certain specific cancer types.

When adding cancer diagnoses from hospital discharge to primary care data, we observed an increase in sensitivity for all cancer types. Gallbladder and biliary tract cancer had the most substantial change in sensitivity, from 29% to 50%. We also observed increases of more than 10 percentage points for larynx (67% to 83%), head and neck (51% to 66%) and liver (65% to 78%) cancers.
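The effect of adding hospital discharge diagnoses can be pictured as taking the union of the two sources before comparing against the registry. The sketch below uses invented patient identifiers purely for illustration and matches on patient only, whereas the study also matched on cancer type; the function name is hypothetical.

```python
def sensitivity(detected_ids, registry_ids):
    """Share of registry (gold-standard) cases that also appear in detected_ids."""
    registry_ids = set(registry_ids)
    if not registry_ids:
        raise ValueError("registry_ids must not be empty")
    return len(registry_ids & set(detected_ids)) / len(registry_ids)


# Hypothetical identifiers, invented for illustration only.
registry = {"p01", "p02", "p03", "p04", "p05", "p06", "p07", "p08", "p09", "p10"}
primary_care = {"p01", "p02", "p03", "p11"}
hospital_discharge = {"p03", "p04", "p05", "p12"}

primary_only = sensitivity(primary_care, registry)
combined = sensitivity(primary_care | hospital_discharge, registry)
gain_points = (combined - primary_only) * 100  # change in percentage points
```

In this toy example sensitivity rises from 30% to 50%. Cases found only outside the registry (`p11`, `p12`) do not affect sensitivity, but they would lower the PPV.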

Discussion

This study validated cancer diagnoses recorded in primary care using the data of the two provincial population-based cancer registries that exist in Catalonia as the gold-standard. We found that 23 out of 25 cancer types had sensitivities above 60%. PPV estimates were generally lower than the sensitivities observed in most cancer types. The number of cancer cases in the SIDIAP that were not confirmed by the cancer registries was high for some specific cancer sites. More cases were first recorded in the cancer registries than in the SIDIAP, though for most cancer cases, the time difference between both data sources did not exceed three months. Including cancer diagnoses from hospital discharge data considerably improved the reliability of the data for specific cancer types.

We observed a high sensitivity for the majority of cancer types. Breast, colorectal and prostate cancers had the highest sensitivities. These are among the most frequently diagnosed tumors in Catalonia; breast and colorectal cancers are covered by systematic screening programs, and prostate cancer is commonly sought through opportunistic screening.24,25 Furthermore, these cancers are included in the rapid diagnostic circuit program run in Catalonia, which could also contribute to an increase in the accuracy of diagnosis in primary care.26 Previous studies conducted in the United Kingdom (UK) that compared primary care data with hospital and cancer registry data also reported high sensitivities for breast, prostate, and colorectal cancers, highlighting that these cancers are usually managed by general practitioners.27,28 In Catalonia, a previous study comparing SIDIAP cases with those registered in a hospital cancer registry in Barcelona also reported high sensitivities for breast, colorectal and prostate cancers.9 High sensitivities are important to enhance study inclusiveness and to be able to ascertain common exposures.29 A high sensitivity paired with a high specificity (which is important for classifying outcomes) facilitates both the study of cancer as an outcome as well as the identification of the cases’ common exposures. In our study, the lowest sensitivities were found for cancers that are less frequent and that are more commonly managed in hospitals, such as gallbladder and biliary tract or bone and articular cartilage.9,24,30,31 We are not aware of any previous national or international studies validating the primary care diagnosis of these cancer types using external sources. Thus, our results indicate that using SIDIAP cancer diagnoses for research when cancer is the outcome of interest is reliable for most common cancer types in Catalonia but may be insufficient for less frequent types.

PPV estimates were generally lower than the sensitivities observed in most cancer types. The number of cancer cases in the SIDIAP that were not confirmed by the cancer registries was high for some specific cancer sites. A previous study validating only colorectal, lung, gastro-esophageal and urological cancer diagnoses in primary care in the UK reported higher PPV estimates than in our study, ranging from 92% to 98%.28 This study hypothesized that some of the reasons behind non-confirmed cases might be a disagreement in the type of cancer diagnosed in each data source, or the possibility of suspicious symptoms being registered as cancer diagnoses in primary care.28 In agreement with this hypothesis, we found that approximately 10% of the cases not confirmed by the cancer registries were due to disagreement in the type of cancer diagnosis between the data sources. The low PPV for cervix uteri cancer (included in the rapid diagnostic circuit in Catalonia) could be due to detected suspicious symptoms recorded as cancers in SIDIAP; however, we did not have the information needed to test this hypothesis. 
Another factor that can influence PPVs is the prevalence of the cancer type, which could partially explain the low PPVs of bone and articular cartilage (106 cases registered in the SIDIAP) and gallbladder and biliary tract (107 cases registered). High PPVs are important when we want to identify a cohort that includes only people with the condition of interest but does not need to be representative of all cases.29 Therefore, the SIDIAP does not appear to be an appropriate database to create a cohort of cancer patients, except for certain cancer types (eg, trachea, bronchus and lung, stomach, pancreas or colorectal cancers). More research needs to be conducted to understand the reasons behind non-confirmed cancer cases in SIDIAP.

Most cancer diagnoses were reported within less than three months of difference between the SIDIAP and the registries, and generally, the cancer registries reported the cases earlier than the SIDIAP. Our results are in line with two previous studies in the UK which assessed the time difference between the date of cancer diagnoses registered in the cancer registries and primary care databases. One study reported a median time difference in the date of diagnosis of 11 days (range 6–30 days) between a UK primary care database and the Cancer Registry in England for colorectal, lung, gastro-esophageal and urological cancers.28 The other study, also using information from the same UK primary care database and cancer registry but combining 11 cancer types, reported that 63% of cancer diagnoses were recorded within one month of difference between the data sources and 24% within one to three months of difference. However, the authors did not specify which source registered the diagnosis first.32 Although the time difference between the data sources was not substantial in our study, investigators should be aware of it when addressing time-related research questions in the SIDIAP, such as those in the cancer survival field. 
In our study, the addition of hospital discharge data to SIDIAP cancer diagnoses improved the sensitivity estimates for most cancer sites, with substantial improvements observed particularly for less frequent cancer types. The use of multiple data sources is highly recommended when using EHRs for epidemiological research since the advantages of each database can overcome the limitations of the others.4,33 Specifically, the need to link primary care databases to those from hospitals and cancer registries to correctly identify certain cancer types has been proposed in the UK.27 Therefore, considering both SIDIAP and hospital discharge databases can improve the reliability of the results of future research. This may be especially important for larynx, head and neck and liver cancers. For gallbladder and biliary tract cancer, despite the sizeable improvement in sensitivity after adding hospital discharge to SIDIAP cancer diagnoses, the final sensitivity estimate (50%) seems insufficient to perform future studies using this cancer type as an outcome. If data are available, future studies may consider restricting their analyses to confirmed cases only to avoid misclassification and attain data robustness.

The main strengths of this study are, first, the use of the SIDIAP database, which provides a large and representative sample of the Catalan population and increases external validity.11 Second, the use of two population-based cancer registries as the gold-standard allowed us to validate numerous cancer types. Third, we were able to calculate the sensitivity of the SIDIAP cancer diagnoses, a type of measure that is often not reported in cancer validation studies. However, our study has limitations. First, since the SIDIAP is a primary care database, certain cancer types are harder to detect at this level; nevertheless, we assessed the inclusion of hospital discharge information to account for this limitation. 
Second, textual information in medical records could be of value to distinguish cancer suspicions from actual diagnoses in the SIDIAP, but this information was not available in this study. Third, for this study we were only able to add cancer diagnoses from hospital discharge from the ICS hospitals, therefore we cannot confirm whether including information from all Catalan hospitals would permit better identification of cases for the same cancer types we found. Finally, our population of reference was the population of individuals assigned to a primary care center in Girona and Tarragona provinces in 2016 and, thus, we could not account for changes in patient address during the whole study period.

Conclusion

The SIDIAP includes 76% of the cancer diagnoses present in the cancer registries of Catalonia but also includes a considerable number of cases that are not in the registries. Overall, the SIDIAP reports cancer cases later than the registries but the time difference in the date of diagnosis between the databases is usually less than three months. Our results support the use of SIDIAP cancer diagnoses for national and international epidemiological research when cancer is used as an outcome, especially for the most frequent cancer types. The inclusion of cancer diagnoses from hospital discharge data is recommended to improve the reliability of certain cancer types such as head and neck, liver, larynx, and leukemia. However, our results do not support the use of SIDIAP data for all cancer sites when the purpose of the study is to identify a cohort of cancer patients. Further research is needed to understand the cancer cases recorded in the SIDIAP that were not confirmed by the cancer registries.
