Literature DB >> 29480830

A novel approach for medical research on lymphomas: A study validation of claims-based algorithms to identify incident cases.

Cécile Conte1, Aurore Palmaro, Pascale Grosclaude, Laetitia Daubisse-Marliac, Fabien Despas, Maryse Lapeyre-Mestre.   

Abstract

The use of claims database to study lymphomas in real-life conditions is a crucial issue in the future. In this way, it is essential to develop validated algorithms for the identification of lymphomas in these databases. The aim of this study was to assess the validity of diagnosis codes in the French health insurance database to identify incident cases of lymphomas according to results of a regional cancer registry, as the gold standard.Between 2010 and 2013, incident lymphomas were identified in hospital data through 2 algorithms of selection. The results of the identification process and characteristics of incident lymphomas cases were compared with data from the Tarn Cancer Registry. Each algorithm's performance was assessed by estimating sensitivity, predictive positive value, specificity (SPE), and negative predictive value.During the period, the registry recorded 476 incident cases of lymphomas, of which 52 were Hodgkin lymphomas and 424 non-Hodgkin lymphomas. For corresponding area and period, algorithm 1 provides a number of incident cases close to the Registry, whereas algorithm 2 overestimated the number of incident cases by approximately 30%. Both algorithms were highly specific (SPE = 99.9%) but moderately sensitive. The comparative analysis illustrates that similar distribution and characteristics are observed in both sources.Given these findings, the use of claims database can be consider as a pertinent and powerful tool to conduct medico-economic or pharmacoepidemiological studies in lymphomas.
Copyright © 2017 The Authors. Published by Wolters Kluwer Health, Inc. All rights reserved.

Entities:  

Mesh:

Year:  2018        PMID: 29480830      PMCID: PMC5943849          DOI: 10.1097/MD.0000000000009418

Source DB:  PubMed          Journal:  Medicine (Baltimore)        ISSN: 0025-7974            Impact factor:   1.889


Introduction

Lymphomas are a large and heterogeneous group of lymphoid neoplasms with distinct biological and clinical features, treatment, and prognosis.[ Non-Hodgkin lymphoma (NHL) is the most frequent hematologic malignancy and account for approximately 90% of lymphomas.[ In the last 15 years, incidence of NHL has increased steadily, whereas the progress of pharmacological treatments improves NHL median survival time with a constant decrease of mortality.[ In parallel, there is an increased incidence of Hodgkin lymphomas (HL) in adolescents and young adults with a large number of surviving patients.[ Consequently, there are increased number of patients exposed to potential cancer-related consequences such as long-term adverse effects of treatment, polypharmacy and drug interactions, risk of 2nd cancer, and relapse. Moreover, oncohematology represents a fast-evolving field with continuous scientific progress, update, and changes especially in genomics and biology, diagnostic improvement, and therapeutics with targeted therapy.[ Therapeutic changes are based on results of randomized controlled trials, conducted on a limited number of patients with drastic selection criteria. As a consequence, these patients are nonrepresentative of patients in the real clinical practice (i.e., older, with polymorbidity, and polypharmacy) and real-life data remain scarce.[ Moreover, long-term effects of new antineoplastic agents remain unknown after marketing authorization. In this context, real-life data are required to conduct pharmacoepidemiological studies, especially, safety evaluation. Multiple sources provide useful data to conduct observational study on lymphoma, as data collected in cancer registries and retrospective or prospective surveys. However, the French health insurance system database (Système National d’Informations inter-Régimes de l’Assurance Maladie, SNIIRAM) may be used as a pertinent and complementary tool for this research purpose because of several strengths that can minimize classic bias associated with other sources. First, this national database provides extensive data covering a population of more than 65 million inhabitants. The large number of patient recorded in this database permits to increase statistic power of analyses especially for studying rare disease. Moreover, the completeness of the data could minimize selection bias related to the constitution of specialized cancer center's cohorts and attrition bias related to long-term follow-up. Selection bias is an important problem giving results not always transposable to the target population. Then, it provides anonymous and individual data on patient characteristics with demographic data, long-term chronic diseases (‘affections de longue durée’, LTDs), and vital status. The access to ambulatory healthcare consumption (reimbursed drugs and medical acts) and the linkage with data from the national hospital database (‘Programme de Médicalisation des Systèmes d’information’, PMSI) gives a complete overview of lymphomas care pathway for several years all over France. The database includes also data regarding some drugs used during hospitalization such as rituximab, a cornerstone of the treatment of several types of lymphomas. Hence, this database provides extensive data on drug exposure minimizing information bias (recall bias, nonresponse bias, or reporting bias) and of great interest to conduct medico-economic study in lymphomas.[ Moreover, it could be a pertinent tool for quality measurement of healthcare use in screening or treatment of lymphomas, as highlighted in other cancer.[ In the light of the above and to improve validity of studies conducted within this database, it is essential to develop validated methods for accurate identification of specific diseases.[ For lymphoma cases, it is crucial to classify with precision NHL by subtypes because of heterogeneity of diseases, treatments, and prognosis. Some identification algorithms have been validated to detect incident cancer cases but, to the best of our knowledge, there is no validated algorithm to identify incident cases of HL and NHL.[ The aims of this study were to assess the validity of hospital diagnosis codes in the PMSI database to identify incident cases of lymphomas according to results of a regional cancer registry and to compare baseline characteristics of lymphoma cases between sources.

Materials and method

Study design and data sources

The population source was inhabitants of the Tarn department, an administrative area of 384,474 inhabitants in southwestern France. Two algorithms were defined to detect lymphomas cases using PMSI and/or LTD data available in the SNIIRAM database. Incident lymphoma cases were identified using antecedent of hospitalization for lymphoma recorded with hospital diagnosis. An incident case must have no previous record of lymphoma diagnosis during an observation period of 24 months. The results of this identification process were compared with data from the Tarn Cancer Registry considered as the “gold standard” in this area. Complete data from the registry were available until December 31, 2013, thus, data related to hematologic malignancies were extracted from January 1, 2010 to December 31, 2013. In parallel, PMSI and LTD data were extracted from January 1, 2008 to December 31, 2013 for inhabitants of the Tarn department, allowing the reconstitution of an observation period to identify incident cases.

The tarn cancer registry

It is a population-based cancer registry assessed every 5 years by the “Comité d’évaluation des registres”. Quality controls are carried out by the registry using tools provided by the International Agency for Research on Cancer and the data are regularly included in the “Cancer Incidence in 5 Continents” monograph series since 1982. Cancers were defined according to the International Classification of Diseases for Oncology, 3rd edition (ICD-O-3). Nominative data are collected and coded in accordance with international guidelines. Identification of potential incident cancer cases is done using several relevant data sources like oncology regional network, anatomopathology laboratories, office from specialized physicians, and LTD and PMSI data. Every case is validated after crossing these data sources and checking medical records. For all patients, the following data are available: demographic data, cancer diagnosis date, stage of the cancer, cancer topography and morphology, vital status, and so on.[ Lymphoma cases were identified through 2 selection periods (2010–2013 and 2011–2013) on the basis of the WHO classification[ to assess the impact of length of observation in algorithms’ performance. Selection of incident Multiple myeloma (ICD-O code ‘9732/3’), plasmacytoma (ICD-O code ‘9731/3’), and extramedullary plasmacytoma (ICD-O code ‘9734/3’) cases has been previously studied separately.[ A complete list of codes considered to identify lymphomas cases is given in Table 1.
Table 1

Lymphomas diagnoses codes used for patients’ selection in the registry (ICD-O-3) and PMSI/LTD data (ICD-10).

Lymphomas diagnoses codes used for patients’ selection in the registry (ICD-O-3) and PMSI/LTD data (ICD-10).

The PMSI database

In France, public and private hospital payment is based on diagnosis-related group system. For each patient hospital stay, a standard discharge summary (Résumé de Sortie Standardisé) is produced with the aim of providing a precise measure of activity which is then used for reimbursement purpose. In this context, the PMSI database contains demographic data, routinely collected medical data (diagnosis, procedures), and administrative data (date and length of stay, hospital location). Diagnoses are coded according to International Classification of Diseases, 10th revision (ICD-10). They provide the leading cause of hospital admission with main diagnosis (MD). They give accuracy on patient's management with related diagnosis (RD) and on major comorbidities and complications with associated diagnosis (AD).The coding quality of these data are regularly checked by internal controls and external audit. Diagnoses from ‘long-term conditions’ scheme. LTDs are defined by severe and/or chronic diseases that require expansive or chronic treatment. There is a list established by decree that include 30 diseases, of which hematologic malignancies. After physician request, there is an exemption of copayment for care in relation with LTD. Diagnoses is coded according to ICD-10.

Algorithms of selection of incident lymphomas cases in PMSI and LTD data

For the 2 selection periods (2010–2013 and 2011–2013), inhabitants of the Tarn department with lymphoma were identified in the PMSI database through 2 algorithms: Algorithm 1: at least a MD of lymphoma or an MD of chemotherapy in combination with a RD or AD of lymphoma; Algorithm 2: at least a MD or RD or AD of lymphoma. For each algorithm, the impact of LTD data in combination with PMSI data were explored. Then, the use of only LTD data to identify incident lymphoma cases were evaluated through algorithm 3 (at least 1 code of lymphoma in LTD data).A complete list of codes used is given in Table 1. To be defined as incident, patients must have no record of lymphoma diagnosis code in the 24 months (selection within period 2010–2013) or 36 months (selection within period 2011 and 2013) before the 1st hospitalization date for lymphoma found in our dataset.

Matching

The linkage between the registry and PMSI and/or LTD was done using a probabilistic matching on the basis of combinations of 5 variables: family name, birth name, 1st name, date of birth, sex, place of birth (“commune”, lowest administrative area in France). About 24 possible combinations were tested patients matching for at least 1 combination of these variables were considered as matched.

Analysis

Descriptive statistics were used to characterize the study population (Table 2). Qualitative variables were expressed in frequencies and percentages. Quantitative variables were expressed as median and interquartile range. The results of the identification process and characteristics of patients in PMSI/LTD databases were compared with true cases from the Tarn Cancer Registry considered as the “gold standard”. Thus, each algorithm performance was assessed by estimating sensitivity (SE), predictive positive value (PPV), specificity (SPE), and negative predictive value (NPV).True positives (TP) were incident cases identified in the PMSI/LTD databases recorded in the registry as incident cases of lymphoma. False positives (FPs) were incident cases in PMSI/LTD database not recorded as incident cases in the Registry. False negatives (FNs) were incident cases recorded in the registry but not identified as incident cases in PMSI/LTD databases. Hence, FN can correspond to matched incident lymphoma in the registry not identified by the algorithm applied on the PMSI/LTD data or to incident lymphoma in the registry with no corresponding data in PMSI/LTD databases (not matched patients) PMSI/LTD databases (Fig. 1). The impact of the length of observation period and the use of LTD on algorithm performance was assessed for each algorithm (Table 3). For both algorithms, performance of detection was evaluated for each subtype of lymphoma (list of codes used in Table 1 and results in Table 4). To identify the reasons of discrepancies between the registry and the PMSI database: an exploratory analysis of FN and FP was done. For this purpose, we conducted a multivariate regression logistic to determine characteristics of incident lymphomas in the registry associated with the probability of not being identified in the PMSI database (FN) (Table 5). The FN status was used as the explanatory variable (FN = 1 for FN and FN = 0 for TP). Lymphomas characteristics included in the model were the following: age as a continuous variable, sex, type of lymphoma, and stage according to the Binet staging system or the Ann Arbor staging system. Data analyses were carried out using SAS 9.4 software (SAS Inst., Cary, NC).
Table 2

Characteristics of lymphomas in the tarn cancer registry between 2010 and 2013, n = 476.

Figure 1

Estimation of algorithms performance's parameters. FN = false negatives, FP = false positives, LTD = long-term chronic diseases, NPV = negative predictive value, PMSI = Programme de Médicalisation des Systèmes d’information, PPV = predictive positive value, Se = sensitivity, Spe = specificity, TN = true negatives, TP = true positives.

Table 3

Se and PPV for both algorithms and selection period (all lymphomas).

Table 4

Se and PPV for both algorithms by subtype of lymphomas.

Table 5

Characteristics of incident lymphomas in the registry not identified through the PMSIa, n = 476.

Characteristics of lymphomas in the tarn cancer registry between 2010 and 2013, n = 476. Estimation of algorithms performance's parameters. FN = false negatives, FP = false positives, LTD = long-term chronic diseases, NPV = negative predictive value, PMSI = Programme de Médicalisation des Systèmes d’information, PPV = predictive positive value, Se = sensitivity, Spe = specificity, TN = true negatives, TP = true positives. Se and PPV for both algorithms and selection period (all lymphomas). Se and PPV for both algorithms by subtype of lymphomas. Characteristics of incident lymphomas in the registry not identified through the PMSIa, n = 476.

Confidentiality

All data were treated confidentially and were only those already extracted for internal use of the Tarn Cancer Registry. Ethical approval has been given by the French ethical committee and Data Protection Supervisory Authority: ([Commission Nationale de l’Informatique et des Libertés (reference number: 99 80 15 (12/1998), 99 80 15 version 2 (10/2003)]).

Results

Population and algorithms performances for all lymphomas

Between 2010 and 2013, among the 384,474 inhabitants of the Tarn department, the registry identified 476 validated incident cases of lymphomas, of which 52 HL cases and 424 NHL cases. Among the 424 NHL patients, diffuse large B cell lymphomas (DLBCL) was the most common subtype accounting for 32.1% (n = 136) of patients, followed by chronic lymphocytic leukemia (CLL)/small lymphocytic lymphoma (SLL) (N = 99; 23.3%), other mature B-cell NHL (n = 88; 20.7%), follicular lymphoma (n = 60; 14.2%), and mature T-cell NHL (n = 41; 9.7%). The median age was 69 (58–81) years old with a majority of men (n = 253; 53.2%). Characteristics of patients are presented in Table 2. For corresponding area and period, PMSI data were available for 15,522 patients and LTD data for 7885 patients. Among the 476 lymphomas patients, 203 (42.6%) patients were matched with LTD data and 377 (79.2%) were matched with PMSI data. When using PMSI data only, algorithm 1 provides a number of incident cases close to the Registry (475 vs. 476), whereas algorithm 2 overestimated the number of incident cases by approximately 30%. For algorithm 1, SE and PPV were closed, respectively 66.8% (95% confidence interval (CI) [62.5–70.9]) and 67.0% (95% CI [62.6–71]). For algorithm 2, SE was increased by up to 10%, whereas PPV was decreased by up to 9% because of a decrease number of FNs counterbalanced by an increase number of FPs. Both algorithms presented high SPE and NPV (99.9%). The results of SE and PPV calculation for all lymphomas are presented in Table 2. For each algorithm, there was no impact of length of observation in algorithm performance. The use of LTD data alone for identifying lymphomas in claims database resulted in poor performances with SE around 33% and PPV around 70%. The use of LTD data in combination with PMSI data had no impact in algorithm performance to detect incident cases of lymphomas. Characteristics of lymphomas were similar when using the 3 sources. The diagnosis date in the Registry was closed to the 1st hospitalization date identified with a median delay of 0[−1; 21] days.

Algorithms performances by subtypes of lymphomas

Performances of detection of incident cases by both algorithms differ according to lymphomas subtypes for SE and PPV. However, values of SPE and NPV remains maximal (99.9%) for each lymphoma subtype. The results of SE and PPV calculation by lymphomas subtypes are presented in Table 3.

HL

Among the 52 HL identified in the registry between 2010 and 2013, 49 were selected by the 2 algorithms leading to very high SE of 94.2% (95% CI [84.4–98.0]). However, PPV was 10% higher for algorithm 1 than for algorithm 2.

B-NHL

Among the 296 B cell non-Hodgkin's lymphoma (B-NHL) identified in the registry between 2010 and 2013, 221 were selected by algorithm 1 leading to a SE of 74.6% (95% CI [69.4–79.3]) and a PPV of 64.6% (95% CI [59.4–69.5]). For corresponding period, 240 B-NHL were selected by algorithm 2 leading to higher SE (81.1% (95% CI [76.2–85.1])) and a slight decrease in PPV around 5%.

T-NHL

For T cell non-Hodgkin's lymphoma (T-NHL) patients, SE dropped to low values of 48.6% (95% CI [33.4–64.1]) for algorithm 1 and 59.4% (95% CI [43.5–73.6]) for algorithm 2. PPV values were similar (78.3% vs. 64.7%) for each algorithm when considering the width of the CIs.

CLL/SLL

The use of algorithm 2 to identify new CLL patients resulted in better performances with a SE of 47% (95% CI [37.5–56.7]) against a SE of 19.0% (95% CI [12.5–27.8]) for algorithm 1. PPV values were similar for each algorithm.

Exploratory analysis of FN

Among the 158 FN, 59 were found in the PMSI database, whereas 99 were not found. For matched FN, reasons of misclassification were: Exclusion of patients by algorithm 1: considered as prevalent (n = 2), patients with only an AD or RD of lymphoma (n = 31), or missing value for type of diagnosis (n = 12). Coding error (n = 7): lymphomas were coded as other hematologic malignancies (n = 5) such as Waldenström macroglobulinamia, other malignant immunoproliferative diseases, and leukemia or only lymphoma's localization or procedures was coded (n = 2). No corresponding data in the PMSI database for corresponding period for patients with cutaneous lymphoma, low-grade follicular lymphoma, or CLL Binet stage A (n = 7). The results of the univariate and multivariate logistic regression are given in Table 5. After adjustment, characteristics of incident lymphomas associated with an increased probability of being a FN were: older age, type of lymphoma (NHL patients), and localized stage of lymphoma.

Exploratory analysis of FP

Among the 157 FP, only 10 patients were matched with the registry. These patients were identified in the registry with other hematologic malignancies as follows: chronic myeloid leukemia, lymphoproliferative disorder, refractory anemia with excess blasts, and interdigitating dendritic cell sarcoma. Among the FP with no record in the registry, we identified in PMSI data 45 (30.6%) CLL, 28 (19.0%) DLBCL, 10 (6.8%) HL, 15 (10.2%) follicular lymphoma, 43 (29.2%) other mature B-cell NHL, and 6 (4.1%) mature T-cell NHL.

Discussion

Main findings

The proposed algorithms are extremely specific and consequently diagnosis codes in the PMSI database allow an accurate identification of new lymphomas cases. By contrast, these algorithms are moderately sensitive. Algorithm 1 based on diagnosis and procedure codes seem to be more accurate with optimal performance parameters and incidence close to the registry. The length of the observation period and the combination of LTD with PMSI data do not improve performances. Algorithms exhibited very different performances according to lymphomas subtype, ranging to very poor performance for CLL to very acceptable parameters for HL. The implications of these findings suggest that the use of the PMSI database alone is not enough sensitive to conduct epidemiological studies. Indeed, the incidence provided by PMSI data is close to the registry because FN and FP have similar frequencies and counterbalanced each other.

Strengths and limitations

Our study presents some limitations. First, this study was conducted in a specific geographic area. Hence, we cannot exclude a lack of representativeness of the algorithms’ performance at the national level to detect incident lymphomas cases. Even if coding practice are standardized at the national level and are improving over time, we cannot exclude some discrepancies between hospitals, according to their interpretation of national coding rules. Finally, the performance of algorithm may be underestimated because of a potential failure of linkage between the registry and the PMSI database leading to an increased number of FNs and FPs. Our study presents several strengths. First, our study provides for the 1st time a validated algorithm to detect incident lymphoma in the French SNIIRAM, but also suitable for other healthcare database using ICD-10th classification medico-administrative database. Some selection algorithms have been validated in cancer but the literature related to hematological diseases is very poor with only 1 systematic review of validated method to identify lymphoma in administrative data. This review identified only 1 publication with a validated algorithm defined with ICD-9 code. The results of this validation study were concordant with our results.[ Moreover, validation study using ICD-10 are lacking for European and Nordic database, in which ICD-10 is more frequent. Then, our results demonstrate that this approach is of great interest to conduct pharmacoepidemiological or medico-economic studies in lymphomas because of several strengths. First, SPE of each algorithm is maximal allowing an accurate identification of cases. Then, the French health insurance database provides the exhaustiveness of healthcare consumption data at the national level. Finally, our analysis revealed that incident lymphomas not detected as incident or identified in the PMSI database are more likely to be old, with localized stage of lymphoma and concern more NHL patients. According to these findings, FN may concern patients never hospitalized for their lymphoma because of different disease management and/or a gap between diagnosis and treatment. These results suggest that it would have been of interest to conduct analyses of SE including only treated lymphoma patients but this information was lacking in the registry database. However, when regarding algorithm performances by lymphomas subtype, the results directly reflects the heterogeneity of lymphoma care pathway and questioned on the relevance of the use of PMSI data to select new cases in certain lymphomas subtypes. In fact, the very low SE for CLL identification can be explained because a majority of CLL is nonprogressive at diagnosis and does not require active treatment.[ As a corollary, algorithm 2 results in better performances in CLL because CLL or chemotherapy for CLL is not necessarily the leading cause of hospitalization for these patients. The same reason can be cited for T-NHL. Apart from the majority of FNs corresponded to cutaneous lymphomas which do not require hospitalization and are nondetectable by PMSI data.[ By contrast, algorithms revealed very high SPE and SE to detect HL patients. These results can be explained because HL always requires inpatient treatment and variability in ICD-10 code is minor.[ Given the low incidence of this disease and the completeness of SNIIRAM data at the national level, the SNIIRAM database could be used as a relevant and powerful tool to conduct pharmacoepidemiological studies with exhaustive real-life data in HL. Finally, our results illustrate that PMSI data can be used to describe with accuracy lymphomas and that the date of diagnosis can be estimated by the 1st hospitalization for lymphoma found in the dataset. However, the use of ICD-10 to classify NHL by subtypes lacks precision because of the multiplicity of code to register 1 subtype of lymphomas. For that matter, the classification system used impact directly data produced on lymphomas. As depicted by Adzersen et al,[ the choice of the classification system leads to differences on incidence rate estimates from data coming from a same registry dataset. Differences were stronger for B-NHL. In our study, differences between registry and hospital data may directly result from these discrepancies between ICD-O-3 and ICD-10.

International initiatives

These considerations and examples highlight that the relevance of the use of claims database for research purpose must be based on a case by case reflection process. In this way, several aspect must be consider to improve validity of the results of future studies conducted on these databases like intrinsic features of diseases and management, type, design, and aims of study conducted. The development of validated tool and the use of standardized method are crucial for the validity of future active surveillance study in lymphomas. In this way, several initiative and project are conducted with the aims to harmonize detection of medical event in claims database in the United States and in Europe (Mini Sentinel program, Observational Medical Outcomes Partnership, Pharmacoepidemiological Research on Outcomes of Therapeutics by a European Consortium).[ This validation study follows this quality approach and demonstrates that claims database, and the French SNIIRAM specifically can be a useful and powerful tool for postmarketing studies or medico-economic context for proper research purpose.
  1 in total

1.  Validity of claims-based algorithms for selected cancers in Japan: Results from the VALIDATE-J study.

Authors:  Cynthia de Luise; Naonobu Sugiyama; Toshitaka Morishima; Takakazu Higuchi; Kayoko Katayama; Sho Nakamura; Haoqian Chen; Edward Nonnenmacher; Ryota Hase; Sadao Jinno; Mitsuyo Kinjo; Daisuke Suzuki; Yoshiya Tanaka; Soko Setoguchi
Journal:  Pharmacoepidemiol Drug Saf       Date:  2021-06-01       Impact factor: 2.890

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.