
Data Reliability and Coding Completeness of Cancer Registry Information Using Reabstracting Method in the National Cancer Institute: Thailand, 2012 to 2014.

Anupong Sirirungreung¹, Rangsiya Buasom¹, Chuleeporn Jiraphongsa¹, Suleeporn Sangrajrang¹.

Abstract

PURPOSE: Data quality is a core value of cancer registries, which bring about greater understanding of cancer distribution and determinants. Thailand established its cancer registry in 1986; however, studies focusing on data reliability have been limited. This study aimed to assess the coding completeness and reliability of the National Cancer Institute (NCI) hospital-based cancer registry, Thailand.
METHODS: This study was conducted using the reabstracting method. We focused on seven cancer sites (colon, rectum, liver, lung, breast, cervix, and prostate) registered between 2012 and 2014 in the NCI hospital-based cancer registry. Missing data were identified among important variables to calculate coding completeness. The agreement rate and κ coefficient were computed to represent data reliability.
RESULTS: For reabstracting, we retrieved 957 medical records from a total of 5,462. These were selected using the probability proportional to size method, stratified by topography, sex, and registered year. The overall coding completeness of the registered and reabstracted data was 89.9% and 93.6%, respectively. In addition, the overall agreement rate among variables ranged from 84.7% to 99.6%, and the κ coefficient ranged from 0.619 to 0.995. The misclassification among unilateral organs caused lower coding completeness and agreement rate of laterality coding. The completeness of current residency could be improved using the reabstracting method. The lowest agreement rate was found among various categories of diagnosis basis. Sex misclassification for male breast cancer was identified.
CONCLUSION: The coding completeness and data reliability of the NCI hospital-based cancer registry met the standard in most critical variables. However, some challenges remain to improve the data quality. The reabstracting method could identify the critical points affecting the quality of cancer registry data.

Year: 2018 PMID: 30241269 PMCID: PMC6223438 DOI: 10.1200/JGO.17.00147

Source DB: PubMed Journal: J Glob Oncol ISSN: 2378-9506

INTRODUCTION

According to the world cancer statistics report 2013 by the International Agency for Research on Cancer, the global cancer burden increased between 2008 (12.7 million new cases and 7.6 million deaths) and 2012 (14.1 million new cases and 8.2 million cancer deaths).[1] In Thailand, cancer was the leading cause of death over the last decade. In 2014, 70,075 Thais died as a result of cancer.[2] The cancer registry is an essential part of cancer prevention and control. It is an organization for the systematic collection, storage, analysis, interpretation, and reporting of data on patients with cancer.[3] The data quality of the cancer registry is important for describing the extent of the cancer burden, providing source material for etiology studies, and monitoring and assessing cancer prevention and control activities. Comparability, completeness, validity or accuracy, and timeliness are the four dimensions used to evaluate the data quality of a cancer registry.[4,5] The two types of cancer registries defined by population are hospital-based and population-based. The hospital-based cancer registry gathers information on patients with cancer who visit a particular hospital, whereas the population-based cancer registry attempts to collect data on patients with newly diagnosed cancer in a well-defined population.[3] Established in 1996, the Thailand cancer registry has to date covered 389 hospitals from all regions and has become a population-based cancer registry in some geographic areas.[6] Few studies have been conducted regarding data quality of the cancer registry in Thailand. Between 1998 and 2000, Sriplung[7] found that only three of nine cancer registry centers met the standard for Cancer Incidence in Five Continents according to two indices: the percent of death certificates only and the percent of morphologically verified cases. 
In a capture-recapture study by Suwanrungruang et al,[8] the completeness of data across nine Thai cancer registries from 2003 to 2007 varied from 70% to 99.7%. In addition, Suwanrungruang et al[8] suggested that the mortality-to-incidence ratio should not be used to evaluate the completeness of case ascertainment, because it does not reflect the quality of cancer registry procedures.[9] To assess the cancer registry process in terms of data agreement with the source medical records, the reabstracting method could be deployed.[4] The National Cancer Institute (NCI) in Bangkok, Thailand, functions as both a hospital-based and a population-based cancer registry. The hospital-based cancer registry of NCI registers new cases of patients with cancer who visit the NCI. The patients of NCI come from many parts of Thailand. In contrast, the population-based function of NCI’s cancer registry collects data from other hospitals within the Bangkok catchment area. In total, 2,916 to 3,917 new cancer cases were entered annually in the NCI hospital-based cancer registry.[10] Data entry is performed by well-trained cancer registry officers. A double-entry process is not used for routine data entry. However, a logical check using International Agency for Research on Cancer CanReg program coding[11] is manually run by programmers when combining data for annual reports. Moreover, a web application program that performs automatic checks was widely implemented after 2014. Thus, we conducted this cancer data quality study using the reabstracting method to evaluate coding completeness and data reliability of the NCI hospital-based cancer registry.

METHODS

We retrieved the registered data and medical records concerning patients with colon, rectum, liver, lung, breast, cervix, and prostate cancer registered by the NCI hospital-based cancer registry between 2012 and 2014. Those topographies constituted the most common cancers among males and females in Thailand. The sample size was calculated based on a sample size calculation for a proportion[12] using the proportion of non-Bangkok residents (P = 0.231),[10] which yielded the largest required sample size among the study objectives. We added 10% to the sample size to compensate for any lost medical records, so the size totaled 971. Samples of records were then selected using the probability proportional to size method stratified by topography, sex, and registered year. We audited all cases of male breast cancer, because the number was small (n = 8). Registered and reabstracted data were compared. The registered data comprised a set of patient data retrieved from the NCI hospital-based cancer registry database, which assisted the new linkage of identification numbers and marked patient identity variables. The reabstracted data comprised a set of data collected from the medical records by a new team of trained NCI audit staff. Three trained audit staff members from the national data audit units of NCI were deployed to reabstract the data according to national guidelines for cancer registry records. Reviewed and paper-recorded data were entered using Centers for Disease Control and Prevention Epi-Info, version 3.5.4. Double data entry was used to recheck any data entry error. The 14 variables retrieved from registered data and reviewed were date of birth, sex, religion, citizenship registration province, current resident province, date of diagnosis, diagnosis basis, two-digit topography code, malignant behavior, laterality, grade, five-digit morphology code, life status, and last contact date. 
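
As an illustration of how a fixed sample could be spread across strata in proportion to their size, the following Python sketch allocates a total sample with largest-remainder rounding. It is only one plausible reading of the stratified selection described above, not the procedure confirmed by the study; the function name `proportional_allocation` and the stratum counts in the usage example are hypothetical and are not registry data.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size.

    stratum_sizes: dict mapping a stratum label to its record count.
    Fractional quotas are rounded with the largest-remainder rule so
    that the allocations sum exactly to total_sample.
    """
    population = sum(stratum_sizes.values())
    quotas = {k: total_sample * n / population for k, n in stratum_sizes.items()}
    allocation = {k: int(q) for k, q in quotas.items()}  # floor of each quota
    shortfall = total_sample - sum(allocation.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - allocation[k], reverse=True)
    for k in by_remainder[:shortfall]:
        allocation[k] += 1
    return allocation


# Hypothetical strata (site, sex, year) with made-up counts.
example = proportional_allocation(
    {("lung", "M", 2012): 500, ("lung", "F", 2012): 300, ("liver", "M", 2012): 200},
    total_sample=95,
)
```

Small strata receive very few records under proportional allocation, which is consistent with the decision above to audit every male breast cancer case rather than sample them.
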
Missing data were defined as blank, unknown, ruled out cancer (for the behavior variable), non-stage or not applicable (for the grade variable), unavailable code 80003 (for morphology), unavailable code 809 (for topography), or unidentified data in medical records. To check the missing data, coding completeness was calculated as follows: coding completeness (%) = 100 − (number of missing values/total number of data elements) × 100. Data reliability was presented using two indices: the data agreement rate and Cohen’s κ coefficient. The R program with package irr (Various Coefficients of Interrater Reliability and Agreement), version 0.84, was used to calculate the data agreement rate and unweighted κ coefficient of each variable except the date variables. The missing data of each record were excluded from the analysis. The discrepancies of date variables were categorized as > 30 days, > 180 days, and > 365 days. The percent of nondiscrepancies for each category was calculated as: nondiscrepancies (%) = (number of reabstracted records with no discrepancies on date variable/total number of reabstracted records) × 100. Because NCI serves as an excellence center for cancer care in Thailand, some patients were referred from other hospitals. The diagnosis date was defined as that recorded in the referral paper. When the date was not recorded, or when the diagnosis was unclear, the following dates were used: biopsy or pathologic confirmation at NCI, cytology or imaging report, and physical examination by doctors. The life status and last contact date were excluded from calculation because of the uncertain follow-up protocol. A stratified analysis of agreement rate and percent of nondiscrepancies was conducted to identify the variability across the year of diagnosis and cancer site. The data retrieval and reabstraction processes were conducted between August and September 2015. This study received approval from the Ethics Committee of NCI, Thailand (protocol No. 102_2015RB_IN432).
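
The coding completeness, agreement rate, and unweighted Cohen's κ described in this section can be illustrated with a short Python sketch. This is not the authors' R code (the study used the irr package); the function names, the set of missing-value markers, and the toy values in the usage lines are assumptions made for illustration only.

```python
from collections import Counter

# Assumed markers for a missing data element (illustrative, not the registry's codes).
MISSING = {None, "", "unknown"}


def coding_completeness(values):
    """Percent of data elements that are not missing."""
    missing = sum(1 for v in values if v in MISSING)
    return 100.0 - 100.0 * missing / len(values)


def agreement_and_kappa(registered, reabstracted):
    """Return (agreement rate in %, unweighted Cohen's kappa) for paired codes.

    Pairs with a missing value on either side are excluded, mirroring the
    exclusion of missing data described in the text.
    """
    pairs = [(a, b) for a, b in zip(registered, reabstracted)
             if a not in MISSING and b not in MISSING]
    n = len(pairs)
    if n == 0:
        raise ValueError("no complete pairs to compare")
    observed = sum(a == b for a, b in pairs) / n
    reg_counts = Counter(a for a, _ in pairs)
    rea_counts = Counter(b for _, b in pairs)
    # Chance agreement: product of the two marginal proportions, summed over codes.
    expected = sum(reg_counts[c] * rea_counts[c] for c in reg_counts) / (n * n)
    if expected == 1.0:
        return 100.0 * observed, float("nan")  # kappa is undefined in this case
    return 100.0 * observed, (observed - expected) / (1.0 - expected)


# Toy sex codes for six records; the last registered value is missing.
registered = ["M", "F", "F", "M", "F", "unknown"]
reabstracted = ["M", "F", "M", "M", "F", "F"]
rate, kappa = agreement_and_kappa(registered, reabstracted)
```

In the toy data, four of the five complete pairs agree, so the agreement rate is 80%, while κ is lower because part of that agreement is expected by chance.
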

RESULTS

From 2012 to 2014, 5,462 records of colon, rectum, liver, lung, breast, cervix, and prostate cancer were added in the NCI hospital-based cancer registry. A total of 957 records were reabstracted and compared with the registered records. Of 979 retrieved records, 22 were missing during the reabstracted process. The sampling distribution of NCI hospital-based cancer registry records, reabstracted records by year of diagnosis, cancer site, and sex are presented in Table 1.

Table 1

Sampling Distribution of Reabstracted Data by Year of Diagnosis, Cancer Site, and Sex: The National Cancer Institute Hospital-Based Cancer Registry, Thailand, 2012 to 2014

In total, 13,706 data elements (979 records × 14 data elements) from registered data and 13,398 data elements (957 records × 14 data elements) from reabstracted data were assessed for missing data. The overall coding completeness rates were 89.9% and 93.6%, respectively. A lower level of coding completeness was observed for grade, laterality, and morphology variables. The reabstracting process improved the completeness of the current resident province variable (Table 2). When we assumed that the missing current resident province was the same as the citizenship registration province in the registered data, the reabstracting process could improve 17 of 504 data elements (3.4%) of missing registered data. In all, 179 of 263 (68.1%) laterality data elements among unilateral organs (ie, cervix, liver, rectum, colon, and prostate) were missing.
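
The counts reported in this paragraph can be rechecked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names and the helper `percent` are illustrative.

```python
# Figures taken from the paragraph above.
records_registered = 979
records_reabstracted = 957
variables_per_record = 14

elements_registered = records_registered * variables_per_record      # 13,706
elements_reabstracted = records_reabstracted * variables_per_record  # 13,398


def percent(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


residence_improved = percent(17, 504)   # current resident province recovered
laterality_missing = percent(179, 263)  # missing laterality, unilateral organs
```
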

Table 2

Coding Completeness of Registered and Reabstracted Data: The National Cancer Institute Hospital-Based Cancer Registry, Thailand, 2012 to 2014

The overall agreement rate among variables ranged from 84.7% to 99.6%, and the κ coefficient ranged from 0.619 to 0.995 (Table 3). The lowest overall agreement rate was observed for diagnosis basis (84.7%), followed by morphology code (88.4%), grade (89.3%), and laterality (89.8%).

Table 3

Data Agreement Rate and κ Coefficient Between Registered and Reabstracted Data: The National Cancer Institute Hospital-Based Cancer Registry, Thailand, 2012 to 2014

The percent of nondiscrepancies regarding the diagnosis date following the > 30-day criterion was 73.6%, which increased markedly to 92.0% using the > 180-day criterion. By comparison, the percent of nondiscrepancies concerning date of birth was 86.0% and 87.7%, respectively (Table 4). Using the > 30-day criterion, we found that the lowest percent of nondiscrepancies in regard to date of diagnosis was in prostate (40.9%), breast (67.8%), colon (71.6%), and rectal (74.2%) cancers.
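
A minimal Python sketch of the nondiscrepancy calculation follows. It assumes that a record counts as nondiscrepant under a given criterion when the absolute difference between the registered and reabstracted dates does not exceed the threshold; the function name and the example dates are hypothetical.

```python
from datetime import date


def percent_nondiscrepant(registered, reabstracted, threshold_days):
    """Percent of paired dates that differ by at most threshold_days."""
    pairs = list(zip(registered, reabstracted))
    within = sum(abs((a - b).days) <= threshold_days for a, b in pairs)
    return 100.0 * within / len(pairs)


# Four hypothetical records with differences of 0, 10, 45, and 400 days.
registered_dates = [date(2013, 1, 1)] * 4
reabstracted_dates = [date(2013, 1, 1), date(2013, 1, 11),
                      date(2013, 2, 15), date(2014, 2, 5)]

by_criterion = {days: percent_nondiscrepant(registered_dates, reabstracted_dates, days)
                for days in (30, 180, 365)}
```

Because each wider criterion tolerates every difference the narrower one does, the percent of nondiscrepancies can only stay the same or rise from the 30-day to the 365-day threshold.
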

Table 4

Percent of Nondiscrepancies of Date Variables: The National Cancer Institute Hospital-Based Cancer Registry, Thailand, 2012 to 2014

The agreement rate and percent of nondiscrepancies by year of diagnosis (2012 to 2014) across key study variables were similar (Fig 1). However, after stratifying by cancer site (Fig 2), we found that the lowest agreement rates for diagnosis basis were observed for lung (59.3%), liver (70.2%), and prostate (77.2%) cancers. For the morphology code, the lowest agreement rates were for cervix cancer (71.4%), followed by liver (78.6%) and colon (87.9%) cancers. For grade, the lowest agreement rates were for rectum cancer (83.0%), followed by breast (88.4%) and colon (89.2%) cancers. For laterality, the lowest agreement rates were for prostate cancer (42.9%), followed by liver (60.3%) and colon (79.2%) cancers.

Fig 1

Data agreement rate between registered and reabstracted data by year of diagnosis from the National Cancer Institute hospital-based cancer registry, Thailand, 2012 to 2014.

Fig 2

Data agreement rate between registered and reabstracted data by cancer site from the National Cancer Institute hospital-based cancer registry, Thailand, 2012 to 2014.

Misclassification of sex was identified among cases registered as male breast cancer. The reabstracting results showed that five of the eight were female. A double check against the medical records gave the same result.

DISCUSSION

The quality of cancer registry data could affect the interpretation of the cancer situation and risk factor identification.[13-16] The reabstracting method was used to evaluate data agreement with the source medical records.[4] However, approaches to assessing data agreement vary according to the data source used for comparison. One group of studies used an existing database, such as a cohort study, mortality data, or a population census.[13,15,17,18] Another used medical records as the data source and deployed well-trained auditors to reabstract the data.[19-21] In this study, we compared registry data with reabstracted data from medical records in a hospital. Our study found that the overall coding completeness rate was good for most variables; however, some need to be improved. A coding completeness rate of 90% was recommended for survival analysis in one US study.[13] Age (date of birth) and sex variables of the NCI registered data met the US Cancer Statistics publication standard criteria.[22] However, four of 14 variables (laterality, current resident province, grade, and morphology code) displayed < 90% coding completeness. Some challenges of coding completeness need to be addressed. First, the laterality misclassification mostly resulted from unilateral organs coded as unknown. It could be improved using logical checking and recoding. Second, the current resident province could be improved by medical record review only. However, further exploration of the similarity between citizenship registration province and current resident province is required. The coding completeness rates for some variables of reabstracted data were lower than those of the registered data. This could be attributed to difficulty interpreting handwriting and fading of print over time. The high discrepancies of diagnosis date (> 30 days) might have occurred especially for the slowly progressing cancers and those requiring more diagnostic intervention. 
This was due to time lags between the diagnosis date by clinical criteria at the first visit and confirmation by pathology results at the second visit. However, this may also have resulted from not strictly adhering to the diagnosis-date criteria, because the staff may not have waited for the confirmed results. Waiting for complete data on diagnosis basis and diagnosis date might improve the quality of data. Furthermore, limitations of data retrieval may have occurred. Some medical records, especially hard copies of referral papers, may have been lost when returned to the local hospital. Similar to other studies, in this study overall data reliability was good (ie, data agreement rate > 80% and κ coefficient > 0.6).[20,21] However, the data agreement rate in some countries has been reported to range from 76.2% to 100%.[19-21,23,24] Opportunities to further improve data accuracy were identified. First, the NCI cancer registry record had various categories of diagnosis basis (ie, history, imaging, biochemistry test, cytology test, metastasis biopsy, and primary biopsy), which confuse coders. A clearer definition and related evidence of each diagnosis basis category need to be considered. Some complex histologic tumors might be difficult to identify, such as those associated with ovarian cancer.[17] However, a subcategorized diagnosis basis may cause variation and discrepancies across tumor types. Resolving discrepancies between clinical diagnosis and morphology, or between morphology and cytogenetics, requires decisions by specialists (eg, a pathologist or oncologist), which lie beyond the capability of our cancer registry officers. In addition, some discrepancies were found between malignancy and primary tumor, which causes challenges in cancer diagnosis over time.[25] Second, miscoding of laterality in unilateral organs may have resulted from a variety of diagnosis records, such as the right lobe of the liver, right side of the colon, or left side of the prostate. 
However, the coders need to follow the standard of laterality identification provided by the SEER program,[26] which objectively describes the paired organs. Breast cancer in males is rare, and therefore misclassification by sex may distort the interpretation of the incidence rate. A process of sex verification should be implemented when a male patient with breast cancer is identified. Our study was subject to limitations. First, the internal validity across reabstracting team members was not assessed; however, we tried to limit this variation by training and providing a coding manual. Second, we found 22 medical records that could not be retrieved from the NCI Medical Record Department. Medical records of deceased patients might have been pulled out from the documentation system and then brought together, making them difficult to sort and follow. In addition, some were lost in the referral process. However, these constituted a small number and did not disturb the sampling distribution and result interpretation. Finally, other cancer sites excluded from this study may have more variability in agreement rate, such as nonsolid tumors, which use more complex morphology coding. Thus, the findings of this study should be generalized cautiously to the overall data reliability and coding completeness of the NCI hospital-based cancer registry. The agreement rate and κ coefficient should be interpreted with caution for variables that have large amounts of missing data. Because missing data cannot directly be identified as discordant between original and reabstracted data, Krippendorff’s α might be recommended for this issue.[27-30] However, agreement rate and κ coefficient are mostly presented in the literature regarding cancer registry data quality.[19-21,23,24] The process of data quality evaluation using various methods and indicators could highlight opportunities for improvement of the cancer registry. 
Furthermore, training registrars[31] and use of computer algorithm technology[32,33] are proven processes of data quality improvement. Therefore, the NCI has developed the Thai Cancer-Based Online web application to improve data quality and information sharing between cancer registry centers across the nation.[34] In conclusion, between 2012 and 2014, the overall data quality of the Thai NCI in terms of coding completeness and reliability was good compared with international standards. The reabstracting method provided more insights on data recoding protocol and adherence among coders. Additional improvement is recommended regarding the completeness of current resident status, defining of diagnosis date, diagnosis basis categorization, laterality coding of unilateral organs, and verification of rare male breast cancer. A double-check approach for data accuracy should be considered to update more available data. An algorithm in computer programming technology needs to be deployed to pose reminders of rare cancers among a particular sex or age group.

23 in total

1. Quality of case ascertainment in cancer registries: a proposal for a virtual three-source capture-recapture technique.

Authors: Krittika Suwanrungruang; Hutcha Sriplung; Pattarawin Attasara; Somnuk Temiyasathit; Rangsiya Buasom; Narate Waisri; Karnchana Daoprasert; Supot Kamsa-Ard; Cheamchit Tasanapitak
Journal: Asian Pac J Cancer Prev Date: 2011

2. Coding completeness and quality of relative survival-related variables in the National Program of Cancer Registries Cancer Surveillance System, 1995-2008.

Authors: Reda J Wilson; M E O'Neil; E Ntekop; Kevin Zhang; Y Ren
Journal: J Registry Manag Date: 2014

3. Comparability, diagnostic validity and completeness of Nigerian cancer registries.

Authors: B J S al-Haddad; Elima Jedy-Agba; Emmanuel Oga; E R Ezeome; Christopher C Obiorah; Michael Okobia; J Olufemi Ogunbiyi; Cornelius Ozobia Ukah; Abidemi Omonisi; A M E Nwofor; Festus Igbinoba; Clement Adebamowo
Journal: Cancer Epidemiol Date: 2015-04-08 Impact factor: 2.984

4. Accuracy and completeness of the New Zealand Cancer Registry for staging of invasive breast cancer.

Authors: Sanjeewa Seneviratne; Ian Campbell; Nina Scott; Rachel Shirley; Tamati Peni; Ross Lawrenson
Journal: Cancer Epidemiol Date: 2014-07-16 Impact factor: 2.984

Review 5. Evaluation of data quality in the cancer registry: principles and methods Part II. Completeness.

Authors: D Max Parkin; Freddie Bray
Journal: Eur J Cancer Date: 2009-01-06 Impact factor: 9.162

6. Case completeness and data accuracy in the Centers for Disease Control and Prevention's National Program of Cancer Registries.

Authors: Kathleen K Thoburn; Robert R German; Mary Lewis; Phyllis Janie Nichols; Faruque Ahmed; Jeannette Jackson-Thompson
Journal: Cancer Date: 2007-04-15 Impact factor: 6.860

7. Completeness of case ascertainment and survival time error in English cancer registries: impact on 1-year survival estimates.

Authors: H Møller; S Richards; N Hanchett; S P Riaz; M Lüchtenborg; L Holmberg; D Robinson
Journal: Br J Cancer Date: 2011-05-10 Impact factor: 7.640

8. Measuring inter-rater reliability for nominal data - which coefficients and confidence intervals are appropriate?

Authors: Antonia Zapf; Stefanie Castell; Lars Morawietz; André Karch
Journal: BMC Med Res Methodol Date: 2016-08-05 Impact factor: 4.615

9. Quality and completeness improvement of the Population-based Cancer Registry of São Paulo: linkage technique use.

Authors: Stela Verzinhasse Peres; Maria do Rosário Dias de Oliveira Latorre; Luana Fiengo Tanaka; Fernanda Alessandra Silva Michels; Monica La Porte Teixeira; Claudia Medina Coeli; Márcia Furquim de Almeida
Journal: Rev Bras Epidemiol Date: 2016 Oct-Dec

10. The reliability of Cancer Registry records.

Authors: M C Gulliford; J Bell; H M Bourne; A Petruckevitch
Journal: Br J Cancer Date: 1993-04 Impact factor: 7.640