
Decision tree of occupational lung cancer using classification and regression analysis.

Tae-Woo Kim, Dong-Hee Koh, Chung-Yill Park.

Abstract

OBJECTIVES: Determining the work-relatedness of lung cancer developed through occupational exposures is very difficult. The aim of the present study is to develop a decision tree for occupational lung cancer.
METHODS: A total of 153 cases of lung cancer surveyed by the Occupational Safety and Health Research Institute (OSHRI) from 1992 to 2007 were included. The target variable was whether the case was approved as work-related lung cancer, and the independent variables were age, sex, pack-years of smoking, histological type, type of industry, latency, working period and exposure material in the workplace. The Classification and Regression Tree (CART) model was used to search for predictors of occupational lung cancer.
RESULTS: In the CART model, the best predictor was exposure to known lung carcinogens. The second best predictor was 8.6 years or higher latency and the third best predictor was smoking history of less than 11.25 pack-years. The CART model must be used sparingly in deciding the work-relatedness of lung cancer because it is not absolute.
CONCLUSION: We found that exposure to lung carcinogens, latency and smoking history were predictive factors of approval for occupational lung cancer. Further studies for work-relatedness of occupational disease are needed.

Keywords:  CART; Decision tree; Latency; Occupational lung cancer; Smoking

Year:  2010        PMID: 22953174      PMCID: PMC3430888          DOI: 10.5491/SHAW.2010.1.2.140

Source DB:  PubMed          Journal:  Saf Health Work        ISSN: 2093-7911


Introduction

Worldwide, lung cancer is the leading cause of cancer death [1,2] and the most common occupation-related cancer [3-5]. Approximately 90% of men and 60% of women who develop lung cancer are smokers [6]. According to many studies, work-related lung cancer accounts for 5-20% of all lung cancer cases, a wide range [7-9]. However, because work-related lung cancer is differentiated from other lung cancers only by occupational exposure, not by histological morphology, pathologic form, clinical manifestation or outcome, distinguishing between occupational and non-occupational lung cancer is very difficult [10]. Therefore, doctors specializing in occupational and environmental medicine are typically consulted when compensation for occupational lung cancer is decided [10]. Each nation has different approval rules for occupational diseases. In Korea, approval and compensation of occupational diseases are prescribed by Article 34 of the Industrial Accident Compensation Insurance Act. The work-relatedness of a disease is decided by medical experts considering exposure experience and duration. The decision standards of work-relatedness for lung cancer in this Article included: workers exposed to tar who worked for more than 10 years; workers exposed to chrome or chrome compounds who worked for more than 2 years; and workers exposed to asbestos. However, these standards were only applicable to industrial workers exposed to carcinogens such as crystalline silica [11], nickel [12], cadmium [13], arsenic [14], working in industries such as aluminum production [15], iron and steel founding [16], and rubber industry [17]. Therefore, because the determination of work-relatedness involves vague standards, many specialists' decisions may be subjective. At the Korea Workers' Compensation and Welfare Service (COMWEL), which decides the work-relatedness of diseases, clinical and occupational doctors formed the Occupational Disease Award Commission to determine work-relatedness. 
If a work-relatedness decision was very difficult, COMWEL sent the case to the Occupational Safety and Health Research Institute (OSHRI) for an epidemiologic investigation. The OSHRI survey team, composed of occupational and environmental medicine specialists and hygienists, surveyed the workplace environment for probable exposure materials via material safety data sheets or analysis of bulk samples, job history, general characteristics of the worker and family history. Where possible, this team measured exposure concentration, but in many cases measurement was not performed. They also extensively reviewed the literature related to each case. The Occupational Disease Specialist Commission, formed by university professors of occupational medicine and researchers of OSHRI, discussed the work-relatedness of each case. After this discussion, researchers sent brief survey reports, including the commission's recommendation on work-relatedness, to COMWEL. Finally, COMWEL decided the work-relatedness of each case, taking OSHRI's recommendation into consideration. In Korea, because only workers approved as having an occupational disease are compensated by COMWEL, the decision of work-relatedness is very important to workers and COMWEL. In spite of this importance, comprehensive studies on this decision-making process are very sparse. The aim of the present study is to find variables affecting the work-relatedness of lung cancer and to derive a decision tree of occupational lung cancer to help decide work-relatedness objectively.

Materials and Methods

Materials

From 1992 to 2007, 160 cases of lung cancer were referred to COMWEL for determination of work-relatedness. Among them, 6 cases in which the decision was postponed and 1 case without a brief report were excluded. Therefore, 153 cases were included in this study.

Methods

Each of OSHRI's survey reports for lung cancer was reviewed, and 15 variables that probably affected decisions on occupational lung cancer were found; however, only 8 of them were included in this study. These variables were work-relatedness of lung cancer (occupational vs non-occupational lung cancer), demographic characteristics (sex, age at diagnosis of lung cancer), health habit (smoking history), clinical factors (histological classification) and occupational factors (industry classification, exposed or probably exposed materials, latent period and working period). Occupational lung cancer was defined as primary lung cancer related to work. Industries were classified by the International Standard Industrial Classification (ISIC Rev. 4) and divided into manufacturing and non-manufacturing industries. One major potential human lung carcinogen among the several exposed or probably exposed materials was selected as the exposure material. Following the International Agency for Research on Cancer (IARC) classification, these exposure materials were classified as known (carcinogenic, IARC group 1) or unknown (probably, possibly and not carcinogenic, IARC group 2A, 2B and 4) carcinogens to the lung. When cases of exposure to diesel engine exhaust (DEE) were reviewed, it was found that polycyclic aromatic hydrocarbons (PAHs) included in DEE [18] had a greater effect on the decision of occupational lung cancer than DEE alone. Therefore, although DEE was IARC group 2A [19], DEE as PAHs was classified as IARC group 1 for this study. Latency was defined as the time from the date the worker was first exposed to the exposure materials to the diagnosis date. If the exposure period was not accurate, latency was defined as the period from the date first worked to the diagnosis date. Working period was the period of employment in the workplace in which workers were exposed to known or unknown lung carcinogens.
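The latency definition above, with its fallback to the first working date, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 365.25-day year and the example dates are hypothetical and do not come from the survey reports.

```python
from datetime import date
from typing import Optional

DAYS_PER_YEAR = 365.25  # assumption: the paper does not say how years were counted


def latency_years(diagnosis: date,
                  first_exposure: Optional[date],
                  first_worked: date) -> float:
    """Years from first exposure to diagnosis, falling back to the first working date."""
    # Use the first-exposure date when it is known, otherwise the date first worked.
    start = first_exposure if first_exposure is not None else first_worked
    if diagnosis < start:
        raise ValueError("diagnosis date precedes the start date")
    return (diagnosis - start).days / DAYS_PER_YEAR


# Hypothetical worker whose first-exposure date is known.
print(round(latency_years(date(2005, 6, 1), date(1995, 6, 1), date(1990, 1, 1)), 1))
# Hypothetical worker whose exposure date is unknown, so the first working date is used.
print(round(latency_years(date(2005, 6, 1), None, date(1990, 1, 1)), 1))
```

The fallback makes latency longer whenever exposure began after hiring, which is worth keeping in mind when comparing cases measured in the two ways.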

Statistical analysis

Age at diagnosis of lung cancer, smoking history, latency and working period were analyzed by independent t-test. Sex, histological classification and industry classification were analyzed by chi-square test. To derive the occupational lung cancer prediction tree, Classification and Regression Tree (CART) analysis was used. CART analysis is a tree-building technique suited to the generation of clinical decision rules [20,21]. It is a non-parametric technique that can handle highly skewed or multi-modal numerical data, as well as categorical predictors [22]. It can uncover complex interactions between the predictors that are most important in determining the outcome variable to be explained [23]. The CART method is a relatively automatic machine learning method that produces decision trees that are easy to interpret [20]. In the CART analysis, the decision of occupational lung cancer was selected as the target variable and 7 variables were included as independent (predictor) variables. For validation of the tree, cases were divided randomly into training data (122 cases, 80%) and validation data (31 cases, 20%) by data partition. The decision tree was derived from the training data and evaluated on the validation data. A multivariate logistic regression model was constructed from the training data and tested on the validation data. To compare the accuracy of CART and the logistic regression model, the areas under the receiver operating characteristic curves (AUCs) of the two models were compared in the training and validation data.
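To illustrate how a CART-style procedure might choose a cut point on an interval variable such as latency, the following minimal sketch searches for the threshold that minimizes the weighted Gini impurity of the two child nodes. The Gini criterion is an assumption (the paper does not state the splitting criterion or the software used), the function names are hypothetical, and the data are invented so that the best split happens to fall at 8.6 years.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary labels (1 = approved, 0 = not approved)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) of the best split 'value < threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), gini(labels))  # no split: impurity of the parent node
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot split between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((lo + hi) / 2.0, impurity)  # midpoint between neighbours
    return best


# Hypothetical latency values (years) and decisions, NOT data from the study.
latency = [3.0, 5.5, 7.0, 8.2, 9.0, 12.0, 15.0, 20.0]
approved = [0, 0, 0, 0, 1, 1, 0, 1]
threshold, impurity = best_threshold(latency, approved)
print(round(threshold, 2), round(impurity, 3))
```

A full CART implementation would repeat this search over every predictor at every node and then prune the tree. With few cases near a candidate cut point, moving a single case can shift the chosen threshold.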

Results

General characteristics and exposure materials

Of the 153 cases, 57 (37.3%) were decided as occupational lung cancer and 96 (62.7%) as non-occupational lung cancer. Mean age of the two groups was statistically similar (50.8 vs 50.3 years). Males comprised a greater proportion of occupational lung cancer than of non-occupational lung cancer, but the difference was not significant (p = 0.620). Smoking history in non-occupational lung cancer was approximately 5 pack-years greater than in occupational lung cancer, but the difference was not significant (p = 0.089). Latency in occupational lung cancer was approximately 7 years longer than in non-occupational lung cancer (p < 0.001), and the working period of occupational lung cancer was 6 years longer (p < 0.001). According to industrial classification, the proportion of occupational lung cancer in non-manufacturing industry was significantly higher than in manufacturing industry (53.1% vs 29.8%, p = 0.003). According to histological classification, adenocarcinoma was most common, followed by squamous cell carcinoma and small cell lung carcinoma (Table 1).
Table 1

Descriptive characteristics of the survey populations by occupational and non-occupational lung cancer

*SCC: squamous cell carcinoma.

†SCLC: small cell lung cancer.

‡Latent period: from the date that worker was first exposed by exposure materials to diagnosis date.

§Working period: the period employed in workplace that workers were exposed to known or unknown lung carcinogen.

In this study, there were 24 kinds of exposure material. Exposure to asbestos was most common (45 cases, 29.4%), followed by crystalline silica (21 cases, 13.7%) and dust (18 cases, 11.8%). Known lung carcinogens included asbestos, crystalline silica, hexavalent chromium (Cr(VI)), PAHs, radon and coke oven emission (COE). There were 92 cases of exposure to known carcinogens. While cases exposed to Cr(VI) had the highest proportion of occupational lung cancer (71.4%), workers exposed to PAHs had the lowest proportion (50%). There was only 1 case each of radon and COE exposure (Table 2).
Table 2

Exposed material during work in the workplace ( ) = %

*PAHs: polycyclic aromatic hydrocarbons.

†Cr(VI): hexavalent chromium.

‡COE: coke oven emission.

§VCM: vinyl chloride monomer.

∥MMVF: man-made vitreous fiber.

¶MWF: metal working fluid.

**Cr(III): trivalent chromium.

CART model

In the training model, the first predictor of a decision of occupational lung cancer was the lung carcinogenicity of the exposure material. Forty-seven (62.7%) of the 75 cases exposed to known lung carcinogens were determined to be occupational lung cancer. All cases exposed to unknown lung carcinogens were decided as lung cancer arising from non-occupational causes. Among cases exposed to known lung carcinogens, the second predictor was latency of 8.6 years or higher. Forty-seven (71.2%) of the 66 cases with a latency of 8.6 years or higher were decided as occupational lung cancer. All cases with a latency of less than 8.6 years were decided as non-occupational. In cases exposed to known lung carcinogens and having a latency of 8.6 years or higher, smoking less than 11.25 pack-years provided additional predictive value. Thirty-two (91.4%) of 35 such cases were decided as occupational lung cancer. Fifteen (48.4%) of the 31 cases with 11.25 pack-years or higher were decided as lung cancer developed by occupational exposure (Fig. 1).
Fig. 1

Predictors of occupational lung cancer in training data. Each node is based on available data for each predictive variable. Each approval rate for each predictor is marked in a box. Each predictor is written beside the line.
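The three splits reported for the training data can be written as a small rule function. The sketch below is only an illustration: the thresholds and leaf counts are taken or derived from the training results in the text, whereas the names `Leaf` and `tree_leaf` and the example case are hypothetical.

```python
from typing import NamedTuple


class Leaf(NamedTuple):
    approved: int   # training cases decided as occupational in this leaf
    total: int      # training cases falling into this leaf

    @property
    def rate(self) -> float:
        return self.approved / self.total


# Leaf counts taken or derived from the training results in the text (122 cases).
UNKNOWN_CARCINOGEN = Leaf(0, 122 - 75)   # derived: all decided as non-occupational
SHORT_LATENCY = Leaf(0, 75 - 66)         # derived: latency < 8.6 years
LOW_SMOKING = Leaf(32, 35)               # < 11.25 pack-years
HIGH_SMOKING = Leaf(15, 31)              # >= 11.25 pack-years


def tree_leaf(known_carcinogen: bool, latency_years: float, pack_years: float) -> Leaf:
    """Route one case through the three splits described for Fig. 1."""
    if not known_carcinogen:
        return UNKNOWN_CARCINOGEN
    if latency_years < 8.6:
        return SHORT_LATENCY
    if pack_years < 11.25:
        return LOW_SMOKING
    return HIGH_SMOKING


# Hypothetical case: known carcinogen, 15-year latency, 5 pack-years.
leaf = tree_leaf(True, 15.0, 5.0)
print(f"{leaf.approved}/{leaf.total} = {leaf.rate:.1%}")
```

The leaf rate is the observed approval proportion among training cases, not a probability validated for new cases.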

Applying the rule from the training data to the validation data (31 cases), 10 (58.8%) of the 17 cases exposed to known lung carcinogens were decided as occupational lung cancer. Ten (62.5%) of the 16 cases exposed to known lung carcinogens and having a latency of 8.6 years or higher were decided as occupational lung cancer. Of the 9 cases exposed to known lung carcinogens, having a latency of 8.6 years or higher and smoking less than 11.25 pack-years, 7 (77.8%) were decided as occupational lung cancer (Fig. 2). In logistic regression, lung carcinogenicity of the exposure material, smoking amount and working period were significant variables. Each additional pack-year multiplied the odds of a work-relatedness decision by 0.94, and each additional year of working period multiplied them by 1.15.
Fig. 2

Decision tree of occupational lung cancer applied in validation data. Each node is based on available data for each predictive variable. Each approval rate for each predictor is marked in a box. Each predictor is written beside the line.
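The logistic regression results report multipliers of 0.94 per pack-year and 1.15 per year of working period. Reading these as per-unit odds ratios (an interpretation, since the text does not use that term), the effect of a larger change can be sketched by compounding. The function name and the choice of a 10-unit change are hypothetical.

```python
def compounded_odds_ratio(per_unit_or: float, units: float) -> float:
    """Odds ratio for a change of `units`, given the odds ratio for one unit.

    Assumes the logistic model is linear in the predictor on the log-odds
    scale, so per-unit odds ratios multiply.
    """
    if per_unit_or <= 0:
        raise ValueError("an odds ratio must be positive")
    return per_unit_or ** units


OR_PACK_YEAR = 0.94     # reported per additional pack-year
OR_WORKING_YEAR = 1.15  # reported per additional year of working period

print(round(compounded_odds_ratio(OR_PACK_YEAR, 10), 2))     # 10 more pack-years
print(round(compounded_odds_ratio(OR_WORKING_YEAR, 10), 2))  # 10 more working years
```

Under this reading, 10 additional pack-years would roughly halve the odds of approval, while 10 additional working years would multiply them about fourfold, other variables held fixed.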

Based on the AUCs, the accuracy of the CART model (0.914) was higher than that of the logistic regression model (0.824).
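As a rough illustration of what an AUC measures for a decision tree, the sketch below scores every training case by the approval rate of its leaf and computes the rank-based (Mann-Whitney) AUC, counting ties as one half. Using leaf rates as scores is an assumption: the paper does not describe how its AUCs were computed or which data set the reported figures refer to, so this calculation is not expected to reproduce the reported 0.914. The leaf counts are taken or derived from the training results in the text.

```python
from typing import List, Tuple


def auc_from_groups(groups: List[Tuple[float, int, int]]) -> float:
    """AUC from (score, positives, negatives) groups, counting ties as 1/2.

    Equivalent to the Mann-Whitney statistic: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case.
    """
    n_pos = sum(p for _, p, _ in groups)
    n_neg = sum(n for _, _, n in groups)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for score_p, pos, _ in groups:
        for score_n, _, neg in groups:
            if score_p > score_n:
                wins += pos * neg
            elif score_p == score_n:
                wins += 0.5 * pos * neg
    return wins / (n_pos * n_neg)


# Training leaves: (approval rate used as score, occupational, non-occupational).
training_leaves = [
    (0.0, 0, 47 + 9),        # unknown carcinogen, or latency < 8.6 years
    (15 / 31, 15, 31 - 15),  # >= 11.25 pack-years
    (32 / 35, 32, 35 - 32),  # < 11.25 pack-years
]
print(round(auc_from_groups(training_leaves), 3))
```

Because a tree assigns the same score to every case in a leaf, its ROC curve has only as many steps as there are distinct leaf rates, and ties within a leaf account for part of the area.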

Discussion

When deciding the work-relatedness of lung cancer, 4 steps should be considered: whether or not the worker was exposed to a lung carcinogen or worked in an industry known for development of lung cancer, exposure dose of the lung carcinogen, sufficient latency period for developing lung cancer and smoking amount [10]. In the CART model used in this study, 3 of the 4 steps were included (not exposure dose) and the accuracy of this model was very high. Our result that occupational exposure to a known lung carcinogen was the first predictor of an occupational lung cancer decision was not surprising. Problems remain when deciding the work-relatedness of lung cancer in workers exposed to probable lung carcinogens such as man-made vitreous fibers [24]. Studies investigating the proportion of occupational cancer have included workers exposed to probable or possible carcinogens, as well as known human carcinogens [7-9]. In this model, only workers exposed to known lung carcinogens were determined to have lung cancer by occupational exposure. In Korea, if workers exposed to possible lung carcinogens request COMWEL to determine the work-relatedness of their lung cancer, the result may be negative. In Korea, the work-relatedness of lung cancer in miners is determined by other laws for miners; therefore, we had only 1 survey report for workers exposed to radon. Known human carcinogens included in this model were asbestos [25], crystalline silica [11], PAHs [26], Cr(VI) [27] and COE [28]. It is known that these materials have a synergistic effect on the development of lung cancer [29]. Because we included only 1 major lung carcinogen, these interactions were ignored. In each survey report, synergistic effects of exposure materials actually showed a partially positive effect on the work-relatedness decision, but the effect was unclear. The interaction acted as an additional reason for a decision in workers strongly suspected of having occupational lung cancer. 
In the CART model, sufficient latency in the work-relatedness decision was 8.6 years or higher. Latency of a solid tumor is generally 10-12 years [30]. The cut point of latency in our result was shorter because a worker exposed to asbestos for 9 years was decided as having occupational lung cancer. Except for this 1 case exposed to asbestos, latency in the approved cases was higher than 10 years. The purposes of CART are accurate prediction and classification. Therefore, the latency cut point of 8.6 years in this model appears to have been estimated so as to exclude the 9 cases of non-occupational lung cancer. This reflects a disadvantage of CART: prediction errors tend to occur around the splitting point of interval variables [31]. In smoking studies using CART, high smoking quantity increased the risk of lung cancer [32]. A dose-response relationship between smoking and lung cancer has been found, and the risk of lung cancer in smokers during the first two decades of smoking was twice that of non-smokers in terms of the odds ratio (OR) [33]. In another study, lung cancer risk was higher in those who had smoked for more than 30 years than in those who had smoked for less than 30 years [34]. In this study, most workers who smoked less than 11.25 pack-years were classified as occupational lung cancer. This result could be interpreted to mean that the carcinogenic effect of smoking was extremely important in the decision process for light smokers and non-smokers. Cases that were definitely exposed to a carcinogen and not exposed to environmental materials, such as asbestos in slate, were decided as occupational lung cancer. Considering the many causes of lung cancer, a synergistic effect between smoking and occupational exposure to carcinogens has been shown [35]. In this model, this effect was found in workers smoking more than 9.75 pack-years. Although many studies supported this interaction [36,37], others have shown contradictory results [38,39]. 
Therefore, the interaction between smoking and carcinogens is indefinite [40]. According to Hertz-Picciotto, proof that arsenic and smoking have a synergistic interaction in the development of lung cancer is compelling [39]. In the present study, a low smoking amount had a favorable effect on the work-relatedness evaluation, while a high smoking amount did not have any clear implication. Asbestos is carcinogenic in human lungs, but 16 of 45 cases exposed to asbestos were decided as non-occupational lung cancer. In the review of these 16 cases, 8 cases had a short working period of less than 8 years or irregular exposure and 4 cases were heavy smokers. In the others, we suspected exposure to asbestos, but could not find definite evidence of exposure in bulk samples or work environment evaluation. The KOSHA team performed exposure assessment (personal and area airborne asbestos fiber concentration) in 11 of the 45 cases exposed to asbestos, and 6 of the 11 were occupational lung cancer. Nine of the 11 cases were estimated by Time-Weighted Average (TWA) and 2 cases by short-term sampling. In cases decided as occupational lung cancer, chrysotile and tremolite were found in airborne samples. In cases estimated by TWA, airborne personal asbestos concentration was 0.003-0.4 fiber/cc, and area concentration was 0.002 fiber/cc. In the case sampled for 14 minutes, airborne personal and area concentrations were 0.4 and 0.13 fiber/cc. In cases decided as non-occupational lung cancer, although chrysotile and tremolite were found, asbestos fiber concentration was very low (0 or 0.005 fiber/cc). In the case with only a short-term sample, personal and area asbestos concentrations were 0.027 fiber/cc (22 minutes) and 0.159 fiber/cc (21 minutes). Although this case was exposed to high concentrations of asbestos, the KOSHA team determined it was non-occupational lung cancer due to short latency and the nature of his duties (supervising construction work) (Table 3). 
According to the asbestos exposure assessment, we know that low asbestos fiber concentration affects the decision of work-relatedness. However, we could not find asbestos exposure concentration standards for determining work-relatedness because many cases exposed to asbestos had no estimate of exposure concentration.
Table 3

Industry, latent period, working period, fiber type, airborne asbestos concentration and work-relatedness in cases measured by personal and area sampling

*Airborne concentration: estimated Time-Weighted Average (TWA) except for case 3 and 9.

†Min: measured minutes.

Although both latency and working period were statistically significant in the t-test for all cases, only latency was included in the CART model. This result might be interpreted to mean that latency was more important than working period in the approval of occupational lung cancer. Also, the latency in our model could already reflect the effect of the working period. In the chi-square test for industry classification of all cases, a significantly higher proportion of occupational lung cancer was found in the non-manufacturing industry than in the manufacturing industry. In the CART model, industry classification was not included. When all cases were classified according to the first predictor (occupational exposure to a known lung carcinogen), the significance of industry classification disappeared. This result reveals the characteristic of CART analysis of presenting decision processes for classifying patterns. Our study has several limitations. First, the occupational decision by OSHRI may be inaccurate. For decision accuracy, a commission composed of 2 professors of occupational and environmental medicine, 11 researchers (9 specialists of occupational and environmental medicine and 2 hygienists) and several residents of OSHRI discussed and decided the work-relatedness of diseases. Therefore, most of OSHRI's decisions may be relatively reliable. Second, this study was based only on Korean data. Each nation has its own rules and social consensus for occupational decisions. Therefore, our results may only be applicable in Korea. Third, because of the small sample size, the prediction cut point was unstable. Although the accuracy of this model was very high, this could be due to the small number of cases. To minimize this instability, the analysis was repeated to find an optimal tree. Fourth, exposure dose of the worker was not included in this analysis. In Korea, estimation of hazardous materials exposure - except for dust - in the workplace started after late 1990; therefore, exposure data from before this does not exist. 
We presumed, qualitatively, past exposure dose according to many studies done in other nations. We estimated recent exposure dose and collected current exposure data from workers' companies. However, because most of them were very low, we did not use them for occupational decisions. Absence of past exposure dose may be the starting point of this study. If exposure estimation were performed, this study would not be needed. Although it is not a complete measure, we thought that the working period could partially estimate occupational exposure. However, the working period was not included in the decision tree. Fifth, selection bias may have existed because this study included only cases reviewed by COMWEL. Up to 2007, most cases were decided by OSHRI, except for the lung cancer cases among miners that were decided by COMWEL. Therefore, this bias may be low. Finally, we did not carry out cross-validation with other data. Our data were unique occupational decision data in Korea. Therefore, we do not know the true validity of these results. In conclusion, we found that exposure to lung carcinogens, latency and smoking history were predictive factors of approval for occupational lung cancer. This decision tree must be considered as a minimal decision standard of work-relatedness for lung cancer, because doctors who decide work-relatedness must take into account the intricacies of each case. To make accurate decision standards for occupational lung cancer, additional studies to improve validity have to be performed.
References (10 of 30 shown)

1.  Synergism between occupational arsenic exposure and smoking in the induction of lung cancer.

Authors:  I Hertz-Picciotto; A H Smith; D Holtzman; M Lipsett; G Alexeeff
Journal:  Epidemiology       Date:  1992-01       Impact factor: 4.822

2.  Asbestos, smoking, and lung cancer: interaction and attribution.

Authors:  B W Case
Journal:  Occup Environ Med       Date:  2006-08       Impact factor: 4.402

3.  Diesel exhaust particles.

Authors:  H-E Wichmann
Journal:  Inhal Toxicol       Date:  2007       Impact factor: 2.724

4.  A risk model for prediction of lung cancer.

Authors:  Margaret R Spitz; Waun Ki Hong; Christopher I Amos; Xifeng Wu; Matthew B Schabath; Qiong Dong; Sanjay Shete; Carol J Etzel
Journal:  J Natl Cancer Inst       Date:  2007-05-02       Impact factor: 13.506

5.  Long-term follow-up study of mortality and the incidence of cancer in a cohort of workers at a primary aluminum smelter in Sweden.

Authors:  Ove Björ; Lena Damber; Clarence Edström; Tohr Nilsson
Journal:  Scand J Work Environ Health       Date:  2008-12       Impact factor: 5.024

6.  Occupationally related cancer risk among coke oven workers: 30 years of follow-up.

Authors:  J P Costantino; C K Redmond; A Bearden
Journal:  J Occup Environ Med       Date:  1995-05       Impact factor: 2.162

7.  The global burden of disease due to occupational carcinogens.

Authors:  Timothy Driscoll; Deborah Imel Nelson; Kyle Steenland; James Leigh; Marisol Concha-Barrientos; Marilyn Fingerhut; Annette Prüss-Ustün
Journal:  Am J Ind Med       Date:  2005-12       Impact factor: 2.214

8.  The contribution of occupational risks to the global burden of disease: summary and next steps.

Authors:  Marilyn Fingerhut; Deborah Imel Nelson; T Driscoll; Marisol Concha-Barrientos; Kyle Steenland; Laura Punnett; Annette Prüss-Ustün; J Leigh; C Corvalan; G Eijkemans; J Takala
Journal:  Med Lav       Date:  2006 Mar-Apr       Impact factor: 1.275

9.  Lung cancer incidence among Norwegian nickel-refinery workers 1953-2000.

Authors:  Tom K Grimsrud; Steinar R Berge; Jan Ivar Martinsen; Aage Andersen
Journal:  J Environ Monit       Date:  2003-04

10.  The LLP risk model: an individual risk prediction model for lung cancer.

Authors:  A Cassidy; J P Myles; M van Tongeren; R D Page; T Liloglou; S W Duffy; J K Field
Journal:  Br J Cancer       Date:  2007-12-18       Impact factor: 7.640
