David C Wheeler1, Kellie J Archer2, Igor Burstyn3, Kai Yu4, Patricia A Stewart5, Joanne S Colt6, Dalsu Baris6, Margaret R Karagas7, Molly Schwenn8, Alison Johnson9, Karla Armenti10, Debra T Silverman6, Melissa C Friesen6.
1. Department of Biostatistics, School of Medicine, Virginia Commonwealth University, 830 East Main Street, Richmond, VA 23298, USA. dcwheels@gmail.com
2. Department of Biostatistics, School of Medicine, Virginia Commonwealth University, 830 East Main Street, Richmond, VA 23298, USA.
3. Drexel University, School of Public Health, Nesbitt Hall, 3215 Market Street, Philadelphia, PA 19104, USA.
4. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, MSC 9776, Bethesda, MD 20892, USA.
5. Stewart Exposure Assessments, LLC, 6045 27th Street North, Arlington, VA 22207, USA.
6. Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, MSC 9776, Bethesda, MD 20892, USA.
7. Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, 7927 Rubin Building, Lebanon, NH 03756, USA.
8. Maine Cancer Registry, 286 Water Street, 4th Floor, 11 State House Station, Augusta, Maine 04333-0011, USA.
9. Vermont Cancer Registry, Vermont Department of Health, P.O. Box 70, Burlington, VT 05402-0070, USA.
10. New Hampshire Department of Health and Human Services, 29 Hazen Drive, Concord, NH 03301, USA.
Abstract
OBJECTIVES: To evaluate occupational exposures in case-control studies, exposure assessors typically review each job individually to assign exposure estimates. This process lacks transparency and does not provide a mechanism for recreating the decision rules in other studies. In our previous work, nominal (unordered categorical) classification trees (CTs) were generally successful in predicting expert-assessed ordinal exposure estimates (i.e. none, low, medium, high) derived from occupational questionnaire responses, but room for improvement remained. Our objective was to determine whether recently developed ordinal CTs would improve on the performance of nominal trees in predicting ordinal occupational diesel exhaust exposure estimates in a case-control study.
METHODS: We used one nominal and four ordinal CT methods to predict expert-assessed probability, intensity, and frequency estimates of occupational diesel exhaust exposure (each categorized as none, low, medium, or high) derived from questionnaire responses for the 14,983 jobs in the New England Bladder Cancer Study. To replicate the common use of a single tree, we applied each method to a single sample of 70% of the jobs, using 15% to test and 15% to validate each method. To characterize variability in performance, we conducted a resampling analysis that repeated the sample draws 100 times. We evaluated agreement between the tree predictions and expert estimates using Somers' d, which measures the ordinal association between predicted and observed scores and can be interpreted similarly to a correlation coefficient.
RESULTS: In the resampling analysis, an ordinal CT method that used a quadratic misclassification function and controlled tree size based on total misclassification cost performed slightly better than the nominal tree, with a statistically significant difference for the frequency metric (Somers' d: nominal tree = 0.61; ordinal tree = 0.63) and similar performance for the probability (nominal = 0.65; ordinal = 0.66) and intensity (nominal = 0.65; ordinal = 0.65) metrics. The best ordinal CT predicted fewer cases of large disagreement with the expert assessments (i.e. no exposure predicted for a job with high exposure, or vice versa) than the nominal tree across all of the exposure metrics. For example, the percentage of jobs with expert-assigned high intensity of exposure that the model predicted as no exposure was 29% for the nominal tree and 22% for the best ordinal tree.
CONCLUSIONS: The overall agreements were similar across CT models; however, the use of ordinal models reduced the magnitude of the discrepancy when disagreements occurred. As the best performing model can vary by situation, researchers should consider evaluating multiple CT methods to maximize the predictive performance within their data.
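As an illustration of the two quantities the abstract relies on, the following minimal Python sketch computes Somers' d for a pair of ordinal ratings and a total quadratic misclassification cost. It is not the authors' code. The 0 to 3 category scores, the function names, the choice of the expert rating as the conditioning variable, the squared-difference form of the quadratic cost, and the six example ratings are all assumptions made for illustration, and the ratings are not study data.

```python
from itertools import combinations

# Hypothetical integer scores for the four exposure categories (an assumption).
LEVELS = {"none": 0, "low": 1, "medium": 2, "high": 3}


def somers_d(expert, predicted):
    """Somers' d of `predicted` given `expert`.

    (concordant pairs - discordant pairs) / (pairs not tied on `expert`).
    """
    concordant = discordant = untied = 0
    for (e1, p1), (e2, p2) in combinations(zip(expert, predicted), 2):
        if e1 == e2:
            continue  # pairs tied on the expert rating are excluded
        untied += 1
        direction = (e1 - e2) * (p1 - p2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # pairs tied only on the prediction count in the denominator only
    if untied == 0:
        raise ValueError("all expert ratings are tied; Somers' d is undefined")
    return (concordant - discordant) / untied


def total_quadratic_cost(expert, predicted):
    """Sum of squared differences between expert and predicted category scores."""
    return sum((e - p) ** 2 for e, p in zip(expert, predicted))


# Made-up ratings for six jobs, for illustration only.
expert = [LEVELS[c] for c in ["none", "none", "low", "medium", "high", "high"]]
predicted = [LEVELS[c] for c in ["none", "low", "low", "medium", "high", "none"]]

print("Somers' d:", round(somers_d(expert, predicted), 3))
print("Total quadratic cost:", total_quadratic_cost(expert, predicted))
```

Under a quadratic cost of this form, predicting "none" for a job the expert rated "high" is penalized nine times as heavily as a one-category error, whereas a nominal (0/1) cost treats both errors equally. This is one way to see why an ordinal tree might produce fewer large disagreements even when overall agreement is similar.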