Daniel E Russ (1), Kwan-Yuet Ho (1), Joanne S Colt (2), Karla R Armenti (3), Dalsu Baris (2), Wong-Ho Chow (4), Faith Davis (5), Alison Johnson (6), Mark P Purdue (2), Margaret R Karagas (7), Kendra Schwartz (8), Molly Schwenn (9), Debra T Silverman (2), Calvin A Johnson (1), Melissa C Friesen (2).
1. Division of Computational Bioscience, Center for Information Technology, NIH, Bethesda, Maryland, USA.
2. Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA.
3. Division of Public Health Services, New Hampshire Department of Health and Human Services, Bureau of Public Health Statistics and Informatics, Concord, New Hampshire, USA.
4. University of Texas MD Anderson Cancer Center, Houston, Texas, USA.
5. Department of Public Health Sciences, University of Alberta, Edmonton, Alberta, Canada.
6. Vermont Cancer Registry, Burlington, Vermont, USA.
7. Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA.
8. Department of Family Medicine and Public Health Sciences, Wayne State University, Detroit, Michigan, USA.
9. Maine Cancer Registry, Augusta, Maine, USA.
Abstract
BACKGROUND: Mapping job titles to standardised occupation classification (SOC) codes is an important step in identifying occupational risk factors in epidemiological studies. Because manual coding is time-consuming and has moderate reliability, we developed an algorithm called SOCcer (Standardized Occupation Coding for Computer-assisted Epidemiologic Research) to assign SOC-2010 codes based on free-text job description components.
METHODS: Job title and task-based classifiers were developed by comparing job descriptions to multiple sources linking job and task descriptions to SOC codes. An industry-based classifier was developed based on the SOC prevalence within an industry. These classifiers were used in a logistic model trained using 14 983 jobs with expert-assigned SOC codes to obtain empirical weights for an algorithm that scored each SOC/job description. We assigned the highest-scoring SOC code to each job. SOCcer was validated in two occupational data sources by comparing SOC codes obtained from SOCcer to expert-assigned SOC codes, and by comparing lead exposure estimates obtained by linking SOC codes to a job-exposure matrix.
RESULTS: For 11 991 case-control study jobs, SOCcer-assigned codes agreed with 44.5% and 76.3% of manually assigned codes at the 6-digit and 2-digit level, respectively. Agreement increased with the score, providing a mechanism to identify assignments needing review. Good agreement was observed between lead estimates based on SOCcer and manual SOC assignments (κ 0.6-0.8). Poorer performance was observed for inspection job descriptions, which included abbreviations and worksite-specific terminology.
CONCLUSIONS: Although some manual coding will remain necessary, using SOCcer may improve the efficiency of incorporating occupation into large-scale epidemiological studies.
Published by the BMJ Publishing Group Limited.
For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
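The scoring and validation steps described in METHODS and RESULTS can be illustrated with a minimal sketch. It is not the SOCcer implementation: the abstract does not report the fitted weights or the exact form of the classifier outputs, so the weights, the feature names (`title_sim`, `task_sim`, `industry_prev`), and the helper functions below are all hypothetical. The sketch only shows the general idea of combining three classifier outputs in a logistic score, keeping the highest-scoring code, and measuring agreement with expert codes at the 6-digit and 2-digit level.

```python
import math

# Hypothetical weights. The abstract says weights were obtained from a
# logistic model but does not report them; these values are made up.
WEIGHTS = {"intercept": -4.0, "title": 5.0, "task": 2.0, "industry": 1.5}


def soc_score(title_sim, task_sim, industry_prev, w=WEIGHTS):
    """Logistic score for one candidate SOC code (all inputs assumed in [0, 1])."""
    z = (w["intercept"]
         + w["title"] * title_sim
         + w["task"] * task_sim
         + w["industry"] * industry_prev)
    return 1.0 / (1.0 + math.exp(-z))


def assign_soc(candidates):
    """Return (code, score) for the highest-scoring candidate.

    candidates maps a SOC code to (title_sim, task_sim, industry_prev).
    """
    scored = {code: soc_score(*feats) for code, feats in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


def agreement(assigned, expert, digits):
    """Fraction of jobs whose codes match on the first `digits` digits."""
    def prefix(code):
        return code.replace("-", "")[:digits]

    matches = sum(prefix(a) == prefix(e) for a, e in zip(assigned, expert))
    return matches / len(expert)


# Illustrative inputs only; codes follow the SOC-2010 "NN-NNNN" pattern.
candidates = {
    "47-2111": (0.9, 0.5, 0.3),
    "49-9051": (0.2, 0.4, 0.1),
}
best_code, best_score = assign_soc(candidates)

assigned = ["47-2111", "47-2031"]
expert = ["47-2111", "47-2061"]
six_digit = agreement(assigned, expert, 6)
two_digit = agreement(assigned, expert, 2)
```

Because the score is returned alongside the code, low-scoring assignments could be flagged for manual review, which is the use the RESULTS paragraph suggests.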
Keywords: Computers and information technology < Methodology; speciality