Sooyoung Yoo1, Eunsil Yoon1, Dachung Boo1, Borham Kim1, Seok Kim1, Jin Chul Paeng2, Ie Ryung Yoo3, In Young Choi4,5, Kwangsoo Kim6, Hyun Gee Ryoo7,8, Sun Jung Lee4,5, Eunhye Song9, Young-Hwan Joo10, Junmo Kim11, Ho-Young Lee1,2. 1. Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea. 2. Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea. 3. Division of Nuclear Medicine, Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea. 4. Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea. 5. Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea. 6. Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, South Korea. 7. Department of Nuclear Medicine, Seoul National University Hospital, Seoul, South Korea. 8. Department of Nuclear Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea. 9. Department of Data Science Research, Innovative Medical Technology Research Institute, Seoul National University Hospital, Seoul, South Korea. 10. Biomedical Research Institute, Seoul National University Hospital, Seoul, South Korea. 11. Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, South Korea.
Abstract
BACKGROUND: Cancer staging information is an essential component of cancer research. However, the information is primarily stored as either a full or semistructured free-text clinical document which is limiting the data use. By transforming the cancer-specific data to the Observational Medical Outcome Partnership Common Data Model (OMOP CDM), the information can contribute to establish multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date. OBJECTIVE: We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports. METHODS: Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 Iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was derived and populated using the transformed CDM data. RESULTS: The extracted thyroid cancer data were completely converted into the OMOP CDM. The distributions of histopathological types of thyroid cancer were approximately 95.3 to 98.8% of papillary carcinoma, 0.9 to 3.7% of follicular carcinoma, 0.04 to 0.54% of adenocarcinoma, 0.17 to 0.81% of medullary carcinoma, and 0 to 0.3% of anaplastic carcinoma. Regarding cancer staging, stage-I thyroid cancer accounted for 55 to 64% of the cases, while stage III accounted for 24 to 26% of the cases. Stage-II and -IV thyroid cancers were detected at a low rate of 2 to 6%. CONCLUSION: As a first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions to standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).
BACKGROUND: Cancer staging information is an essential component of cancer research. However, the information is primarily stored as either a full or semistructured free-text clinical document which is limiting the data use. By transforming the cancer-specific data to the Observational Medical Outcome Partnership Common Data Model (OMOP CDM), the information can contribute to establish multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date. OBJECTIVE: We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports. METHODS: Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 Iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was derived and populated using the transformed CDM data. RESULTS: The extracted thyroid cancer data were completely converted into the OMOP CDM. The distributions of histopathological types of thyroid cancer were approximately 95.3 to 98.8% of papillary carcinoma, 0.9 to 3.7% of follicular carcinoma, 0.04 to 0.54% of adenocarcinoma, 0.17 to 0.81% of medullary carcinoma, and 0 to 0.3% of anaplastic carcinoma. Regarding cancer staging, stage-I thyroid cancer accounted for 55 to 64% of the cases, while stage III accounted for 24 to 26% of the cases. Stage-II and -IV thyroid cancers were detected at a low rate of 2 to 6%. CONCLUSION: As a first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions to standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).
Authors: S Ortiz; J M Rodríguez; T Soria; D Pérez-Flores; A Piñero; J Moreno; P Parrilla Journal: Otolaryngol Head Neck Surg Date: 2001-03 Impact factor: 3.497
Authors: Adam Yala; Regina Barzilay; Laura Salama; Molly Griffin; Grace Sollender; Aditya Bardia; Constance Lehman; Julliette M Buckley; Suzanne B Coopey; Fernanda Polubriaginof; Judy E Garber; Barbara L Smith; Michele A Gadd; Michelle C Specht; Thomas M Gudewicz; Anthony J Guidi; Alphonse Taghian; Kevin S Hughes Journal: Breast Cancer Res Treat Date: 2016-11-08 Impact factor: 4.872
Authors: George Hripcsak; Jon D Duke; Nigam H Shah; Christian G Reich; Vojtech Huser; Martijn J Schuemie; Marc A Suchard; Rae Woong Park; Ian Chi Kei Wong; Peter R Rijnbeek; Johan van der Lei; Nicole Pratt; G Niklas Norén; Yu-Chuan Li; Paul E Stang; David Madigan; Patrick B Ryan Journal: Stud Health Technol Inform Date: 2015
Authors: Rimma Belenkaya; Michael J Gurley; Asieh Golozar; Dmitry Dymshyts; Robert T Miller; Andrew E Williams; Shilpa Ratwani; Anastasios Siapos; Vladislav Korsik; Jeremy Warner; W Scott Campbell; Donna Rivera; Tatiana Banokina; Elizaveta Modina; Shantha Bethusamy; Henry Morgan Stewart; Meera Patel; Ruijun Chen; Thomas Falconer; Rae Woong Park; Seng Chan You; Hokyun Jeon; Soe Jeong Shin; Christian Reich Journal: JCO Clin Cancer Inform Date: 2021-01
Authors: Arika E Wieneke; Erin J A Bowles; David Cronkite; Karen J Wernli; Hongyuan Gao; David Carrell; Diana S M Buist Journal: J Pathol Inform Date: 2015-06-23