Kallie J Chen (1), Priya H Dedhia (2), Joseph R Imbus (2), David F Schneider (2).
1. Division of Endocrine Surgery at University of Wisconsin School of Medicine and Public Health, Department of Surgery, Madison, Wisconsin. Electronic address: kjchen3@wisc.edu.
2. Division of Endocrine Surgery at University of Wisconsin School of Medicine and Public Health, Department of Surgery, Madison, Wisconsin.
Abstract
BACKGROUND: Critical thyroid nodule features are contained in unstructured ultrasound (US) reports. The Thyroid Imaging, Reporting, and Data System (TI-RADS) uses five key features to risk-stratify nodules and recommend appropriate intervention. This study aims to analyze the quality of US reporting and the potential benefit of natural language processing (NLP) systems in efficiently capturing TI-RADS features from text reports.
MATERIALS AND METHODS: This retrospective study used free-text thyroid US reports from an academic center (A) and a community hospital (B). Physicians created "gold standard" annotations by manually extracting TI-RADS features and clinical recommendations from the reports to determine how often they were included. Similar annotations were created using an automated NLP system and compared with the gold standard.
RESULTS: Two hundred eighty-two reports contained 409 nodules at least 1 cm in maximum diameter. The gold standard identified three nodules (0.7%) that contained enough information to calculate a complete TI-RADS score. Shape was described most often (92.7% of nodules), whereas margins were described least often (11%). A median of two TI-RADS features was reported per nodule. The NLP system was significantly less accurate than the gold standard in capturing echogenicity (27.5%) and margins (58.9%). One hundred eight nodule reports (26.4%) included clinical management recommendations, which were included more often at site A than at site B (33.9% versus 17%, P < 0.05).
CONCLUSIONS: These results suggest a gap between current US reporting styles and those needed to implement TI-RADS and achieve NLP accuracy. Synoptic reporting should prompt more complete thyroid US reporting, improved recommendations for intervention, and better NLP performance.
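The completeness check described in the abstract (whether a nodule description names all five TI-RADS feature categories) can be illustrated with a minimal keyword-matching sketch. This is not the NLP system evaluated in the study. The pattern lists, the function names `extract_features`, `is_complete`, and `missing_features`, and the sample sentence are all assumptions made for illustration, and the sketch deliberately ignores negation ("no calcifications") and synonyms, which a real system would need to handle.

```python
import re

# Hypothetical keyword patterns for the five TI-RADS feature categories.
# Illustrative assumptions only; not the vocabulary of the study's NLP system.
FEATURE_PATTERNS = {
    "composition": r"\b(cystic|spongiform|solid)\b",
    "echogenicity": r"\b(anechoic|hyperechoic|isoechoic|hypoechoic)\b",
    "shape": r"\b(taller[- ]than[- ]wide|wider[- ]than[- ]tall)\b",
    "margin": r"\b(smooth|ill[- ]defined|lobulated|irregular)\b",
    "echogenic_foci": r"\b(comet[- ]tail|macrocalcifications?|"
                      r"microcalcifications?|rim calcifications?)\b",
}


def extract_features(text):
    """Return the first matching keyword per feature, or None if absent."""
    lowered = text.lower()
    found = {}
    for feature, pattern in FEATURE_PATTERNS.items():
        match = re.search(pattern, lowered)
        found[feature] = match.group(0) if match else None
    return found


def is_complete(features):
    """True only when all five feature categories were described."""
    return all(value is not None for value in features.values())


def missing_features(features):
    """List the feature categories the description left out."""
    return [name for name, value in features.items() if value is None]


if __name__ == "__main__":
    # Invented sample sentence, not taken from the study data.
    report = ("1.4 cm solid hypoechoic nodule in the right lobe, "
              "wider-than-tall, no calcifications.")
    features = extract_features(report)
    print(features)
    print("complete:", is_complete(features))
    print("missing:", missing_features(features))
```

A description like the sample one names composition, echogenicity, and shape but omits margins and echogenic foci, so a complete score could not be calculated from it; this is the kind of gap the gold-standard annotation was counting.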