Hongli Lin1, Changxing Huang2, Weisheng Wang2, Jiawei Luo2, Xuedong Yang3, Yuling Liu2. 1. College of Computer Science and Electronic Engineering and Collaboration and Innovation Center for Digital Chinese Medicine of 2011 Project of Colleges and Universities, Hunan University, Changsha 410082, China. Electronic address: hllin@hnu.edu.cn. 2. College of Computer Science and Electronic Engineering and Collaboration and Innovation Center for Digital Chinese Medicine of 2011 Project of Colleges and Universities, Hunan University, Changsha 410082, China. 3. Department of Computer Science, University of Regina, Regina, SK, Canada.
Abstract
RATIONALE AND OBJECTIVES: The purpose of this study was to measure and analyze interobserver disagreement in rating diagnostic characteristics of pulmonary nodules on computed tomography scans using the Lung Imaging Database Consortium and Image Database Resource Initiative (LIDC/IDRI) database, and thereby to help investigators understand the variability among radiologists in rating these characteristics.

MATERIALS AND METHODS: A histogram-based accumulated nodule-level approach is proposed to measure interobserver disagreement among radiologists in rating diagnostic characteristics of pulmonary nodules. First, the mean rating differences among radiologists are calculated at the nodule level; next, a histogram of the accumulated nodule-level disagreements is constructed; finally, the mean, variance, skewness, and kurtosis of the histogram are extracted to analyze and summarize interobserver disagreement in the radiologists' assessment of each diagnostic characteristic. Using the developed computer scheme, the disagreement of radiologists in rating all 1880 distinct nodules from 1018 computed tomography scans is analyzed using the original ratings as well as ratings combined according to the LIDC/IDRI instruction.

RESULTS: The interobserver disagreement in rating diagnostic characteristics according to the defined categories of the LIDC/IDRI is substantial. The mean values of disagreement range from 0.0052 to 0.2341. The highest disagreement lies in rating subtlety, whereas internal structure receives the lowest disagreement of 0.0052. Calcification, texture, spiculation, lobulation, malignancy, sphericity, and margin receive disagreements of 0.0393, 0.1351, 0.1616, 0.1943, 0.2144, 0.2174, and 0.2228, respectively.

CONCLUSIONS: Radiologists disagree in rating diagnostic characteristics of pulmonary nodules, and the level of disagreement differs from one characteristic to another. Agreement among radiologists improves when ratings are combined according to the LIDC/IDRI instruction. Investigators need to understand and account for the disagreement level of each diagnostic characteristic when using these ratings in related research.
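The three steps described under MATERIALS AND METHODS (nodule-level mean rating difference, histogram of the accumulated disagreements, then four summary statistics) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the exact formulas, so the following are assumptions made only for illustration: disagreement is taken as the mean absolute pairwise rating difference divided by the width of the rating scale, the scale is assumed to run from 1 to 5, the statistics are computed from the per-nodule values directly rather than from binned counts, and all function names and the toy ratings are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def nodule_disagreement(ratings, scale_min=1, scale_max=5):
    """Mean absolute pairwise rating difference for one nodule, scaled to [0, 1].

    Assumption: dividing by (scale_max - scale_min) is illustrative only;
    the abstract does not state which normalization was used.
    """
    pairs = list(combinations(ratings, 2))
    if not pairs:  # a single reader gives no disagreement to measure
        return 0.0
    mean_diff = fmean(abs(a - b) for a, b in pairs)
    return mean_diff / (scale_max - scale_min)


def histogram(values, bins=10):
    """Counts of values falling in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)  # a value of 1.0 goes in the last bin
        counts[idx] += 1
    return counts


def summary_statistics(values):
    """Mean, population variance, skewness, and (non-excess) kurtosis."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:  # all nodules have identical disagreement
        return {"mean": mean, "variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Made-up ratings of one characteristic: one inner list per nodule,
# one entry per radiologist. These are not LIDC/IDRI data.
toy_ratings = [[3, 3, 3, 3], [1, 2, 2, 3], [5, 5, 4, 4]]
per_nodule = [nodule_disagreement(r) for r in toy_ratings]
counts = histogram(per_nodule)
stats = summary_statistics(per_nodule)
```

Running the same pipeline once per characteristic (subtlety, margin, and so on) would give one mean disagreement per characteristic, which is the kind of figure reported under RESULTS, although the numbers from this sketch are not comparable to the published values.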