Leo Joskowicz (1), D Cohen (2), N Caplan (3), J Sosna (3)
1. The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, 9190401, Jerusalem, Israel. josko@cs.huji.ac.il
2. The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, 9190401, Jerusalem, Israel.
3. Department of Radiology, Hadassah Hebrew University Medical Center, Jerusalem, Israel.
Abstract
PURPOSE: To quantify the inter-observer variability of manual delineation of lesions and organ contours in CT to establish a reference standard for volumetric measurements for clinical decision making and for the evaluation of automatic segmentation algorithms.
MATERIALS AND METHODS: Eleven radiologists manually delineated 3193 contours of liver tumours (896), lung tumours (1085), kidney contours (434) and brain hematomas (497) on 490 slices of clinical CT scans. A comparative analysis of the delineations was then performed to quantify the inter-observer delineation variability with standard volume metrics and with new group-wise metrics for delineations produced by groups of observers.
RESULTS: The mean volume overlap variability values and ranges (in %) between the delineations of two observers were: liver tumours 17.8 [-5.8,+7.2]%, lung tumours 20.8 [-8.8,+10.2]%, kidney contours 8.8 [-0.8,+1.2]% and brain hematomas 18 [-6.0,+6.0]%. For any two randomly selected observers, the mean delineation volume overlap variability was 5-57%. The mean variability captured by groups of two, three and five observers was 37%, 53% and 72%; eight observers accounted for 75-94% of the total variability. For all cases, 38.5% of the delineation non-agreement was due to parts of the delineation of a single observer disagreeing with the others. No statistical difference was found for the delineation variability between the observers based on their expertise.
CONCLUSION: The variability in manual delineations for different structures and observers is large and spans a wide range across a variety of structures and pathologies. Two and even three observers may not be sufficient to establish the full range of inter-observer variability.
KEY POINTS:
• This study quantifies the inter-observer variability of manual delineation of lesions and organ contours in CT.
• The variability of manual delineations between two observers can be significant.
• Two and even three observers capture only a fraction of the full range of inter-observer variability observed in common practice.
• Inter-observer manual delineation variability is necessary to establish a reference standard for radiologist training and evaluation and for the evaluation of automatic segmentation algorithms.
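The abstract does not spell out its metrics, so the following is only an illustrative sketch of how pairwise and group-wise overlap variability could be computed. It assumes a Jaccard-style definition (one minus intersection over union), which is a common choice but not necessarily the one used in the study. The function names, the `rect` helper and the toy delineations are hypothetical, and `captured_fraction` is just one plausible reading of "variability captured by groups of k observers".

```python
from itertools import combinations


def overlap_variability(a, b):
    """Assumed pairwise metric: 1 - |A∩B| / |A∪B| for two delineations,
    each given as a set of pixel (or voxel) coordinates."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def group_variability(delineations):
    """Assumed group-wise metric: 1 - |consensus| / |possible|, where the
    consensus is the region marked by every observer and the possible
    region is the one marked by at least one observer."""
    consensus = set.intersection(*delineations)
    possible = set.union(*delineations)
    if not possible:
        return 0.0
    return 1.0 - len(consensus) / len(possible)


def captured_fraction(delineations, k):
    """Hypothetical reading of 'variability captured by k observers':
    the mean group variability over all subsets of size k, divided by
    the variability of the full group."""
    full = group_variability(delineations)
    if full == 0.0:
        return 1.0
    subsets = list(combinations(delineations, k))
    mean_k = sum(group_variability(list(s)) for s in subsets) / len(subsets)
    return mean_k / full


def rect(x0, y0, x1, y1):
    """Toy delineation: all pixels in the half-open box [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Three made-up observers outlining the same 10x10 structure,
# each shifted by one pixel relative to the first.
observers = [rect(0, 0, 10, 10), rect(1, 0, 11, 10), rect(0, 1, 10, 11)]

pairwise = [overlap_variability(a, b) for a, b in combinations(observers, 2)]
print("pairwise variability:", [round(v, 3) for v in pairwise])
print("group variability:", round(group_variability(observers), 3))
print("fraction captured by 2 observers:", round(captured_fraction(observers, 2), 3))
```

Under these assumed definitions, the group variability is never smaller than any pairwise value, because adding observers can only shrink the consensus region and grow the possible region. This is consistent with the abstract's observation that two or three observers capture only part of the total variability.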
Keywords: Humans; Observer variation; Reproducibility of results