An Empirical Study Into Annotator Agreement, Ground Truth Estimation, and Algorithm Evaluation.

Thomas A Lampert, Andre Stumpf, Pierre Gancarski.

Abstract

Although agreement between the annotators who mark feature locations within images has been studied in the past from a statistical viewpoint, little work has attempted to quantify the extent to which this phenomenon affects the evaluation of foreground-background segmentation algorithms. Many researchers utilize ground truth (GT) in experimentation, and more often than not this GT is derived from one annotator's opinion. How does the difference in opinion affect an algorithm's evaluation? A methodology is applied to four image-processing problems to quantify the interannotator variance and to offer insight into the mechanisms behind agreement and the use of GT. It is found that when detecting linear structures, annotator agreement is very low. The agreement in a structure's position can be partially explained through basic image properties. Automatic segmentation algorithms are compared with annotator agreement and it is found that there is a clear relation between the two. Several GT estimation methods are used to infer a number of algorithm performances. It is found that the rank of a detector is highly dependent upon the method used to form the GT, and that although STAPLE and LSML appear to represent the mean of the performance measured using individual annotations, when there are few annotations, or there is a large variance in them, these estimates tend to degrade. Furthermore, one of the most commonly adopted combination methods, consensus voting, accentuates more obvious features, resulting in an overestimation of performance. It is concluded that in some data sets, it is not possible to confidently infer an algorithm ranking when evaluating upon one GT.
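
The effect the abstract attributes to consensus voting can be illustrated with a minimal sketch. It is not the authors' methodology: the Jaccard index stands in as an assumed agreement measure, the three annotation masks and the detector output are hypothetical toy data, and the function names are invented for illustration. STAPLE and LSML are not implemented here.

```python
from itertools import combinations


def consensus_vote(masks, threshold=0.5):
    """Mark a pixel as foreground if more than `threshold` of annotators marked it."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if sum(m[r][c] for m in masks) / n > threshold else 0 for c in range(cols)]
        for r in range(rows)
    ]


def jaccard(a, b):
    """Intersection over union of the foreground pixels of two binary masks."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0


def mean_pairwise_agreement(masks):
    """Average Jaccard index over every pair of annotators."""
    pairs = list(combinations(masks, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: three annotators label a 1x6 strip. All agree on the two
# "obvious" pixels on the left and disagree about where the structure continues.
annotations = [
    [[1, 1, 1, 0, 0, 0]],
    [[1, 1, 0, 1, 0, 0]],
    [[1, 1, 0, 0, 1, 0]],
]

consensus = consensus_vote(annotations)
agreement = mean_pairwise_agreement(annotations)

# A hypothetical detector that finds only the obvious pixels.
detection = [[1, 1, 0, 0, 0, 0]]

score_vs_consensus = jaccard(detection, consensus)
mean_score_vs_individuals = sum(jaccard(detection, a) for a in annotations) / len(annotations)

print("consensus GT:", consensus)
print("mean pairwise annotator agreement:", agreement)
print("detector score against consensus GT:", score_vs_consensus)
print("mean detector score against individual annotators:", mean_score_vs_individuals)
```

In this toy case the voted GT retains only the pixels every annotator marked, so a detector that finds just those pixels scores higher against the consensus than it does on average against the individual annotations.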

Year: 2016 PMID: 27019487 DOI: 10.1109/TIP.2016.2544703

Source DB: PubMed Journal: IEEE Trans Image Process ISSN: 1057-7149 Impact factor: 10.856

Cited by

11 in total

1. Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis. (Review)

Authors: Davood Karimi; Haoran Dou; Simon K Warfield; Ali Gholipour
Journal: Med Image Anal Date: 2020-06-20 Impact factor: 8.545

2. Segmentation evaluation with sparse ground truth data: Simulating true segmentations as perfect/imperfect as those generated by humans.

Authors: Jieyu Li; Jayaram K Udupa; Yubing Tong; Lisheng Wang; Drew A Torigian
Journal: Med Image Anal Date: 2021-01-26 Impact factor: 8.545

3. Enhanced Field-Based Detection of Potato Blight in Complex Backgrounds Using Deep Learning.

Authors: Joe Johnson; Geetanjali Sharma; Srikant Srinivasan; Shyam Kumar Masakapalli; Sanjeev Sharma; Jagdev Sharma; Vijay Kumar Dua
Journal: Plant Phenomics Date: 2021-05-16

4. MeQryEP: A Texture Based Descriptor for Biomedical Image Retrieval.

Authors: G Deep; J Kaur; Simar Preet Singh; Soumya Ranjan Nayak; Manoj Kumar; Sandeep Kautish
Journal: J Healthc Eng Date: 2022-04-11 Impact factor: 3.822

5. Investigation of Bias in Continuous Medical Image Label Fusion.

Authors: Fangxu Xing; Jerry L Prince; Bennett A Landman
Journal: PLoS One Date: 2016-06-03 Impact factor: 3.240

6. 3D multi-view convolutional neural networks for lung nodule classification.

Authors: Guixia Kang; Kui Liu; Beibei Hou; Ningbo Zhang
Journal: PLoS One Date: 2017-11-16 Impact factor: 3.240

7. The feasibility of using citizens to segment anatomy from medical images: Accuracy and motivation.

Authors: Judith R Meakin; Ryan M Ames; J Charles G Jeynes; Jo Welsman; Michael Gundry; Karen Knapp; Richard Everson
Journal: PLoS One Date: 2019-10-10 Impact factor: 3.240

8. Segmentation of roots in soil with U-Net.

Authors: Abraham George Smith; Jens Petersen; Raghavendra Selvan; Camilla Ruø Rasmussen
Journal: Plant Methods Date: 2020-02-08 Impact factor: 4.993

9. On the objectivity, reliability, and validity of deep learning enabled bioimage analyses.

Authors: Dennis Segebarth; Matthias Griebel; Christoph M Flath; Robert Blum; Nikolai Stein; Cora R von Collenberg; Corinna Martin; Dominik Fiedler; Lucas B Comeras; Anupam Sah; Victoria Schoeffler; Teresa Lüffe; Alexander Dürr; Rohini Gupta; Manju Sasi; Christina Lillesaar; Maren D Lange; Ramon O Tasan; Nicolas Singewald; Hans-Christian Pape
Journal: Elife Date: 2020-10-19 Impact factor: 8.140

10. Quantifying Parkinson's disease motor severity under uncertainty using MDS-UPDRS videos.

Authors: Mandy Lu; Qingyu Zhao; Kathleen L Poston; Edith V Sullivan; Adolf Pfefferbaum; Marian Shahid; Maya Katz; Leila Montaser Kouhsari; Kevin Schulman; Arnold Milstein; Juan Carlos Niebles; Victor W Henderson; Li Fei-Fei; Kilian M Pohl; Ehsan Adeli
Journal: Med Image Anal Date: 2021-07-21 Impact factor: 13.828