Henry Dieckhaus1, Rozanna Meijboom2, Serhat Okar3, Tianxia Wu4, Prasanna Parvathaneni3, Yair Mina5,6, Siddharthan Chandran2, Adam D Waldman2, Daniel S Reich3, Govind Nair1.
1. qMRI Core Facility, NINDS, National Institutes of Health, Bethesda, MD.
2. Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom.
3. Translational Neuroradiology Section, NINDS, National Institutes of Health, Bethesda, MD.
4. Clinical Trials Unit, NINDS, National Institutes of Health, Bethesda, MD.
5. Viral Immunology Section, NINDS, National Institutes of Health, Bethesda, MD.
6. Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
Abstract
OBJECTIVES: Automated whole brain segmentation from magnetic resonance images is of great interest for the development of clinically relevant volumetric markers for various neurological diseases. Although deep learning methods have demonstrated remarkable potential in this area, they may perform poorly in nonoptimal conditions, such as when training data are limited. Manual whole brain segmentation is an extremely tedious process, so minimizing the data set size required to train segmentation algorithms may be of wide interest. The purpose of this study was to compare the performance of the prototypical deep learning segmentation architecture (U-Net) with that of a previously published atlas-free traditional machine learning method, Classification using Derivative-based Features (C-DEF), for whole brain segmentation in the setting of limited training data.
MATERIALS AND METHODS: C-DEF and U-Net models were evaluated after training on manually curated data from 5, 10, and 15 participants in 2 research cohorts, each acquired at a separate institution: (1) people living with clinically diagnosed HIV infection and (2) people with relapsing-remitting multiple sclerosis. Models were also trained on data from between 5 and 295 participants in a large, publicly available, annotated data set of glioblastoma and lower grade glioma (Brain Tumor Segmentation [BraTS]). Statistical analysis was performed on the Dice similarity coefficient using repeated-measures analysis of variance and Dunnett-Hsu pairwise comparisons.
RESULTS: When trained with data from 15 or fewer participants, C-DEF produced better segmentations than U-Net for the lesion (by 29.2%-38.9%) and cerebrospinal fluid (by 5.3%-11.9%) classes. Unlike C-DEF, U-Net improved significantly as the size of the training data increased (24%-30% higher than baseline). In the BraTS data set, C-DEF produced equivalent or better segmentations than U-Net for enhancing tumor and peritumoral edema regions across all training data sizes explored. However, U-Net was more effective than C-DEF for segmentation of necrotic/non-enhancing tumor when trained on 10 or more participants, probably because of the inconsistent signal intensity of this tissue class.
CONCLUSIONS: These results demonstrate that classical machine learning methods can produce more accurate brain segmentation than far more complex deep learning methods when only small or moderate amounts of training data are available (n ≤ 15). The magnitude of this advantage varies by tissue and cohort, and U-Net may be preferable for deep gray matter and necrotic/non-enhancing tumor segmentation, particularly with larger training data sets (n ≥ 20). Because segmentation models often need to be retrained for novel imaging protocols or pathology, classical machine learning algorithms such as C-DEF could avoid the bottleneck associated with large-scale manual annotation.
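The abstract scores each tissue class with the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a manual mask B, and reports differences between methods as percentages. The following is a minimal Python sketch of that arithmetic, assuming integer label maps stored as NumPy arrays and assuming the reported percentages are relative to the comparator's Dice (the abstract does not say whether they are relative changes or absolute Dice points). The function names, label codes, and toy arrays are hypothetical illustrations, not the study's code or data.

```python
import numpy as np


def dice(pred: np.ndarray, truth: np.ndarray, label: int) -> float:
    """Dice similarity coefficient for one label: 2|A n B| / (|A| + |B|)."""
    a = pred == label   # voxels the model assigned to this class
    b = truth == label  # voxels the manual annotation assigned to this class
    denom = a.sum() + b.sum()
    if denom == 0:
        # Class absent from both maps; scoring this as 1.0 is a common
        # convention assumed here, not something stated in the abstract.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def relative_change(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline`.

    Assumption: the abstract's percentages are relative changes in Dice.
    """
    return 100.0 * (new - baseline) / baseline


# Hypothetical label codes: 0 = background, 1 = CSF, 2 = lesion.
truth = np.array([0, 1, 1, 2, 2, 2, 0, 0])
pred_a = np.array([0, 1, 1, 2, 2, 0, 0, 0])  # e.g., one method's output
pred_b = np.array([0, 1, 0, 2, 0, 0, 0, 0])  # e.g., the other method's output

lesion_a = dice(pred_a, truth, label=2)      # 2*2 / (2 + 3) = 0.8
lesion_b = dice(pred_b, truth, label=2)      # 2*1 / (1 + 3) = 0.5
gain = relative_change(lesion_a, lesion_b)   # about 60% higher than pred_b
```

The functions compare arrays element by element, so the same code applies unchanged to 3-D segmentation volumes of matching shape.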