Fabian Isensee, Marianne Schell, Irada Pflueger, Gianluca Brugnara, David Bonekamp, Ulf Neuberger, Antje Wick, Heinz-Peter Schlemmer, Sabine Heiland, Wolfgang Wick, Martin Bendszus, Klaus H Maier-Hein, Philipp Kickingereder.
Abstract
Brain extraction is a critical preprocessing step in the analysis of neuroimaging studies conducted with magnetic resonance imaging (MRI) and influences the accuracy of downstream analyses. The majority of brain extraction algorithms are, however, optimized for processing healthy brains and thus frequently fail in the presence of pathologically altered brain or when applied to heterogeneous MRI datasets. Here we introduce a new, rigorously validated algorithm (termed HD-BET) relying on artificial neural networks that aim to overcome these limitations. We demonstrate that HD-BET outperforms six popular, publicly available brain extraction algorithms in several large-scale neuroimaging datasets, including one from a prospective multicentric trial in neuro-oncology, yielding state-of-the-art performance with median improvements of +1.16 to +2.50 points for the Dice coefficient and -0.66 to -2.51 mm for the Hausdorff distance. Importantly, the HD-BET algorithm, which shows robust performance in the presence of pathology or treatment-induced tissue alterations, is applicable to a broad range of MRI sequence types and is not influenced by variations in MRI hardware and acquisition parameters encountered in both research and clinical practice. For broader accessibility, the HD-BET prediction algorithm is made freely available (www.neuroAI-HD.org) and may become an essential component for robust, automated, high-throughput processing of MRI neuroimaging data.
Keywords: artificial neural networks; brain extraction; deep learning; magnetic resonance imaging; neuroimaging; skull stripping
Year: 2019 PMID: 31403237 PMCID: PMC6865732 DOI: 10.1002/hbm.24750
Source DB: PubMed Journal: Hum Brain Mapp ISSN: 1065-9471 Impact factor: 5.038
Characteristics of the datasets analyzed within the present study
| | EORTC‐26101 training set | EORTC‐26101 test set | LPBA40 | NFBS | CC‐359 |
|---|---|---|---|---|---|
| Patients (n) | 372 | 211 | 40 | 125 | 359 |
| MRI exams (n) | 1,568 | 833 | 40 | 125 | 359 |
| MRI exams per patient (median, IQR) | 4 (3–6) | 4 (3–6) | 1 | 1 | 1 |
| Institutes (n) | 25 | 12 | 1 | 1 | 2 |
| Patients per institute (median, IQR) | 7 (4–15) | 11 (3–20) | 1 | 1 | 60/299 |
| MRI sequence (n) | | | | | |
| T1‐w | 1,568 | 833 | 40 | 125 | 359 |
| cT1‐w | 1,623 | 898 | – | – | – |
| FLAIR | 1,940 | 895 | – | – | – |
| T2‐w | 1,455 | 793 | – | – | – |
| MR vendors (n) | | | | | |
| Siemens | 535 | 395 | – | 125 | 120 |
| Philips | 350 | 157 | – | – | 119 |
| General Electric | 640 | 267 | 40 | – | 120 |
| Toshiba | 12 | – | – | – | – |
| Unknown | 31 | 14 | – | – | – |
| MR field strength (n) | | | | | |
| 1.0 T | – | 9 | – | – | – |
| 1.5 T | 631 | 78 | 40 | – | 179 |
| 3.0 T | 216 | 317 | – | 125 | 180 |
| 1.5 or 3 T | 619 | 415 | – | – | – |
| Unknown | 104 | 14 | – | – | – |
Abbreviations: IQR, interquartile range; LPBA40, LONI Probabilistic Brain Atlas; MRI, magnetic resonance imaging; NFBS, Nathan Kline Institute Enhanced Rockland Sample Neurofeedback Study; CC‐359, Calgary‐Campinas‐359.
The number of MRI sequences exceeds the total number of MRI exams because both 2D and 3D acquisitions were included (if available).
Figure 1. Dice coefficient and Hausdorff distance (95th percentile) obtained from the individual sequences T1‐w, cT1‐w, FLAIR, and T2‐w with the HD‐BET algorithm and for MONSTR in the EORTC‐26101 test set using violin charts (and superimposed box plots). Obtained median Dice coefficients were > 0.95 for all sequences. The performance of brain extraction on cT1‐w, FLAIR, or T2‐w in terms of Dice coefficient (higher values indicate better performance) and Hausdorff distance (lower values indicate better performance) closely replicated the performance seen on T1‐w (left column zoomed to the relevant range of Dice values ≥0.9 and Hausdorff distance [HD95] ≤15 mm; right column depicting the full range of the data) [Color figure can be viewed at http://wileyonlinelibrary.com]
Descriptive statistics on brain extraction performance (median and interquartile range (IQR) for Dice coefficient and Hausdorff distance) in the EORTC test set for the different MRI sequences (T1‐w, cT1‐w, FLAIR, T2‐w)
| MRI sequence type | Dice: HD‐BET median | Dice: HD‐BET IQR | Dice: MONSTR median | Dice: MONSTR IQR | Dice: abs(Z) | Dice: p | HD95 (mm): HD‐BET median | HD95 (mm): HD‐BET IQR | HD95 (mm): MONSTR median | HD95 (mm): MONSTR IQR | HD95: abs(Z) | HD95: p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T1‐w | 97.6 | (97.0–98.0) | 95.4 | (94.0–96.1) | 30.62 | <.001 | 3.3 | (2.2–3.3) | 4.43 | (3.71–5.79) | 26.72 | <.001 |
| cT1‐w | 96.9 | (96.1–97.4) | 94.6 | (93.2–95.6) | 26.48 | <.001 | 3.9 | (2.8–4.1) | 5.48 | (4.36–6.96) | 26.92 | <.001 |
| FLAIR | 96.4 | (95.2–97.0) | 92.4 | (91.0–93.7) | 32.16 | <.001 | 5.0 | (3.4–5.0) | 8.15 | (6.00–11.0) | 31.30 | <.001 |
| T2‐w | 96.1 | (95.2–96.7) | 93.1 | (92.0–94.0) | 30.64 | <.001 | 5.0 | (3.9–5.0) | 8.0 | (5.78–10.0) | 29.47 | <.001 |
Figure 2. Comparison of Dice coefficients between the HD‐BET brain extraction algorithm and the six public brain extraction methods for each of the test datasets using violin charts (and superimposed box plots) [higher values indicate better performance]. Obtained median Dice coefficients were highest for the HD‐BET algorithm across all datasets (see left column visualizing the relevant range of Dice values ≥0.9). Note the spread of the Dice coefficients, which is consistently lower for the HD‐BET algorithm (right column visualizing the whole range of Dice values) [Color figure can be viewed at http://wileyonlinelibrary.com]
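For readers unfamiliar with the metric, the Dice coefficient between a predicted mask A and a ground-truth mask B is 2|A∩B| / (|A| + |B|). The following is a minimal sketch using plain Python lists, not the authors' evaluation code; the function name `dice_coefficient`, the toy masks, and the convention for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A∩B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy masks (flattened); real brain masks are 3D volumes.
pred_mask = [0, 1, 1, 1, 0, 0]
true_mask = [0, 1, 1, 0, 1, 0]
dice = dice_coefficient(pred_mask, true_mask)
dice_points = 100.0 * dice  # the tables report Dice on a 0-100 scale
```

Multiplying by 100 gives the point scale on which the tables and the reported median improvements are expressed.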
Figure 3. Comparison of Hausdorff distance (95th percentile) between the HD‐BET algorithm and the six public brain extraction methods for each of the test datasets using violin charts (and superimposed box plots; lower values indicate better performance). The median Hausdorff distance was lowest for the HD‐BET algorithm across all datasets (see left column visualizing the relevant range of Hausdorff distance ≤15 mm). Note the spread of the Hausdorff distance, which is consistently lower for the HD‐BET algorithm (right column visualizing the whole range of values) [Color figure can be viewed at http://wileyonlinelibrary.com]
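The 95th-percentile Hausdorff distance replaces the maximum surface-to-surface distance with its 95th percentile, which makes it less sensitive to a few outlying voxels. Definitions differ between toolkits; the sketch below takes the larger of the two directed 95th percentiles with linear interpolation, which is an assumption and not necessarily the variant used in the study. The helper names and the toy surface points (in mm) are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values given")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def directed_distances(points_a, points_b):
    """For every point in A, the Euclidean distance to its nearest point in B."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def hd95(points_a, points_b):
    """Larger of the two directed 95th-percentile distances (assumed definition)."""
    d_ab = directed_distances(points_a, points_b)
    d_ba = directed_distances(points_b, points_a)
    return max(percentile(d_ab, 95), percentile(d_ba, 95))


# Hypothetical surface points in mm; real masks yield thousands of surface voxels.
surface_pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
surface_true = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 3.0)]
distance_mm = hd95(surface_pred, surface_true)
```

The brute-force nearest-neighbour search is quadratic in the number of surface points; it is only meant to make the definition concrete.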
Wilcoxon matched‐pairs signed‐rank tests comparing the performance (Dice coefficient, Hausdorff distance) of the HD‐BET algorithm with six competing brain extraction algorithms. For every test, we report the absolute value of the Z‐statistic [abs(Z)], the Bonferroni‐adjusted p‐value, and the effect size [r] (with r values >.1 corresponding to a small effect, .3 to a medium effect, and .5 to a large effect size; Cohen, 1988)
| Dataset | Variable | BET abs(Z) | BET p | BET r | 3DSkullStrip abs(Z) | 3DSkullStrip p | 3DSkullStrip r | BSE abs(Z) | BSE p | BSE r | ROBEX abs(Z) | ROBEX p | ROBEX r | BEaST abs(Z) | BEaST p | BEaST r | MONSTR abs(Z) | MONSTR p | MONSTR r |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EORTC‐26101 test set | Dice | 24.31 | <.001 | .60 | 29.39 | <.001 | .72 | 27.69 | <.001 | .68 | 26.96 | <.001 | .48 | 3.89 | <.001 | .78 | 30.62 | <.001 | .75 |
| | Hausdorff | 27.14 | <.001 | .66 | 27.88 | <.001 | .68 | 29.18 | <.001 | .72 | 25.69 | <.001 | .46 | 28.16 | <.001 | .71 | 26.72 | <.001 | .66 |
| LPBA40 | Dice | 3.95 | <.001 | .44 | 7.7 | <.001 | .86 | 7.7 | <.001 | .86 | 7.26 | <.001 | .81 | 7.33 | <.001 | .82 | 7.12 | <.001 | .81 |
| | Hausdorff | 2.03 | .221 | – | 7.7 | <.001 | .86 | 7.7 | <.001 | .86 | 3.94 | <.001 | .44 | 3.69 | .001 | .41 | 4.73 | <.001 | .54 |
| NFBS | Dice | 13.67 | <.001 | .86 | 13.67 | <.001 | .86 | 12.5 | <.001 | .79 | 13.65 | <.001 | .86 | 11.22 | <.001 | .71 | 2.87 | 1 | – |
| | Hausdorff | 13.68 | <.001 | .87 | 13.68 | <.001 | .87 | 11.08 | <.001 | .70 | 13.63 | <.001 | .86 | 12.79 | <.001 | .81 | 9.53 | <.001 | .61 |
| CC‐359 | Dice | 22.72 | <.001 | .85 | 23.02 | <.001 | .86 | 21.69 | <.001 | .81 | 17.82 | <.001 | .67 | 21.05 | <.001 | .79 | 23.17 | <.001 | .87 |
| | Hausdorff | 22.97 | <.001 | .86 | 23.05 | <.001 | .86 | 21.57 | <.001 | .80 | 21.77 | <.001 | .81 | 22.64 | <.001 | .84 | 23.20 | <.001 | .87 |
Using the 95th percentile of the Hausdorff distance (mm).
Abbreviations: LPBA40, LONI Probabilistic Brain Atlas; NFBS, Nathan Kline Institute Enhanced Rockland Sample Neurofeedback Study; CC‐359, Calgary‐Campinas‐359.
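The three quantities reported per comparison (abs(Z), Bonferroni‐adjusted p, and r) can be illustrated with a small, self-contained sketch of the Wilcoxon matched‐pairs signed‐rank test under the normal approximation. This is not the authors' analysis code. It assumes no tie or continuity correction, a Bonferroni factor equal to the six competing algorithms, and the common convention r = |Z| / sqrt(N) with N taken as the total number of observations (two per pair); the excerpt does not state which N was used. All scores below are hypothetical.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Return (Z, two-sided p) from the normal approximation of the signed-rank test."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)  # no tie correction (assumption)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p from the standard normal
    return z, p


def bonferroni(p, n_tests):
    """Bonferroni adjustment: multiply by the number of tests, capped at 1."""
    return min(1.0, p * n_tests)


def effect_size_r(z, n_observations):
    """r = |Z| / sqrt(N); which N to use is a convention (assumption here)."""
    return abs(z) / math.sqrt(n_observations)


# Hypothetical per-case Dice scores for two methods on the same ten cases.
method_a = [97.6, 96.9, 96.4, 96.1, 97.0, 95.8, 98.1, 96.6, 97.2, 95.9]
method_b = [95.4, 94.6, 92.4, 93.1, 96.5, 94.9, 96.0, 95.3, 93.8, 95.2]

z, p = wilcoxon_signed_rank_z(method_a, method_b)
p_adj = bonferroni(p, n_tests=6)  # six competing algorithms
r = effect_size_r(z, n_observations=2 * len(method_a))
```

With larger samples, as in the datasets above, the normal approximation is the usual choice; for very small samples an exact test would be preferable.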
Improvement of the performance for brain extraction with the HD‐BET algorithm on T1‐w sequences. The difference for each of the competing algorithms (as compared to HD‐BET) was calculated on a case‐by‐case basis and summarized for all algorithms for each dataset by calculating the median and IQR. Positive values for the change in Dice coefficient (i.e., higher values with HD‐BET), and negative values for the change in the Hausdorff distance (i.e., lower values with HD‐BET) indicate better performance
| Dataset | Dice coefficient: median | Dice coefficient: IQR | Hausdorff distance: median | Hausdorff distance: IQR |
|---|---|---|---|---|
| EORTC‐26101 test set | +2.50 | +1.47, +4.26 | −2.46 | −4.82, −1.41 |
| LPBA40 | +1.16 | +0.62, +4.30 | −0.66 | −4.28, −0.14 |
| NFBS | +1.67 | +0.67, +3.85 | −1.91 | −3.39, −0.92 |
| CC‐359 | +2.11 | +1.02, +3.88 | −2.51 | −3.86, −1.43 |
Using the 95th percentile of the Hausdorff distance (mm).
Abbreviations: IQR, interquartile range; LPBA40, LONI Probabilistic Brain Atlas; MRI, magnetic resonance imaging; NFBS, Nathan Kline Institute Enhanced Rockland Sample Neurofeedback Study; CC‐359, Calgary‐Campinas‐359.
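The summary described for this table (case‐by‐case differences against each competing algorithm, pooled per dataset and reported as median and IQR) can be sketched as follows. The function name, the two placeholder competitors, and all scores are hypothetical, and the quartile method (`inclusive`) is an assumption, since the excerpt does not specify how quartiles were computed.

```python
import statistics


def summarize_differences(hd_bet_scores, competitor_scores):
    """Pool per-case differences (HD-BET minus competitor) over all competitors.

    Returns (median, (q1, q3)). Positive values mean higher scores with HD-BET.
    """
    diffs = []
    for scores in competitor_scores.values():
        if len(scores) != len(hd_bet_scores):
            raise ValueError("every method needs one score per case")
        diffs.extend(h - c for h, c in zip(hd_bet_scores, scores))
    q1, median, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    return median, (q1, q3)


# Hypothetical Dice scores (0-100 scale) for four cases.
hd_bet = [97.0, 98.0, 96.0, 97.0]
competitors = {
    "method_a": [95.0, 96.0, 95.0, 93.0],  # placeholder names, not real results
    "method_b": [96.0, 94.0, 95.0, 96.0],
}
median_gain, iqr_gain = summarize_differences(hd_bet, competitors)
```

For the Hausdorff distance the same function applies unchanged; negative differences then indicate better performance with HD‐BET.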
Figure 4. Representative cases showing the performance for T1‐w images of the different brain extraction algorithms at the 5th percentile and the median Dice coefficients in the EORTC‐26101 test set. The brain masks calculated by the different brain extraction methods are depicted in red, the ground‐truth brain masks in blue (for illustrative purposes only), and their intersection in pink. While BET, BEaST, and MONSTR tend to underestimate the brain mask in these cases by removing brain tissue from the mask, 3DSkullStrip, BSE, and ROBEX tend to overestimate it by including nonbrain tissue (e.g., skull, fat, nasal, and orbital cavity) in the mask [Color figure can be viewed at http://wileyonlinelibrary.com]
Figure 5. Representative cases showing the performance of HD‐BET and MONSTR for cT1‐w, FLAIR, and T2‐w images at the 5th percentile and the median of the Dice coefficients in the EORTC test set. The calculated brain masks (HD‐BET or MONSTR) are depicted in red, the ground‐truth brain masks in blue (for illustrative purposes only), and their intersection in pink. Similar to T1‐w images, MONSTR tends to underestimate the brain mask in these cases by removing brain tissue from the masks and, for the 5th percentile in cT1‐w and T2‐w images, additionally tends to overestimate it by including nonbrain tissue around the nasal cavities [Color figure can be viewed at http://wileyonlinelibrary.com]