Literature DB >> 30271370

ISLES 2016 and 2017-Benchmarking Ischemic Stroke Lesion Outcome Prediction Based on Multispectral MRI.

Stefan Winzeck¹, Arsany Hakim², Richard McKinley², José A A D S R Pinto³, Victor Alves³, Carlos Silva³, Maxim Pisov^4,5, Egor Krivov⁵, Mikhail Belyaev⁵, Miguel Monteiro⁶, Arlindo Oliveira⁶, Youngwon Choi⁷, Myunghee Cho Paik⁷, Yongchan Kwon⁷, Hanbyul Lee⁷, Beom Joon Kim⁸, Joong-Ho Won⁷, Mobarakol Islam⁹, Hongliang Ren⁹, David Robben¹⁰, Paul Suetens¹⁰, Enhao Gong¹¹, Yilin Niu¹², Junshen Xu¹¹, John M Pauly¹¹, Christian Lucas¹³, Mattias P Heinrich¹³, Luis C Rivera¹⁴, Laura S Castillo¹⁴, Laura A Daza¹⁴, Andrew L Beers¹⁵, Pablo Arbelaezs¹⁴, Oskar Maier¹³, Ken Chang¹⁵, James M Brown¹⁵, Jayashree Kalpathy-Cramer¹⁵, Greg Zaharchuk¹⁶, Roland Wiest², Mauricio Reyes¹⁷.

Abstract

Performance of models highly depend not only on the used algorithm but also the data set it was applied to. This makes the comparison of newly developed tools to previously published approaches difficult. Either researchers need to implement others' algorithms first, to establish an adequate benchmark on their data, or a direct comparison of new and old techniques is infeasible. The Ischemic Stroke Lesion Segmentation (ISLES) challenge, which has ran now consecutively for 3 years, aims to address this problem of comparability. ISLES 2016 and 2017 focused on lesion outcome prediction after ischemic stroke: By providing a uniformly pre-processed data set, researchers from all over the world could apply their algorithm directly. A total of nine teams participated in ISLES 2015, and 15 teams participated in ISLES 2016. Their performance was evaluated in a fair and transparent way to identify the state-of-the-art among all submissions. Top ranked teams almost always employed deep learning tools, which were predominately convolutional neural networks (CNNs). Despite the great efforts, lesion outcome prediction persists challenging. The annotated data set remains publicly available and new approaches can be compared directly via the online evaluation system, serving as a continuing benchmark (www.isles-challenge.org).

Entities: CellLine Chemical Disease Gene Species

Keywords: MRI; benchmarking; datasets; deep learning; machine learning; prediction models; stroke; stroke outcome

Year: 2018 PMID： 30271370 PMCID： PMC6146088 DOI： 10.3389/fneur.2018.00679

Source DB: PubMed Journal: Front Neurol ISSN： 1664-2295 Impact factor: 4.003

1. Introduction

Defining the location and extent of a stroke lesion is an essential step toward acute stroke assessment. Of special interest is the development of a lesion over time, as this could provide valuable information about tissue outcome after stroke onset. Modern magnetic resonance imaging (MRI) techniques, including diffusion and perfusion imaging, have shown great value to distinguish between acutely infarcted tissue (known as “core”) and hypo-perfused tissue (known as “penumbra”). However, available automated methods used to characterize core and penumbra regions from MRI information lack accuracy and cannot correctly capture the complexity of the image information. Hence, there is a great need for advanced data analysis techniques that identify these regions and predict tissue outcome in a more reproducible and accurate way. Eventually, such tools will be available to support clinicians in their decision-making process (e.g., deciding for or against thrombolytic therapy). In recent years machine learning methods for medical image computing have shown unprecedented levels of progress. The area of supervised machine learning (i.e., where computer models are trained based on existing pre-annotated datasets) and particular deep learning, has gained much popularity and has shown great potential for medical applications where image quantification and interpretation is important for the decision making process (1). Along with this, the benchmarking of machine learning techniques for medical image computing has become a central area of interest at the annual Medical Image Computing and Computer Assisted Intervention (MICCAI) conference, where algorithms are tested and evaluated using curated datasets and common evaluation metrics. The ISLES challenge was created as an effort to raise the interest of the medical image computing community to make progress on the challenging aspects of stroke lesion characterisation from MRI data. The work of Maier and colleagues summarizes the lessons learned from the ISLES 2015 edition (2), where image analysis at the subacute and acute stages provided insights as to how accurate machine learning approaches could characterize core and penumbra regions. In the following years the discussions happening among interdisciplinary teams at the ISLES challenge, allowed the community to advance toward the challenge of stroke lesion prediction from MRI data. This is of great interest in a clinical routine, as the responsible physician needs to decide quickly, whether the particular stroke patient could benefit from an interventional treatment (i.e., thrombectomy or thrombolysis). This decision is often draw on basis of lesion appearance, the time passed since stroke onset and the clinicians personal experience. Objective methods that reliably predict lesions and clinical outcome only from the acute scans would be a powerful tool to support and accelerate decision making during the critical phase.

1.1. Current methods

From the literature review presented by Maier et al. (2), summarizing the state of the art until 2016, the recent machine learning methods for stroke lesion segmentation and outcome prediction clearly show the transition from classic machine learning tools [e.g., (3, 4)] to approaches based on deep learning (5–10). Generally, the accuracy of those methods is tightly connected to the data set they have been applied to and prevent a direct comparison. For this reason, the development of a publicly available benchmarking, such as ISLES is crucial to facilitate the analysis of current machine learning technologies and leverage research lines to improve them. The ISLES challenge held in 2016 and 2017 have hosted a total of 24 teams participating in the lesion segmentation and outcome prediction sub-tasks. In this article, we present the main results and findings in benchmarking machine learning approaches presented at ISLES 2016 and 2017. The ISLES challenges feature 75 cases from two different centers, including perfusion and diffusion imaging (Raw Perfusion, CBF, CBV, TTP, Tmax, ADC, MTT) as well as clinical information (time-since-stroke, time-to-treatment, TICI and mRS scores). Through reference annotations produced by two clinical experts, and a set of quantitative metrics and qualitative expert evaluations, we analyse and describe common strategies and approaches leading to best algorithmic performance. We present the progress of these algorithms, and current challenges that these techniques need to overcome in order to integrate them into the time-critical clinical workflow of stroke patients.

1.2. Motivation for ISLES and challenge setup

Automated methods for lesion segmentation and prediction are part of an active research field. Since results are highly dependent on the size and quality of the used data, comparison of independent validation methods is challenging. In order to compare different automated methods, researchers typically have to reimplement algorithms presented in previous publications, which is known to be a difficult task due to the complexity of the algorithms, and lack of detailed description of their implementation. Although the community is changing and provides more frequently open source code is more frequently provided, benchmarking remains time consuming. For these reasons, computational challenges aim to provide a platform allowing a fair and on going validation of various methods tackling a predefined problem. The ISLES challenge follows this direction by providing a stroke imaging database and benchmarking platform that facilitates the comparisons of new algorithms for lesion segmentation and prediction. ISLES was launched for the first time in 2015 and was successfully continued in the subsequent 2 years. Researchers interested in this challenge could register online and download the imaging data via the SICAS Medical Image Repository (SMIR) platform (11). The training data was provided in a preprocessed format that allowed teams to apply their algorithms directly without need of pre-processing. Furthermore, this ensured that performance differences are mainly driven by the prediction models, rather than different preprocessing steps. Eventually, methods could be directly compared and ranked on a leaderboard to discover the most successful approach.

1.2.1. ISLES 2016

While the focus for ISLES 2015 lied on ischemic stroke lesion segmentation (2), ISLES 2016 aimed for the outcome prediction of lesions. Therefore, multispectral MRI data from acute phase of 35 stroke patients were provided together with lesion maps annotated on 3–9 month follow-up scans. After a period of several weeks, participating teams (See Table 1) were asked to apply their algorithm to 19 unseen test data. The lesion labels for the test data were generated by two raters independently, and merged via the STAPLE algorithms (12) to generate a fused ground-truth dataset. On basis of the performance on this test data set, methods were ranked to define a winner of the challenge. As a second task, teams were asked to predict the clinical mRS score, which denotes the degree of disability. Upon analysis of the results, we acknowledge that the latter task required more data for a reliable statistical analysis, which is why they are not presented in this paper. However, the reader is referred to the official website for ISLES 2016 for more details.

Table 1

Participants of ISLES 2016 (more details and main features of each method see Appendix ISLES16-A1 to ISLES16-A7).

CH-UBE	University of Bern, Switzerland
	Incorporating time to reperfusion into the FASTER (3) model of stroke tissue-at-risk
DE-UZL	Institute of Medical Informatics, Universität zu Lübeck, Germany
	Random forests for stroke lesion and clinical outcome prediction
HK-CUH	Deptartment of Computer Science and Engineering, The Chinese University of Hong Kong
	Residual Volumetric Network for Ischemic Stroke Lesion Segmentation
KR-SUC^*	Department of Statistics, Seoul National University, Korea
KR-SUK^*	Deep Convolutional Neural Network Approach for Brain Lesion Segmentation
KR-SUL^*
PK-PNS	Pakistan Institute of Nuclear Science and Technology, Islamabad, Pakistan
	Segmentation of Ischemic Stroke Lesion using Random Forests in Multi-modal MRI Images
UK-CVI	CVIP, Comp. at School of Science and Eng., University of Dundee, UK
	Combination of CNN and Hand-crafted feature for Ischemic Stroke Lesion Segmentation
US-SFT	University of Southern California, Fractal Analytics, TopicIQ
	A Deep-Learning Based Approach for Ischemic Stroke Lesion Outcome Prediction

These methods are variants of a single method.

Participants of ISLES 2016 (more details and main features of each method see Appendix ISLES16-A1 to ISLES16-A7). These methods are variants of a single method.

1.2.2. ISLES 2017

Similarly to ISLES 2016, in 2017 participants were asked to predict lesion outcome on MRI data. The data set of ISLES 2016 was expanded to a total of 43 patients for the training phase, and 32 cases for methods evaluation (see Table 3). For the additional 13 test cases, added in 2017, only one groundtruth was generated (in contrast to the other 19 cases from ISLES 2016, for which two annotations per cases exist). For ISLES 2017, participants were asked to submit an abstract, describing their approach, until August 2017. Mid August the test data was distributed and teams had the chance to apply their models and submit their final prediction 2 weeks later. Participating teams and their submitted abstract titles can be found in Table 2, along with main features of each method (detailed description of methodology in Appendix).

Table 3

Details of the ISLES 2016 & 2017 Data.

	2016	2017
Number of cases	35 training and 19 testing	43 training and 32 testing
Number of expert segmentations for training and testing sets	1 (training), 2 (testing)	1 (training), 1 (testing)
MRI sequences	ADC, rBF, rBV, MTT, TMAX, TTP, Raw PWI	ADC, rBF, rBV, MTT, TMAX, TTP, Raw PWI

Table 2

Participants of ISLES 2017 (more details and main characteristic of each method see Appendix ISLES17-A1 to ISLES17-A14).

AAMC	Athinoula A. Martinos Center, USA
	Ensembling 3D U-Nets For Ischemic Stroke Lesion Segmentation
HKU-1	Hong Kong University of Science and Technology, China
	Deep Adversarial Networks for Stroke Lesion Segmentation
HKU-2	Hong Kong University of Science and Technology, China
	Stochastic Dense Network for Brain Lesion Segmentation
INESC	INESC-ID, Portugal
	Fully Convolutional Neural Network for 3D Stroke Lesion Segmentation
KU	Korea University, Korea
	Gated Two-Stage Convolutional Neural Networks for Ischemic Stroke Lesion Segmentation
KUL	KU Leuven, Belgium
	Dual-scale Fully Convolutional Neural Network for Final Infarct Prediction
MIPT	Moscow Institute of Physics and Technology, Russia
	Neural Networks Stroke Lesion Segmentation
NEU	NEUROPHET Inc. Seoul, South Korea
	Combination of U-Net and Densely Connected Convolutional Networks
NUS	National University of Singapore, Singapore
	Fully Convolutional Network with Hypercolumn Features for Brain Lesion Segmentation
SNU-1^*	Seoul National University, Korea
SNU-2^*	Schemic Stroke Lesion Segmentation with Convolutional Neural Networks for Small Data
SU	Stanford University, USA
	Multi-scale Patch-wise 3D CNN for Ischemic Stroke Lesion Segmentation
UA	Universidad de los Andes, Colombia
	Volumetric Multimodality Neural Network For Ischemic Stroke Segmentation
UL	University of Luebeck, Germany
	2D Multi-Scale Res-Net for Stroke Segmentation
UM	Universito of Minho, Portugal
	Combining Clinical Information for Stroke Lesion Outcome Prediction using Deep Learning

These methods are variants of a single method.

Participants of ISLES 2017 (more details and main characteristic of each method see Appendix ISLES17-A1 to ISLES17-A14). These methods are variants of a single method. The access to the ISLES data remains open so that future research efforts can easily be compared against the existing benchmark.

1.3. Data and methods

1.3.1. Data acquisition and pre-processing

Subjects used for the database, were patients treated for acute ischemic stroke at the University Hospital of Bern or at the UMC Freiburg between 2005 and 2015. Diagnosis of ischemic stroke was performed by identification of lesions on DWI and PWI MR imaging. Digital subtraction angiography was employed to document proximal occlusion of the middle cerebral artery (M1 or M2 segment). Patient inclusion criteria considered: (I) Identification of ischemic stroke lesions on DWI and PWI imaging, (II) proximal occlusion of the middle cerebral artery (M1 or M2 segment) documented on digital subtraction angiography, (III) attempt for endovascular therapy was undertaken, either by intra-arterial thrombolysis (before 2010) or by mechanical thrombectomy (since 2010), (IV) no motion artifacts during pretreatment MR imaging, and (V) patients had a minimum age of 18 years at the time of stroke. Patients were excluded if they had undergone a purely diagnostic angiography and if stenosis or occlusion of the carotid artery were found. MR imaging was performed on a 1.5T (Siemens Magnetom Avanto), and on a 3T MRI system (Siemens Magnetom Trio). The stroke protocol encompassed whole brain DWI, (24 slices, thickness 5mm, repetition time 3,200ms, echo time 87ms, number of averages 2, matrix 256*256) yielding isotropic b0 and b1000 as well as apparent diffusion coefficient maps (ADC) that were calculated automatically. Additionally, a T2 image was acquired for each case, which was not released to participants but used later for the generation of the groundtruth lesion outcome delineations (section 1.3.2) For PWI, the standard dynamic-susceptibility contrast enhanced perfusion MRI (gradient-echo echo-planar imaging sequence, repetition time 1,410ms, echo time 30ms, field of view 230*230mm, voxel size: 1.8*1.8*5.0mm, slice thickness 5.0mm, 19 slices, 80 acquisitions) was acquired. PWI images were acquired during first pass of a standard bolus of 0.1mmol/kg gadobutrol (Gadovist, Bayer Schering Pharma, Berlin, Germany). Contrast medium was injected at a rate of 5ml/s followed by a 20ml bolus of saline at a rate of 5ml/s. Perfusion maps were obtained by block-circular singular value decomposition using the Olea-Sphere software v2.3(Olea Medical, La Ciotat). Raw PWI images were also released to participants in the form of a single 4D NifTI image, to allow teams interested in using a different parametric map reconstruction method. All PWI maps (rBF, rBV, TTP, Tmax, MTT) were rigidly registered to the ADC image and automatically skull-stripped (2) to extract the brain area only. We remark, this alignment step was performed to standardize the pre-processing step, hence, to factor out this pre-processing step from the evaluation of results. The cohort curated in 2016 was then extended into a larger dataset for the challenge in 2017. Table 3 summarizes the ISLES 2016 and 2017 dataset. Details of the ISLES 2016 & 2017 Data.

1.3.2. Groundtruth lesion outcome segmentation

The lesion outcome status was manually segmented by a board-certified neuroradiologist using 3D Slicer v4.5.0-1, and based on the 90-day follow-up T2 image. Regions of maximal extent of the final infarction, including haemorrhagic transformation but excluding hyper-intense areas on the acute T2 image (i.e., infarctions due to previous CVI), were delineated on every transversal slice. The 90-day follow-up lesion was chosen to be delineated, since it yields a more reliable final lesion volume than the apparent lesion volume that is observable on subacute images. Groundtruth images were converted into the NIfTI format for distribution to participants. For the 19 test cases of ISLES 2016, two lesion annotations were generated by individual raters, and subsequently merged via STAPLE algorithm (12).

1.3.3. Lesion characteristics

We performed a correlation analysis to assess a possible connection between clinical variables and the performance of the automated methods. The evaluation was conducted for ISLES 2017 submitted methods. Table 4 summarizes the collected information.

Table 4

Summary of lesion characteristics for ISLES 2017 Data.

Lesion count	mean [min, max] = 2.46[1, 14]
Lesion volume	mean [min, max] = 37.83ml[1.6ml, 160.4ml]
Lesion localisation in Lobes	for all 32 cases lesions were located in more than one lobe
Lesion localisation	n_subcortical=3, n_cortical=29
Involved territory	n_MCA=29, n_MCA+PCA = 1, n_multiple=1
Midline shift	not present for any of the 32 cases
Laterality	n_left=16, n_right=16
White matter lesions^*	n₀=9, n₁=10, n₂=8, n₃=5

n, number of cases with given feature; MCA, middle cerebral artery, PCA, posterior cerebral artery.

Fazekas Classification: 0, absent; 1, punctuate; 2, beginning confluent areas; 3, large confluence.

Summary of lesion characteristics for ISLES 2017 Data. n, number of cases with given feature; MCA, middle cerebral artery, PCA, posterior cerebral artery. Fazekas Classification: 0, absent; 1, punctuate; 2, beginning confluent areas; 3, large confluence.

1.3.4. Evaluation metrics

As quantitative evaluation metrics of the presented methods, we calculated the Dice score as a measure of overlap between manually outlined and automatically predicted lesions. To further shed light on the algorithm's effect we computed precision and sensitivity scores. With TP, true positives; FP, false positive and FN, false negative; the metrics were defined as followed: Alongside these, we measured the maximum surface distance between automatically defined volume and the manually delineated groundtruth volumes by means of the Hausdorff distance (HD). Denoting A and B as the surface voxels of groundtruth and segmentation volume, respectively, we calculated: As distance measure d(·, ·) we used the Euclidean distance. Additionally, the average symmetric surface distance (ASSD) was computed for ISLES 2016: with the average surface distance (ASD) defined as:

1.3.5. Ranking approach

In order to rank participant's submission for ISLES 2017, we focused on Dice score, as it combines both precision and sensitivity into one metric, and the HD metric. First, both measurements were computed for each patient data individually. Then, all participants were ranked for each metric separately on a case-wise basis such that a high Dice score and a low HD resulted in a high rank. The mean of both ranks yielded a case specific rank. A participant's total rank is obtained by averaging the ranks over all cases (see Figure 1). Ranks for ISLES 2016 were computed in the same way for both available groundtruths. Furthermore, ASSD was included alongside Dice and HD for ISLES 2016. In case where teams were not submitting all testing results, the Dice scores were completed with 0 and a large (i.e., 1e+5) value was set for HD. All unsuccessful segmentation (Dice = 0), were always ranked last. Segmentations with the exact same metrics received the same rank.

Figure 1

Ranking Scheme. The teams were sorted by their different performance metrics e.g., Dice score (DC) and assigned a rank value per case. Ranks for each team were then separately averaged on a case-wise basis. The final team's rank was then calculated as the mean of all its case-ranks.

1.3.6. Fusion and thresholding of softmax maps

Fusing the output of several classifiers has been shown to yield better results than the single classifiers. This concept is the foundation for ensemble learners, such as random forest (13), and has also been shown to be beneficial for tumor lesion segmentation (14, 15). In theory, each different model could provide valuable, complementary information to enhance the overall segmentation performance. All submitted methods for ISLES 2017 were deep neuronal networks. These include by design a final classification layer, which is commonly a softmax function that provides voxel-wise output values between [0, 1] (further referred to as softmax maps). This output can be interpreted as a probability of voxel belonging to its given class (in this case healthy or lesion tissue). To leverage potential benefit of several submitted models, we averaged the softmax maps of the top five and top three ranked methods for each individual case, followed by its thresholding at the 0.5 mark. Moreover, the softmax maps were thresholded at various levels and subsequently binarised in order to analyse the robustness of methods. Finally, the Dice score was computed between these binary images and the groundtruth.

1.3.7. Statistical analysis

To assess statistical differences between the submitted methods we applied a Friedman test, a non-parametric, one-way analysis of variances for repeated measurements, and post-hoc Dunn test for multiple comparison between teams. For all tests we used GraphPad PRISM Version 5.0.1. The levels of significant differences are marked in plots with asterisks (*p < 0.05, **p < 0.01, and ***p < 0.001).

2. Results

2.1. ISLES 2016

2.1.1. Inter-observer variance

The annotated volumes by rater 1 range from median [Q1, Q3] = 16.7 [6.1, 41.6] ml, and for rater 2 from median [Q1, Q3] = 9.0 [2.9, 36.8] ml, revealed the tendency of rater 1 having segmented more tissue as lesion than rater 2. In 18 out of 19 cases, rater 1 outlined larger lesion volumes, which holds true especially for rather small lesions. Comparing the overlap between manually outlined lesions of both raters yielded an average Dice score of 0.58 ± 0.20, with median [Q1, Q3] = 0.62 [0.39, 0.77]. The relative low coherence between the experts' annotations shows shows the difficulty of outlining the follow-up image.

2.1.2. Leaderboard and statistical analysis

Table 5 shows the ranking of the submitted methods. Only four (KR-SUC, CH-UBE, HK-CHU, PK-PNS) out of nine teams managed to get a successful lesion prediction (Dice > 0) for all 19 cases. The ranking reflects mostly the teams' Dice ranks, except for CH-UBE which was ranked on fourth place despite the second lowest average Dice score (not shown in table). This can be explained by the relative good HD (not shown in table) in comparison to the last ranked teams (see Table 5 places 7–9).

Table 5

Leaderboard ISLES 2016: The rank specifies the final value to order methods relative to each other by performance.

Place	Team	Rank	Dice rank	HD rank	ASSD rank	Cases
1	KR-SUL	3.03	3.37	2.79	2.92	18/19
2	KR-SUC	3.57	3.58	3.71	3.42	18/19
3	KR-SUK	3.82	3.74	4.13	3.61	19/19
4	CH-UBE	3.95	4.26	3.76	3.82	19/19
5	DE-UZL	4.21	4.21	3.82	4.61	19/19
6	UK-CVI	4.08	5.11	4.68	5.45	16/19
7	HK-CHU	5.59	5.08	5.53	6.16	19/19
8	PK-PNS	6.48	6.34	7.58	5.55	12/19
9	US-SFT	8.07	8.03	8.03	8.16	11/19

Dice, HD, and ASSD rank are the average achieved ranks for each participating team per case. The last column gives the number of successfully (Dice > 0) predicted lesions. Best mean values printed in bold.

Leaderboard ISLES 2016: The rank specifies the final value to order methods relative to each other by performance. Dice, HD, and ASSD rank are the average achieved ranks for each participating team per case. The last column gives the number of successfully (Dice > 0) predicted lesions. Best mean values printed in bold. Analysing the Dice scores across all methods showed that almost all methods are superior to that of US-SFT, which was ranked last. Only PK-PNS, which came second to last, was not found statistically different from US-SFT. The winning approaches (KR-SUC, KR-SUK, KR-SUL) achieved also significantly higher Dice scores than PK-PNS. All methods ranked in second cluster of groups(CH-UBE, DE-UZL, HK-CUH, UK-CVI) did not show statistically significant differences to one and another (see Figure 2).

Figure 2

Significant differences between the 9 submitted methods for ISLES 2016. Each node stands for one participating team. A connection between the nodes represents a significant difference between both lesion prediction models. Methods at the tail side of the arrow indicate superiority to the corresponding connected one. The stronger or weaker a model is the more outgoing or incoming connections (#outgoing/#incoming, respectively), are associated with a team's node. Additionally, the node's color saturation indicates the strength of a method (differences in Friedman test rank sum), with better methods appearing more saturated (i.e., darker blue). All methods, except for PK-PNS, are significantly better than US-SFT (post-hoc Dunn test p < 0.05).

Figure 3

Distribution of Dice scores computed between the automatic lesion predictions and both groundtruths (GT1 and GT2) individually for ISLES 2016. For all teams the Dice scores computed with respect to rater 1 were significantly lower than those calculated with respect to the 2nd groundtruth (GT2).

2.2. ISLES 2017

2.2.1. Leaderboard

Only one (SU) of the 15 teams was able to predict stroke lesions (Dice > 0) consistently for all 32 cases. Examining the average Dice and HD rank for each time separately, revealed that the second ranking team (UL) yielded a lower Dice rank than the following two teams (i.e., HKU-1 and INESC). However, UL achieved the best HD rank, which secured its second place (see Table 6).

Table 6

Leaderboard ISLES 2017: While the rank denotes the final value used to sort the teams performances relative to each other.

Place	Team	Rank	Dice rank	HD rank	Cases
1	SNU-2	5.25	4.53	5.97	30/32
2	UL	5.42	6.16	4.69	29/32
3	HKU-1	5.55	5.09	6.00	29/32
4	INESC	5.92	5.00	6.84	31/32
5	KUL	6.03	6.19	5.88	30/32
6	SNU-1	6.47	6.25	6.69	29/32
7	UM	6.58	6.31	6.84	31/32
8	MIPT	6.72	6.34	7.09	30/32
9	SU	7.20	7.09	7.31	32/32
10	KU	8.75	10.09	7.41	28/32
11	AAMC	9.05	8.63	9.47	27/32
12	UA	9.78	9.31	10.25	29/32
13	NUS	9.95	9.50	10.41	29/32
14	NEU	10.44	11.88	9.00	16/32
15	HKU-2	11.80	12.50	11.09	14/32

Dice and HD rank are the average achieved ranks for each participating team. The cases column denotes the number of successfully (DC > 0) predicted lesions. Best mean values printed in bold.

Leaderboard ISLES 2017: While the rank denotes the final value used to sort the teams performances relative to each other. Dice and HD rank are the average achieved ranks for each participating team. The cases column denotes the number of successfully (DC > 0) predicted lesions. Best mean values printed in bold.

2.2.2. Dice, precision and sensitivity

Table 7 summarizes the participating teams' performance, measured by Dice score, precision and sensitivity, highlighting the strengths of different models. Team KUL's model was the most precise while showing lower sensitivity. AAMC's model showed the highest sensitivity while lacking in precision. Although HKU-1 achieved the highest mean Dice score, it was ranked third seemingly due to a lower HD rank (compare Table 6). Even top ranking models reached a low average Dice score of around 0.3, underlining the substantial difficulty of lesion forecasting.

Table 7

Average Dice score, precision and sensitivity for individual teams across all 32 cases for ISLES 2017.

Place	Team	Dice	Precision	Sensitivity
1	SNU-2	0.31 ± 0.23	0.36 ± 0.27	0.45 ± 0.31
2	UL	0.29 ± 0.21	0.34 ± 0.26	0.51 ± 0.33
3	HKU-1	0.32 ± 0.23	0.34 ± 0.27	0.39 ± 0.28
4	INESC	0.30 ± 0.22	0.34 ± 0.27	0.51 ± 0.31
5	KUL	0.27 ± 0.22	0.44 ± 0.33	0.39 ± 0.31
6	SNU-1	0.28 ± 0.23	0.36 ± 0.31	0.41 ± 0.31
7	UM	0.29 ± 0.22	0.26 ± 0.24	0.61 ± 0.28
8	MIPT	0.27 ± 0.20	0.31 ± 0.28	0.39 ± 0.29
9	SU	0.26 ± 0.21	0.28 ± 0.25	0.56 ± 0.26
10	KU	0.17 ± 0.16	0.23 ± 0.28	0.36 ± 0.33
11	AAMC	0.23 ± 0.22	0.19 ± 0.20	0.62 ± 0.37
12	UA	0.19 ± 0.16	0.27 ± 0.25	0.21 ± 0.18
13	NUS	0.19 ± 0.16	0.29 ± 0.26	0.23 ± 0.22
14	NEU	0.11 ± 0.16	0.17 ± 0.25	0.12 ± 0.17
15	HKU-2	0.05 ± 0.10	0.17 ± 0.28	0.05 ± 0.11

All evaluation measures are given in mean ± standard deviation. Best mean values printed in bold.

Average Dice score, precision and sensitivity for individual teams across all 32 cases for ISLES 2017. All evaluation measures are given in mean ± standard deviation. Best mean values printed in bold. Analysing the Dice score per case disclosed a wide range of quality of lesion outcome prediction. While there are a few cases (28–32) where the average Dice score was above 0.5, the majority of cases turned out to be hard to predict. For 14 cases at least one team achieved a prediction that was overlapping with the groundtruth by 50% (Figure 5). For six cases (1–5, 9) none of the teams reached the overall mean Dice score (0.23).

Figure 5

Achieved Dice scores for each case across all 15 participating teams sorted by mean value. The dashed line shows the overall mean Dice score of 0.23 (red) and the 0.5 mark (black). Note that the case numbers were assigned according to ascending mean Dice score.

Performance metrics for all teams of ISLES 2017. Higher ranking teams (e.g., 1st place SNU-2) achieved Dice scores > 0.7 for some cases, however, overall Dice scores clustered around 0.2–0.3. The two teams ranked last (NEU and HKU-2) showed much lower Dice scores than all other teams, which was a consequence of the low number of successful submissions. The model of UM seemed to be most sensitive to detect lesions, but lacks in precision. Achieved Dice scores for each case across all 15 participating teams sorted by mean value. The dashed line shows the overall mean Dice score of 0.23 (red) and the 0.5 mark (black). Note that the case numbers were assigned according to ascending mean Dice score.

2.2.3. Statistical comparison of team performances

Figure 6 shows the comparison of the team's Dice scores on the test data set. Each method, represented as node, connects to other methods when a statistical differences in the Dice scores was found. Methods associated to nodes with more outgoing and less incoming connections can be considered stronger than other with less outgoing or more incoming connections. The nodes for stronger models were further grouped and indicated by a more saturated color. This visually highlights the winning team SNU-2 that showed overall higher Dice scores for the prediction lesions than the other six teams, while none of the other methods were significantly better. This is closely followed by HKU-1 and INESC having each five outgoing edges. The two worst methods (NEU, HKU-2) failed to predict the lesions for several subjects completely, resulting in poor performance inferior to most teams (9 and 10 respectively).

Figure 6

Significant differences between the 15 submitted methods at ISLES 2017. Each node stands for one participating team. A connection between two nodes represents a significant difference between both lesion prediction models, whereas the methods at the tail side was superior. The stronger or weaker a models the more outgoing or incoming connections (#outgoing/#incoming), are associated with a team's node. Additionally, the node color saturation indicates the strength of a method, with better methods appearing more saturated. Differences between methods were assessed via non-parametric ANOVA with repeated measurements (Friedman test) and subsequent, pair-wise comparison with Dunn test (p < 0.05).

2.2.4. Performance of single models vs. ensembles

As mentioned in section 1.3.6 we fused the softmax maps to create an ensemble of the top five (E5 = SNU-1, SNU-2, UL, INESC, KUL) and top three (E3 = SNU-2, UL, INESC) ranking teams and compared both ensembles to their individual models. All included models had no significantly different Dice score distributions in comparison to each other (see Figure 6). Figure 7 shows that Dice scores of both ensembles were very similarly distributed as the single models' Dice scores. Ensemble E3 did not result in an improved performance, although the median Dice score (0.28) was higher in comparison to ensemble E5 (0.25) and to the winning team SNU-2 (0.26). Likewise, its mean precision was higher (0.34), although not statistically significant, than most single models (SNU-1, SNU-2, UL, INESC). However, the mean sensitivity of E3 (0.51) could be raised over the one from SNU-1 (0.44).

Figure 7

Statistical comparison of lesion prediction performance of single models vs. ensembles. Left: An ensemble of five models (E5) could improve the Dice score in comparison to the two weaker models (SNU-1 p < 0.01, UL p < 0.05). This effect was, however, not observed when building an ensemble with three models (E3). Middle: The ensemble E5 significantly gained precision in contrast to most of the single models (SNU-1 p < 0.01, SNU-2 p < 0.05, UL p < 0.001, INESC p < 0.01). KUL's precision was higher or similar to that of the ensembles, showing no significant difference. Right: The ensemble E3 was found to be more sensitive to predict lesion than SNU-1's model. Overall the models show a fair ability to detect lesions. *p < 0.05, **p < 0.01, and ***p < 0.001. In contrast, ensemble E5 yielded a significantly better mean Dice score (0.31) than UL (0.28, p < 0.05) and SNU-1 (0.26, p < 0.01). Among the five teams, whose models were used to build the ensemble, SNU-1 was ranked the lowest, explaining why E5 performed significantly better that SNU-1 by itself. While the ensemble's sensitivity was not improved, combining all softmax maps together significantly increased the precision over four single models (p < 0.01, INES, SNU-1, SNU-2, UL). Figure 8 displays an example of the different participants' softmax maps as well as the fused softmax maps of both ensembles (E3 & E5). While softmax maps from INESC and SNU-2 showed similar certainty values through out the predicted lesion, the other three teams' softmax maps appeared to be more heterogeneous. In contrast to the smooth an blob-like structures predicted by SNU-1, SNU-2, INESC and KUL, UL's model provided a greater detail for boundaries. This is also cohesive with the findings, that UL has the highest HD rank (see Table 6) as this metric is considering closeness of boundaries. Dice scores of the lesion predictions for this particular patient could not be improved by ensembles (Dice = 0.76, Dice = 0.73) in comparison to the single teams (Dice = 0.76, Dice = 0.74, Dice = 0.60, Dice = 0.70, Dice = 0.69).

Figure 8

Example of different softmax maps of one patient. Top row: Diffusion (ADC) and perfusion (TTP) scan and the corresponding manual lesion annotation (LABEL) and the softmax maps of the ensembles of the top five (E5) and top three (E3) ranked teams. Bottom row: Softmax maps of the top five ranking teams. Both shape and certainty (see color bar) of the predicted lesion vary between the different participants.

2.2.5. Analysis of robustness of lesiron outcome prediction

We computed Dice scores between the manually outlined lesion groundtruth and differently thresholded and binarised softmax maps for the top five ranking teams. For four teams (SNU-2, UL, INESC & SNU-1) the Dice scores seemed to be fairly robust and centered around the initial threshold of 0.5. SNU-2's and INESC's prediction vary only in about 1 percentage point for different threshold values (see Appendix: Table A1). As an exception, KUL's softmax layer thresholded at a lower level of 0.3 resulted in a higher Dice score (0.28) compared to the the lower Dice (0.26) at a threshold level of 0.5. This effect is coherent with previous findings (see Table 7 and Figure 4) that KUL's produces highly precise predictions with relative low sensitivity. Thresholding at a lower level could assign more voxels to the lesion class, hence increased the model's sensitivity and effectively improve Dice scores.

Table A1

Dice score dependency of threshold for softmax maps.

	Thresholds
Team	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
INESC	0.28	0.29	0.29	0.29	0.29	0.29	0.29	0.29	0.29
KUL	0.22	0.26	0.28	0.27	0.26	0.23	0.20	0.15	0.02
SNU-1	0.20	0.23	0.25	0.26	0.26	0.27	0.23	0.20	0.16
SNU-2	0.29	0.29	0.30	0.30	0.30	0.30	0.30	0.30	0.30
UL	0.19	0.24	0.26	0.27	0.28	0.28	0.27	0.25	0.21

Figure 4

Dice score dependency of threshold for softmax maps. Overview of methods of participants of ISLES 2016. Overview of methods of participants of ISLES 2017.

2.2.6. Correlation of lesion volumes

When comparing predicted lesion volumes with the manually outlined lesion volumes for the top five ranked teams as mentioned in section 2.2.4, we found a significant correlation only for SNU-1 (Spearman coefficient r = 0.39) and for SNU-2 (Spearman coefficient r = 0.37). All other teams submission and the ensembles did not correlate with the human rater's annotations, with Spearman coefficients ranging from 0.28 (UL) to 0.35 (E5). As expected, the Dice scores of all models correlated significantly with the lesion volumes, such that the higher the volume the higher the Dice scores. Spearman coefficients were highest for UL (0.72), INESC (0.71) and E3 (0.70), and lowest for KUL (0.41) and SNU-1 (0.55). Mid-range Spearman coefficients were found for SNU-2 (0.59) and E5 (0.68).

3. Discussion

3.1. Current performance of stroke lesion outcome prediction methods

In ISLES 2016, results showed that deep learning models outperformed Random Classification Forests (RF). However, no conclusive superiority of deep learning was found against other machine learning approaches, as demonstrated by CNN-based approaches also ranking in the low tier ranks. Analysing precision and sensitivity revealed the tendency of models to yield over-estimated lesion segmentations. The large variability within the assessed metrics could be explained by the strong correlation between performance and lesion sizes. Discussions during the ISLES 2016 session led to the decision to enrich the existing ISLES dataset to further encourage participation of the computer science community. Especially, data driven approaches such as deep learning algorithms could truly benefit from larger data sets. Consequently, in ISLES 2017 the training and testing dataset were extended versions of the training and testing sets used in ISLES 2016. For both years, data were provided in minimally pre-processed format. This should should allow a more direct comparison of different stroke prediction models, without the influence of any specific pre-processing steps. Of course advanced processing could foster the tissue outcome prediction, however we argue that our focus for the challenges lies on the model development. Furthermore, the applied pre-processing steps were kept to a minimum and are commonly accepted techniques, such as co-registration. This did not prevent participants to further process the provided data. Although teams also had partly access to raw data (i.e., raw perfusion data), all of them preferred to work with the pre-processed data. All participating teams of ISLES 2017 suggested a deep learning approach, with top ranked methods featuring CNN architectures. Despite the increased size of the training data, the overall performance was surprisingly not much different than for ISLES 2016. Top ranked models were found to operate on a similar level, sharing similar architectures and system characteristics. Even ensembles of different CNNs were not strong enough to boost the performance further. These results suggest that CNNs' performance may have reached a plateau on this dataset. Future investigation need to strongly focus on improved training strategies for CNNs or on development of new methodologies to advance stroke lesion outcome prediction. Enhancing the performance especially for small sized lesions and incorporating non-imaging information could bear a strong potential for improvement. It has been shown that ensemble approaches or fusion of results can improve segmentation predictions (14, 15). Our findings suggest that the ensemble approaches had a tendency to perform better than single models. Despite the unimproved sensitivity of the ensembles, combining all softmax maps together significantly increased the precision over four single models. This suggests a reduction of false positive predictions. However the effect was not strong enough to result in statistical improvement over the highest ranked single method. It was also not entirely clear which model contributed to enhance or worsen the performance. In fact, the submissions for ISLES 2017 included single as well as ensembles of neural networks, but the ranking did not reflect an overall superiority of ensemble methods. Although the combination of several weak classifiers can cancel out individual model's limitations, it is nonetheless important to build an ensemble of strong methods to leverage benefits and justify increased computational costs of an ensemble based approaches. Examining each participating team's softmax maps was motivated to analyse their potential to describe their correctness and certainty to perform the task. As these models are intended to provide a prediction of stroke lesion outcome, we postulate that model calibration is an important aspect for future analysis of deep learning models used for stroke lesion prediction. Particularly, it will have to be investigated how model capacity, regularization and normalization can affect model's calibration, despite apparent increases in model's accuracy (16). Our findings support the use of different ranking metrics and align with the findings reported in Maier et al. (2). For example the team UL was ranked second in ISLES 2017 thanks to its top HD rank, despite being assigned a relative low Dice rank of 6.16, which would equate the fourth place on the leaderboard. Overall, the difficulty of the task is reflected by the low Dice scores, with top methods averaging a Dice rank of 0.3. The low Dice scores of the models can be explained by the inherent challenges of the prediction task. Contrary to stroke lesion segmentation, stroke lesion outcome models are trained to predict the lesion status at a 90-days follow-up image based on the acute imaging information. Inherently, many factors contribute to tissue recovery or infarction, which are not explicitly nor implicitly characterized in the imaging information acquired at time of the stroke infarct.

3.2. Limitations and remaining challenges

Looking at the evolution of ISLES over the past 3 years, a clear convergence of methodology is observable. While for ISLES 2015 and 2016, still classic machine learning models, such as RF were explored, all submissions of ISLES 2017 offered a variation of CNNs. With their undeniable benefits and success, deep learning methods have set new state-of-the-art benchmarks in many disciplines. Although at present time, this would be the sensible direction to develop further techniques for stroke lesion segmentation and outcome prediction, future challenges will need to encourage exploration of more diverse models. Particularly, we remark the importance of designing methodologies capable of incorporating clinical and physiological prior information on stroke infarction and recovery. The comparison of the automatic lesion outcome prediction with both expert annotations separately (ISLES 2016) showed a systematic bias toward a higher accordance with rater 2. While this emphasizes the importance of a common database to compare algorithms, it also unveils the general underlying dilemma of supervised learning methods and the intrinsic inter-rater variability observed in medical imaging applications. In best case, algorithms that learn solely from human annotation will only ever be as good as the best human rater and inevitable learn humans' fallacy. Overcoming this limitation calls for semi- and unsupervised learning techniques to teach the computer to detect abnormal brain tissue more accurately, as well as to consider inter-rater variability as source of information during the learning process (17). Nonetheless, a fair and consistent evaluation of such methods has yet to be established. Furthermore, our evaluation is challenged by the different levels of expertise in each team. Although there is a clear tendency that CNNs provide overall better results than RF, some CNNs were ranked lowest. This rather suggest potential deficiencies in the training scheme than a deficiency of this model class in general. Another challenge is the interpretability of the output of the applied models. Although models are desired to predict lesions with high precision and confidence level, there may lay valuable information in a models uncertainty for clinical decision making. Regarding lesion outcome prediction, uncertainty could give for example a better indicator of tissue at risk of infarction (e.g., naively thought: high certainty means high risk of becoming lesion tissue, while low certainty may reflect tissue likely to be healthy in future). For future challenges we recommend to ask teams to submit non-binary output maps (e.g., softmax maps) that support such analysis. Most methods work indeed best when incorporating multi-parametric information, however, the database will need to be explored, as in Pereira et al. (18) to gain knowledge on which MR sequences are important and to what extent.

4. Conclusion

Over the past years, the ISLES team was able to build an increasingly larger MRI database for ischemic stroke lesion MRI. With this publicly available dataset and a continuously open evaluation system, ISLES has the potential to serve as a standard benchmark framework, where researchers can test their algorithms against an existing pool of described and compared methods (14 ISLES 2015 methods for lesion segmentation, and 28 ISLES 2015 & 2016 and 2017 methods for lesion outcome prediction). Despite the great efforts and accomplishments present at ISLES, automatic segmentation of stroke lesions, and more so lesion outcome prediction remain challenging tasks. Deep learning approaches have great potential to leverage clinical routine for stroke lesion patients, but last years of progress at ISLES indicate that further developments are needed to support clinical decision making by incorporating imaging and readily-available non-imaging clinical information, collateral flow modeling, and further improve the interpretability of deep learning systems used for the clinical decision making process of stroke patients.

Ethics statement

All datasets were fully anonymised through skull-stripping and removal of all patient informations by means of conversion of dice files to nifty volumetric files following the regulations of the Swiss Law for Human Research. Further information added below for sake of completeness (In German). Anonymisierung: Unter anonymisiertem biologischem Material und anonymisierten gesundheitsbezogenen Daten ist die irreversible Aufhebung des Personenbezuges zu verstehen. Eine solche liegt dann vor, wenn Material bzw. Datenüberhaupt nicht oder nur mit einem un-verhältnismässig grossen Aufwand an Zeit, Kosten und Arbeitskraft der betreffenden Person zugeordnet werden können (vgl. Art. 3 Bst. i HFG und Art. 25 Abs. 1 HFV). Wann den Anforde-rungen an eine korrekte Anonymisierung Genüge getan ist, ist je nach Einzelfall zu entschei-den: Die Streichung nur des Namens kann bei einer sehr grossen Datenmenge (grosse Perso-nenpopulation) genügen, auch wenn andere Parameter (z.B. Geburtsjahr) verbleiben. Ist die betroffene Population jedoch sehr klein, so ist das Entfernen nur des Namens nicht ausreichend (vgl. Botschaft zum HFG, S. 8096). Insbesondere unkenntlich gemacht oder gelöscht wer-den müssen Namen, Adresse, Geburtsdatum und eindeutig kennzeichnende Identifikati-onsnummern (Art. 25 Abs. 2 HFV). Das im ursprünglichen Art. 14 HFG (vgl. Botschaft zum HFG, S. 8105) vorgesehene Ver-bot der Anonymisierung von biologischem Material bzw. Personendaten bei Forschungsprojek-ten mit Bezug zu schweren Krankheiten wurde auf Antrag der vorberatenden Kommission vom Nationalrat gestrichen (vgl. Amtliches Bulletin des Nationalrats, 09.079, Verhandlung vom 10.03.2011). Hintergrund war vermutungsweise das in Art. 32 Abs. 3 HFG festgelegte Informa-tions- und Widerspruchsrecht der Patienten bei Forschung mit anonymisiertem biologischen Material und genetischen Daten. Dadurch sind die Patienten nämlich ausreichend geschützt, ein zusästzliches Verbot schien vor diesem Hintergrund wohl obsolet. Mit Streichung des ur-sprünglichen Artikels 14 HFG ist die Forschung mit anonymisiertem biologischem Material also auch bei Forschungsprojekten mit Bezug zu schweren Krankheiten zulässig, sofern die betroffenen Personen vorgängig korrekt informiert und auf ihr Widerspruchsrecht hingewiesen wurden.

Author contributions

SW, AH, and MR collected data, performed analysis, wrote manuscript. RM, RW, JP, VA, CS, MP, MB, EK, MM, AO, YC, YK, MP, BK, J-HW, MI, HR, DR, PS, YN, EG, JX, JMP, GZ, EK, CL, MH, LC, PA, AB, KC, JB, JK-C, LR, LD and OM performed analysis, wrote manuscript.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Table A2

Overview of methods of participants of ISLES 2016.

CH-UBE	Random Forest classifier integrating time to reperfusion
DE-UZL	Random Forests classifier
HK-CUH	U-Net architecture; summation instead of concatenation of different pathways
KR-SUC	Ensemble of U-Net architecture and fully convolutional neural network
KR-SUK
KR-SUL
PK-PNS	Random Forest classifier
UK-CVI	Combination of CNN and hand-crafted features
US-SFT	U-Net architecture

Table A3

Overview of methods of participants of ISLES 2017.

AAMC	3D CNN U-Net architecture; increased number of layers and convolutional filter, multiple down-sampling path ways; anisotropic patch size of 16 × 16 × 4; prediction of 16 overlapping patches per voxels, that are averaged. Morphological operations to reduce small clusters of erroneous predictions
HKU-1	U-Net architecture, including data augmentation and batch normalization, adversarial training of two deep neural networks to avoid over-fitting
HKU-2	3D CNN U-Net architecture; long short-term memory (LSTM) to capture information in 3rd dimension of MRI scans; data augmentation
INESC	V-Net architecture; new loss-function: sum of standard cross-entropy loss and dice-loss
KU	Hierarchy of 2 CNNs. 1st CNN discriminates lesion and healthy tissue, 2nd CNN only acts up on voxels where the 1st CNN was uncertain; auto-context (use of probability maps from 1st CNN)
KUL	U-net architecture; data augmentation via x-axis flip, Gaussian noise and small linear intensity transformations; ensemble of 4 networks; suppression of prediction in non-dominant hemisphere
MIPT	Ensemble of E-Net, DeepMedic, and two U-Nets; 2D and 3D architectures; weighted sum of models' predictions; data augmentation: rotation, flips, registration, and elastic co-registration to template
NEU	Combination 3D U-Net and densely connected CNN; refinement with CRF
NUS	PixelNet applied to lesion outcome prediction
SNU-1/SNU-2	Ensemble of three CNNs: U-Net, DeepMedic, pyramid scene parsing network; negative Dice score loss
SU	3D CNN with 2 scale pathways; data augmentation through rigid transformations, weighted ratios on positive and negative labels
UA	CNN with 4 scale pathways
UL	2D U-Net with skip connections; Dice loss is added up to total loss; inversely weighted loss to tackle class imbalance
UM	2D U-Net in combination with clinical information

14 in total

Review 1. A survey on deep learning in medical image analysis.

Authors: Geert Litjens; Thijs Kooi; Babak Ehteshami Bejnordi; Arnaud Arindra Adiyoso Setio; Francesco Ciompi; Mohsen Ghafoorian; Jeroen A W M van der Laak; Bram van Ginneken; Clara I Sánchez
Journal: Med Image Anal Date: 2017-07-26 Impact factor: 8.545

2. Enhancing interpretability of automatically extracted machine learning features: application to a RBM-Random Forest system on brain lesion segmentation.

Authors: Sérgio Pereira; Raphael Meier; Richard McKinley; Roland Wiest; Victor Alves; Carlos A Silva; Mauricio Reyes
Journal: Med Image Anal Date: 2017-12-20 Impact factor: 8.545

3. Fully automated stroke tissue estimation using random forest classifiers (FASTER).

Authors: Richard McKinley; Levin Häni; Jan Gralla; M El-Koussy; S Bauer; M Arnold; U Fischer; S Jung; Kaspar Mattmann; Mauricio Reyes; Roland Wiest
Journal: J Cereb Blood Flow Metab Date: 2016-01-01 Impact factor: 6.200

4. Extra tree forests for sub-acute ischemic stroke lesion segmentation in MR sequences.

Authors: Oskar Maier; Matthias Wilms; Janina von der Gablentz; Ulrike M Krämer; Thomas F Münte; Heinz Handels
Journal: J Neurosci Methods Date: 2014-11-21 Impact factor: 2.390

5. Brain tumor segmentation with Deep Neural Networks.

Authors: Mohammad Havaei; Axel Davy; David Warde-Farley; Antoine Biard; Aaron Courville; Yoshua Bengio; Chris Pal; Pierre-Marc Jodoin; Hugo Larochelle
Journal: Med Image Anal Date: 2016-05-19 Impact factor: 8.545

6. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation.

Authors: Konstantinos Kamnitsas; Christian Ledig; Virginia F J Newcombe; Joanna P Simpson; Andrew D Kane; David K Menon; Daniel Rueckert; Ben Glocker
Journal: Med Image Anal Date: 2016-10-29 Impact factor: 8.545

7. ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI.

Authors: Bjoern H Menze; Heinz Handels; Mauricio Reyes; Oskar Maier; Janina von der Gablentz; Levin Ḧani; Mattias P Heinrich; Matthias Liebrand; Stefan Winzeck; Abdul Basit; Paul Bentley; Liang Chen; Daan Christiaens; Francis Dutil; Karl Egger; Chaolu Feng; Ben Glocker; Michael Götz; Tom Haeck; Hanna-Leena Halme; Mohammad Havaei; Khan M Iftekharuddin; Pierre-Marc Jodoin; Konstantinos Kamnitsas; Elias Kellner; Antti Korvenoja; Hugo Larochelle; Christian Ledig; Jia-Hong Lee; Frederik Maes; Qaiser Mahmood; Klaus H Maier-Hein; Richard McKinley; John Muschelli; Chris Pal; Linmin Pei; Janaki Raman Rangarajan; Syed M S Reza; David Robben; Daniel Rueckert; Eero Salli; Paul Suetens; Ching-Wei Wang; Matthias Wilms; Jan S Kirschke; Ulrike M Kr Amer; Thomas F Münte; Peter Schramm; Roland Wiest
Journal: Med Image Anal Date: 2016-07-21 Impact factor: 8.545

8. The Insight ToolKit image registration framework.

Authors: Brian B Avants; Nicholas J Tustison; Michael Stauffer; Gang Song; Baohua Wu; James C Gee
Journal: Front Neuroinform Date: 2014-04-28 Impact factor: 4.081

9. White matter hyperintensity and stroke lesion segmentation and differentiation using convolutional neural networks.

Authors: R Guerrero; C Qin; O Oktay; C Bowles; L Chen; R Joules; R Wolz; M C Valdés-Hernández; D A Dickie; J Wardlaw; D Rueckert
Journal: Neuroimage Clin Date: 2017-12-20 Impact factor: 4.881

10. The virtual skeleton database: an open access repository for biomedical research and collaboration.

Authors: Michael Kistler; Serena Bonaretti; Marcel Pfahrer; Roman Niklaus; Philippe Büchler
Journal: J Med Internet Res Date: 2013-11-12 Impact factor: 5.428

24 in total

1. Benchmark on Automatic 6-month-old Infant Brain Segmentation Algorithms: The iSeg-2017 Challenge.

Authors: Li Wang; Dong Nie; Guannan Li; Elodie Puybareau; Jose Dolz; Qian Zhang; Fan Wang; Jing Xia; Zhengwang Wu; Jiawei Chen; Kim-Han Thung; Toan Duc Bui; Jitae Shin; Guodong Zeng; Guoyan Zheng; Vladimir S Fonov; Andrew Doyle; Yongchao Xu; Pim Moeskops; Josien P W Pluim; Christian Desrosiers; Ismail Ben Ayed; Gerard Sanroma; Oualid M Benkarim; Adria Casamitjana; Veronica Vilaplana; Weili Lin; Gang Li; Dinggang Shen
Journal: IEEE Trans Med Imaging Date: 2019-02-27 Impact factor: 10.048

2. A Machine Learning Approach to Predict Acute Ischemic Stroke Thrombectomy Reperfusion using Discriminative MR Image Features.

Authors: Haoyue Zhang; Jennifer Polson; Kambiz Nael; Noriko Salamon; Bryan Yoo; William Speier; Corey Arnold
Journal: IEEE EMBS Int Conf Biomed Health Inform Date: 2021-08-10

3. A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms.

Authors: Sook-Lei Liew; Bethany P Lo; Miranda R Donnelly; Artemis Zavaliangos-Petropulu; Jessica N Jeong; Giuseppe Barisano; Alexandre Hutton; Julia P Simon; Julia M Juliano; Anisha Suri; Zhizhuo Wang; Aisha Abdullah; Jun Kim; Tyler Ard; Nerisa Banaj; Michael R Borich; Lara A Boyd; Amy Brodtmann; Cathrin M Buetefisch; Lei Cao; Jessica M Cassidy; Valentina Ciullo; Adriana B Conforto; Steven C Cramer; Rosalia Dacosta-Aguayo; Ezequiel de la Rosa; Martin Domin; Adrienne N Dula; Wuwei Feng; Alexandre R Franco; Fatemeh Geranmayeh; Alexandre Gramfort; Chris M Gregory; Colleen A Hanlon; Brenton G Hordacre; Steven A Kautz; Mohamed Salah Khlif; Hosung Kim; Jan S Kirschke; Jingchun Liu; Martin Lotze; Bradley J MacIntosh; Maria Mataró; Feroze B Mohamed; Jan E Nordvik; Gilsoon Park; Amy Pienta; Fabrizio Piras; Shane M Redman; Kate P Revill; Mauricio Reyes; Andrew D Robertson; Na Jin Seo; Surjo R Soekadar; Gianfranco Spalletta; Alison Sweet; Maria Telenczuk; Gregory Thielman; Lars T Westlye; Carolee J Winstein; George F Wittenberg; Kristin A Wong; Chunshui Yu
Journal: Sci Data Date: 2022-06-16 Impact factor: 8.501

4. DeepNeuro: an open-source deep learning toolbox for neuroimaging.

Authors: Andrew Beers; James Brown; Ken Chang; Katharina Hoebel; Jay Patel; K Ina Ly; Sara M Tolaney; Priscilla Brastianos; Bruce Rosen; Elizabeth R Gerstner; Jayashree Kalpathy-Cramer
Journal: Neuroinformatics Date: 2021-01

5. Prediction of Clinical Outcome in Patients with Large-Vessel Acute Ischemic Stroke: Performance of Machine Learning versus SPAN-100.

Authors: B Jiang; G Zhu; Y Xie; J J Heit; H Chen; Y Li; V Ding; A Eskandari; P Michel; G Zaharchuk; M Wintermark
Journal: AJNR Am J Neuroradiol Date: 2021-01-07 Impact factor: 3.825

6. Prediction of Stroke Infarct Growth Rates by Baseline Perfusion Imaging.

Authors: Anke Wouters; David Robben; Soren Christensen; Henk A Marquering; Yvo B W E M Roos; Robert J van Oostenbrugge; Wim H van Zwam; Diederik W J Dippel; Charles B L M Majoie; Wouter J Schonewille; Aad van der Lugt; Maarten Lansberg; Gregory W Albers; Paul Suetens; Robin Lemmens
Journal: Stroke Date: 2021-09-30 Impact factor: 7.914

7. Learning to Predict Ischemic Stroke Growth on Acute CT Perfusion Data by Interpolating Low-Dimensional Shape Representations.

Authors: Christian Lucas; André Kemmling; Nassim Bouteldja; Linda F Aulmann; Amir Madany Mamlouk; Mattias P Heinrich
Journal: Front Neurol Date: 2018-11-26 Impact factor: 4.003

Review 8. Artificial Intelligence and Acute Stroke Imaging.

Authors: J E Soun; D S Chow; M Nagamine; R S Takhtawala; C G Filippi; W Yu; P D Chang
Journal: AJNR Am J Neuroradiol Date: 2020-11-26 Impact factor: 3.825

9. Predicting Infarct Core From Computed Tomography Perfusion in Acute Ischemia With Machine Learning: Lessons From the ISLES Challenge.

Authors: Arsany Hakim; Søren Christensen; Mauricio Reyes; Greg Zaharchuk; Stefan Winzeck; Maarten G Lansberg; Mark W Parsons; Christian Lucas; David Robben; Roland Wiest
Journal: Stroke Date: 2021-05-07 Impact factor: 7.914

10. Tissue outcome prediction in hyperacute ischemic stroke: Comparison of machine learning models.

Authors: Joseph Benzakoun; Sylvain Charron; Guillaume Turc; Wagih Ben Hassen; Laurence Legrand; Grégoire Boulouis; Olivier Naggara; Jean-Claude Baron; Bertrand Thirion; Catherine Oppenheim
Journal: J Cereb Blood Flow Metab Date: 2021-06-23 Impact factor: 6.960