
Systematic comparison of different techniques to measure hippocampal subfield volumes in ADNI2.

Susanne G Mueller¹, Paul A Yushkevich², Sandhitsu Das², Lei Wang³, Koen Van Leemput⁴, Juan Eugenio Iglesias⁵, Kate Alpert³, Adam Mezher⁶, Peter Ng⁶, Katrina Paz⁶, Michael W Weiner⁷.

Abstract

Objective: Subfield-specific measurements provide superior information in the early stages of neurodegenerative diseases compared to global hippocampal measurements. The overall goal was to systematically compare the performance of five representative manual and automated T1 and T2 based subfield labeling techniques in a sub-set of the ADNI2 population.
Methods: The high resolution T2 weighted hippocampal images (T2-HighRes) and the corresponding T1 images from 106 ADNI2 subjects (41 controls, 57 MCI, 8 AD) were processed as follows. A. T1-based: 1. Freesurfer + Large-Diffeomorphic-Metric-Mapping in combination with shape analysis. 2. FreeSurfer 5.1 subfields using in-vivo atlas. B. T2-HighRes: 1. Model-based subfield segmentation using ex-vivo atlas (FreeSurfer 6.0). 2. T2-based automated multi-atlas segmentation combined with similarity-weighted voting (ASHS). 3. Manual subfield parcellation. Multiple regression analyses were used to calculate effect sizes (ES) for group, amyloid positivity in controls, and associations with cognitive/memory performance for each approach.
Results: Subfield volumetry was better than whole hippocampal volumetry for the detection of the mild atrophy differences between controls and MCI (ES: 0.27 vs 0.11). T2-HighRes approaches outperformed T1 approaches for the detection of early stage atrophy (ES: 0.27 vs 0.10), amyloid positivity (ES: 0.11 vs 0.04), and cognitive associations (ES: 0.22 vs 0.19).
Conclusions: T2-HighRes subfield approaches outperformed whole hippocampus and T1 subfield approaches. None of the different T2-HighRes methods tested had a clear advantage over the other methods. Each has strengths and weaknesses that need to be taken into account when deciding which one to use to get the best results from subfield volumetry.

Year: 2017 PMID: 29527502 PMCID: PMC5842756 DOI: 10.1016/j.nicl.2017.12.036

Source DB: PubMed Journal: Neuroimage Clin ISSN: 2213-1582 Impact factor: 4.881

Introduction

Hippocampal neuronal and/or glial dysfunction or neuronal loss severe enough to cause hippocampal volume loss in quantitative magnetic resonance imaging (MRI) is a common feature of many brain disorders, e.g., Alzheimer's disease, multiple sclerosis, epilepsy, schizophrenia, post-traumatic stress syndrome, Parkinson's disease or traumatic brain injury (Miller and O'Callaghan, 2005, West et al., 1994, Papadopoulos et al., 2009, Bluemcke et al., 1999, Harrison, 2004, McEwen and Magarinos, 1997, Swartz et al., 2006, Pereira et al., 2013). Even non-brain disorders, e.g., hypertension, hypothyroidism or diabetes (Cooke et al., 2014, Petrovitch et al., 2000, Korf et al., 2006) can be associated with hippocampal abnormalities. The hippocampus is not a homogeneous structure but consists of several histologically and functionally specialized but nonetheless tightly interconnected subfields: the subiculum (SUB) which is subdivided into the prosubiculum, subiculum proper, pre- and parasubiculum, the cornu ammonis sectors (CA) 1–3 and the dentate gyrus (DG) (Duvernoy et al., 2013). Animal and histopathological studies have shown that different pathological conditions affect subfields differently, e.g., Alzheimer's disease and hypoxia damage CA1, schizophrenia CA2, traumatic brain injury and post-traumatic stress syndrome CA3 and temporal lobe epilepsy damages the dentate gyrus (e.g., West et al., 1994, Fukutani et al., 2000, Bluemcke et al., 1999, Lucassen et al., 2006, Baldwin et al., 1997, Bluemcke et al., 2007). These observations suggest that subfield specific information might allow for a better differentiation of different disease processes and consequently for an earlier diagnosis than total hippocampal volume (Small, 2014). 
The advent of high field (3 Tesla and higher) MRI platforms and the possibility to acquire high resolution images of the hippocampal formation depicting details of its internal structure within a few minutes resulted in the development of a growing number of manual and computational subfield labeling approaches. These techniques were applied to a variety of diseases in small single-site studies and usually found subfield volumetry to be superior to standard whole hippocampal volumetry for the detection of hippocampal damage in the early stages of the disease process, for differentiating between diseases or for the investigation of structure/function relationships (e.g., Wang et al., 2006, Ballmaier et al., 2008, Mueller et al., 2008, Schobel et al., 2009, Neylan et al., 2010, Bender et al., 2013, Kerchner et al., 2014, Schoene-Bake et al., 2014, Chao et al., 2014, Hsu et al., 2015, De Flores et al., 2015, Pluta et al., 2012, Yushkevich et al., 2015b). The promise of hippocampal subfield volumetry techniques however led to two unexpected developments that could potentially limit the usefulness of this approach. One is the lack of standardized boundary definitions for each subfield that complicates the comparison of results from different laboratories. This problem has been recognized by the research community and is now being addressed by an international work group that will develop a harmonized subfield labeling protocol (Yushkevich et al., 2015a, Wisse et al., 2017) using a similar approach as for the harmonization of the outer hippocampus boundaries (Boccardi et al., 2011). The other development is the variety of available labeling techniques (e.g., Pipitone et al., 2014, Kim et al., 2014, Zeineh et al., 2001, Thompson et al., 2004, Chatelat et al., 2008, Goubran et al., 2014). 
Each of them has its strengths and weaknesses, which can complicate the comparison of findings across laboratories and also confuse researchers new to this field who must decide which of the available techniques fits their needs best. The goal of this study was not to compare all existing subfield labeling approaches/protocols but to concentrate on the performance of five commonly used approaches (4 automated, 1 manual labeling) in a common data set. These techniques were selected because each represents a different approach for subfield labeling and because all automated approaches except one are publicly available and used by the research community. The intention was not to determine which of these four approaches captures the disease specific pathology best because this is not possible without a ground truth, i.e., histopathological confirmation, but rather to compare where they find the most prominent differences between groups and how large the differences are. The ideal common data set for such a comparison consists of a large population of well-characterized subjects who show the whole range of normal age-related to severely disease related hippocampal atrophy. This data set became available when the steering committee of the Alzheimer's Disease Neuroimaging Initiative (ADNI) added a high resolution hippocampal sequence to the ADNI MR protocol in a subset of the ADNI sites (cf. Methods).
Based on the image type used for the parcellation, two main subfield volumetry approaches can be distinguished.
1. Approaches using a whole brain T1 weighted image typically at a resolution of 1 mm isotropic or close to that resolution, e.g., shape analysis, radial distance technique, voxel-based morphometry or deformation-based morphometry (Csernansky et al., 1998, Chatelat et al., 2008, Thompson et al., 2004, Yushkevich et al., 2010a). Of the currently available T1-based approaches, the Bayesian inference labeling as implemented in Freesurfer 5.1 (Van Leemput et al., 2009) and shape analysis based on Large Deformation Diffeomorphic Metric Mapping (Khan et al., 2008) were selected for this project. The former was selected because the algorithm is publicly available and frequently used, the latter because it was one of the earliest approaches for subfield volumetry and has been continuously refined and optimized for 3 T images.
2. Approaches using a T2 weighted hippocampal image. These images are characterized by a submillimeter in-plane resolution but thick slices to increase S/N and therefore depict more details of the internal structure than a T1 image. This has been exploited by manual labeling approaches and increasingly also by automated approaches. Of the currently available approaches, ASHS (Yushkevich et al., 2015b) and the latest version of Bayesian inference labeling implemented in Freesurfer 6.0 (Iglesias et al., 2015) were chosen for this project because both algorithms are publicly available and are used by the larger research community. The manual labeling protocol developed by Mueller et al. (2007) was chosen as manual reference because its limited labeling scheme made it possible to label the over 100 images required for this project within a reasonable amount of time.
The selected approaches use different labeling protocols, which prevents a direct comparison of the subfield volumes. Instead of focusing on the subset of subfields known from autopsy studies to show atrophy and other histopathological features of AD, e.g., subiculum and CA1, it was therefore decided in a first step to calculate the effect sizes for every subfield provided by each approach and to compare the performance of the subfields with the largest effect size for each approach regardless of biological plausibility. This provides the type of information that is needed to determine which one of the approaches evaluated here is the most cost efficient in a clinical trial that uses subfield volumetry as an outcome measure. 
In a second step, it was investigated how well the findings reflected the pattern of subfield atrophy described in histopathological studies of AD. The intention was to compare the performance of these approaches over the whole range of Alzheimer's disease (AD) related hippocampal atrophy and thus the following comparisons were chosen:
1. Ability to detect a group effect for subfield and mesio-temporal volumes in a population consisting of cognitively normal older controls, non-demented subjects with different degrees of mild cognitive impairment and subjects diagnosed with AD (group).
2. Ability to detect subfield and mesio-temporal volume losses in non-demented subjects with different degrees of mild cognitive impairment (MCI) compared to cognitively normal elderly controls (MCI).
3. Ability to detect an effect of amyloid positivity on subfield and mesio-temporal volumes in cognitively normal subjects (Abeta), which is the stage when treatment with a disease modifying drug would be most efficient.
4. Ability to detect a significant association between general cognitive performance/memory performance and subfield and mesio-temporal volumes in cognitively normal subjects (ADAScog and RVLT).

Methods

Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to investigate which combination of the measures obtained by serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessments allows for the most efficient monitoring of the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). The initial ADNI was followed by ADNI GO, ADNI 2 and now ADNI 3. To date these 3 projects have recruited > 1500 55–90-year-old participants who were either cognitively normal or diagnosed with early (EMCI) or late (LMCI) mild cognitive impairment or with early AD. The ADNI2 hippocampal subfield project was not part of the original ADNI2 project and was made possible by a grant from the Alzheimer's Association. It consisted of three phases. Phase 1 (2011–2012): harmonization and modification (reduction of acquisition time to 8 min) of existing high resolution T2 hippocampus 3 T sequences. Phase 2 (January 2012–December 2012): testing the performance of existing subfield algorithms on ADNI2 imaging data and modification to optimize performance, e.g., development of strategies to allow for partial coverage, testing of different atlases, quality control routines etc. Phase 3 (evaluation phase, January 2013–December 2015): identification of the final study population and data processing.

MRI acquisition

The ADNI2 MRI protocol has been optimized to provide comparable images from different 3 T platforms from the three major vendors Siemens, Philips and General Electric. The high resolution hippocampus sequence was added to the existing ADNI2 imaging protocol of 20 sites with Siemens magnets in December 2012 and acquired at the end of the official protocol. The following images were used for this project: 1. T1 weighted MPRAGE, TR/TE/TI 2300/2.95/900 ms, sagittal, 1.1 × 1.1 × 1.2 mm resolution. 2. T2 weighted turbo spin echo, TR/TE 8020/50 ms, in-plane resolution 0.4 × 0.4 mm, slice thickness 2 mm, 28 or 32 slices, coronal, oriented perpendicularly to the long axis of the hippocampus and covering the hippocampal head and body but not always the tail (please see adni.loni.usc.edu/methods/documents/mri-protocols/ Siemens sequences for details).
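The practical difference between the two sequences is easiest to see from the voxel geometry. The following is a minimal sketch that only multiplies out the dimensions quoted above; the function names are illustrative, and the coverage estimate assumes contiguous slices without an inter-slice gap, which the text does not state.

```python
# Acquisition geometry taken from the protocol description above (in mm).
T1_VOXEL_MM = (1.1, 1.1, 1.2)  # T1 weighted MPRAGE
T2_VOXEL_MM = (0.4, 0.4, 2.0)  # T2 turbo spin echo: in-plane x in-plane x slice thickness


def voxel_volume_mm3(dims):
    """Volume of a single voxel in cubic millimetres."""
    x, y, z = dims
    return x * y * z


def slab_coverage_mm(n_slices, slice_thickness_mm=2.0):
    """Extent covered along the slice direction.

    Assumption: slices are contiguous (no gap between slices).
    """
    return n_slices * slice_thickness_mm


t1_volume = voxel_volume_mm3(T1_VOXEL_MM)
t2_volume = voxel_volume_mm3(T2_VOXEL_MM)
coverage_28 = slab_coverage_mm(28)
coverage_32 = slab_coverage_mm(32)

print(f"T1 voxel volume: {t1_volume:.3f} mm^3")
print(f"T2 voxel volume: {t2_volume:.3f} mm^3")
print(f"T2 slab coverage: {coverage_28:.0f} mm (28 slices), {coverage_32:.0f} mm (32 slices)")
```

The T2 voxel is smaller overall but strongly anisotropic: fine in-plane detail is traded against thick slices along the hippocampal long axis.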

Image quality control

684 high resolution exams of 393 different subjects had been acquired between December 2012 and November 2015. The image quality was systematically assessed by trained raters for: hippocampus coverage (full/head and body or incomplete), correctness of FOV angulation regarding the hippocampal axes (meeting specification or not meeting specification), contrast/noise of internal structure of hippocampus (good, moderate, bad), severity of motion artifacts (good, moderate, severe) and other (e.g., artifacts from vessels, pathology). 543 high resolution images from 349 different subjects passed quality control. 84 exams were rejected due to severe motion artifacts which were typically associated with bad contrast/noise. 57 exams were excluded due to positioning errors resulting in incomplete coverage of hippocampal head and body or incorrect angulation. Please see Supplementary material for list of images selected for this project.
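The exam counts above can be checked with a short tally. This is a sketch only: the pass rule in `passes_qc` is a hypothetical reading of the two stated rejection reasons (severe motion, or incomplete coverage/incorrect angulation), not the raters' documented decision procedure.

```python
# Exam counts reported in the text above.
TOTAL_EXAMS = 684
PASSED = 543
REJECTED_MOTION = 84
REJECTED_POSITIONING = 57


def passes_qc(coverage_ok, angulation_ok, motion):
    """Hypothetical pass rule mirroring the two stated rejection reasons.

    motion is one of "good", "moderate", "severe" (the rating levels named above).
    """
    return coverage_ok and angulation_ok and motion != "severe"


def qc_summary(total, passed, *rejected_counts):
    """Check that the counts add up and return rejected count and pass rate."""
    rejected = sum(rejected_counts)
    if passed + rejected != total:
        raise ValueError("passed + rejected does not equal the total number of exams")
    return {"rejected": rejected, "pass_rate": passed / total}


summary = qc_summary(TOTAL_EXAMS, PASSED, REJECTED_MOTION, REJECTED_POSITIONING)
print(f"{summary['rejected']} exams rejected, pass rate {summary['pass_rate']:.1%}")
```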

Study population

The images from 106 different subjects were selected for further processing. Only a subset of all available exams was used because the manual subfield labeling arm had limited processing capacity, with only one qualified rater. Subjects were selected based on the availability of clinical diagnosis, results of the florbetapir amyloid PET (cut off for amyloid positivity: 1.11) and neuropsychological exam at the timepoint of the selection. Since the intent was to compare the techniques regarding their performance in the early stages of the disease, MCI and healthy controls were preferentially selected. Please see Table 1 for demographic information. The cognitive functioning of all subjects had been assessed with the standardized ADNI test battery. The Alzheimer's Disease Assessment Scale – cognition (ADAScog) and the immediate recall of the Rey Auditory Verbal Learning Test (RVLT) were chosen from that battery. The former is a measure of global cognitive performance and was expected to show associations with subfields differentiating between cognitively intact and mildly impaired subjects. The latter was chosen as a hippocampus specific measure for encoding/learning and was expected to show an association with CA3 and/or dentate gyrus volume (Mueller et al., 2011). (See Fig. 1.)

Table 1

Study population demographics.

Group	Normal	MCI	AD
No	41	57	8
Female/male	22/19	22/35	4/4
Age	75.1 (7.6)	71.1 (7.6)	75.6 (9.0)
Apo E4 pos	10	22	4
SUVR	1.12 (0.21)	1.21 (0.23)	1.41 (0.13)
CDR SB	0.11 (0.38)	1.80 (1.54)	4.56 (1.43)
MMSE	28.8 (1.5)	27.4 (2.8)	21.0 (3.8)
ADAScog	8.4 (4.6)	15.2 (10.0)	31.8 (9.0)
RVLT immediate	47.1 (10.8)	39.7 (13.8)	20.6 (6.1)
RVLT learning	5.8 (2.4)	4.6 (3.1)	2.1 (1.5)
RVLT forgetting	3.8 (3.3)	4.6 (3.1)	4.0 (1.3)

Apo E4 pos, at least one Apo E4 allele; SUVR, standardized uptake value ratio relative to cerebellar gray; CDR SB, clinical dementia rating scale sum of boxes; ADAScog, Alzheimer's Disease Assessment Scale – cognition; MMSE, Mini-Mental-State-Exam; RVLT, Rey verbal learning test.

Fig. 1

Parcellation schemes. Manual, manual parcellation labels; green, entorhinal cortex; blue, subiculum; red, CA1; yellow, CA1–2 transition zone; maroon, CA3&dentate. FS 6.0, labels of Freesurfer 6.0 subfield parcellation; yellow, parasubiculum; purple, presubiculum; blue, subiculum; red, CA1; green, CA2/3; tan, CA4; brown, molecular layer; light blue, GC-DG; light green, HATA; lilac, fimbria, ASHS parcellation. FS Sub, Freesurfer 5.1 subfield parcellation labels; yellow, presubiculum; green, subiculum; red, CA1; blue, CA2/3; bright blue, hippocampal fissure; lilac, fimbria; shape analysis, green, hippocampal boundaries. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Image processing

High resolution T2 sequences

Manual labeling

Input: High resolution T2 hippocampal image, resampled so that the hippocampal cross-section is perpendicular to the long axis of the left and the right hippocampus. Output: Left and right label masks in space of resampled image and text file with volumes covered by labels. Parcellation scheme and atlas: Rview (www.colin-studholme.net/software/software.html) is used for display and labeling. Labels for entorhinal cortex, subiculum, CA1, CA2 and dentate gyrus & CA3 are generated. Validation: NA. Customization: Label definition and region to be labeled can be customized. Quality control: Consistency check, i.e., each label is checked for labeling accuracy after first pass labeling and for consistency after processing the complete study population, and is manually edited if necessary before the volumes are transferred into the project database. Computing power: Standard desktop computer with 2.80 GHz CPU and 6.00 GB RAM. The resampling step (preparation and computation of left and right hippocampus) takes ca. 1.5 h/subject, manual labeling ca. 60–90 min/subject for an experienced rater. Brief method summary: The labeling method including assessment of measurement reliability has been described in detail previously (Mueller et al., 2007, Mueller et al., 2010). In brief, the marking scheme depends on anatomical landmarks, particularly on a hypointense line representing myelinated fibers in the stratum moleculare/lacunosum which can be reliably visualized on these high resolution images. Together with external landmarks, e.g., fimbria, collateral sulcus etc., this line is used to identify and manually label the subiculum, the cornu ammonis sectors CA1-CA3 and the dentate gyrus over a length of about 1 cm in the anterior third of the hippocampal body. For the sake of efficiency, the labeling section is restricted to the anterior section of the hippocampal body. Limitations: Specially trained rater required, time consuming. 
Method requires the acquisition of suitable high resolution T2 images.

Automated segmentation of hippocampal subfields (ASHS)

Input: T1-MRI and high resolution T2 hippocampal image. Output: Left and right multi-label images and text file with volumes of each subregion. Parcellation scheme and atlas: Labels for hippocampal subfields CA1–3 and CA4/DG, subiculum, and extrahippocampal cortical regions parahippocampal cortex, entorhinal cortex, BA35 and BA36 together constituting perirhinal cortex, are generated. The atlas provided with the public distribution of the software consists of 28 subjects in whom the subfields were labeled manually by an experienced rater. The atlas population consists of an almost equal proportion of older healthy control and amnestic mild cognitive impairment (aMCI) patients. The method has recently been adapted to 7 T; the latest software release includes a 7 T atlas (Wisse et al., 2016). Validation: This method has been validated using k-fold cross-validation against manual segmentation in an in-vivo dataset of older healthy controls and aMCI patients (Yushkevich et al., 2015b). Customization: The method can be adapted regarding number and definition of labels and also regarding atlas images, i.e., it is possible to generate a customized library. Training software is provided as part of the package to utilize such a customized library. A set of 20 atlas images is sufficient. Quality control: Label images can be displayed overlaid on T2 hippocampal images with ITK-SNAP software (www.itksnap.org) and edited if necessary. Manual editing was not performed on the data presented in this paper. Computing power: The method is based on a multi-atlas technique that requires multiple pairwise image registrations, which are ideally performed in parallel in a computing cluster. The software provides an interface to the Sun Grid Engine scheduler to facilitate this. The method takes 2–3 h to complete on a modern Linux-based cluster. Brief method summary: This software (ASHS) implements a previously published multi-atlas segmentation technique (Yushkevich et al., 2010b). 
Briefly, the target MRI is registered with a bank of T2-weighted atlas MRIs that includes manually labeled subregions. These manually derived labels are transferred to the target image to produce a set of candidate segmentations. A consensus segmentation is obtained using joint label fusion (Wang et al., 2013a), which takes into account the similarity of each atlas image to the target image. Systematic segmentation errors are corrected using a learning-based bias correction technique (Wang et al., 2013b) to generate the final segmentation. Finally, a bootstrapping phase repeats joint label fusion and corrective learning with atlas-to-target registrations seeded by the segmentation result from the previous phase. Limitations: The atlas provided with the publicly distributed version of the software (https://www.nitrc.org/projects/ashs/) is composed of older healthy controls and mild cognitive impairment patients. This atlas is suitable for studies in an Alzheimer's disease population or older healthy subjects, but must be used with caution in a dataset of non-neurodegenerative pathologies and younger subjects, for which the results might be non-optimal, particularly if volumetric analysis is desired. The best results should be achieved with a customized population specific atlas generated from images acquired on the same magnet (cf. customization).
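The idea of fusing candidate segmentations by similarity-weighted voting can be illustrated with a toy example. This is a deliberately simplified sketch and not the joint label fusion algorithm used by ASHS: it assumes one global weight per atlas, whereas joint label fusion computes local weights and accounts for correlated errors between atlases. All names and numbers are illustrative.

```python
from collections import defaultdict


def weighted_vote(candidates, weights):
    """Fuse candidate segmentations voxel by voxel.

    candidates: one list of integer labels per atlas, all of equal length,
                already transferred into the space of the target image.
    weights:    one non-negative similarity weight per atlas (assumed global).
    Ties are resolved in favour of the label seen first at that voxel.
    """
    if len(candidates) != len(weights):
        raise ValueError("need exactly one weight per candidate segmentation")
    fused = []
    for voxel in range(len(candidates[0])):
        score = defaultdict(float)
        for labels, weight in zip(candidates, weights):
            score[labels[voxel]] += weight
        fused.append(max(score, key=score.get))
    return fused


# Three candidate segmentations of a four-voxel image (labels 0, 1, 2).
candidates = [
    [1, 1, 2, 0],  # atlas most similar to the target
    [1, 2, 2, 0],
    [2, 2, 2, 1],
]

majority = weighted_vote(candidates, [1.0, 1.0, 1.0])    # plain majority voting
similarity = weighted_vote(candidates, [0.6, 0.3, 0.1])  # similarity-weighted voting

print("majority:  ", majority)
print("similarity:", similarity)
```

In the second voxel the two schemes disagree: two less similar atlases outvote the most similar one under majority voting, but not once the votes are weighted.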

FreeSurfer 6.0 (FS 6.0)

Input: T1-MRI and/or high resolution T2 hippocampal image. Output: Assuming that both a T1 and a high resolution T2 scan are provided, the algorithm produces a high resolution T2 image in the space of input T1 scan and all other FreeSurfer volumes, a left and right multi-label image with the discrete segmentation of the substructures of each hippocampus, at 0.333 mm resolution, in the physical space of the FreeSurfer T1 data and two text files with the estimated volumes of the hippocampal substructures and of the whole hippocampus. These volumes are computed from probabilistic (rather than binary, winner-take-all) segmentations, so simply counting the number of voxels that have been labeled as a specific structure in the discrete segmentation will not yield the same result as the volume reported in this file. Parcellation scheme and atlas: The hippocampal substructures that are segmented by the software are hippocampal tail; parasubiculum; presubiculum; subiculum; CA1; CA2 + CA3; CA4; hippocampus–amygdala transition area (HATA); granule cell layer of dentate gyrus (GC-DG); molecular layer; fimbria; and hippocampal fissure (not included for computing the whole hippocampal volume). These structures were manually labeled in 15 ex vivo scans, and a probabilistic atlas of hippocampal anatomy was built from these delineations, in combination with manual segmentations of 39 1 mm T1 scans of the whole brain at the structure level (i.e., whole hippocampus, whole amygdala, etc.). These 39 scans are important to learn the defining image features of the anatomy of the brain structures surrounding the hippocampus (Iglesias et al., 2015). Validation: Indirect validation by assessing the ability of the estimated subfield volumes to separate different groups (Iglesias et al., 2015). Customization: Not possible. Quality assessment: Labels can be displayed in FreeSurfer's Freeview to assess label accuracy. 
Editing is theoretically possible but requires advanced knowledge of hippocampal anatomy and was not done for this project. Computing power: The algorithm takes approximately 30 min to complete, using a single core. Brief method summary: The method is based on a probabilistic atlas of hippocampal anatomy derived from manual segmentations made on 15 ex vivo scans (including four subjects diagnosed with AD) scanned at (on average) 0.13 mm isotropic resolution. Using a Bayesian algorithm, these manual delineations were then combined with manual segmentations from 39 T1 scans at 1 mm resolution (including 10 subjects diagnosed with AD) that had been labeled at the whole structure level, to create a single probabilistic atlas. The atlas is represented as a tetrahedral mesh, in which each node has a corresponding vector of probabilities for the different structures encoded in the atlas. The mesh is adaptive, such that more convoluted regions of the atlas are represented by finer tetrahedra. Once the atlas has been built, segmentation of the hippocampal substructures is posed as a Bayesian inference problem, such that the probability of the segmentation given the input scan and the atlas is maximized. The intensities of the voxels in the scan to analyze are assumed to be samples of a Gaussian mixture model conditioned on the hidden segmentation. The parameters of this model are learned directly from the test scan to segment, which makes the method robust to changes in MRI contrast, and also enables it to immediately generalize to multispectral MRI data (even when some of the channels do not completely cover the hippocampal formation), as described in Iglesias et al. (2015). Longitudinal labeling has been developed for T1 weighted images (Iglesias et al., 2016) but has not been tested for high resolution T2 images with thick slices. Limitations: The atlas was built using elderly subjects. Its applicability to studies of younger populations or other diseases has not been tested. 
Finally, a drawback of the ability of this method to adapt to the MRI contrast is that it is not possible to further optimize its performance by training it in a subset of the imaging data to be segmented.
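Two points of the description above lend themselves to a small worked example: the per-voxel Bayesian step (atlas prior times Gaussian likelihood) and the reason why volumes computed from probabilistic segmentations differ from voxel counts in the discrete label image. The sketch below is a toy illustration under strong assumptions, not the FreeSurfer implementation: the priors and Gaussian parameters are fixed, made-up numbers, whereas the actual method uses a deformable mesh atlas and estimates the intensity model from the scan itself.

```python
import math
from collections import defaultdict

VOXEL_VOLUME_MM3 = 0.333 ** 3  # output resolution quoted above


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(intensity, priors, params):
    """Posterior label probabilities for one voxel: prior x likelihood, normalised."""
    unnormalised = {
        label: priors[label] * gaussian_pdf(intensity, *params[label])
        for label in priors
    }
    total = sum(unnormalised.values())
    return {label: p / total for label, p in unnormalised.items()}


def soft_and_hard_volumes(voxel_posteriors, voxel_volume):
    """Soft volume sums posteriors; hard volume counts winner-take-all labels."""
    soft = defaultdict(float)
    hard = defaultdict(float)
    for post in voxel_posteriors:
        for label, p in post.items():
            soft[label] += p * voxel_volume
        hard[max(post, key=post.get)] += voxel_volume
    return dict(soft), dict(hard)


# Hypothetical two-label example: equal priors, made-up intensity models (mean, sd).
priors = {"CA1": 0.5, "subiculum": 0.5}
params = {"CA1": (100.0, 15.0), "subiculum": (140.0, 15.0)}
intensities = [100.0, 110.0, 125.0, 140.0]

voxel_posteriors = [posteriors(i, priors, params) for i in intensities]
soft, hard = soft_and_hard_volumes(voxel_posteriors, VOXEL_VOLUME_MM3)

for label in priors:
    print(f"{label}: soft {soft[label]:.4f} mm^3, hard {hard[label]:.4f} mm^3")
```

Both measures sum to the same total volume, but voxels with ambiguous intensities contribute fractionally to several labels in the soft estimate, which is why the per-structure numbers differ.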

T1 weighted methods

Shape analysis

Input: T1-MRI and hippocampal label produced by the Freesurfer recon-all stream. Output: The primary outputs are binary whole-hippocampus segmentations, corresponding hippocampal 2D surfaces and surface-based subfield delineations. Secondary outputs are multivariate shape indices and summary shape deformation index (univariate). Parcellation scheme and atlas: The atlas segmentation is a manually traced subset of 74 ADNI1 subjects (not part of investigated cohort). Validation: The three hippocampal surface zones in the right hemisphere of ten randomly selected subjects were manually outlined and compared with the surface zones as mapped from the template. The intra-class correlation coefficients of the areas of the three surface zones were between 0.90 and 0.97 (Wang et al. 2006). Customizations: None. Quality assessment: The method provides a quick-check snapshot generation step for easy quality assurance. Computing power: Multi-atlas FS-LDDMM requires multiple pairwise image registrations, which are ideally performed in parallel in a computing cluster. On a modern Linux-based cluster, the method takes 2–4 h to complete. A web portal (http://ceramicca.ensc.sfu.ca/) is provided for collaborators. Brief method summary: Hippocampal surfaces were generated from T1-weighted images of all subjects using multi-atlas FS-LDDMM (Christensen et al., 2015, Khan et al., 2008, Khan et al., 2013, Wang et al., 2009). ADNI2 EMCI and control subjects that had not been selected for this analysis were used to define EMCI-related surface signature labels. This was done by first computing a population average hippocampal surface and then the vertex-wise deformation for each subject from this average. Vertex-wise generalized linear model (GLM) was used to compare the hippocampal deformation between ADNI2 EMCI and controls. Significant clusters of vertices were identified using Random Field Theory (Taylor and Worsley, 2007, Worsley et al., 1999). 
The collection of these vertices in each subfield represents the “EMCI signature labels.” Next, the hippocampal surfaces of the ADNI2 subjects selected for this project were calculated. The subfield deformation can be represented in a multivariate or a univariate approach. The former uses a PCA to determine the principal components (PC) that account for 80% of the shape variance to represent the surface shape. Each subject's surface is expressed in terms of a linear combination of the PCs with the weights being a multivariate representation of the shape (Csernansky et al., 2004). For this analysis, the univariate approach was used, i.e., the vertex-wise surface deformation corresponding to each EMCI signature label was extracted and the mean calculated. This provided a measure of subfield specific deformation for each subject that is comparable with the other approaches.
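The two representations can be sketched with plain arithmetic. The code below assumes that the PCA has already been computed (only its eigenvalues are needed to find how many components reach 80% of the variance) and that the signature label is given as a list of vertex indices; all names and numbers are hypothetical.

```python
def n_components_for_variance(eigenvalues, target=0.80):
    """Smallest number of principal components whose cumulative share of
    the total variance reaches the target fraction."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, eigenvalue in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += eigenvalue
        if cumulative / total >= target:
            return k
    return len(eigenvalues)


def signature_mean_deformation(vertex_deformation, signature_vertices):
    """Univariate measure: mean deformation over the vertices of one signature label."""
    values = [vertex_deformation[v] for v in signature_vertices]
    return sum(values) / len(values)


# Multivariate route: hypothetical PCA eigenvalues (variance explained per component).
eigenvalues = [6.0, 3.0, 0.5, 0.5]
n_pcs = n_components_for_variance(eigenvalues)

# Univariate route: hypothetical vertex-wise deformation (mm) from the population
# average surface, and a signature label covering vertices 0, 1 and 4.
vertex_deformation = [-0.4, -0.2, 0.1, 0.0, -0.6]
signature_vertices = [0, 1, 4]
mean_deformation = signature_mean_deformation(vertex_deformation, signature_vertices)

print(f"components needed for 80% variance: {n_pcs}")
print(f"mean deformation in signature label: {mean_deformation:.2f} mm")
```

The univariate mean yields a single number per subject and label, which is what makes it directly comparable with the volume measures of the other approaches.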

FreeSurfer subfields version 5.1 (FS sub)

Input: T1 image. Output: label image and text file with subfield volumes. Parcellation scheme and atlas: Identifies labels for fimbria, CA2–3, CA1, CA4 and dentate, presubiculum, subiculum, hippocampal fissure, hippocampus (tail), inferior lateral ventricle and choroid plexus based on information from a probabilistic atlas generated from the manual segmentations of the right hippocampus in 10 (6 young, 4 older) cognitively intact subjects (Van Leemput et al., 2009). Validation: Leave-one-out comparison with manual delineations in ultra-high resolution images. Customization: Not possible. Quality control: Each label map was assessed for accuracy, i.e., it was verified that all subfield labels were within the hippocampus, that non-subfield labels, e.g., the ventricle label, were not bleeding into the hippocampus and that the hippocampus was completely labeled. No label editing was done; substandard labels were rejected and only labels from images that were rated as pass were used in the analysis. Computing power: 5 h per subject on a 2.83 GHz Intel Xeon E5440 processor. Brief summary: Automated segmentation of hippocampal subfields using Bayesian inference and a segmentation prior that is defined in the form of a tetrahedral mesh-based probabilistic atlas in which each mesh vertex has an associated vector of probabilities for different hippocampal subfields and surrounding structures of the medial temporal lobe. The subfield routine implemented in FreeSurfer (version 5.1, https://surfer.nmr.mgh.harvard.edu) was used for this project. Limitations: Simplified non-anatomical parcellation scheme (Wisse et al., 2014).

FreeSurfer hippocampus version 5.1 (FS Hippo)

Input: T1 weighted image. Output: label mask (aseg_aparc) and text file. Parcellation and atlas: Subcortical structures including the hippocampi of 39 healthy subjects were manually traced on T1 weighted images. These scans and their segmentations were then used to build a joint probabilistic atlas of labels and intensities. Customization: Not possible. Quality control: Each aseg_aparc label map including hippocampal labels was assessed for accuracy. Only hippocampal labels from images that were rated as full pass (all assessed labels met accuracy criteria) were used in this analysis. Computing power: 5 h per subject on a 2.83 GHz Intel Xeon E5440 processor. Brief summary: The FreeSurfer processing stream consists of the co-registration of the subject T1 image to an atlas followed by bias correction, intensity normalization and brain extraction/removal of non-brain tissue. The subcortical structures are labeled by co-registering the subject brain to a probabilistic labeled template in Talairach space. FreeSurfer version 5.1 (https://surfer.nmr.mgh.harvard.edu) was used for this project. Limitations: Only global hippocampal labeling, no subfields.

Statistics

All volumes were corrected for intracranial volume (ICV calculated by FreeSurfer) using the following formula: corrected volume = volume × 1000/ICV. Multiple linear regression analyses were performed with subfield volume as the dependent variable, diagnostic Group, MCI status, SUVR (Abeta) or cognitive measures (ADAScog, RAVLT immediate recall) as independent variables of interest, and age (all comparisons) and education (cognitive associations only) as independent nuisance variables. The impact of the nuisance variable gender was investigated but found to be non-significant after correcting for ICV and hence was not included in the final statistical model. The R2 values from these regression analyses were used to calculate standardized effect size and power (significance level alpha = 0.05) for the number of subjects in the actual analysis and not for a fixed number of subjects. Effect sizes are reported rather than the more common p-values because the purpose of this paper was not to report that these subfield approaches are able to find significant differences between groups etc. but rather to provide information about how large these differences are (Sullivan and Feinn, 2012). Statistical analyses were done in JMP12 (SAS Institute Inc.) and GPower 3.1 (http://www.gpower.hhu.de). SINGLEBAYES.EXE (Crawford and Garthwaite, 2007) was used to identify effect sizes that were significantly higher (p < 0.05, one tailed) than others in each comparison. Considering that all three diagnostic comparisons, i.e., comparisons 1–3, were designed to detect hippocampal atrophy caused by AD, it was expected that the order of the subfields when ranked from largest to smallest effect would be similar for each of these comparisons, e.g., if CA1 had the largest effect size for MCI, it was expected that it would also be the case for Abeta. 
The consistency of the effect size ranking for the different subfields was investigated using Pearson's correlation coefficients for each side separately, i.e., the effect sizes for comparison 1 were correlated with those from 2 and 3 and those from comparison 2 were correlated with comparison 3 and the mean of the resulting correlation coefficients calculated.
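The arithmetic in this section can be sketched in a few lines of Python. The ICV correction and the mean pairwise Pearson correlation follow the text directly. The conversion from R2 to an effect size is an assumption: the text does not give the formula, and Cohen's f² = R²/(1 − R²) is shown here only because it is a common effect size measure for multiple regression. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def icv_corrected(volume, icv):
    """corrected volume = volume x 1000 / ICV, as stated in the text."""
    return volume * 1000.0 / icv


def cohens_f2(r_squared):
    """ASSUMPTION: Cohen's f^2 as the standardized effect size from R^2."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return r_squared / (1.0 - r_squared)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def mean_pairwise_r(effect_sizes_by_comparison):
    """Ranking consistency: correlate every pair of comparisons
    (1 vs 2, 1 vs 3, 2 vs 3) and average the coefficients.

    Each inner list holds the effect sizes of the same subfields, in the
    same order, for one comparison and one hemisphere.
    """
    pairs = list(combinations(effect_sizes_by_comparison, 2))
    if not pairs:
        raise ValueError("need at least two comparisons")
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)
```

For example, `icv_corrected(3000, 1_500_000)` gives 2.0, and `mean_pairwise_r` applied to the three lists of left-hemisphere effect sizes of one method would give that method's mean r for the left side.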

Results

Group comparisons and amyloid effects

Please see Table 2a, Table 2b, Table 2c for detailed effect sizes and power estimates of every subfield. With one exception, the manual labeling approach identified CA1 or CA1-CA2 transition volumes as the subfields with the largest effect sizes for diagnostic and neuropsychological variables of interest. This consistency was also reflected in the high correlation coefficients (mean r left = 0.61, mean r right = 0.61). The FS 6.0 approach identified CA4 and CA1 as having the largest effect sizes for the overall group comparison, fimbria and CA4 as having the largest effect sizes for the comparison between MCI and cognitively normal subjects, and subiculum and molecular layer as having the largest effect sizes for the comparison between amyloid positive and negative cognitively intact subjects. The inconsistency of the subfields identified as having the largest effect size in these tests was reflected in low correlation coefficients (mean r left = −0.16, mean r right = −0.16). ASHS identified the CA sector as having the largest effect size to detect group effects, BA36 and DG as having the largest effect sizes to detect differences between cognitively intact subjects and subjects diagnosed with MCI, and parahippocampal gyrus and entorhinal cortex as having the largest effect sizes to detect amyloid associated subfield atrophy. The consistency of the effect size ranking was low on the left side (r = −0.27) but higher on the right side (r = 0.3).

Table 2a

Effect sizes to detect “Group” effects.

Red, subfields with the highest power for alpha = 0.05 and effect size for each method; bold, method with best performance; *, subfield with significantly higher effect size than other subfields within this comparison. PHIPPO, parahippocampus, ERC entorhinal cortex; BA, Brodmann Area; PreSub, presubiculum, ParaSub, parasubiculum, SUB, subiculum, CA, cornu ammonis sector, Mol Lay, molecular layer, GC-ML-DG, granule cell layer of dentate gyrus, DG, dentate gyrus, HIPPO tail, posterior section of hippocampus, Total Hippo, total hippocampal volume from FreeSurfer.

Table 2b

Effect sizes for the distinction between MCI and cognitively intact elderly subjects.

Red, subfields with the highest power for alpha = 0.05 and effect size for each method; bold, method with best performance. *, subfield with significantly higher effect size (p < 0.05) than other subfields within this comparison. PHIPPO, parahippocampus, ERC entorhinal cortex; BA, Brodmann Area; PreSub, presubiculum, ParaSub, parasubiculum, SUB, subiculum, CA, cornu ammonis sector, Mol Lay, molecular layer, GC-ML-DG, granule cell layer of dentate gyrus, DG, dentate gyrus, HIPPO tail, posterior section of hippocampus, Total Hippo, total hippocampal volume from FreeSurfer.

Table 2c

Effect sizes for the amyloid effects on hippocampal subfields in cognitively intact amyloid positive and negative controls.

T1 based shape analysis identified CA1 and dentate as the subfields with the largest effect sizes for the effect of group, and subiculum and CA1 as the subfields with the largest effect sizes to detect differences between cognitively intact and impaired subjects and amyloid positivity. Since the shape analysis only identifies 3 subfields, a formal consistency assessment with Pearson correlation coefficients was not possible. FS Sub identified the fimbria as having the largest effect size for the differentiation between cognitively intact and impaired subjects, subiculum and hippocampal tail as having the largest effect sizes to detect a group effect, and hippocampal tail and CA2/CA3 as having the largest effect sizes to detect amyloid effects in cognitively intact controls. The consistency of the effect size ranking was low on the left side with r = −0.003 and moderate on the right side with r = 0.26. With the exception of the effect size for the group effect, the effect sizes of total hippocampal volume derived from FS Hippo were lower than those of the T2 high resolution image based approaches. The means of the subfields with the highest power and effect sizes for each T2 method (values in red in Table 2a, Table 2b, Table 2c, Table 3) were generally higher than those of the T1 based methods for all three comparisons (cf. Table 4).

Table 3

Effect sizes for the association of subfield volumes with cognition in healthy controls: ADAScog and RAVLT immediate.

Red, subfields with the highest power for alpha = 0.05 and effect sizes for each method; bold, method with best performance. *, subfield with significantly higher effect size (p < 0.05) than other subfields within this comparison. PHIPPO, parahippocampus, ERC entorhinal cortex; BA, Brodmann Area; PreSub, presubiculum, ParaSub, parasubiculum, SUB, subiculum, CA, cornu ammonis sector, Mol Lay, molecular layer, GC-ML-DG, granule cell layer of dentate gyrus, DG, dentate gyrus, HIPPO tail, posterior section of hippocampus, Total Hippo, total hippocampal volume from FreeSurfer. ADAScog, Alzheimer's Disease Assessment Scale – cognition; RAVLT, immediate recall of the Rey Auditory verbal learning test.

Table 4

Comparison of power and effect sizes of T2 and T1 based subfield approaches.

Comparison	Mean power, High Res T2	Mean power, Whole brain T1	P-value (power)	Mean effect size, High Res T2	Mean effect size, Whole brain T1	P-value (effect size)	Mean power, Total Hippo	Mean effect size, Total Hippo
Group	1.0000	0.9900	0.24	0.5584	0.4419	0.33	1.0000	0.5042
MCI	0.9438	0.6241	0.02	0.1771	0.0768	0.01	0.8359	0.1122
Abeta	0.3051	0.1643	0.06	0.0749	0.0467	0.18	0.0742	0.0081
Cog	0.6659	0.5146	0.26	0.1850	0.1369	0.37	0.5102	0.1248

High Res T2 includes manual labeling, ASHS and Freesurfer 6.0 subfields.

Whole Brain T1 includes shape analysis and Freesurfer 5.1 subfields.

P-value, p for the comparison of T2 vs T1 subfield approaches.

A well acknowledged problem of all subfield labeling approaches including those chosen for this project is the lack of a common set of rules to identify their boundaries which makes a direct comparison of the findings across different labs difficult (Yushkevich et al. 2015). Although the identification of a common set of rules for subfield labeling was beyond the scope of this project, it is possible to combine labels from more detailed protocols to mimic the parcellations of less detailed protocols. The less detailed protocol used by ASHS and shape analysis consisted of a CA label, a DG label and a subiculum label. The CA label was mimicked for the manual approach by combining the CA1 and CA1–2 transition zone volumes, for FS 6 by combining the volumes for subiculum, CA1, CA2/3 and molecular layer and for FS Sub by combining the volumes for subiculum, CA1, CA2/3. 
The presubiculum label of FS 6.0 and FS Sub corresponds to the subiculum label of ASHS, shape analysis and the manual approach. The DG label for FS 6.0 was mimicked by combining the labels of CA4 and the granule cell layer of the dentate gyrus. The DG estimates for the manual labeling were based on its CA3&DG volume and those for FS Sub on its CA4-DG label. Fig. 2 shows the effect sizes for these compounded labels. Using the compounded labels, all three high resolution T2 based approaches identified the left and right compounded CA label (CAc) as the subfield with the highest effect size for group. For Abeta effects, manual labeling and FS 6.0 identified left and right CAc as the most affected combined label, whereas ASHS identified right CAc and left subiculum, with left CAc as a close second. All three identified either CAc or DG as the most affected subfield for the MCI effects. Taken together, simplifying parcellation schemes by compounding labels increased the consistency within each approach and across the three approaches. FS Sub identified left and right subiculum as the region with the largest group effect and left and right DG as the region with the largest MCI effect. Right CAc and left subiculum with left CAc as a close second showed the largest effect sizes for Abeta with the FS Sub approach. The use of the compounded labels also increased the consistency of the findings within FS Sub but not of those of high resolution T2 based approaches.
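The label compounding described above amounts to summing the volumes of the detailed labels that make up each compounded label. A minimal sketch follows; the mapping is taken from the text, but the dictionary keys are hypothetical names chosen for illustration and are not the label names written by the respective tools, and the example volumes are made up.

```python
# Which detailed labels are summed into each compounded label, per method.
# ASHS and shape analysis already use the CA / DG / subiculum scheme.
COMPOUND_LABELS = {
    "manual": {
        "CA": ["CA1", "CA1-2 transition"],
        "DG": ["CA3&DG"],
        "SUB": ["subiculum"],
    },
    "FS 6.0": {
        "CA": ["subiculum", "CA1", "CA2/3", "molecular layer"],
        "DG": ["CA4", "GC-ML-DG"],
        "SUB": ["presubiculum"],
    },
    "FS Sub": {
        "CA": ["subiculum", "CA1", "CA2/3"],
        "DG": ["CA4-DG"],
        "SUB": ["presubiculum"],
    },
}


def compound_volumes(method, volumes):
    """Sum detailed subfield volumes into the CA, DG and SUB labels.

    volumes: mapping from detailed label name to volume for one hemisphere.
    Raises KeyError if the method or a required detailed label is missing.
    """
    scheme = COMPOUND_LABELS[method]
    return {
        compound: sum(volumes[name] for name in parts)
        for compound, parts in scheme.items()
    }


# Made-up volumes, for illustration only.
example = compound_volumes(
    "FS 6.0",
    {
        "subiculum": 400.0, "CA1": 600.0, "CA2/3": 200.0,
        "molecular layer": 500.0, "CA4": 250.0, "GC-ML-DG": 300.0,
        "presubiculum": 300.0,
    },
)
```

Keeping the mapping in one table makes the asymmetry visible: for the two FreeSurfer schemes the subiculum label contributes to the compounded CA label, while the presubiculum label stands in for the subiculum of the other approaches.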

Fig. 2

Bar plots of effect sizes for combined labels. CA corresponds to all cornu ammonis labels, DG corresponds to the dentate gyrus and Sub to the subiculum. Please see text for a description how these labels were generated for each of the approaches. “Group” refers to the ability to detect a group effect in a population consisting of cognitively normal elderly controls, non-demented subjects with different degrees of mild cognitive impairment and subjects diagnosed with Alzheimer's Disease. ‘MCI’ refers to the ability to detect a group effect on subfield and mesio-temporal volumes in a population consisting of cognitively normal elderly controls, and non-demented subjects with different degrees of mild cognitive impairment. ‘Abeta’ is the ability to detect an effect of amyloid positivity on subfield and mesio-temporal volumes in cognitively normal subjects.

Subfield volumes as predictor of cognitive function in cognitively intact subjects

Left and right volumes were averaged for this assessment. Please see Table 3 for the detailed results. The manual approach identified CA1 and CA1–2 transition as having the largest effect sizes for ADAScog and RAVLT immediate recall. FS 6.0 identified the presubiculum as the subfield with the largest effect size for ADAScog and the dentate granule cell layer volume as the subfield with the largest effect size for RAVLT immediate recall. ASHS found that ERC had the largest effect size for ADAScog and that CA had the largest effect size for RAVLT immediate recall. Shape analysis identified the dentate gyrus as having the largest effect size for ADAScog and CA1 as the subfield with the largest effect size for RAVLT immediate recall. Finally, FS Sub subfields showed the same pattern as FS 6.0, i.e., presubiculum had the highest effect size for ADAScog and CA4&dentate had the highest effect size for RAVLT immediate recall. The means of the subfields with the highest power and effect sizes for each T2 method (values in red in Table 2a, Table 2b, Table 2c, Table 3) were generally higher than those of the T1 based methods for both comparisons (cf. Table 4).

Discussion

The comparison revealed two major findings. 1. It was possible to acquire T2 high resolution hippocampal images in a large multi-site project that were comparable, regarding image quality and accurate and consistent positioning, to images acquired for dedicated small research projects where one-to-one training and immediate feedback regarding quality are possible. This is an important step for this technique's successful transition from a research tool to a clinical routine procedure. 2. Using effect size as the criterion, subfield volumetry, and especially approaches using a T2 high resolution image, outperformed traditional hippocampal volumetry regarding the ability to detect conditions characterized by subtle hippocampal atrophy, i.e., comparisons between MCI and CN, between amyloid negative and positive subjects, and associations between subfield volumes and cognition/memory in cognitively intact subjects. T2 high resolution approaches also outperformed subfield approaches using the whole brain T1 image for the comparisons between MCI and CN and between amyloid negative and positive subjects. However, the T2 high-resolution approaches identified different subfields as having the highest effect sizes for these comparisons. This was not unexpected given the differences between the labeling schemes. Reducing the differences of parcellation schemes between the three T2 high resolution approaches by combining labels and generating compounded labels eliminated left/right differences within each method and increased the consistency for the identification of the most affected region across methods. None of the different T2 high resolution methods tested had a clear advantage over the other methods. Each has strengths and weaknesses that need to be considered when deciding which one to use to get the best results from subfield volumetry in AD.

Attempts to use high resolution T2 images to measure subfield volumes have been made since the introduction of the hippocampal unfolding technique by Zeineh et al. (2001) and have been intensified after 3 T magnets became widely available in academic hospitals and research institutions in the last decade. Despite its popularity for research and the development of reliable, fast and computationally efficient automated subfield volumetry approaches, quantitative hippocampal subfield volumetry has yet to make the transition from research to clinical application. One of the reasons for the delay is the relatively lengthy acquisition time of about 9–13 min, which is rather long for a busy clinical setting. For this project, this issue was addressed by shortening the acquisition time to about 8 min. Another reason is the need for careful positioning that requires some knowledge of hippocampal anatomy. Despite detailed instructions, incorrect positioning with insufficient coverage of the hippocampus was a problem in the early phase of this project and resulted in the exclusion of about 8% of all acquired images. The problem was eliminated after adding the positioning information as a tab to the information displayed at the MR console. The images acquired for ADNI2 were of similar quality to those acquired in research settings. 12% were rated as not suitable for processing, mostly due to motion artifacts or poor contrast in the hippocampal formation, which compares well with the percentage of images excluded from processing in this population in research settings. Taken together, it can be concluded that it is possible to obtain high resolution T2 hippocampal images of suitable quality for manual and automated subfield labeling within a reasonable acquisition time in a multi-site project, which makes it possible to use this sequence in imaging protocols for drug trials or for clinical purposes.

There are two ways to assess the performance of different subfield volumetry approaches. The first is statistical efficiency, which compares approaches based on effect size or the minimal number of subjects needed to find any statistical difference. 
The second is biological plausibility which addresses the question to what degree the measurement with the highest effect size corresponds to the known histopathological features of the disease process. AD is characterized by a well-defined progression of its histopathological hallmarks amyloid plaques and neurofibrillary tangles. Particularly the latter has been shown to be closely associated with cognitive performance and neuron loss within the hippocampus and mesio-temporal lobe (Giannakopoulos et al., 1996, Giannakopoulos et al., 2007, Fukutani et al., 2000). The earliest site of neurofibrillary tangle accumulation in the preclinical state of AD is the trans-entorhinal cortex (BA 35) followed by the entorhinal cortex. The tangles then progress to the hippocampus where they cause the most prominent neuron loss in CA1 followed by that in the subiculum while the dentate gyrus is relatively spared in the early clinical stages (West et al., 1994). Subfield volumetry should ideally show the same kind of progression, i.e., the most prominent volume loss in the trans-entorhinal cortex/entorhinal cortex in the early preclinical phase followed by volume loss in CA1 (MCI and early AD) and subiculum (early AD) and affect the dentate/hilus region in the more advanced stages. Based on these considerations, CA1 and entorhinal cortex would be expected to show the highest effect sizes in all comparisons except for “group” where subiculum and eventually dentate gyrus are also affected. Regarding statistical efficiency the findings of this project replicated those of previous studies (e.g., De Flores et al., 2015, Mueller et al., 2010) that showed that subfield volumetry based on a T2 high resolution image outperforms hippocampus volumetry for the detection of subtle and more localized atrophy characterizing the early stages of AD as evidenced by the power and effect size calculations for “MCI” and “Abeta”. 
This was expected given that the increased accuracy for the detection of subtle atrophy was one of the major reasons for the development of subfield volumetry. Nonetheless, it is satisfying that this assumption was confirmed for all T2 high resolution approaches investigated here. Approaches based on a high resolution T2 hippocampal image typically had higher effect sizes than subfield approaches using the lower resolution whole brain T1. This finding was also not unexpected. The myelinated tissue of the stratum moleculare and lacunosum appears as a very characteristic hypointense line in this image and its acquisition parameters are usually set to maximize this contrast in the population of interest. In contrast, T1 based subfield approaches use standard whole brain T1 weighted images with a lower resolution whose contrast has not been optimized for the hippocampus. On such images the stratum moleculare and lacunosum is slightly hyperintense compared to the surrounding tissue and far less prominent, although it is possible to enhance it by acquiring and averaging multiple T1 images. Without any strong information from the internal structure though, subfield parcellation by FS Sub is mostly guided by the outer hippocampal boundaries and by the information from the probabilistic prior. This limitation and the simplified non-anatomical parcellation scheme reduce the ability of FS Sub to detect subtle disease related atrophy and explain why the subfields with the highest effect sizes for the different comparisons do not reflect the known atrophy pattern very well despite large effect sizes. The rationale behind subfield volumetry based on hippocampal shape analysis inherently acknowledges the limited information about the internal structure in the T1 image and tries instead to optimize the information provided by the outer boundaries. 
However, the ability of surface-based approaches to detect subtle disease related effects is on the one hand limited by the existence of physiological shape variants, e.g., the varying number of digitations in the hippocampal head or rotational variants of the body, and on the other hand by the fact that abnormalities restricted to a specific subfield are often not well reflected on the surface. In addition, the shape measure used in this analysis was the univariate summary index, selected to be comparable to other methods. Multivariate approaches are better suited to capture shape differences due to group differences. These limitations explain why shape analysis had the lowest effect sizes of the subfield specific approaches for comparisons assessing subtle atrophy. It is important to emphasize in this context that the area where shape analysis is likely to outperform the other approaches, i.e. its ability to identify the region of maximal atrophy along the anterior-posterior axis, was not tested in this project. If effect size is used as the sole criterion, none of the different high resolution hippocampal T2 based approaches consistently outperformed the other approaches, i.e., the effect size differences between the best performing labels of the three T2 based approaches were small and mostly not significant. This leaves the question how well they were able to detect the atrophy pattern associated with the disease process. One of the major criticisms of the manual labeling approach used for this project is that the labeling is restricted to the anterior third of the hippocampal body. While there is no doubt that this approach would miss localized atrophy in the posterior body, its effect sizes for the detection of a less focal atrophy pattern compared well with the automated whole hippocampus labeling approaches. 
The manual labeling identified CA1 or the neighboring CA1–2 transition area as the most prominently affected subfields except for the “MCI” comparison in the right hippocampus where the largest effect size was found in the dentate gyrus and CA3 area. The ranking of the effect sizes for the manual labeling was also quite consistent across the comparisons. It is possible that the limited labeling approach that excluded difficult to trace regions such as the hippocampal head and tail contributed to the consistency of the findings. ERC however had often the smallest effect sizes in these comparisons. The poor performance of ERC for the manual labeling was not unexpected. The re-slicing of the image that is necessary to optimize the cross-section for hippocampal labeling and the limited coverage of the ERC often aggravate partial volume artifacts in the ERC region and thus the non-disease related variability. This was different for ASHS. Except for the “group” comparison that identified CA as the most affected subfield and the “cognitively normal vs MCI” comparison that identified the right dentate gyrus as having the largest effect size, ASHS identified ERC, neighboring Brodmann Area (BA) 36 or the parahippocampus as the regions with the highest effect size with CA usually a close second. This is not surprising since ASHS is the only labeling approach evaluated here that has been optimized for entorhinal/perirhinal volumetry. It is the only approach that has a separate label for the transentorhinal cortex or BA 35 (Taylor and Probst, 2008) although the other, larger entorhinal/perirhinal labels outperformed the BA 35 volume in all comparisons. A downside of the detailed entorhinal/perirhinal labels was that it affected the consistency of the ranking order of the effect sizes. FS 6.0 uses the most detailed parcellation for hippocampal subfield labeling of all approaches investigated here. 
The detailed labeling makes the assessment of this approach somewhat challenging because there exist no histopathological studies that systematically investigated the distribution of AD related abnormalities across layers of different subfields. As seen in the assessment of the other high resolution T2 based methods, the subfields showing the largest effect sizes varied between comparisons and the consistency of the effect size order across comparisons was low. The higher number of labels in FS 6.0 likely contributed to the low consistency. Reducing the complexity of the labeling scheme by combining labels increased the consistency of the findings between left and right hippocampal volumes for each T2 approach and also across T2 approaches but reduced the effect sizes. The observations regarding detailed labeling schemes in the previous paragraph point to another important requirement for biologically meaningful subfield labeling. The input images need to consistently depict a minimal set of landmarks for these parcellation schemes to work well. This minimal number is higher for more detailed parcellations than for less sophisticated schemes. Furthermore, outputs from automated subfield labeling routines should undergo a rigorous visual quality control by raters who know the hippocampal anatomy well enough to spot labeling inaccuracies. Such a quality control step becomes more complicated with elaborate parcellation schemes that require expert knowledge. 
These concerns raise the question of whether the detailed rhinal and hippocampal parcellations of ASHS and FS 6.0 should be reserved for high quality 3 T images with motion correction, or even 7 T images, that depict the landmarks reliably and that address hypotheses requiring such a detailed parcellation. Projects that do not acquire this kind of high quality image and do not have layer specific hypotheses should consider reducing the complexity of the labeling by combining labels to enhance reliability and reproducibility across projects. As mentioned before, each of the T2 high resolution methods has its own set of strengths and weaknesses. Manual segmentation is time consuming and therefore not an option for the processing of large datasets encompassing hundreds of images. It also requires experienced, specially trained rater(s) who are not available everywhere. However, this weakness can also turn into a strength because experienced raters can adapt more easily to physiological and pathological shape and contrast variants of the hippocampus than atlas-based automated subfield labeling approaches. ASHS, on the other hand, is based on multi-atlas segmentation, which has become a very popular segmentation approach in recent years and produces state-of-the-art results in many medical image segmentation tasks. However, ASHS relies on the accuracy of an intensity-matched registration of the images to be segmented to a library of labeled images, which is likely to suffer if the former have been acquired with a different sequence or platform than the atlas images. In the case of ASHS, this can be overcome by generating population and magnet specific libraries, but this usually again requires the input of expert raters who select the appropriate images and edit the library labels. FS 6.0's main advantage is that its intensity models are learned directly from the individual scan to be segmented, which makes it less dependent on magnet type and sequence. 
However, FS 6.0 uses the probabilistic information from an ex vivo atlas for this process and thus its performance will be impaired in cases where the intensity distribution deviates too much from the atlas due to shape variants or due to the disease processes. In conclusion, subfield volumetry outperformed hippocampal volumetry in its ability to detect the subtle atrophy that characterizes the early or preclinical stages of AD. T2 high resolution based approaches performed better than T1 based approaches, but the latter have the advantage that they do not require the acquisition of a dedicated image. The automated T2 based subfield labeling approaches tested in this project compared well with the manual approach and their accuracy is likely to further improve in the future. However, just as for manual labeling approaches, the accuracy of the automated algorithms depends on the quality of the image that is being labeled and on how well the atlas or a priori information used by the algorithm reflects the process of interest. The following is the supplementary data related to this article.

Supplementary material

ADNI 2 images used in this study
