
Medial temporal atrophy in preclinical dementia: Visual and automated assessment during six year follow-up.

Gustav Mårtensson¹, Claes Håkansson², Joana B Pereira³, Sebastian Palmqvist⁴, Oskar Hansson⁴, Danielle van Westen⁵, Eric Westman⁶.

Abstract

Medial temporal lobe (MTL) atrophy is an important morphological marker of many dementias and is closely related to cognitive decline. In this study we aimed to characterize longitudinal progression of MTL atrophy in 93 individuals with subjective cognitive decline and mild cognitive impairment followed up over six years, and to assess if clinical rating scales are able to detect these changes. All MRI images were visually rated according to Scheltens' scale of medial temporal atrophy (MTA) by two neuroradiologists and AVRA, a software for automated MTA ratings. The images were also segmented using FreeSurfer's longitudinal pipeline in order to compare the MTA ratings to volumes of the hippocampi and inferior lateral ventricles. We found that MTL atrophy rates increased with CSF biomarker abnormality, used to define preclinical stages of Alzheimer's Disease. Both AVRA's and the radiologists' MTA ratings showed similar longitudinal trends as the subcortical volumes, suggesting that visual rating scales provide a valid alternative to automatic segmentations. Our results further showed that it took more than 8 years on average for individuals with mild cognitive impairment, and an Alzheimer's disease biomarker profile, to increase the MTA score by one. This suggests that discrete MTA ratings are too coarse for tracking individual MTL atrophy in short time spans. While the MTA scores from each radiologist showed strong correlations to subcortical volumes, the inter-rater agreement was low. We conclude that the main limitation of quantifying MTL atrophy with visual ratings in clinics is the subjectiveness of the assessment.

Year: 2020 PMID： 32580125 PMCID： PMC7317671 DOI： 10.1016/j.nicl.2020.102310

Source DB: PubMed Journal: Neuroimage Clin ISSN： 2213-1582 Impact factor: 4.881

Introduction

Atrophy of the medial temporal lobe (MTL) is an important diagnostic biomarker in many different dementias, including Alzheimer’s Disease (AD). In research, atrophy is quantified using automatic software that computes volume or thickness measures of regions of interest, specified by a neuroanatomical atlas. These tools are either not sufficiently reliable for clinical usage or, where they have been approved, not widely implemented. To quantify atrophy in clinics, radiologists visually assess the degree of atrophy in a brain region according to established rating scales. The most widely used rating scale in clinical practice is Scheltens’ scale of Medial Temporal Atrophy (MTA) (Scheltens et al., 1992, Vernooij et al., 2019). The MTA scale quantifies the level of atrophy in hippocampus (HC) and its surrounding structures, the choroid fissure and inferior lateral ventricle (ILV). The MTA scale has been shown to reliably distinguish individuals with AD from healthy elderly (Scheltens et al., 1992, Wahlund et al., 1999, Westman et al., 2011). It is an ordinal scale ranging from 0 (no atrophy) to 4 (end-stage atrophy) where an integer score is given for each hemisphere. In Fig. 1 we provide examples of each score. Several studies have reported on the diagnostic ability and relevant clinical cut-offs of the MTA scale (Westman et al., 2011, Scheltens et al., 1992, Ferreira et al., 2015), and others have argued for the importance of reporting MTA in the clinical routine (Torisson et al., 2015, Håkansson et al., 2019, Wahlund et al., 2017).

Fig. 1

Example of the Scheltens’ MTA scale, with progressive atrophy of the hippocampus, the choroid fissure and the inferior lateral ventricle. The image selected for each score was given the same rating by both radiologists in this study. Each hemisphere is rated individually.

Longitudinal progression of medial temporal atrophy, quantified through e.g. hippocampal volumes, has been studied in cognitively normal subjects as well as in preclinical, prodromal and probable AD (Rusinek et al., 2003, Ridha et al., 2006, Henneman et al., 2009a, Pettigrew et al., 2017). The reported annual decrease in HC volume varies between the studies. Rusinek et al. (2003) found a 0.36% volume loss/year for cognitively stable subjects, and a greater loss (1%/year) in individuals with cognitive decline. Henneman et al. (2009a) reported 2.2% annual HC volume loss for healthy controls, with greater atrophy rates in patients with MCI (-3.8%/year) and AD (-4.0%/year). Another study reported an up to 8% decrease in HC volume per year in asymptomatic individuals at risk of familial AD (Fox et al., 1996). Despite the large interest in longitudinal MTL atrophy, no studies have investigated how these correspond to clinical MTA ratings.

The aim of this paper is to investigate longitudinal changes in the MTL in individuals with subjective cognitive decline (SCD) or mild cognitive impairment (MCI), and whether clinical rating scales can detect these changes. Two neuroradiologists and AVRA (Automatic Visual Ratings of Atrophy), our recently developed software providing automated continuous MTA scores, rated 93 individuals scanned four times over six years using Scheltens’ MTA scale. Further, all images were segmented using the longitudinal FreeSurfer pipeline to extract hippocampal and inferior lateral ventricle volumes. We calculated atrophy rates of the MTL for visual and automated measures to understand what progression to expect in different stages of preclinical dementia.

Methods

Study population

The study population was part of the prospective and longitudinal Swedish BioFINDER (Biomarkers For Identifying Neurodegenerative Disorders Early and Reliably) study (www.biofinder.se) and comprised non-demented people with subjective and objective cognitive decline. All patients were consecutively enrolled at three outpatient memory clinics and were assessed by physicians specialized in dementia disorders. Inclusion criteria were: 1) referred to the memory clinics because of cognitive symptoms, 2) not fulfilling dementia criteria, 3) MMSE score of 24–30 points, 4) age 60–80 years and, 5) fluent in Swedish. Exclusion criteria were: 1) cognitive impairment that without doubt could be explained by a condition other than prodromal dementia, 2) severe somatic disease, and 3) refusing lumbar puncture or neuropsychological investigation. A neuropsychological battery assessing four broad cognitive domains including verbal ability, visuospatial construction, episodic memory, and executive functions was performed and a senior neuropsychologist then stratified all patients into those with SCD (no measurable cognitive deficits) or MCI according to the consensus criteria for MCI (Petersen, 2004). From this larger cohort we selected all individuals who had been followed up three times over the course of six years. As in the work by Pettigrew et al. (2017), we stratified the cohort based on abnormality in CSF amyloid-β (A) and phosphorylated tau (p-tau; T) levels analyzed with Euroimmun assays (EUROIMMUN AG, Lübeck, Germany). We applied a cut-off on the Aβ42/Aβ40 ratio (Janelidze et al., 2016) to define A+ and a cut-off on p-tau, measured in pg/ml (Mattsson et al., 2018), to define T+. This yielded the subgroups A−T− (i.e. denoting normal amyloid-β and p-tau levels), A+T−, and A+T+. No individuals displayed the CSF combination A−T+. Demographics and clinical characteristics of these groups are summarized in Table 1.

Table 1

Demographics of the included participants at baseline. p-values were computed using Kruskal-Wallis H-test, testing the null hypothesis that medians are equal in all subgroups.

	All	A⁻T⁻	A⁺T⁻	A⁺T⁺	p-value
N	93	54	18	21	–
SCD/MCI	61/32	42/12	8/10	11/10	–
Age at bl.	70.06 ± 5.41	69.71 ± 5.57	70.18 ± 4.55	70.86 ± 5.58	0.208
Sex, F (%)	57.0	64.8	50.0	42.9	0.001
ApoE4 carriers (%)	38.7	14.8	66.7	76.2	<.001
Education (years)	12.01 ± 3.30	11.91 ± 3.34	11.67 ± 3.42	12.57 ± 3.02	0.108
MMSE at bl.	28.26 ± 1.72	28.57 ± 1.46	28.06 ± 1.75	27.62 ± 2.06	<.001
ADAS-DWR at bl	4.24 ± 2.73	3.41 ± 2.39	5.17 ± 2.41	5.38 ± 3.05	<.001
CSF Aβ42/40 ratio	0.12 ± 0.04	0.15 ± 0.03	0.09 ± 0.02	0.07 ± 0.02	–
CSF Aβ42 (pg/ml)	611.8 ± 259.4	788.0 ± 182.7	370.6 ± 135.5	399.3 ± 145.3	–
CSF p-tau (pg/ml)	56.7 ± 36.5	35.7 ± 10.9	47.9 ± 14.4	113.4 ± 27.6	–
N conv. to dementia (to AD)	19 (13)	5 (0)	5 (5)	9 (8)	–

MRI protocol

All T1-weighted MRI scans were acquired with an MPRAGE protocol on a 3T Siemens TrioTim with the following parameters: 1.2 mm slice thickness, 0.98 mm in-plane resolution, 3.37 ms echo time, 1950 ms repetition time, 900 ms inversion time, and 9° flip angle.

Visual assessments

Two neuroradiologists (Rad. 1 and Rad. 2) rated all available images according to Scheltens’ MTA scale (Scheltens et al., 1992), see Fig. 1. Instead of only reporting the most severe MTA score, as proposed by Scheltens et al. (1995), we included the MTA score of the left and right hemisphere separately. The raters were blinded to sex, age, diagnosis, amyloid- and tau status, subject ID and timepoint to not bias the ratings. Both radiologists assess MTA on a regular basis as part of their clinical work but have not trained together to facilitate rating consistency.

Automated methods

For automated MTA ratings we used our recently proposed software AVRA v0.8 (Mårtensson et al., 2019a). Briefly, AVRA is a deep learning model that was trained on more than 3000 MRI images from multiple cohorts rated by a single radiologist (none of the raters in the current study). It is based on convolutional neural networks and predicts MTA from features extracted from the raw images (i.e., not volumetric data), similar to how a radiologist would perform the assessment. The model has previously demonstrated substantial inter-rater agreement levels in multiple imaging cohorts from various memory clinics (Mårtensson et al., 2019a, Mårtensson et al., 2019b). The first step of the processing pipeline of AVRA is to align the anterior and posterior commissures (AC-PC) using FSL FLIRT (Jenkinson and Smith, 2001, Jenkinson et al., 2002). We visually inspected the rigid registrations to ensure that the AC-PC alignment had not failed, but no AVRA ratings were discarded based on this. Contrary to the radiologist ratings, AVRA outputs continuous MTA scores. This allows for capturing more subtle longitudinal changes in the MTA scores, which is not possible with a discretized scale.

All MRI scans were processed through TheHiveDB system (Muehlboeck et al., 2014) with FreeSurfer 6.0.0 for automatic segmentation of cortical and subcortical structures, such as hippocampi and inferior lateral ventricles (Dale et al., 1999, Fischl et al., 2004). First, all images were processed cross-sectionally, and each output was visually inspected to detect images with inaccurate hippocampal or ventricular segmentation. Images that passed quality control were re-processed with FreeSurfer’s longitudinal pipeline for more consistent segmentation (Reuter et al., 2012). The longitudinal output was once again visually inspected and cases with poor hippocampal or ventricular segmentations were excluded.
In total, 339 (out of 372) images from 87 (out of 93) participants were included in the study for further analyses.

Analyses

The analyses revolve around two central themes: to study the sensitivity and reliability of MTA ratings in a longitudinal setting, and to characterize medial temporal atrophy in preclinical dementia. For the first theme we used Cohen’s weighted kappa (κw) to assess the agreement between two sets of ratings. As there is no ground truth available, κw is a common metric to report in studies using visual ratings, where a high inter- and intra-rater agreement suggests that the rater is reliable and consistent. We further compared the manual and automated ratings to hippocampal and inferior lateral ventricle volumes. Although MTA is rated in a single slice, and does not assess volumes, we assume that reliable ratings should (anti-) correlate strongly with HC and ILV volumes. We studied the sensitivity of the visual ratings, i.e. their ability to capture MTL atrophy, by comparing within-subject changes in MTA ratings (“ΔMTA”) to changes in HC and ILV volume (“ΔHC” and “ΔILV”). Characterizing MTL atrophy in preclinical dementia was done by studying the cross-sectional and longitudinal progression of MTA scores, HC volumes and ILV volumes as a function of age. We approximated the average annual change in MTA scores (“ΔMTA/year”), Mini Mental State Examination (MMSE), Alzheimer’s Disease Assessment Scale-delayed word recall (ADAS-DWR), HC and ILV volumes by fitting a least-squares regression line for each individual and measure. To study the effect of clinical status (i.e. SCD or MCI), we performed additional analyses on SCD and MCI subjects separately within each CSF group. The analyses including volumetric data were performed on the subset of images that passed the visual quality control. As supplementary analysis, we calculated the mean and standard deviations of the MTA ratings at each timepoint in subgroups with different CSF biomarker profiles and cognitive statuses.
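As an illustration of the two quantities described above, the following minimal sketch computes Cohen’s weighted kappa between two sets of ordinal ratings and the per-individual least-squares slope used as annual change. It is not the study’s code. The function names are hypothetical, the linear weighting is an assumption (the text does not state which weighting scheme was used), and the visit times of 0, 2, 4 and 6 years in the usage note assume evenly spaced scans.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   n_categories: int = 5, quadratic: bool = False) -> float:
    """Cohen's weighted kappa for two raters on an ordinal 0..n_categories-1 scale.

    Linear disagreement weights are used by default (an assumption made for
    this sketch); set quadratic=True for squared weights.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same, non-empty set of images")
    n = len(rater_a)
    k = n_categories
    # Observed joint proportions of (rating by A, rating by B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    # Marginal proportions of each rater, used for the chance-expected table.
    marg_a = [sum(row) for row in observed]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement_obs = 0.0
    disagreement_exp = 0.0
    for i in range(k):
        for j in range(k):
            dist = abs(i - j) / (k - 1)
            weight = dist ** 2 if quadratic else dist
            disagreement_obs += weight * observed[i][j]
            disagreement_exp += weight * marg_a[i] * marg_b[j]
    if disagreement_exp == 0.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - disagreement_obs / disagreement_exp


def annual_change(years: Sequence[float], values: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line through (years, values)."""
    if len(years) != len(values) or len(years) < 2:
        raise ValueError("need at least two (time, value) pairs")
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
    return sxy / sxx
```

For example, `weighted_kappa([1, 1, 2, 2, 3], [0, 0, 1, 1, 2])` gives a low kappa even though the two raters rank the images identically, because one rater scores every image one step higher. Likewise, `annual_change([0, 2, 4, 6], [1, 1, 1, 2])` shows how an integer rating that rises by one step at the last visit translates into a slope per year.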

Results

The rating agreements between radiologists and AVRA, and their correlations to HC and ILV volumes, are shown in Table 2. The weighted kappa agreements between the raters ranged from fair (κw ∈ [0.2, 0.4)) to substantial (κw ∈ [0.6, 0.8)) (Landis et al., 1977). All sets of ratings demonstrated similar Spearman correlation strengths to HC (rs ∈ [−0.61, −0.50]) and ILV (rs ∈ [0.82, 0.89]) volumes. Violin plots illustrating the distribution of HC and ILV volumes per MTA score and rater are shown in Fig. 2, where we note that Rad. 1 systematically gave higher MTA scores than Rad. 2. We include confusion matrices between rating sets for left and right hemispheres, together with scatter- and boxplots visualizing the remaining associations in Table 2, as Supplementary data in Tables S1-S3 and Fig. S1.

Table 2

		Rad. 1		Rad. 2		AVRA
Measure	Metric	Left	Right	Left	Right	Left	Right
Rad. 1	κw			0.30	0.36	0.58	0.61
Rad. 2	κw	0.30	0.36			0.30	0.35
AVRA	κw	0.58	0.61	0.30	0.35
Rad. 1	rs			0.81	0.78	0.82	0.80
Rad. 2	rs	0.81	0.78			0.89	0.86
AVRA	rs	0.82	0.80	0.89	0.86
HC vol.	rs	−0.58	−0.51	−0.58	−0.50	−0.58	−0.61
ILV vol.	rs	0.82	0.82	0.85	0.87	0.89	0.89
MMSE	rs	−0.54	−0.54	−0.49	−0.43	−0.45	−0.44
ADAS-DWR	rs	0.51	0.51	0.55	0.52	0.56	0.56

Fig. 2

Violinplots of the radiologists’ MTA ratings and corresponding hippocampal volume (top) and inferior lateral ventricle volume (bottom). The width of the violins shows the distribution over volumes for each rating and rater, and the area indicates the number of images given a specific rating. The green dots show AVRA’s MTA rating for each image.

Inter-rater agreements (κw) and Spearman correlations (rs) between radiologists’ ratings, hippocampal (HC) and inferior lateral ventricle (ILV) volumes. For the correlation metrics, we compared one timepoint per subject so that all observations were independent. The p-values of all correlations entries were .

In Table 3 the baseline characteristics and average annual progression rates among study participants for all sets of ratings, MTL volumes, MMSE and ADAS-DWR are shown. No clear pattern was found between CSF groups in the cross-sectional baseline measures. However, all MTL measures showed that the atrophy rates increased with progressing AD CSF pathology, with the exception of the ratings from Rad. 1 that showed a milder progression in the A+T− group. By assessing the SCD and MCI patients separately, we observed that the CSF group differences in atrophy rates were larger in the MCI subset. We further noted that the atrophy rates were greater in the SCD subjects in A+T+ than in the MCI patients in the A−T− group. In Supplementary Fig. S2, we plot the associations between MTL rates and the continuous level of CSF p-tau.

Table 3

	A⁻T⁻		A⁺T⁻		A⁺T⁺		p-value
Measure	Left	Right	Left	Right	Left	Right	Left	Right
Rad. 1: MTA at bl.	1.17 ± 0.66	1.17 ± 0.63	1.56 ± 0.68	1.28 ± 0.45	1.67 ± 0.78	1.43 ± 0.58	0.0282	0.2783
SCD only	1.07 ± 0.63	1.10 ± 0.61	1.38 ± 0.48	1.38 ± 0.48	1.27 ± 0.45	1.18 ± 0.39	0.2492	0.3542
MCI only	1.50 ± 0.65	1.42 ± 0.64	1.70 ± 0.78	1.20 ± 0.40	2.10 ± 0.83	1.70 ± 0.64	0.2835	0.1948
Rad. 1:ΔMTA/year	0.05 ± 0.09	0.05 ± 0.08	0.04 ± 0.07	0.07 ± 0.09	0.09 ± 0.11	0.12 ± 0.10	0.0255	<.0001
SCD only	0.05 ± 0.09	0.04 ± 0.08	0.04 ± 0.06	0.03 ± 0.05	0.07 ± 0.10	0.08 ± 0.11	0.8090	0.3825
MCI only	0.05 ± 0.09	0.06 ± 0.08	0.04 ± 0.07	0.11 ± 0.10	0.11 ± 0.11	0.16 ± 0.07	0.0016	<.0001
Rad. 2: MTA at bl.	0.50 ± 0.71	0.56 ± 0.79	0.61 ± 0.76	0.50 ± 0.69	0.62 ± 0.79	0.81 ± 0.85	0.7581	0.3620
SCD only	0.36 ± 0.65	0.52 ± 0.79	0.50 ± 0.71	0.62 ± 0.70	0.27 ± 0.45	0.45 ± 0.66	0.8081	0.8203
MCI only	1.00 ± 0.71	0.67 ± 0.75	0.70 ± 0.78	0.40 ± 0.66	1.00 ± 0.89	1.20 ± 0.87	0.6051	0.0902
Rad. 2:ΔMTA/year	0.05 ± 0.08	0.06 ± 0.09	0.11 ± 0.10	0.13 ± 0.07	0.15 ± 0.12	0.13 ± 0.12	<.0001	<.0001
SCD only	0.06 ± 0.09	0.06 ± 0.09	0.04 ± 0.07	0.10 ± 0.06	0.11 ± 0.12	0.10 ± 0.10	0.0155	0.0039
MCI only	0.03 ± 0.07	0.07 ± 0.09	0.16 ± 0.10	0.14 ± 0.07	0.19 ± 0.10	0.17 ± 0.13	<.0001	0.0033
AVRA: MTA at bl.	1.26 ± 0.58	1.26 ± 0.56	1.39 ± 0.71	1.40 ± 0.64	1.20 ± 0.58	1.28 ± 0.64	0.6503	0.5989
SCD only	1.18 ± 0.55	1.24 ± 0.55	1.10 ± 0.50	1.33 ± 0.69	1.02 ± 0.43	1.01 ± 0.50	0.7771	0.3942
MCI only	1.54 ± 0.60	1.34 ± 0.60	1.62 ± 0.77	1.46 ± 0.58	1.39 ± 0.65	1.57 ± 0.65	0.8216	0.6186
AVRA:ΔMTA/year	0.04 ± 0.04	0.04 ± 0.04	0.07 ± 0.05	0.08 ± 0.05	0.13 ± 0.08	0.11 ± 0.08	<.0001	<.0001
SCD only	0.04 ± 0.05	0.04 ± 0.04	0.07 ± 0.05	0.07 ± 0.05	0.11 ± 0.09	0.09 ± 0.08	<.0001	<.0001
MCI only	0.04 ± 0.04	0.06 ± 0.04	0.07 ± 0.05	0.09 ± 0.05	0.15 ± 0.07	0.14 ± 0.07	<.0001	<.0001
HC vol at bl. (mm³)	3629 ± 432	3753 ± 506	3698 ± 586	3834 ± 567	3331 ± 487	3433 ± 494	0.0858	0.1054
SCD only	3697 ± 414	3773 ± 479	3999 ± 571	4023 ± 547	3530 ± 325	3659 ± 309	0.1740	0.5019
MCI only	3409 ± 415	3686 ± 579	3431 ± 456	3666 ± 531	3151 ± 536	3229 ± 538	0.4706	0.1605
ΔHC/year (mm³/year)	−36.3 ± 26.9	−39.3 ± 25.5	−53.4 ± 29.7	−55.4 ± 31.3	−93.4 ± 33.2	−99.3 ± 42.0	<.0001	<.0001
SCD only	−34.7 ± 27.4	−35.7 ± 25.1	−36.8 ± 20.2	−46.0 ± 23.8	−79.4 ± 21.3	−87.8 ± 27.1	<.0001	<.0001
MCI only	−41.4 ± 24.5	−50.9 ± 23.2	−68.1 ± 29.0	−63.8 ± 34.6	−106.0 ± 36.7	−109.7 ± 49.7	<.0001	<.0001
ΔHC/year (%/year)	−1.0 ± 0.9	−1.1 ± 0.8	−1.6 ± 1.0	−1.5 ± 0.9	−2.9 ± 1.0	−2.9 ± 1.2	—	—
ILV vol at bl. (mm³)	777 ± 529	724 ± 523	1053 ± 739	860 ± 629	858 ± 507	817 ± 412	0.2098	0.3466
SCD only	700 ± 481	683 ± 472	804 ± 545	839 ± 772	698 ± 241	652 ± 329	0.7090	0.9822
MCI only	1029 ± 596	856 ± 644	1274 ± 813	879 ± 465	1001 ± 627	966 ± 423	0.6443	0.4588
ΔILV/year (mm³/year)	38.1 ± 41.3	38.9 ± 44.6	83.1 ± 74.2	84.1 ± 98.5	117.1 ± 76.5	107.8 ± 94.9	<.0001	<.0001
SCD only	37.7 ± 43.5	36.3 ± 45.5	69.8 ± 63.6	98.8 ± 123.3	86.7 ± 83.8	49.3 ± 44.9	<.0001	0.0002
MCI only	39.6 ± 33.3	47.1 ± 40.6	95.0 ± 80.7	71.0 ± 66.6	144.4 ± 56.8	160.4 ± 97.2	<.0001	<.0001
ΔILV/year (%/year)	4.8 ± 4.0	4.5 ± 3.9	7.9 ± 4.3	9.9 ± 7.0	14.7 ± 9.0	13.9 ± 10.0	—	—
ΔMMSE/year	−0.15 ± 0.47		−0.49 ± 0.70		−1.13 ± 1.02		<.0001
SCD only	-0.05 ± 0.30		-0.19 ± 0.34		-0.87 ± 1.05		<.0001
MCI only	-0.53 ± 0.71		-0.74 ± 0.82		-1.41 ± 0.92		<.0001
ΔADAS-DWR/year	−0.04 ± 0.38		0.14 ± 0.40		0.39 ± 0.50		<.0001
SCD only	-0.03 ± 0.31		0.01 ± 0.40		0.49 ± 0.62		<.0001
MCI only	-0.07 ± 0.57		0.25 ± 0.36		0.28 ± 0.26		0.0003

Average baseline (bl) MTA ratings and volumes, and average annual change for individuals with different CSF statuses. Rows in bold denote entries where the whole CSF group was considered (i.e. SCDs and MCIs), and ’SCD/MCI only’ refers to the subset of SCD/MCI subjects within the CSF group. ΔMTA/year refers to the average annual change in MTA score of the study participants. The reported p-values were computed using Kruskal–Wallis H-test to test the null-hypothesis that the population medians of all CSF groups were equal. Applying a Bonferroni correction to a significance level of α = 0.05 means rejecting the null-hypothesis for p < α/m ≈ .00076, where m is the number of statistical comparisons.

In Fig. 3 the trajectories of each study participant are displayed for left MTA (for all three raters), HC volume and ILV volume respectively. Measures of the right hemisphere, together with subcortical volumes normalized with total intracranial volume, showed similar characteristics and are provided as Supplementary data (Figs. S3-S4). The MTA ratings given by Rad. 1 and Rad. 2 for each individual were rarely lower at follow-up compared to previous timepoints, which would be a requirement of reliable and sensitive measures of the MTL if assuming monotonically increasing atrophy. The longitudinal trajectories of the FreeSurfer measures were generally smoother than AVRA’s MTA scores, which were not monotonically increasing for all individuals, suggesting some degree of rating variability. From Fig. 3 we see that the MTL measures of the MCI patients (orange lines) were generally more pathological than the SCD subjects (blue lines), which is confirmed in Table 3. We include examples of MRI scans for all timepoints for randomly selected participants as Supplementary data in Fig. S5.

Fig. 3

Medial temporal lobe measures of the left hemisphere plotted against age at scan time for different combinations of amyloid-β (A) and phosphorylated tau (T) abnormality. From the top: MTA ratings by Rad. 1; MTA ratings by Rad. 2; MTA ratings by AVRA; Hippocampal (HC) volumes; Inferior lateral ventricles (ILV) volumes. Orange and blue lines show individual trajectories for SCD and MCI patients, respectively. The green dots show if a patient was diagnosed with dementia at the given timepoint. A small random offset has been added to each individual’s Rad. 1 and Rad. 2 ratings to make it easier to distinguish between overlapping trajectories.

To study the sensitivity of the discrete radiologist ratings, we investigated the changes in MTA scores and MTL volumes compared to baseline. In Fig. 4 we show kernel density plots that estimate the distribution of ΔHC and ΔILV for follow-up images given the same MTA score (ΔMTA = 0), +1 MTA (ΔMTA = 1) and +2 MTA (ΔMTA = 2). Both radiologists show similar distributions for the ΔMTA = 0 and the ΔMTA = 1 entries, with a larger shift in means for ΔMTA = 2. From these results it was possible to estimate that when ΔHC equaled −238 mm³ (−8%) and −235 mm³ (−7%) it became more likely that the image was being rated with a higher MTA score, for Rad. 1 and Rad. 2 respectively. Corresponding values for ΔILV were 225 mm³ (27%) and 254 mm³ (33%).

Fig. 4

Distribution (kernel density plots) of the change in HC (left) and ILV (right) volumes between baseline and follow-up scan. Follow-up images rated the same as the baseline scan are in blue (“0 MTA”), those rated 1 MTA score higher (“+1 MTA”) in orange, and those rated 2 MTA scores higher (“+2 MTA”) in green. Solid lines are ratings from Rad. 1 and dotted lines from Rad. 2.

Discussion

In this study we investigated longitudinal medial temporal atrophy in preclinical dementia, and to what extent it is possible to capture these changes with Scheltens’ MTA scale. We found that both radiologists provided reliable ratings, capable of capturing longitudinal changes, despite low inter-rater agreement. This was due to systematic rating differences between the radiologists, which highlights the issue of using subjective methods to quantify atrophy. Further, we observed increased MTL atrophy rates with worsening cognition and CSF AD pathology. This is the first study to investigate longitudinal MTL atrophy using MTA ratings, which helps bridge the gap between neuroimaging research and clinical radiology.

The rating agreement was only moderate between Rad. 2 and Rad. 1, as well as between Rad. 2 and AVRA. This is slightly lower than inter-rater agreements reported in studies using MTA, normally in the range κw ∈ (0.6, 0.9) (Koedam et al., 2011, Cavallin et al., 2012b, Velickaite et al., 2017, Ferreira et al., 2017). All sets of ratings showed strong correlation to both HC and ILV volumes. This was reasonable, given that another recently proposed model estimating MTA was based on a linear combination of HC and ILV volumes (Koikkalainen et al., 2019). Our reported Spearman correlations between MTA and HC volume were stronger than previously reported, with rs between −0.26 and −0.37 (Wahlund et al., 1999, Cavallin et al., 2012a). This shows that both radiologists are reliable, but that their rating styles differ, with one being more conservative, leading to low agreements. Since the radiologists did not train together prior to rating the images, the low κw is not surprising. These results demonstrate the issue of using subjective measures to quantify atrophy, where pathological status (normal/abnormal MTA) of a patient may differ depending on which radiologist performs the rating. On the other hand, 33 images failed the FreeSurfer segmentation upon visual QC.
Having to discard almost 10% of the MRI scans due to software issues is not acceptable in a clinical setting. While other segmentation tools may be more reliable than FreeSurfer, inter-scanner variability, scanner software updates and image artifacts will always be obstacles that can influence performance (Guo et al., 2019, Mårtensson et al., 2019b). This does not seem to be an issue for visual ratings, where excellent intra-rater agreement has been demonstrated even across modalities (Wattjes et al., 2009). The benefits of using objective measures will outweigh the disadvantages, particularly as software becomes more robust, but it is important to understand that software may fail in other ways than humans.

In accordance with previous studies we found increased HC (and ILV) atrophy rates with progressed CSF AD pathology and in MCI patients compared to cognitively normal (CN) subjects (Rusinek et al., 2003, Ridha et al., 2006, Henneman et al., 2009a). Pettigrew et al. (2017) specifically investigated the progression of MTL atrophy in preclinical AD, defined by abnormality in amyloid-β and tau. They also found an increased atrophy rate in individuals with A+T+ biomarker profile. They did not find any differences between A−T− and A+T−. We observed differences in our automatic measures, although these differences were smaller when studying SCD subjects only. Pettigrew and colleagues investigated only CN subjects that were 10–15 years younger (on average) than in our study. Further, we defined CSF abnormality based on established cut-offs, and not by percentiles of the sample distribution. We expect our study sample to be in a more advanced pathological stage, which may explain why our data showed a difference between A−T− and A+T−. Henneman et al. (2009a) reported differences in both HC volume at baseline and HC atrophy rate between healthy controls and MCI patients, which is consistent with our observed differences between SCD and MCI subjects within each CSF group.
However, SCD individuals in the A+T+ group displayed greater atrophy rates than A−T− and A+T− MCI patients. This is in line with another study from Henneman et al. (2009b), which suggested that greater CSF p-tau levels were associated with greater HC atrophy rate. The same trends as for HC were captured by AVRA’s MTA ratings, but not as clearly in the radiologist ratings. Most subjects, when using discrete ratings, had the same or +1 MTA score at six-year follow-up compared to baseline. As a result, the computed ΔMTA/year values for Rad. 1 and Rad. 2 merely reflected the proportion of subjects given a higher MTA score within six years. That is, the ΔMTA/year for a subject can “only” assume three values {0, 0.15, 0.2} depending on if, or at what timepoint, a higher MTA score is assigned. Thus, we argue that it is not possible to obtain reliable measures of atrophy rates from the integer radiologist ratings in our small study sample. It further suggests that the resolution of the MTA scale is too coarse to track individual MTL atrophy progression in short time spans, although the clinical usefulness of higher resolution MTL measures may be small.

Focusing on the ratings from AVRA only, we found that the average changes in MTA scores were small: between 0.04 and 0.15 per year. This corresponds to roughly 25 years for A−T− subjects to progress a “full” MTA score (e.g. “1.0 → 2.0”). For the A+T− group the time is 13.3 years, and 8.3 years for A+T+. By combining the ΔHC/year entries from Table 3 with the ΔHC value at which it becomes more likely for the radiologists to give a higher MTA score (Fig. 4), we can estimate how many years it takes for individuals in each CSF group to be more likely to get a higher MTA score at follow-up. Subjects with A−T− at baseline are more likely to get a higher score at roughly 6.2 years, A+T− at 4.3 years, and A+T+ at 2.5 years.
The difference in the two methods is that in the latter measure we are estimating the time to reach the next discrete MTA step. That is, borderline cases (e.g. MTA = 2.9) are more likely to get a higher score at the next follow-up than individuals with MTA = 2.0 at baseline. Assuming that patients being rated MTA = 2 by a radiologist have an underlying continuous MTA score, and that these are uniformly distributed on the interval [2,3), a patient in this group would on average have MTA = 2.5. The first method (based on AVRA ratings) should thus give roughly twice the conversion time to the second (based on radiologist ratings), which is too short but fairly close. The remaining differences can have multiple explanations. 1) The estimates are crude and based on relatively few subjects with large within-group variability in MTA rates. 2) The calculations are based on atrophy rates being constant over 20 years. This seems unlikely, given that individuals’ CSF status and cognition may worsen, which should yield increased atrophy rates according to Table 3. 3) The MTA scale assesses three structures and not just HC atrophy. Further, it has been suggested that atrophy mainly occurs in posterior HC in preclinical AD (Lindberg et al., 2017), causing the HC volume change to occur mainly “outside” the MTA rating slice. Longitudinal MTA scores have, to our knowledge, only previously been reported by Ferreira et al. (2017) in AD patients and CN subjects over a two-year follow-up. This study reported an MTA change of 0.25/year in CN participants, and 0.4/year in AD patients (estimated from figure). The annual change in MTA scores in CN individuals was higher than those observed in SCD subjects in the current study. However, we believe that our data, comprising four scans per participant and continuous ratings, allows for an accurate estimation of the MTA rate. A limitation of the current study is that many of the analyses assume a linear relationship between variables. 
From Fig. 3 the individual slopes for all MTL measures look linear with respect to age, or at least linearity looks like a reasonable approximation for six years. However, if one were to model ILV as a function of MTA (see Fig. 2), the relationship is clearly not linear. This means that the change in ILV volume between MTA 0–1 is smaller than between MTA 3–4. This may confound the interpretations of Fig. 4, but our study sample was not large enough to consider non-linear relationships. Further, we emphasize that the study sample is not fully representative of A−T−, A+T− and A+T+ groups given that the inclusion criteria excluded (subjective) cognitively normal individuals and dementia patients. The former would likely affect mainly the A− group results, and the latter the CSF pathological groups.
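The two conversion-time estimates discussed above can be retraced with simple arithmetic. The sketch below is a rough check, not the study’s code. It takes the group means from Table 3 and the ΔHC values from the Fig. 4 analysis, and it assumes that left and right hemispheres, and the two radiologists’ thresholds, are simply averaged. The text does not state how they were combined, so the HC-based figures come out close to, but not exactly at, the reported 6.2, 4.3 and 2.5 years. All names are hypothetical.

```python
# Group means copied from Table 3 as (left, right) hemisphere values.
AVRA_DMTA_PER_YEAR = {
    "A-T-": (0.04, 0.04),
    "A+T-": (0.07, 0.08),
    "A+T+": (0.13, 0.11),
}
DHC_PER_YEAR_MM3 = {
    "A-T-": (-36.3, -39.3),
    "A+T-": (-53.4, -55.4),
    "A+T+": (-93.4, -99.3),
}
# HC volume change at which a higher MTA score becomes more likely,
# for (Rad. 1, Rad. 2), as reported in the text accompanying Fig. 4.
HC_THRESHOLD_MM3 = (-238.0, -235.0)


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def years_per_full_mta_step(group):
    """Years for the continuous AVRA score to rise by 1.0 at a constant rate.

    Assumption: left and right hemisphere rates are averaged.
    """
    return 1.0 / mean(AVRA_DMTA_PER_YEAR[group])


def years_until_higher_rating_likely(group):
    """Years until the average HC loss reaches the averaged radiologist threshold.

    Assumption: hemispheres and radiologists are averaged; both quantities
    are negative, so the ratio is a positive number of years.
    """
    return mean(HC_THRESHOLD_MM3) / mean(DHC_PER_YEAR_MM3[group])


if __name__ == "__main__":
    for group in ("A-T-", "A+T-", "A+T+"):
        print(group,
              round(years_per_full_mta_step(group), 1),
              round(years_until_higher_rating_likely(group), 1))
```

Both functions assume a constant rate over the whole period, which is the simplification the Discussion itself flags as unlikely to hold over decades.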

Conclusion

In this study we investigated the sensitivity and reliability of visual assessment of MTL atrophy according to Scheltens’ MTA scale in a longitudinal cohort of subjects with subjective cognitive decline and mild cognitive impairment. Our data showed that MTA ratings display the same cross-sectional and longitudinal trends as the volumes of the hippocampus and the inferior lateral ventricle. This suggests that the MTA scale is a reliable alternative to automatic image segmentations, but that the discrete scale is too coarse to track individual atrophy progression over short time spans. The MTA ratings from two experienced radiologists and from automated software were strongly associated with the subcortical volumes as well as with cognitive tests, showing that all raters were reliable. However, the inter-rater agreement was low due to systematic rating differences, which highlights the problem of relying on subjective assessments.
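
The last point, that raters can each be strongly associated with an underlying measure yet agree poorly with each other, can be illustrated with a toy example. The Python sketch below uses entirely hypothetical ratings in which one rater scores every image one step higher than the other. It computes Pearson's correlation and unweighted Cohen's kappa, both chosen here for simplicity; the agreement statistic used in the study itself may differ.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cohen_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(ratings_1)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    categories = set(counts_1) | set(counts_2)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(counts_1[c] * counts_2[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical integer MTA ratings (scale 0-4) for ten images.
rater_a = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
# Hypothetical second rater with a systematic offset of one step.
rater_b = [min(score + 1, 4) for score in rater_a]

print(f"Pearson r:     {pearson_r(rater_a, rater_b):.2f}")
print(f"Cohen's kappa: {cohen_kappa(rater_a, rater_b):.2f}")
```

In this constructed case the two raters rank the images identically, so the correlation is perfect, but they never assign the same score, so kappa falls below zero. Real rating differences are less extreme, but the mechanism is the same.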

CRediT authorship contribution statement

Gustav Mårtensson: Conceptualization, Methodology, Formal analysis, Investigation, Writing - original draft, Visualization, Data curation. Claes Håkansson: Conceptualization, Methodology, Investigation, Writing - review & editing. Joana B. Pereira: Methodology, Writing - review & editing. Sebastian Palmqvist: Resources, Data curation, Writing - review & editing. Oskar Hansson: Resources, Data curation, Writing - review & editing. Danielle van Westen: Conceptualization, Methodology, Investigation, Supervision, Writing - review & editing. Eric Westman: Conceptualization, Methodology, Supervision, Writing - review & editing.

References (35 in total)

1. A global optimisation method for robust affine registration of brain images.

Authors: M Jenkinson; S Smith
Journal: Med Image Anal Date: 2001-06 Impact factor: 8.545

2. Visual assessment of medial temporal lobe atrophy on magnetic resonance imaging: interobserver reliability.

Authors: P Scheltens; L J Launer; F Barkhof; H C Weinstein; W A van Gool
Journal: J Neurol Date: 1995-09 Impact factor: 4.849

3. Atrophy of medial temporal lobes on MRI in "probable" Alzheimer's disease and normal ageing: diagnostic value and neuropsychological correlates.

Authors: P Scheltens; D Leys; F Barkhof; D Huglo; H C Weinstein; P Vermersch; M Kuiper; M Steinling; E C Wolters; J Valk
Journal: J Neurol Neurosurg Psychiatry Date: 1992-10 Impact factor: 10.154

4. Medial temporal lobe atrophy is underreported and may have important clinical correlates in medical inpatients.

Authors: Gustav Torisson; Danielle van Westen; Lars Stavenow; Lennart Minthon; Elisabet Londos
Journal: BMC Geriatr Date: 2015-06-16 Impact factor: 3.921

5. Progressive medial temporal lobe atrophy during preclinical Alzheimer's disease.

Authors: Corinne Pettigrew; Anja Soldan; Kelly Sloane; Qing Cai; Jiangxia Wang; Mei-Cheng Wang; Abhay Moghekar; Michael I Miller; Marilyn Albert
Journal: Neuroimage Clin Date: 2017-08-25 Impact factor: 4.881

6. Comparing ¹⁸F-AV-1451 with CSF t-tau and p-tau for diagnosis of Alzheimer disease.

Authors: Niklas Mattsson; Ruben Smith; Olof Strandberg; Sebastian Palmqvist; Michael Schöll; Philip S Insel; Douglas Hägerström; Tomas Ohlsson; Henrik Zetterberg; Kaj Blennow; Jonas Jögi; Oskar Hansson
Journal: Neurology Date: 2018-01-10 Impact factor: 9.910

7. Dementia imaging in clinical practice: a European-wide survey of 193 centres and conclusions by the ESNR working group.

Authors: M W Vernooij; F B Pizzini; R Schmidt; M Smits; T A Yousry; N Bargallo; G B Frisoni; S Haller; F Barkhof
Journal: Neuroradiology Date: 2019-03-09 Impact factor: 2.804

8. Structural imaging findings on non-enhanced computed tomography are severely underreported in the primary care diagnostic work-up of subjective cognitive decline.

Authors: Claes Håkansson; Gustav Torisson; Elisabet Londos; Oskar Hansson; Danielle van Westen
Journal: Neuroradiology Date: 2019-01-17 Impact factor: 2.804

9. Atrophy of the Posterior Subiculum Is Associated with Memory Impairment, Tau- and Aβ Pathology in Non-demented Individuals.

Authors: Olof Lindberg; Gustav Mårtensson; Erik Stomrud; Sebastian Palmqvist; Lars-Olof Wahlund; Eric Westman; Oskar Hansson
Journal: Front Aging Neurosci Date: 2017-09-20 Impact factor: 5.750

10. Repeatability and reproducibility of FreeSurfer, FSL-SIENAX and SPM brain volumetric measurements and the effect of lesion filling in multiple sclerosis.

Authors: Chunjie Guo; Daniel Ferreira; Katarina Fink; Eric Westman; Tobias Granberg
Journal: Eur Radiol Date: 2018-09-21 Impact factor: 5.315
