We aimed to develop a new method to convert T1-weighted brain MRIs to feature vectors, which could be used for content-based image retrieval (CBIR). To overcome the wide range of anatomical variability in clinical cases and the inconsistency of imaging protocols, we introduced the Gross feature recognition of Anatomical Images based on Atlas grid (GAIA), in which the local intensity alteration, caused by pathological (e.g., ischemia) or physiological (development and aging) intensity changes, as well as by atlas-image misregistration, is used to capture the anatomical features of target images. As a proof-of-concept, the GAIA was applied for pattern recognition of the neuroanatomical features of multiple stages of Alzheimer's disease, Huntington's disease, spinocerebellar ataxia type 6, and four subtypes of primary progressive aphasia. For each of these diseases, feature vectors based on a training dataset were applied to a test dataset to evaluate the accuracy of pattern recognition. The feature vectors extracted from the training dataset agreed well with the known pathological hallmarks of the selected neurodegenerative diseases. Overall, discriminant scores of the test images accurately categorized these test images to the correct disease categories. Images without typical disease-related anatomical features were misclassified. The proposed method is a promising method for image feature extraction based on disease-related anatomical features, which should enable users to submit a patient image and search past clinical cases with similar anatomical phenotypes.
Conventional structural MRI still plays a leading part in clinical diagnostic radiology, providing vast amounts of anatomical information. There are numerous clinical hallmarks and signs that can be depicted by structural MRI, which are well established after more than 30 years of clinical application. Currently, clinical MR images are interpreted by radiologists and stored electronically in the picture archiving and communication system (PACS) with the radiology reports. A text-based image searching of PACS enables the retrieval of stored images with the clinical information and radiology report. This searching capability dramatically improved daily clinical practice by saving time and effort to collect images from a patient to evaluate disease progression and the efficacy of treatments, and to collect images from a specific clinical condition to investigate the common anatomical phenotype depicted by MRI. However, to further aid in clinical use, an image-based search, in which the patient's image is submitted to PACS as a “keyword,” and past images with similar anatomical phenotypes are identified, and a statistical report about the diagnosis and prognosis is provided, would be far more informative. This type of image searching is called content-based image retrieval (CBIR), which is an anticipated technology in medical imaging (El-Kwae et al., 2000; Greenspan and Pinhas, 2007; Muller et al., 2005; Orphanoudakis et al., 1996; Rahman et al., 2007; Robinson et al., 1996; Sinha et al., 2001; Unay et al., 2010). Although the CBIR is a promising technology, to date, the application to PACS is limited (Muller et al., 2004; Sinha and Kangarloo, 2002), because of the difficulty of extracting features from the stored images, especially for brain MRI, which consists of numerous anatomical structures with highly varying intensity, volume, and shape among diseases and even among normal individuals. 
One of the solutions is to apply image quantification technologies, which has been the subject of extensive research in the last two decades (Ashburner and Friston, 1999; Chiang et al., 2008; Good et al., 2001; Mazziotta et al., 2001; Smith et al., 2006; Verma et al., 2005; Wright et al., 1995; Yushkevich et al., 2008; Zhang et al., 2006). These analyses have been mostly designed for traditional group-based studies, in which strict inclusion criteria and age-matched controls were essential, but often incompatible, with clinical practice where an individual image, not a group of diseases, is the target of the analysis. The concept of group analysis assumes consistent locations of abnormalities, which does not hold for clinical situations with heterogeneous patient populations and image quality. There are diseases with lesions that are not seen in the normal brain, such as stroke and brain tumors, and diseases with atrophy in a specific set of anatomical structures, such as Alzheimer's disease. To localize the disease-related pathological changes seen in brain MRI, transformation-based image analysis methods are often employed. However, the lesions with abnormal intensity or the space-occupying lesions often cause significant misregistration of brain structures after image transformation. The brain with severe atrophy, such as that seen in Alzheimer's disease, is also problematic in terms of the transformation accuracy. There are methods to overcome such inaccuracy by using specific approaches, such as lesion-masking (Andersen et al., 2010; Ripolles et al., 2012) or a disease-specific template (Liao et al., 2012; Mandal et al., 2012; Wang et al., 2012) (e.g., http://www.loni.ucla.edu/Atlases/), but it is still difficult to quantify various types of diseases in the same methodological framework. 
In addition, most of these methods use image contrast to guide the transformation and are therefore sensitive to variation in contrast caused not only by anatomical abnormalities, but also by differences in scanner and image parameters.
In this study, we attempt to solve this widely known problem in transformation-based analysis by introducing an approach named the “Gross feature recognition of Anatomical Images based on Atlas grid (GAIA),” for image feature extraction (Fig. 1). In GAIA, images are co-registered to the atlas using linear transformation, followed by intensity measurement for the multiple areas in the atlas space. The overall shape and size are only roughly adjusted to that of the atlas, leaving residual misregistrations in most of the anatomical areas. The measured intensity of each area represents a combination of the local intensity alteration, caused by pathological (e.g., ischemia) or physiological (development and aging) intensity changes, as well as by atlas–image misregistration, which are recorded as unique anatomical features in a quantitative standardized matrix. Since the goal of CBIR is to retrieve images based on anatomical similarity, our ultimate interest is not how accurately images can be warped, but how to extract imaging features that can separate a specific diagnostic group from other conditions. This motivated us to use the GAIA as a method for the image recognition applied to a pool of clinical MRIs with a mixture of various diseases.
Fig. 1
GAIA procedure. All images are co-registered to the atlas space using affine transformation. The atlas segmentation map (colored contour) is overlaid on the co-registered image. The mean intensity of each of 177 parcels is calculated and ranked by the order of mean intensity. Namely, the area with highest intensity is ranked #1 and the area with lowest intensity is ranked #177. Note that the intensity includes information about both misregistration and intensity mismatch between the atlas and the target image. For example, parcels with cerebrospinal fluid contamination (e.g., parcel 4) and with low intensity change, such as the periventricular cap (yellow arrows, parcel 3), were ranked lower than those of the atlas. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
As a proof of concept, the GAIA was applied to multiple stages of neurodegenerative diseases with known macroscopic anatomical alterations, such as Alzheimer's disease (AD) (Dickerson et al., 2009; Du et al., 2007; Lerch et al., 2005), Huntington's disease (HD) (van den Bogaard et al., 2012), primary progressive aphasia (PPA) (Mesulam et al., 2012), and spinocerebellar ataxia type 6 (SCA6) (Eichler et al., 2011). We focused on 3D T1-weighted images scanned by magnetization-prepared rapid gradient recalled echo (MPRAGE), since this sequence is already widely used in clinical practice, especially when neurodegenerative diseases are suspected. To extract features specific to each of the diseases, we first applied a principal component analysis (PCA) followed by linear discriminant analysis (LDA) to a training dataset. The resultant feature vectors were subsequently applied to the test dataset collected from multiple scanners to test the accuracy of image categorization based on the GAIA.
Methods
Participants and image acquisitions
A de-identified database consisting of T1-weighted images scanned with a magnetization-prepared rapid gradient recalled echo (MPRAGE) sequence, collected through four independent clinical research studies (Faria et al., 2013; Jung et al., 2012; Oishi et al., 2011; Unschuld et al., 2012), was analyzed retrospectively. The Institutional Review Board approved each study, and written, informed consent was obtained. The demographic features, scan parameters, and abbreviations are summarized in Table 1.
Table 1
Demographic features and scan parameters.
AD group

| Section | Measure | Training: AD (n = 12) | Training: NC (n = 10) | Test: NC_c (n = 3) | Test: AD (n = 9) | Test: MCI (n = 18) | Test: MCI_c (n = 6) | Test: NC (n = 10) |
|---|---|---|---|---|---|---|---|---|
| I | Age (years) | 76.4 ± 5.1 | 72.8 ± 8.5 | 77.7 ± 8.7 | 74.6 ± 8.1 | 74.4 ± 5.5 | 78.7 ± 2.7 | 76.8 ± 4.1 |
| I | Sex (M:F) | 10:2 | 4:6 | 2:1 | 7:2 | 13:5 | 4:2 | 4:6 |
| I | Education (years) | 15.8 ± 4.2 | 15.1 ± 2.2 | 14.7 ± 2.3 | 15.3 ± 3.0 | 15.4 ± 3.1 | 16.3 ± 3.4 | 17.4 ± 2.7 |
| II | MMSE | 22.2 ± 2.8 | 29.2 ± 1.5 | 27.7 ± 1.5 | 21.2 ± 3.3 | 27.1 ± 2.1 | 24.3 ± 1.0 | 29.3 ± 0.9 |
| II | ADAS-cog | 18.4 ± 4.3 | 11.0 ± 2.1 | 8.7 ± 2.1 | 20.7 ± 8.1 | 12.2 ± 4.9 | 14.8 ± 4.9 | 10.8 ± 1.8 |
| II | CDR-rating | 1.2 ± 0.5 | 0 | 0.2 ± 0.3 | 1.0 ± 0.4 | 0.5 ± 0.1 | 0.5 | 0 |
| II | CDR-sum | 6.9 ± 2.9 | 0 | 0.2 ± 0.3 | 5.4 ± 2.1 | 1.3 ± 0.8 | 1.6 ± 0.6 | 0 |
| II | GDS | 1.5 ± 1.9 | 1.0 ± 1.2 | 0.3 ± 0.6 | 2.8 ± 2.1 | 1.4 ± 1.1 | 0.5 ± 0.8 | 1.4 ± 2.5 |

III. Scan parameters: Protocol-1: 3.0 T Philips Intera MR scanner-MPRAGE-1.2; TR/TE (ms): 6.9/3.2; matrix: 256 × 256 × 170; FOV: 256 mm × 256 mm × 204 mm, zero-filled to 256 mm × 256 mm × 204 mm; voxel size (mm³): 1 × 1 × 1.2
Notes: AD = Alzheimer's disease, NC = normal control, MCI = mild cognitive impairment, MCI_c = MCI_converters, NC_c = NC_converters, HD = Huntington's disease, HD_es = HD early symptomatic, HD_cto = HD close to onset, HD_ffo = HD far from onset, SCA6 = spinocerebellar ataxia type 6, PPA = primary progressive aphasia, PPA_NFv = non-fluent variant of PPA, PPA_Sv = semantic variant of PPA, PPA_Lv = logopenic variant of PPA, PPA_U = unclassified PPA.
AD, mild cognitive impairment (MCI), and the cognitively normal (NC) control group
Twenty-five probable AD patients who met the NINCDS/ADRDA criteria (McKhann et al., 1984), with a clinical dementia rating (CDR) of 1; 25 aMCI patients who met the criteria for amnestic MCI (Petersen, 2004) with a CDR = 0.5; and 25 NC participants with a CDR = 0, were included. There were no differences among these groups with regard to age, race, education, and the occurrence of vascular conditions (Mielke et al., 2009). After three years of follow-up, six MCI patients had converted to AD and were defined as MCI converters (MCI_c); three NC participants had converted to AD and were defined as NC converters (NC_c). The diagnosis and neuropsychiatric evaluations [CDR, the cognitive portion of the Alzheimer's Disease Assessment Scale (ADAS-cog), the mini mental state examination (MMSE), and the geriatric depression scale (GDS)] were performed at the time of the MRI scan.
MPRAGE sequences were acquired using a 3 T scanner (Gyroscan NT, Philips Medical Systems) located in the Kennedy Krieger Institute. The scan parameters were: repetition time (TR) 6.9 ms; echo time (TE) 3.2 ms; inversion time (TI) 846.3 ms; matrix 256 × 256 × 170; and field of view (FOV) 240 mm × 240 mm × 204 mm, zero-filled to 256 mm × 256 mm × 204 mm (protocol-1).
HD and the control group
Sixty-four participants positive for CAG expansion in Huntingtin and twenty-seven control subjects negative for CAG expansion were included. Among those positive for CAG expansion, thirteen participants were early symptomatic (HD_es) and 51 participants were asymptomatic, including 22 who were close to onset (HD_cto; less than 10 years to the estimated onset of the motor symptom) and 19 who were far from onset (HD_ffo; more than or equal to 10 years to the estimated onset of the motor symptom), based on the CAG-repeat length of the mutated Huntingtin allele and age (Langbehn et al., 2004). Disease burden score (DBS) was calculated as ([CAG-repeat length − 35.5] × age) (Penney et al., 1997). The Montreal Cognitive Assessment (MoCA) was performed to screen for mild cognitive dysfunction on the day of scanning. None of the participants had a history of diagnosed mood, obsessive compulsive, or psychotic disorder or substance abuse.
MPRAGE sequences were acquired using a 3 T scanner (Intera, Philips Medical Systems) located in the Kennedy Krieger Institute. Two different protocols were used, including protocol-2: TR 8.4 ms; TE 3.8 ms; TI 826 ms; matrix 256 × 256 × 150; FOV 230 mm × 230 mm × 135 mm, zero-filled to 256 mm × 256 mm × 135 mm; and flip angle 8°; and protocol-3: TR 8.0 ms; TE 3.7 ms; TI 811 ms; matrix 256 × 256 × 150; and FOV 256 mm × 256 mm × 150 mm.
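The disease burden score defined above is a one-line formula, and a small worked example may help make the units concrete. The following is only an illustrative sketch: the function name and the example participant (42 CAG repeats, 40 years old) are hypothetical and are not taken from the study data.

```python
def disease_burden_score(cag_repeat_length: float, age_years: float) -> float:
    """Disease burden score as stated in the text: (CAG-repeat length - 35.5) x age."""
    return (cag_repeat_length - 35.5) * age_years


# Hypothetical participant, for illustration only: 42 CAG repeats, 40 years old.
example_dbs = disease_burden_score(42, 40)  # (42 - 35.5) * 40
```

With these hypothetical inputs, the excess repeat length is 6.5, so the score is 6.5 × 40.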
SCA6 group and the control group
Twenty-four patients with genetically diagnosed SCA6 and eight controls were enrolled. The duration of disease was defined from the first self-reported symptom of ataxia. The Scale for the Assessment and Rating of Ataxia (SARA) was performed for the evaluation of ataxic symptoms.
MPRAGE sequences were acquired using a 3 T scanner (Intera, Philips Medical Systems) located in the Kennedy Krieger Institute. The scan parameters were: TR 10.33 ms; TE 6.0 ms; TI 964.8 ms; matrix 256 × 256 × 140; and FOV 212 mm × 212 mm × 151 mm, zero-filled to 256 mm × 256 mm × 151 mm (protocol-4).
PPA group
Fifty-seven participants with PPA, diagnosed on the basis of having a predominant and progressive deterioration in language in the absence of major change in personality, behavior, or cognition other than praxis for at least two years (Mesulam, 1982), and a control group without neurological symptoms, were included. PPA patients were classified as one of the variants of PPA according to recent guidelines (Gorno-Tempini et al., 2011), including non-fluent variant (PPA_NFv), semantic variant (PPA_Sv), and logopenic variant (PPA_Lv). Participants with only anomia and dysgraphia, and who did not meet the criteria for any of these variants, were categorized as unclassified PPA (PPA_U). All participants completed the Western Aphasia Battery (WAB) (Shewan and Kertesz, 1980) within one month before the MRI scans.
MPRAGE sequences were acquired using two 3 T scanners. The one located in the Kennedy Krieger Institute (Achieva, Philips Medical Systems) was used for protocol-5: TR 8.4 ms; TE 3.9 ms; TI 849.4 ms; matrix 256 × 256 × 140; and FOV 212 mm × 212 mm × 140 mm, zero-filled to 256 mm × 256 mm × 154 mm. The other located in the Johns Hopkins Hospital (Achieva, Philips Medical Systems) was used for protocol-6: TR 6.6 ms; TE 3.1 ms; TI 821 ms; matrix 256 × 256 × 120; and FOV 230 mm × 230 mm × 120 mm, zero-filled to 256 mm × 256 mm × 120 mm.
The MRIs from AD, HD_es, SCA6, PPA_Sv, PPA_NFv, PPA_Lv, PPA_U, and the control groups of each study were pseudo-randomly assigned to either training or test datasets. MRIs from NC_c, MCI, MCI_c, HD_cto, and HD_ffo were assigned as a test dataset.
Image processing
All images were re-sliced to 1 mm isotropic resolution (181 × 217 × 181 matrix), bias corrected, and skull-stripped to generate the “prepared” images by using SPM8 (http://www.fil.ion.ucl.ac.uk/spm/). The intensity histogram peaks of the cerebrospinal fluid (CSF), the gray matter (GM), and the white matter (WM) of the “prepared” images were adjusted to match those of the JHU-MNI atlas (http://cmrm.med.jhmi.edu/) using a nonlinear histogram matching routine implemented in DiffeoMap (www.mristudio.org). After intensity correction, 12-parameter affine transformation of AIR (Woods et al., 1998), implemented in DiffeoMap, was applied to the prepared images to co-register each participant's image to the atlas. The parcellation map of the JHU-MNI atlas was overlaid on the co-registered images to measure the mean intensity of the 177 areas. The measured intensity was converted to the rank order using the standard competition ranking. Namely, a structure with the highest intensity was ranked #1 and the lowest intensity was ranked #177. This conversion was performed to ameliorate the differences in intensity profile among different scan protocols, which might remain even after the bias and intensity corrections.
The novelty of the GAIA is the use of parcellation maps to measure the degree of misregistration and structural intensity mismatch, which have been regarded as errors to be excluded in traditional transformation-based image analysis. Although the overall shape and size are roughly adjusted to that of the atlas after affine transformation, there are residual misregistrations in most anatomical areas (Fig. 1). For example, if a given image has an enlargement in the lateral ventricle, the area defined as the caudate in the atlas is occupied by the enlarged ventricle, which results in lower intensity in this area because of the contamination of the cerebrospinal fluid (parcel 4 of Fig. 1), and hence, this results in a relative lowering of the rank order in this parcel (rightmost column of Fig. 1). If the image contains lesions with altered intensity, such as the periventricular cap, this also lowers the intensity of the corresponding area (parcel 3 of Fig. 1), which also results in a relative lowering of the rank order in this parcel. Our hypothesis is that the rank order, which represents a combination of the atlas-image segmentation and intensity disagreements, could be used to capture the anatomical features specific to the target image.
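The parcel-wise measurement and ranking step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the DiffeoMap implementation: the function names are hypothetical, the label map is assumed to use integers 1..n for parcels and 0 for background, and the 3 × 3 toy "image" with four parcels stands in for a co-registered volume with 177 parcels.

```python
import numpy as np


def parcel_mean_intensity(image, labels, n_parcels):
    """Mean intensity per parcel; labels are 1..n_parcels and 0 is background.

    Assumes every parcel contains at least one voxel.
    """
    flat_labels = labels.ravel()
    sums = np.bincount(flat_labels, weights=image.ravel(), minlength=n_parcels + 1)
    counts = np.bincount(flat_labels, minlength=n_parcels + 1)
    return sums[1:] / counts[1:]  # skip the background bin


def competition_rank_descending(values):
    """Standard competition ranking ("1224"): the highest value is ranked #1."""
    values = np.asarray(values, dtype=float)
    # Rank = 1 + number of parcels with strictly higher mean intensity,
    # so tied parcels share a rank and the next rank is skipped.
    return 1 + (values[None, :] > values[:, None]).sum(axis=1)


# Toy 2D example (hypothetical data): four parcels instead of 177.
labels = np.array([[1, 1, 2],
                   [3, 3, 2],
                   [0, 4, 4]])
image = np.array([[90.0, 110.0, 40.0],
                  [70.0, 70.0, 40.0],
                  [0.0, 100.0, 100.0]])

means = parcel_mean_intensity(image, labels, n_parcels=4)
ranks = competition_rank_descending(means)
```

In this toy case parcels 1 and 4 have the same mean intensity and share rank #1, the dark parcel 2 (analogous to a parcel contaminated by cerebrospinal fluid) falls to the bottom, and no rank #2 is assigned, which is the behavior expected of standard competition ranking.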
Normalization of the ranking
Training dataset: The rank $R^{\mathrm{train}}_{ij}$ of image i, area j was further converted to a z-score: $Z^{\mathrm{train}}_{ij} = (R^{\mathrm{train}}_{ij} - \mu^{\mathrm{NC}}_{j}) / \sigma^{\mathrm{NC}}_{j}$, where $\mu^{\mathrm{NC}}_{j}$ represents the mean rank and $\sigma^{\mathrm{NC}}_{j}$ represents the standard deviation of the rank of area j across the normal control images assigned to the training dataset. This resulted in a 102 (number of training data) × 177 (number of areas) matrix with $Z^{\mathrm{train}}_{ij}$ in each element. A portion of this matrix including only normal control images (40 × 177 matrix) was also created to investigate the effects of age and gender.
Test dataset: The rank $R^{\mathrm{test}}_{kj}$ of image k, area j was further converted to a z-score with the same control statistics: $Z^{\mathrm{test}}_{kj} = (R^{\mathrm{test}}_{kj} - \mu^{\mathrm{NC}}_{j}) / \sigma^{\mathrm{NC}}_{j}$. This resulted in a 170 (number of test data) × 177 matrix with $Z^{\mathrm{test}}_{kj}$ in each element.
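The normalization above amounts to a column-wise z-score in which both the training and the test ranks are scaled by the statistics of the training-set normal controls. A minimal sketch follows; the function name is hypothetical, the rank matrices are random placeholders rather than real data, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np


def zscore_against_controls(ranks, control_ranks):
    """Z-score each area (column) using the mean and SD of the control images."""
    mu_nc = control_ranks.mean(axis=0)
    sigma_nc = control_ranks.std(axis=0, ddof=1)  # assumption: sample SD
    return (ranks - mu_nc) / sigma_nc


# Synthetic placeholder ranks (not study data): 40 controls, 5 test images, 177 areas.
rng = np.random.default_rng(0)
r_train_nc = rng.integers(1, 178, size=(40, 177)).astype(float)
r_test = rng.integers(1, 178, size=(5, 177)).astype(float)

z_train_nc = zscore_against_controls(r_train_nc, r_train_nc)
z_test = zscore_against_controls(r_test, r_train_nc)  # same control statistics
```

Because the test images reuse the control mean and standard deviation from the training set, no information from the test dataset enters the normalization.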
Extraction of age- and gender-related features using a control subset of the training dataset
PCA was applied to the 40 × 177 matrix of Ztrain to investigate correlations between extracted principal components (PCs) and age or gender. If significant correlations were identified, the PCA-derived eigenvectors were applied to the 102 × 177 matrix of Ztrain and the 170 × 177 matrix of Ztest, from which the PCs with significant correlations were removed. This resulted in $Z^{\mathrm{train}}_{il}$ and $Z^{\mathrm{test}}_{kl}$, in which l ranges from 1 to m, the number of PCs without significant effects of age and gender. Spearman's rank correlation coefficient was used for the evaluation, and a corrected p < 0.05 (false discovery rate) was considered a significant correlation.
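The procedure above (PCA on the control subset, a false-discovery-rate decision on which PCs relate to age or gender, then projection of all images with those PCs dropped) can be sketched as follows. This is an illustration under several assumptions: the function names are hypothetical, the data are random placeholders, PCA is computed by SVD after centering on the control mean, and the false discovery rate is controlled with the Benjamini-Hochberg procedure, which the text does not name. The Spearman p-values themselves are not computed here; in practice they could come from a routine such as `scipy.stats.spearmanr`.

```python
import numpy as np


def fit_pca(z_controls):
    """PCA via SVD of the centered control matrix; rows of `axes` are eigenvectors."""
    mean = z_controls.mean(axis=0)
    _, s, vt = np.linalg.svd(z_controls - mean, full_matrices=False)
    keep = s > 1e-10 * s.max()  # discard numerically null directions
    return mean, vt[keep]


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of p-values declared significant under FDR control."""
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, p.size + 1) / p.size
    passed = p[order] <= thresholds
    significant = np.zeros(p.size, dtype=bool)
    if passed.any():
        last = np.nonzero(passed)[0].max()
        significant[order[: last + 1]] = True
    return significant


def project_without(z, mean, axes, drop):
    """Project images onto the control-derived PCs and remove the flagged ones."""
    scores = (z - mean) @ axes.T
    return np.delete(scores, drop, axis=1)


# Synthetic placeholders with the matrix sizes given in the text.
rng = np.random.default_rng(0)
z_nc = rng.normal(size=(40, 177))
z_train = rng.normal(size=(102, 177))

mean, axes = fit_pca(z_nc)
# Zero-based indices 0 and 15 stand for the first and 16th PCs.
z_train_reduced = project_without(z_train, mean, axes, drop=[0, 15])
fdr_mask = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
```

Centering 40 images leaves at most 39 directions with non-zero variance, so this sketch yields 39 PCs, and dropping two of them leaves 37 columns.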
Extraction of disease-specific features using a training dataset
PCA was applied to the 102 × 177 matrix of Ztrain to extract PCs that could explain > 95% of the total variance. Subsequently, LDA was applied to the PCs to extract typical appearances for specific disease categories. The eight statuses (NC, AD, HD_es, SCA6, PPA_Sv, PPA_NFv, PPA_Lv, and PPA_U) were used to label the training dataset. If significant effects of age or gender existed, LDA was also applied to the 102 × m matrix of Ztrain. These procedures resulted in eight feature vectors that represented disease-specific anatomical features extracted from the training dataset.
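The two steps in this section, choosing the number of PCs that explain more than 95% of the variance and then fitting a linear discriminant model with one feature vector per status, can be sketched with NumPy alone. This is a hedged illustration, not the SPSS procedure used in the study: it assumes that the "feature vectors" correspond to one linear classification function per class built from a pooled within-class covariance, all function names are hypothetical, and the labeled data are synthetic.

```python
import numpy as np


def n_components_for_variance(z, threshold=0.95):
    """Smallest number of PCs whose cumulative explained variance exceeds threshold."""
    s = np.linalg.svd(z - z.mean(axis=0), compute_uv=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(explained, threshold, side="right")) + 1


def fit_classification_functions(x, y):
    """One linear classification function per class (pooled-covariance LDA)."""
    classes, idx, counts = np.unique(y, return_inverse=True, return_counts=True)
    means = np.stack([x[idx == c].mean(axis=0) for c in range(len(classes))])
    resid = x - means[idx]
    pooled = resid.T @ resid / (len(x) - len(classes))  # within-class covariance
    weights = np.linalg.solve(pooled, means.T).T        # one weight vector per class
    intercepts = -0.5 * np.sum(weights * means, axis=1) + np.log(counts / len(x))
    return classes, weights, intercepts


def discriminant_scores(x, weights, intercepts):
    """Score of every image for every class; higher means a closer match."""
    return x @ weights.T + intercepts


# Toy matrix with column variances 18 and 2: the first PC explains 90%.
toy = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
k_95 = n_components_for_variance(toy, 0.95)

# Synthetic, well-separated classes standing in for PC scores (not study data).
rng = np.random.default_rng(0)
y = np.repeat(np.array(["NC", "AD", "SCA6"]), 20)
x = rng.normal(size=(60, 4))
x[y == "AD", 0] += 10.0
x[y == "SCA6", 1] += 10.0

classes, weights, intercepts = fit_classification_functions(x, y)
scores = discriminant_scores(x, weights, intercepts)
predicted = classes[np.argmax(scores, axis=1)]
```

In this formulation each row of `weights` plays the role of a disease-specific feature vector in PC space; multiplying it by the PCA eigenvectors would map it back onto the 177 atlas areas for display.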
Evaluation of GAIA using the test dataset
The eight feature vectors derived from the training dataset were applied to the test dataset (the 170 × 177 matrix of Ztest and the 170 × m matrix of Ztest) to calculate the discriminant scores of 13 statuses (NC, NC_c, AD, MCI, MCI_c, HD_es, HD_cto, HD_ffo, SCA6, PPA_Sv, PPA_NFv, PPA_Lv, and PPA_U) for each participant. A one-way analysis of variance was used to test the differences in the 13 statuses, and to test the differences in NC scores from five different scan protocols (protocols 1–5 in Table 1). The group differences in the discriminant scores were assessed using independent-sample t tests, in which p < 0.05 was considered significant. Receiver operating characteristic (ROC) curve analysis was performed to assess the classification of each disease group using discriminant scores. The correlations of discriminant scores with clinical scores were analyzed by using the Spearman's rank correlation tests, in which p < 0.05 was considered significant. Statistical analyses were performed on SPSS 18/20 (IBM Corp., NY, USA).
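The ROC evaluation described above can be illustrated with a short sketch. The AUC is computed here through its rank interpretation (the probability that a randomly chosen target-group image scores higher than a randomly chosen non-target image, with ties counted as one half), and sensitivity and specificity are evaluated at a cut-off applied as "score ≥ cut-off", matching the cut-off convention of the ROC results table. The function names and the discriminant scores below are hypothetical, not values from the study, and SPSS may differ in details such as confidence-interval estimation, which is not reproduced.

```python
import numpy as np


def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def sensitivity_specificity(scores_pos, scores_neg, cutoff):
    """Classify an image as the target status when its score is >= cutoff."""
    sensitivity = float(np.mean(np.asarray(scores_pos) >= cutoff))
    specificity = float(np.mean(np.asarray(scores_neg) < cutoff))
    return sensitivity, specificity


# Hypothetical discriminant scores for a target group and all other images.
target_scores = [-3.0, -8.0, -1.5, -12.0]
other_scores = [-20.0, -15.0, -9.0, -30.0, -11.0]

auc = roc_auc(target_scores, other_scores)
sens, spec = sensitivity_specificity(target_scores, other_scores, cutoff=-11.2)
```

Here 18 of the 20 target/non-target pairs are ordered correctly, three of the four target images reach the cut-off, and three of the five other images fall below it.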
Results
Effects of age, gender, and scan parameters
Thirty-nine PCs were extracted from the 40 × 177 matrix of Ztrain. Significant correlations were identified between the first PC and age (Spearman's rho = 0.73, p = 8.9 × 10⁻⁸), and between the 16th PC and gender (Spearman's rho = 0.39, p = 1.2 × 10⁻²) (Fig. 2). Therefore, we created $Z^{\mathrm{train}}_{il}$ and $Z^{\mathrm{test}}_{kl}$ (l: 1, 2, …, 37), from which the first and 16th PCs were removed. With the effects of age and gender retained, the NC scores differed significantly among protocols 1–5 (F(4, 35) = 3.648, p = 1.4 × 10⁻²). After removing the effects of age and gender, there was no significant difference in the NC scores among protocols 1–5 (F(4, 35) = 1.217, p = 3.2 × 10⁻¹).
Fig. 2
Effects of age and gender on the T1-weighted image. The effects of age and gender are color-coded on the 177 areas of the atlas space. The red represents positively weighted areas and the blue represents negatively weighted areas. Weights are relative, and have no applicable units. The images are in radiological convention (R represents the right side). The effect of age was mostly identified around the ventricles. The effect of female gender was found in the left superior temporal, bilateral middle occipital, bilateral subgenual anterior cingulate, and the right prefrontal areas, which were positively weighted, and the left inferior temporal, left precentral, and bilateral superior parietal areas, which were negatively weighted. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Extraction of disease-specific features
From the Ztrain derived from the training dataset, PCA extracted 54 PCs that could explain > 95% of the total variance. LDA was applied to the 54 PCs to extract eight feature vectors that could calculate discriminant scores for seven disease statuses and for normal status (Fig. 3A). PCA and subsequent LDA were also applied to the Ztrain to extract feature vectors without the effects of age and gender (Fig. 3B).
Fig. 3
Color-coded feature vectors of eight clinical statuses. The feature vectors are color-coded on the 177 areas of the atlas space. The red represents positively weighted areas and the blue represents negatively weighted areas. Weights are relative, and have no applicable units. The images are in radiological convention (R represents the right side). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Discriminant scores of eight clinical statuses were calculated based on the trained feature vectors. Note that a higher discriminant score represents a closer match to the typical disease-related feature.
The NC group had a significantly higher NC score than the patient groups (p = 1.7 × 10⁻⁴) (Fig. 4A). The difference still remained after the effects of age and gender were removed (p = 1.9 × 10⁻²) (Fig. 4A). The area under the ROC curve (AUC) indicated that the ability of the NC score to correctly discriminate between the NC group and the non-NC group was significant both with and without the effects of age and gender (Table 2, I). Although NC individuals were cognitively and neurologically normal, those with low NC scores had atrophy in the brain (Fig. 5A).
Fig. 4
Bar charts of eight discriminant scores (A: NC score, B: AD score, C: HD score, D: SCA6 score, E: PPA_Sv score, F: PPA_NFv score, G: PPA_Lv score, and H: PPA_U score) from thirteen statuses (from left to right a: NC, b: NC_c, c: AD, d: MCI, e: MCI_c, f: HD_es, g: HD_cto, h: HD_ffo, i: SCA6, j: PPA_Sv, k: PPA_NFv, l: PPA_Lv, and m: PPA_U), with effects of age and gender (upper chart) and without effects of age and gender (lower chart). Asterisks (*) represent a status that should be discriminated by the discriminant score.
Table 2
Results of ROC analyses.
Columns prefixed "With" report results with the effect of age and gender; columns prefixed "Without" report results without the effect of age and gender.

| Score | Group | With: Cut-off (≥) | With: Sensitivity (%) | With: Specificity (%) | With: AUC (%) | With: 95% CI (%) | With: p | Without: Cut-off (≥) | Without: Sensitivity (%) | Without: Specificity (%) | Without: AUC (%) | Without: 95% CI (%) | Without: p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I. NC score | NC | −2.5 | 76.9 | 58 | 71.4 ± 4.9 | 61.8–80.9 | .000ᵇ | −2.2 | 74.4 | 48.9 | 61.2 ± 5.0 | 51.5–71.0 | .033ᵃ |
| II. AD score | NC_c | −33.3 | 100 | 45.5 | 64.7 ± 8.7 | 47.6–81.7 | 0.384 | −9.5 | 100 | 48.5 | 59.5 ± 6.5 | 46.7–72.3 | 0.574 |
| II. AD score | AD | −11.2 | 88.9 | 90.1 | 92.3 ± 3.6 | 85.1–99.4 | .000ᵇ | −5.3 | 100 | 85.7 | 93.1 ± 2.2 | 88.9–97.3 | .000ᵇ |
| II. AD score | MCI | −34.3 | 83.3 | 44.7 | 65.4 ± 6.0 | 53.6–77.2 | .033ᵃ | −10.7 | 94.4 | 32.2 | 60.5 ± 6.4 | 48.0–73.1 | 0.145 |
| II. AD score | MCI_c | −39.9 | 100 | 29.3 | 62.4 ± 11.7 | 39.4–85.4 | 0.303 | −0.5 | 50 | 96.3 | 69.1 ± 13.4 | 42.9–95.3 | 0.112 |
| III. HD score | HD_es | −12.6 | 100 | 77.1 | 88.3 ± 4.1 | 80.2–96.3 | .009ᵇ | −3.9 | 100 | 54.8 | 77.0 ± 7.9 | 61.6–92.4 | 0.065 |
| III. HD score | HD_cto | −15.7 | 63.6 | 74.3 | 72.6 ± 6.0 | 60.8–84.4 | .001ᵇ | −1.3 | 50 | 86.5 | 69.3 ± 6.4 | 56.7–81.9 | .004ᵇ |
| III. HD score | HD_ffo | −23.2 | 86.2 | 55.3 | 72.0 ± 4.5 | 63.3–80.8 | .000ᵇ | −4.2 | 75.9 | 56 | 67.7 ± 5.2 | 57.6–77.8 | .003ᵇ |
| IV. SCA6 score | SCA6 | −8.5 | 100 | 91.1 | 97.4 ± 1.2 | 94.9–99.8 | .000ᵇ | −1.8 | 91.7 | 88.6 | 94.1 ± 2.1 | 90.0–98.2 | .000ᵇ |
| V. PPA_Sv score | PPA_Sv | −13.7 | 88.9 | 97.5 | 94.6 ± 3.9 | 0.0–100.0 | .000ᵇ | −7 | 88.9 | 91.9 | 94.7 ± 2.7 | 89.4–100.0 | .000ᵇ |
| VI. PPA_NFv score | PPA_NFv | −10.3 | 100 | 89.8 | 97.3 ± 2.3 | 0.0–100.0 | .001ᵇ | −3.8 | 100 | 86.1 | 95.3 ± 3.0 | 0.0–100.0 | .002ᵇ |
| VII. PPA_Lv score | PPA_Lv | −13.3 | 72.7 | 74.8 | 78.7 ± 6.1 | 66.7–90.6 | .001ᵇ | −6.5 | 81.8 | 71.1 | 78.6 ± 5.6 | 67.7–89.5 | .002ᵇ |
| VIII. PPA_U score | PPA_U | −10.6 | 50 | 92.2 | 66.0 ± 16.6 | 33.3–98.6 | 0.276 | −0.6 | 50 | 98.2 | 71.1 ± 14.3 | 43.0–99.1 | 0.15 |

ᵃ The asymptotic significance is less than 0.05.
ᵇ The asymptotic significance is less than 0.01.
Fig. 5
Test images with the highest discriminant score (upper two rows) and the lowest discriminant score (lower two rows). A: Ventricular enlargement was prominent in the NC participant with the lowest NC score. B: The AD participant with the highest AD score showed prominent atrophy in the medial temporal area (yellow arrows), which was not seen in the AD participant with the lowest AD score. C: The HD participant with the highest HD score (HD_es) showed prominent atrophy in the basal ganglia (yellow arrows), which was not seen in the HD_ffo participant with the lowest HD score. D: The SCA6 participant with the highest SCA6 score showed prominent atrophy in the cerebellum (yellow arrows). Cerebellar atrophy was found only in the upper half of the cerebellum in the SCA6 participant with the lowest SCA6 score. E: The PPA_Sv participant with the highest PPA_Sv score showed prominent atrophy in the anterior part of the left temporal lobe (yellow arrows), which was only mildly seen in the PPA_Sv participant with the lowest PPA_Sv score. F: PPA_NFv participant with the highest PPA_NFv score showed prominent atrophy in the left perisylvian areas (yellow arrows), which was only mildly seen in the PPA_NFv participant with the lowest PPA_NFv score. G: The PPA_Lv participant with the highest PPA_Lv score showed prominent atrophy in the left parieto-temporal area (yellow arrows), which was only mildly seen in the PPA_Lv participant with the lowest PPA_Lv score. H: The PPA_U participant with the highest PPA_U score showed only mild ventricular enlargement. However, prominent atrophy in the anterior part of the temporal area (yellow arrows), similar to that in the PPA_Sv, was seen in the PPA_U participant with the lowest PPA_U score. Images are in radiological convention. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
The AD scores of the AD and MCI groups were significantly higher than those of the non-AD non-MCI group (p = 1.6 × 10⁻⁹ and 4.0 × 10⁻²). The AD scores of the MCI_c group tended to be higher than those of the other groups, but did not reach statistical significance (p = 1.4 × 10⁻¹). After removing the effects of age and gender, the AD scores were still significantly higher in the AD group (p = 1.1 × 10⁻⁷), but not in the MCI and MCI_c groups (p = 2.0 × 10⁻¹ and 1.8 × 10⁻¹). The AUC indicated that the ability of the AD score to correctly discriminate between the AD or MCI group and the non-AD non-MCI group was significant. In the AD group, the significance still remained after removing the effects of age and gender, but not in the MCI group (Table 2, II). Medial temporal atrophy, which is typically seen in AD patients, was not apparent on AD images with a low AD score (Fig. 5B). There were significant correlations between the AD score and MMSE, the ADAS, the CDR-rating, and the CDR-sum of box scores, but not between the AD score and GDS. After removing the effects of age and gender, the AD score still correlated with the MMSE, the ADAS, the CDR-rating, and the CDR-sum of box scores (Table 3, I).
Table 3
Correlations between discriminant scores and clinical scales. The first n, r, and p (2-tailed) columns are with the effect of age and gender; the second set is without the effect of age and gender.

| Score / clinical scale | n (with) | r (with) | p, 2-tailed (with) | n (without) | r (without) | p, 2-tailed (without) |
|---|---|---|---|---|---|---|
| I. AD score | | | | | | |
| MMSE | 36 | −.363a | .030a | 36 | −.478b | .003b |
| ADAS | 36 | 0.228 | 0.013a | 36 | 0.322 | 0.009b |
| CDR-rating | 36 | .447b | .006b | 36 | .388a | .020a |
| CDR-sum | 36 | .483b | .003b | 36 | .458b | .005b |
| GDS | 36 | 0.158 | 0.358 | 36 | 0.103 | 0.548 |
| II. HD score | | | | | | |
| MoCA | 20 | −0.102 | 0.668 | 20 | 0.066 | 0.781 |
| III. SCA6 score | | | | | | |
| SARA | 12 | .745b | .005b | 12 | 0.519 | 0.083 |
| IV. PPA_Sv score | | | | | | |
| WAB AQ | 15 | −0.2 | 0.475 | 15 | −0.411 | 0.128 |
| WAB fluency | 15 | −0.261 | 0.347 | 15 | −0.504 | 0.055 |
| WAB sequential command | 15 | −0.041 | 0.884 | 15 | −0.202 | 0.471 |
| WAB repetition | 15 | 0.247 | 0.374 | 15 | 0.036 | 0.898 |
| V. PPA_NFv score | | | | | | |
| WAB AQ | 15 | 0.161 | 0.576 | 15 | 0.375 | 0.168 |
| WAB fluency | 15 | 0.05 | 0.859 | 15 | 0.061 | 0.829 |
| WAB sequential command | 15 | 0.174 | 0.535 | 15 | 0.225 | 0.421 |
| WAB repetition | 15 | 0.502 | 0.057 | 15 | 0.564 | 0.029a |
| VI. PPA_Lv score | | | | | | |
| WAB AQ | 15 | −0.286 | 0.302 | 15 | −0.475 | 0.074 |
| WAB fluency | 15 | −0.006 | 0.984 | 15 | −0.426 | 0.113 |
| WAB sequential command | 15 | −0.128 | 0.649 | 15 | −0.358 | 0.191 |
| WAB repetition | 15 | −0.338 | 0.218 | 15 | −0.382 | 0.16 |
| VII. PPA_U score | | | | | | |
| WAB AQ | 15 | 0.186 | 0.508 | 15 | 0.268 | 0.334 |
| WAB fluency | 15 | 0.233 | 0.402 | 15 | 0.46 | 0.085 |
| WAB sequential command | 15 | 0.119 | 0.672 | 15 | 0.252 | 0.365 |
| WAB repetition | 15 | −0.255 | 0.36 | 15 | −0.073 | 0.797 |

a Correlation is significant at the 0.05 level (two-tailed).
b Correlation is significant at the 0.01 level (two-tailed).
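The "without effect of age and gender" columns of Table 3 can be read as correlations computed after age and gender have been adjusted for. This section does not state how that adjustment was computed, so the following is only a minimal sketch of one common approach, assuming ordinary least-squares residualization: the discriminant score and the clinical scale are each regressed on age and gender, and the Pearson correlation of the two residual series is taken. All function names and the toy values are hypothetical and are not taken from the study.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residuals(y, covariates):
    """Residuals of y after least-squares regression on intercept + covariates."""
    rows = [[1.0] + [float(c) for c in cov] for cov in covariates]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, y)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjusted_correlation(score, scale, covariates):
    """Correlation between score and scale after removing covariate effects.

    Assumption: 'removing the effects of age and gender' is modeled here as
    linear residualization; the study's actual procedure may differ.
    """
    return pearson(residuals(score, covariates), residuals(scale, covariates))


# Toy, made-up data: (age, gender coded 0/1) per participant.
covs = [(60, 0), (62, 1), (64, 0), (66, 1), (68, 0), (70, 1), (72, 0), (74, 1)]
noise = [1, -1, -1, 1, 1, -1, -1, 1]
toy_score = [0.5 * age + e for (age, _), e in zip(covs, noise)]
toy_scale = [30 - 0.3 * age - 2 * e for (age, _), e in zip(covs, noise)]
r_raw = pearson(toy_score, toy_scale)
r_adj = adjusted_correlation(toy_score, toy_scale, covs)
```

In the toy data both variables depend on age, so the raw and adjusted correlations differ; this is the kind of gap visible between the two halves of Table 3.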
The HD scores of the HD groups (HD_es, HD_cto, and HD_ffo) were significantly higher than those of the non-HD group (p = 0.9 × 10⁻², 2.3 × 10⁻⁴, and 2.2 × 10⁻⁴, respectively). After removing the effects of age and gender, HD scores were still higher in the HD_cto and HD_ffo groups (p = .003 and .002), but the tendency toward higher HD scores for the HD_es group did not reach statistical significance (p = .096). The AUC indicated that the ability of the HD score to correctly discriminate between the HD group (HD_es, HD_cto, and HD_ffo) and the non-HD group was significant. This significance remained after removing the effects of age and gender, except in HD_es, which fell slightly short of statistical significance (Table 2, III). Atrophy in the striatum, which is typically seen in HD patients, was not apparent in HD images with a low HD score (Fig. 5C). The HD score did not correlate with the MoCA score (Table 3, II).
The SCA6 scores of the SCA6 group were significantly higher than those of the non-SCA6 group (p = 3.1 × 10⁻¹⁵). After removing the effects of age and gender, the SCA6 score was still significantly higher in the SCA6 group (p = 1.2 × 10⁻⁸). The AUC indicated that the ability of the SCA6 score to correctly discriminate between the SCA6 group and the non-SCA6 group was significant. The significance remained after removing the effects of age and gender (Table 2, IV). Atrophy in the cerebellum, which is typically seen in SCA6 patients, was only seen in the upper half of the cerebellum in SCA6 images with a low SCA6 score (Fig. 5D). The SCA6 score, with the effects of age and gender, was correlated with the SARA score, but the correlation did not reach statistical significance after removing the effects of age and gender (Table 3, III).
The PPA_Sv score of the PPA_Sv group was significantly higher than that of the non-PPA_Sv group (p = .001). The PPA_NFv score of the PPA_NFv group was significantly higher than that of the non-PPA_NFv group (p = 2.0 × 10⁻⁷). The PPA_Lv score of the PPA_Lv group was significantly higher than that of the non-PPA_Lv group (p = .001). The PPA_U group had a tendency toward higher PPA_U scores than those of the non-PPA_U group, but this did not reach statistical significance (p = 0.162). After removing the effects of age and gender, these four PPA scores were all significantly higher in the PPA groups (p = .001, 4.1 × 10⁻⁵, .006, and .019). The AUC indicated that the ability of the PPA score to correctly discriminate between the three PPA groups (PPA_Sv, PPA_NFv, and PPA_Lv) and the non-PPA group was significant. The significance remained after removing the effects of age and gender. However, the discrimination of the PPA_U group from the non-PPA_U group was not significant, either with or without the effects of age and gender (Table 2, V–VIII). Typical anatomical features, such as atrophy in the left fronto-temporal area (PPA_Sv), atrophy in the left frontal operculum (PPA_NFv), and atrophy in the left temporo-parietal area (PPA_Lv), were not apparent in PPA_Sv, PPA_NFv, and PPA_Lv images with low PPA scores (Fig. 5E–G). The WAB repetition scores correlated with the PPA_NFv scores only after removing the effects of age and gender (Table 3, IV–VII), but a significant correlation was not identified between the WAB AQ score and any of the PPA scores.
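The AUC values referred to in the results above have a simple interpretation: the AUC is the probability that a randomly chosen image from the disease group receives a higher discriminant score than a randomly chosen image from the comparison group, with ties counted as one half. The sketch below illustrates that definition only; the function name and the scores are hypothetical and are not data from this study.

```python
def auc(disease_scores, comparison_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts the fraction of (disease, comparison) pairs in which the disease
    image has the higher discriminant score; ties contribute 0.5.
    """
    wins = 0.0
    for d in disease_scores:
        for c in comparison_scores:
            if d > c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(disease_scores) * len(comparison_scores))


# Made-up discriminant scores for illustration.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

An AUC of 0.5 corresponds to chance-level discrimination, and 1.0 to perfect separation of the two groups.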
Discussion
GAIA employs mismatches between a target image and the reference atlas to extract anatomical features. The most striking aging effect was found in the periventricular area, probably due to ventricular enlargement, as previously reported (Juva et al., 1993; Wang et al., 2013). The effect of gender is also in agreement with the results of past studies (Chen et al., 2007; Coffey et al., 1998; Thambisetty et al., 2010).
Rank order was used to quantify the intensity profile. For T1-weighted images, the intensity of the cerebrospinal fluid is always lower than that of gray and white matter, and the white matter intensity is always higher than that of gray matter. The comparison of NC scores among five different protocols indicated the robustness of the GAIA-based approach against protocol variability.
The feature vectors extracted from the training dataset agreed with known pathological hallmarks. The medial temporal lobe and the parietal lobe were negatively weighted in AD, the basal ganglia were positively weighted in HD, the cerebellum was negatively weighted in SCA6, the left temporal area was negatively weighted in PPA_Sv, the left frontal operculum and the insula were negatively weighted in PPA_NFv, and the left parieto-temporal area was negatively weighted in PPA_Lv, regardless of the effects of age and gender. Note that with GAIA, the rank of the areas with cortical atrophy decreases because of the inclusion of the dark cerebrospinal fluid signal, and the lenticular nuclei with atrophy were ranked higher because of the inclusion of the surrounding bright white matter signal.
Several features were observed in GAIA-based image scoring. First, the discriminant scores indicated how close the target image was to the typical anatomical feature of the disease. As indicated in Fig. 5, the discriminant scores were not suitable for detecting diseases in their early stage with only subtle anatomical alterations, or diseases with atypical anatomical features. 
Second, AD, SCA6, PPA_Sv, and PPA_NFv were well discriminated from each other, which was expected from previous publications (Dolek et al., 2012; Laakso et al., 1998; Marigliano et al., in press). Congruent with the past studies that used morphometry (Xu et al., 2000), the AD score had limited power to separate MCI and MCI_c groups from non-AD, non-MCI groups. The AD, SCA6, and PPA_NFv scores correlated with functional scales, similar to the correlations between hippocampal volume and cognitive scales (Arlt et al., 2013; Troyer et al., 2012), between cerebellar volume and ataxia scales (Eichler et al., 2011; Jacobi et al., 2012; Jung et al., 2012), and between regional volumes and WAB subsets (Amici et al., 2007). This indicated that GAIA-based feature recognition is comparable to that based on morphometry. Third, the disease separation was generally better when the effects of age and gender were accounted for, probably because the age of the AD, MCI, MCI_c, and PPA groups was higher than that of the SCA6 and HD groups. Last, the performance of the discriminant scores was not satisfactory for the disease categories that included various histopathological diagnoses, or those with an atypical phenotype. MCI includes early AD and MCI without AD pathology (Albert, 2011). The histopathological diagnosis of PPA_Lv is usually AD (Kirshner, 2012; Rabinovici et al., 2008), which might partially explain the relatively high PPA_Lv score in AD and MCI_c, but the clinical phenotype is different from that of common AD. PPA_U is, by definition, a mixture of unclassified cases of PPA, which lacks common anatomical features.
While the GAIA was intended to be used as a tool for anatomical feature recognition, the natural extension is an automated image-based diagnosis. For such a diagnostic application, the GAIA needs to give discriminant scores with sufficiently high sensitivity and specificity for the diagnosis of individual patients. 
The ROC analysis demonstrated substantially high sensitivity and specificity for AD, HD_es, SCA6, PPA_Sv and PPA_NFv, suggesting the potential for a diagnostic application. However, given that there are patients with less typical or atypical anatomical features (Fig. 5), GAIA alone might be insufficient for the clinical evaluation. One possibility for a future clinical application is a probabilistic evaluation of a single patient based on anatomical feature similarity. Namely, GAIA could be used to sort stored clinical cases with anatomical features similar to a target image, to calculate the probability of a given clinical condition, such as diagnosis, prognosis, or responsiveness to treatment. Anatomical features extracted by GAIA could also be combined with other clinical information, such as age, gender, symptoms, medical history, risk factors, results of physical examinations, and other neurological evaluations, to simulate physicians' decision-making. Since the effectiveness of combining image and non-image information to form a classification of AD and MCI has been demonstrated (Zhang et al., 2011), the GAIA might be a promising tool to extend the application of multimodal classification to a cohort that consists of multiple diseases and conditions. The exploration of the applicability of GAIA to clinical diagnosis support will be an important future direction.
In this study, GAIA was based on linear transformation, which does not require computationally intensive non-linear transformation. It is possible to combine GAIA with non-linear transformation. As the nonlinearity of the transformation increases, the accuracy of atlas-based structural definition also increases. However, the transformation results become highly sensitive to intensity abnormalities, potentially leading to unpredictable outcomes. The combination of GAIA and nonlinear transformation and the effect of the degree of nonlinearity are, thus, important directions for future research. 
The GAIA found characteristic anatomical features for each disease category, which have been previously reported by morphometric studies. Note that conventional morphometry studies are based on manual delineation of pre-selected structures, or voxel-based analyses, which lead to voxel-based patterns specific to each disease on a study-specific (customized) template, while GAIA applies a single generic atlas and simple linear transformation for all disease models, making it an ideal tool for CBIR of a large clinical database.
This study has limitations. In this proof-of-concept study, only neurodegenerative diseases with well-known neuroanatomical features were included. To test the applicability of GAIA as a tool for CBIR, rigorous evaluation must be performed on much larger datasets, as well as on diseases with no or subtle neuroanatomical features (e.g., psychiatric diseases), diseases with substantial alterations in image intensity (e.g., stroke), diseases with space-occupying lesions (e.g., tumor), and patients with multiple diseases. Care should be taken in interpreting the discriminant scores, since the scores are purely based on imaging features and do not necessarily reflect the histopathological or etiological background. Further investigations into the applicability of this method to other image modalities or to multimodal image recognition will be essential.
In summary, a method to convert T1-weighted brain MRIs to feature vectors, based on local atlas–image segmentation disagreement, can accurately categorize test images with typical disease-related anatomical features.
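The retrieval scenario described in the discussion, in which stored cases are sorted by anatomical similarity to a target image and the probability of a clinical condition is estimated from the closest matches, can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: cosine similarity, the top-k vote, the function names, and the three-element toy feature vectors are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, cases, k=3):
    """Return the k stored (label, vector) cases most similar to the query."""
    ranked = sorted(
        cases,
        key=lambda case: cosine_similarity(query, case[1]),
        reverse=True,
    )
    return ranked[:k]


def label_probabilities(neighbors):
    """Fraction of retrieved cases carrying each clinical label."""
    counts = Counter(label for label, _ in neighbors)
    return {label: n / len(neighbors) for label, n in counts.items()}


# Made-up database of past cases: (diagnosis label, toy feature vector).
database = [
    ("AD", [0.9, 0.1, 0.0]),
    ("AD", [0.8, 0.2, 0.1]),
    ("HD", [0.1, 0.9, 0.0]),
    ("SCA6", [0.0, 0.1, 0.9]),
    ("HD", [0.2, 0.8, 0.1]),
]
query_vector = [0.85, 0.15, 0.05]
top_cases = retrieve(query_vector, database, k=3)
probabilities = label_probabilities(top_cases)
```

In practice the label could equally be a prognosis or a treatment response, and the neighbor vote could be weighted by similarity; those refinements are left out to keep the sketch short.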
Conflict of interest
Dr. van Zijl is a paid lecturer for Philips Medical Systems and is the inventor of technology that is licensed to Philips. This arrangement has been approved by the Johns Hopkins University in accordance with its conflict of interest policies.