
Clinically Available Software for Automatic Brain Volumetry: Comparisons of Volume Measurements and Validation of Intermethod Reliability.

Ji Young Lee¹, Se Won Oh², Mi Sun Chung³, Ji Eun Park⁴, Yeonsil Moon⁵, Hong Jun Jeon⁶, Won Jin Moon⁷.

Abstract

OBJECTIVE: To compare two clinically available MR volumetry software packages, NeuroQuant® (NQ) and Inbrain® (IB), and to examine the inter-method reliability and differences between them.
MATERIALS AND METHODS: This study included 172 subjects (age range, 55-88 years; mean age, 71.2 years), comprising 45 normal healthy subjects, 85 patients with mild cognitive impairment, and 42 patients with Alzheimer's disease. Magnetic resonance imaging scans were analyzed with IB and NQ. Mean differences were compared with the paired t test. Inter-method reliability was evaluated with Pearson's correlation coefficients and intraclass correlation coefficients (ICCs). Effect sizes were also obtained to document the standardized mean differences.
RESULTS: The paired t test showed significant volume differences between the two methods in most regions except the amygdala. Nevertheless, inter-method measurements between IB and NQ showed good to excellent reliability (0.72 < r < 0.96, 0.83 < ICC < 0.98) except for the pallidum, which showed poor reliability (left: r = 0.03, ICC = 0.06; right: r = -0.05, ICC = -0.09). Effect sizes for the volume differences were large in most regions (range, 0.05 to 6.15); the effect size was largest in the pallidum and smallest in the cerebellum.
CONCLUSION: Comparisons between IB and NQ showed significantly different volume measurements with large effect sizes. However, the two packages showed good to excellent inter-method reliability in volumetric measurements for all brain regions except the pallidum. Clinicians using these commercial software packages should consider that volume measurements may differ depending on the software used.


Keywords: Alzheimer's disease; Brain volumetry; MRI; NeuroQuant®; Reliability; Softwares


Year: 2020 PMID: 33236539 PMCID: PMC7909859 DOI: 10.3348/kjr.2020.0518

Source DB: PubMed Journal: Korean J Radiol ISSN: 1229-6929 Impact factor: 3.500

INTRODUCTION

Volumetric measurements of brain atrophy have demonstrated close correlations with actual atrophy, neuropathological changes, and cognitive impairment in various neurodegenerative diseases (1-4). Hippocampal and/or medial temporal lobe atrophy has already been integrated into the diagnostic framework of Alzheimer's disease (AD) (5). Although visual assessment of brain atrophy is commonly performed in clinical practice, it suffers from high inter-observer variability and low sensitivity (6, 7). In contrast, quantitative volumetric measurement is objective, with good repeatability and reliability (8, 9). Thus, volumetric measurement of brain atrophy could serve as an imaging marker for clinical differential diagnosis and for predicting disease progression. Several freely available software packages exist for brain volume measurement: FSL (10), Voxel-Based Morphometry (11), FreeSurfer (12), and Statistical Parametric Mapping (13). However, the labor-intensive nature of these research tools limited their use in routine clinical practice until clinically available software was introduced (3). Currently, the FDA has approved several commercially available software packages for volume measurement: NeuroQuant (14-16), Neuroreader (7), and MSmetrix (17, 18). Among these, NeuroQuant® (NQ, CorTechs Labs) is the most widely used because of its fast processing time and because it provides information on the cortices of both hemispheres and on white matter volume. Moreover, it normalizes patient data to the intracranial volume (ICV) and provides a relative atrophy report compared with age-matched normative data (8, 9, 15). The most recently introduced clinically available software is Inbrain® (IB, MIDAS Information Technology Co., Ltd.), a Korean FDA-cleared software based on the FreeSurfer platform and enhanced with its own deep learning algorithm (19, 20).
While NQ provides only volume measurements of brain structures, IB provides both volume measurements and cortical thickness. In a previous study, IB classified disease status and predicted progression to AD using cortical thickness in patients with mild cognitive impairment (MCI) (20). However, its reliability has yet to be validated against established software such as NQ or FreeSurfer. Therefore, in this study, we aimed to evaluate the inter-method reliability of two commercially available software packages, IB and NQ, for brain volumetry in normal healthy subjects as well as in subjects with MCI and AD.

MATERIALS AND METHODS

This retrospective study received Institutional Review Board approval, and the requirement for written informed consent was waived in accordance with the requirements of a retrospective study.

Subjects

Table 1 shows the demographic data of the study population. A flowchart detailing subject recruitment is shown in Figure 1. As part of a clinical practice guideline development initiative by the Korean Society of Neuroradiology, this study used the imaging database of 102 patients with MCI and 51 patients with AD who visited the memory clinics and underwent brain magnetic resonance imaging (MRI) between September 2016 and December 2017. The diagnoses of MCI and AD were confirmed by two dementia specialists (one neurologist and one psychiatrist) based on the criteria of the Diagnostic and Statistical Manual of Mental Disorders (4th edition), the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association, McKhann et al. (21), and Petersen et al. (22). After excluding patients who had other forms of dementia, those younger than 55 years, and those with poor image quality, 85 patients with MCI (33 male and 52 female; age range, 57–85 years; mean age, 71.76 years) and 42 patients with AD (14 male and 28 female; age range, 67–88 years; mean age, 79.00 years) were finally included. For comparison, we searched the imaging database and identified 119 normal healthy subjects who underwent brain MRI at the health screening center during the same period. The inclusion criteria for healthy controls were age over 55 years and no clinical evidence of neurological or psychiatric symptoms on evaluation by a physician. Finally, 45 normal healthy subjects (23 male, 22 female; age range, 55–74 years; mean age, 62.76 years) were included. Patients with MCI and AD were diagnosed using neuropsychological evaluations such as the Mini-Mental State Examination, Clinical Dementia Rating, Seoul Neuropsychological Screening Battery, or Consortium to Establish a Registry for Alzheimer's Disease.

Table 1

Demographic Data of Study Population

	Characteristics	NL	MCI	AD	P
No.		45	85	42
Age (yr)		62.8 ± 5.3	71.8 ± 6.9	79 ± 4.7	< 0.001
Sex					0.22
	Female	23	52	14
	Male	22	33	28
Mini-Mental State Examination score		NA	23.99 ± 4.26	17.98 ± 4.53	< 0.001
CDR		NA	0.58 ± 0.46	0.93 ± 0.43	< 0.001

Data are mean ± SD. AD = Alzheimer's disease, CDR = Clinical Dementia Rating, MCI = mild cognitive impairment, NA = not applicable, NL = normal, SD = standard deviation

Fig. 1

Flow chart of the study population.

AD = Alzheimer's disease, MCI = mild cognitive impairment

Image Acquisition

All subjects underwent MRI on a 3T scanner (Discovery MR750; GE Healthcare). Routine brain MRI with additional T1-weighted volume images was obtained in all subjects. During the study period, subjects with MCI and AD were scanned with a T1 volume slice thickness of either 1 or 1.2 mm according to the preference of the referring physician, whereas all normal healthy subjects were scanned with a slice thickness of 1 mm. The 1.2-mm thickness was chosen on the recommendation of the NQ developers, while the 1-mm thickness was chosen on the assumption that it provides higher spatial resolution. The parameters of the sagittal T1-weighted volumetric fast spoiled gradient-recalled echo sequence were as follows: repetition time/echo time (TR/TE), 8.224/3.192 ms; section thickness, 1 mm; matrix, 256 × 256; flip angle, 12°; field of view (FOV), 250 × 250 mm; or TR/TE, 5.692/2.36 ms; section thickness, 1.2 mm; matrix, 192 × 192; flip angle, 8°; FOV, 240 × 240 mm. Overall, three-dimensional (3D) T1 images with slice thicknesses of 1 and 1.2 mm were obtained in 96 and 76 subjects, respectively. All normal healthy subjects were scanned with a slice thickness of 1 mm; of the 85 patients with MCI, 42 and 43 were scanned with slice thicknesses of 1 and 1.2 mm, respectively; and of the 42 patients with AD, 9 and 33 were scanned with slice thicknesses of 1 and 1.2 mm, respectively.
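As a worked check of the acquisition details above, the sketch below derives the nominal voxel dimensions of the two protocols and re-tallies the subjects scanned with each. It assumes that the in-plane pixel size equals the FOV divided by the listed matrix, i.e., that the listed matrix is the reconstructed image matrix; the text does not state whether interpolation was applied, so these are nominal values only. The dictionary and function names are illustrative.

```python
# Nominal geometry of the two sagittal 3D T1 protocols described in the text.
# Assumption: in-plane pixel size = FOV / matrix (no interpolation considered).
PROTOCOLS = {
    "1.0 mm": {"fov_mm": 250.0, "matrix": 256, "slice_mm": 1.0},
    "1.2 mm": {"fov_mm": 240.0, "matrix": 192, "slice_mm": 1.2},
}

# Number of subjects scanned with each protocol, per group (from the text).
SUBJECTS = {
    "NL": {"1.0 mm": 45, "1.2 mm": 0},
    "MCI": {"1.0 mm": 42, "1.2 mm": 43},
    "AD": {"1.0 mm": 9, "1.2 mm": 33},
}


def voxel_size_mm(protocol: dict) -> tuple:
    """Return nominal (x, y, z) voxel dimensions in mm."""
    in_plane = protocol["fov_mm"] / protocol["matrix"]
    return in_plane, in_plane, protocol["slice_mm"]


def voxel_volume_mm3(protocol: dict) -> float:
    """Nominal voxel volume in cubic millimetres."""
    x, y, z = voxel_size_mm(protocol)
    return x * y * z


def totals_by_protocol(subjects: dict) -> dict:
    """Sum subject counts over all groups for each protocol."""
    totals: dict = {}
    for counts in subjects.values():
        for name, n in counts.items():
            totals[name] = totals.get(name, 0) + n
    return totals


if __name__ == "__main__":
    for name, protocol in PROTOCOLS.items():
        size = [round(v, 3) for v in voxel_size_mm(protocol)]
        print(name, size, round(voxel_volume_mm3(protocol), 3))
    print(totals_by_protocol(SUBJECTS))
```

Under this assumption, the 1-mm protocol has voxels of about 0.98 × 0.98 × 1.0 mm (about 0.95 mm³) and the 1.2-mm protocol has voxels of 1.25 × 1.25 × 1.2 mm (1.875 mm³), so the 1.2-mm protocol is coarser in-plane as well as through-plane. The tally reproduces the 96 and 76 subjects reported above.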

Magnetic Resonance Volumetry

Sagittal T1-weighted volumetric images were analyzed with the automated segmentation methods. The brain MRI data for each subject were uploaded to each tool's server. Processing in NQ proceeded as follows: removal of the scalp, skull, and meninges; inflation of the brain to a spherical shape; mapping of the spherical brain to a common spherical space shared with the Talairach atlas coordinates; identification of segmented brain regions; and deflation of the brain to its original shape. Each regional brain volume was corrected for head size by normalizing to the ICV, and the result was expressed as a percentage of the ICV. This value was compared with data from healthy controls stored in the NQ database, and a brain region was classified as abnormally small if it fell below the fifth normative percentile. In addition, the tool provided an age-related atrophy report containing the absolute volume and the relative volume (as a percentage of the ICV) of the hippocampi, lateral ventricles, and inferior lateral ventricles. The processing time was 10–15 minutes. IB (https://www.inbrain.co.kr/index.html) uses a segmentation method similar to that of FreeSurfer, which combines volume- and surface-based segmentation with a template-driven approach (12, 23). Processing in IB proceeded as follows: analysis failure prediction, intensity normalization, brain extraction, registration to the volume and surface atlases, white matter segmentation, white matter surface smoothing, topology correction, pial and white matter surface optimization, comparison of the output with the database, and analysis quality management. Finally, regional brain structure volumes and cortical thickness were obtained.
A deep learning algorithm was applied at multiple steps, including analysis failure prediction, brain extraction, white matter segmentation, and analysis quality management, to improve the quality of the segmentation results. The processing took about 4 hours.
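The ICV normalization and percentile flag described for NQ can be illustrated with a minimal sketch. The function names, the toy normative sample, and the simple empirical percentile rank are assumptions for illustration only; NQ's actual normative database and percentile model (for example, any age or sex matching) are not described in the text and are not reproduced here.

```python
from typing import Sequence


def percent_of_icv(volume_ml: float, icv_ml: float) -> float:
    """Regional volume expressed as a percentage of intracranial volume (ICV)."""
    if icv_ml <= 0:
        raise ValueError("ICV must be positive")
    return 100.0 * volume_ml / icv_ml


def percentile_rank(value: float, normative: Sequence[float]) -> float:
    """Empirical percentile rank: percentage of normative values strictly below `value`.

    Assumption: a plain empirical rank stands in for the tool's (undisclosed) model.
    """
    if not normative:
        raise ValueError("normative sample is empty")
    below = sum(1 for v in normative if v < value)
    return 100.0 * below / len(normative)


def is_abnormally_small(volume_ml: float, icv_ml: float,
                        normative_pct_icv: Sequence[float], cutoff: float = 5.0) -> bool:
    """Flag a region whose ICV-normalized volume ranks below `cutoff` percent of controls."""
    pct = percent_of_icv(volume_ml, icv_ml)
    return percentile_rank(pct, normative_pct_icv) < cutoff


# Hypothetical control values (% of ICV) for one region; not real normative data.
controls = [0.30 + 0.01 * i for i in range(10)]
print(is_abnormally_small(2.8, 1400.0, controls))  # 0.2% of ICV -> True
print(is_abnormally_small(5.6, 1400.0, controls))  # 0.4% of ICV -> False
```

Normalizing to ICV before ranking is what makes subjects with different head sizes comparable; the rank itself is only as good as the normative sample it is computed against.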

Statistical Analysis

The paired t test was used to compare mean volumes between IB and NQ, and Pearson's correlation was used to assess the relationship between the two methods. Inter-method agreement between the two software packages was assessed using the intraclass correlation coefficient (ICC). Effect sizes were calculated to evaluate the standardized mean difference between the two software packages as follows: effect size = mean difference / pooled standard deviation (15, 24). Effect sizes of 0.2, 0.5, and 0.8 were considered small, medium, and large, respectively (15, 24). Comparisons were also performed separately for the normal healthy subjects and for the MCI and AD subgroups, and a subgroup analysis was performed according to slice thickness. Statistical analyses were performed using commercially available software (SPSS, version 24 for Windows; IBM Corp.).
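The effect-size formula above can be sketched as follows, together with the paired t statistic. The text does not spell out how the pooled standard deviation was computed; assuming the equal-n form sqrt((SD1² + SD2²) / 2), the group summaries in Table 2 reproduce the effect sizes reported in Table 3 (e.g., 6.13 for the left pallidum, 2.25 for the left putamen, and 0.05 for the left cerebellum). The function names are illustrative, and the "negligible" label below 0.2 is an added convention not used in the text.

```python
import math
import statistics
from typing import Sequence


def pooled_sd(sd1: float, sd2: float) -> float:
    """Pooled SD for two equal-sized sets of measurements (same subjects, two methods)."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2) / 2.0)


def effect_size(mean1: float, sd1: float, mean2: float, sd2: float) -> float:
    """Standardized mean difference: |mean1 - mean2| / pooled SD."""
    return abs(mean1 - mean2) / pooled_sd(sd1, sd2)


def magnitude(d: float) -> str:
    """Label an effect size using the 0.2 / 0.5 / 0.8 cutoffs."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"  # below the 'small' cutoff (label added for this sketch)


def paired_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Paired t statistic: mean within-subject difference divided by its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Left pallidum, all subjects (Table 2): NQ 0.50 ± 0.17 mL vs. IB 1.81 ± 0.25 mL
d = effect_size(0.50, 0.17, 1.81, 0.25)
print(round(d, 2), magnitude(d))  # 6.13 large

# Hypothetical paired volumes (mL) with a small but very consistent offset
nq = [50.0, 55.0, 60.0, 65.0, 70.0]
ib = [v + off for v, off in zip(nq, [0.9, 1.1, 0.9, 1.1, 1.0])]
t = paired_t(ib, nq)
d_toy = effect_size(statistics.mean(ib), statistics.stdev(ib),
                    statistics.mean(nq), statistics.stdev(nq))
print(round(t, 1), round(d_toy, 2), magnitude(d_toy))
```

The toy example shows how a paired t test and an effect size can disagree: the t statistic divides by the spread of the within-subject differences, so a small but consistent offset yields a large t (about 22 here, with 4 degrees of freedom), whereas the effect size divides by the between-subject spread and stays below 0.2.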

RESULTS

The statistical results are shown in Table 2. There were significant mean differences between IB and NQ for most regions. The mean volumes of the cortical gray matter, cerebral gray matter, cerebral white matter, putamen, and thalamus were larger with NQ than with IB, whereas the mean volumes of the caudate, pallidum, hippocampus, and cerebellum were larger with IB than with NQ. The differences were especially marked in the putamen and pallidum (p < 0.001). The putamen volume was larger with NQ than with IB (5.50 ± 0.79 mL vs. 3.90 ± 0.62 mL in the left hemisphere; 5.26 ± 0.72 mL vs. 3.99 ± 0.62 mL in the right hemisphere), whereas the pallidum volume was smaller with NQ than with IB (0.50 ± 0.17 mL vs. 1.81 ± 0.25 mL in the left hemisphere; 0.47 ± 0.17 mL vs. 1.82 ± 0.26 mL in the right hemisphere). Figure 2 shows the color-coded segmentation images of NQ and IB. In these representative images, the pallidum appears smaller and the putamen appears larger with NQ than with IB.

Table 2

Comparisons of Volume Obtained from NeuroQuant® and Inbrain® in All Subjects and Each Subgroup

		Left Hemisphere			Right Hemisphere
		NeuroQuant®	Inbrain®	P	NeuroQuant®	Inbrain®	P
		Mean ± SD	Mean ± SD	P	Mean ± SD	Mean ± SD	P
Cortical gray matter		215.69 ± 27.21	200.09 ± 25.04	< 0.001	217.99 ± 27.25	200.11 ± 24.79	< 0.001
	NL	238.12 ± 26.13	218.64 ± 21.59	< 0.001	241.73 ± 25.82	218.05 ± 21.38	< 0.001
	MCI	213.79 ± 22.21	201.12 ± 20.27	< 0.001	215.78 ± 22.18	201.50 ± 20.15	< 0.001
	AD	195.51 ± 19.18	177.80 ± 19.58	< 0.001	197.04 ± 17.17	178.06 ± 19.60	< 0.001
Caudate		2.80 ± 0.70	3.28 ± 0.54	< 0.001	2.83 ± 0.70	3.29 ± 0.53	< 0.001
	NL	2.62 ± 0.49	3.31 ± 0.38	< 0.001	2.82 ± 0.55	3.38 ± 0.42	< 0.001
	MCI	2.84 ± 0.70	3.34 ± 0.60	< 0.001	2.87 ± 0.77	3.33 ± 0.56	< 0.001
	AD	2.90 ± 0.85	3.13 ± 0.57	< 0.001	2.73 ± 0.72	3.09 ± 0.52	< 0.001
Putamen		5.50 ± 0.79	3.90 ± 0.62	< 0.001	5.26 ± 0.72	3.99 ± 0.62	< 0.001
	NL	5.88 ± 0.56	4.31 ± 0.50	< 0.001	5.68 ± 0.62	4.46 ± 0.43	< 0.001
	MCI	5.55 ± 0.76	3.93 ± 0.50	< 0.001	5.25 ± 0.66	3.98 ± 0.52	< 0.001
	AD	5.01 ± 0.82	3.39 ± 0.60	< 0.001	4.83 ± 0.69	3.48 ± 0.60	< 0.001
Pallidum		0.50 ± 0.17	1.81 ± 0.25	< 0.001	0.47 ± 0.17	1.82 ± 0.26	< 0.001
	NL	0.66 ± 0.14	1.89 ± 0.25	< 0.001	0.62 ± 0.12	1.90 ± 0.23	< 0.001
	MCI	0.48 ± 0.14	1.80 ± 0.24	< 0.001	0.44 ± 0.14	1.79 ± 0.25	< 0.001
	AD	0.35 ± 0.11	1.74 ± 0.24	< 0.001	0.35 ± 0.12	1.79 ± 0.29	< 0.001
Thalamus		7.05 ± 0.79	6.33 ± 0.75	< 0.001	7.05 ± 0.84	6.10 ± 0.76	< 0.001
	NL	7.31 ± 0.92	6.78 ± 0.75	< 0.001	7.27 ± 0.96	6.44 ± 0.74	< 0.001
	MCI	7.04 ± 0.69	6.26 ± 0.61	< 0.001	7.05 ± 0.78	6.05 ± 0.61	< 0.001
	AD	6.80 ± 0.75	5.98 ± 0.79	< 0.001	6.80 ± 0.79	5.81 ± 0.91	< 0.001
Amygdala		1.44 ± 0.28	1.20 ± 0.25	< 0.001	1.37 ± 0.26	1.39 ± 0.29	0.06†
	NL	1.66 ± 0.25	1.40 ± 0.20	< 0.001	1.55 ± 0.23	1.62 ± 0.24	< 0.001
	MCI	1.43 ± 0.25	1.20 ± 0.22	< 0.001	1.38 ± 0.22	1.39 ± 0.25	0.33†
	AD	1.22 ± 0.19	0.98 ± 0.19	< 0.001	1.17 ± 0.22	1.14 ± 0.22	0.14†
Hippocampus		3.31 ± 0.75	3.55 ± 0.58	< 0.001	3.35 ± 0.74	3.60 ± 0.62	< 0.001
	NL	4.09 ± 0.56	4.07 ± 0.47	0.65†	4.10 ± 0.52	4.17 ± 0.50	0.02
	MCI	3.28 ± 0.54	3.53 ± 0.44	< 0.001	3.31 ± 0.57	3.57 ± 0.49	< 0.001
	AD	2.55 ± 0.34	3.02 ± 0.40	< 0.001	2.61 ± 0.37	3.02 ± 0.37	< 0.001
Cerebellum		61.41 ± 6.16	61.70 ± 6.23	0.04	59.70 ± 5.70	60.43 ± 6.14	< 0.001
	NL	64.67 ± 6.96	64.46 ± 6.91	0.46†	62.18 ± 6.31	62.88 ± 6.85	0.02
	MCI	60.82 ± 5.70	61.13 ± 6.04	0.14†	59.22 ± 5.25	60.00 ± 5.73	< 0.001
	AD	59.09 ± 4.66	59.90 ± 4.85	0.003	58.00 ± 5.09	58.65 ± 5.40	0.03
Cerebral gray matter*		431.48 ± 62.85	400.11 ± 49.62	< 0.001
	NL	480.04 ± 51.63	436.69 ± 42.85	< 0.001
	MCI	429.72 ± 44.02	402.61 ± 40.14	< 0.001
	AD	383.03 ± 68.14	355.86 ± 38.83	0.003
Cerebral white matter*		447.96 ± 54.46	438.53 ± 67.79	0.008
	NL	466.59 ± 53.23	447.59 ± 48.86	< 0.001
	MCI	450.32 ± 53.97	432.90 ± 59.37	< 0.001
	AD	423.21 ± 48.38	440.23 ± 96.04	0.12†

The units are mL. *Cerebral gray matter and white matter volumes are reported for both hemispheres combined. †Statistically not significant.

Fig. 2

Representative color-coded segmentation images of NQ and IB.

An axial T1-weighted image (A) is shown at basal ganglia level with color-coded images of NQ (B) and IB (C). In these representative images, the pallidum in NQ appears smaller (B) compared to that in IB (C), while the putamen in NQ appears larger (B) than that in IB (C). The pallidum is indicated with asterisks. IB = Inbrain®, NQ = NeuroQuant®

Pearson's correlation analysis showed significant, strong linear correlations between IB and NQ (0.72 < r < 0.96) for all regions except the pallidum (Table 3). The ICCs also indicated good to excellent agreement between IB and NQ (0.83 < ICC < 0.98), again except for the pallidum (Table 3). There was no significant correlation between the two software packages for the pallidum (left: r = 0.03, p = 0.67; ICC = 0.06, p = 0.34; right: r = −0.05, p = 0.52; ICC = −0.09, p = 0.72).
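The correlation and agreement statistics above can be sketched from first principles. The text does not state which ICC form (model, type, single or average measures) was selected in SPSS, so this sketch assumes and computes three common two-way forms from the Shrout and Fleiss mean squares; the toy data are hypothetical. It also illustrates why the form matters: a consistency ICC ignores a constant offset between methods, whereas an absolute-agreement ICC is lowered by it.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired measurements."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_two_methods(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-way ANOVA ICCs for n subjects each measured by k = 2 methods.

    Returns (ICC(3,1) consistency, ICC(3,k) consistency, ICC(2,1) absolute agreement),
    using the Shrout and Fleiss mean-square formulas.
    """
    n, k = len(x), 2
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]  # per-subject means
    col_means = [sum(x) / n, sum(y) / n]             # per-method means
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    icc_c1 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    icc_ck = (ms_rows - ms_err) / ms_rows
    icc_a1 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    return icc_c1, icc_ck, icc_a1


# Hypothetical volumes (mL): method B reads exactly 1 mL higher than method A.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 3.0, 4.0, 5.0, 6.0]
print(pearson_r(a, b))        # 1.0
print(icc_two_methods(a, b))  # consistency ICCs 1.0, absolute agreement about 0.83
```

In the toy data, Pearson's r and both consistency ICCs equal 1.0 while ICC(2,1) is about 0.83, because only the absolute-agreement form penalizes the constant 1-mL offset. Because the ICC form behind Table 3 is not stated, this is offered only as one possible way high ICCs can coexist with significant paired differences.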

Table 3

Results of Pearson's Correlation, ICC and Effect Size in All Subjects and Each Subgroup

		Left Hemisphere					Right Hemisphere
		r	P	ICC	P	Effect Size	r	P	ICC	P	Effect Size
Cortical gray matter		0.89	< 0.001	0.94	< 0.001	0.6	0.9	< 0.001	0.95	< 0.001	0.69
	NL	0.92	< 0.001	0.95	< 0.001	0.81	0.91	< 0.001	0.94	< 0.001	0.99
	MCI	0.83	< 0.001	0.91	< 0.001	0.6	0.87	< 0.001	0.93	< 0.001	0.67
	AD	0.8	< 0.001	0.89	< 0.001	0.91	0.78	< 0.001	0.87	< 0.001	1.03
Caudate		0.77	< 0.001	0.85	< 0.001	0.77	0.81	< 0.001	0.87	< 0.001	0.74
	NL	0.78	< 0.001	0.86	< 0.001	1.57	0.75	< 0.001	0.84	< 0.001	1.14
	MCI	0.83	< 0.001	0.9	< 0.001	0.77	0.82	< 0.001	0.88	< 0.001	0.68
	AD	0.76	< 0.001	0.82	< 0.001	0.32	0.83	< 0.001	0.88	< 0.001	0.57
Putamen		0.72	< 0.001	0.83	< 0.001	2.25	0.77	< 0.001	0.87	< 0.001	1.89
	NL	0.64	< 0.001	0.78	< 0.001	2.96	0.71	< 0.001	0.8	< 0.001	2.29
	MCI	0.65	< 0.001	0.75	< 0.001	2.52	0.72	< 0.001	0.82	< 0.001	2.14
	AD	0.7	< 0.001	0.8	< 0.001	2.25	0.7	< 0.001	0.82	< 0.001	2.09
Pallidum		0.03	0.67*	0.06	0.34*	6.13	-0.05	0.52*	-0.09	0.72*	6.15
	NL	0.03	0.85*	0.05	0.43*	6.07	0.03	0.83*	0.06	0.43*	6.98
	MCI	-0.28	0.01	-0.66	0.99*	6.72	-0.25	0.02	-0.56	0.98*	6.66
	AD	-0.07	0.67*	-0.11	0.63*	7.45	-0.27	0.09*	-0.47	0.89*	6.49
Thalamus		0.77	< 0.001	0.87	< 0.001	0.93	0.82	< 0.001	0.9	< 0.001	1.19
	NL	0.8	< 0.001	0.88	< 0.001	0.63	0.89	< 0.001	0.92	< 0.001	0.97
	MCI	0.73	< 0.001	0.84	< 0.001	1.2	0.82	< 0.001	0.89	< 0.001	1.43
	AD	0.75	< 0.001	0.86	< 0.001	1.06	0.77	< 0.001	0.86	< 0.001	1.16
Amygdala		0.87	< 0.001	0.93	< 0.001	0.9	0.91	< 0.001	0.95	< 0.001	0.07
	NL	0.78	< 0.001	0.86	< 0.001	1.15	0.9	< 0.001	0.95	< 0.001	0.3
	MCI	0.85	< 0.001	0.92	< 0.001	0.98	0.87	< 0.001	0.93	< 0.001	0.04
	AD	0.76	< 0.001	0.86	< 0.001	1.26	0.89	< 0.001	0.94	< 0.001	0.14
Hippocampus		0.89	< 0.001	0.93	< 0.001	0.36	0.94	< 0.001	0.96	< 0.001	0.37
	NL	0.85	< 0.001	0.91	< 0.001	0.04	0.92	< 0.001	0.96	< 0.001	0.14
	MCI	0.84	< 0.001	0.9	< 0.001	0.51	0.91	< 0.001	0.95	< 0.001	0.49
	AD	0.62	< 0.001	0.76	< 0.001	1.27	0.82	< 0.001	0.9	< 0.001	1.11
Cerebellum		0.96	< 0.001	0.98	< 0.001	0.05	0.95	< 0.001	0.97	< 0.001	0.12
	NL	0.96	< 0.001	0.98	< 0.001	0.03	0.96	< 0.001	0.98	< 0.001	0.11
	MCI	0.95	< 0.001	0.97	< 0.001	0.05	0.94	< 0.001	0.97	< 0.001	0.14
	AD	0.94	< 0.001	0.97	< 0.001	0.17	0.94	< 0.001	0.97	< 0.001	0.12
Cerebral gray matter		0.84	< 0.001	0.9	< 0.001	0.55
	NL	0.92	< 0.001	0.95	< 0.001	0.91
	MCI	0.86	< 0.001	0.92	< 0.001	0.64
	AD	0.59	< 0.001	0.67	< 0.001	0.49
Cerebral white matter		0.74	< 0.001	0.84	< 0.001	0.15
	NL	0.93	< 0.001	0.96	< 0.001	0.37
	MCI	0.79	< 0.001	0.88	< 0.001	0.31
	AD	0.73	< 0.001	0.74	< 0.001	0.22

*Statistically not significant. ICC = intraclass correlation coefficient

With regard to effect size, the putamen and pallidum showed the largest effect sizes among the brain regions: 2.25 and 1.89 for the left and right putamen, respectively, and 6.13 and 6.15 for the left and right pallidum, respectively. When subjects were divided into the normal control, MCI, and AD groups, similar results were obtained in each group (Tables 2, 3). In the normal control group, most regions showed significant mean differences between the two methods (p < 0.001), but there were no significant differences in the left hippocampus and left cerebellum (p = 0.65 and 0.46, respectively). In the MCI group, the left cerebellum (p = 0.14) and right amygdala (p = 0.33) did not show significant differences. In the AD group, the right amygdala (p = 0.14) and cerebral white matter (p = 0.12) did not show significant mean differences. Pearson's correlation analysis showed significant moderate to strong linear correlations in each subgroup except for the pallidum: 0.64 < r < 0.96 in the normal control group, 0.65 < r < 0.95 in the MCI group, and 0.59 < r < 0.94 in the AD group. The ICCs were also good to excellent: 0.78 < ICC < 0.98 in the normal control group, 0.75 < ICC < 0.97 in the MCI group, and 0.67 < ICC < 0.97 in the AD group. Effect sizes varied widely, from 0.03 in the cerebellum to 6.98 in the pallidum in the normal control group and from 0.05 in the cerebellum to 6.72 in the pallidum in the MCI group. Likewise, in patients with AD, the effect size was smallest in the cerebellum (0.12) and largest in the pallidum (7.45). The subgroup analysis by slice thickness of the 3D T1 images was performed only in patients with MCI and AD, because all healthy subjects were scanned with a slice thickness of 1 mm. The results of this analysis were similar (Supplementary Tables 1, 2).
There were significant mean differences between the two software packages in most regions (p < 0.001), except in the right amygdala, cerebellum, and cerebral white matter. Within each software package, volumes obtained from images with a slice thickness of 1 mm differed from those obtained with a slice thickness of 1.2 mm. For the cortical gray matter, caudate, hippocampus, and cerebral gray matter, 1-mm images yielded larger volumes than 1.2-mm images with both NQ and IB, whereas for the thalamus and cerebral white matter, 1.2-mm images yielded larger volumes than 1-mm images with both NQ and IB. Inter-method correlations were higher for 1-mm images than for 1.2-mm images in most regions.

DISCUSSION

In this validation study of inter-method reliability, we found good to excellent correlation and reliability between IB and NQ for most brain regions. However, there were significant volume differences between IB and NQ. Cortical gray matter volume showed a significant mean difference between the two methods with medium effect sizes. Furthermore, the differences observed for some deep gray matter structures, especially the pallidum, were not negligible and could be an obstacle to the clinical application of volumetry. Since the introduction of NQ in 2009 (8), many studies have investigated its clinical use (6-8, 15, 17, 25, 26), and several have compared NQ with FreeSurfer, MSmetrix, or Neuroreader (7, 15, 16, 25, 26). The volumetric results of NQ were comparable to those of FreeSurfer, a reference standard for volumetry (15, 26). Although the segmentation method of NQ is reportedly similar to that of FreeSurfer, NQ uses a different atlas, an independent code base, and separate methods for intensity normalization and gradient distortion correction to accommodate scanner-specific acquisition-level differences (15). Instead of providing the thickness of each gyrus as FreeSurfer does, NQ provides only the volumes of the cortex, white matter, and deep gray matter, thereby achieving a faster processing time. Since the introduction of IB in 2017 (19), there have been only a few clinical studies on IB (20, 27), and its reliability has not yet been validated. In contrast to NQ, IB uses the same registration atlas as FreeSurfer, and its segmentation method is almost identical to that of FreeSurfer (20). IB has added or modified several steps of the FreeSurfer process, such as analysis failure prediction, brain extraction, white matter segmentation, and analysis quality management, by applying deep learning to reduce error rates.
In this study, we found that volume measurements could differ depending on the software used. There were significant mean differences between the two methods in most regions except the amygdala, and subcortical gray matter regions showed large effect sizes, with the largest in the pallidum. The pallidum volume was smaller, and the putamen volume larger, with NQ than with IB. Given that IB uses the FreeSurfer platform, our finding is broadly in line with a previous observation of differences in pallidum volume measurements between NQ and FreeSurfer (15). This difference has been reported to arise from the fundamental problem that the pallidum and white matter have similar intensities on T1-weighted images, which makes it difficult to segment the pallidum accurately from white matter (15, 28). Beyond this, we speculate that the differences between the two software packages are mainly attributable to their different pipelines, including the registration atlas. The atlas is the basis for segmentation: NQ uses a probabilistic atlas different from that of FreeSurfer (14), whereas IB uses the same atlas as FreeSurfer. The potential effect of the atlas on volumetric results has been demonstrated in a study of hippocampal segmentation using different atlases, in which accuracy differed depending on the atlas used (29). Our findings suggest that at least some deep gray matter structures, such as the putamen and pallidum, remain susceptible to the choice of atlas despite overall good reliability. Previous studies demonstrated cortical atrophy of the medial temporal, temporoparietal, posterior cingulate, and precuneus regions in patients with AD (30, 31); however, no study has focused on volume measurements of the basal ganglia.
However, decreases in the volume of subcortical gray matter, including the putamen and pallidum, have been reported in patients with AD (32, 33). This volume loss in the basal ganglia could be explained by neuronal loss caused by amyloid deposition and neurofibrillary tangles (33). In addition, because iron deposition (34) and tau pathology (35) might affect the basal ganglia in patients with AD, changes in basal ganglia volume cannot be neglected. Accordingly, software users should be aware that basal ganglia volumes could differ markedly depending on the software used. Hippocampal atrophy has been regarded as an imaging marker of AD (36). Hippocampal volume was significantly larger with IB than with NQ in patients with MCI and AD, and the correlation between IB and NQ tended to be lower in patients with AD than in normal controls and patients with MCI. The difference in hippocampal volume between patients with MCI and AD was larger with NQ than with IB. In this study, we also used effect sizes in the statistical analysis. Effect sizes are standardized measures of the size of the mean difference between study groups (24), calculated as the mean difference between two groups divided by the standard deviation. Therefore, for the same mean difference, the standard deviation determines the effect size. The paired t test showed significant differences in the amygdala, cerebellum, and cerebral white matter, but these regions had small effect sizes; that is, the standardized mean difference between the two methods was small even though it was statistically significant. In contrast, other deep gray matter structures, such as the pallidum, putamen, and thalamus, showed large effect sizes, implying that the two software packages did not produce equivalent results for these structures.
Thus, we believe that results for these smaller structures should be interpreted carefully, because the interpretation could differ depending on the software used for volume measurement. This study has several limitations. First, we used two different magnetic resonance (MR) sequences for volumetric measurements and did not assess repeatability on the same scanner across these sequences. Indeed, with IB, the cerebral white matter volume in patients with AD was greater than that in patients with MCI. This is difficult to explain, but it might be related to the scan protocol, as a slice thickness of 1.2 mm was used more frequently in the AD group than in the other two groups. Because cerebral white matter tended to show greater volumes on scans with a 1.2-mm slice thickness than on those with a 1-mm slice thickness in both NQ and IB, different MR scanning parameters might affect volume measurements differently. Second, we did not investigate reproducibility on a different MR scanner. Volume measurements could differ on a different scanner, because brain volumetry is influenced by several technical factors, including MRI field strength and scanner model, as well as post-processing-related issues (23). Further studies are warranted for complete methodological validation. Finally, we did not compare the NQ and IB results with those of FreeSurfer or manual segmentation, the reference standard; therefore, we could not determine which software produces results closer to those of FreeSurfer or manual segmentation. In conclusion, we compared two commercial software packages for automated volume measurement of brain regions. Overall, they showed good to excellent correlation; however, they showed significant mean differences and large effect sizes.
Therefore, clinicians and researchers should take the type of software used into consideration when interpreting the results of volume measurements obtained using commercial software.

36 in total

1. Mild cognitive impairment: clinical characterization and outcome.

Authors: R C Petersen; G E Smith; S C Waring; R J Ivnik; E G Tangalos; E Kokmen
Journal: Arch Neurol Date: 1999-03

2. Individual subject classification for Alzheimer's disease based on incremental learning using a spatial frequency representation of cortical thickness data.

Authors: Youngsang Cho; Joon-Kyung Seong; Yong Jeong; Sung Yong Shin
Journal: Neuroimage Date: 2011-10-08

3. Comparing Two Processing Pipelines to Measure Subcortical and Cortical Volumes in Patients with and without Mild Traumatic Brain Injury.

Authors: Matthew W Reid; Nathan P Hannemann; Gerald E York; John L Ritter; Jonathan A Kini; Jeffrey D Lewis; Paul M Sherman; Carmen S Velez; Ann Marie Drennon; Jacob D Bolzenius; David F Tate
Journal: J Neuroimaging Date: 2017-02-14

4. NeuroQuant® and NeuroGage® reveal effects of traumatic brain injury on brain volume.

Authors: David E Ross; John Seabaugh; Leah Cooper; Jan Seabaugh
Journal: Brain Inj Date: 2018-06-28

5. Man Versus Machine Part 2: Comparison of Radiologists' Interpretations and NeuroQuant Measures of Brain Asymmetry and Progressive Atrophy in Patients With Traumatic Brain Injury.

Authors: David E Ross; Alfred L Ochs; Megan E DeSmit; Jan M Seabaugh; Michael D Havranek
Journal: J Neuropsychiatry Clin Neurosci Date: 2015

6. Regional atrophy of the insular cortex is associated with neuropsychiatric symptoms in Alzheimer's disease patients.

Authors: Yeonsil Moon; Won-Jin Moon; Heejin Kim; Seol-Heui Han
Journal: Eur Neurol Date: 2014-01-25

7. Predictive Utility of Marketed Volumetric Software Tools in Subjects at Risk for Alzheimer Disease: Do Regions Outside the Hippocampus Matter?

Authors: T P Tanpitukpongse; M A Mazurowski; J Ikhena; J R Petrella
Journal: AJNR Am J Neuroradiol Date: 2017-01-05

8. MRI correlates of neurofibrillary tangle pathology at autopsy: a voxel-based morphometry study.

Authors: J L Whitwell; K A Josephs; M E Murray; K Kantarci; S A Przybelski; S D Weigand; P Vemuri; M L Senjem; J E Parisi; D S Knopman; B F Boeve; R C Petersen; D W Dickson; C R Jack
Journal: Neurology Date: 2008-09-02

9. Fully-automated quantification of regional brain volumes for improved detection of focal atrophy in Alzheimer disease.

Authors: J B Brewer; S Magda; C Airriess; M E Smith
Journal: AJNR Am J Neuroradiol Date: 2008-12-26

10. FSL.

Authors: Mark Jenkinson; Christian F Beckmann; Timothy E J Behrens; Mark W Woolrich; Stephen M Smith
Journal: Neuroimage Date: 2011-09-16
