
Sociodemographic data and APOE-ε4 augmentation for MRI-based detection of amnestic mild cognitive impairment using deep learning systems.

Obioma Pelka1,2, Christoph M Friedrich1,3, Felix Nensa2, Christoph Mönninghoff4, Louise Bloch1,3, Karl-Heinz Jöckel3, Sara Schramm3, Sarah Sanchez Hoffmann5, Angela Winkler5, Christian Weimar5, Martha Jokisch5.   

Abstract

Detection and diagnosis of early and subclinical stages of Alzheimer's Disease (AD) play an essential role in the implementation of intervention and prevention strategies. Neuroimaging techniques predominantly provide insight into anatomic structure changes associated with AD. Deep learning methods have been extensively applied to create and evaluate models capable of differentiating between cognitively unimpaired individuals, patients with Mild Cognitive Impairment (MCI), and patients with AD dementia. Several published approaches apply information fusion techniques, which combine several input sources in the medical domain and thereby yield broader and enriched representations. The aim of this paper is to fuse sociodemographic data such as age, marital status, education and gender, and genetic data (presence of an apolipoprotein E (APOE)-ε4 allele) with Magnetic Resonance Imaging (MRI) scans. This enables enriched multi-modal features that adequately represent the MRI scan visually and are adopted for creating and modeling classification systems capable of detecting amnestic MCI (aMCI). To fully utilize the potential of deep convolutional neural networks, two extra color layers denoting contrast-intensified and blurred image adaptations are virtually augmented to each MRI scan, completing the Red-Green-Blue (RGB) color channels. Deep convolutional activation features (DeCAF) are extracted from the average pooling layer of the deep learning system Inception_v3. These features from the fused MRI scans are used as the visual representation for the Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) classification model. The proposed approach is evaluated on a sub-study containing 120 participants (aMCI = 61 and cognitively unimpaired = 59) of the Heinz Nixdorf Recall (HNR) Study with a baseline model accuracy of 76%. Further evaluation was conducted on the ADNI Phase 1 dataset with 624 participants (aMCI = 397 and cognitively unimpaired = 227) with a baseline model accuracy of 66.27%. Experimental results show that the proposed approach achieves 90% accuracy and 0.90 F1-Score at classification of aMCI vs. cognitively unimpaired participants on the HNR Study dataset, and 77% accuracy and 0.83 F1-Score on the ADNI dataset.


Year:  2020        PMID: 32976486      PMCID: PMC7518632          DOI: 10.1371/journal.pone.0236868

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Alzheimer’s disease (AD) is a progressive neurodegenerative disease that causes behavioral changes and deterioration of memory and other cognitive domains [1]. Because there is no causal treatment for AD dementia, identifying early stages of the disease and preclinical markers will help to implement intervention and prevention strategies [2]. Mild cognitive impairment (MCI) is a clinical entity that describes the stage between the cognitive changes of normal aging and dementia [3, 4]. The amnestic MCI (aMCI) subtype has a high probability of progressing to AD dementia [2]. Robust and reliable systems for early aMCI classification that aid doctors in identifying high-risk individuals are needed. Individuals with aMCI have a higher risk of developing AD dementia, but some individuals also revert to normal or remain stable without reaching the AD dementia stage [5]. Thus, it would be beneficial to implement an additional classification system for progression that does not need any invasive biomarker assessments (like beta-amyloid or tau in the cerebrospinal fluid). Magnetic resonance imaging (MRI) techniques offer a broad visual representation that can be adopted for this purpose. For effective image classification, the selection and combination of adequate features and labeled training data are crucial: the more knowledge that is present, the more enriched the available image representations become. Several research explorations that use multi-modal representations to sufficiently represent biomedical and medical images achieve higher prediction accuracies. In Codella et al. [6], automated medical image modality recognition was achieved by fusing visual and text information. Valavanis et al. [7] and Pelka et al. [8] adopted a combination of visual representation with text information extracted from captions to classify and predict the image modality at the ImageCLEF2016 Medical Task [9]. Deep learning techniques [10] have improved prediction accuracies in object detection [11], speech recognition [12] and medical imaging [13, 14]. These positive results are attributable to the large natural scene datasets available, as they provide adequate feature representations for transfer learning [15]. However, a major concern in the medical domain is the insufficient number of large datasets such as the ChestX-Ray8 database [16] and the OpenfMRI project [17]. This is due to the fact that detailed annotation of medical images is time-consuming, prone to errors and restricted by data protection rules. Therefore, image classification tasks in the medical domain are challenging with regard to sufficient and efficient feature selection. On the other hand, there are several input sources in the medical domain. These can be fused together, such as combining MRI with patient clinical information or several imaging modalities, as well as radiology reports with images, to obtain better medical image understanding. There is no restriction on the usage of the fused data, as it can be applied to several challenging medical tasks.

Related work

Successful research work regarding the prediction of the conversion from mild cognitive impairment to Alzheimer’s disease has been reported using multimodal features from several input sources. Spasov et al. [18] proposed MCI-to-AD conversion and AD vs. healthy controls detection using deep learning techniques to combine structural MRIs with demographic, neuropsychological and APOE-ε4 data. The proposed model is based on dual learning and an ad hoc layer for 3D separable convolutions. Generative methods that detect occurring patterns were applied by Yang et al. [19] to characterize Alzheimer’s Disease using image and categorical genetic features, based on supervised topic modeling. In Lee et al. [20], a multimodal recurrent neural network using demographic information, longitudinal cognitive performance and cross-sectional neuroimaging biomarkers was adopted for MCI-to-AD conversion prediction. The objective was sequential data classification, and several Gated Recurrent Units (GRU) were trained for each data modality and adopted for MCI prediction. Several prior works, Zhang et al. [21], Liu et al. [22], Samper-González et al. [23] and Huang et al. [24], apply machine learning and neuroimaging to distinguish between cognitively unimpaired controls and patients with MCI and AD. A traditional way is to first extract features like volume, cortical thickness or gray matter volume from neuroimaging and then perform feature selection, as well as dimension and noise reduction. Finally, a feature-based classification is conducted. This approach has been presented in multiple research works, including Bloch et al. [25] and Sørensen et al. [26]. Choosing the best feature combination for several medical tasks can be tedious, time-consuming and challenging. As automatic feature extraction from 3D images often comes with high computational effort, Liu et al. [22] and Huang et al. [24] use deep learning methods to extract information directly from the MRI scans, which improves the overall classification results. Multimodal approaches have been shown to obtain encouraging results in other domains as well, such as biomedical image analysis. These attempts combine image and text representations into one vector, with which the image classifiers are trained. Adopting this method, the connections in low-level features can be exploited. For the ImageCLEF 2015 Medical Tasks [27], late fusion methods were applied in Pelka et al. [28] to fuse decision values from a multiclass linear kernel Support Vector Machine (SVM) [29] and Random Forest [30] classifiers to predict the modality of subfigures extracted from the PubMed Central (PMC) Open Access Subset [31]. In [32], automatically generated semantic information from Unified Medical Language System (UMLS) [33] concepts was combined with Bag-of-Keypoints representations [34] computed with Dense SIFT (dSIFT) [35] features and applied for predicting image modality, body region examined, orientation of the image and biological system investigated. This approach was further explored in Pelka et al. [36] by using deep convolutional activation features (DeCAF) [37] to obtain an optimized medical image body region classification. Inspired by this and earlier work on body region detection in Pelka et al. [38], we propose an approach that brands encoded sociodemographic and genetic data onto MRI 2D slices to obtain an enhanced image representation while reducing computational load.
Due to the limited number of annotated medical images available, we propose to learn augmented deep convolutional activation features in a recurrent neural network framework for an optimized aMCI classification. These features are extracted with the Inception_v3 [39] deep learning model, thereby exploring the potential of Transfer Learning [15] from pre-trained ImageNet models. Promising results using deep convolutional activation features (DeCAF) have been presented in various works, including Gong et al. [40], Yosinski et al. [41], Sinha et al. [42] and Razavian et al. [43]. Our contributions in this paper are:

A novel fusion method for branding MRI scans with patient sociodemographic and genetic data.
Enhancement of MRI scans by augmenting two extra color layers.
Transfer Learning utilized for creating deep convolutional activation features.
Long short-term memory (LSTM) based Recurrent Neural Networks (RNN) utilized for the modeling approaches.
Evaluation on a sub-sample of the Heinz Nixdorf Recall (HNR; Risk Factors, Evaluation of Coronary Calcium and Lifestyle) Study with 1.5T MRI scans.
Further evaluation on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) Phase 1 dataset with 1.5T MRI scans.

Materials and methods

Study population

The proposed data fusion techniques were evaluated using a sub-sample of 61 participants with aMCI and 59 cognitively unimpaired controls derived from the Heinz Nixdorf Recall (HNR) Study [44], and further evaluated on the open-accessible ADNI Phase 1 dataset distributed by the Alzheimer’s Disease Neuroimaging Initiative (https://adni.loni.usc.edu) [45]. The HNR Study is a population-based prospective cohort study with subjects randomly selected from mandatory lists of residence. Its major aim is to evaluate the predictive value of coronary artery calcification, measured using electron-beam computed tomography, for myocardial infarction and cardiac death in comparison to other cardiovascular risk factors. Details of the study methods have been previously described [44]. An ethics statement for the use of the HNR study population from the IRB of University Hospital Essen, Essen, Germany, dated 2009-10-23 and 2012-06-06, issued to Prof. Dr. C. Weimar (registration number: 06-3116), is available, and the study was approved by the university review board. All participants provided written informed consent. Briefly, 4814 participants 45 to 75 years of age were enrolled between 2000 and 2003 in the Ruhr area in Germany. Five years after baseline (2005-2008, n = 4,145), the first follow-up of the HNR Study was conducted and included a short cognitive assessment (for details see [46]). This cognitive assessment was evaluated and validated in a sub-study [46]. The longitudinal sub-study comprises a more comprehensive neuropsychological assessment (see below), a neurological exam conducted by a certified neurologist, and MRI volumetric data [1, 46]. Participants with dementia (n = 7), severe depression (ADAS depression subscale score > 4, n = 13), Parkinson disease (n = 5), mental retardation (n = 2), severe alcohol consumption (for women: > 20 g/day; for men: > 40 g/day, n = 2), known brain cancer (n = 1), severe problems with the German language (foreign persons, n = 9) and severe sensory impairment (n = 2) leading to invalid cognitive testing were excluded from the sub-study. ADNI is a consortium of several medical centers and universities in the United States and Canada, and was established to create biomarker procedures and standardized imaging techniques in subjects with MCI, subjects with AD, and normal subjects [45]. Led by Principal Investigator Michael W. Weiner, MD, ADNI was launched in 2003 as a public-private partnership. One of the major aims of this initiative was to develop an accessible data repository that contains serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessments. Using this repository, modeling approaches capable of measuring the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD) can be implemented and evaluated. For up-to-date information, see http://www.adni-info.org; details about the ethics statement of the ADNI study population can be found at https://adni.loni.usc.edu. All enrolled subjects in the ADNI Phase 1 dataset were between 55 and 90 years of age, could speak either Spanish or English, and were classified as normal controls, subjects with MCI or subjects with mild AD [45]. Participants with no memory complaints were classified as normal subjects. The Clinical Dementia Rating (CDR) for normal, MCI, and AD subjects was 0, 0.5 and > 0.5, respectively [45].
All classified subjects had a study partner with over 10 hours of contact per week, adequate visual and auditory perceptiveness, at least 6 years of education or a similar work biography, general good health, and a geriatric depression score of >= 4 [45]. Female participants had to be either two years past childbearing potential or sterile. Further information on subject selection is detailed in Petersen et al. [45].

Evaluation of cognitive status and aMCI diagnosis

Detailed information on the neuropsychological assessment to identify participants with MCI in the HNR Study has been described in Dlugaj et al. [47]. Briefly, the standardized neuropsychological examination was conducted by a neuropsychologist using the following tests:

The Alzheimer’s Disease Assessment Scale (ADAS) [48]
Number Connection Test from the NAI [49]
Verbal Fluency Test [50] (two subtests with a formal lexical category and two subtests with a semantic category)
Instrumental Activities of Daily Living scale to assess disability [49]

Using these tests, the following areas of neuropsychological functioning were covered: verbal memory, orientation/praxis, information processing speed, executive functions and verbal abilities. A cognitive domain was rated as impaired if the performance was more than 1 standard deviation (SD) below the age-adjusted mean. Because the MCI due to AD criteria by Albert et al. [51] were not yet published when the sub-study started, the Winblad et al. [52] MCI criteria were used to diagnose aMCI. The 61 aMCI participants had to meet all of the following aMCI criteria:

cognitive impairment in the verbal memory domain
subjective cognitive decline
normal functional abilities and daily activities
no dementia diagnosis

The final decision about the aMCI diagnosis was made by consensus agreement between the examining neurologist and neuropsychologist, taking into account the medical history related to cognitive functioning, the duration of such symptoms, the history of other medical illnesses and the current treatment for each participant. The diagnosis aMCI is equivalent to the diagnosis of MCI due to AD without biomarker information, representing the core clinical criteria as proposed by Albert et al. [51]. Participants who did not show cognitive impairment in any domain were considered cognitively unimpaired and categorized as “Controls”. For the ADNI Phase 1 population, the assessment is detailed in Petersen et al. [45].

Covariates

To fuse several input sources in the medical domain, MRI was combined with the following sociodemographic characteristics: age, gender, education and marital status. Education was classified according to the International Standard Classification of Education (ISCED) as total years of formal education, combining school and vocational training [53]. For the HNR Study, the continuous education variable was grouped into three categories, with the highest category comprising 14 and more years of education and the lowest category 10 and fewer years. Participants were asked about their marital status using the following categories: married, widowed, divorced and single. For the ADNI Phase 1 dataset, education-adjusted cutoff scores on the Logical Memory II subscale of the Wechsler Memory Scale-Revised [54] were used for subject classification. For normal subjects, the cutoff scores were >= 9 for 16 years of education, >= 5 for 8 to 15 years of education and >= 3 for 0 to 7 years. For subjects with MCI and subjects with AD, the cutoff scores were <= 8 for 16 years of education, <= 4 for 8 to 15 years of education and <= 2 for 0 to 7 years. Furthermore, genetic information was adopted for the proposed fusion approach prior to training the classification model. The apolipoprotein E (APOE)-ε4 allele is the main genetic risk factor for sporadic AD [55]. For the HNR Study, Cardio-MetaboChip BeadArrays were used for genotyping two single-nucleotide polymorphisms (rs7412 and rs429358) to discriminate between the APOE alleles ε2, ε3, and ε4. Participants defined as APOE-ε4 positive had at least one ε4 allele (ε2/ε4, ε3/ε4, ε4/ε4). All other participants were defined as APOE-ε4 negative. Information regarding APOE-ε4 in the ADNI Phase 1 dataset is detailed in Petersen et al. [45].
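To make the grouping concrete, the following Python sketch (a hypothetical helper, not taken from the study's implementation) maps raw covariate values onto the per-variable categories described above, which are later used for marker selection; the bin boundaries follow Tables 1 and 2.

def group_participant(age, gender, education_years, marital_status, apoe4_positive):
    """Map raw covariates to 0-based group indices per variable (assumed encoding)."""
    age_bins = [(46, 55), (56, 65), (66, 75), (76, 90)]  # age bins as in Tables 1 and 2
    age_group = next(i for i, (lo, hi) in enumerate(age_bins) if lo <= age <= hi)
    gender_group = 0 if gender == "female" else 1
    if education_years <= 10:          # lowest education category
        education_group = 0
    elif education_years <= 13:
        education_group = 1
    else:                              # 14 and more years of education
        education_group = 2
    marital_group = ["married", "widowed", "divorced", "single"].index(marital_status)
    apoe_group = 0 if apoe4_positive else 1
    return {"age": age_group, "gender": gender_group, "education": education_group,
            "marital": marital_group, "apoe4": apoe_group}

For example, group_participant(67, "female", 12, "married", True) yields {"age": 2, "gender": 0, "education": 1, "marital": 0, "apoe4": 0}.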

Dataset

Table 1 shows the distribution of the sociodemographic data variables age, gender, education and marital status and the genetic data variable APOE-ε4 genotype (defined as “Participant Data”) for aMCI and cognitively unimpaired controls in the applied sub-sample. All participants were scanned with a single 1.5T MR scanner (Magnetom Avanto, Siemens Healthcare, Erlangen) with a 60 cm bore diameter, 200 T/m/s slew rate, 160 cm length and 40/40/45 mT/m gradient strength [1].
Table 1

HNR Study explorative analysis.

Summary statistics computed on the sub-study of the HNR Study adopted for the proposed fusion approach. Participant Data denotes the sociodemographic data (age, marital status, education, gender) and genetic data (APOE-ε4). The total number of participants is n = 120.

Participant Data | aMCI | Controls | Sum
age (yr) 46–55 | 1 (50.00%) | 1 (50.00%) | 2 (1.67%)
age (yr) 56–65 | 15 (60.00%) | 10 (40.00%) | 25 (20.83%)
age (yr) 66–75 | 31 (48.44%) | 33 (51.56%) | 64 (53.33%)
age (yr) 76–85 | 14 (48.28%) | 15 (51.72%) | 29 (24.17%)
gender Female | 24 (50.00%) | 24 (50.00%) | 48 (40.00%)
gender Male | 37 (51.39%) | 35 (48.61%) | 72 (60.00%)
education (yr) <= 10 | 15 (55.56%) | 12 (44.44%) | 27 (22.50%)
education (yr) 11–13 | 37 (53.62%) | 32 (46.38%) | 69 (57.50%)
education (yr) >= 14 | 9 (37.50%) | 15 (62.50%) | 24 (20.00%)
marital status Married | 49 (51.04%) | 47 (48.96%) | 96 (80.00%)
marital status Widowed | 8 (57.14%) | 7 (42.86%) | 14 (11.67%)
marital status Divorced | 4 (57.14%) | 3 (42.86%) | 7 (5.83%)
marital status Single | 0 (0%) | 2 (100%) | 2 (1.67%)
APOE-ε4 Positive | 21 (63.64%) | 12 (36.36%) | 33 (27.50%)
APOE-ε4 Negative | 40 (45.98%) | 47 (54.02%) | 87 (72.50%)
Sum | 61 (50.83%) | 59 (49.17%) | 120 (100.00%)

aMCI = Amnestic Mild cognitive impairment

Controls = Cognitively unimpaired

To additionally examine the proposed approach, the open-accessible ADNI Phase 1 dataset distributed by the Alzheimer’s Disease Neuroimaging Initiative (https://adni.loni.usc.edu) [45] was used; the initiative and its aims are described in the Study population section above. Table 2 shows the distribution of the participant data variables used for this evaluation from the ADNI Phase 1 dataset.
Table 2

ADNI Phase 1 dataset explorative analysis.

Summary statistics computed on the ADNI Phase 1 dataset adopted for the proposed fusion approach. Participant Data denotes the sociodemographic data (age, marital status, education, gender) and genetic data (APOE-ε4). The total number of participants is n = 624.

Participant Data | aMCI | Controls | Sum
age (yr) 46–55 | 3 (100.00%) | 0 (0.00%) | 3 (0.48%)
age (yr) 56–65 | 52 (89.66%) | 6 (10.34%) | 58 (9.29%)
age (yr) 66–75 | 158 (58.52%) | 112 (41.48%) | 270 (43.27%)
age (yr) 76–90 | 184 (62.80%) | 109 (37.20%) | 293 (46.96%)
gender Female | 141 (56.40%) | 109 (43.60%) | 250 (40.06%)
gender Male | 256 (68.45%) | 118 (31.55%) | 374 (59.94%)
education (yr) <= 10 | 20 (66.67%) | 10 (33.33%) | 30 (4.81%)
education (yr) 11–13 | 79 (72.48%) | 30 (27.52%) | 109 (17.47%)
education (yr) >= 14 | 298 (61.44%) | 187 (38.56%) | 485 (77.72%)
marital status Married | 318 (67.23%) | 155 (32.77%) | 473 (75.80%)
marital status Widowed | 48 (55.17%) | 39 (44.83%) | 87 (13.94%)
marital status Divorced | 25 (59.52%) | 17 (40.48%) | 42 (6.73%)
marital status Single | 6 (27.27%) | 16 (72.73%) | 22 (3.53%)
APOE-ε4 Positive | 185 (52.56%) | 167 (47.44%) | 352 (56.41%)
APOE-ε4 Negative | 212 (77.94%) | 60 (22.06%) | 272 (43.59%)
Sum | 397 (63.62%) | 227 (36.38%) | 624 (100.00%)

aMCI = Amnestic Mild cognitive impairment

Controls = Cognitively unimpaired


Data fusion

The presented work proposes an approach to fuse sociodemographic data and APOE-ε4 with MRI scans, enabling an enriched multi-modal image representation. This is fundamental for image classification and retrieval purposes, and is not limited to computer-aided decision systems for clinical diagnoses. Positive results have been presented for 2D images in Pelka et al. [38], where automatically generated keywords were incorporated onto x-ray and biomedical images. Several branding options such as numerical, grayscale and ordinal values were explored in Pelka et al., and the binary branding option obtained the best results [38]. This approach is further investigated in this work by encoding sociodemographic data and APOE-ε4 onto 2D slices of an MRI scan for a specific clinical question; the result is denoted as “Branded”. The limitation of using 2D slices instead of the 3D MRI scans will be examined in future work, as positive results have been reported in an overview survey of deep learning techniques for MRI [56]. Fusing information from multiple input domains aims at creating consolidated representations of the participants. For each of the variables listed under sociodemographic data and APOE-ε4, possible values are grouped, yielding 2 to 4 groups per variable. To incorporate these groups onto the MRI scans, the generated markers displayed in Fig 1 are created, one per variable group.
Fig 1

Marker for branding.

Generated markers applied for fusing sociodemographic data and APOE-ε4 data with 2D slices of MRI scans. Each marker denotes the different values for clinical data variables. Participant Data denote the sociodemographic data variables (age, marital status, education, gender) and genetic data variable (APOE-ε4). The markers were randomly distributed amongst values per variable.

Finally, each 2D slice (image size [224x224]) is branded with markers denoting the participant data values listed in Fig 1. Each participant’s information is fused as [10x20] pixel markers placed from pixel position (0, 10) to (10, 150). A space of [10x5] pixels is kept between the marker positions, as shown in Fig 2. The complete implementation was done in Python and will be made available after acceptance. For the HNR Study dataset, all DICOM scans were converted to PNG files and resized to [224x224] prior to branding and image enhancement. Similarly, for the ADNI Phase 1 dataset, the NIfTI scans were converted to PNG files. The PNG files from both datasets are 8-bit.
Fig 2

Branding approach.

Proposed branding approach of fusing sociodemographic data (age, education, marital status and gender) and genetic data (APOE-ε4) with 2D slices of an MRI scan. The marker positions and sizes of each clinical data variable branded are displayed. The 2D slice was randomly selected from an MRI scan of the sub-study from the HNR Study.


Image enhancement

For image recognition tasks, convolutional neural networks trained on large datasets produce favorable results. Considering the number of images in the applied datasets, Transfer Learning with the pre-trained neural network Inception-v3 [39] was chosen. Pre-trained deep convolutional neural network models of this kind are designed to extract, among other features, color information in three separate channels (RGB) from the images [57, 58]. However, the MRI scans are gray-scale and have a single color channel with values 0,…,255. To fully utilize the capabilities of deep convolutional neural networks, two extra color layers are augmented to each MRI, completing the Red-Green-Blue (RGB) channels. Color input enhancement has helped to substantially improve prediction accuracy, from 86% to 92% for the detection of malignancy in digital mammography images [59] and by approximately 3% for structuring 2D x-rays according to imaging modality, anatomical region and biological system examined, which is applied for medical image retrieval [60]. The first extra layer was obtained using the image processing technique Contrast Limited Adaptive Histogram Equalization (CLAHE) [61]. CLAHE is a contrast enhancement method derived from Adaptive Histogram Equalization (AHE). It is designed to be broadly applicable and has demonstrated effectiveness, especially for medical images [62]. Fig 3 displays the original 2D slice of an MRI scan together with the contrast-enhanced image adaption after CLAHE was performed. The CLAHE output image was obtained using the following parameters:
Fig 3

CLAHE image preprocessing.

2D slice from an MRI scan before and after applying the Contrast Limited Adaptive Histogram Equalization (CLAHE) preprocessing method. The 2D slice was randomly selected from an MRI scan of the sub-study from the HNR Study.

Desired histogram shape: Uniform
Distribution parameter: 0.4
Number of histogram bins: 256
Contrast enhancement limit: 0.01
Range of output data: Full
Number of tiles: [8, 8]
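The paper does not name the CLAHE implementation used; as an illustration, the parameters above map onto scikit-image's equalize_adapthist as follows (a sketch under that assumption).

import numpy as np
from skimage import exposure

def clahe_layer(gray_slice):
    """Contrast-enhance a 2D uint8 slice with CLAHE ([8, 8] tiles, 256 bins, clip limit 0.01)."""
    tiles = (gray_slice.shape[0] // 8, gray_slice.shape[1] // 8)  # 8x8 tile grid
    enhanced = exposure.equalize_adapthist(gray_slice, kernel_size=tiles,
                                           clip_limit=0.01, nbins=256)
    return (enhanced * 255).astype(np.uint8)  # equalize_adapthist returns floats in [0, 1]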

The second layer was generated by applying the Non-Local Means (NL-MEANS) preprocessing method. This is a digital image denoising method based on non-local averaging of all pixels present in an image [63]. The effect of applying NL-MEANS to a randomly chosen 2D slice is shown in Fig 4.
Fig 4

NL-MEANS image preprocessing.

2D slice from a MRI scan before and after applying the Non-Local Means (NL-MEANS) preprocessing method. The 2D slice was randomly selected from an MRI scan of the sub-study from the HNR Study.

The NL-MEANS output images were obtained using the following parameters:

Filter strength: 0.05
Kernel ratio: 4
Window ratio: 4

Visual representation

For the visual representation, deep convolutional activation features (DeCAF) [37] were chosen. DeCAF features are extracted from the average pooling layer of the deep learning system Inception-v3 [39], pre-trained on ImageNet [64]. For comparison purposes, additional DeCAF features were extracted using a DenseNet-121 model [65] pre-trained in a medical context on the ChestX-Ray8 database [16]. The activation features were extracted using the neural network API Keras 2.2.0 [66]. The default values of the Inception-v3 base model were used. For the 3D MRI scans, the DeCAF visual representations were extracted 2D slice-wise with a vector size of 2048. Every second 2D slice between [8 − 165] was considered. Hence, each 3D MRI scan was represented by 80 2D slices and has a vector of 163,840 deep convolutional activation features.
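A minimal sketch of the slice-wise extraction with Keras follows; pooling='avg' exposes the 2048-dimensional average-pooling output of the ImageNet-pretrained Inception-v3, while the exact preprocessing pipeline used in the study is not stated and is an assumption here.

import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# ImageNet-pretrained Inception-v3 without the classification head;
# pooling='avg' returns the 2048-d average-pooling activations (DeCAF).
extractor = InceptionV3(weights="imagenet", include_top=False,
                        pooling="avg", input_shape=(224, 224, 3))

def decaf_features(rgb_slices):
    """rgb_slices: (80, 224, 224, 3) array -> (80, 2048) DeCAF feature matrix."""
    return extractor.predict(preprocess_input(rgb_slices.astype(np.float32)))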

Classification

As the aMCI vs. control classification model, LSTM-based RNNs were adopted. RNNs are mostly used for modeling long-range dependencies, where future events are predicted from past events [67], and have proven successful for several research topics such as medical question answering [68]. The effective characteristic of the LSTM is its ability to accumulate state information, as the information of every new input is accumulated onto previous inputs [69, 70]. As each 2D slice of an MRI scan contains dependencies between predecessor and successor slices, we chose the LSTM architecture for modeling the classifier. The applied LSTM network contains the following Keras layers (a sketch of the network follows below):

LSTM: input shape (80, 2048), output shape (None, 2048), dropout = 0.5
Dense: output shape (None, 512), activation: Sigmoid [1/(1 + exp(−x))]
Dropout: output shape (None, 512), rate = 0.5
Dense: output shape (None, 2), activation: Softmax

For the approach evaluation, three different inputs were fed into the LSTM network:

Original: DeCAF representations extracted from the original MRI scans.
Branded: DeCAF representations extracted from the branded and enhanced MRI scans.
Wide and Deep [71]: dot product of features extracted using the original MRI and the clinical data.

The HNR Study dataset consisting of 120 participants was split into a training and a test set, containing 99 participants (aMCI = 51 and controls = 48) and 21 participants (aMCI = 10 and controls = 11), respectively. Similarly, the ADNI Phase 1 dataset with 624 participants was split into a training and a test set with 561 (aMCI = 357 and controls = 204) and 63 (aMCI = 40 and controls = 23) participants, respectively. The test sets were independent and not used for training or parameter optimization. The complete workflow describing the proposed method is displayed in Fig 5.
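The layer stack listed above translates directly into a Keras model; the sketch below follows that listing, while the optimizer and loss are not stated in the text and are assumptions.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential([
    LSTM(2048, input_shape=(80, 2048), dropout=0.5),  # output shape (None, 2048)
    Dense(512, activation="sigmoid"),                  # output shape (None, 512)
    Dropout(0.5),                                      # rate 0.5
    Dense(2, activation="softmax"),                    # aMCI vs. control
])
model.compile(optimizer="adam", loss="categorical_crossentropy",  # assumed settings
              metrics=["accuracy"])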
Fig 5

Complete proposed approach.

Complete workflow of the proposed approach. Sociodemographic data and APOE-ε4 are fused with MRI scans 2D slice-wise and further enhanced by augmenting contrast intensified and blurred image adaptions as two extra layer completing the RGB channels. DeCAF representations are extracted and used as visual representations for training the aMCI vs control classification model.


Results

For the HNR Study dataset, a k = 5-fold cross validation [72] was performed by splitting the training set of 99 participants into 5 different partitions. Of these, one partition is used as the validation set (20%) and the remaining 4 partitions (80%) are used for training. This was done for each of the five partitions. For comparison purposes, evaluation metrics using both DeCAF visual representations are listed. Tables 3 and 4 show the classification rates obtained on the k-fold cross validation sets. The evaluation rates achieved on the independent test set with n = 21 participants are listed in Tables 5 and 6. The random split was done class-wise, to ensure the occurrence of both classes in the k-fold cross validation sets.
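A class-wise split of this kind corresponds to stratified k-fold cross validation; a minimal sketch with scikit-learn follows, where the placeholder arrays stand in for the per-participant DeCAF sequences and labels (the library actually used for splitting is not stated).

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.zeros((99, 80, 2048), dtype="float32")  # placeholder: DeCAF sequences per participant
y = np.array([1] * 51 + [0] * 48)              # placeholder labels: 51 aMCI, 48 controls

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X, y):
    X_train, X_val = X[train_idx], X[val_idx]  # 80% training
    y_train, y_val = y[train_idx], y[val_idx]  # 20% validation, both classes present
    # ... fit the LSTM classifier on this fold and record the evaluation metrics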
Table 3

Cross-validation prediction on HNR Study.

Prediction performance of the LSTM classification model using various image input types. The highlighted values are the best per evaluation metric. Evaluation was calculated on the k = 5-fold cross validation sets from the training set with n = 99 participants of the sub-study from the HNR Study. The values are the average and standard deviation rates across all k = 5 cross validation sets. Visual representation were extracted using the ImageNet database [64].

Metric | Original | Branded | Wide and Deep
Specificity | 0.64 (± 0.26) | 0.80 (± 0.18) | 0.64 (± 0.21)
Sensitivity | 0.70 (± 0.12) | 0.74 (± 0.11) | 0.76 (± 0.18)
F1-Score | 0.69 (± 0.09) | 0.80 (± 0.14) | 0.71 (± 0.08)
Accuracy | 0.70 (± 0.16) | 0.77 (± 0.07) | 0.70 (± 0.07)
Table 4

Cross-validation prediction on HNR Study.

Prediction performance of the LSTM classification model using various image input types. The highlighted values are the best per evaluation metric. Evaluation was calculated on the k = 5-fold cross validation sets from the training set with n = 99 participants of the sub-study from the HNR Study. The values are the average and standard deviation rates across all k = 5-fold cross validation sets. Visual representation were extracted using the ChestX-Ray8 database [16].

Metric | Original | Branded | Wide and Deep
Specificity | 0.68 (± 0.13) | 0.76 (± 0.11) | 0.74 (± 0.27)
Sensitivity | 0.70 (± 0.07) | 0.72 (± 0.08) | 0.68 (± 0.23)
F1-Score | 0.70 (± 0.03) | 0.74 (± 0.06) | 0.70 (± 0.16)
Accuracy | 0.69 (± 0.04) | 0.74 (± 0.07) | 0.71 (± 0.13)
Table 5

Prediction accuracy on HNR Study test set.

Prediction performance of the LSTM classification model using various image input types. The highlighted values are the best per evaluation metric. Evaluation was calculated on the independent test set with n = 21 participants of the sub-study from the HNR Study. Visual representation were extracted using the ImageNet database [64].

Metric | Original | Branded | Wide and Deep
Specificity | 0.82 | 0.91 | 0.64
Sensitivity | 0.70 | 0.90 | 0.90
F1-Score | 0.74 | 0.90 | 0.78
Accuracy | 0.76 | 0.90 | 0.76
Table 6

Prediction accuracy on HNR Study test set.

Prediction performance of the LSTM classification model using various image input types. The highlighted values are the best per evaluation metric. Evaluation was calculated on the independent test set with n = 21 participants of the sub-study from the HNR Study. Visual representation were extracted using the ChestX-Ray8 database [16].

Metric | Original | Branded | Wide and Deep
Specificity | 0.73 | 0.91 | 0.64
Sensitivity | 0.90 | 0.80 | 1.00
F1-Score | 0.82 | 0.84 | 0.83
Accuracy | 0.81 | 0.86 | 0.81

For the ADNI dataset, a k = 5-fold cross validation [72] was performed by splitting the training set of 561 participants into 5 different partitions. Of these, one partition is used as the validation set (20%) and the remaining 4 partitions (80%) are used for training. This was done for each of the five partitions. Tables 7 and 8 show the classification rates obtained on the k-fold cross validation sets. The evaluation rates achieved on the independent test set with n = 63 participants are listed in Tables 9 and 10. The random split was done class-wise, to ensure the occurrence of both classes in the k-fold cross validation sets.
Table 7

Cross-validation prediction on ADNI Phase 1 dataset.

Prediction performance of the LSTM classification model using various image input types. The highlighted values are the best per evaluation metric. Evaluation was calculated on the k = 5-fold cross validation sets from the training set with n = 561 participants of the ADNI Phase 1 dataset. The values are the average and standard deviation rates across all k = 5-fold cross validation sets. Visual representation were extracted using the ImageNet database [64].

Metric | Original | Branded | Wide and Deep
Specificity | 0.44 (± 0.08) | 0.54 (± 0.11) | 0.47 (± 0.09)
Sensitivity | 0.82 (± 0.06) | 0.83 (± 0.12) | 0.81 (± 0.03)
F1-Score | 0.79 (± 0.04) | 0.81 (± 0.07) | 0.80 (± 0.03)
Accuracy | 0.69 (± 0.06) | 0.74 (± 0.09) | 0.71 (± 0.04)
Table 8

Cross-validation prediction on ADNI Phase 1 dataset.

Prediction performance of the LSTM classification model using various image input types. The highlighted values are the best per evaluation metric. Evaluation was calculated on the k = 5-fold cross validation sets from the training set with n = 561 participants of the ADNI Phase 1 dataset. The values are the average and standard deviation rates across all k = 5-fold cross validation sets. Visual representation were extracted using the ChestX-Ray8 database [16].

Metric | Original | Branded | Wide and Deep
Specificity | 0.41 (± 0.08) | 0.57 (± 0.09) | 0.39 (± 0.05)
Sensitivity | 0.67 (± 0.04) | 0.71 (± 0.09) | 0.70 (± 0.03)
F1-Score | 0.67 (± 0.02) | 0.72 (± 0.06) | 0.68 (± 0.00)
Accuracy | 0.58 (± 0.03) | 0.64 (± 0.09) | 0.59 (± 0.01)
Table 9

Prediction accuracy on ADNI Phase 1 test set.

Prediction performance of the LSTM classification model using various image input types. The highlighted values are the best per evaluation metric. Evaluation was calculated on the independent test set with n = 63 participants of the ADNI Phase 1 dataset. Visual representation were extracted using the ImageNet database [64].

Metric | Original | Branded | Wide and Deep
Specificity | 0.48 | 0.65 | 0.52
Sensitivity | 0.85 | 0.85 | 0.77
F1-Score | 0.78 | 0.83 | 0.79
Accuracy | 0.66 | 0.77 | 0.71
Table 10

Prediction accuracy on ADNI Phase 1 test set.

Prediction performance of the LSTM classification model using various image input types. The highlighted values are the best per evaluation metric. Evaluation was calculated on the independent test set with n = 63 participants of the ADNI Phase 1 dataset. Visual representation were extracted using the ChestX-Ray8 database [16].

Metric | Original | Branded | Wide and Deep
Specificity | 0.43 | 0.57 | 0.48
Sensitivity | 0.59 | 0.77 | 0.70
F1-Score | 0.61 | 0.72 | 0.69
Accuracy | 0.53 | 0.76 | 0.61

Fig 6 displays the Gradient-weighted Class Activation Mapping (Grad-CAM) [73] of the adopted LSTM model. The Grad-CAM shows visual explanations of the decisions made by the LSTM models, highlighting the important regions of the MRI scans used to distinguish between aMCI and controls. An ablation study was conducted by omitting each clinical data variable prior to branding. This ablation study gives insight into the information gain from applying sociodemographic data and APOE-ε4, and is listed in Tables 11 and 12 for the HNR Study dataset and in Tables 13 and 14 for the ADNI Phase 1 dataset.
Fig 6

Classification activation mapping.

Gradient-weighted Class Activation Mapping (Grad-CAM) image, highlighting important image regions used for distinguishing between aMCI and controls by the classification models. The 2D slice was randomly chosen from the sub-study of the HNR Study.

Table 11

Ablation study on HNR Study test set.

Prediction performance of the LSTM classification model on the ablation study. Each sociodemographic data variable, as well as the genetic data APOE-ε4 was subsequently omitted, prior to the MRI branding. Evaluation was calculated on the independent test set with n = 21 participants of the sub-study from the HNR Study. Visual representation were extracted using the ImageNet database [64].

Input | Specificity | Sensitivity | F1-Score | Accuracy
All data variables | 0.91 | 0.90 | 0.90 | 0.90
Without age | 0.82 | 0.90 | 0.84 | 0.86
Without APOE-ε4 | 0.91 | 0.70 | 0.78 | 0.81
Without gender | 0.91 | 0.80 | 0.86 | 0.86
Without education | 0.82 | 0.80 | 0.80 | 0.81
Without marital status | 0.91 | 0.80 | 0.84 | 0.86
Table 12

Ablation study on HNR Study test set.

Prediction performance of the LSTM classification model on the ablation study. Each sociodemographic data variable, as well as the genetic data APOE-ε4 was subsequently omitted, prior to the MRI branding. Evaluation was calculated on the independent test set with n = 21 participants of the sub-study from the HNR Study. Visual representation were extracted using the ChestX-Ray8 database [16].

Input | Specificity | Sensitivity | F1-Score | Accuracy
All data variables | 0.91 | 0.80 | 0.84 | 0.86
Without age | 0.74 | 0.80 | 0.78 | 0.76
Without APOE-ε4 | 0.82 | 0.80 | 0.86 | 0.86
Without gender | 0.73 | 0.70 | 0.82 | 0.81
Without education | 0.74 | 0.70 | 0.78 | 0.76
Without marital status | 0.77 | 0.80 | 0.78 | 0.76
Table 13

Ablation study on ADNI Phase 1 test set.

Prediction performance of the LSTM classification model on the ablation study. Each sociodemographic data variable, as well as the genetic data APOE-ε4 was subsequently omitted, prior to the MRI branding. Evaluation was calculated on the independent test set with n = 63 participants of the ADNI Phase 1 dataset. Visual representation were extracted using the ImageNet database [64].

Input | Specificity | Sensitivity | F1-Score | Accuracy
All data variables | 0.65 | 0.85 | 0.83 | 0.77
Without age | 0.57 | 0.72 | 0.73 | 0.63
Without APOE-ε4 | 0.39 | 0.80 | 0.74 | 0.65
Without gender | 0.61 | 0.72 | 0.74 | 0.68
Without education | 0.52 | 0.72 | 0.73 | 0.65
Without marital status | 0.57 | 0.75 | 0.75 | 0.68
Table 14

Ablation study on ADNI Phase 1 test set.

Prediction performance of the LSTM classification model on the ablation study. Each sociodemographic data variable, as well as the genetic data APOE-ε4 was subsequently omitted, prior to the MRI branding. Evaluation was calculated on the independent test set with n = 63 participants of the ADNI Phase 1 dataset. Visual representation were extracted using the ChestX-Ray8 database [16].

Input | Specificity | Sensitivity | F1-Score | Accuracy
All data variables | 0.57 | 0.77 | 0.72 | 0.76
Without age | 0.50 | 0.66 | 0.71 | 0.60
Without APOE-ε4 | 0.58 | 0.69 | 0.68 | 0.62
Without gender | 0.52 | 0.70 | 0.66 | 0.61
Without education | 0.49 | 0.71 | 0.70 | 0.64
Without marital status | 0.54 | 0.72 | 0.72 | 0.65


Discussion

The proposed branding technique, which fuses image representations of MRI scans with the sociodemographic data age, gender, education and marital status and the APOE-ε4 genotype, outperformed the other inputs in all evaluation metrics on the independent test set. This could be shown for both the HNR Study and ADNI Phase 1 datasets. For the k-fold cross validation samples with n = 99 participants on the HNR Study dataset, the Wide and Deep input method achieved a higher sensitivity rate. However, for the specificity, precision and overall accuracy rates, the proposed method obtained better scores. The original image as input obtained better specificity, precision and accuracy rates on the test set than the Wide and Deep input method. Analyzing the HNR ablation study with DeCAF representations extracted from Inception-v3 [39], the following findings can be taken:

gender does not affect the overall specificity
education has the greatest impact on all four evaluation values
APOE-ε4 does not affect specificity
marital status does not affect specificity
age does not affect sensitivity
removing the genetic variable APOE-ε4 leads to the highest decrease in the F1-Score
all applied sociodemographic data and APOE-ε4 have an impact on the overall F1-Score

Analyzing the HNR ablation study with DeCAF representations extracted with ChestX-Ray8 [16], the following findings can be taken:

age, education and marital status do not affect the overall sensitivity
removing age and education leads to the highest decrease in specificity
removing age and education leads to the highest decrease in the overall accuracy
removing age leads to the highest decrease in the F1-Score
all applied sociodemographic data and APOE-ε4 have an impact on the overall F1-Score

For the k-fold cross validation samples with n = 561 participants on the ADNI Phase 1 dataset, the original input method achieved the same sensitivity rate. However, for the specificity, precision and overall accuracy rates, the proposed method obtained better scores. The original image as input obtained better specificity, precision and accuracy rates on the test set than the Wide and Deep input method. Analyzing the ADNI ablation study with DeCAF representations extracted with Inception-v3 [39], the following findings can be taken:

gender does not affect the overall specificity
education has a great impact on all four evaluation values
removing APOE-ε4 leads to the highest decrease in specificity
removing age leads to the highest decrease in the accuracy rate
all applied sociodemographic data and APOE-ε4 have an impact on all four evaluation metrics

Analyzing the ADNI ablation study with DeCAF representations extracted with ChestX-Ray8 [16], the following findings can be taken:

removing education leads to the highest decrease in specificity
age and education have a great impact on all four evaluation values
removing age leads to the highest decrease in the sensitivity and accuracy rates
removing gender leads to the highest decrease in the F1-Score
all applied sociodemographic data and APOE-ε4 have an overall impact on all four evaluation metrics

As mentioned earlier, adequate fusion of selected features leads to enriched and consolidated visual representations. We show that combining several data input sources from the medical domain is a viable way of tackling challenging medical tasks. Deep convolutional neural networks incorporate the ability to extract color information from RGB images.
MRI scans offer important visual information which can be applied for automatic structuring, such as classification, semantic tagging, and disease detection. However, they are gray-scale only and would thereby use the same color information redundantly across all 3 color channels. In the notion of fusing information to achieve medical image understanding, the MRI scans are enhanced after branding and prior to training the classification models. The evaluation results show that augmenting contrast-intensified and blurred image adaptions as two extra layers increases the model performance for classification and annotation between aMCI and controls. It has to be kept in mind that we did not have any biomarker information specific for hallmark AD proteinopathies like amyloid beta deposition or phosphorylated tau. Thus, we cannot identify the underlying pathology in our aMCI cases. In contrast to the ADNI dataset, the data in the HNR Study stem from a local cohort of German nationality in three neighboring cities in the Ruhr area. As a consequence, this study population is rather homogeneous in both cultural and ethnic aspects. The HNR research group has consistently made similar observations in other fields as well, such as CVD prediction. Due to the limited number of participants in the applied datasets, there are limitations to the usage of standard end-to-end deep learning classification architectures. However, to utilize the benefits of deep learning systems and examine their capabilities, DeCAF features are adopted for the visual representation. The evaluation metric rates on the independent test set show that, by taking advantage of large trained deep learning models such as those trained on ImageNet for feature representation, the aMCI vs. control classification models are fed sufficient information and are capable of predicting the clinical outcome. Each 2D slice of an MRI contains information about, and dependencies on, its predecessor and successor slices. LSTM models have the ability to accumulate information; thus feeding every 2nd slice of the MRI scans was not only time efficient but also led to positive results. By adopting an LSTM model over a 3D convolutional neural network, the computational time is reduced, as the convolutional operations for the 2D convolutional layers are performed across the x and y dimensions only. We could show that LSTM models are capable of classifying between aMCI and controls using sociodemographic data, APOE-ε4 and deep convolutional activation features. The Grad-CAM results showing the visual explanations of the applied LSTM model appear reasonable at first sight. The presented approach can be applied to create computer-aided diagnosis systems for aMCI vs. cognitively unimpaired classification, as well as semantic structuring and tagging systems in practical clinical settings. Radiologists and neurologists can use the classifier output as a ‘second opinion’ in addition to peer discussions. Another application is to integrate the classifier output into a built-in preselection filter applied after MRI scans are taken. Suspected aMCI cases can be highlighted by this filter, reducing the number of images radiologists have to examine and indicating when to screen comprehensively. As structured and annotated data is fundamental for effective Information Retrieval (IR) systems, the proposed method can be integrated into the modeling and creation of IR systems. The classifier outputs are then adopted for prior content tagging.
Such IR systems can be used by medical practitioners in training to filter aMCI vs. cognitively unimpaired cases for learning purposes. The findings of this work require further evaluation on different functional neuroimaging techniques. For tackling the challenging medical task of early and preclinical detection of AD dementia, the fusion of various clinical data can be explored intensively, as there are numerous input sources in the medical domain.

Conclusion

This work presents an approach to combine sociodemographic data and APOE-ε4 with 1.5T MRI scans to create optimized classification models that distinguish between aMCI and controls. The fusion method enables an enriched image representation, as classification systems with multi-modal image features have proven to obtain higher prediction accuracies. Information fusion is obtained by encoding the values of APOE-ε4 and the sociodemographic data variables gender, marital status, age and education as markers, and branding these onto the MRI scans prior to training and prediction. Two extra color layers denoting the contrast-intensified and blurred image adaptions are augmented to simulate RGB-channeled images, which aims to use the characteristic of deep convolutional neural networks to extract color information as features for training. LSTM-based RNNs are modeled as aMCI vs. control classification models, as each 2D slice of an MRI scan contains dependencies between predecessor and successor slices. The outputs of the classification models are justified with visual explanations denoting the important image regions used for decision making. This work shows that fusing the sociodemographic and genetic data of participants in a sub-study of the HNR Study and in the ADNI Phase 1 dataset with MRI scans yields enriched visual information that provides adequate representations, which is essential for creating effective automatic structuring systems, such as classification models, disease detection and semantic tagging. This is observed for both visual feature input techniques: DeCAF representations from ‘Branded’ images and the ‘Wide and Deep’ image representation method. Prospective modeling and evaluation of mild cognitive impairment classification systems can be based on different multi-modal image representations, as positive results have been presented in recent approaches and in this work. The proposed work pursues the fusion of features from different heterogeneous modalities in the medical domain for computer-aided diagnosis applications and can be adapted to 3D deep learning approaches by branding volume markers.
Peer Review History

Decision letter, 6 Jan 2020 (PONE-D-19-27720): major revision.

Academic Editor (Stavros I. Dimitriadis): The majority of the comments raised by the reviewers can be answered properly. The second reviewer raised an issue regarding the reproducibility of the findings, since a closed database was used. It is not always possible to open a database to the public due to constraints of the funding schemes; my recommendation is therefore to reproduce your findings using the ADNI database with the same cohort design (age, gender, group distribution), which will further strengthen your study.

Journal requirements (condensed): (1) ensure the manuscript meets PLOS ONE style requirements; (2) specify in the Methods section whether consent was informed and whether it was written or verbal; (3) amend the ethics statement ("Ethics Statement from IRB of University Hospital Essen, Essen, Germany, dated 2009-10-20, to Prof. Dr. C. Weimar, registration number: 06-3116") to confirm that the named institutional review board specifically approved this study; (4) since data are stated to be available upon request, either explain in detail the legal or ethical restrictions on sharing a de-identified data set and provide a non-author institutional contact for data requests, or deposit the minimal anonymized data set in a public repository; (5) since funding was received from commercial sources (Celgene GmbH München, Imatron/GE-Imatron, Janssen, Merck KG, Philips, ResMed Foundation, Roche Diagnostics, Sarstedt AG&Co, Siemens HealthCare Diagnostics, Volkswagen Foundation) alongside public funders (among others the Heinz Nixdorf Foundation, the German Research Council (DFG), the German Ministry of Education and Science (BMBF), the Ministry of Innovation, Science, Research and Technology North Rhine-Westphalia, the Else Kröner-Fresenius-Stiftung, the German Social Accident Insurance and the Dr. Werner-Jackstädt Stiftung), provide an amended Competing Interests Statement that explicitly names the commercial funders and confirms that this does not alter adherence to PLOS ONE policies on sharing data and materials.

Reviewer questionnaire (round 1): Reviewer #1 answered "Yes" on technical soundness, statistical rigor, data availability and language; Reviewer #2 answered "No" on technical soundness and data availability and "Yes" on statistical rigor and language.
Reviewer #1: The authors conducted an observational case-control study including 120 participants (61 aMCI and 59 cognitively unimpaired) of the Heinz Nixdorf Recall (HNR) Study, fusing sociodemographic data (age, marital status, education and gender) and genetic data (presence of an APOE-ε4 allele) with MRI scans to model classification systems capable of detecting aMCI. The dataset was split into a training set of 99 participants (aMCI = 51, controls = 48) and an independent test set of 21 participants (aMCI = 10, controls = 11) that was not used for training or parameter optimization; the proposed approach achieved 90% accuracy and 0.90 F1-score. The theme is interesting, the subject is relevant and the design of the study is clear, but there are some difficulties in the Introduction section and in the description of the methods, so I encourage the authors to revise the manuscript (major revision). The title and the abstract are good. Recommendations by section follow.

Introduction: (1) Page 2, line 11: "The amnestic MCI (aMCI) subtype has a high probability of progressing to AD." AD neuropathology may already be present and related to aMCI by the time aMCI is diagnosed, in which case aMCI corresponds to the first symptomatic presentation of AD; I therefore suggest writing "AD dementia" instead of only "AD". (2) The authors should present previous work as a history of knowledge development. Avoid citations such as page 2, line 20: "As shown in [6-9], multi-modal representation achieves higher prediction rates in medical and biomedical annotation tasks...", line 38: "In [17, 18], automatic generated keywords are combined with x-rays to obtain optimized body region classification", and line 44: "In [22, 23], late fusion methods were applied, where decision values from several classifiers are fused to make the final classification prediction". Cite the authors by name first, then include the citation number.

Methods: (3) Page 3, line 88: the current study is a sub-study of the HNR Study covering verbal memory, orientation/praxis, information processing speed, executive functions and verbal abilities. Was the aMCI diagnosis based on the short or on a more comprehensive neuropsychological assessment? (4) What was considered the gold-standard method for diagnosing aMCI? (5) When was the current study conducted? (6) What were the inclusion and exclusion criteria of the sub-study? (7) Page 3, line 107: because the MCI-due-to-AD criteria by Albert et al. were not yet published when the sub-study started, the Winblad et al. MCI criteria were used; given that the study proposes a diagnostic approach, the absence of AD biomarkers must be considered a limitation. (8) Why were clinical characteristics, especially cardiovascular risk factors, not included as covariates? Such data should be available, since the HNR Study is a population-based prospective cohort study of cardiovascular risk factors, which are also risk factors for AD. (9) Page 5, line 135: why do the authors consider genetic data (APOE-ε4 genotype) to be "clinical data"? (10) Page 5, line 142: is it appropriate to consider sociodemographic characteristics and APOE-ε4 genotype clinical data? (11) Page 5, line 142: since current studies use 3D image processing, does the use of 2D MRI slices not constitute a study limitation? (12) Page 6: Table 2 needs a legend with explanations. (13) The statistical analysis needs to be more detailed.

Results: good; tables and figures are good. Discussion: (14) the discussion is sound and based on the results, but the limitations, especially the lack of clinical data and of AD biomarkers, should be emphasized. Conclusion: (15) page 11, line 325: attention should again be paid to the term "clinical data variables" in "Information fusion is obtained by encoding the values of the clinical data variables: gender, marital status, APOE-ε4, age and education as markers and branding these on the MRI scans, prior to training and prediction."

Reviewer #2 (major comments): This paper presents interesting work on 'branding' sociodemographic data and APOE information onto MRI data in a deep learning framework (CNN + LSTM) for amnestic MCI prediction, and the four different inputs tested are a thorough design for evaluating the proposed method. However, the study still needs more work in both methodology and study design. Methodologically, transfer learning on 2D MRI slices with augmentation to fit the RGB channels, combined with an LSTM to model information across slices, resembles very early deep learning work in medical imaging, when medical images were fitted into models demonstrated elsewhere; the field has moved beyond this, and large datasets can make a difference in methodological choice. Regarding study design, the main concern is the low number of subjects, as reflected in the large variance in Table 3. There are many large external public datasets (ADNI, OASIS, AIBL, ...) on which to test the algorithm, and due to the restrictions on this study's data, future researchers will be unable to compare their results with those presented here.

Reviewer #2 (minor comments): dCNN and DeCafs are not good abbreviations for those well-known concepts. 'Prediction rate' is uncommon; change it to a more common term such as prediction accuracy. The related-work section is somewhat confusing and partly unrelated (multiple image-text fusion references, the hearing-loss paper); more related references include Spasov et al., "A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer's disease", Neuroimage 189 (2019): 276-287; Yang et al., "Characterizing Alzheimer's Disease with Image and Genetic Biomarkers using Supervised Topic Models", IEEE J Biomed Health Inform (2019); and Lee et al., "Predicting Alzheimer's disease progression using multi-modal deep learning approach", Sci Rep 9.1 (2019): 1952. Although APOE is the best-known genetic factor of AD, having "genetic data" in the title while using only APOE information is misleading; more SNPs were expected in the model. The left Grad-CAM image seems to pinpoint the frontal lobe and makes sense; 3D visualization may be preferable, see Feng et al., "Deep Learning on MRI Affirms the Prominence of the Hippocampal Formation in Alzheimer's Disease Classification", bioRxiv (2018): 456277. Using only clinical data (input #4) is not a good comparison, as that input is too different from the distribution represented in ImageNet (arguably the MRI dataset is too); simple classifiers would suffice, and Table 1 suggests APOE alone should achieve sensitivity above 0.5. As the dataset stems from a longitudinal study, it would be interesting to include a longitudinal element on either the feature or the diagnosis side. (Both reviewers chose not to make their identities public.)
Author responses, 16 Mar 2020:

To the editor: (1) The styling guidelines have been followed and the appropriate changes made. (2) All participants provided written informed consent. (3) The ethics statement has been amended to confirm that the named institutional review board specifically approved this study. (4) De-identified data from this study are available upon request to the Department of Neurology, University Hospital Essen, Germany, pending approval by the study steering committee; for data security reasons (the data contain potentially identifying participant information), the HNR Study does not allow sharing data as a public use file. Data requests can be addressed to recall@uk-essen.de. (5) The Competing Interests Statement was amended to name the commercial funders. (6) Table 9 is now correctly referred to in the text. (7) The ethics statement on the submission form was amended to: "Ethics Statement for the use of the HNR study population from IRB of University Hospital Essen, Essen, Germany, dated 2009-10-23 and 2012-06-06, to Prof. Dr. C. Weimar, registration number: 06-3116. All participants provided informed consent. Details about the ethics statement of the ADNI study population can be found at https://adni.loni.usc.edu." (8) The written-consent statement was added to the "Study Population" subsection of the Methods. (10) A non-author point of contact for data access queries was provided (see point 4); there are no relevant accession codes or data set names for the ADNI data, and the authors had no special access privileges to the ADNI database that others would not have.

To Reviewer #1: (1) The term "dementia" was added to "AD" accordingly. (2) The reference style was changed: authors are now cited by name first, followed by the citation number. (3) The aMCI diagnosis was based on a neurological evaluation and the more comprehensive neuropsychological assessment, performed by an experienced neuropsychologist. (4) The gold standard consisted of a neurological evaluation by a senior neurologist and a neuropsychological examination by an experienced neuropsychologist; the final aMCI diagnosis was made by consensus between the examining neurologist and neuropsychologist, taking into account the medical history related to cognitive functioning, the duration of such symptoms, other medical illnesses and current treatment. Participants with aMCI had to fulfill the following criteria: cognitive impairment in the verbal memory domain, presence of subjective cognitive decline, normal functional abilities of daily living and no dementia diagnosis. This information was added to the "Evaluation of cognitive status and aMCI diagnosis" section. (5) The data used for these analyses were assessed from 2006 to 2009; this was added to the "Study population" section. (6) Inclusion criteria: a short cognitive assessment was performed in the HNR Study (see Wege et al. 2011, Neuroepidemiology), and a random sample of participants (aged 50-80 years) with impaired screening results (two subtests below the age- and education-adjusted mean) and with age-appropriate screening results was invited to the sub-study. Exclusion criteria: dementia, severe depression (ADAS depression subscale score > 4), Parkinson's disease, mental retardation, severe alcohol consumption (women: > 20 g/day; men: > 40 g/day), known brain cancer, severe problems with the German language and severe sensory impairment leading to invalid cognitive testing. This was added to the "Study population" section. (7) We agree, and this limitation was added to the Discussion as follows: "It has to be kept in mind that we did not have any biomarker information that is specific for hallmark AD proteinopathies like amyloid beta deposition or phosphorylated tau. Thus, we cannot identify the underlying pathology in our aMCI cases." (8) Other clinical characteristics such as cardiovascular risk factors were not added, as they were not part of the proof of principle. (9, 10) To avoid confusion, "clinical data" was changed to "participant data", further split into sociodemographic data (age, gender, education and marital status) and genetic data (APOE). (11) The limitations of 2D MRI slices versus 3D processing were added. (12) A legend was added to Table 2. (13) Additional analysis was written, also covering the second dataset (ADNI Phase 1) used for further evaluation. (14, 15) The misleading "clinical data" sentences were removed.
To Reviewer #2 (major comments): We agree that an additional, openly accessible state-of-the-art dataset should be included. The limitations regarding 2D versus 3D models are now mentioned in the manuscript, and the proposed approach was additionally evaluated on the ADNI Phase 1 dataset; using the same participant data structure and branding process, similar prediction scores were reproduced for the classification of controls vs. aMCI. The manuscript was submitted to the ADNI Data and Publications Committee and received approval regarding the Data Use Agreement; ADNI is acknowledged, the data gathering is described in the Methods section, and the ADNI authorship requirements were followed. (Minor comments:) (1) The abbreviation dCNN was removed and DeCafs was changed to DeCAF throughout the manuscript. (2) "Prediction rate" was changed to "prediction accuracy". (3) Misleading and unrelated references were removed, and the suggested works (Spasov et al., Yang et al. and Lee et al.) were added to the Related Work section. (4) The misleading title was changed, replacing "genetic" with APOE. (5) The adopted approach is a 2D classification model and unfortunately cannot be used to create Grad-CAM visualizations like those in Feng et al. (6) The clinical-data-only input was removed completely from the manuscript; as Reviewer #1 noted, the term "clinical data" was misleading and was changed to "participant data".

Decision letter, 1 Jun 2020 (PONE-D-19-27720R1): revision requested. During this round of review, one reviewer suggested a few more corrections regarding the references, the explanation of a specific deep learning model with transfer learning, and a new experiment using only one portion of the dataset; the authors were invited to respond to these comments point by point in a revised draft.
Reviewer questionnaire (round 2): Reviewer #1 indicated that all comments had been addressed; Reviewers #1 and #3 answered "Yes" on technical soundness, statistical rigor, data availability and language.

Reviewer #1 (second round): The authors answered all the reviewers' questions and made the suggested amendments, which enhanced the quality of the manuscript. The revised work was further evaluated on the ADNI Phase 1 dataset (624 participants) in addition to the HNR sub-study (120 participants), with the reported baseline and final results consistent with the abstract.

Reviewer #3 (Ross O'Hagan, identity made public): (1) Can you expand on the number of patients removed under each exclusion criterion? (2) Can you expand on why the ADNI dataset had lower performance despite having more data to train on? (3) Lines 31-33 state that "a major concern in the medical domain is the lack of publicly available large image data sets like ImageNet"; please change "lack" to an insufficient number of datasets, since such datasets exist but do not cover all contexts, see the ChestX-ray8 database (Wang et al., IEEE CVPR 2017) and the OpenfMRI project (Poldrack et al., Front. Neuroinf. 2013, 7, 12). (4) Lines 89-90 mention transfer learning from pre-trained ImageNet models; would transfer learning from a medical-domain source such as the ChestX-ray8 database not be better? (5) Can you expand on the performance impact of the two augmented color layers? (6) Why was Inception_v3 chosen as the basis for transfer learning instead of a model pretrained in a medical context? (7) How does the model perform if trained only on the HNR Study dataset and then tested on the ADNI Phase 1 dataset?
Author responses, 15 Jul 2020, to Reviewer #3: (1) Excluded from the sub-study were participants with dementia (n = 7), severe depression (ADAS depression subscale score > 4, n = 13), Parkinson's disease (n = 5), mental retardation (n = 2), severe alcohol consumption (women: > 20 g/day; men: > 40 g/day, n = 2), known brain cancer (n = 1), severe problems with the German language (n = 9) and severe sensory impairment leading to invalid cognitive testing (n = 2). (2) In contrast to the ADNI dataset, the HNR data stem from a local cohort of German nationality in three neighboring cities in the Ruhr area; this study population is therefore rather homogeneous in both cultural and ethnic respects, and the HNR research group has consistently made similar observations in other fields, such as CVD prediction. (3) The statement was edited and the required references (Wang et al. and Poldrack et al.) were added. (4) ChestX-ray8 database weights were adopted instead of ImageNet weights to extract visual representations prior to training, for both the HNR Study and the ADNI Phase 1 datasets, and the resulting evaluation metric scores are listed alongside for comparison. (5) Information on the prediction accuracy improvements from the two augmented color layers was added. (6) The prediction performance of the classification model with transfer learning from the ChestX-ray8 database was included. (7) This experiment was performed and yielded less favourable prediction scores.
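As a companion to the exchange above, the following hedged Python sketch shows how slice-wise DeCAF features from the average-pooling layer of Inception_v3 could feed an LSTM-based classifier, and where pre-trained weights (ImageNet, or a ChestX-ray8-derived checkpoint as discussed above) would be swapped in. The LSTM width, optimizer and the weights file name are assumptions for illustration, not the authors' published configuration.

```python
# Sketch: Inception_v3 average-pooling ("DeCAF") features per slice, then an
# LSTM over the ordered slice sequence for aMCI vs. cognitively unimpaired.
# Assumptions: LSTM width, optimizer and the ChestX-ray8 checkpoint path are
# illustrative; only weights="imagenet" is a stock Keras option.
import numpy as np
import tensorflow as tf

# DeCAF extractor: global average pooling yields one 2048-d vector per slice.
extractor = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")
# extractor.load_weights("chestxray8_inception_v3.h5")  # hypothetical checkpoint

def scan_to_sequence(rgb_slices: np.ndarray) -> np.ndarray:
    """Map one scan of shape (n_slices, 299, 299, 3) to (n_slices, 2048)."""
    x = tf.keras.applications.inception_v3.preprocess_input(
        rgb_slices.astype("float32"))
    return extractor.predict(x, verbose=0)

# LSTM over slice features, exploiting predecessor/successor dependencies.
classifier = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 2048)),       # variable slice count
    tf.keras.layers.LSTM(128),                       # width is an assumption
    tf.keras.layers.Dense(1, activation="sigmoid"),  # aMCI probability
])
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
```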
Decision, 16 Jul 2020 (PONE-D-19-27720R2): The manuscript was judged scientifically suitable for publication and formally accepted pending outstanding technical requirements. Additional editor comment: "After carefully reading your answers to the reviewers' comments, I recommend the acceptance of your manuscript in its current form."

Formal acceptance, 16 Sep 2020: The manuscript was deemed suitable for publication in PLOS ONE and passed to the production department (Academic Editor: Stavros I. Dimitriadis).
References (31 in total)

1.  PubMed Central: The GenBank of the published literature.

Authors:  R J Roberts
Journal:  Proc Natl Acad Sci U S A       Date:  2001-01-16       Impact factor: 11.205

2.  The Unified Medical Language System (UMLS): integrating biomedical terminology.

Authors:  Olivier Bodenreider
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  Baseline and longitudinal patterns of brain atrophy in MCI patients, and their use in prediction of short-term conversion to AD: results from ADNI.

Authors:  Chandan Misra; Yong Fan; Christos Davatzikos
Journal:  Neuroimage       Date:  2008-11-05       Impact factor: 6.556

4.  Reproducible evaluation of classification methods in Alzheimer's disease: Framework and application to MRI and PET data.

Authors:  Jorge Samper-González; Ninon Burgos; Simona Bottani; Sabrina Fontanella; Pascal Lu; Arnaud Marcoux; Alexandre Routier; Jérémy Guillon; Michael Bacci; Junhao Wen; Anne Bertrand; Hugo Bertin; Marie-Odile Habert; Stanley Durrleman; Theodoros Evgeniou; Olivier Colliot
Journal:  Neuroimage       Date:  2018-08-18       Impact factor: 6.556

5.  Multimodal classification of Alzheimer's disease and mild cognitive impairment.

Authors:  Daoqiang Zhang; Yaping Wang; Luping Zhou; Hong Yuan; Dinggang Shen
Journal:  Neuroimage       Date:  2011-01-12       Impact factor: 6.556

6.  A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer's disease.

Authors:  Simeon Spasov; Luca Passamonti; Andrea Duggento; Pietro Liò; Nicola Toschi
Journal:  Neuroimage       Date:  2019-01-14       Impact factor: 6.556

Review 7.  Mild cognitive impairment as a diagnostic entity.

Authors:  R C Petersen
Journal:  J Intern Med       Date:  2004-09       Impact factor: 8.989

Review 8.  Evaluating performance of biomedical image retrieval systems--an overview of the medical image retrieval task at ImageCLEF 2004-2013.

Authors:  Jayashree Kalpathy-Cramer; Alba García Seco de Herrera; Dina Demner-Fushman; Sameer Antani; Steven Bedrick; Henning Müller
Journal:  Comput Med Imaging Graph       Date:  2014-03-27       Impact factor: 4.790

9.  Classification of Alzheimer's Disease using volumetric features of multiple MRI scans.

Authors:  Louise Bloch; Christoph M Friedrich
Journal:  Conf Proc IEEE Eng Med Biol Soc       Date:  2019-07

10.  Annotation of enhanced radiographs for medical image retrieval with deep convolutional neural networks.

Authors:  Obioma Pelka; Felix Nensa; Christoph M Friedrich
Journal:  PLoS One       Date:  2018-11-12       Impact factor: 3.240

