Protocol for a machine learning algorithm predicting depressive disorders using the T1w/T2w ratio.

David A A Baranger¹, Yaroslav O Halchenko², Skye Satz¹, Rachel Ragozzino¹, Satish Iyengar³, Holly A Swartz¹, Anna Manelis¹.

Abstract

The T1w/T2w ratio is a novel magnetic resonance imaging (MRI) measure that is thought to be sensitive to cortical myelin. Applying this measure to the classification of disorders associated with changes in myelin levels requires new pipelines for data quality assurance, data analysis, and validation of the findings. In this article, we provide a detailed description of such a pipeline, along with references to the scripts used in our recent report, which applied the T1w/T2w ratio and machine learning to distinguish individuals with depressive disorders from healthy controls.

Keywords: Cortical myelin; Depression; Elastic net; LDA; MRI; Machine learning; T1w/T2w ratio

Year: 2021 PMID: 35004227 PMCID: PMC8720909 DOI: 10.1016/j.mex.2021.101595

Source DB: PubMed Journal: MethodsX ISSN: 2215-0161

Specifications table

Methods

Participants

Participant recruitment and demographics have been previously described [1]. In brief, we recruited healthy controls (HC; N=55) and individuals with unipolar depression (UD; N=50) diagnosed with major depressive or persistent depressive disorder using DSM-5 criteria. Participants with a history of head injury, metal in the body, pregnancy, claustrophobia, neurodevelopmental disorders, systemic medical illness, premorbid IQ<85 per the National Adult Reading Test [2], current alcohol/drug abuse, Young Mania Rating Scale (YMRS [3]) scores>10 at scan, or meeting criteria for any psychotic-spectrum disorder were excluded during recruitment. In addition, we excluded from the present analyses participants with brain abnormalities of potential clinical relevance (2 UD); a diagnosis change during the course of the study (1 HC was diagnosed with major depressive disorder, and 1 UD was diagnosed with bipolar disorder); scanner- and movement-related artifacts in T1w or T2w images (4 HC, 7 UD); and myelin maps of insufficient quality (3 HC, 1 UD). The last two exclusion criteria are described in more detail below. The final sample included 47 HC and 39 UD.

Neuroimaging data acquisition

The neuroimaging data were collected at the University of Pittsburgh Magnetic Resonance Research Center using a 3T Siemens Prisma scanner with a 64-channel receiver head coil. Neuroimaging data files were named according to the ReproIn convention [4]. DICOM images were converted to BIDS format using heudiconv [5] and dcm2niix [6]. High-resolution T1w images were collected using the MPRAGE sequence (TR=2400ms, resolution=0.8 × 0.8 × 0.8mm, 208 slices, FOV=256, TE=2.22ms, flip angle=8°). High-resolution T2w images were collected using TR=3200ms, resolution=0.8 × 0.8 × 0.8mm, 208 slices, FOV=256, TE=563ms. Field maps were collected in the AP and PA directions using the spin echo sequence (TR=8000ms, resolution=2 × 2 × 2mm, FOV=210, TE=66ms, flip angle=90°, 72 slices). The raw data used in [1] are publicly available (https://openneuro.org/datasets/ds003653/versions/1.0.0) [7].

Subject-level preprocessing

Cortical myelin was characterized with the T1w/T2w ratio [8], [9], [10] using the PreFreeSurfer, FreeSurfer, and PostFreeSurfer minimal preprocessing pipelines for the Human Connectome Project (HCP) [8]. Workbench v1.4.2 and HCPpipelines-4.1.3 were installed system-wide on a workstation running the Debian 10 GNU/Linux operating system. Bias field correction in PreFreeSurfer used spin echo field maps collected in the AP and PA phase encoding directions. MSMSulc [11] in PostFreeSurfer was used for registration to standard space. Parcellation of the T1w/T2w ratio maps [10], [12] was done using the Glasser Atlas [9] (n=360 regions). Parcellation is a procedure by which points in the brain are mapped to brain regions, or parcels. The mean T1w/T2w ratio of each parcel was computed for each participant. The Glasser Atlas was used because it is the only brain atlas that incorporates the T1w/T2w ratio. Subject-level myelin maps, as well as all code and data used in this protocol, are publicly available (https://github.com/manelis-lab/myelin-paper-NICL2021).
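As a rough illustration of the parcel-averaging step, the sketch below computes the mean T1w/T2w ratio per parcel from per-vertex values and a matching list of parcel labels. The function name, the label strings, and the toy values are hypothetical; in the protocol this step was performed on the HCP pipeline outputs, not with this code.

```python
from collections import defaultdict

def parcel_means(vertex_values, vertex_labels):
    """Average per-vertex T1w/T2w values within each parcel.

    vertex_values: one T1w/T2w ratio per surface vertex.
    vertex_labels: parcel label for each vertex (None = unlabeled, skipped).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(vertex_values, vertex_labels):
        if label is None:  # e.g., vertices outside any parcel
            continue
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

# Hypothetical toy input: four labeled vertices and one unlabeled vertex.
toy_means = parcel_means([1.0, 3.0, 2.0, 4.0, 9.0], ["A", "A", "B", "B", None])
```

With the Glasser Atlas, the returned dictionary would hold 360 entries per participant, one per parcel.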

Neuroimaging quality assurance

Quality assurance (QA) was first performed following standard procedures. We additionally provide detail on our customizations to our QA pipelines for T1w, T2w, and T1w/T2w ratio images. T1w and T2w images were visually inspected for noise or movement-induced artifacts [13], including ringing, aliasing, ghosting, blurring, banding, and hyperintensities, as well as for structural abnormalities that may be clinically relevant. Data quality was then examined using mriqc version 0.15.1 [14]. Background noise enhanced images generated by mriqc were visually inspected for the same artifacts. As there are no clear guidelines on how to use single or combined image quality metrics (IQMs) produced by mriqc to identify potentially low-quality scans or unacceptable levels of noise, we adopted the procedure proposed by the mriqception project [15] to identify potential T1w and T2w outlier images. Specifically, the images that passed the initial visual inspection were compared against a large sample of independent deidentified T1w and T2w images whose mriqc IQMs were downloaded from the mriqc API (https://mriqc.nimh.nih.gov/). The mriqc API collection contains information on over 1.4 million T1 and T2 images. For the purpose of this study, we selected the IQMs from images that were matched to the present study based on TR, TE, spacing, scanner manufacturer, head coil, and the version of mriqc used. This resulted in 1046 T1w comparison images and 619 T2w comparison images. The distribution of IQMs from the present study was compared to the distribution of IQMs from the API. Scans with IQMs more than 1.5 times the interquartile range (IQR) from the median of the mriqc API data (i.e., outside median ± 1.5 × [75th percentile − 25th percentile]) were flagged as potential outliers and were re-inspected. Scans were excluded solely based on IQMs when values fell far outside the expected range (>5 IQRs). One subject (UD) was excluded based on this procedure.
A total of 4 HC subjects and 7 UD subjects were excluded for noise or movement-related artifacts. Participants were excluded from analyses even if they failed QC only for T1w or only for T2w images. For the images that passed the mriqc API quality assurance, the FreeSurfer generated images were visually inspected for registration errors and artifacts that were not detected previously.
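The flag-and-exclude rule above can be sketched as follows. This is a minimal illustration, not the mriqception code: the function name, the reference values, and the choice of quartile method are assumptions, and ">5 IQRs" is interpreted here as a distance from the reference median greater than 5 × IQR.

```python
import statistics

def classify_iqm(value, reference, flag_k=1.5, exclude_k=5.0):
    """Compare one IQM value with a reference distribution of the same IQM.

    Returns "exclude" if the value lies more than exclude_k IQRs from the
    reference median, "flag" if it lies more than flag_k IQRs away
    (re-inspect visually), and "ok" otherwise.
    """
    q1, median, q3 = statistics.quantiles(reference, n=4, method="inclusive")
    iqr = q3 - q1
    distance = abs(value - median)
    if distance > exclude_k * iqr:
        return "exclude"
    if distance > flag_k * iqr:
        return "flag"
    return "ok"

# Hypothetical stand-in for IQM values of matched scans from the mriqc API.
reference_iqm = [float(x) for x in range(101)]  # median 50, IQR 50

labels = [classify_iqm(v, reference_iqm) for v in (60.0, 130.0, 310.0)]
```

In practice the function would be applied per IQM and per modality (T1w and T2w), with a participant dropped if either modality fails.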

T1w/T2w ratio quality assurance

For the participants whose T1w and T2w images passed the neuroimaging QA steps described above, we evaluated the quality of the T1w/T2w ratio maps. As the T1w/T2w ratio image is a relatively recent addition to the MRI toolkit [10], there is no standard QA pipeline for T1w/T2w ratio maps. Here we detail our procedure for performing QA on T1w/T2w ratio images. First, T1w/T2w ratio images were visually inspected, by comparing each image to the average T1w/T2w ratio map from the HCP [9]. This step allowed us to identify regions with excessively high or low values. Some examples include large regions of apparent low myelin in the occipital cortex due to the transverse sinus interfering with accurate identification of the pial surface, or regions of excessive myelin in the medial frontal cortex that likely appeared due to motion. Second, beyond gross imaging or anatomical artifacts, some areas in the brain are affected by susceptibility artifacts more than others [16]. These areas include the orbitofrontal cortex and the regions in the medial temporal cortex such as hippocampal and entorhinal cortices [13,17]. The presence of susceptibility artifacts varies among subjects due to the differences in the scalp, face, and brain anatomy as well as the placement inside the scanner. Characterizing between-subject variability in the T1w/T2w ratio measures could help identify brain regions affected by susceptibility artifacts the most. We developed an empirical procedure to identify the regions with unusually high variability and remove them from the subsequent analyses. The procedure included calculating the coefficient of variation (sd/mean) to summarize the variability within each parcel [18] (https://github.com/manelis-lab/myelin-paper-NICL2021/blob/master/scripts/preprocessing/outlier_regions.R). 
As susceptibility artifacts result in both increased variability in the T1w/T2w ratio and excessively low estimates of the mean T1w/T2w ratio, regions contaminated with susceptibility artifacts will have a very high coefficient of variation relative to other regions. Rosner's test for outliers [18,19] was used to identify outlier parcels with excessively high variation. Rosner's test assumes that the data, without outliers, are normally distributed. The most extreme values are tested iteratively, identifying up to 10 possible outliers that prevent the data from being normally distributed. It may be appropriate to log-transform the coefficient of variation prior to Rosner's test if the distribution is heavily skewed. This was not done in the present analysis as we were only interested in outliers in the right-hand side of the distribution. If more than 10 possible outliers are suspected, the test needs to be run multiple times. This procedure identified 11 outlier parcels (Fig. 1), including the bilateral hippocampus, entorhinal cortex, presubiculum, piriform cortex, and posterior orbitofrontal cortex complex, and the right subgenual cingulate (bilateral H, EC, PreS, Pir, pOFC, and right 25). The locations of these outlier parcels were, in general, consistent with areas of high susceptibility artifacts [13,17]. These outlier parcels were removed from the subsequent data analyses, leaving 349 parcels per participant in the data set.
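The variability summary that feeds the outlier test can be sketched as below. The parcel values are hypothetical toy numbers, and the sketch stops at ranking parcels by coefficient of variation; it does not implement Rosner's test, which the authors ran in the linked outlier_regions.R script.

```python
import statistics

def coefficient_of_variation(values):
    """sd / |mean| of one parcel's T1w/T2w ratio across subjects."""
    return statistics.stdev(values) / abs(statistics.fmean(values))

# Hypothetical toy data: parcel label -> T1w/T2w ratio for each subject.
parcel_values = {
    "V1": [1.80, 1.82, 1.78, 1.81],  # stable across subjects
    "EC": [0.90, 1.60, 0.40, 1.30],  # highly variable across subjects
}

cv_by_parcel = {
    parcel: coefficient_of_variation(values)
    for parcel, values in parcel_values.items()
}

# Parcels with the largest coefficient of variation come first; these are
# the candidates an outlier test would examine.
ranked_parcels = sorted(cv_by_parcel, key=cv_by_parcel.get, reverse=True)
```

An outlier test would then be applied to the 360 coefficients of variation, one per parcel.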

Fig. 1

Results of parcel outlier detection

The coefficient of variation of each parcel (sd/|mean|), colored by whether or not the parcel was determined to be an outlier. sd and |mean| were calculated across all subjects per each parcel.

Machine learning pipeline

Elastic-net with nested cross-validation

One limitation of using machine learning in neuroimaging data analysis is that the number of features (e.g., voxels or parcels) is large relative to the number of participants in the data sets. Elastic net is a regularized regression that combines LASSO and ridge regression (i.e., L1- and L2-norm regularization) [20] and is widely used to select variables in large multivariate analyses [21], [22], [23]. Ridge regression reduces the size of overly large coefficients, while LASSO regression removes variables with small coefficients. In this study, elastic net was used to select parcels whose T1w/T2w ratio was sensitive to distinguishing HC from individuals with depressive disorders. All variables used in training the elastic net model are publicly available (https://github.com/manelis-lab/myelin-paper-NICL2021). Previous studies that used elastic net for clinical and neuroimaging data analyses often used the whole sample to identify a sparse model. Such an approach can result in model bias and overfitting. To mitigate these effects, we used nested cross-validation, a procedure in which a subset of participants is held out and the entirety of the analytic pipeline is performed in the remaining participants [24]. The resulting model is tested in the held-out participants. This process is repeated multiple times and results in multiple sparse models. Here we provide details on our customizations to the machine learning pipeline. In the present nested cross-validation analysis, in each iteration of the ‘outer loop’, two participants (one UD and one HC) were held out for testing, while the remaining participants were used for model training. During model training, the elastic net alpha (α) parameter was set to 0.5. The optimal lambda (λ) parameter was determined using leave-one-out cross-validation of the training data (i.e., the ‘inner loop’).
Thus, there were 1833 nested cross-validation models (all possible pairings of UD and HC participants: 39 UD * 47 HC = 1833) and, consequently, 1833 sets of parcels deemed important for UD vs. HC classification (Fig. 2). This analysis allowed us to identify how consistently each parcel was selected across 1833 models. For each parcel, we calculated the proportion of times a given region was selected across all 1833 nested cross-validation models. The less frequently a region is selected, the more strongly its selection depends on the choice of the training and testing samples, indicating that it is a less robust predictor of participant class (e.g., less likely to be generalizable).
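The bookkeeping of the outer loop can be sketched as follows: enumerating every UD/HC hold-out pair, deriving the training set for a pair, and computing how often each parcel is selected across models. Participant IDs, function names, and the example selections are hypothetical, and the elastic net fit itself is omitted.

```python
from collections import Counter
from itertools import product

# Hypothetical participant IDs; group sizes match the final sample.
ud_ids = [f"UD{i:02d}" for i in range(1, 40)]  # 39 UD
hc_ids = [f"HC{i:02d}" for i in range(1, 48)]  # 47 HC

# Each outer-loop iteration holds out one unique (UD, HC) pair.
holdout_pairs = list(product(ud_ids, hc_ids))

def training_ids(pair):
    """Participants available for model training once a pair is held out."""
    held_out = set(pair)
    return [pid for pid in ud_ids + hc_ids if pid not in held_out]

def selection_proportions(selected_per_model):
    """Proportion of models in which each parcel was selected.

    selected_per_model: one collection of selected parcel labels per model.
    """
    counts = Counter()
    for selected in selected_per_model:
        counts.update(set(selected))
    n_models = len(selected_per_model)
    return {parcel: count / n_models for parcel, count in counts.items()}

# Hypothetical selections from four models.
example = selection_proportions([["a", "b"], ["a"], ["a", "c"], ["b"]])
```

Here `len(holdout_pairs)` reproduces the 1833 models, and each model is trained on the remaining 84 participants.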

Fig. 2

Diagram of analysis steps.

Figure used with permission from Baranger et al., 2021 [1]. Conceptual depiction of analysis steps including: (1) a unique pair of one UD and one HC participant is held out; (2) an elastic net regression is used to select variables; (3) the retained variables are used in an LDA model predicting case/control status; (4) the LDA model is tested on the held-out sample; (5) this process is repeated for each of the n=1833 pairs of subjects; (6) for each held-out pair, the training procedure is repeated with 100 unique permutations.

We used a linear discriminant analysis (LDA) in the ‘outer loop’ to verify how well the set of variables selected by elastic net (in the ‘inner loop’) classified UD vs. HC (Fig. 2). A total of 1833 LDA models were tested, as this analysis was run for each repetition of nested cross-validation. Specifically, each LDA model was trained using the set of participants and the set of parcels selected by elastic net in that particular ‘inner loop’ cross-validation model. This LDA model was then used to predict the class (UD or HC) of the two participants who were held out for each round of nested cross-validation. This procedure was repeated for all 1833 participant pairs (https://github.com/manelis-lab/myelin-paper-NICL2021/blob/master/scripts/analyses/glmnet_with_LDA_myelin_paper.R). Participant-wise accuracy was computed as the percent of nested models in which the held-out participant was correctly classified. Total model accuracy was computed as the average of the participant-wise accuracies. Model sensitivity and specificity were computed as the average of UD-only and HC-only participant accuracies, respectively (https://github.com/manelis-lab/myelin-paper-NICL2021/blob/master/scripts/analyses/process_glmnet_output.R).
The strength of the nested cross-validation approach is that it produces a relatively unbiased estimate of the model parameter (λ), which would not be the case if the entire training data were used for determining the parameter [25,26]. We note that many other classifiers could be used instead of LDA. The complete output of this pipeline is available (https://github.com/manelis-lab/myelin-paper-NICL2021).
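The accuracy summaries described above can be sketched as follows, assuming the predictions have been collected as one row per held-out participant per nested model. The row layout, function name, and toy predictions are hypothetical; the authors' own post-processing is in the linked process_glmnet_output.R script.

```python
from collections import defaultdict

def summarize(predictions, positive="UD", negative="HC"):
    """Summarize nested cross-validation predictions.

    predictions: iterable of (participant_id, true_class, predicted_class),
    one row per held-out participant per nested model.
    """
    correct = defaultdict(list)
    true_class = {}
    for pid, truth, predicted in predictions:
        correct[pid].append(predicted == truth)
        true_class[pid] = truth

    # Participant-wise accuracy: share of models that classified them correctly.
    per_participant = {pid: sum(h) / len(h) for pid, h in correct.items()}

    def mean_over(cls=None):
        values = [acc for pid, acc in per_participant.items()
                  if cls is None or true_class[pid] == cls]
        return sum(values) / len(values)

    return {
        "accuracy": mean_over(),             # average over all participants
        "sensitivity": mean_over(positive),  # average over UD participants
        "specificity": mean_over(negative),  # average over HC participants
        "per_participant": per_participant,
    }

# Hypothetical toy output of 4 models (2 UD x 2 HC), two rows per model.
rows = [
    ("UD1", "UD", "UD"), ("HC1", "HC", "HC"),
    ("UD1", "UD", "UD"), ("HC2", "HC", "UD"),
    ("UD2", "UD", "UD"), ("HC1", "HC", "UD"),
    ("UD2", "UD", "HC"), ("HC2", "HC", "UD"),
]
summary = summarize(rows)
```

Averaging participant-wise accuracies, as opposed to pooling all predictions, keeps each participant's contribution equal even though UD and HC participants appear in different numbers of models (47 and 39, respectively).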

Testing the model performance with permutation analyses

While we believed that the parcels selected by all 1833 elastic net models would likely represent signal, the parcels that were selected by only 1 model could reflect noise. Permutation analyses were employed to assess the presence and extent of model bias and overfitting. In each permutation, participant class (UD or HC) in the training data set was permuted (randomized), while the class of the held-out participants remained unchanged (i.e., true) (Fig. 2). The expected accuracy for permuted data is at chance (in the present case, 50%). Deviations from this expectation are indicative of bias or overfitting. The permutation analyses included the same steps as the nested cross-validation procedure described above, except that for each cross-validation fold we conducted 100 repetitions of a permutation procedure, which resulted in a total of 183,300 permutation models (1833*100=183,300). A new randomization seed was set prior to each of these permutations to ensure that permutations would be unique and the results would be reproducible. Following the processing pipeline for the primary analyses, we calculated the proportion of times each variable was selected in the permutation analyses, reflecting how frequently we can expect each variable to be selected by chance. Variables from the primary analyses were considered to be important for distinguishing HC from UD if they were selected more often than the median value + 3.5 IQR of the selection frequency in the permutation analyses, that is, if they were selected more frequently than expected by chance. We applied this threshold, as opposed to computing the significance of each feature by permuting features individually, due to the high computational burden of the latter in the context of nested cross-validation (1833*100*360 = 65,988,000 models). In addition, we computed the LDA accuracies for each permutation round as a validation of the variable selection.
Accuracy is expected to be higher in the primary analyses than the permutations (i.e., selected variables are truly predictive of participant class). The deviation of permutation accuracy from chance is indicative of the extent to which results are contaminated by bias or overfitting. Deviation from chance would thus indicate that selected variables may not be valid indicators of the outcome, and would additionally indicate that model results may not be generalizable to new samples (https://github.com/manelis-lab/myelin-paper-NICL2021/blob/master/scripts/analyses/permuted_glmnet_with_LDA_myelin_paper.R).
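The selection-frequency threshold can be sketched as below. This assumes that the median and IQR are taken across the parcels' selection frequencies in the permutation models; the function names, quartile method, and toy frequencies are hypothetical and are not taken from the authors' scripts.

```python
import statistics

N_PAIRS = 39 * 47                      # 1833 held-out pairs
N_PERMUTATION_MODELS = N_PAIRS * 100   # 100 permutations per pair
N_FEATURE_WISE_MODELS = N_PERMUTATION_MODELS * 360  # avoided alternative

def selection_threshold(permutation_freqs, k=3.5):
    """Median + k * IQR of the permutation selection frequencies."""
    q1, median, q3 = statistics.quantiles(
        permutation_freqs, n=4, method="inclusive"
    )
    return median + k * (q3 - q1)

def important_parcels(primary_freqs, permutation_freqs, k=3.5):
    """Parcels selected more often in the primary analysis than the threshold.

    Both arguments map parcel label -> proportion of models selecting it.
    """
    threshold = selection_threshold(list(permutation_freqs.values()), k)
    return sorted(p for p, f in primary_freqs.items() if f > threshold)

# Hypothetical toy frequencies for five parcels.
perm = {"p1": 0.02, "p2": 0.04, "p3": 0.06, "p4": 0.08, "p5": 0.10}
primary = {"p1": 0.90, "p2": 0.10, "p3": 0.50, "p4": 0.05, "p5": 0.00}
kept = important_parcels(primary, perm)
```

In this toy case the threshold is about 0.2 (median 0.06 plus 3.5 × IQR 0.04), so only the parcels selected in 90% and 50% of primary models are retained.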

Outcomes

There are three primary outcomes of this protocol. First, we developed a quality assurance pipeline for T1w/T2w ratio images that includes identifying the outlier regions heavily affected by susceptibility artifacts. Second, we developed a nested cross-validation pipeline that runs the feature selection and linear discrimination to distinguish participant classes (UD vs. HC). Third, we verified the selected regions and classification accuracy using a permutation approach. The raw data are publicly available (https://openneuro.org/datasets/ds003653/versions/1.0.0) [7]. The derivative data as well as the scripts used in this protocol are available (https://github.com/manelis-lab/myelin-paper-NICL2021) [27].

Subject Area:	Neuroscience
More specific subject area:	Neuroimaging and Psychiatry
Protocol name:	Cortical myelin mapping and machine learning-based prediction
Reagents/tools:	Scanning was performed using a 3T Siemens Prisma scanner with a 64-channel receiver head coil.
Experimental design:	Magnetic resonance imaging (MRI) data were collected following the Human Connectome Project (HCP) 3T imaging protocol. High-resolution T1w images were collected using the MPRAGE sequence with TR=2400ms, resolution=0.8 × 0.8 × 0.8mm, 208 slices, FOV=256, TE=2.22ms, flip angle=8°. High-resolution T2w images were collected using TR=3200ms, resolution=0.8 × 0.8 × 0.8mm, 208 slices, FOV=256, TE=563ms. Field maps were collected in the AP and PA directions using the spin echo sequence (TR=8000ms, resolution=2 × 2 × 2mm, FOV=210, TE=66ms, flip angle=90°, 72 slices).
Trial registration:
Ethics:	The study was approved by the University of Pittsburgh Institutional Review Board and participants gave written informed consent (protocol number STUDY20060265)
Value of the Protocol:	Acquisition, preprocessing, and quality assurance of T1w/T2w cortical myelin maps. Development of a machine learning analysis pipeline to predict depressive disorders with performance evaluation via permutation.

18 in total

1. Optimized EPI for fMRI studies of the orbitofrontal cortex.

Authors: R Deichmann; J A Gottfried; C Hutton; R Turner
Journal: Neuroimage Date: 2003-06 Impact factor: 6.556

2. Mapping human cortical areas in vivo based on myelin content as revealed by T1- and T2-weighted MRI.

Authors: Matthew F Glasser; David C Van Essen
Journal: J Neurosci Date: 2011-08-10 Impact factor: 6.167

3. The first step for neuroimaging data analysis: DICOM to NIfTI conversion.

Authors: Xiangrui Li; Paul S Morgan; John Ashburner; Jolinda Smith; Christopher Rorden
Journal: J Neurosci Methods Date: 2016-03-02 Impact factor: 2.390

4. A rating scale for mania: reliability, validity and sensitivity.

Authors: R C Young; J T Biggs; V E Ziegler; D A Meyer
Journal: Br J Psychiatry Date: 1978-11 Impact factor: 9.319

5. Aberrant levels of cortical myelin distinguish individuals with depressive disorders from healthy controls.

Authors: David A A Baranger; Yaroslav O Halchenko; Skye Satz; Rachel Ragozzino; Satish Iyengar; Holly A Swartz; Anna Manelis
Journal: Neuroimage Clin Date: 2021-08-23 Impact factor: 4.881

6. Machine learning in neuroimaging: Progress and challenges.

Authors: Christos Davatzikos
Journal: Neuroimage Date: 2018-10-06 Impact factor: 6.556

7. Bias in error estimation when using cross-validation for model selection.

Authors: Sudhir Varma; Richard Simon
Journal: BMC Bioinformatics Date: 2006-02-23 Impact factor: 3.169

Review 8. Artifacts in magnetic resonance imaging.

Authors: Katarzyna Krupa; Monika Bekiesińska-Figatowska
Journal: Pol J Radiol Date: 2015-02-23

9. MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites.

Authors: Oscar Esteban; Daniel Birman; Marie Schaer; Oluwasanmi O Koyejo; Russell A Poldrack; Krzysztof J Gorgolewski
Journal: PLoS One Date: 2017-09-25 Impact factor: 3.240

10. The minimal preprocessing pipelines for the Human Connectome Project.

Authors: Matthew F Glasser; Stamatios N Sotiropoulos; J Anthony Wilson; Timothy S Coalson; Bruce Fischl; Jesper L Andersson; Junqian Xu; Saad Jbabdi; Matthew Webster; Jonathan R Polimeni; David C Van Essen; Mark Jenkinson
Journal: Neuroimage Date: 2013-05-11 Impact factor: 6.556
