Mariana Bento, Irene Fantini, Justin Park, Leticia Rittner, Richard Frayne.
Abstract
Large, multi-site, heterogeneous brain imaging datasets are increasingly required for the training, validation, and testing of advanced deep learning (DL)-based automated tools, including structural magnetic resonance (MR) image-based diagnostic and treatment monitoring approaches. When assembling a number of smaller datasets to form a larger dataset, understanding the underlying variability between different acquisition and processing protocols across the aggregated dataset (termed "batch effects") is critical. The presence of variation in the training dataset is important as it more closely reflects the true underlying data distribution and, thus, may enhance the overall generalizability of the tool. However, the impact of batch effects must be carefully evaluated to avoid undesirable effects that, for example, may reduce performance measures. Batch effects can result from many sources, including differences in acquisition equipment, imaging technique and parameters, as well as applied processing methodologies. Their impact, both beneficial and adverse, must be considered when developing tools to ensure that their outputs are related to the proposed clinical or research question (i.e., actual disease-related or pathological changes) and are not simply due to the peculiarities of underlying batch effects in the aggregated dataset. We reviewed applications of DL in structural brain MR imaging that aggregated images from neuroimaging datasets, typically acquired at multiple sites. We examined datasets containing both healthy control participants and patients that were acquired using varying acquisition protocols. First, we discussed issues around Data Access and enumerated the key characteristics of some commonly used publicly available brain datasets. Then we reviewed methods for correcting batch effects by exploring the two main classes of approaches: Data Harmonization, which uses data standardization, quality control protocols, or other similar algorithms and procedures to explicitly understand and minimize unwanted batch effects; and Domain Adaptation, which develops DL tools that handle batch effects implicitly while still achieving reliable and robust results. In this narrative review, we highlighted the advantages and disadvantages of both classes of DL approaches, and described key challenges to be addressed in future studies.
Keywords: MR brain imaging; batch effects; data aggregation; deep learning; domain adaptation; machine learning; multi-site datasets
Year: 2022 PMID: 35126080 PMCID: PMC8811356 DOI: 10.3389/fninf.2021.805669
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 4.081
Figure 1. Graphical paper outline. Data Access (left): description of heterogeneous datasets arising from variations in acquisition, curation, and annotation; challenges with access to publicly available data; and issues relating to ethics and privacy. Data Harmonization (middle): model development from standardized image sets generated by pre-processing and outlier detection algorithms. Domain Adaptation (right): use of advanced models that improve generalization and reliability through techniques such as domain transfer, multi-task learning, and adversarial networks. Data Harmonization and Domain Adaptation are further organized by proposed task: segmentation and classification models.
Figure 2. Example of fluid-attenuated inversion recovery (FLAIR) images in a multi-site dataset: (A) a healthy control participant imaged on a scanner from one vendor; (B) a patient with a pathology (white matter hyperintensities, WMH, Wardlaw et al., 2013) imaged on a scanner from the same vendor; (C) a second patient with WMH pathology imaged on a scanner from a different vendor. Images acquired on scanners from different vendors may differ in contrast, shape, and other characteristics. These differences may affect the development and performance of DL models by introducing unwanted correlations in the dataset that are unrelated to the occurrence of pathology.
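One simple way to gauge how large such a vendor-related shift is would be to compare a per-scan intensity summary between two groups of healthy controls. The sketch below is illustrative only: the function name `standardized_site_difference`, the choice of Cohen's d as the measure, and the intensity values are all assumptions, not something taken from the reviewed studies.

```python
from math import sqrt
from statistics import mean, variance


def standardized_site_difference(site_a, site_b):
    """Cohen's d between two sites' per-scan summary values (pooled SD)."""
    n_a, n_b = len(site_a), len(site_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(site_a) + (n_b - 1) * variance(site_b)
    ) / (n_a + n_b - 2)
    return (mean(site_a) - mean(site_b)) / sqrt(pooled_var)


# Hypothetical mean white-matter intensities of healthy controls,
# one value per scan, grouped by scanner vendor (made-up numbers).
vendor_1 = [100.0, 102.0, 98.0, 101.0, 99.0]
vendor_2 = [120.0, 118.0, 122.0, 121.0, 119.0]

d = standardized_site_difference(vendor_1, vendor_2)
print(f"standardized vendor difference: {d:.2f}")
```

A large absolute value among participants who should be comparable suggests that a model could learn the vendor rather than the pathology.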
Summary of some publicly available MR brain imaging datasets.
| Dataset | Participants (n) | Acquisition sequences | Longitudinal data | Population | Manual labels |
|---|---|---|---|---|---|
| ABIDE (Di Martino et al.) | 1112 | T1-w | NO | HC, Autism | YES |
| ADNI (Wyman et al.) | 2542 | T1-w, T2-w, FLAIR, DTI | YES | HC, MCI, AD | NO |
| Calgary-Campinas-359 (CC-359) (Souza et al.) | 359 | T1-w | NO | HC | NO |
| Cambridge Center for Aging Neuroscience (Taylor et al.) | 653 | T1-w, T2-w | NO | HC | YES |
| Connectome Coordination Facility (CCF) - HCP Young Adult | 1113 | T1-w | NO | HC | YES |
| Dallas Lifespan Brain Study (Bischof and Park) | 315 | T1-w | NO | HC | YES |
| Information eXtraction from Images (IXI) | 600 | T1-w, T2-w, PD, DWI | NO | HC | NO |
| MICCAI 2015 BRATS (Menze et al.) | 262 | T1-w, T2-w, FLAIR | NO | GBM | YES |
| MICCAI 2019 WMH Segmentation (Kuijf et al.) | 60 | T1-w, FLAIR | NO | SVD | YES |
| MIRIAD (Malone et al.) | 69 | T1-w | YES | HC, AD | YES |
| MR-MS (Lesjak et al.) | 36 | T1-w, T2-w, FLAIR | YES | MS | YES |
| Neuroimaging Tools and Resources Collaboratory (NITRC) | 65 | T1-w, T2-w, PD | NO | HC | YES |
| OASIS (Fillmore et al.) | 1664 | T1-w | NO | AD | NO |
| PPMI (Simuni et al.) | 1460 | T1-w | YES | HC, Prodromal Parkinson's | YES |
| Southwest University Adult Lifespan Dataset (Wei et al.) | 494 | T1-w | NO | HC | YES |
| TCIA | 955 | T1-w | NO | GBM | YES |
The list includes relevant information: the number of participants, the acquisition sequences, whether longitudinal data are available, the sampled population, and whether manual labels are available. AD, Alzheimer's disease; DWI, diffusion-weighted imaging; DTI, diffusion tensor imaging; FLAIR, fluid-attenuated inversion recovery; GBM, glioblastoma; HC, healthy control; MCI, mild cognitive impairment; MS, multiple sclerosis; PD, proton density; SVD, small vessel disease; T1-w, T1-weighted; T2-w, T2-weighted.
Figure 3. Data Harmonization (section 3) and Domain Adaptation (section 4) are related concepts that address issues related to batch effect correction. Data Harmonization attempts to standardize images, explicitly minimizing the batch effects. Conversely, Domain Adaptation implicitly corrects batch effects while achieving the modeling objective. A hybrid approach that first includes elements of data harmonization and is followed by a domain adaptation approach (solid horizontal arrow) is possible and may lead to improved results.
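As a rough illustration of the explicit route, the sketch below standardizes a per-scan intensity summary separately within each site, so that every site ends up with zero mean and unit variance. Per-site z-scoring is used here only as a simple stand-in for harmonization; it is not a method attributed to any of the reviewed papers, and the function name `zscore_per_site` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_per_site(values, sites):
    """Standardize each value using the mean and SD of its own site."""
    grouped = defaultdict(list)
    for value, site in zip(values, sites):
        grouped[site].append(value)

    # Per-site mean and population standard deviation.
    stats = {site: (mean(vs), pstdev(vs)) for site, vs in grouped.items()}

    harmonized = []
    for value, site in zip(values, sites):
        mu, sigma = stats[site]
        # A site with no spread carries no usable scale; map it to 0.0.
        harmonized.append((value - mu) / sigma if sigma > 0 else 0.0)
    return harmonized


# Hypothetical per-scan intensity summaries from two sites (made-up numbers).
values = [100.0, 102.0, 98.0, 120.0, 118.0, 122.0]
sites = ["A", "A", "A", "B", "B", "B"]

harmonized = zscore_per_site(values, sites)
print([round(h, 3) for h in harmonized])
```

Note the limitation: if the sites differ in their participant mix (for example, one site scanned only patients), this kind of standardization removes real disease-related differences together with the scanner-related ones.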
Summary of reviewed papers organized by batch effect correction approach (denoted by shading): explicitly using data harmonization vs. implicitly using domain adaptation; and principal task (segmentation vs. classification).
| Reference | Approach | Task | Population | Main aim |
|---|---|---|---|---|
| Smith et al. | Explicit | N/A | SVD | Propose a harmonizing framework for neuroimaging studies |
| Jovicich et al. | Explicit | N/A | Phantom | Improve reproducibility in morphometric studies |
| Fortin et al. | Explicit | N/A | HC | Investigate scanner effects on cortical thickness measurements |
| Li et al. | Explicit | N/A | HC | Propose a denoising method to reduce scanner-related variance |
| Zeng et al. | Explicit | N/A | Synthetic data and HC | Perform super-resolution reconstructions |
| Gauriau et al. | Explicit | Segmentation | Various pathologies | Study abnormalities in MR images related to the presence of brain pathology |
| Despotovic et al. | Explicit | Segmentation | HC | Compare different brain segmentation strategies with a focus on pre-processing |
| Liao et al. | Explicit | Segmentation | Phantoms and HC | Study the impact of intensity inhomogeneities or bias fields |
| Ahmed et al. | Explicit | Segmentation | HC | Estimate and compensate for intensity inhomogeneities |
| Shinohara et al. | Explicit | Segmentation | MS | Propose several automated lesion-detection and whole-brain analysis methods using protocol harmonization |
| Sajja et al. | Explicit | Segmentation | MS | Minimize false-positive lesion classification in brain segmentation |
| Dewey et al. | Explicit | Segmentation | HC | Propose a data harmonization-segmentation method based on image contrast (DeepHarmony) |
| Swati et al. | Explicit | Classification | GBM and HC | Perform brain tumor classification using transfer learning, taking unbalanced training data into account |
| Van Leemput et al. | Explicit | Classification | MS | Propose a fully automated bias field correction prior to tissue classification |
| Tohka et al. | Explicit | Classification | AD, MCI and HC | Compare different feature selection approaches in dementia classification |
| Sundaresan et al. | Implicit | Segmentation | AD, MS, Stroke and SVD | Develop domain adaptation strategies to segment brain lesions |
| Zeng et al. | Implicit | Segmentation | | Compare lesion segmentation methods with a view to improving translation to the clinical environment |
| Orbes-Arteaga et al. | Implicit | Segmentation | SVD | Propose a domain adaptation strategy using data augmentation and adversarial networks |
| Ghafoorian et al. | Implicit | Segmentation | SVD | Study how to properly apply domain adaptation: the amount of data required from the new domain and the portion of the model to be retrained |
| Akkus et al. | Implicit | Segmentation | | Survey brain MR imaging segmentation methods according to their use of data augmentation and transfer learning |
| Karani et al. | Implicit | Segmentation | HC | Propose a brain structure segmentation method using a single CNN with shared convolutional filters and domain-specific batch normalization layers |
| Ackaouy et al. | Implicit | Segmentation | GBM and HC | Describe an unsupervised domain-shift approach for brain abnormality segmentation |
| Kondrateva et al. | Implicit | Classification | | Analyze and compare different approaches to the domain-shift problem, with a focus on advanced data processing, auto-encoding neural networks and their domain-invariant variations, model architecture enhancement, and feature training |
| Hofer et al. | Implicit | Classification | AD and HC | Propose an approach that performs adaptation directly in feature space to reduce the impact of domain shift |
| Islam and Zhang | Implicit | Classification | AD and HC | Identify different stages of AD and obtain superior performance in diagnosing early-stage disease using an ensemble of deep CNNs |
| Jain et al. | Implicit | Classification | AD and HC | Propose an AD classification approach based on transfer learning that minimizes the pre-processing steps |
| Zhang et al. | Implicit | Classification | AD and HC | Perform brain disease identification using an unsupervised conditional consensus adversarial network |
The study participant population and main aim are provided. AD, Alzheimer's disease; GBM, glioblastoma; HC, healthy control; MCI, mild cognitive impairment; MS, multiple sclerosis; N/A, Not-applicable; SVD, small vessel disease.