Garam Lee, Byungkon Kang, Kwangsik Nho, Kyung-Ah Sohn, Dokyoon Kim.
Abstract
As large amounts of heterogeneous biomedical data become available, numerous methods have been developed for integrating such datasets to extract complementary knowledge from multiple sources. Recently, deep learning approaches have shown promising results in a variety of research areas. However, applying deep learning requires expertise in constructing a deep architecture that can take multimodal longitudinal data as input. In this paper, we therefore develop a deep learning-based Python package for data integration. The package, MildInt (deep learning-based multimodal longitudinal data integration framework), provides a preconstructed deep learning architecture for classification tasks. MildInt has two learning phases: learning a feature representation from each data modality, and training a classifier for the final decision. Adopting a deep architecture in the first phase yields a more task-relevant feature representation than a linear model would. In the second phase, a logistic regression classifier, which is a linear model, is used to detect and investigate biomarkers from the multimodal data. By combining the linear model and the deep learning model, higher accuracy and better interpretability can be achieved. We validated the performance of the package using simulated and real data. For the real data, as a pilot study, we used clinical and multimodal neuroimaging datasets in Alzheimer's disease to predict disease progression. MildInt can integrate multiple forms of numerical data, including time series and non-time series data, to extract complementary features from a multimodal dataset.
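To illustrate why a linear second phase helps interpretability: once the phase-1 representations are concatenated, a logistic regression model (the LR classifier of Figure 1) assigns one weight per feature, and ranking features by absolute weight points to the inputs that drive the prediction. The NumPy sketch below is a minimal stand-in under stated assumptions, not MildInt's code: the synthetic data, the hypothetical feature names, plain batch gradient descent, and the L2 strength are all illustrative choices.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in np.exp for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def fit_logistic_regression(X, y, lr=0.1, epochs=2000, l2=1e-3):
    """Batch gradient descent on mean binary cross-entropy plus an L2 penalty."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                   # predicted P(y = 1)
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Hypothetical concatenated representations, already on a common scale:
# only the first feature carries signal, the others are noise.
feature_names = ["cognitive_h0", "demographic_h0", "csf_h0", "mri_h0"]
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(int)

w, b = fit_logistic_regression(X, y)
acc = np.mean(predict(X, w, b) == y)
ranking = np.argsort(-np.abs(w))                 # feature indices, largest |weight| first
print(f"training accuracy: {acc:.2f}")
for i in ranking:
    print(f"{feature_names[i]:>15}: {w[i]:+.3f}")
```

Comparing weights this way is only meaningful when the features share a scale (true here by construction); in practice one would standardize the concatenated representation first.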
Keywords: Alzheimer’s disease; data integration; gated recurrent unit; multimodal deep learning; python package
Year: 2019 PMID: 31316553 PMCID: PMC6611503 DOI: 10.3389/fgene.2019.00617
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Longitudinal total intracranial volume, hippocampal volume, and entorhinal cortex thickness from brain imaging data, as well as genomic data, cognitive assessments, and any other form of numerical data, can be taken as input by our framework. In phase 1 (blue dashed rectangle), each data modality is processed separately to learn a feature representation. Both time series and non-time series data are accepted and converted into fixed-size feature vectors by a gated recurrent unit (GRU) component (green dashed rectangle). The learned representations (red, green, and yellow rectangles) are then concatenated to form the input to the logistic regression (LR) classifier in phase 2 (red dashed rectangle).
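Phase 1 in Figure 1 can be sketched as one GRU encoder per modality whose final hidden state serves as that modality's fixed-size representation; a non-time series modality is simply a sequence of length 1. This is a minimal NumPy forward pass under stated assumptions: random weights stand in for trained ones, the hypothetical `GRUEncoder` class is not MildInt's API, and using the last hidden state as the representation is an assumption. The input and hidden sizes are taken from the first three rows of the summary table of hyperparameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GRUEncoder:
    """Minimal GRU forward pass; random weights stand in for trained ones."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        # One (W, U, b) triple per gate: update (z), reset (r), candidate (h).
        self.params = {
            g: (rng.uniform(-s, s, (hidden_dim, input_dim)),
                rng.uniform(-s, s, (hidden_dim, hidden_dim)),
                np.zeros(hidden_dim))
            for g in ("z", "r", "h")
        }
        self.hidden_dim = hidden_dim

    def encode(self, seq):
        """seq has shape (T, input_dim) with T >= 1; returns the last hidden state."""
        Wz, Uz, bz = self.params["z"]
        Wr, Ur, br = self.params["r"]
        Wh, Uh, bh = self.params["h"]
        h = np.zeros(self.hidden_dim)
        for x in np.atleast_2d(seq):                 # one time step per row
            z = sigmoid(Wz @ x + Uz @ h + bz)        # update gate
            r = sigmoid(Wr @ x + Ur @ h + br)        # reset gate
            h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)
            h = (1 - z) * h + z * h_tilde
        return h

encoders = {
    "cognitive": GRUEncoder(input_dim=2, hidden_dim=3, seed=1),
    "demographic": GRUEncoder(input_dim=4, hidden_dim=5, seed=2),
    "csf": GRUEncoder(input_dim=5, hidden_dim=6, seed=3),
}
rng = np.random.default_rng(42)
subject = {
    "cognitive": rng.normal(size=(4, 2)),    # 4 visits
    "demographic": rng.normal(size=(1, 4)),  # non-time series: length-1 sequence
    "csf": rng.normal(size=(2, 5)),          # 2 visits
}
# Concatenate per-modality representations into the phase-2 classifier input.
representation = np.concatenate([encoders[m].encode(subject[m]) for m in encoders])
print(representation.shape)  # (14,)
```

Because every subject's sequence is folded into a vector of length `hidden_dim`, subjects with different numbers of visits still produce inputs of the same size for the classifier.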
Figure 2. Classification performance on the test set for MildInt, SVM, random forest, and logistic regression using a single data modality (A) and multiple data modalities (B).
Figure 3. Classification performance using time series data with a single modality (A) and multiple modalities (B).
Summary statistics of the data and hyperparameters used in the experiment with real data.
| Modality | #Features | Hidden dimension | Time length (mean) | Time length (SD) |
|---|---|---|---|---|
| Cognitive performance | 2 | 3 | 4.05 | 1.71 |
| Demographic information | 4 | 5 | 1 | 0 |
| CSF | 5 | 6 | 1.69 | 0.95 |
| MRI | 3 | 4 | 1 | 0 |
CSF, cerebrospinal fluid; MRI, magnetic resonance imaging.
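The hidden dimensions in the table determine the size of the phase-2 classifier input. Assuming each modality contributes its GRU's final hidden state (an assumption, not stated in the table), the concatenated vector has 3 + 5 + 6 + 4 = 18 features. The short Python sketch below does that arithmetic and also counts GRU parameters, assuming a standard GRU with three gates and one bias vector per gate; the helper name `gru_param_count` is hypothetical.

```python
# (input features, GRU hidden size) per modality, copied from the table above.
modalities = {
    "Cognitive performance": (2, 3),
    "Demographic information": (4, 5),
    "CSF": (5, 6),
    "MRI": (3, 4),
}

def gru_param_count(d, h):
    """Standard GRU: 3 gates, each with W (h x d), U (h x h), and one bias (h)."""
    return 3 * (h * d + h * h + h)

lr_input_dim = sum(h for _, h in modalities.values())
lr_params = lr_input_dim + 1  # one weight per concatenated feature plus a bias
gru_params = {m: gru_param_count(d, h) for m, (d, h) in modalities.items()}

print(lr_input_dim)              # 18
print(gru_params)
print(sum(gru_params.values()))  # 516
```

If one instead counts two bias vectors per gate, as PyTorch's `torch.nn.GRU` does, each modality gains 3h parameters and the total becomes 570. Either way, the encoders stay very small, which suits the modest feature counts per modality.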
Figure 4. Predictive performance using multiple data modalities and a single data modality.
A list of previous models that train classifiers mainly with mild cognitive impairment (MCI) samples.
| Method | Subjects (MCI-C/MCI-NC) | Data source | ACC | SEN | SPE |
|---|---|---|---|---|---|
| SVM (…) | 43/48 | MRI, PET, CSF | 0.73 | 0.68 | 0.73 |
| SVM (Cheng et al., 2012) | 43/56 | MRI, FDG-PET, CSF | 0.79 | 0.84 | 0.72 |
| SVM (…) | 35/50 | MRI, PET, cognitive score | 0.78 | 0.79 | 0.78 |
| Gaussian process (…) | 47/96 | MRI, PET, CSF, APOE genotype | 0.68 | | 0.52 |
| Hierarchical ensemble (…) | 70/61 | MRI | 0.79 | 0.86 | 0.78 |
| Deep neural network (…) | 235/409 | MRI, PET | | 0.79 | |
| MildInt | 163/376 | Cognitive score, neuroimaging data, CSF biomarker, demographic data | 0.79 | 0.83 | 0.77 |
MCI-C, MCI-Converter; MCI-NC, MCI-NonConverter; ACC, Accuracy; SEN, Sensitivity; SPE, Specificity; APOE, Apolipoprotein E; FDG, Fluorodeoxyglucose.
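The ACC, SEN, and SPE columns are linked: accuracy is the class-size-weighted average of sensitivity and specificity, with MCI-C as the positive class. A short Python sketch of the three metrics follows. The function names are hypothetical, and the consistency check on the MildInt row assumes the evaluated samples keep the 163/376 converter/non-converter ratio of the cohort, which the table does not state.

```python
def classification_metrics(tp, fn, tn, fp):
    """ACC, SEN, SPE from a binary confusion matrix (positive class = MCI-C)."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sen = tp / (tp + fn)  # fraction of converters correctly identified
    spe = tn / (tn + fp)  # fraction of non-converters correctly identified
    return acc, sen, spe

def accuracy_from_sen_spe(sen, spe, n_pos, n_neg):
    """Accuracy as the class-size-weighted average of sensitivity and specificity."""
    return (sen * n_pos + spe * n_neg) / (n_pos + n_neg)

# Hypothetical confusion matrix: 10 converters, 20 non-converters.
acc, sen, spe = classification_metrics(tp=8, fn=2, tn=15, fp=5)
print(acc, sen, spe)

# MildInt row: SEN 0.83, SPE 0.77 with 163/376 subjects.
print(round(accuracy_from_sen_spe(0.83, 0.77, 163, 376), 2))  # 0.79
```

Under that assumption the reported MildInt accuracy of 0.79 follows from its sensitivity and specificity, which is a quick way to sanity-check rows in tables like this one.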