Satish E Viswanath, Pallavi Tiwari, George Lee, Anant Madabhushi.
Abstract
BACKGROUND: With a wide array of multi-modal, multi-protocol, and multi-scale biomedical data being routinely acquired for disease characterization, there is a pressing need for quantitative tools to combine these varied channels of information. The goal of such integrated predictors is to combine these sources of information while improving on the predictive ability of any individual modality. A number of application-specific data fusion methods have previously been proposed in the literature, which attempt to reconcile the differences in dimensionalities and length scales across modalities. Our objective in this paper was to help identify methodological choices that need to be made in order to build a data fusion technique, as it is not always clear which strategy is optimal for a particular problem. As a comprehensive review of all possible data fusion methods was outside the scope of this paper, we have focused on fusion approaches that employ dimensionality reduction (DR).
Keywords: Data fusion; Dimensionality reduction; Imaging; Kernels; Non-imaging
Year: 2017 PMID: 28056889 PMCID: PMC5217665 DOI: 10.1186/s12880-016-0172-6
Source DB: PubMed Journal: BMC Med Imaging ISSN: 1471-2342 Impact factor: 1.930
Fig. 1 Illustration of data acquired at different length scales from imaging (radiology, pathology) and non-imaging (MR spectroscopy, protein expression) data, which could be combined to create fused predictors of disease aggressiveness and treatment outcome. In this illustration we use the example of prostate to illustrate the types of data that might be acquired before and after radical prostatectomy. In vivo information acquired prior to prostatectomy includes MR imaging and spectroscopy, while the surgical specimen yields digitized histological sections as well as undergoing genomic profiling via mass spectrometry. The middle column of the illustration depicts different knowledge representation methods (e.g. dimensionality reduction, co-association matrices) for uniformly representing multi-modal data. Once represented in a common space, these features can be combined to create a predictive model. An application of this predictive model could include survival curve analysis (far right column, obtained by combining histologic and proteomic features) for identification of prostate cancer patients who will later suffer from biochemical recurrence within 5 years (red) from those who will not (blue).
Brief review of multi-modal data fusion methods from the literature and methodologies that have been used
| Reference | Data | Method |
|---|---|---|
| Moutselos et al. | Skin images; Gene expression | Combining features into a confusion matrix |
| Golugula et al. | Histopathology; Proteomics | Correlating features via CCA, combining CCA-based confusion matrices |
| Dai et al. | sMRI; fMRI | Construct classifiers from features, weighted combination of classifier decisions |
| Gode et al. | mRNA; miRNA | Compute LDR/classifier decisions, unweighted combination of LDR- or classifier-based confusion matrices |
| Raza et al. | Gene-expression; FNAC | Compute classifier decisions, unweighted combination of classifier decisions |
| Sui et al. | DTI; fMRI | Correlate features via CCA, unweighted combination of CCA-based confusion matrices |
| Wolz et al. | T1-w MRI; ApoE genotype, A | Compute LDR, weighted combination of LDR-based confusion matrices |
| Wang et al. | T1-w MRI, FDG-PET; Gene-expression | Feature selection, weighted concatenation of selected features |
| Lanckriet et al. | Protein expression; Gene-expression | Compute kernel representations, weighted combination of kernels |
| Yu et al. | Text ontologies; Gene-expression | Compute kernel representations, fuse kernel-based confusion matrices |
| Higgs et al. | CT; Gene-expression | Compute LDR, fuse LDR maintaining manifold structure |
| Lee et al. | Gene-expression; Histopathology | Compute LDR, unweighted concatenation of LDR |
| Viswanath et al. | T2-w; ADC, DCE | Compute LDR, combine LDR-based confusion matrices using label information |
| Tiwari et al. | T2-w MRI; MRS | Compute kernel representations, weighted LDR-based combination of kernels using label information |
CCA: Canonical Correlation Analysis; LDR: Low-Dimensional Representation. See the "Description of methods utilized for multi-modal data fusion" section for more details.
Fig. 2 Generalized overview of steps followed for DR-based multimodal data fusion. Knowledge representation refers to transforming each modality individually into a space where modality-specific scale and dimensionality differences are removed. Resampling allows for generation of multiple representations from each data modality to try and maximize the information extracted from it. Knowledge fusion then combines different representations into a single integrated result to build a fused predictor. Weighting enables building of a fused result where the data modalities are differentially considered depending on how well they individually characterize the data. The final fused result is expected to leverage the complementary information from different modalities as best as possible.
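The representation, weighting, and fusion steps in this overview can be illustrated with a minimal sketch. This is not the authors' implementation: per-feature z-scoring stands in for a DR-based representation, the resampling step is omitted, fusion is a simple weighted concatenation, and the function names, toy data, and weights are all hypothetical.

```python
from statistics import mean, pstdev

def standardize(modality):
    """Z-score each feature column so modalities share a common scale.
    Assumption: this is a stand-in for the DR-based representation step."""
    columns = list(zip(*modality))
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]  # guard constant columns
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in modality]

def fuse(modalities, weights):
    """Weighted concatenation of the per-modality representations."""
    represented = [standardize(m) for m in modalities]
    total = sum(weights)
    fused = []
    for i in range(len(represented[0])):
        row = []
        for rep, w in zip(represented, weights):
            row.extend((w / total) * v for v in rep[i])
        fused.append(row)
    return fused

# Hypothetical toy data: 4 samples, 2 imaging features, 1 non-imaging feature.
imaging = [[100.0, 0.2], [110.0, 0.4], [90.0, 0.1], [120.0, 0.5]]
non_imaging = [[1.0], [3.0], [2.0], [4.0]]

# Hypothetical weights favouring the imaging modality.
fused = fuse([imaging, non_imaging], weights=[0.75, 0.25])
print(len(fused), len(fused[0]))
```

In practice the weights would be derived from how well each modality individually characterizes the data, as the caption describes; here they are fixed by hand purely for illustration.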
Summary of the three clinical problems and data cohorts utilized to evaluate the GFA
| Dataset | # Studies | Modalities | Clinical problem addressed |
|---|---|---|---|
| S1 | 77 | T1-w MRI, protein expression | Differentiating Alzheimer's patients from normal subjects |
| S2 | 40 | Histology, protein expression profiles | Predicting biochemical recurrence in prostate cancer |
| S3 | 36 (3000 voxels) | T2-w MRI, MR spectroscopy | Detecting prostate cancer on a per-voxel basis |
Summary of different DR-based multimodal data fusion methods considered in this work
| Strategy | Resampling | Representation | Weighting | Fusion |
|---|---|---|---|---|
| DFS-DD | - | Decision | Unweighted | Direct fusion (AND operation) |
| DFS-EC | Feature perturbation | PCA | Unweighted | Co-association matrix fusion |
| DFS-KC | - | Kernels | Weighted, semi-supervised | Co-association matrix fusion |
| DFS-ES | - | LLE | Unweighted | Structural fusion |
DFS: Data Fusion Strategy; DD: Decision representation, Direct fusion; EC: Embedding representation, Co-Association fusion; KC: Kernel representation, Co-Association fusion; ES: Embedding representation, Structural fusion.
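Two of the fusion operations named in the table, direct fusion via an AND operation and co-association matrix fusion, can be sketched as follows. This is an illustrative sketch only: it assumes binary per-modality decisions for the AND case and uses one common definition of a co-association matrix (entry (i, j) is 1 when samples i and j share a cluster label), which may differ from the exact construction used in the paper. All names and toy values are hypothetical.

```python
def direct_fusion_and(decisions_a, decisions_b):
    """Fused decision is positive only when both modalities agree on positive."""
    return [int(a and b) for a, b in zip(decisions_a, decisions_b)]

def co_association(labels):
    """Pairwise matrix: 1.0 if two samples share a label, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]

def fuse_co_association(labelings, weights=None):
    """Weighted average of co-association matrices (unweighted by default)."""
    if weights is None:
        weights = [1.0] * len(labelings)
    total = sum(weights)
    n = len(labelings[0])
    fused = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        matrix = co_association(labels)
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * matrix[i][j] / total
    return fused

# Hypothetical per-sample decisions from two modalities.
imaging_decisions = [1, 1, 0, 0]
non_imaging_decisions = [1, 0, 0, 1]
and_fused = direct_fusion_and(imaging_decisions, non_imaging_decisions)
print(and_fused)  # [1, 0, 0, 0]

# Hypothetical cluster labels obtained separately from each modality.
imaging_clusters = [0, 0, 1, 1]
non_imaging_clusters = [0, 1, 1, 1]
fused_matrix = fuse_co_association([imaging_clusters, non_imaging_clusters])
```

Passing non-uniform `weights` gives a weighted variant of the same averaging, loosely in the spirit of the weighted row of the table, although how those weights are learned is not covered by this sketch.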
Mean and standard deviation in AUC values (obtained via three-fold cross validation) for datasets S1, S2, and S3, while utilizing different DR-based multimodal data fusion methods (see Table 3 for details)
| Strategy | S1 | S2 | S3 |
|---|---|---|---|
| Non-imaging | 0.774 ± 0.043 | 0.511 ± 0.078 | 0.771 ± 0.009 |
| Imaging | 0.885 ± 0.034 | 0.503 ± 0.076 | 0.564 ± 0.036 |
| DFS-DD | | 0.496 ± 0.079 | 0.752 ± 0.026 |
| DFS-EC | 0.675 ± 0.065^a | 0.465 ± 0.111 | 0.720 ± 0.020 |
| DFS-KC | 0.888 ± 0.040 | | |
| DFS-ES | 0.789 ± 0.035 | 0.531 ± 0.086 | 0.748 ± 0.013 |
For baseline performance comparison, AUC values for the individual data modalities are also reported
^a indicates that the result was statistically significantly worse than comparative strategies
^b indicates that the result was statistically significantly better than comparative strategies
The best performing data fusion strategy for each classification task is highlighted in bold
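The "mean ± standard deviation" AUC summary reported above can be computed along the following lines. This is a sketch under stated assumptions: each sample already has a held-out classifier score (produced by a model trained on the other folds), folds are formed by simple interleaving, and the sample standard deviation is used. The paper does not state these details, and the scores and labels below are made up.

```python
from statistics import mean, stdev

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half (the Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_fold_auc_summary(scores, labels, k=3):
    """Split samples into k interleaved folds and summarize per-fold AUC."""
    folds = [range(i, len(scores), k) for i in range(k)]
    aucs = [auc([scores[i] for i in fold], [labels[i] for i in fold])
            for fold in folds]
    return mean(aucs), stdev(aucs)

# Hypothetical held-out scores and ground-truth labels for 12 samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.35, 0.2, 0.1, 0.65, 0.55, 0.15]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

mean_auc, std_auc = per_fold_auc_summary(scores, labels)
print(f"{mean_auc:.3f} ± {std_auc:.3f}")
```

Each fold must contain at least one positive and one negative sample for the AUC to be defined; a stratified split would guarantee this on real data.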
Fig. 3 Sample predictive heatmaps for detection of prostate cancer in vivo through combining MRI and MRS data. (a) shows a T2w MRI section with the MRS grid overlaid in white. The expert annotation of cancer presence is also shown with a red outline around those voxels that were assessed as cancerous. Corresponding automated classification results are shown for using: (b) T2w MRI texture features alone, (c) MRS peak area metabolite ratios, (d) DFS-ES, (e) DFS-EC, (f) DFS-KC. These are visualized in the form of heatmaps, where red corresponds to higher probability of CaP presence. The expert annotation of CaP presence is also superposed via a red outline in each image.
Description of 327 imaging and 146 proteomic features in Dataset S1 for classifying AD patients from normal controls
| T1w MRI | # | Description |
|---|---|---|
| FreeSurfer ROIs extracted | 327 | Subcortical, cortical volumes, surface area, thickness average and standard deviation for Pallidum, Paracentral, Parahippocampal, Opercularis, Pars Orbitalis, Triangularis, Pericalcarine, Cingulate, Frontal, Parietal, Temporal, Caudate, Insula, Occipital etc. |
| Proteomic data | # | Description |
| Plasma proteomics | 146 | Microglobulin, Macroglobulin, Apolipoproteins, Epidermal growth factors, Immunoglobulins, Interleukins, Insulin, Monocyte Chemotactic Proteins, Macrophage Inflammatory Proteins, Matrix Metalloproteinases etc. |
Description of 189 histomorphometric and 650 proteomic features in Dataset S2, used to identify patients who will and who will not suffer CaP recurrence within 5 years
| Morphological | # | Description |
|---|---|---|
| Gland Morphology | 100 | Area Ratio, Distance Ratio, Standard Deviation of Distance, Variance of Distance, Distance Ratio, Perimeter Ratio, Smoothness, Invariant Moment 1–7, Fractal Dimension, Fourier Descriptor 1–10 (Mean, Std. Dev., Median, Min/Max of each) |
| Architectural | # | Description |
| Voronoi Diagram | 12 | Polygon area, perimeter, chord length: mean, std. dev., min/max ratio, disorder |
| Delaunay Triangulation | 8 | Triangle side length, area: mean, std. dev., min/max ratio, disorder |
| Minimum Spanning Tree | 4 | Edge length: mean, std. dev., min/max ratio, disorder |
| Co-occurring Gland Tensors | 39 | Entropy, energy: mean, std. dev., range |
| Gland Subgraphs | 26 | Eccentricity, Clustering coefficient C, D, and E, largest connected component: mean, std. dev. |
| Proteomic | # | Description |
| Proteins Identified | 650 | Protein disulfide-isomerase A6, T-complex protein subunit delta, ADP-ribosylation factor 1/3, Protein disulfide-isomerase, Ras GTPase-activating-like protein IQGAP2, T-complex protein subunit beta, Ras-related protein Rab-5C, ATP-dependent RNA helicase DDX3X/DDX3Y, 40S ribosomal protein S17, Serine/arginine-rich splicing factor 7, Tubulin alpha-1A chain/alpha-3C/D chain/alpha-3E chain, Laminin subunit alpha-4, Collagen alpha-1 (VIII) chain, Tubulin-tyrosine ligase-like protein 12 |
Description of 58 texture and 6 metabolic features in Dataset S3, extracted from 1.5 Tesla T2w MRI and MRS for identifying prostate cancer (CaP) on a per-voxel basis
| Texture features | # | Description |
|---|---|---|
| Kirsch Filters | 4 | X-direction, Y-direction, XY-diagonal, YX-diagonal |
| Sobel Filters | 4 | X-direction, Y-direction, XY-diagonal, YX-diagonal |
| Directional Filters | 5 | X-Gradient, Y-Gradient, Magnitude of Gradient, 2 Diagonal Gradients |
| First order Gray Level | 8 | Mean, Median, Standard deviation, Range for window size = 3×3, 5×5 |
| Haralick features | 13 | Contrast Energy, Contrast Inverse Moment, Contrast Average, Contrast Variance, Contrast Entropy, Intensity Average for window size = 3×3, Intensity Variance, Intensity Entropy, Entropy, Energy, Correlation, Info. Measure of Correlation 1, Info. Measure of Correlation 2 |
| Gabor filters | 24 | Filterbank constructed for different combinations of scale and orientation |
| Metabolic features | # | Description |
| Metabolites Identified | 6 | Area under peaks for choline ( |