Abstract
MVPA-Light is a MATLAB toolbox for multivariate pattern analysis (MVPA). It provides native implementations of a range of classifiers and regression models, using modern optimization algorithms. High-level functions allow for the multivariate analysis of multi-dimensional data, including generalization (e.g., time x time) and searchlight analysis. The toolbox performs cross-validation, hyperparameter tuning, and nested preprocessing. It computes various classification and regression metrics and establishes their statistical significance. It is modular and easily extendable. Furthermore, it offers interfaces for LIBSVM and LIBLINEAR as well as an integration into the FieldTrip neuroimaging toolbox. After introducing MVPA-Light, example analyses of MEG and fMRI datasets and benchmarking results for the classifiers and regression models are presented.
Keywords: MVPA; classification; cross-validation; decoding; machine learning; regression; regularization; toolbox
Year: 2020 PMID: 32581662 PMCID: PMC7287158 DOI: 10.3389/fnins.2020.00289
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
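The abstract lists cross-validation with nested preprocessing. The usual point of nesting is that preprocessing parameters (here, the means and standard deviations used for z-scoring) are estimated on the training folds only and then applied unchanged to the held-out fold, so no information from the test data leaks into training. The following is a minimal pure-Python sketch of that idea, not MVPA-Light's MATLAB API; all function names and the nearest-class-mean classifier are hypothetical stand-ins chosen for brevity.

```python
import random
import statistics


def zscore_fit(X):
    """Estimate per-feature mean and SD from the given (training) samples."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by a zero SD
    return means, sds


def zscore_apply(X, means, sds):
    """Apply previously estimated z-scoring parameters."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def nearest_mean_fit(X, y):
    """Hypothetical stand-in classifier: store one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*rows)]
    return centroids


def nearest_mean_predict(centroids, X):
    """Assign each sample to the class with the closest mean (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda lab: sqdist(x, centroids[lab])) for x in X]


def cross_validate(X, y, k=5, seed=0):
    """k-fold cross-validation with z-scoring nested inside each fold; returns mean accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Preprocessing parameters come from the training folds only ...
        means, sds = zscore_fit(X_train)
        model = nearest_mean_fit(zscore_apply(X_train, means, sds), y_train)
        # ... and are reused, unchanged, on the held-out fold.
        X_test = zscore_apply([X[i] for i in test_idx], means, sds)
        predictions = nearest_mean_predict(model, X_test)
        accuracies.append(statistics.fmean(p == y[i] for p, i in zip(predictions, test_idx)))
    return statistics.fmean(accuracies)
```

Fitting the scaler on the full dataset before splitting would be simpler, but it lets test-fold statistics influence training, which can inflate cross-validated scores.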
Figure 1. Structure of MVPA-Light.
Figure 2. Overview of the available classifiers. Dots represent samples; color indicates the class. LDA: different classes are assumed to have the same covariance matrix, indicated by the ellipsoids. Gaussian Naive Bayes: features are assumed to be conditionally independent given the class, yielding diagonal covariance matrices. Logistic regression: a sigmoid function (curved plane) is fit to directly model class probabilities. SVM: a hyperplane (solid line) is fit such that the margin (distance from the hyperplane to the closest sample; indicated by dotted lines) is maximized. Ensemble: multiple classifiers are trained on subsets of the data. In this example, their hyperplanes partition the data into regions belonging to classes 1 and 2. After all classifiers are applied to a new data point and their “votes” are collected, the class receiving the most votes is selected. Kernel methods: in this example, the optimal decision boundary is circular (shown as a circle), hence the data are not linearly separable. After projection into a high-dimensional feature space using a map ϕ, the data become linearly separable (solid line), and a linear classifier such as SVM or LDA can be successfully applied in this space.
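The kernel example in the caption can be made concrete with the feature map ϕ(x₁, x₂) = (x₁, x₂, x₁² + x₂²), which appends the squared distance from the origin. Points inside and outside a circle then differ in the third coordinate, so a plane separates them. The sketch below is a hypothetical illustration in plain Python: it uses a nearest-class-mean rule (a simple linear classifier standing in for SVM or LDA) and synthetic points on concentric rings. In the original 2-D space both class means lie at the origin, so the rule has no usable weight vector; in the mapped space it classifies every point correctly.

```python
import math


def phi(x1, x2):
    """Feature map for this example: append the squared radius as a third coordinate."""
    return (x1, x2, x1 * x1 + x2 * x2)


def ring(radius, n):
    """n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def fit_nearest_mean(points0, points1):
    """Linear classifier: w = m1 - m0, with the boundary halfway between the class means."""
    m0, m1 = centroid(points0), centroid(points1)
    w = [b - a for a, b in zip(m0, m1)]
    mid = [(a + b) / 2 for a, b in zip(m0, m1)]
    bias = -sum(wi * ci for wi, ci in zip(w, mid))
    return w, bias


def predict(w, bias, p):
    return 1 if sum(wi * pi for wi, pi in zip(w, p)) + bias > 0 else 0


# Class 0 inside the unit circle, class 1 on rings further out (circular boundary).
inner = ring(0.3, 12) + ring(0.6, 12) + ring(0.9, 12)
outer = ring(2.0, 12) + ring(2.5, 12)

# Original 2-D space: both class means sit at the origin, so w is (numerically) zero.
w2d, b2d = fit_nearest_mean(inner, outer)

# Mapped 3-D space: the classes now differ in the third coordinate only.
inner_f = [phi(*p) for p in inner]
outer_f = [phi(*p) for p in outer]
w3d, b3d = fit_nearest_mean(inner_f, outer_f)
correct = (sum(predict(w3d, b3d, p) == 0 for p in inner_f)
           + sum(predict(w3d, b3d, p) == 1 for p in outer_f))
acc = correct / (len(inner_f) + len(outer_f))
```

Kernel methods such as kernel SVM avoid computing ϕ explicitly by working with inner products ⟨ϕ(x), ϕ(x′)⟩; the explicit map is used here only because it is easy to inspect.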
Metrics in MVPA-Light.
| Type | Metric | Range | Description |
|---|---|---|---|
| Classification | Accuracy | [0, 1] | Fraction of correctly predicted class labels. |
| | AUC | [0, 1] | Area under the ROC curve. For two classes only. An alternative to classification accuracy that is more robust to imbalanced classes. Requires continuous classifier output (decision values or probabilities). 0.5 means chance-level performance and 1 means perfect separation of the classes. |
| | Confusion matrix | [0, 1] | Rows correspond to the true class, columns to the predicted class. The (i, j)-th element gives the proportion of samples of class i that have been classified as class j. |
| | Decision value | (−∞, +∞) | For two classes only. Average decision value, calculated separately for each class. |
| | F1 score | [0, 1] | Combines precision (PR) and recall (R) into a single score using the harmonic mean 2·PR·R / (PR + R). |
| | Kappa | [−1, 1] | Cohen's kappa, a measure of inter-rater reliability. |
| | Precision | [0, 1] | TP / (TP + FP). Fraction of samples labeled as positive that actually belong to the positive class. For multi-class problems, it is calculated per class from the confusion matrix. |
| | Recall | [0, 1] | TP / (TP + FN). Fraction of positive samples that have been detected. For multi-class problems, it is calculated per class from the confusion matrix. |
| | t-value | (−∞, +∞) | For two classes only. Statistic of a two-sample t-test (unequal sample sizes, equal variances) computed on the decision values. |
| | Raw output | (−∞, +∞) | Returns a cell array with the raw classifier outputs for all test sets. |
| Regression | MAE | [0, ∞) | Mean absolute error: $\frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$ |
| | MSE | [0, ∞) | Mean squared error: $\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ |
| | R² | (−∞, 1] | Coefficient of determination: $1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$ |
TP, true positives; FP, false positives; FN, false negatives. Regression: y = responses, ŷ = model predictions, ȳ = mean of the responses, n = number of samples.
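The definitions in the table translate directly into code. The following pure-Python sketch implements several of them for illustration: accuracy, AUC (computed as the fraction of positive/negative pairs in which the positive sample has the higher score, counting ties as one half), the row-normalized confusion matrix, precision, recall and F1 for a binary problem, Cohen's kappa, and the three regression metrics. Function names are hypothetical and do not correspond to MVPA-Light's own metric identifiers; edge cases such as empty classes are not handled.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores, positive=1):
    """Probability that a random positive sample scores higher than a random negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(y_true, y_pred, labels):
    """Row = true class, column = predicted class; each row sums to 1 (assumes no empty class)."""
    counts = Counter(zip(y_true, y_pred))
    matrix = []
    for t in labels:
        row_total = sum(counts[(t, p)] for p in labels)
        matrix.append([counts[(t, p)] / row_total for p in labels])
    return matrix


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary case; assumes at least one true and one predicted positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum((t == positive) and (p == positive) for t, p in pairs)
    fp = sum((t != positive) and (p == positive) for t, p in pairs)
    fn = sum((t == positive) and (p != positive) for t, p in pairs)
    pr = tp / (tp + fp)
    r = tp / (tp + fn)
    return pr, r, 2 * pr * r / (pr + r)


def cohens_kappa(y_true, y_pred):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    p_obs = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_exp = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def r_squared(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot
```

The pairwise form of AUC makes its 0.5 chance level explicit: if scores carry no class information, a positive sample outranks a negative one half of the time.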
Figure 3. Results for the classification analysis of the Wakeman and Henson (2015) MEEG data. (A) Multi-class classification (famous vs. unfamiliar vs. scrambled faces) of the N170 and sustained ERP components. (B) AUC as a function of time for famous vs. scrambled images. The classification was performed using three different channel sets: EEG only, MEG only, and EEG+MEG combined. (C) Binary classification (famous vs. scrambled and famous vs. unfamiliar) of time-frequency data. AUC is plotted as a function of both time and frequency and is color-coded. (D) Time x time generalization and frequency x frequency generalization using a binary classifier (famous vs. scrambled). (E) Level 2 statistical analysis of the time-frequency classification. (F) Level 1 statistical analysis of the time x time generalization, shown for subject 1 as an example.
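Panel (D) refers to time x time generalization: a classifier trained at one time point is tested at every other time point, yielding a train-time by test-time matrix of scores. The sketch below shows the loop structure in plain Python, assuming data indexed as `X[trial][time][feature]` and using a hypothetical nearest-class-mean classifier with accuracy as the score (the figure itself reports AUC, and MVPA-Light's own functions are not reproduced here). For brevity, it uses a separate test set instead of cross-validation.

```python
def fit_centroids(X, y):
    """Hypothetical stand-in classifier: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Class whose mean is closest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def time_generalization(X_train, y_train, X_test, y_test):
    """X_*[trial][time][feature]. Entry [t1][t2]: trained at time t1, tested at time t2."""
    n_times = len(X_train[0])
    result = []
    for t1 in range(n_times):
        model = fit_centroids([trial[t1] for trial in X_train], y_train)
        row = []
        for t2 in range(n_times):
            preds = [predict(model, trial[t2]) for trial in X_test]
            row.append(sum(p == t for p, t in zip(preds, y_test)) / len(y_test))
        result.append(row)
    return result
```

The diagonal of the resulting matrix corresponds to ordinary classification across time; off-diagonal scores above chance indicate that the pattern learned at one time point is still informative at another.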
Figure 4. Results for the classification analysis of the Haxby et al. (2001) fMRI data. (A) Confusion matrix for multi-class (8 classes) classification based on voxels in the ventral temporal area, averaged across subjects. (B) Multi-class (8 classes) classification accuracy calculated for each time point following stimulus onset. Lines depict means across subjects; shaded areas correspond to the standard error. Masks were used to select voxels in the ventral temporal area (yellow line), voxels responsive to faces (blue), or voxels responsive to houses (red). (C) Cluster permutation test results based on a searchlight analysis using a binary classifier (faces vs. houses). Red spots represent AUC values superimposed on axial slices of the averaged structural MRI. All depicted AUC values belong to the significant cluster; other AUC values have been masked out.
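A searchlight analysis, as in panel (C), computes a score for every voxel using only the voxels in a small sphere around it, producing a map of local decodability. The following pure-Python sketch shows the neighborhood construction and the outer loop. The data layout (one dict per trial mapping voxel coordinates to values), the radius, and the `score` callback are hypothetical; in a real analysis `score` would run a cross-validated classifier (e.g., faces vs. houses, scored by AUC) on the extracted features.

```python
import itertools


def neighbors(center, shape, radius):
    """All grid positions within Euclidean distance `radius` of `center` inside `shape`."""
    r = int(radius)
    hood = []
    for offset in itertools.product(range(-r, r + 1), repeat=len(shape)):
        if sum(o * o for o in offset) > radius ** 2:
            continue  # outside the sphere
        pos = tuple(c + o for c, o in zip(center, offset))
        if all(0 <= p < s for p, s in zip(pos, shape)):
            hood.append(pos)  # inside the volume
    return hood


def searchlight(data, shape, radius, score):
    """Score every voxel using only the features in its spherical neighborhood.

    data:  list of trials, each a dict mapping voxel coordinates to a value
    score: callback taking a trials x features list and returning a number
           (hypothetical; in practice a cross-validated classifier metric)
    """
    result = {}
    for center in itertools.product(*(range(s) for s in shape)):
        hood = neighbors(center, shape, radius)
        features = [[trial[pos] for pos in hood] for trial in data]
        result[center] = score(features)
    return result
```

Voxels at the edge of the volume get smaller neighborhoods, since positions outside the grid are skipped rather than padded.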
Figure 5. Mean ERP classification accuracy for the benchmarking analysis using the single-subject MEG data (averaged across subjects). MVPA-Light is depicted as a solid black line.
Benchmarking results: mean training time and standard deviation in seconds for different classifiers.
| MEG single-subjects | MVPA-Light | 0.07 ± 0.002 | ||||
| LIBLINEAR | – | 0.014 ± 0.0009( | – | – | ||
| LIBSVM | – | – | – | 0.098 ± 0.01 | 0.125 ± 0.001 | |
| MATLAB | 0.026 ± 0.0008 | 0.03 ± 0.006 | 0.05 ± 0.0001 | 0.041 ± 0.004 | 0.023 ± 0.0004 | |
| Scikit Learn | 0.097 ± 0.0006 | 0.1 ± 0.005 | 0.007 ± 0.0001 | 0.37 ± 0.052 | 0.45 ± 0.032 | |
| R | 0.084 ± 0.0003 | 0.013 ± 0.002 | 0.04 ± 0.0001 | 0.71 ± 0.113 | 0.41 ± 0.026 | |
| MEG super-subject | MVPA-Light | 0.437 ± 0.0062 | 10.122 ± 1.05 | |||
| LIBLINEAR | – | 0.732 ± 0.068( | – | – | ||
| LIBSVM | – | – | – | 42.089 ± 4.188 | 37.941 ± 0.404 | |
| MATLAB | 0.149 ± 0.002 | 0.279 ± 0.137 | 0.231 ± 0.027 | 20.98 ± 1.78 | 11.65 ± 0.217 | |
| Scikit Learn | 0.596 ± 0.017 | 2.065 ± 0.109 | 0.09 ± 0.001 | 32.19 ± 2.07 | 34.56 ± 0.38 | |
| R | 0.84 ± 0.004 | 0.144 ± 0.0006 | 1123.16 ± 27.39 | 123.31 ± 9.38 ||
| fMRI | MVPA-Light | OOM | ||||
| LIBLINEAR | – | – | 2.235 ± 0.218( | – |
| LIBSVM | – | – | – | 11.79 ± 0.787 | 11.88 ± 0.822 | |
| MATLAB | OOM | 23.79 ± 4.008 | 357.49 ± 2.205 | 5.053 ± 0.325 | 4.845 ± 0.308 | |
| Scikit Learn | 24.45 ± 1.1 | 20.68 ± 4.24 | 2.86 ± 0.06 | 10.46 ± 0.59 | 9.15 ± 0.59 | |
| R | OOM | 7.1 ± 1.13 | 18.48 ± 0.35 | 39.67 ± 1.98 | 43.3 ± 2.18 | |
For each combination of dataset and classifier, the fastest model is marked in bold. OOM, out of memory error; (p), primal form; (d), dual form.
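The tables report each training time as mean ± standard deviation in seconds. How often each model was retrained is not stated in this section, so the harness below is a generic, hypothetical sketch rather than the benchmarking code used for these results: it times a caller-supplied `train` function with `time.perf_counter` and reports the sample mean and standard deviation of the wall-clock durations.

```python
import statistics
import time


def benchmark(train, n_repeats=5):
    """Call train() n_repeats times (at least 2); return mean and SD of the durations in seconds."""
    durations = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        train()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


def format_result(mean, sd):
    """Format a timing result in the 'mean ± SD' style used in the tables."""
    return f"{mean:.3g} ± {sd:.2g}"
```

`time.perf_counter` is a monotonic high-resolution clock, which makes it better suited to measuring short durations than wall-clock time from `time.time`.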
Benchmarking results: mean training time and standard deviation in seconds for different regression models.
| MEG single-subjects | MVPA-Light | – | – | ||
| LIBSVM | – | – | 0.02 ± 0.001 | ||
| MATLAB | 0.0061 ± 0.0002 | – | 0.023 ± 0.0005 | ||
| Scikit Learn | 0.0069 ± 0.0003 | 0.023 ± 0.003 | 0.654 ± 0.0647 | 0.481 ± 0.02 | |
| R | 0.055 ± 0.0027 | – | 1.59 ± 0.094 | 0.43 ± 0.002 | |
| MEG super-subject | MVPA-Light | – | – | ||
| LIBSVM | – | – | |||
| MATLAB | 0.186 ± 0.007 | – | 6.931 ± 0.237 | 9.9798 ± 0.239 | |
| Scikit Learn | 0.062 ± 0.005 | 14.51 ± 0.21 | 3.213 ± 0.394 | 31.61 ± 1.51 | |
| R | 0.547 ± 0.0079 | – | 465.08 ± 49.83 | 151.66 ± 26.76 | |
| fMRI | MVPA-Light | 2.026 ± 0.256 | – | – | |
| LIBSVM | – | – | |||
| MATLAB | OOM | – | 4.545 ± 0.353 | 4.563 ± 0.284 | |
| Scikit Learn | 0.638 ± 0.022 | 16.138 ± 3.64 | 9.999 ± 0.59 | ||
| R | 7.503 ± 0.593 | – | 37.211 ± 2.056 | 41.037 ± 2.298 | |
For each combination of dataset and model, the fastest model is marked in bold. OOM, out of memory error; (p), primal form; (d), dual form.