Marc J. Lanovaz, Antonia R. Giannakakos, Océane Destras
Abstract
Visual analysis is the most commonly used method for interpreting data from single-case designs, but levels of interrater agreement remain a concern. Although structured aids to visual analysis such as the dual-criteria (DC) method may increase interrater agreement, the accuracy of the analyses may still benefit from improvements. Thus, the purpose of our study was to (a) examine correspondence between visual analysis and models derived from different machine learning algorithms, and (b) compare the accuracy, Type I error rate, and power of each of our models with those produced by the DC method. We trained our models on a previously published dataset and then conducted analyses on both nonsimulated and simulated graphs. All our models derived from machine learning algorithms matched the interpretation of the visual analysts more frequently than the DC method. Furthermore, the machine learning algorithms outperformed the DC method on accuracy, Type I error rate, and power. Our results support the somewhat unorthodox proposition that behavior analysts may use machine learning algorithms to supplement their visual analysis of single-case data, but more research is needed to examine the potential benefits and drawbacks of such an approach.
Keywords: AB design; Artificial intelligence; Error rate; Machine learning; Single-case design
Year: 2020 PMID: 32440643 PMCID: PMC7198678 DOI: 10.1007/s40614-020-00244-0
Source DB: PubMed Journal: Perspect Behav Sci ISSN: 2520-8969
Some Machine-Learning Terms
| Term | Description |
|---|---|
| Algorithm | At its broadest, an algorithm is a set of instructions that provides a solution to a problem. In the case of machine learning, these instructions are typically statistical computations that aim to predict the value of a label. |
| Hyperparameter | A hyperparameter is a value or a function of the algorithm that is set by the experimenter. That said, the experimenter may compare the results produced by different combinations of hyperparameters using a validation set in order to select the best model. |
| Features | The features represent the input data, which are transformed by the algorithm to provide a prediction. In the current study, the eight features were: (1) the mean of points in Phase A, (2) the mean of points in Phase B, (3) the standard deviation of points in Phase A, (4) the standard deviation of points in Phase B, (5) the intercept of LSRL for Phase A, (6) the slope of LSRL for Phase A, (7) the intercept of LSRL for Phase B, and (8) the slope of LSRL for Phase B. |
| Label | The label is what the algorithm is trying to predict from a set of features. The current study has a single binary label: clear change (1) or no clear change (0). |
| Model | A model refers to a specific algorithm with fixed hyperparameters and parameters. |
| Parameters | The parameters are the values that are fit to the training data (e.g., weights). |
| Test set | A set of features and labels that are never used in fitting the parameters or fixing the hyperparameters. This set is used to test for generalization. |
| Training set | A set of features and labels used during training to fit the best parameters to the model. |
| Validation set | A set of features and labels that are used to compare the accuracy of different combinations of hyperparameters. The validation set is not used during training to set the parameters. It simply allows the selection of the model that produces the best generalization. |
Note: LSRL = Least squares regression line
Constant and Variable Hyperparameters for Each Machine-Learning Algorithm
| Algorithm | Constant Values | Values Tested |
|---|---|---|
| SGD | Loss: Logistic regression; Penalty: ElasticNet | Learning rate: 10⁻⁵–10⁻²; Epochs: 5–1,000 by 5 |
| SVC | Kernel: Radial basis function | Penalty C term: 1, 10, 100; Gamma: 10⁻⁵–10⁻¹ |
| Random forest | | Estimators: 10–190 by 10 |
| DNN | Early stopping: No improvement in loss function for 30 epochs; Learning rate optimizer: Adam; Loss: Binary cross entropy; Neuron activation function: ReLU; Output activation function: Sigmoid | Neurons: 2³–2⁶; Hidden layers: 0, 1, 2, 4, 6 |
Note: SGD = stochastic gradient descent; SVC = support vector classifier; DNN = dense neural network
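To make the hyperparameter grid above concrete, here is a minimal sketch of validation-based hyperparameter selection for an SGD logistic-regression classifier, the configuration the table lists for SGD. The elastic-net penalty strengths, the toy grids, and the function names are illustrative assumptions, not the authors' code.

```python
import itertools
import numpy as np

def sgd_logistic(X, y, lr, epochs, l1=1e-4, l2=1e-4, seed=0):
    """Minimal SGD logistic regression with an elastic-net penalty.

    A sketch of the table's SGD setup; the penalty strengths here are
    assumed, not taken from the article.
    """
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):       # one pass = one epoch
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))
            g = p - y[i]                        # logistic-loss gradient
            w -= lr * (g * X[i] + l1 * np.sign(w) + l2 * w)
            b -= lr * g
    return w, b

def accuracy(w, b, X, y):
    return float(np.mean(((X @ w + b) > 0).astype(int) == y))

def grid_search(X_tr, y_tr, X_val, y_val, lr_grid, epoch_grid):
    """Keep the (lr, epochs) pair with the best validation accuracy."""
    best = None
    for lr, ep in itertools.product(lr_grid, epoch_grid):
        w, b = sgd_logistic(X_tr, y_tr, lr, ep)
        acc = accuracy(w, b, X_val, y_val)
        if best is None or acc > best[0]:
            best = (acc, lr, ep, w, b)
    return best
```

Consistent with the terms table, the winning combination would then be evaluated exactly once on a held-out test set to estimate generalization.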
Fig. 1 The upper panel shows a two-dimensional graph representing two features: x1 and x2. Closed points represent one category and open points a different category. The lower panel depicts the addition of a higher dimension (z) and a linear plane that separates the two categories
Fig. 2 A decision tree where the percentage of correct responding in the next-to-last (x1) and last (x2) sessions are used to decide whether a concept is mastered (y = 1) or not mastered (y = 0)
Fig. 3 Dense neural network with four features, two hidden layers with four neurons each, and a prediction
Fig. 4 Graphs showing a clear change and no clear change (upper panels) and applications of the dual-criteria method (bottom panels)
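Fig. 4 applies the dual-criteria (DC) method, which the abstract uses as the comparison benchmark. As a rough sketch: fit the baseline (Phase A) mean line and least-squares trend line, extend both into Phase B, count how many Phase B points fall beyond both lines, and compare that count to a binomial criterion. The α = .05 threshold and the assumption of an expected increase are illustrative choices here, not details taken from this excerpt, and this is not the authors' implementation.

```python
import numpy as np
from math import comb

def dc_criterion(n_b, alpha=0.05):
    """Smallest count k with P(X >= k) < alpha under Binomial(n_b, 0.5)."""
    for k in range(n_b + 1):
        tail = sum(comb(n_b, j) for j in range(k, n_b + 1)) / 2 ** n_b
        if tail < alpha:
            return k
    return n_b + 1  # criterion unattainable for very short phases

def dual_criteria(phase_a, phase_b, alpha=0.05):
    """Sketch of the DC method for an expected increase in Phase B."""
    a = np.asarray(phase_a, dtype=float)
    b = np.asarray(phase_b, dtype=float)
    slope, intercept = np.polyfit(np.arange(len(a)), a, 1)
    x_b = np.arange(len(a), len(a) + len(b))  # continue the session index
    trend_line = intercept + slope * x_b      # baseline trend, extended
    above_both = np.sum((b > a.mean()) & (b > trend_line))
    return bool(above_both >= dc_criterion(len(b), alpha))
```

For example, with five Phase B points the binomial criterion requires all five to exceed both lines before a clear change is declared, which illustrates why short phases limit the method's power.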
Fig. 5 Normalization of graph data and extraction of features prior to training, validation, and testing
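Fig. 5 covers feature extraction. A minimal sketch of computing the eight features listed in the terms table (phase means, standard deviations, and LSRL intercepts/slopes) follows; `extract_features` is a hypothetical name, and fitting the LSRL on the session index is an assumption about the setup.

```python
import numpy as np

def extract_features(phase_a, phase_b):
    """Compute the eight AB-graph features described in the terms table.

    The least-squares regression line (LSRL) for each phase uses the
    session index (0, 1, 2, ...) as predictor. Names are illustrative.
    """
    per_phase = []
    for phase in (phase_a, phase_b):
        y = np.asarray(phase, dtype=float)
        x = np.arange(len(y))
        slope, intercept = np.polyfit(x, y, 1)  # degree-1 LSRL
        per_phase.append((y.mean(), y.std(), intercept, slope))
    (mean_a, sd_a, int_a, slo_a), (mean_b, sd_b, int_b, slo_b) = per_phase
    # Order matches the table: means, SDs, then LSRL intercepts/slopes
    return [mean_a, mean_b, sd_a, sd_b, int_a, slo_a, int_b, slo_b]
```

For instance, `extract_features([1, 2, 3], [4, 5, 6])` yields phase means of 2 and 5 with a unit slope in both phases, which a downstream model would receive as its input vector.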
Hyperparameters of Best Model for Each Dataset
| Algorithm | Hyperparameter | Complete nonsimulated dataset | Nonsimulated dataset with agreements only |
|---|---|---|---|
| SGD | Learning rate | .0001 | .00001 |
| SGD | Epochs | 60 | 215 |
| SVC | Penalty C term | 10 | 100 |
| SVC | Gamma | .1 | .01 |
| Random forest | Estimators | 30 | 180 |
| DNN | Neurons | 16 | 16 |
| DNN | Hidden layers | 6 | 2 |
Note: SGD = stochastic gradient descent; SVC = support vector classifier; DNN = dense neural network
Correspondence between Visual Analysis and Models Derived from Different Machine Learning Algorithms for Test Sets Showing Clear Change and No Clear Change
Models were trained either with the complete nonsimulated dataset ("complete") or with the nonsimulated dataset with agreements only ("agreements only").

| Algorithm | Overall (complete) | Clear change (complete) | No clear change (complete) | Overall (agreements only) | Clear change (agreements only) | No clear change (agreements only) |
|---|---|---|---|---|---|---|
| DC method | .869 | .822 | .914 | .942 | .948 | .937 |
| SGD | .914 | .930 | .899 | .951 | .948 | .953 |
| SVC | .929 | .938 | .921 | .959 | .948 | .969 |
| Random forest | .902 | .868 | .935 | .947 | .948 | .945 |
| DNN | .925 | .899 | .950 | .963 | .957 | .969 |
Note: DC = dual-criteria; SGD = stochastic gradient descent; SVC = support vector classifier; DNN = dense neural network
Accuracy, Type I Error Rate, and Power of the DC Method and Models Derived from Different Algorithms on Simulated Data with Three Points in Phase A and Five Points in Phase B
Models were trained either with the complete nonsimulated dataset ("complete") or with the nonsimulated dataset with agreements only ("agreements only").

| Algorithm | Accuracy (complete) | Type I error (complete) | Power (complete) | Accuracy (agreements only) | Type I error (agreements only) | Power (agreements only) |
|---|---|---|---|---|---|---|
| DC method | .909 | .059 | .878 | .909 | .059 | .878 |
| SGD | .948 | .064 | .959 | .949 | .050 | .972 |
| SVC | .943 | .063 | .949 | .944 | .054 | .942 |
| Random forest | .940 | .054 | .933 | .937 | .059 | .932 |
| DNN | .946 | .051 | .943 | .935 | .067 | .938 |
Note: DC = dual-criteria; SGD = stochastic gradient descent; SVC = support vector classifier; DNN = dense neural network
Accuracy, Type I Error Rate, and Power of the DC Method and Models Derived from Different Algorithms on Simulated Data with Varying Number of Points Per Phase
Models were trained either with the complete nonsimulated dataset ("complete") or with the nonsimulated dataset with agreements only ("agreements only").

| Algorithm | Accuracy (complete) | Type I error (complete) | Power (complete) | Accuracy (agreements only) | Type I error (agreements only) | Power (agreements only) |
|---|---|---|---|---|---|---|
| DC method | .913 | .074 | .901 | .913 | .074 | .901 |
| SGD | .963 | .025 | .952 | .962 | .018 | .942 |
| SVC | .961 | .024 | .947 | .960 | .020 | .940 |
| Random forest | .953 | .022 | .928 | .947 | .028 | .923 |
| DNN | .960 | .017 | .937 | .954 | .027 | .935 |
Note: DC = dual-criteria; SGD = stochastic gradient descent; SVC = support vector classifier; DNN = dense neural network