Natalia P Rocha, Benson Mwangi, Carlos A Gutierrez Candano, Cristina Sampaio, Erin Furr Stimming, Antonio L Teixeira.
Abstract
Background: Psychotic symptoms have been under-investigated in Huntington's disease (HD), and research is needed to elucidate the characteristics linked to the unique phenotype of HD patients presenting with psychosis. Objective: To evaluate the frequency of, and factors associated with, psychosis in HD.
Keywords: Enroll-HD; Huntington's disease; behavior; machine learning; psychosis
Year: 2018 PMID: 30459704 PMCID: PMC6232301 DOI: 10.3389/fneur.2018.00930
Source DB: PubMed Journal: Front Neurol ISSN: 1664-2295 Impact factor: 4.003
Figure 1. Flowchart showing participant selection. (A) Conventional statistics and machine learning algorithms were applied to evaluate predictors of psychosis in the periodic dataset containing Enroll-HD participants who met the criteria for inclusion in the dataset as of November 1, 2015 (Wave 1 sample). (B) Wave 2 dataset, composed of new Enroll-HD participants whose information was released by Enroll-HD as of October 31, 2016 (PDS3), used for validating the machine learning algorithms.
Demographics and clinical characteristics of manifest patients with Huntington's disease (HD) with and without history of psychosis.
| Characteristic | No history of psychosis | History of psychosis | p-value |
|---|---|---|---|
| Age in years [mean ± SD (median)] | 52.4 ± 11.7 (53) | 53.6 ± 12.6 (53.5) | 0.18 |
| Sex (% female) | 50.2 | 52.0 | 0.32 |
| Age at motor symptoms onset [mean ± SD (median)] | 45.6 ± 11.6 (46) | 44.1 ± 12.0 (45) | 0.15 |
| Age of clinical HD diagnosis [mean ± SD (median)] | 48.3 ± 12.1 (48) | 47.01 ± 12.5 (47) | 0.21 |
| Mother affected (%) | 47.3 | 41.8 | 0.06 |
| Father affected (%) | 46.7 | 52.3 | 0.06 |
| CAG repeats [mean ± SD (median)] | 44.0 ± 3.7 (43) | 43.9 ± 3.9 (43) | 0.47 |
| Medical history of: | | | |
| Alcohol use disorders | 9.3 | 17.3 | <0.0001 |
| Smoking | 49.0 | 51.8 | 0.22 |
| Drug abuse | 9.7 | 13.7 | 0.04 |
| Marijuana | 86.0 | 88.2 | 0.49 |
| Heroin | 5.5 | 11.8 | 0.16 |
| Cocaine | 29.0 | 41.2 | 0.11 |
| Club drugs (ecstasy, GHB, roofies) | 19.0 | 29.4 | 0.13 |
| Amphetamines | 17.5 | 26.5 | 0.16 |
| Ritalin | 1.5 | 0 | 0.62 |
| Hallucinogens | 19.0 | 17.6 | 0.53 |
| Inhalants | 1.0 | 5.9 | 0.10 |
| Opium | 2.0 | 0 | 0.53 |
| Painkillers | 7.0 | 5.9 | 0.58 |
| Barbiturates/sedatives | 2.5 | 5.9 | 0.27 |
| Tranquilizers | 1.5 | 2.9 | 0.47 |
| Depression (%) | 63.8 | 87.0 | <0.0001 |
| Irritability (%) | 60.5 | 83.1 | <0.0001 |
| Violent/aggressive behavior (%) | 27.2 | 59.3 | <0.0001 |
| Perseverative/obsessive behavior (%) | 39.4 | 73.8 | <0.0001 |
| Apathy (%) | 52.4 | 76.2 | <0.0001 |
| Cognitive impairment (%) | 57.9 | 77.7 | <0.0001 |
| Previous suicidal ideation (%) | 23.2 | 42.5 | <0.0001 |
| Total motor score [mean ± SD (median)] | 38.1 ± 20.9 (35) | 50.1 ± 25.0 (47) | <0.0001 |
| Total functional capacity [mean ± SD (median)] | 8.2 ± 3.5 (9) | 5.3 ± 3.5 (5) | <0.0001 |
| SDMT (total correct) [mean ± SD (median)] | 23.5 ± 13.0 (22) | 16.6 ± 13.5 (15) | <0.0001 |
| Verbal fluency test (category) – number of correct responses in 1 min [mean ± SD (median)] | 12.1 ± 5.7 (12) | 9.4 ± 6.1 (9) | <0.0001 |
| Stroop Interference Test – number of correct responses [mean ± SD (median)] | 24.2 ± 11.7 (24) | 18.6 ± 12.6 (17) | <0.0001 |
| TMT-A time to complete [mean ± SD (median)] | 71.8 ± 52.9 (55) | 104.1 ± 70.4 (83) | <0.0001 |
| TMT-A number of correct responses [mean ± SD (median)] | 24.1 ± 4.4 (25) | 22.4 ± 7.1 (25) | <0.0001 |
| TMT-B time to complete [mean ± SD (median)] | 151.2 ± 71.8 (141) | 187.4 ± 66.0 (239) | <0.0001 |
| TMT-B number of correct responses [mean ± SD (median)] | 21.2 ± 9.5 (25) | 17.1 ± 9.6 (24) | <0.0001 |
| MMSE [mean ± SD (median)] | 25.3 ± 4.1 (26) | 22.2 ± 6.6 (24) | <0.0001 |
Only manifest HD subjects with CAG ≥ 36 (N = 2,303) from Enroll-HD who met the criteria for inclusion in the dataset as of November 1, 2015, were considered.
Fisher's exact test;
Mann-Whitney test.
SD, Standard deviation; SDMT, Symbol Digit Modalities Test; TMT, Trail Making Test; MMSE, Mini–Mental State Examination.
Figure 2. Algorithm training and testing process. (A) Flow diagram of the algorithm training and testing process. This process, which included a majority class undersampling step to mitigate the class imbalance problem, was used in all algorithms except the weighted SVM. The majority class undersampling was repeated for 5,000 iterations, and the predicted probabilities were averaged over all iterations. The weighted SVM did not require a resampling step because it mitigates class imbalance by weighting the algorithm penalty parameter by the ratio of observations in each class. A standard 10-fold cross-validation was used to separate training and testing samples in wave 1. (B) Representation of the 10-fold cross-validation process used in this study. First, the wave 1 sample was randomly separated into ten folds with a nearly equal number of subjects in each fold. At every iteration (i.e., 1–10), a machine learning algorithm was trained using the training set and tested using the testing set (in blue). This process was repeated until all folds had been left out of the training stage at least once. Lastly, results were aggregated and used to generate a confusion matrix and ROC curve. The ability of the machine learning algorithms to predict history of psychosis was examined using standard statistical metrics such as accuracy, specificity, sensitivity, and area under the ROC curve. (C) Flow diagram representing the validation of the machine learning algorithm using Wave 2 data. The algorithm was trained to predict individual subjects' history of psychosis using wave 1 data only and evaluated using wave 2 data.
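The resampling and cross-validation steps described in this caption can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' code: all function names are hypothetical, `fit` and `predict_proba` stand in for whichever learner is used, and the inverse-frequency formula in `class_weights` is one common way to scale a per-class penalty rather than the exact weighting used in the study.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly split indices 0..n_samples-1 into k folds of nearly equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list makes fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def undersample_majority(labels, train_idx, rng):
    """Return training indices with every class cut down to the minority size."""
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(rng.sample(members, n_min))
    return balanced


def class_weights(labels):
    """Inverse-frequency class weights (assumed formula, for illustration)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def cross_validated_probabilities(features, labels, fit, predict_proba,
                                  k=10, n_resamples=100, seed=0):
    """Average out-of-fold probabilities over repeated undersampling draws."""
    rng = random.Random(seed)
    probs = [0.0] * len(labels)
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        for _ in range(n_resamples):
            # Each draw trains on a freshly balanced subset of the training folds.
            balanced = undersample_majority(labels, train_idx, rng)
            model = fit([features[i] for i in balanced],
                        [labels[i] for i in balanced])
            for i in test_idx:
                probs[i] += predict_proba(model, features[i]) / n_resamples
    return probs
```

The default of 100 resampling draws is an arbitrary placeholder chosen to keep the sketch fast; the caption reports 5,000 iterations.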
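Final logistic regression model (step 19) to define factors associated with psychosis in Huntington's disease (HD).
| Variable | B | SE | Wald | df | p-value | Odds ratio | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|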
| Age at clinical diagnosis | −0.051 | 0.016 | 9.749 | 1 | 0.002 | 0.951 | 0.921 | 0.981 |
| Number of CAG repeats | −0.149 | 0.056 | 7.051 | 1 | 0.008 | 0.862 | 0.772 | 0.962 |
| History of alcohol use disorders | 0.568 | 0.323 | 3.084 | 1 | 0.079 | 1.764 | 0.936 | 3.324 |
| History of depression | 1.235 | 0.372 | 11.053 | 1 | 0.001 | 3.440 | 1.660 | 7.126 |
| History of violent/aggressive behavior | 0.711 | 0.246 | 8.358 | 1 | 0.004 | 2.036 | 1.257 | 3.297 |
| History of perseverative/obsessive behavior | 1.374 | 0.276 | 24.866 | 1 | 0.000 | 3.952 | 2.303 | 6.674 |
| TFC score | −0.074 | 0.043 | 3.025 | 1 | 0.082 | 0.929 | 0.854 | 1.009 |
| TMT-B (time to complete) | 0.006 | 0.002 | 7.207 | 1 | 0.007 | 1.006 | 1.001 | 1.010 |
The analysis considered only manifest HD subjects with CAG ≥ 36 from Enroll-HD who met the criteria for inclusion in the dataset as of November 1, 2015 (Wave 1 sample; N = 2,303).
CI, confidence interval; df, degrees of freedom; TFC, total functional capacity; TMT, trail making test; SE, standard error.
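The columns of a logistic regression table like this one are linked by standard relationships: the odds ratio is exp(B), the Wald statistic is (B/SE)², and a 95% confidence interval for the odds ratio is exp(B ± 1.96·SE). The short sketch below applies them to the coefficient and standard error reported for history of depression. The function name is hypothetical, and the results should match the tabulated values only to within rounding of the published figures.

```python
import math


def odds_ratio_summary(b, se, z=1.96):
    """Derive the Wald statistic, odds ratio and 95% CI from B and its SE."""
    return {
        "wald": (b / se) ** 2,
        "odds_ratio": math.exp(b),
        "ci_lower": math.exp(b - z * se),
        "ci_upper": math.exp(b + z * se),
    }


# Coefficient and standard error for "History of depression" from the table.
summary = odds_ratio_summary(1.235, 0.372)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

An odds ratio above 1 with a confidence interval that excludes 1, as in this row, indicates that the factor is associated with higher odds of psychosis in the model.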
Algorithm performance in distinguishing between individuals with and without a history of psychosis (Wave 1 analysis, i.e., Enroll-HD participants who met the criteria for inclusion in the dataset as of November 1, 2015).
| LASSO | 0.72 | 0.70 | 0.79 | 0.69 | 0.75 | 0.71 | 0.73 | |
| Elastic net | 0.70 | 0.68 | 0.78 | 0.67 | 0.73 | 0.69 | 0.71 | |
| SVM | 0.72 | 0.70 | 0.72 | 0.70 | 0.74 | 0.71 | 0.73 | |
| Random forest | 0.71 | 0.71 | 0.77 | 0.71 | 0.72 | 0.71 | 0.71 | |
| Weighted SVM | 0.71 | 0.72 | 0.78 | 0.72 | 0.71 | 0.71 | 0.71 |
AUC, area under the curve; LASSO, Least Absolute Shrinkage and Selection Operator; NPV, negative predictive value; PPV, positive predictive value; SVM, support vector machine. Sensitivity and specificity represent correctly predicted history of psychosis (true positive rate) and correctly predicted no history of psychosis (true negative rate), respectively. The “classical” prediction accuracy was calculated as the sum of true positives and true negatives divided by the total sample. Because of the class imbalance problem, a “balanced” accuracy was also calculated as the average of sensitivity and specificity. PPV was calculated as the proportion of individuals predicted as having a history of psychosis who actually presented a history of psychosis. NPV was calculated as the proportion of individuals predicted as not having a history of psychosis who actually did not present a history of psychosis. Permutation-based p-values were calculated for the weighted SVM using the permutation test method presented by Ojala et al.
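The metric definitions in this note translate directly into arithmetic on the four cells of a confusion matrix, sketched below. The second function is a deliberately simplified label-permutation test that holds the predictions fixed and shuffles the true labels. It is an assumption made for illustration and not the procedure of Ojala et al., which is not detailed in this text and would typically involve retraining the classifier on permuted labels. All function names are hypothetical.

```python
import random


def classification_metrics(tp, fn, tn, fp):
    """Compute the metrics defined above from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def permutation_p_value(y_true, y_pred, score, n_permutations=1000, seed=0):
    """Share of label shuffles that score at least as well as the real labels."""
    rng = random.Random(seed)
    observed = score(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if score(shuffled, y_pred) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

When one class is much rarer than the other, "classical" accuracy can stay high even if few rare cases are detected, which is why the balanced accuracy is reported alongside it.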
Figure 3. Weighted SVM algorithm. (A) Confusion matrix and (B) receiver operating characteristic (ROC) curve for the weighted SVM algorithm in wave 1 data. H-Psych, history of psychosis; NH-Psych, no history of psychosis; AUC, area under the curve. (C) Bar graph showing the coefficients or weighting factors assigned to each variable by the weighted SVM algorithm. tfcscore, total functional capacity score; trlb1, Trail Making Test (TMT)-B, time to complete; ccpob, history of perseverative/obsessive behavior; ccdep, history of depression; trlb2, TMT-B, total correct; trla2, TMT-A, total correct; hxalcab, history of alcohol use disorders; ccvab, history of violent/aggressive behavior; hxsid, previous suicidal ideation; caghigh, number of CAG repeats; trla1, TMT-A, time to complete; momhd, mother affected; motscore, total motor score; ccirb, history of irritability; hxtobab, history of smoking; ccapt, history of apathy; cccog, history of cognitive impairment; mmsetotal, Mini-Mental State Examination, total score; hxdrugab, history of drug abuse; verfct5, verbal fluency test (animals), total correct in 1 min; ccmtrage, age at motor symptoms onset; sit1, Stroop Interference Test, total correct; hddiagn, age of clinical HD diagnosis; sdmt1, Symbol Digit Modalities Test, total correct.
Algorithm performance in distinguishing between individuals with and without a history of psychosis in Wave 2 data (new data released in the Enroll-HD periodic dataset containing information as of October 31, 2016).
| LASSO | 0.71 | 0.68 | 0.79 | 0.67 | 0.75 | 0.69 | 0.73 | |
| Elastic net | 0.70 | 0.67 | 0.77 | 0.66 | 0.74 | 0.69 | 0.72 | |
| SVM | 0.72 | 0.69 | 0.79 | 0.68 | 0.75 | 0.70 | 0.73 | |
| Random forest | 0.71 | 0.68 | 0.78 | 0.69 | 0.73 | 0.70 | 0.72 | |
| Weighted SVM | 0.72 | 0.70 | 0.79 | 0.69 | 0.75 | 0.71 | 0.73 |
AUC, Area under the curve; LASSO, Least Absolute Shrinkage and Selection Operator; NPV, negative predictive value; PPV, positive predictive value; SVM, Support Vector Machines.