Glenn N Saxe, Sisi Ma, Jiwen Ren, Constantin Aliferis.
Abstract
BACKGROUND: The care of traumatized children would benefit significantly from accurate predictive models for Posttraumatic Stress Disorder (PTSD) that use information available around the time of trauma. Machine Learning (ML) computational methods have yielded strong results in recent applications across many diseases and data types, yet they have not previously been applied to childhood PTSD, so much remains to be learned about their use for this complex and debilitating disorder. The first step is to prove the concept: can ML methods, as applied in other fields, produce predictive classification models for childhood PTSD? Additionally, we seek to determine whether specific variables with putative causal relations to PTSD can be identified from these predictive classification models.
Keywords: Child & Adolescent psychiatry; Informatics; Machine learning; PTSD; Traumatic stress
Year: 2017 PMID: 28689495 PMCID: PMC5502325 DOI: 10.1186/s12888-017-1384-1
Source DB: PubMed Journal: BMC Psychiatry ISSN: 1471-244X Impact factor: 3.630
Fig. 1 Flowchart for cross validation. (a) The 5-fold cross validation process. The widths of the rectangular data boxes represent the dimension of features. The heights of the rectangular data boxes represent the dimension of subjects. An orange rectangular data box represents a testing data set. (b) A simple example of a causal network. Node T represents the “target” (i.e. the response variable); nodes P represent parents of the target; nodes C represent children of the target; node S represents a spouse of the target. MB is the Markov Blanket, comprising parents, children and spouse; PC is the parents and children set. For details please see text. (c) The 5-fold cross validation process with feature selection
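The PC and MB sets defined in panel (b) can be read directly off a graph once its edges are known. The following is a minimal sketch, assuming the causal network is given as a mapping from each node to its set of parents. The function name `markov_blanket` and the toy graph are hypothetical illustrations, not taken from the paper, and this is not the HITON-PC algorithm, which has to infer these sets from data.

```python
from typing import Dict, Set, Tuple


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Tuple[Set[str], Set[str]]:
    """Return (PC, MB) for `target` in a DAG given as node -> set of parents."""
    pa = set(parents.get(target, set()))
    # Children are the nodes that list the target among their parents.
    ch = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of the target's children.
    sp: Set[str] = set()
    for child in ch:
        sp |= parents[child]
    sp.discard(target)
    pc = pa | ch
    mb = pc | sp
    return pc, mb


# Hypothetical toy graph in the spirit of panel (b):
# P1 -> T, P2 -> T, T -> C1, T -> C2, S -> C1.
toy_graph = {
    "T": {"P1", "P2"},
    "C1": {"T", "S"},
    "C2": {"T"},
    "P1": set(),
    "P2": set(),
    "S": set(),
}

pc, mb = markov_blanket(toy_graph, "T")
print(sorted(pc))  # ['C1', 'C2', 'P1', 'P2']
print(sorted(mb))  # ['C1', 'C2', 'P1', 'P2', 'S']
```

In this toy graph the spouse S enters the Markov Blanket only because it shares the child C1 with the target, which is why MB is a superset of PC.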
Fig. 2 Performance of Predictive Classification Methods constructed with all variables. Predictive performances that are significant at p < 0.05 are labeled with * and predictive performances that are significant at p < 0.01 are labeled with **. The dotted line placed at AUC = 0.5 indicates the expected performance under the null hypothesis (no signal in the data)
Performance of classifiers and feature selection methods
| Classifier | Data | All features | Feature selection with HITON-PC |
|---|---|---|---|
| SVM linear | observed data | 0.79 (0.02)** | 0.68 (0.04)* |
| | label shuffling | 0.50 [0.32 0.71] | 0.50 [0.36 0.67] |
| SVM poly | observed data | 0.78 (0.02)* | 0.68 (0.04) |
| | label shuffling | 0.50 [0.31 0.71] | 0.50 [0.34 0.71] |
| SVM RB | observed data | 0.76 (0.02)* | 0.68 (0.04) |
| | label shuffling | 0.50 [0.36 0.70] | 0.50 [0.35 0.69] |
| Random forest | observed data | 0.78 (0.01)** | 0.74 (0.01)* |
| | label shuffling | 0.50 [0.33 0.67] | 0.50 [0.33 0.73] |
| Lasso | observed data | 0.67 (0.01)** | 0.74 (0.01)* |
| | label shuffling | 0.50 [0.44 0.57] | 0.50 [0.35 0.68] |
| Logistic Regression (LR) | observed data | 0.47 (0.01) | 0.72 (0.01) |
| | label shuffling | 0.50 [0.35 0.64] | 0.51 [0.32 0.74] |
| Stepwise LR | observed data | 0.57 (0.02) | 0.72 (0.02)* |
| | label shuffling | 0.51 [0.39 0.64] | 0.49 [0.31 0.71] |
The performance (measured as Area Under the ROC Curve, AUC) of individual classifiers and feature selection methods in the observed data and under the null hypothesis of no signal in the data (estimated with label shuffling). For the observed data, the mean and (standard deviation) are presented. For label shuffling, the mean and [95% confidence interval] are presented. Predictive performances that are significant at p < 0.05 are labeled with * and predictive performances that are significant at p < 0.01 are labeled with **
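The label-shuffling null described in the table note can be illustrated with a small permutation test on AUC. This is a simplified sketch under stated assumptions: the prediction scores are held fixed while the labels are shuffled, whereas a full procedure would typically rerun the whole modelling pipeline on each shuffled label set. The function names, the toy labels and scores, and the choice of 2000 permutations are all hypothetical and not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def label_shuffling_test(
    labels: Sequence[int], scores: Sequence[float], n_perm: int = 1000, seed: int = 0
) -> Tuple[float, float, Tuple[float, float], float]:
    """Return observed AUC, null mean, null 95% interval, and a one-sided p-value."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    null: List[float] = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and scores
        null.append(auc(shuffled, scores))
    null.sort()
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (1 + sum(a >= observed for a in null)) / (n_perm + 1)
    interval = (null[int(0.025 * n_perm)], null[int(0.975 * n_perm) - 1])
    return observed, sum(null) / n_perm, interval, p_value


# Hypothetical toy data: four positives, four negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
observed, null_mean, null_ci, p_value = label_shuffling_test(labels, scores, n_perm=2000)
print(f"observed AUC = {observed:.4f}")  # observed AUC = 0.9375
print(f"null mean = {null_mean:.2f}, 95% interval = {null_ci}, p = {p_value:.3f}")
```

The null distribution centres near AUC = 0.5, matching the "label shuffling" rows of the table, and an observed AUC is flagged as significant when few shuffled runs reach it.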
Fig. 3 Performance of Predictive Classification Methods with HITON-PC Causal Discovery Feature Selection. Predictive performances that are significant at p < 0.05 are labeled with * and predictive performances that are significant at p < 0.01 are labeled with **. The dotted line placed at AUC = 0.5 indicates the expected performance under the null hypothesis (no signal in the data)
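When feature selection precedes classification, a common safeguard is to run the selection step inside each training fold so that the held-out fold never influences which variables are chosen. The sketch below shows that pattern for 5-fold cross validation. Whether it matches the authors' exact protocol is an assumption, and the selector, classifier, and accuracy score used here (`top_gap_column`, `fit_threshold`, `accuracy`) are hypothetical stand-ins for HITON-PC, the classifiers in the table, and AUC.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k shuffled, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(X, y, select: Callable, fit: Callable, score: Callable,
                   k: int = 5, seed: int = 0) -> List[float]:
    """Run feature selection and model fitting on training folds only."""
    results = []
    for train, test in k_fold_splits(len(y), k, seed):
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        cols = select(X_tr, y_tr)  # selection never sees the test fold
        model = fit([[row[c] for c in cols] for row in X_tr], y_tr)
        X_te = [[X[i][c] for c in cols] for i in test]
        results.append(score(model, X_te, [y[i] for i in test]))
    return results


def class_means(values: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    pos = [v for v, l in zip(values, y) if l == 1]
    neg = [v for v, l in zip(values, y) if l == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def top_gap_column(X, y) -> List[int]:
    """Stand-in selector: keep the one column with the largest class-mean gap."""
    def gap(c: int) -> float:
        mp, mn = class_means([row[c] for row in X], y)
        return abs(mp - mn)
    return [max(range(len(X[0])), key=gap)]


def fit_threshold(X, y) -> Tuple[float, bool]:
    """Stand-in classifier: cut point halfway between class means of column 0."""
    mp, mn = class_means([row[0] for row in X], y)
    return (mp + mn) / 2, mp >= mn


def accuracy(model, X, y) -> float:
    cut, high_is_positive = model
    preds = [int((row[0] > cut) == high_is_positive) for row in X]
    return sum(p == l for p, l in zip(preds, y)) / len(y)


# Hypothetical toy data: column 0 is unrelated to the label, column 1 equals it.
y = [i % 2 for i in range(20)]
X = [[(i % 3) / 2.0, float(i % 2)] for i in range(20)]
fold_scores = cross_validate(X, y, top_gap_column, fit_threshold, accuracy)
print(fold_scores)  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Selecting features on the full data set before splitting would let test-fold information leak into the model, which tends to inflate the estimated performance.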
Fig. 4 Frequencies with which causal variables were selected across 100 bootstrap samples. Variables with a selection frequency greater than 20% lie to the left of the dotted line
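Selection frequencies of the kind plotted in Fig. 4 can be computed by resampling subjects with replacement, rerunning the feature selector on each resample, and counting how often each variable is chosen. The following is a minimal sketch of that bookkeeping. The selector `mean_gap_selector` is a hypothetical stand-in (it is not HITON-PC), and the toy records and variable names are invented for illustration.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Set

Row = Dict[str, float]


def selection_frequencies(rows: List[Row], select: Callable[[List[Row]], Set[str]],
                          n_boot: int = 100, seed: int = 0) -> Dict[str, float]:
    """Fraction of bootstrap samples in which each variable is selected."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    n = len(rows)
    for _ in range(n_boot):
        # Draw n subjects with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(set(select(sample)))
    return {name: c / n_boot for name, c in counts.items()}


def mean_gap_selector(sample: List[Row], threshold: float = 0.5) -> Set[str]:
    """Stand-in selector: keep variables whose class means differ by more than threshold."""
    pos = [r for r in sample if r["y"] == 1]
    neg = [r for r in sample if r["y"] == 0]
    if not pos or not neg:
        return set()  # a resample with a single class cannot rank variables
    chosen = set()
    for name in sample[0]:
        if name == "y":
            continue
        gap = abs(sum(r[name] for r in pos) / len(pos) - sum(r[name] for r in neg) / len(neg))
        if gap > threshold:
            chosen.add(name)
    return chosen


# Hypothetical toy records: "a" tracks the outcome "y", "b" is constant.
rows = [{"y": i % 2, "a": float(i % 2), "b": 1.0} for i in range(20)]
freqs = selection_frequencies(rows, mean_gap_selector, n_boot=100)
stable = {name for name, f in freqs.items() if f > 0.20}  # the 20% cut-off from Fig. 4
print(sorted(stable))
```

Variables that are chosen in only a small share of resamples are more likely to reflect sampling noise, which is the motivation for a frequency cut-off such as the 20% line in the figure.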