Joanna F Dipnall, Julie A Pasco, Michael Berk, Lana J Williams, Seetal Dodd, Felice N Jacka, Denny Meyer.
Abstract
BACKGROUND: Depression is commonly comorbid with many other somatic diseases and symptoms. Identifying clusters of individuals with comorbid symptoms may reveal new pathophysiological mechanisms and treatment targets. The aim of this research was to combine machine-learning (ML) algorithms with traditional regression techniques, using self-reported medical symptoms to identify and describe clusters of individuals with increased rates of depression in a large, cross-sectional, community-based population epidemiological study.
Year: 2016 PMID: 27935995 PMCID: PMC5147841 DOI: 10.1371/journal.pone.0167055
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Flowchart of Methods, Testing and Results.
Fig 2. Training progress and SOM plots.
Note: The “Training progress” graph shows that, as SOM training iterations proceed, the distance from each node's weights to the samples represented by that node decreases and then plateaus, indicating that no more iterations were required. The “Counts plot” indicates that a reasonable number of samples was mapped to each node on the map. The “Neighbour distance plot”, or U-Matrix, shows the distance between each node and its neighbours.
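The neighbour-distance (U-Matrix) idea described in the note can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes a rectangular grid with a 4-node neighbourhood and Euclidean distance, and the toy weights are hypothetical.

```python
import math

def u_matrix(weights):
    """Mean Euclidean distance from each node's weight vector to its grid neighbours.

    weights[r][c] is the weight vector of the node at row r, column c.
    Assumption: rectangular grid, up/down/left/right neighbours only.
    """
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            # A 1x1 map has no neighbours; report 0.0 in that case.
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result

# Hypothetical 2x2 map with 2-dimensional weight vectors.
toy_weights = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(3.0, 0.0), (3.0, 1.0)],
]
toy_u = u_matrix(toy_weights)
```

Large values in such a matrix mark nodes that sit far from their neighbours in feature space, which is where cluster boundaries tend to fall.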
Fig 3. Hierarchical Cluster Options for SOM.
Note: Solutions with 3 to 12 clusters mapped onto the SOM grid. Colours indicate different clusters. The final 10-cluster solution selected for further analysis is highlighted with a red border.
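Cutting a hierarchical clustering of SOM node weights at several cluster counts, as the figure does for 3 to 12 clusters, might look like the sketch below. The linkage method is not stated here, so average linkage is an assumption, and the codebook vectors are hypothetical.

```python
import math

def agglomerate(vectors, k):
    """Average-linkage agglomerative clustering; returns one label per vector."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between members of two clusters.
        total = sum(math.dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels

# Hypothetical codebook: one weight vector per SOM node.
codebook = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
solutions = {k: agglomerate(codebook, k) for k in (2, 3)}
```

Each entry of `solutions` assigns every node to a cluster, so the labels can be coloured onto the map grid to compare candidate solutions side by side.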
Frequency Distribution of Initial Depression Ordered SOM Cluster Solution.
| Cluster | Frequency | Percent |
|---|---|---|
| **1** | **3,108** | **79.25** |
| 2 | 34 | 0.87 |
| 3 | 57 | 1.45 |
| **4** | **446** | **11.37** |
| 5 | 50 | 1.27 |
| 6 | 83 | 2.12 |
| 7 | 55 | 1.40 |
| 8 | 52 | 1.33 |
| 9 | 29 | 0.74 |
| 10 (Dropped) | 8 | 0.20 |
Note: Dominant clusters in bold. The shaded cluster (Cluster 10) was dropped because of its very small base (n = 8).
Fig 4. Mean depression scores and percent depression across final depression clusters.
Note: “Mean Depression Score” is the average total PHQ-9 score, which ranges from 0 to 27. “Percent Depressed” is based on a total PHQ-9 score ≥ 10.
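The two quantities plotted in the figure can be illustrated with a short sketch. It assumes the standard PHQ-9 structure of nine items scored 0 to 3 and uses the ≥ 10 cut-off from the note. The totals are hypothetical, and the calculation is unweighted, whereas the study's figures account for the survey design.

```python
def phq9_total(item_scores):
    """Sum nine PHQ-9 item scores (each 0-3) into a total between 0 and 27."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected nine item scores, each 0, 1, 2 or 3")
    return sum(item_scores)

def is_depressed(total, cutoff=10):
    """Apply the total PHQ-9 >= 10 threshold."""
    return total >= cutoff

def summarise(totals):
    """Return (mean depression score, percent depressed) for a list of totals."""
    mean_score = sum(totals) / len(totals)
    percent = 100 * sum(is_depressed(t) for t in totals) / len(totals)
    return mean_score, percent

# Hypothetical PHQ-9 totals for four respondents in one cluster.
example_summary = summarise([2, 12, 9, 17])
```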
Demographic Profile Across SOM Clusters.
| CLUSTER | Total | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Total sample size* | 3,914 | 3,108 | 34 | 57 | 446 | 50 | 83 | 55 | 52 | 29 |
| Male | 49.6% | 50.8% | 60.4% | 60.2% | 38.9% | 43.1% | 63.9% | 41.2% | 31.2% | 57.4% |
| Female | 50.4% | 49.2% | 39.6% | 39.8% | 61.1% | 56.9% | 36.1% | 58.8% | 68.8% | 42.6% |
| | 42.44 | 42.08 | 48.17 | 53.05 | 46.09 | 51.60 | 56.24 | 44.30 | 55.93 | 38.67 |
| Never | 19.3% | 20.4% | 15.0% | 11.6% | 15.8% | 14.6% | 7.8% | 14.7% | 7.6% | 19.6% |
| Married/Partner | 65.3% | 65.5% | 66.4% | 76.6% | 64.7% | 65.2% | 63.9% | 66.8% | 48.9% | 63.6% |
| Widowed/Divorced/Separated | 15.4% | 14.1% | 18.6% | 11.8% | 19.6% | 20.3% | 28.4% | 18.6% | 43.5% | 16.8% |
| | 3.22 | 3.23 | 3.19 | 2.23 | 3.07 | 2.95 | 2.64 | 3.26 | 2.55 | 3.94 |
| | 3.02 | 3.03 | 3.07 | 2.94 | 2.91 | 2.75 | 2.49 | 2.94 | 2.39 | 3.88 |
| Mexican/Hispanic | 14.3% | 14.4% | 6.0% | 17.3% | 15.3% | 13.3% | 5.6% | 27.0% | 2.6% | 45.5% |
| Non-Hispanic white | 67.5% | 67.6% | 78.4% | 66.3% | 67.3% | 54.0% | 74.2% | 55.4% | 81.3% | 26.0% |
| Non-Hispanic black | 11.4% | 10.9% | 15.6% | 5.9% | 12.8% | 27.0% | 17.4% | 14.6% | 10.1% | 16.5% |
| Other | 6.7% | 7.1% | 0.0% | 10.4% | 4.6% | 5.7% | 2.8% | 2.9% | 6.1% | 12.0% |
| Low | 31.0% | 29.2% | 29.9% | 39.8% | 35.7% | 42.6% | 32.7% | 64.9% | 58.9% | 68.8% |
| Middle | 24.0% | 24.5% | 10.5% | 19.0% | 22.9% | 23.4% | 31.3% | 15.9% | 21.5% | 11.2% |
| High | 45.0% | 46.4% | 59.6% | 41.2% | 41.5% | 33.9% | 36.0% | 19.2% | 19.6% | 20.0% |
| (Note: 1 = poverty line) | 3.03 | 3.16 | 3.39 | 2.80 | 2.89 | 2.68 | 2.91 | 1.82 | 2.09 | 1.68 |
Note: Figures quoted take account of the survey design of NHANES, with 15 strata and 31 Primary Sampling Units (PSUs).
*Total sample size varies per demographic because the base includes all those with a depression score and a valid answer for that demographic.
**Family income poverty ratio is the ratio of family or unrelated individual income to the appropriate poverty threshold. Groupings are based on eligibility for the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC): Low = 0.00–1.85, Middle = >1.85–3.50, and High = >3.50.
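The income grouping in the footnote can be written as a small function. This is a sketch using only the cut-points quoted above. The function name is hypothetical.

```python
def income_group(poverty_ratio):
    """Map a family income poverty ratio to the Low / Middle / High grouping."""
    if poverty_ratio < 0:
        raise ValueError("poverty ratio cannot be negative")
    if poverty_ratio <= 1.85:
        return "Low"      # 0.00 to 1.85
    if poverty_ratio <= 3.50:
        return "Middle"   # above 1.85, up to 3.50
    return "High"         # above 3.50

# A ratio of 1.0 corresponds to the poverty line and falls in the Low group.
example_groups = [income_group(r) for r in (1.0, 1.85, 1.86, 3.50, 3.51)]
```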
Binary Logistic Regression Model Odds Ratios with 95% Confidence Intervals.
| Depression | OR | p-value | 95% CI Low | 95% CI High |
|---|---|---|---|---|
| Cluster 1 | 1.00 | | | |
| Cluster 2 | 1.67 | 0.341 | 0.55 | 5.04 |
| Cluster 3 | 1.98 | 0.151 | 0.76 | 5.20 |
| Cluster 4 | 2.24 | | 1.56 | 3.24 |
| Cluster 5 | 2.10 | 0.180 | 0.68 | 6.43 |
| Cluster 6 | 3.78 | | 2.17 | 6.57 |
| Cluster 7 | 4.61 | | 2.21 | 9.63 |
| Cluster 8 | 7.80 | | 2.86 | 21.33 |
| Cluster 9 | 6.33 | | 1.67 | 24.02 |
| | 2.00 | | 1.05 | 3.81 |
| | 1.00 | | | |
| Female | 1.86 | | 1.31 | 2.64 |
| | 1.00 | | | |
| 25–34 | 1.37 | 0.326 | 0.71 | 2.63 |
| 35–44 | 1.61 | 0.177 | 0.79 | 3.29 |
| 45–54 | 1.92 | | 1.11 | 3.34 |
| 55+ | 1.22 | 0.545 | 0.62 | 2.39 |
| | 1.00 | | | |
| Married/living with partner | 0.54 | | 0.35 | 0.82 |
| Widowed/Divorced/Separated | 0.79 | 0.172 | 0.55 | 1.12 |
| Gender | | | | |
| | 1.00 | | | |
| Mexican American / Hispanic | 0.88 | 0.368 | 0.67 | 1.17 |
| Non-Hispanic Black | 1.17 | 0.436 | 0.77 | 1.76 |
| Other | 0.77 | 0.391 | 0.42 | 1.43 |
| | 1.00 | | | |
| High School / GED Equivalent | 0.43 | | 0.20 | 0.95 |
| Some College / AA / College or Above | 0.59 | | 0.40 | 0.85 |
| | 0.60 | | 0.45 | 0.80 |
| | 1.00 | | | |
| High School / GED Equivalent | 1.43 | 0.073 | 0.96 | 2.11 |
| Some College / AA / College or Above | 1.24 | 0.089 | 0.96 | 1.58 |
| | 0.16 | <0.001 | 0.08 | 0.34 |
Note: OR = Odds Ratio, CI = Confidence Interval. Multivariate logistic model taking account of complex survey methodology (N = 3,584, 15 Strata, 32 PSUs). Bold p-values indicate significance at p < 0.05. Cluster 9 OR = 12.67 (95% CI: 1.75, 91.56) taking into account the interaction.
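For readers less familiar with how odds ratios and their intervals relate to logistic regression coefficients, the sketch below shows the usual conversion. It assumes a normal critical value of 1.96. Survey-design software may use a t-based value instead, so this is not necessarily the exact calculation behind the table. The coefficient and standard error are hypothetical; the final check uses the reported Cluster 2 row.

```python
import math

def odds_ratio_ci(beta, se, crit=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, low, high)."""
    return (
        math.exp(beta),
        math.exp(beta - crit * se),
        math.exp(beta + crit * se),
    )

def or_from_ci(low, high):
    """Recover the OR implied by an interval that is symmetric on the log scale."""
    return math.sqrt(low * high)

# Hypothetical coefficient and standard error, chosen only for illustration.
example_or, example_low, example_high = odds_ratio_ci(math.log(2.0), 0.25)

# Consistency check on the reported Cluster 2 row (OR 1.67, 95% CI 0.55 to 5.04).
cluster2_implied_or = or_from_ci(0.55, 5.04)
```

Because the interval is built on the log scale, the geometric mean of its bounds should land close to the reported OR, which gives a quick way to sanity-check a table row.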
Fig 5. Predicted probability of depression across age and family income poverty ratio for each cluster.
Fig 6. Importance of medical categories that make up the key significant clusters.
Note: Based on the total boosted relative importance percentage across all clusters. Percentages from the boosted regressions are summed across all five key significant clusters, so the total exceeds 100%.
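Why the summed percentages exceed 100% can be seen in a small sketch. It assumes each cluster's relative importances are scaled to sum to 100% within that cluster. The cluster names, category names, and numbers are all hypothetical.

```python
from collections import defaultdict

def total_importance(per_cluster):
    """Sum relative importance percentages for each category across clusters."""
    totals = defaultdict(float)
    for importances in per_cluster.values():
        for category, pct in importances.items():
            totals[category] += pct
    return dict(totals)

# Hypothetical importances; each cluster's values sum to 100%.
per_cluster = {
    "cluster_a": {"category_x": 60.0, "category_y": 40.0},
    "cluster_b": {"category_x": 70.0, "category_y": 30.0},
}
summed = total_importance(per_cluster)
```

With two clusters the grand total is 200%, so a single category can exceed 100% on its own. With five clusters the same logic gives a grand total of 500%.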
Fig 7. Total percentage importance of medical conditions for each key significant cluster.
Note: Clusters are presented in order of percent depressed. The percentage sum does not take account of the direction of the relationship.