Yashu Liu1, Lynn Yieh2, Tao Yang1, Wilhelmus Drinkenburg3, Pieter Peeters3, Thomas Steckler3, Vaibhav A Narayan4, Gayle Wittenberg4, Jieping Ye5.
Abstract
BACKGROUND: Major depressive disorder (MDD) is a heterogeneous disease at the level of clinical symptoms, and this heterogeneity is likely reflected at the level of biology. Two clinical subtypes within MDD that have garnered interest are "melancholic depression" and "anxious depression". Metabolomics enables us to characterize hundreds of small molecules that comprise the metabolome, and recent work suggests that the blood metabolome may be able to inform treatment decisions for MDD; however, this work is at an early stage. Here we examine a metabolomics data set to (1) test whether clinically homogeneous MDD subtypes are also more biologically homogeneous, and hence more predictable, (2) devise a robust machine learning framework that preserves biological meaning, and (3) describe the metabolomic biosignature for melancholic depression.
Keywords: Biomarker; Classification; Major depressive disorder; Melancholic depression; Metabolomics
Year: 2016 PMID: 27549765 PMCID: PMC4994306 DOI: 10.1186/s12864-016-2953-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
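Characteristics of the subjects in the metabolite data set used for the classification of melancholic depression. Continuous variables are reported as mean (standard deviation, minimum-maximum). HC, healthy controls; MDD, major depressive disorder; HAMD, Hamilton Depression Rating Scale score.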
| | HC | MDD | Melancholic depressed | Anxious depressed |
|---|---|---|---|---|
| # of samples | 97 | 90 | 21 | 58 |
| Age | 38.49 (14.72, 16.25-74.38) | 39.70 (14.10, 18.66-76.75) | 40.59 (12.69, 19.86-68.97) | 39.78 (13.95, 18.66-68.97) |
| Gender (% female) | 60.82 % | 63.33 % | 57.14 % | 60.34 % |
| Education | 14.85 (2.41, 7-18) | 13.87 (2.95, 3-18) | 13.81 (3.72, 3-18) | 14.36 (2.55, 9-18) |
| HAMD | 0.28 (0.72, 0-4) | 21.90 (3.49, 18-34) | 24.57 (4.46, 19-34) | 22.14 (3.61, 18-34) |
| CORE | - | 5.43 (4.21, 0-23) | 11.24 (3.96, 8-23) | 5.71 (4.50, 0-23) |
Fig. 2 Analytic workflow aimed at maximizing both predictive power and biological interpretability. a The workflow comprises (1) correction of each individual metabolite for storage-time effects, (2) imputation of missing data, (3) feature clustering, (4) classification using an ensemble learning framework, (5) selection of top clusters/features, and (6) pathway analysis and biological interpretation. b Example of a cluster containing 5 metabolites. Highly correlated features are grouped into clusters using K-means or hierarchical clustering. For each cluster, the cluster centroid is computed and used as a feature for ensemble learning. Subsequent pathway analysis includes all members of the top clusters. c Illustration of the ensemble learning framework. Given the imbalanced training data, we randomly undersample the training data multiple times, and then perform feature selection and classification on each undersampled dataset. Finally, we combine all classifiers to make the final prediction and report the top cluster features.
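Panel b replaces each cluster of correlated metabolites with its centroid. The Python sketch below assumes the cluster assignments are already available (from K-means or hierarchical clustering) and assumes each metabolite is z-scored before averaging so that members measured on different scales contribute equally; the function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(column):
    """Standardize one metabolite (assumption: z-scoring precedes averaging)."""
    mu, sd = mean(column), pstdev(column)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in column]


def cluster_centroid_features(data, clusters):
    """Replace each cluster of metabolites with its centroid.

    data: dict mapping metabolite name -> list of values, one per subject
    clusters: dict mapping cluster id -> list of member metabolite names
    Returns a dict mapping cluster id -> centroid value for each subject.
    """
    standardized = {name: zscore(values) for name, values in data.items()}
    features = {}
    for cluster_id, members in clusters.items():
        member_columns = [standardized[name] for name in members]
        # zip(*...) walks subject by subject across all member metabolites
        features[cluster_id] = [mean(subject) for subject in zip(*member_columns)]
    return features


# Hypothetical toy data: three subjects, three metabolites, two clusters.
data = {"m1": [1.0, 2.0, 3.0], "m2": [2.0, 4.0, 6.0], "m3": [5.0, 5.0, 8.0]}
clusters = {"c1": ["m1", "m2"], "c2": ["m3"]}
features = cluster_centroid_features(data, clusters)
print(features)
```

Because m1 and m2 are perfectly correlated, their standardized values coincide and the centroid of c1 equals either member; all members of a top-ranked cluster are then carried forward to pathway analysis.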
Fig. 1 Metabolites classify Melancholic Depression from Healthy Controls with greater accuracy than MDD as a whole or Anxious Depression. a Classifiers for 90 MDD, 58 Anxious Depressed, and 21 Melancholic Depressed subjects were trained against 97 HC subjects (96 for the Anxious Depression classification, as described in the Methods). b The table includes results using kNN imputation, Random Forest classification with individual metabolites as features, and the feature selection method that gave the highest accuracy (Fisher, Gini, T-test, or Stability) for each comparison.
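Panel c of Fig. 2 outlines how the class imbalance in these comparisons (for example, 21 melancholic versus 97 HC subjects) is handled: repeated random undersampling of the majority class, one classifier per balanced subset, and a combined prediction. The sketch below assumes majority voting as the combination rule and uses a hypothetical nearest-centroid classifier as a stand-in for the Random Forest and SVM models reported here; the per-subset feature selection step is omitted, and all names and toy data are illustrative.

```python
import random
from collections import Counter
from statistics import mean


def fit_nearest_centroid(X, y):
    """Stand-in base learner (not the paper's RF/SVM): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_nearest_centroid(centroids, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_undersampled_ensemble(X, y, n_models=25, seed=0):
    """Each model sees all minority-class samples plus an equally sized random
    draw (without replacement) from the majority-class samples."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, t in enumerate(y) if t == minority]
    majority_idx = [i for i, t in enumerate(y) if t != minority]
    models = []
    for _ in range(n_models):
        idx = minority_idx + rng.sample(majority_idx, len(minority_idx))
        models.append(fit_nearest_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_ensemble(models, x):
    """Majority vote over base learners (odd n_models avoids two-class ties)."""
    votes = Counter(predict_nearest_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data: 20 "HC" points near the origin, 5 "MEL" points near (2, 2).
X = [[(i % 5) * 0.1, (i // 5) * 0.1] for i in range(20)] + [[2 + 0.1 * i, 2.0] for i in range(5)]
y = ["HC"] * 20 + ["MEL"] * 5
models = fit_undersampled_ensemble(X, y)
print(predict_ensemble(models, [0.2, 0.1]), predict_ensemble(models, [2.1, 2.0]))  # HC MEL
```

Undersampling keeps every minority subject in every model while each model sees a different subset of controls, so no single balanced draw dominates the final vote.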
Comparison of the classification performance obtained by Random Forest. For three feature representations (raw features, K-means cluster representatives, and hierarchical-clustering cluster representatives), we compare four imputation methods (halfMin, kNN3, EM, and SVD) and four feature selection methods (Fisher, Gini, T-test, and Stability), as described in the Methods. The configuration used for subsequent pathway analysis was kNN3 imputation with K-means cluster representatives.
| Imputation | halfMin | | | | kNN3 | | | | EM | | | | SVD | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FS method | Fisher | Gini | T-test | Stability | Fisher | Gini | T-test | Stability | Fisher | Gini | T-test | Stability | Fisher | Gini | T-test | Stability |
| Raw Features | ||||||||||||||||
| Accuracy | 80.42 % | 80.36 % | 80.42 % | 80.34 % | 78.68 % | 80.43 % | 77.84 % | 83.84 % | 77.84 % | 76.87 % | 77.84 % | 78.74 % | 77.84 % | 76.11 % | 77.84 % | 81.18 % |
| Sensitivity | 73.33 % | 76.67 % | 73.33 % | 76.67 % | 73.33 % | 76.67 % | 73.33 % | 76.67 % | 73.33 % | 71.67 % | 73.33 % | 68.33 % | 73.33 % | 66.67 % | 73.33 % | 76.67 % |
| Specificity | 82.22 % | 81.22 % | 82.22 % | 81.11 % | 80.22 % | 81.33 % | 79.11 % | 85.44 % | 79.11 % | 78.00 % | 79.11 % | 81.11 % | 79.11 % | 78.11 % | 79.11 % | 82.11 % |
| Cluster-Representatives (K-means) | ||||||||||||||||
| Accuracy | 77.92 % | 80.50 % | 77.92 % | 79.51 % | - | - | - | - | 77.07 % | 77.16 % | 75.34 % | 74.15 % | 78.74 % | 78.74 % | 80.48 % | 79.73 % |
| Sensitivity | 83.33 % | 88.33 % | 83.33 % | 86.67 % | - | - | - | - | 78.33 % | 78.33 % | 78.33 % | 75.00 % | 78.33 % | 78.33 % | 73.33 % | 78.33 % |
| Specificity | 77.22 % | 79.22 % | 77.22 % | 78.11 % | - | - | - | - | 77.11 % | 77.22 % | 75.11 % | 74.33 % | 79.11 % | 79.11 % | 82.33 % | 80.33 % |
| Cluster-Representatives (Hierarchical Clustering) | ||||||||||||||||
| Accuracy | 78.06 % | 75.48 % | 78.06 % | 77.01 % | 79.59 % | 79.66 % | 80.50 % | 77.24 % | 73.79 % | 73.79 % | 73.73 % | 69.62 % | 75.47 % | 74.64 % | 76.38 % | 70.38 % |
| Sensitivity | 73.33 % | 78.33 % | 73.33 % | 83.33 % | 78.33 % | 83.33 % | 78.33 % | 78.33 % | 73.33 % | 80.00 % | 73.33 % | 70.00 % | 70.00 % | 70.00 % | 70.00 % | 70.00 % |
| Specificity | 79.33 % | 75.22 % | 79.33 % | 76.00 % | 80.11 % | 79.22 % | 81.22 % | 77.33 % | 74.00 % | 72.89 % | 74.00 % | 69.89 % | 77.11 % | 76.11 % | 78.22 % | 70.89 % |
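The table above compares imputation methods by name only. A hedged Python sketch of two of them follows, assuming halfMin fills a missing value with half of the smallest observed value of that metabolite, and assuming kNN3 means k-nearest-neighbour imputation over subjects with k = 3. The distance (root-mean-square difference over co-observed metabolites) is an assumption, EM and SVD imputation are not shown, and missing values are represented as `None`.

```python
import math


def half_min_impute(column):
    """Assumed halfMin rule: fill gaps with half the metabolite's smallest observed value."""
    observed = [v for v in column if v is not None]
    fill = min(observed) / 2
    return [fill if v is None else v for v in column]


def knn_impute(rows, k=3):
    """Assumed kNN3 rule: rows are subjects, columns are metabolites.
    A missing entry becomes the mean of that metabolite over the k nearest
    subjects that observed it."""
    def distance(a, b):
        shared = [(p, q) for p, q in zip(a, b) if p is not None and q is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((p - q) ** 2 for p, q in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no other subject observed this metabolite
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


print(half_min_impute([4.0, None, 2.0]))  # [4.0, 1.0, 2.0]
rows = [[1.0, 10.0], [1.1, 11.0], [0.9, 12.0], [5.0, 50.0], [1.0, None]]
print(knn_impute(rows)[4])  # [1.0, 11.0]
```

halfMin treats a gap as a value below the detection limit, whereas kNN borrows from similar subjects; in practice kNN distances are usually computed on standardized data.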
Comparison of the classification performance obtained by Support Vector Machines. For three feature representations (raw features, K-means cluster representatives, and hierarchical-clustering cluster representatives), we compare four imputation methods (halfMin, kNN3, EM, and SVD) and four feature selection methods (Fisher, Gini, T-test, and Stability), as described in the Methods.
| Imputation | halfMin | | | | kNN3 | | | | EM | | | | SVD | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FS method | Fisher | Gini | T-test | Stability | Fisher | Gini | T-test | Stability | Fisher | Gini | T-test | Stability | Fisher | Gini | T-test | Stability |
| Raw Features | ||||||||||||||||
| Accuracy | 72.13 % | 79.52 % | 72.13 % | 74.64 % | 71.08 % | 77.77 % | 72.74 % | 75.68 % | 71.22 % | 78.68 % | 74.56 % | 73.73 % | 71.99 % | 78.74 % | 71.85 % | 75.54 % |
| Sensitivity | 65.00 % | 71.67 % | 65.00 % | 70.00 % | 65.00 % | 71.67 % | 65.00 % | 70.00 % | 60.00 % | 73.33 % | 60.00 % | 65.00 % | 65.00 % | 73.33 % | 68.33 % | 75.00 % |
| Specificity | 74.11 % | 81.33 % | 74.11 % | 76.11 % | 72.78 % | 79.00 % | 74.78 % | 77.22 % | 74.00 % | 80.11 % | 78.11 % | 76.00 % | 73.89 % | 80.11 % | 72.78 % | 76.11 % |
| Cluster-Representatives (K-means) | ||||||||||||||||
| Accuracy | 77.83 % | 80.20 % | 78.68 % | 79.73 % | 78.06 % | 80.57 % | 79.66 % | 76.46 % | 72.96 % | 71.14 % | 73.79 % | 74.00 % | 79.65 % | 79.43 % | 79.65 % | 72.67 % |
| Sensitivity | 78.33 % | 86.67 % | 81.67 % | 78.33 % | 80.00 % | 83.33 % | 83.33 % | 75.00 % | 65.00 % | 70.00 % | 70.00 % | 70.00 % | 78.33 % | 76.67 % | 78.33 % | 71.67 % |
| Specificity | 78.00 % | 79.00 % | 78.11 % | 80.33 % | 78.22 % | 80.44 % | 79.22 % | 77.33 % | 75.00 % | 71.78 % | 75.00 % | 75.11 % | 80.11 % | 80.00 % | 80.11 % | 72.78 % |
| Cluster-Representatives (Hierarchical Clustering) | ||||||||||||||||
| Accuracy | 76.15 % | 79.28 % | 76.23 % | 77.01 % | 77.92 % | 80.64 % | 77.92 % | 74.70 % | 72.68 % | 74.55 % | 72.68 % | 71.23 % | 75.96 % | 78.60 % | 76.79 % | 70.10 % |
| Sensitivity | 85.00 % | 83.33 % | 85.00 % | 78.33 % | 78.33 % | 75.00 % | 78.33 % | 75.00 % | 80.00 % | 80.00 % | 80.00 % | 75.00 % | 63.33 % | 70.00 % | 73.33 % | 68.33 % |
| Specificity | 74.78 % | 78.78 % | 74.89 % | 77.11 % | 78.11 % | 82.33 % | 78.11 % | 75.11 % | 71.67 % | 73.78 % | 71.67 % | 71.00 % | 79.00 % | 81.00 % | 77.89 % | 70.78 % |
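Two of the four feature selection criteria in these tables, Fisher and T-test, are univariate filters, and their columns often coincide. A minimal Python sketch of ranking features (individual metabolites or cluster centroids) by each criterion, assuming the common class-size-weighted Fisher score and a Welch t-statistic ranked by absolute value; Gini importance and Stability selection require a fitted model and resampling and are not shown. Function names and toy values are hypothetical.

```python
import math
from statistics import mean, pvariance, variance


def fisher_score(values, labels):
    """Between-class over within-class scatter:
    sum_k n_k (mu_k - mu)^2 / sum_k n_k sigma_k^2."""
    mu = mean(values)
    between = within = 0.0
    for label in set(labels):
        group = [v for v, t in zip(values, labels) if t == label]
        between += len(group) * (mean(group) - mu) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else math.inf


def abs_welch_t(values, labels, positive="MEL"):
    """Absolute Welch t-statistic between the positive class and the rest."""
    a = [v for v, t in zip(values, labels) if t == positive]
    b = [v for v, t in zip(values, labels) if t != positive]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se


def rank_features(features, labels, score):
    """Feature names ordered from most to least discriminative."""
    return sorted(features, key=lambda name: score(features[name], labels), reverse=True)


labels = ["HC"] * 3 + ["MEL"] * 3
features = {"f1": [1.0, 5.0, 3.0, 4.0, 2.0, 6.0], "f2": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]}
print(rank_features(features, labels, fisher_score))  # ['f2', 'f1']
print(rank_features(features, labels, abs_welch_t))   # ['f2', 'f1']
```

Both scores reward a large mean difference relative to within-group spread, which is why they tend to select the same top features.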
Fig. 3 Classification accuracy obtained by Random Forest and SVM in permutation tests. Missing values in the data set were imputed with the kNN3 method. The performance obtained in the original (unpermuted) experiment is shown for reference.
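Fig. 3 compares accuracy on the true labels with accuracy after the labels are permuted. A minimal sketch of such a label-permutation test in Python, assuming `evaluate` is a hypothetical callable that reruns the whole pipeline (imputation, feature selection, training, and testing) on a given label vector and returns accuracy, and using the common (1 + count) / (1 + n) empirical p-value. The toy threshold "pipeline" is purely illustrative.

```python
import random


def permutation_test(evaluate, labels, n_permutations=1000, seed=0):
    """Compare the observed score with scores obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = evaluate(labels)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # break any real link between features and labels
        null_scores.append(evaluate(shuffled))
    # +1 in numerator and denominator counts the observed labelling itself,
    # so the p-value can never be exactly zero
    exceed = sum(score >= observed for score in null_scores)
    p_value = (1 + exceed) / (1 + n_permutations)
    return observed, null_scores, p_value


# Toy pipeline: a fixed threshold on one feature, scored by plain accuracy.
values = [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2]
true_labels = [0, 0, 0, 0, 1, 1, 1, 1]


def toy_evaluate(labels):
    predictions = [int(v > 0.5) for v in values]
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


observed, null_scores, p_value = permutation_test(toy_evaluate, true_labels, n_permutations=200)
print(observed, p_value)
```

If the null accuracies cluster near chance while the observed accuracy sits well above them, the classifier is exploiting real label-related signal rather than artifacts of the pipeline.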
Unique metabolites included in pathway analysis. Metabolites from the kNN3-imputed, K-means-clustered data were analyzed using IPA (Ingenuity Pathway Analysis). Standardized KEGG nomenclature is included together with the metabolite class and the rank of each metabolite's cluster representative under the four feature selection methods. Metabolites were selected for pathway analysis if they were members of a cluster that was among the top 15 cluster centroids selected by Gini, Stability, Fisher, or T-test. Of the 76 selected metabolites, 48 were selected by 3 of the 4 methods, and 19 of the remaining 28 were selected only by Stability.
| Metabolite name | KEGG | Metabolite class | Cluster rank (Gini) | Cluster rank (Stability) | Cluster rank (Fisher) | Cluster rank (T-test) |
|---|---|---|---|---|---|---|
| Triacylglyceride hydroperoxide (C18:1,C18:2,C18:2-OOH) (additional: Triacylglyceride hydroperoxide (C16:0,C18:1,C20:4-OOH), Triacylglyceride hydroperoxide (C18:1,C18:1,C18:3-OOH)) | -- | Lipid Hydroperoxides | 1 | 1 | 6 | 2 |
| Triacylglyceride hydroperoxide (C16:0,C18:1,C18:2-OOH) | -- | Lipid Hydroperoxides | 1 | 1 | 6 | 2 |
| Cysteine | C00097 | Amino acids | 2 | 2 | 1 | 1 |
| Cystine | C00491 | Amino acids | 2 | 2 | 1 | 1 |
| Pseudouridine | C02067 | Nucleobases (and related) | 2 | 2 | 1 | 1 |
| Unknown(28100470) | -- | Unknown lipid | 3 | 22 | 8 | 12 |
| Conjugated linoleic acid (C18:trans[10]cis[12]2) | -- | Fatty acids | 3 | 22 | 8 | 12 |
| Heptadecanoic acid (C17:0) | -- | Fatty acids | 3 | 22 | 8 | 12 |
| Phenylalanine | C00079 | Amino acids | 4 | 19 | 2 | 3 |
| Lysine | C00047 | Amino acids | 4 | 19 | 2 | 3 |
| Methionine | C00073 | Amino acids | 4 | 19 | 2 | 3 |
| Tyrosine | C00082 | Amino acids | 4 | 19 | 2 | 3 |
| Alanine | C00041 | Amino acids | 4 | 19 | 2 | 3 |
| Histamine | C00388 | Catecholamines and other monoamines | 5 | 12 | 12 | 9 |
| Serotonin | C00780 | Catecholamines and other monoamines | 5 | 12 | 12 | 9 |
| Fumarate | C00122 | Energy metabolism and related | 6 | 4 | 3 | 4 |
| Normetanephrine | C05589 | Catecholamines and other monoamines | 6 | 4 | 3 | 4 |
| Sphingomyelin (d18:1/C16:0) | C00550 | Sphingolipids | 7 | 17 | 16 | 22 |
| Unknown(58100162) | -- | Unknown polar | 7 | 17 | 16 | 22 |
| DAG (C18:1, C18:2) [seel] | | Glycerides (Mono-, Di-, Triglycerides) | 8 | 63 | 5 | 6 |
| TAG (containing C16:1/C18:1 or C16:0/C18:2) | C00042 | Glycerides (Mono-, Di-, Triglycerides) | 8 | 63 | 5 | 6 |
| TAG (C55H100O6) (e.g. C16:0,C18:1,C18:2) | C00042 | Glycerides (Mono-, Di-, Triglycerides) | 8 | 63 | 5 | 6 |
| TAG (C55H96O6) or (C50H100O6) (e.g. C16:0,C18:2,C18:3 or C16:0,C16:0,C18:2) | C00042 | Glycerides (Mono-, Di-, Triglycerides) | 8 | 63 | 5 | 6 |
| TAG (containing C18:1/C18:2) | C00042 | Glycerides (Mono-, Di-, Triglycerides) | 8 | 63 | 5 | 6 |
| TAG (C55H98O6) (e.g. C16:0,C18:2,C18:2) | C00042 | Glycerides (Mono-, Di-, Triglycerides) | 8 | 63 | 5 | 6 |
| Unknown(28100099) | -- | Unknown lipid | 9 | 11 | 4 | 14 |
| Urea | C00086 | Amino acid derivatives | 9 | 11 | 4 | 14 |
| Unknown(68100044) | -- | Unknown lipid | 10 | 61 | 10 | 10 |
| TAG (containing C18:2/C18:3) | C00042 | Glycerides (Mono-, Di-, Triglycerides) | 10 | 61 | 10 | 10 |
| Unknown(68100059) | -- | Unknown lipid | 10 | 61 | 10 | 10 |
| TAG (C55H100O6) (e.g. C16:0,C18:1,C18:2) 1 | C00042 | Glycerides (Mono-, Di-, Triglycerides) | 10 | 61 | 10 | 10 |
| Linoleic acid (C18:cis[9,12]2) | C01595 | Fatty acids | 10 | 61 | 10 | 10 |
| TAG (containing C18:2,C18:2) | C00042 | Glycerides (Mono-, Di-, Triglycerides) | 10 | 61 | 10 | 10 |
| Ceramide (d18:1/C24:0) | C00195 | Sphingolipids | 11 | 83 | 9 | 7 |
| Palmitic acid (C16:0) | C00249 | Fatty acids | 11 | 83 | 9 | 7 |
| Oleic acid (C18:cis[9]1) | C00712 | Fatty acids | 11 | 83 | 9 | 7 |
| Stearic acid (C18:0) | C01530 | Fatty acids | 11 | 83 | 9 | 7 |
| Glycerol, lipid fraction | C00116 | Cholesterol and fatty alcohols | 11 | 83 | 9 | 7 |
| Ceramide (d18:1/C24:0) | C00195 | Sphingolipids | 11 | 83 | 9 | 7 |
| Eicosenoic acid (C20:cis[11]1) | C16526 | Fatty acids | 11 | 83 | 9 | 7 |
| Dodecanol | C02277 | Cholesterol and fatty alcohols | 11 | 83 | 9 | 7 |
| Metanephrine | C05588 | Catecholamines and other monoamines | 12 | 7 | 11 | 8 |
| Ribonic acid | C01685 | Carbohydrates and related | 13 | 45 | 17 | 18 |
| myo-Inositol | C00137 | Carbohydrates and related | 13 | 45 | 17 | 18 |
| Unknown(68100045) | -- | Unknown lipid | 14 | 57 | 13 | 13 |
| Unknown(58100165) | -- | Unknown polar | 14 | 57 | 13 | 13 |
| Phosphatidylcholine (C16:0/C18:2) | C00157 | Phospholipids | 14 | 57 | 13 | 13 |
| Leucine | C00123 | Amino acids | 15 | 50 | 7 | 5 |
| Valine | C00183 | Amino acids | 15 | 50 | 7 | 5 |
| Isoleucine | C00407 | Amino acids | 15 | 50 | 7 | 5 |
| TAG#1 | C00042 | Glycerides (Mono-, Di-, Triglycerides) | 16 | 89 | 15 | 15 |
| TAG (containing C16:0/C16:1 or C14:0/C18:1) | C00042 | Glycerides (Mono-, Di-, Triglycerides) | 16 | 89 | 15 | 15 |
| Palmitoleic acid | C08362 | Fatty acids | 16 | 89 | 15 | 15 |
| Myristic acid (C14:0) | C06424 | Fatty acids | 16 | 89 | 15 | 15 |
| Pentadecanol | | Cholesterol and fatty alcohols | 16 | 89 | 15 | 15 |
| Indole-3-propionic acid | | Amino acid derivatives | 17 | 13 | 22 | 16 |
| Arginine | C00062 | Amino acids | 17 | 13 | 22 | 16 |
| Elaidic acid | C01712 | Fatty acids | 18 | 10 | 14 | 11 |
| Ratio Glu_versus_Gln | -- | -- | 18 | 10 | 14 | 11 |
| O-Phospho-L-tyrosine | C06501 | Amino acid derivatives | 19 | 9 | 34 | 26 |
| Unknown(58100024) | -- | Unknown polar | 19 | 9 | 34 | 26 |
| Unknown(38100389) | -- | Unknown polar | 20 | 6 | 19 | 19 |
| Unknown(38100468) | -- | Unknown polar | 20 | 6 | 19 | 19 |
| Dopamine | C03758 | Catecholamines and other monoamines | 20 | 6 | 19 | 19 |
| Unknown(68100052) | -- | Unknown lipid | 32 | 8 | 33 | 30 |
| Arachidonic acid (C20:cis-[5,8,11,14]4) | C00219 | Fatty acids | 32 | 8 | 33 | 30 |
| Phosphatidylcholine #8 | C00157 | Phospholipids | 32 | 8 | 33 | 30 |
| Cholic acid | C00695 | Miscellaneous | 44 | 5 | 25 | 17 |
| Indole-3-acetic acid | C00954 | Amino acid derivatives | 44 | 5 | 25 | 17 |
| Cortisol | C00735 | Hormones and related | 65 | 3 | 40 | 28 |
| Corticosterone | C02140 | Hormones and related | 65 | 3 | 40 | 28 |
| Androstenedione | C00280 | Hormones and related | 65 | 3 | 40 | 28 |
| Threonic acid | C01620 | Vitamins, cofactors and related | 74 | 15 | 57 | 58 |
| Glyceric acid | C00258 | Miscellaneous | 74 | 15 | 57 | 58 |
| Unknown(68100002) | -- | Unknown lipid | 76 | 14 | 42 | 35 |
| Lysophosphatidylcholine (16:0) | C04230 | Phospholipids | 76 | 14 | 42 | 35 |
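The selection rule in the caption (keep a metabolite if its cluster ranks in the top 15 under at least one feature selection method) can be sketched in Python as below. The rank values are copied from three rows of the table, read as Gini, Stability, Fisher, and T-test in column order (an assumption about the column order); the fourth entry is a hypothetical metabolite that no method selects.

```python
def select_for_pathway_analysis(ranks, top_k=15):
    """Keep metabolites whose cluster ranks within top_k under any method.

    ranks: dict metabolite -> dict method -> cluster rank (1 = best).
    Returns dict metabolite -> list of methods that selected it.
    """
    selected = {}
    for metabolite, by_method in ranks.items():
        methods = [method for method, rank in by_method.items() if rank <= top_k]
        if methods:
            selected[metabolite] = methods
    return selected


ranks = {
    "Cysteine": {"Gini": 2, "Stability": 2, "Fisher": 1, "T-test": 1},
    "TAG#1": {"Gini": 16, "Stability": 89, "Fisher": 15, "T-test": 15},
    "Cortisol": {"Gini": 65, "Stability": 3, "Fisher": 40, "T-test": 28},
    "Hypothetical": {"Gini": 20, "Stability": 30, "Fisher": 40, "T-test": 50},
}
for name, methods in select_for_pathway_analysis(ranks).items():
    print(name, len(methods), methods)
```

Because every member of a cluster shares its cluster's ranks, the rule effectively selects whole clusters; counting the methods per metabolite gives the "selected by 3 of the 4 methods" and "selected only by Stability" tallies in the caption.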
Fig. 4 Network analysis of metabolites. IPA network analysis was used to view connections between selected metabolites. Several classes and biological functions were observed and are highlighted in the diagram. Red-filled metabolites are increased in melancholic depressed subjects compared with healthy controls, while green-filled metabolites are decreased. Higher color intensity indicates larger changes. Fold changes range from 1.07 to 1.94 for increases and from -1.02 to -1.71 for decreases.
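The caption reports decreases as negative fold changes (for example, -1.71). A small Python sketch of that signed convention, assuming a decrease is expressed as the negative reciprocal of the case-to-control ratio (so no value falls between -1 and 1); the function name and inputs are hypothetical.

```python
def signed_fold_change(case_mean, control_mean):
    """Ratio of means; ratios below 1 are reported as -1/ratio (assumed convention)."""
    ratio = case_mean / control_mean
    return ratio if ratio >= 1 else -1 / ratio


print(signed_fold_change(1.94, 1.0))  # 1.94
print(signed_fold_change(2.0, 4.0))   # -2.0
```

Under this convention a case-to-control ratio of 1/1.71 (about 0.585) is reported as -1.71, which makes increases and decreases of the same magnitude directly comparable.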