Nathan Hwangbo1, Xinyu Zhang2, Daniel Raftery2, Haiwei Gu2, Shu-Ching Hu3,4, Thomas J Montine5, Joseph F Quinn6,7, Kathryn A Chung6,7, Amie L Hiller6,7, Dongfang Wang2, Qiang Fei2, Lisa Bettcher2, Cyrus P Zabetian3,4, Elaine R Peskind3,8, Ge Li3,8, Daniel E L Promislow9,10, Marie Y Davis3,4, Alexander Franks1.
Abstract
In recent years, metabolomics has been used as a powerful tool to better understand the physiology of neurodegenerative diseases and to identify potential biomarkers for progression. We used targeted aqueous, untargeted aqueous, and lipidomic profiles of the metabolome from human cerebrospinal fluid to build multivariate predictive models distinguishing patients with Alzheimer's disease (AD), Parkinson's disease (PD), and healthy age-matched controls. We emphasize several statistical challenges associated with metabolomic studies in which the number of measured metabolites far exceeds the sample size. We found strong separation in the metabolome between PD and controls, as well as between PD and AD, with weaker separation between AD and controls. Consistent with existing literature, we found alanine, kynurenine, tryptophan, and serine to be associated with PD classification against controls, while alanine, creatine, and long chain ceramides were associated with AD classification against controls. We conducted a univariate pathway analysis of untargeted and targeted metabolite profiles and found that vitamin E and urea cycle metabolism pathways are associated with PD, while the aspartate/asparagine and C21-steroid hormone biosynthesis pathways are associated with AD. We also found that the amount of metabolite missingness varied by phenotype, highlighting the importance of examining missing data in future metabolomic studies.
Keywords: biomarker; cerebrospinal fluid; cross-sectional study; neurodegenerative disease; predictive modeling
Year: 2022 PMID: 35448464 PMCID: PMC9029812 DOI: 10.3390/metabo12040277
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Summary of subject data split by phenotype (top), with cognitive test results (middle), and with additional PD subject information (bottom). The cognitive status for PD was classified at three levels: no cognitive impairment, mild cognitive impairment (MCI), and dementia. Age of onset refers to the age of onset of motor symptoms. GBA refers to the carrier frequency for pathogenic GBA mutations and the E326K polymorphism. A chi-squared test of independence for sex and phenotype reports a p-value of 0.051. A one-way ANOVA of age at time of LP and phenotype reports a p-value of [18].
| | Control | AD | PD | |
|---|---|---|---|---|
| n | 85 | 57 | 56 | |
| Age at time of LP | | | | |
| Duration of disease | N/A | | | |
| ApoE genotype | 2.3 (9%) | 2.3 (3.5%) | 2.2 (1.8%) | |
| Race (% white) | 91.7% | 94.7% | 94.6% | |
| Sex | 53% M (41 F) | 49% M (29 F) | 70% M (17 F) | |
| | | | | |
| MMSE total score (0–30) | | | N/A | |
| Logical memory immediate recall (0–25) | | | | |
| Category fluency (animals) (0–999) | | | | |
| Trail Making Test Part A (s) * | | | | |
| Trail Making Test Part B (s) * | | | | |
| Logical memory delayed recall (0–25) | | | | |
| | | | | |
| n | 56 | 16 | 36 | 4 |
| Sex | 70% M (17 F) | 56% M (7 F) | 70% M (11 F) | 100% M |
| Race (% white) | 94.6% | 100% | 91.7% | 100% |
| Age of onset of motor symptoms | | | | |
| Age at time of LP | | | | |
| Duration of disease | | | | |
| Levodopa equivalent dose | | | | |
| MDS-UPDRS III | | | | |
| Hoehn & Yahr stage | | | | |
| MoCA | | | | |
* For PD subjects, Trail Making Test Part A was truncated at 150 s, and Part B was truncated at 300 s.
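The chi-squared test of independence for sex and phenotype mentioned in the caption can be checked directly from the counts in the table. The sketch below is a minimal standard-library illustration, not the authors' code: male counts are derived here as n minus the number of female subjects, and the closed-form p-value relies on the test having exactly two degrees of freedom.

```python
import math

# Counts read from the table: (group size n, number of female subjects).
# Male counts are derived as n minus females (an assumption of this sketch).
groups = {"Control": (85, 41), "AD": (57, 29), "PD": (56, 17)}


def chi_squared_sex_by_phenotype(groups):
    """Pearson chi-squared test of independence on a phenotype-by-sex table."""
    observed = {g: (n - f, f) for g, (n, f) in groups.items()}  # (male, female)
    total = sum(n for n, _ in groups.values())
    col_totals = [sum(obs[j] for obs in observed.values()) for j in range(2)]

    stat = 0.0
    for obs in observed.values():
        row_total = sum(obs)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            stat += (obs[j] - expected) ** 2 / expected

    dof = (len(groups) - 1) * (2 - 1)
    # With exactly 2 degrees of freedom, the chi-squared survival
    # function has the closed form exp(-x / 2).
    if dof != 2:
        raise ValueError("closed-form p-value only valid for 2 degrees of freedom")
    return stat, math.exp(-stat / 2)


stat, p_value = chi_squared_sex_by_phenotype(groups)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```

With these counts the statistic is about 5.94 and the p-value rounds to 0.051, in agreement with the value reported in the caption.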
Figure A1. (a) Distribution of subject age, split by phenotype; (b) a comparison of missingness between profiles, split by phenotype.
Figure A2. Flowchart outlining the analysis performed on each of the three profiles.
Figure 1. Untargeted data projected onto the first two Principal Components (PC). Each point represents a subject, colored by their phenotype. Percentages in the axis titles refer to the percentage of variation of the data explained by the respective PC. In addition, 95% confidence ellipses assuming the t-distribution are also plotted. The first two principal components do not clearly separate the disease phenotypes.
Figure 2. Receiver Operating Characteristic (ROC) Curves for binomial elastic net regressions classifying (a) controls against subjects with AD, (b) controls against subjects with PD, and (c) subjects with AD against subjects with PD. Solid lines represent models formed using each of the five missing data imputed datasets. The dotted line represents the ROC curve under a model which makes predictions at random. The average Area Under the Curve (AUC) across the five ROC curves is displayed in the bottom right.
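As an illustration of the averaged AUC described in the Figure 2 caption, the following sketch computes one AUC per imputed dataset and then averages them. It is a minimal standard-library example under stated assumptions: the labels and predicted probabilities are hypothetical, and the AUC is computed with the rank (Mann-Whitney) formulation rather than by tracing the ROC curve.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive is scored above
    a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: true labels (1 = case, 0 = control) and held-out
# predicted probabilities from models fit on five imputed datasets.
labels = [1, 1, 1, 0, 0, 0]
scores_by_imputation = [
    [0.9, 0.8, 0.4, 0.3, 0.5, 0.1],
    [0.8, 0.7, 0.6, 0.2, 0.4, 0.3],
    [0.9, 0.6, 0.5, 0.4, 0.7, 0.2],
    [0.7, 0.9, 0.3, 0.1, 0.2, 0.4],
    [0.8, 0.8, 0.5, 0.3, 0.6, 0.1],
]

aucs = [auc(labels, scores) for scores in scores_by_imputation]
mean_auc = sum(aucs) / len(aucs)
print([round(a, 3) for a in aucs], round(mean_auc, 3))
```

A model that predicts at random has an expected AUC of 0.5, which corresponds to the dotted reference line in the figure.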
Names and Odds Ratios (OR) for targeted metabolites and lipids retained in all five imputations of elastic net models fit on the full data to classify AD patients against controls. The tables are sorted by magnitude and split into positive and negative coefficient tables. Because the features were standardized prior to fitting the models, these ORs represent the expected odds ratio resulting from a standard deviation increase in concentration. Only ORs with magnitude greater than 1.1 or less than 0.9 are shown here. A two standard deviation interval is shown for the OR to quantify variability across the five missing data imputations.
| Targeted Metabolites–Positive Coefficients | |
|---|---|
| 1-Methyladenosine | 1.52 (1.43, 1.62) |
| Glycine | 1.38 (1.3, 1.46) |
| Alanine | 1.38 (1.32, 1.43) |
| Sarcosine | 1.21 (1.16, 1.26) |
| Acetylcarnitine | 1.19 (1.16, 1.21) |
| 4-Methoxyphenylacetic acid | 1.17 (1.11, 1.25) |
| Sorbitol | 1.15 (1.13, 1.17) |
| Lactate | 1.14 (1.12, 1.16) |
| Hydrocortisone | 1.14 (1.09, 1.19) |
| Homoserine | 1.12 (1.07, 1.16) |
| Caffeine | 1.11 (1.04, 1.17) |

| Targeted Metabolites–Negative Coefficients | |
|---|---|
| | 0.76 (0.71, 0.80) |
| Glycocyamine | 0.80 (0.79, 0.82) |
| 4-Aminobutyric acid | 0.84 (0.81, 0.88) |
| Creatine | 0.85 (0.80, 0.90) |
| Urocanic acid | 0.86 (0.73, 1.01) |
| Homocysteine | 0.88 (0.84, 0.91) |
| Uridine | 0.89 (0.85, 0.92) |

| Lipids–Positive Coefficients | |
|---|---|
| SM(18:1) | 1.51 (1.42, 1.60) |
| CE(16:1) | 1.22 (1.20, 1.25) |
| CE(20:1) | 1.19 (0.91, 1.54) |
| PC(18:0/20:3) | 1.12 (0.99, 1.26) |

| Lipids–Negative Coefficients | |
|---|---|
| PE(P-18:0/22:6) | 0.77 (0.76, 0.79) |
| PE(18:0/20:4) | 0.84 (0.79, 0.89) |
| PE(18:0/22:6) | 0.90 (0.84, 0.98) |
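The odds ratios and two standard deviation intervals in the table above can be illustrated with a short sketch. It assumes the interval is formed on the log-odds scale, as exp(mean ± 2 SD) of the coefficients across the five imputations, using the sample standard deviation. The reported intervals are roughly symmetric on the log scale (for example CE(20:1): 1.19 (0.91, 1.54)), which is consistent with this construction but does not confirm it. The coefficients below are hypothetical.

```python
import math
import statistics


def odds_ratio_summary(coefs):
    """Summarise log-odds coefficients from several imputed-data fits.

    Returns (OR, lower, upper), where OR = exp(mean coefficient) and the
    interval is exp(mean -/+ 2 * sample SD), computed on the log-odds scale.
    This construction is an assumption of the sketch.
    """
    mean = statistics.mean(coefs)
    sd = statistics.stdev(coefs)
    return math.exp(mean), math.exp(mean - 2 * sd), math.exp(mean + 2 * sd)


# Hypothetical elastic net coefficients for one standardized metabolite,
# one value per imputed dataset.
coefs = [0.40, 0.42, 0.45, 0.38, 0.44]
or_estimate, lower, upper = odds_ratio_summary(coefs)
print(f"{or_estimate:.2f} ({lower:.2f}, {upper:.2f})")
```

Because the features are standardized before fitting, each odds ratio is read as the multiplicative change in odds for a one standard deviation increase in concentration.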
Names and OR for targeted metabolites and lipids retained in all five elastic net models fit on the full data to classify PD patients against controls. Only ORs with magnitude greater than 1.1 or less than 0.9 are shown here. A two standard deviation interval is shown for the OR to quantify variability across the five missing data imputations.
| Targeted Metabolites–Positive Coefficients | |
|---|---|
| Ornithine | 2.10 (1.82, 2.41) |
| Glycylproline | 1.75 (1.52, 2.01) |
| Levulinic acid | 1.62 (1.43, 1.82) |
| Acetylglycine | 1.57 (1.42, 1.73) |
| Glycine | 1.57 (1.45, 1.70) |
| Creatinine | 1.52 (1.46, 1.58) |
| Cytosine | 1.48 (1.28, 1.70) |
| Adenosine | 1.45 (1.26, 1.67) |
| Pentadecanoic acid | 1.40 (1.32, 1.49) |
| Sorbitol | 1.40 (1.30, 1.52) |
| | 1.39 (1.31, 1.48) |
| alpha-Hydroxyisovaleric acid | 1.39 (1.23, 1.57) |
| 2-aminoadipic acid | 1.36 (1.16, 1.60) |
| Methylguanidine | 1.32 (1.27, 1.38) |
| Xanthosine | 1.25 (1.20, 1.30) |
| Dimethylarginine | 1.22 (1.15, 1.30) |
| Homoserine | 1.21 (1.14, 1.28) |
| Threonine | 1.20 (1.15, 1.25) |
| Cystine | 1.16 (1.09, 1.23) |
| 3 | 1.16 (1.09, 1.23) |
| Adenosyl- | 1.15 (1.08, 1.22) |
| 6-Methyl- | 1.13 (1.06, 1.20) |
| Anthranilic acid | 1.12 (1.03, 1.21) |
| Fructose | 1.11 (1.02, 1.20) |

| Targeted Metabolites–Negative Coefficients | |
|---|---|
| Indole-3-acetic acid | 0.57 (0.54, 0.61) |
| Serine | 0.58 (0.52, 0.64) |
| | 0.61 (0.55, 0.68) |
| Urocanic acid | 0.64 (0.53, 0.76) |
| Agmatine | 0.65 (0.63, 0.68) |
| HIAA | 0.66 (0.60, 0.73) |
| Glycocyamine | 0.71 (0.58, 0.87) |
| Aspartic acid | 0.76 (0.66, 0.88) |
| 4-Methylvaleric acid | 0.79 (0.73, 0.85) |
| Serotonin | 0.82 (0.77, 0.87) |
| Mannose | 0.82 (0.74, 0.90) |
| Creatine | 0.83 (0.78, 0.88) |
| Xanthine | 0.83 (0.76, 0.90) |
| 4-Aminobutyric acid | 0.86 (0.81, 0.91) |
| 4-Methoxyphenylacetic acid | 0.86 (0.81, 0.91) |
| Citraconic acid | 0.87 (0.74, 1.02) |
| Decanoylcarnitine | 0.89 (0.84, 0.94) |

| Lipids–Positive Coefficients | |
|---|---|
| PE(P-16:0/18:1) | 1.54 (1.45, 1.63) |
| HCER(18:0) | 1.49 (1.43, 1.55) |
| FFA(16:1) | 1.46 (1.30, 1.65) |
| SM(18:1) | 1.42 (1.26, 1.60) |
| FFA(24:0) | 1.22 (1.11, 1.35) |
| PC(16:0/20:2) | 1.21 (0.93, 1.57) |
| FFA(20:2) | 1.20 (0.94, 1.52) |
| CE(20:1) | 1.20 (0.98, 1.46) |
| DAG(20:0/20:0) | 1.17 (0.96, 1.43) |
| PE(16:0/22:6) | 1.16 (0.97, 1.39) |
| LPC(18:1) | 1.11 (1.06, 1.15) |

| Lipids–Negative Coefficients | |
|---|---|
| PC(18:1/18:2) | 0.49 (0.45, 0.53) |
| FFA(18:0) | 0.64 (0.53, 0.76) |
| PE(18:1/18:1) | 0.65 (0.48, 0.88) |
| FFA(24:1) | 0.68 (0.61, 0.75) |
| PC(18:1/20:4) | 0.72 (0.64, 0.81) |
| PC(18:0/22:6) | 0.76 (0.70, 0.82) |
| PC(18:1/16:1) | 0.88 (0.79, 0.97) |
Names and OR (associated with a standard deviation increase in concentration) for targeted metabolites retained in all five elastic net models fit on the full data to classify PD patients against AD patients. A metabolite with OR > 1 indicates that a higher concentration is associated with AD in our models. Only ORs with magnitude greater than 1.1 or less than 0.9 are shown here. A two standard deviation interval is shown for the OR to quantify variability across the five missing data imputations.
| Targeted Metabolites–Positive Coefficients | |
|---|---|
| Serine | 1.63 (1.36, 1.95) |
| Alanine | 1.62 (1.46, 1.79) |
| Indole-3-acetic acid | 1.52 (1.46, 1.58) |
| Xanthine | 1.42 (1.26, 1.60) |
| Aspartic acid | 1.40 (1.32, 1.49) |
| Caffeine | 1.40 (1.30, 1.52) |
| Sarcosine | 1.22 (1.15, 1.30) |
| HIAA | 1.20 (1.11, 1.30) |
| | 1.16 (1.01, 1.34) |
| Glycodeoxycholic acid | 1.15 (1.06, 1.25) |
| 4-Methoxyphenylacetic acid | 1.14 (1.12, 1.16) |
| Serotonin | 1.13 (1.08, 1.17) |

| Targeted Metabolites–Negative Coefficients | |
|---|---|
| Ornithine | 0.52 (0.51, 0.53) |
| alpha-Hydroxyisovaleric acid | 0.63 (0.58, 0.68) |
| Homocysteine | 0.64 (0.59, 0.70) |
| Histidine | 0.70 (0.62, 0.79) |
| Creatinine | 0.72 (0.66, 0.78) |
| Glycylproline | 0.73 (0.64, 0.82) |
| Levulinic acid | 0.76 (0.70, 0.83) |
| Adenosine | 0.77 (0.67, 0.89) |
| | 0.81 (0.75, 0.88) |
| Acetyl- | 0.86 (0.79, 0.93) |
Names and OR (associated with a standard deviation increase in concentration) for lipids retained in all five elastic net models fit on the full data to classify PD patients against AD patients. A lipid with OR > 1 indicates that a higher concentration is associated with AD in our models. Only ORs with magnitude greater than 1.1 or less than 0.9 are shown here.
| Positive Coefficients | |
|---|---|
| PC(18:1/18:2) | 1.79 (1.32, 2.41) |
| PC(18:1/20:4) | 1.79 (1.55, 2.05) |
| FFA(20:3) | 1.75 (1.62, 1.90) |
| DAG(16:0/18:1) | 1.55 (1.25, 1.93) |
| FFA(18:3) | 1.43 (1.13, 1.82) |
| CE(16:1) | 1.34 (1.21, 1.48) |
| PC(18:0/22:6) | 1.27 (1.06, 1.52) |
| TAG53:2-FA18:1 | 1.23 (0.99, 1.54) |
| PC(16:0/22:5) | 1.19 (0.95, 1.48) |
| CE(20:3) | 1.16 (0.93, 1.45) |
| PC(18:1/16:1) | 1.14 (1.01, 1.28) |

| Negative Coefficients | |
|---|---|
| PC(18:0/18:2) | 0.62 (0.38, 1) |
| PE(P-16:0/18:1) | 0.64 (0.44, 0.91) |
| PE(P-18:0/20:4) | 0.7 (0.63, 0.79) |
| TAG48:1-FA16:0 | 0.72 (0.55, 0.93) |
| FFA(20:4) | 0.72 (0.63, 0.83) |
| TAG46:0-FA16:0 | 0.75 (0.57, 0.99) |
| CER(14:0) | 0.78 (0.61, 0.99) |
| PE(P-18:0/22:6) | 0.78 (0.73, 0.83) |
| HCER(18:0) | 0.79 (0.72, 0.88) |
| PE(18:0/20:4) | 0.80 (0.63, 1.02) |
| TAG48:0-FA14:0 | 0.82 (0.68, 0.98) |
| TAG55:4-FA18:1 | 0.84 (0.72, 0.99) |
| DAG(20:0/20:0) | 0.86 (0.65, 1.14) |
| PC(18:0/18:0) | 0.87 (0.76, 1.00) |
| CE(15:0) | 0.88 (0.75, 1.03) |
Names and OR for targeted metabolite abundance missingness indicators in AD classification against controls, lipid abundance missingness indicators in PD classification against controls, and lipid abundance missingness indicators in AD classification against PD, retained in elastic net models fit on the full data. The OR corresponds to increased odds of classifying a patient as having AD/PD/AD associated with a standard deviation increase in metabolite concentration (for left, middle, and right, respectively). Age and sex are not penalized, and are therefore guaranteed to be included in these models. Other combinations of phenotype and profile (i.e., targeted-PD or lipids-AD tables) are not shown because the only retained covariates were age and sex.
| Targeted Metabolites–AD v C | |
|---|---|
| Citraconic acid | 0.53 |
| Phenylalanine | 1.85 |
| Creatinine | 1.59 |
| Glucosamine | 1.46 |
| Amiloride | 1.42 |
| | 0.71 |
| Mannose | 0.73 |
| Male | 0.85 |
| Age | 1.07 |
| Creatine | 1.02 |

| Lipids–PD v C | |
|---|---|
| Male | 1.85 |
| TAG46:0-FA16:0 | 1.77 |
| DAG(18:1/22:6) | 1.08 |
| Age | 1.04 |

| Lipids–AD v PD | |
|---|---|
| Male | 0.25 |
| PE(18:1/18:1) | 0.68 |
| PC(16:0/14:0) | 0.74 |
| TAG52:4-FA16:1 | 1.17 |
| CE(18:4) | 1.10 |
| Age | 1.09 |
Figure A4. For metabolites whose abundance missingness varies widely by phenotype (AD, PD, age-matched controls), PD subjects tend to have the least missing data. Plotted is the percent missingness of untargeted metabolites for which a univariate logistic regression predicting missingness from phenotype had an FDR < 0.05. The controls used for this analysis were found by matching each member of the AD/PD cohort to the control of closest age and removing duplicates. Metabolites are listed by their retention time, neutral mass (Da), and mode, with each value separated by an underscore.
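The control-matching step described in the Figure A4 caption (each AD/PD subject matched to the control of closest age, with duplicates removed) can be sketched as follows. This is a minimal illustration under stated assumptions: the ages and abundances are hypothetical, missing abundances are represented as `None`, and ties in age are broken by taking the first control in the list.

```python
def match_controls_by_age(case_ages, control_ages):
    """Return sorted unique indices of the controls closest in age to each case."""
    matched = set()
    for age in case_ages:
        nearest = min(
            range(len(control_ages)),
            key=lambda i: abs(control_ages[i] - age),
        )
        matched.add(nearest)  # the set removes duplicate matches
    return sorted(matched)


def percent_missing(values):
    """Percentage of entries that are missing (represented here as None)."""
    return 100.0 * sum(v is None for v in values) / len(values)


# Hypothetical ages and one metabolite's abundances for the controls.
case_ages = [70, 71, 55]
control_ages = [50, 69, 80, 62]
control_abundance = [None, 1.2, 0.8, None]

matched = match_controls_by_age(case_ages, control_ages)
matched_abundance = [control_abundance[i] for i in matched]
print(matched, percent_missing(matched_abundance))
```

Matching with replacement and then dropping duplicates means the matched control group can be smaller than the case group, as it is in this toy example.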
Figure 3. Mummichog pathway analysis of the positive mode untargeted metabolites from univariate logistic models classifying PD and AD against controls, sorted by Benjamini–Hochberg corrected p-values, with the vertical dashed line marking the significance threshold.
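The Benjamini–Hochberg correction referred to in the Figure 3 caption can be written in a few lines. The sketch below is a generic standard-library implementation of the adjustment, not the code used in the study, and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical univariate p-values for four pathways or metabolites.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(q, 4) for q in adjusted])
```

A feature is then called significant at a false discovery rate of 0.05 when its adjusted p-value is below 0.05.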
Figure A3. Distribution of sample collection date, split by phenotype.
Figure A8. Comparison of univariate p-values before and after orthogonalization, using univariate logistic regressions fit on the targeted profile to classify PD against controls. p-values are Benjamini–Hochberg corrected within each dataset, with a transformation applied. The line is also drawn, with values above the line representing metabolites which are more differentially abundant after orthogonalization. Points are colored according to false discovery rate cutoffs of 0.05, indicating whether a metabolite is considered differentially abundant in both analyses, only one analysis, or in neither analysis.
Cutoff-dependent metrics, computed from the same leave-one-out predictions displayed in Figure 2 and Figure A5. In each case, the cutoff is chosen to maximize the F1 score.
| Metric | AD v C | PD v C | AD v PD |
|---|---|---|---|
| F1 score | 0.63 | 0.97 | 0.93 |
| Sensitivity | 0.79 | 0.96 | 0.95 |
| Specificity | 0.52 | 0.99 | 0.91 |
| Positive predictive value | 0.52 | 0.98 | 0.92 |
| Negative predictive value | 0.79 | 0.98 | 0.94 |
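To make the cutoff selection concrete, the sketch below picks the probability cutoff that maximizes the F1 score and then reports the same five metrics as the table. It is an illustrative standard-library example: the labels and predicted probabilities are hypothetical, a subject is predicted positive when its score is at or above the cutoff, and candidate cutoffs are restricted to the observed scores.

```python
def confusion_counts(labels, scores, cutoff):
    """Count (tp, fp, tn, fn), predicting positive when score >= cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= cutoff
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    # Undefined ratios (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def metrics_at(labels, scores, cutoff):
    """F1, sensitivity, specificity, PPV, and NPV at a given cutoff."""
    tp, fp, tn, fn = confusion_counts(labels, scores, cutoff)
    return {
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def best_f1_cutoff(labels, scores):
    """Observed score that maximizes F1 (lowest such cutoff on ties)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: metrics_at(labels, scores, c)["f1"])


# Hypothetical leave-one-out predictions (1 = case, 0 = control).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1]
cutoff = best_f1_cutoff(labels, scores)
result = metrics_at(labels, scores, cutoff)
print(cutoff, result)
```

Since F1 is the harmonic mean of sensitivity and positive predictive value, the table can be checked for internal consistency: for AD v C, 2 × 0.79 × 0.52 / (0.79 + 0.52) ≈ 0.63, which matches the reported F1 score.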
Metabolite Set Enrichment Analysis (MSEA): output of MSEA on the targeted profile, based on metabolites significant for PD classification in univariate logistic regression. Set refers to the pre-defined metabolite sets in the SMPDB (left) and CSF disease library (right). Total is the number of metabolites in each set, hits is the number of metabolites shared between the pre-defined set and our input list, and expected is the number of shared metabolites expected by random chance, using the cumulative hypergeometric distribution.
| Set (SMPDB) | Total | Expected | Hits |
|---|---|---|---|
| Carnitine Synthesis | 3 | 0.67 | 2 |
| Betaine Metabolism | 4 | 0.90 | 2 |
| Methionine Metabolism | 8 | 1.79 | 3 |
| Tyrosine Metabolism | 3 | 0.67 | 1 |
| Glycine and Serine Metabolism | 13 | 2.91 | 4 |
| Arginine and Proline Metabolism | 8 | 1.79 | 2 |
| Tryptophan Metabolism | 8 | 1.79 | 2 |
| Histidine Metabolism | 4 | 0.90 | 1 |
| Urea Cycle | 5 | 1.12 | 1 |

| Set (CSF disease library) | Total | Expected | Hits |
|---|---|---|---|
| Aging-Related Metabolites | 3 | 1.1 | 2 |
| Leukemia | 13 | 4.8 | 3 |
| Alzheimer’s Disease | 14 | 5.2 | 3 |
| Different Seizure Disorders | 12 | 4.5 | 1 |
| Schizophrenia | 13 | 4.8 | 1 |
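The expected and hits columns above relate to each other through the hypergeometric distribution. The sketch below shows the expected overlap and the upper-tail enrichment p-value for a single metabolite set. The universe size and input list size are not given in this section, so the values used here are hypothetical, and the function names are illustrative rather than part of any MSEA software.

```python
from math import comb


def expected_hits(set_size, n_input, universe):
    """Mean of the hypergeometric distribution: set_size * n_input / universe."""
    return set_size * n_input / universe


def enrichment_p_value(hits, set_size, n_input, universe):
    """P(X >= hits) when n_input metabolites are drawn without replacement
    from a universe containing set_size members of the metabolite set."""
    upper = min(set_size, n_input)
    total = comb(universe, n_input)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, n_input - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical sizes: 100 measured metabolites, 20 on the input list,
# and a pathway set containing 10 of the measured metabolites.
universe, n_input, set_size = 100, 20, 10
print(expected_hits(set_size, n_input, universe))
print(enrichment_p_value(4, set_size, n_input, universe))
```

Under this model the ratio of expected to total is the same for every set in a library, which agrees with the table: the ratio is about 0.22 for each SMPDB row (for example 0.67 / 3 and 2.91 / 13).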
Targeted metabolites and lipids appearing in all leave-one-out models, for each modeling task.
| PD v C | | |
|---|---|---|
| Lipids | HCER(24:0) | 1.22 (1.15, 1.30) |
| Targeted | Threonine | 1.19 (1.08, 1.32) |
| Targeted | Methylguanidine | 1.19 (1.11, 1.26) |
| Lipids | PC(16:0/22:6) | 1.16 (1.03, 1.30) |
| Targeted | Dimethylarginine | 1.15 (1.03, 1.29) |

| AD v C | | |
|---|---|---|
| Targeted | 1-Methyladenosine | 1.27 (1.02, 1.58) |
| Targeted | Sarcosine | 1.12 (1.05, 1.19) |
| Targeted | Alanine | 1.08 (1.03, 1.14) |
| Lipids | CE(20:3) | 1.07 (0.99, 1.15) |

| AD v PD | | |
|---|---|---|
| Lipids | FFA(20:3) | 4.03 (2.66, 6.11) |
| Targeted | Glycylproline | 0.53 (0.33, 0.85) |
| Targeted | Lactate | 1.57 (1.10, 2.23) |
| Targeted | Creatinine | 0.75 (0.62, 0.90) |