Milica Milivojevic, Xiaoyu Che, Lucinda Bateman, Aaron Cheng, Benjamin A Garcia, Mady Hornig, Manuel Huber, Nancy G Klimas, Bohyun Lee, Hyoungjoo Lee, Susan Levine, Jose G Montoya, Daniel L Peterson, Anthony L Komaroff, W Ian Lipkin.
Abstract
Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is an unexplained chronic, debilitating illness characterized by fatigue, sleep disturbances, cognitive dysfunction, orthostatic intolerance and gastrointestinal problems. Using ultra performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS), we analyzed the plasma proteomes of 39 ME/CFS patients and 41 healthy controls. Logistic regression models, with both linear and quadratic terms of the protein levels as independent variables, revealed a significant association between ME/CFS and the immunoglobulin heavy variable (IGHV) region 3-23/30. Stratifying the ME/CFS group based on self-reported irritable bowel syndrome (sr-IBS) status revealed a significant quadratic effect of immunoglobulin lambda constant region 7 on its association with ME/CFS with sr-IBS whilst IGHV3-23/30 and immunoglobulin kappa variable region 3-11 were significantly associated with ME/CFS without sr-IBS. In addition, we were able to predict ME/CFS status with a high degree of accuracy (AUC = 0.774-0.838) using a panel of proteins selected by 3 different machine learning algorithms: Lasso, Random Forests, and XGBoost. These algorithms also identified proteomic profiles that predicted the status of ME/CFS patients with sr-IBS (AUC = 0.806-0.846) and ME/CFS without sr-IBS (AUC = 0.754-0.780). Our findings are consistent with a significant association of ME/CFS with immune dysregulation and highlight the potential use of the plasma proteome as a source of biomarkers for disease.
Year: 2020 PMID: 32692761 PMCID: PMC7373296 DOI: 10.1371/journal.pone.0236148
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Characteristics of the study cohort.
| Demographics | Whole cohort: ME/CFS (n = 50) | Whole cohort: Control (n = 50) | Analysis subset: ME/CFS (n = 39) | Analysis subset: Control (n = 41) |
|---|---|---|---|---|
| Sex: Female | 41 | 41 | 30 | 32 |
| Sex: Male | 9 | 9 | 9 | 9 |
| Age: Mean (SEM) | 51.08 (11.19) | 51.32 (11.46) | 52.06 (10.87) | 51.43 (11.89) |
| Race: White | 49 | 48 | 39 | 39 |
| Race: Asian | 1 | 1 | 1 | 1 |
| Race: Other | 0 | 1 | 0 | 1 |
| Ethnicity: Not Hispanic or Latino | 46 | 45 | 37 | 37 |
| Ethnicity: Hispanic or Latino | 4 | 5 | 2 | 4 |
| Site: Miami, FL | 10 | 9 | 6 | 7 |
| Site: New York, NY | 14 | 14 | 12 | 12 |
| Site: Salt Lake City, UT | 14 | 15 | 11 | 12 |
| Site: Sierra, NV | 12 | 12 | 10 | 10 |
| Season of sampling: Summer | 27 | 26 | 22 | 22 |
| Season of sampling: Fall | 23 | 24 | 17 | 19 |
| sr-IBS: Yes | 24 | 1 | 18 | 1 |
| sr-IBS: No | 26 | 49 | 21 | 40 |
| BMI: Overweight (>25) | 28 | 22 | 24 | 18 |
| BMI: Normal (<25) | 22 | 28 | 15 | 23 |
| Illness duration: <3 years | 4 | N/A | 2 | N/A |
| Illness duration: >3 years | 46 | N/A | 37 | N/A |
| SF-36: Physical Functioning | 40.5 (26.71) | 96.1 (6.95) | 40.13 (27.08) | 95.85 (7.41) |
| SF-36: Physical Limitations | 8 (22.27) | 97 (14.85) | 9.62 (24.75) | 98.78 (5.45) |
| SF-36: Emotional Limitations | 53.33 (47.62) | 96 (15.99) | 54.7 (48.66) | 98.37 (7.27) |
| SF-36: Energy/Fatigue | 15.6 (17.46) | 74.77 (15.58) | 16.03 (18.25) | 73.25 (14.43) |
| SF-36: Emotional Well-being | 63.44 (19.94) | 81.6 (15.25) | 65.44 (20.05) | 81.27 (11.95) |
| SF-36: Social Functioning | 32.75 (25.24) | 93.25 (13.88) | 32.37 (26.39) | 93.6 (13.15) |
| SF-36: Pain | 46 (27.17) | 91.8 (10.06) | 44.04 (28.75) | 90.79 (10.3) |
| SF-36: General Health | 26.38 (15.33) | 83.35 (13.88) | 26.63 (16.96) | 82.26 (14.07) |
The demographics and characteristics of the whole study cohort (n = 100) as well as the subset used for the analysis (n = 80) are shown. ME/CFS: myalgic encephalomyelitis/chronic fatigue syndrome, sr-IBS: self-reported irritable bowel syndrome, BMI: body mass index, SEM: standard error of mean, SF-36: Short form 36 health survey; score on 0–100 scale with 0 = poor and 100 = excellent.
Fig 1. Quadratic effect of immunoglobulin proteins with ME/CFS and ME/CFS subgroups.
Two separate models were fitted: one with only the linear term of the protein levels, and one with both linear and quadratic terms of the protein levels as independent variables. In both models we adjusted for BMI, sr-IBS, antidepressant medication use, age, sex, race/ethnicity, geographic/clinical site and season of sampling. Likelihood-ratio tests were used to compare the goodness-of-fit between the two nested models. The Hochberg step-up procedure was applied to correct for the multiple tests over the annotated proteins, controlling the family-wise error rate (FWER) at the level of 0.05. For the protein analytes associated with ME/CFS with significant quadratic effect, adjusted odds ratios (aORs), together with their 95% confidence intervals (95% CI), were calculated comparing ME/CFS risk of various protein levels to that of the reference level at which the ME/CFS risk was at the lowest. (A) All ME/CFS cases versus controls, (B) ME/CFS cases with sr-IBS versus controls, (C & D) ME/CFS cases without sr-IBS versus controls. ME/CFS: myalgic encephalomyelitis/chronic fatigue syndrome, a.u.: arbitrary units, sr-IBS: self-reported irritable bowel syndrome, IGHV: immunoglobulin heavy variable, IGLC: immunoglobulin lambda constant, IGKV: immunoglobulin kappa variable.
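The model comparison and multiplicity correction described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the log-likelihoods are assumed to come from already fitted nested logistic models that differ only by the single quadratic term (1 degree of freedom), and the inputs in the usage lines are made-up numbers.

```python
import math


def lrt_pvalue_1df(loglik_linear, loglik_quadratic):
    """Likelihood-ratio test p-value for adding one quadratic term (1 df)."""
    stat = 2.0 * (loglik_quadratic - loglik_linear)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # Survival function of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    return math.erfc(math.sqrt(stat / 2.0))


def hochberg_adjust(pvalues):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        # The largest p-value is multiplied by 1, the next largest by 2, ...
        running_min = min(running_min, (position + 1) * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted


# Hypothetical log-likelihoods for three proteins: (linear model, quadratic model)
fits = [(-52.1, -47.9), (-50.3, -49.8), (-51.0, -48.2)]
raw = [lrt_pvalue_1df(lin, quad) for lin, quad in fits]
adj = hochberg_adjust(raw)
significant = [p <= 0.05 for p in adj]  # FWER controlled at 0.05
```

A protein's quadratic term would be retained when its adjusted p-value is at or below 0.05, which matches the FWER level stated in the caption.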
Quadratic relationship of immunoglobulin proteins with ME/CFS and ME/CFS subgroups.
| Group | Protein | Reference level | Comparison | aOR | 95% CI | p-value |
|---|---|---|---|---|---|---|
| All ME/CFS | IGHV3-23/30 | 51,000 | decreased to 25,000 | 5.646 | 1.179–27.035 | 0.0303 |
| All ME/CFS | IGHV3-23/30 | 51,000 | increased to 100,000 | 4.439 | 1.289–15.286 | 0.0182 |
| ME/CFS with sr-IBS | IGLC7 | 3,326 | decreased to 1,500 | 3.851 | 1.115–13.303 | 0.033 |
| ME/CFS with sr-IBS | IGLC7 | 3,326 | increased to 7,000 | 3.257 | 1.216–8.722 | 0.019 |
| ME/CFS without sr-IBS | IGKV3(D)-11 | 544,370 | decreased to 171,000 | 59.492 | 1.062–3332.100 | 0.047 |
| ME/CFS without sr-IBS | IGKV3(D)-11 | 544,370 | increased to 1,100,000 | 4.527 | 1.138–18.001 | 0.032 |
| ME/CFS without sr-IBS | IGHV3-23/30 | 51,000 | decreased to 25,000 | 6.582 | 1.244–34.816 | 0.027 |
| ME/CFS without sr-IBS | IGHV3-23/30 | 51,000 | increased to 100,000 | 4.545 | 1.284–16.086 | 0.019 |
Reference levels based on relative intensity are shown for each protein as well as the aOR, 95% CI and p-value when increasing and decreasing from this point for all ME/CFS patients, ME/CFS patients with sr-IBS and ME/CFS patients without sr-IBS, when compared to the control group. ME/CFS: myalgic encephalomyelitis/chronic fatigue syndrome, sr-IBS: self-reported irritable bowel syndrome, BMI: body mass index, IGHV: immunoglobulin heavy variable; IGLC: immunoglobulin lambda constant; IGKV: immunoglobulin kappa variable, aOR: adjusted odds ratio, CI: confidence interval.
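To make the reference-level logic concrete, here is a minimal sketch of how an odds ratio relative to the lowest-risk level follows from a quadratic logistic model. It assumes the log-odds are b0 + b1*x + b2*x**2 plus covariate terms that are held fixed and therefore cancel. The coefficient values and function names are hypothetical, and the scale on which the protein level x enters the model is not stated here, so it is left abstract.

```python
import math


def reference_level(b1, b2):
    """Level x at which b0 + b1*x + b2*x**2 is lowest (requires b2 > 0)."""
    if b2 <= 0:
        raise ValueError("a minimum-risk reference level exists only when b2 > 0")
    return -b1 / (2.0 * b2)


def odds_ratio_vs_reference(b1, b2, x):
    """Odds ratio comparing level x with the minimum-risk reference level."""
    x_ref = reference_level(b1, b2)
    # log OR = b1*(x - x_ref) + b2*(x**2 - x_ref**2)
    #        = b2*(x - x_ref)**2   after substituting b1 = -2*b2*x_ref
    return math.exp(b2 * (x - x_ref) ** 2)


# Hypothetical coefficients, chosen only for illustration
b1, b2 = -4.0, 2.0
x_ref = reference_level(b1, b2)
or_below = odds_ratio_vs_reference(b1, b2, x_ref - 0.5)
or_above = odds_ratio_vs_reference(b1, b2, x_ref + 0.5)
```

Under this parameterization the odds ratio is at least 1 in both directions, which is why both the "decreased" and "increased" comparisons in the table carry aORs above 1. Confidence intervals would additionally require the covariance of the two coefficient estimates, which this sketch omits.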
Potential plasma protein biomarkers for ME/CFS.
| Group | Gene name | UniProt ID | Direction | Lasso: percentage | Lasso: rank | Random Forests: mean decrease in accuracy | Random Forests: rank | XGBoost: gain | XGBoost: rank |
|---|---|---|---|---|---|---|---|---|---|
| All ME/CFS | CAMP | P49913 | Increased | 22.80% | 1 | 0.1284 | 4 | 0.0652 | 2 |
| All ME/CFS | LRG1 | P02750 | Decreased | 9.90% | 9 | 0.1302 | 3 | 0.0327 | 4 |
| All ME/CFS | IGF1 | P05019 | Decreased | 3.90% | 19 | 0.1320 | 2 | 0.0318 | 6 |
| All ME/CFS | GSN | P06396 | Decreased | 3.70% | 20 | 0.0743 | 9 | 0.0281 | 8 |
| All ME/CFS | IGFALS | P35858 | Decreased | 11.60% | 7 | 0.0988 | 7 | 0.0292 | 7 |
| All ME/CFS | IGLV1-47 | P01700 | Decreased | 14.10% | 2 | 0.0639 | 14 | 0.0319 | 5 |
| All ME/CFS | FCRL3 | Q96P31 | Decreased | 4.70% | 17 | 0.0545 | 20 | 0.0127 | 17 |
| All ME/CFS | CRTAC1 | Q9NQ79 | Decreased | 13.30% | 3 | 0.2653 | 1 | 0.1225 | 1 |
| ME/CFS with sr-IBS | CAMP | P49913 | Increased | 30.10% | 1 | 0.1772 | 2 | 0.0852 | 2 |
| ME/CFS with sr-IBS | SERPINA3 | P01011 | Decreased | 4.20% | 16 | 0.0731 | 7 | 0.0249 | 6 |
| ME/CFS with sr-IBS | IGF1 | P05019 | Decreased | 11.00% | 6 | 0.1768 | 3 | 0.1132 | 1 |
| ME/CFS with sr-IBS | ITIH2 | P19823 | Decreased | 13.60% | 4 | 0.1870 | 1 | 0.0529 | 4 |
| ME/CFS with sr-IBS | IGHV1-18 | A0A0C4DH31 | Decreased | 19.30% | 3 | 0.0535 | 17 | 0.0157 | 14 |
| ME/CFS with sr-IBS | CRTAC1 | Q9NQ79 | Decreased | 4.80% | 13 | 0.0922 | 4 | 0.0577 | 3 |
| ME/CFS without sr-IBS | PON3 | Q15166 | Increased | 7.20% | 3 | 0.0601 | 19 | 0.0571 | 2 |
| ME/CFS without sr-IBS | KNG1 | P01042 | Increased | 3.70% | 13 | 0.0674 | 17 | 0.0122 | 20 |
| ME/CFS without sr-IBS | LRG1 | P02750 | Decreased | 5.50% | 6 | 0.0960 | 8 | 0.0400 | 4 |
| ME/CFS without sr-IBS | IGLC7 | A0M8Q6 | Decreased | 8.40% | 2 | 0.0664 | 18 | 0.0196 | 14 |
| ME/CFS without sr-IBS | CRTAC1 | Q9NQ79 | Decreased | 3.90% | 12 | 0.1031 | 6 | 0.0740 | 1 |
Proteins with more than 50% undetectable/filtered values were excluded. All 250 protein analytes were fitted as predictors in 3 different classifiers: Lasso, Random Forests, and XGBoost. Table shows the proteins that were ranked in the top 20 of importance measurements for all ME/CFS patients, ME/CFS patients with sr-IBS and ME/CFS patients without sr-IBS. Direction is measured relative to controls. ME/CFS: myalgic encephalomyelitis/chronic fatigue syndrome, sr-IBS: self-reported irritable bowel syndrome, CAMP: cathelicidin antimicrobial protein, LRG1: Leucine-rich glycoprotein 1, IGF1: insulin-like growth factor 1, IGFALS: Insulin-like growth factor-binding protein complex acid labile subunit, IGLV1-47: immunoglobulin lambda variable region 1–47, FCRL3: Fc receptor-like protein 3, SERPINA3: Alpha-1-antichymotrypsin, ITIH2: Inter-alpha-trypsin inhibitor heavy chain H2, IGHV1-18: immunoglobulin heavy variable region 1–18, PON3: Serum paraoxonase/lactonase 3, KNG1: Kininogen 1, IGLC7: immunoglobulin lambda constant region 7.
1. Percentage: Lasso regularizes the least-squares fit by adding a penalty on the L1 norm of the parameter vector (equivalently, constraining that norm to be no greater than a given value); increasing the penalty drives more coefficients of unimportant predictors to exactly zero. Importance can therefore be measured as the percentage of iterations (out of 1,000 random resampling cross-validation iterations) in which the predictor’s parameter estimate in the best-fitting model is nonzero.
2. Mean Decrease in Accuracy: Random Forests measures the mean decrease in accuracy when values of the predictor are randomly permuted. For unimportant predictors, the permutation should have little to no effect on model accuracy, while permuting values of important predictors should significantly decrease it.
3. Gain: XGBoost measures predictor importance as ‘Gain’, the relative contribution of the predictor to the model, calculated from the predictor’s contribution to each tree in the model.
4. Rank: We selected the protein analytes that were ranked in the top 20 in all three importance measurements.
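The selection rule in these notes (Lasso selection percentage, then the intersection of the top 20 across all three importance measures) can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the per-iteration sets of nonzero Lasso coefficients and the Random Forests and XGBoost importance values are assumed to have been computed elsewhere, and the protein names in the usage lines are placeholders.

```python
def selection_percentage(nonzero_sets):
    """Percentage of resampling iterations in which each predictor was nonzero."""
    counts = {}
    for selected in nonzero_sets:
        for name in selected:
            counts[name] = counts.get(name, 0) + 1
    n_iterations = len(nonzero_sets)
    return {name: 100.0 * c / n_iterations for name, c in counts.items()}


def top_k(importance, k):
    """Names of the k predictors with the highest importance values."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return set(ranked[:k])


def consensus_panel(lasso_pct, rf_mda, xgb_gain, k=20):
    """Predictors ranked in the top k by all three importance measures."""
    return top_k(lasso_pct, k) & top_k(rf_mda, k) & top_k(xgb_gain, k)


# Placeholder inputs: four resampling iterations and three toy predictors
lasso_pct = selection_percentage([{"protA", "protB"}, {"protA"},
                                  {"protA", "protC"}, {"protB"}])
rf_mda = {"protA": 0.30, "protB": 0.10, "protC": 0.20}
xgb_gain = {"protA": 0.50, "protB": 0.10, "protC": 0.30}
panel = consensus_panel(lasso_pct, rf_mda, xgb_gain, k=2)
```

Requiring agreement across three differently constructed importance measures is a conservative choice: a predictor that only one algorithm favors is dropped from the trimmed panel.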
Fig 2. Diagnostic performance (AUROC) of ME/CFS and ME/CFS subgroup plasma proteomes.
Three machine learning algorithms were used to examine the utility of the proteomics assay as a biomarker tool for ME/CFS: Lasso (least absolute shrinkage and selection operator), Random Forests, and XGBoost. We fitted all protein analytes, excluding the ones with more than 50% undetectable/filtered values, as predictors in the three classifiers and measured the importance for each predictor in the classifiers. The protein analytes that were ranked in the top 20 in all three importance measurements were fitted in the classifiers again (Trimmed set), except that here we used the logistic regression model instead of Lasso. The predictive performance was evaluated in random resampling cross-validation (CV) with 1,000 iterations from which we calculated the Area under the Receiver Operating Characteristic curve (AUROC) values and generated Receiver Operating Characteristic (ROC) curves for (A) all ME/CFS cases, (B) ME/CFS cases with sr-IBS and (C) ME/CFS cases without sr-IBS. ME/CFS: myalgic encephalomyelitis/chronic fatigue syndrome, sr-IBS: self-reported irritable bowel syndrome.
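The evaluation loop described in this caption can be sketched as follows. This is an illustration only: the 20% test fraction, the seed, the function names, and the `fit` callback (which takes training data and returns a scoring function) are all assumptions, since the caption specifies only random resampling cross-validation with 1,000 iterations. AUROC is computed here by pairwise comparison of case and control scores.

```python
import random


def auroc(labels, scores):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_random_cv(X, y, fit, n_iter=1000, test_fraction=0.2, seed=0):
    """Mean AUROC over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    n_test = max(2, int(round(test_fraction * len(y))))
    aucs = []
    for _ in range(n_iter):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        test_labels = [y[i] for i in test]
        if len(set(test_labels)) < 2:
            continue  # AUROC is undefined without both classes in the test split
        score = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auroc(test_labels, [score(X[i]) for i in test]))
    if not aucs:
        raise ValueError("no split contained both classes")
    return sum(aucs) / len(aucs)


# Toy usage: one perfectly separating feature and a trivial "model"
X = [[float(i)] for i in range(20)]
y = [0] * 10 + [1] * 10
mean_auc = repeated_random_cv(X, y, lambda X_tr, y_tr: (lambda x: x[0]), n_iter=50)
```

In practice the `fit` callback would wrap one of the three classifiers named in the caption, and the per-iteration AUROC values could be kept to report a range rather than only the mean.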