G Craig Wood1, Xin Chu1, George Argyropoulos2, Peter Benotti1, David Rolston1, Tooraj Mirshahi1, Anthony Petrick1, John Gabrielson1, David J Carey1, Johanna K DiStefano3, Christopher D Still1, Glenn S Gerhard1,2.
Abstract
Non-alcoholic fatty liver disease (NAFLD) represents a spectrum of conditions, including steatohepatitis and fibrosis, that are thought to emanate from hepatic steatosis. Few robust biomarkers or diagnostic tests have been developed for hepatic steatosis in the setting of obesity. We have developed a multi-component classifier for hepatic steatosis composed of phenotypic, genomic, and proteomic variables using data from 576 adults with extreme obesity who underwent bariatric surgery and intra-operative liver biopsy. Using a 443-patient training set, protein biomarker discovery was performed using the highly multiplexed SOMAscan® proteomic assay, a set of 19 clinical variables, and the steatosis-predisposing PNPLA3 rs738409 single nucleotide polymorphism genotype status. The most stable markers were selected using a stability selection algorithm with an L1-regularized logistic regression kernel and were then fitted with logistic regression models to classify steatosis, which were then tested against a 133-sample blinded verification set. The highest area under the ROC curve (AUC) for steatosis of PNPLA3 rs738409 genotype, 8 proteins, or 19 phenotypic variables was 0.913, whereas the final classifier that included variables from all three domains had an AUC of 0.935. These data indicate that multi-domain modeling has better predictive power than comprehensive analysis of variables from a single domain.
Year: 2017 PMID: 28266614 PMCID: PMC5339694 DOI: 10.1038/srep43238
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Characteristics of discovery and validation sets.
| Variable | Measure | Discovery group N = 443 | Validation group N = 134 | p-value |
|---|---|---|---|---|
| Age, years | Mean (SD) | 46.2 (10.7) | 46.3 (11.1) | 0.943¹ |
| Sex | Female, % (n) | 82% (n = 363) | 84% (n = 112) | 0.663² |
| | Male, % (n) | 18% (n = 80) | 16% (n = 22) | |
| Race | White, % (n) | 99% (n = 439) | 99% (n = 133) | 0.999³ |
| | Black, % (n) | <1% (n = 2) | 1% (n = 1) | |
| | Other, % (n) | <1% (n = 2) | 0% (n = 0) | |
| BMI, kg/m² | Mean (SD) | 49.2 (9.0) | 49.2 (8.4) | 0.979¹ |
| Diabetes | Yes, % (n) | 41% (n = 180) | 41% (n = 55) | 0.932² |
| Hypertension | Yes, % (n) | 47% (n = 209) | 44% (n = 59) | 0.522² |
| Dyslipidemia | Yes, % (n) | 37% (n = 163) | 43% (n = 57) | 0.230² |
| ALT, U/L | Median [IQR] | 27 [20, 39] | 26 [19, 38] | 0.498⁴ |
| AST, U/L | Median [IQR] | 24 [19, 33] | 24 [20, 30] | 0.647⁴ |
| Cholesterol, mg/dL | Mean (SD) | 187.7 (40.3) | 188.2 (39.5) | 0.897¹ |
| HDL, mg/dL | Mean (SD) | 47.2 (11.5) | 46.2 (10.8) | 0.351¹ |
| LDL, mg/dL | Mean (SD) | 105.8 (33.6) | 107.4 (36.1) | 0.630¹ |
| Triglycerides, mg/dL | Median [IQR] | 152 [104, 208] | 180.6 (118.9) | 0.911⁴ |
| Platelet count, K/uL | Mean (SD) | 285.2 (72.3) | 294.7 (64.2) | 0.175¹ |
| Steatosis | <5%, % (n) | 30% (n = 131) | 32% (n = 43) | 0.612² |
| | 5–33%, % (n) | 20% (n = 89) | 24% (n = 32) | |
| | 33–66%, % (n) | 26% (n = 117) | 24% (n = 32) | |
| | >66%, % (n) | 24% (n = 106) | 20% (n = 27) | |
| Lobular inflammation | No foci, % (n) | 55% (n = 242) | 62% (n = 83) | 0.157² |
| | <2 foci*, % (n) | 37% (n = 162) | 34% (n = 45) | |
| | 2–4 foci*, % (n) | 9% (n = 39) | 4% (n = 6) | |
| | >4 foci*, % (n) | 0% (n = 0) | 0% (n = 0) | |
| Fibrosis stage | None, % (n) | 59% (n = 262) | 64% (n = 86) | 0.069³ |
| | 1, % (n) | 25% (n = 111) | 28% (n = 37) | |
| | 2, % (n) | 9% (n = 39) | 7% (n = 10) | |
| | 3, % (n) | 5% (n = 20) | 1% (n = 1) | |
| | 4, % (n) | 2% (n = 11) | 0% (n = 0) | |
| PNPLA3** | CC, % (n) | 54% (n = 219) | 57% (n = 71) | 0.407² |
| | CG, % (n) | 40% (n = 163) | 35% (n = 43) | |
| | GG, % (n) | 6% (n = 23) | 8% (n = 10) | |
Reference ranges: ALT (Male 5–52 U/L, Female 10–60 U/L), AST (Male 13–39 U/L, Female 10–42 U/L), Cholesterol (<200 mg/dL), HDL (≥40 mg/dL), LDL (<130 mg/dL), Triglycerides (<150 mg/dL), Platelet Count (140–400 K/uL).
¹Two-sample t-test; ²Chi-square test; ³Fisher’s Exact Test; ⁴Wilcoxon Rank-Sum test.
SD = standard deviation, IQR = Interquartile Range.
*per 200X field.
**PNPLA3 unknown for 48 patients (38 in discovery group and 10 in the validation group). Hardy-Weinberg test for equilibrium: p = 0.304 in discovery group and p = 0.344 in validation group.
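The Hardy-Weinberg p-values in this footnote can be checked against the PNPLA3 genotype counts in the table. The following is a minimal sketch, assuming a one-degree-of-freedom chi-square goodness-of-fit test without continuity correction (the source does not state which variant was used); the function name `hardy_weinberg_p` is illustrative and not taken from the paper.

```python
import math


def hardy_weinberg_p(n_cc, n_cg, n_gg):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium for a biallelic SNP.

    Assumption: no continuity correction; the paper does not specify the test variant.
    """
    n = n_cc + n_cg + n_gg
    p = (2 * n_cc + n_cg) / (2 * n)  # observed frequency of the C allele
    q = 1.0 - p                      # observed frequency of the G allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_cc, n_cg, n_gg)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


# Genotype counts (CC, CG, GG) as listed in the characteristics table
discovery_p = hardy_weinberg_p(219, 163, 23)
validation_p = hardy_weinberg_p(71, 43, 10)
print(round(discovery_p, 3), round(validation_p, 3))
```

Under these assumptions, the computed values land close to the reported p = 0.304 (discovery) and p = 0.344 (validation).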
Figure 1. Plot of unadjusted p-values for the association between each protein's expression and the presence of any steatosis.
The labeled proteins were those selected for inclusion in final model.
Logistic regression model for NAFLD using selected protein biomarkers.
| Gene | Protein | Odds Ratio | [95% CI] | p-value |
|---|---|---|---|---|
| ACY1 | Aminoacylase-1 | 57.89 | [13.69, 244.90] | <0.0001 |
| SHBG | Sex hormone-binding globulin | 0.56 | [0.42, 0.75] | <0.0001 |
| CTSZ | Cathepsin Z | 0.69 | [0.48, 0.98] | 0.0400 |
| MET | Hepatocyte growth factor receptor | 0.60 | [0.43, 0.83] | 0.0020 |
| GSN | Gelsolin/GSN | 2.69 | [1.74, 4.16] | <0.0001 |
| LGALS3BP | Galectin-3 binding protein | 0.59 | [0.43, 0.79] | 0.0005 |
| CHL1 | Neural cell adhesion molecule L1-like protein | 2.20 | [1.42, 3.42] | 0.0004 |
| SERPINC1 | Antithrombin III | 0.68 | [0.49, 0.94] | 0.0185 |
The biomarkers were rescaled to the standard normal (mean = 0, SD = 1) before inclusion in the logistic regression model. Each odds ratio can be interpreted as the multiplicative change in the odds of steatosis for each 1 standard deviation increase in the protein expression level.
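To make the per-SD interpretation concrete, the sketch below converts a reported odds ratio back to a logistic regression coefficient and scales it to a multi-SD change. It assumes the 95% confidence intervals are Wald intervals (symmetric on the log-odds scale), which the source does not state; all function names are illustrative.

```python
import math


def log_odds_coefficient(odds_ratio):
    """Logistic regression coefficient (per 1 SD) implied by an odds ratio."""
    return math.log(odds_ratio)


def odds_multiplier(odds_ratio, sd_change):
    """Factor by which the odds change for a shift of `sd_change` SDs."""
    return math.exp(log_odds_coefficient(odds_ratio) * sd_change)


def approx_standard_error(ci_low, ci_high, z=1.96):
    """Approximate SE of the coefficient, assuming a Wald 95% CI (an assumption)."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)


# SHBG row of the table: OR 0.56, 95% CI [0.42, 0.75]
beta_shbg = log_odds_coefficient(0.56)        # negative: higher SHBG, lower odds
two_sd_factor = odds_multiplier(0.56, 2)      # 0.56 squared
se_shbg = approx_standard_error(0.42, 0.75)
print(beta_shbg, two_sd_factor, se_shbg)
```

For example, an SHBG level 2 SD above the mean multiplies the odds of steatosis by 0.56 × 0.56, holding the other proteins fixed.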
Area under the receiver operating characteristic (ROC) curve (AUC), based on the c-statistic.
| Model | Discovery AUC | Discovery 95% CI | Validation AUC | Validation 95% CI |
|---|---|---|---|---|
| 1. GENOMIC only | 0.596 | [0.547, 0.645] | 0.610 | [0.519, 0.713] |
| 2. PHENOMIC only | 0.886 | [0.851, 0.918] | 0.778 | [0.693, 0.851] |
| 3. PHENO + GENO | 0.892 | [0.862, 0.924] | 0.782 | [0.710, 0.865] |
| 4. PROTEOMIC only | 0.913 | [0.882, 0.937] | 0.864 | [0.793, 0.927] |
| 5. PROTEO + GENO | 0.920 | [0.892, 0.946] | 0.889 | [0.832, 0.945] |
| 6. PROTEO + PHENO | 0.932 | [0.904, 0.955] | 0.892 | [0.840, 0.943] |
| 7. 3 DOMAIN MODEL | 0.935 | [0.913, 0.959] | 0.914 | [0.871, 0.957] |
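The c-statistic behind these AUC values is the probability that a randomly chosen patient with steatosis receives a higher predicted score than a randomly chosen patient without it, with ties counted as one half. The following is a minimal sketch of that definition; the scores and labels are made-up toy values, not data from the study, and `c_statistic` is an illustrative name.

```python
def c_statistic(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a concordant pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and true steatosis status (1 = present)
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_auc = c_statistic(toy_scores, toy_labels)
print(toy_auc)  # 3 of the 4 positive-negative pairs are concordant: 0.75
```

The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred patients; a rank-based formulation gives the same value more efficiently.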
Figure 2. ROC curves for Discovery Model*.
*Note that PNPLA3 was included as the number of alleles (0, 1, 2) and was treated as an ordinal variable. Those with unknown PNPLA3 status were included in the model by using a common missing data strategy (i.e. treating them as a separate subgroup).
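One way to realize the encoding described in this note is an allele count plus a missing-data indicator. The sketch below is an assumption about how such a coding could look; the source says only that unknown genotypes were treated as a separate subgroup, and the function name, the choice of 0 as the placeholder count, and the use of `None` for unknown status are all hypothetical.

```python
def encode_pnpla3(genotype):
    """Return (g_allele_count, is_missing) for a PNPLA3 rs738409 genotype.

    Assumed convention (not from the paper): unknown genotypes get a
    placeholder allele count of 0 and a missing indicator of 1, so a model
    can fit a separate effect for the unknown subgroup.
    """
    if genotype is None:
        return (0, 1)
    g = genotype.upper()
    if len(g) != 2 or not set(g) <= {"C", "G"}:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return (g.count("G"), 0)  # number of G (risk) alleles: 0, 1, or 2


# Hypothetical patient genotypes, including one with unknown status
examples = ["CC", "CG", "GG", None]
encoded = [encode_pnpla3(g) for g in examples]
print(encoded)  # [(0, 0), (1, 0), (2, 0), (0, 1)]
```

Both values would enter the model as predictors, so the placeholder count for unknown genotypes does not pull those patients into the CC group.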
Figure 3. ROC curves for Validation model.