Margaret Doherty1,2, Evropi Theodoratou3,4, Ian Walsh5, Barbara Adamczyk6,7, Henning Stöckmann6, Felix Agakov8, Maria Timofeeva4, Irena Trbojević-Akmačić9, Frano Vučković9, Fergal Duffy6, Ciara A McManus6, Susan M Farrington4, Malcolm G Dunlop4, Markus Perola10, Gordan Lauc9,11, Harry Campbell3,4, Pauline M Rudd6.
Abstract
Aberrant glycosylation has been associated with a number of diseases, including cancer. Our aim was to elucidate changes in whole plasma N-glycosylation between colorectal cancer (CRC) cases and controls in one of the largest cohorts of its kind. A set of 633 CRC patients and 478 age- and gender-matched controls was analysed. Additionally, patients were stratified into four CRC stages. Moreover, N-glycan analysis was carried out in plasma of 40 patients collected prior to the initial diagnosis of CRC. Statistically significant differences were observed in the plasma N-glycome at all stages of CRC, including a highly significant decrease in the core-fucosylated bi-antennary glycans F(6)A2G2 and F(6)A2G2S(6)1 (P < 0.0009). Stage 1 showed a unique biomarker signature compared to stages 2, 3 and 4. There were indications that at-risk groups could be identified from the glycome (retrospective AUC = 0.77 and prospective AUC = 0.65). N-glycome biomarkers related to the pathogenic progress of the disease would be a considerable asset in a clinical setting and could enable novel therapeutics to be developed to target the disease in patients at risk of progression.
Year: 2018 PMID: 29872119 PMCID: PMC5988698 DOI: 10.1038/s41598-018-26805-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The Oxford notation by example. Inset: the most common residues in N-glycans and their linkages. Note that linkage information may be left ambiguous, e.g. F(6)A1[3] may be written FA1.
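The shorthand described in this caption can be illustrated with a small parser. This is a minimal sketch, not the authors' tooling: the function names `strip_linkage` and `composition` and the output keys are hypothetical, and it assumes the usual Oxford-notation reading in which an F before the A term denotes core fucose and an F after it denotes outer-arm fucose.

```python
import re

# Linkage annotations such as "(6)", "[3]" or "[3,3,3,6]".
_LINKAGE = re.compile(r"\(\d+(?:,\d+)*\)|\[\d+(?:,\d+)*\]")
# One residue letter followed by an optional count, e.g. "A2", "F", "S1".
_TOKEN = re.compile(r"([A-Z])(\d*)")

# Hypothetical output keys chosen for this illustration.
_NAMES = {
    "A": "antennae",
    "B": "bisecting_glcnac",
    "G": "galactose",
    "S": "sialic_acid",
    "M": "mannose",
}


def strip_linkage(name: str) -> str:
    """Drop linkage annotations, e.g. 'F(6)A1[3]' becomes 'FA1'."""
    return _LINKAGE.sub("", name)


def composition(name: str) -> dict:
    """Count residues in an Oxford-notation name, ignoring linkages."""
    counts = {}
    seen_antennae = False
    for letter, digits in _TOKEN.findall(strip_linkage(name)):
        n = int(digits) if digits else 1  # a bare letter counts once
        if letter == "F":
            # Assumption: F before the A term is core fucose, after it outer-arm.
            key = "outer_fucose" if seen_antennae else "core_fucose"
        elif letter in _NAMES:
            key = _NAMES[letter]
            if letter == "A":
                seen_antennae = True
        else:
            raise ValueError(f"unrecognised residue letter: {letter!r}")
        counts[key] = counts.get(key, 0) + n
    return counts


short_name = strip_linkage("F(6)A2G2S(6)1")
parsed = composition("A2F1G2S1")
```

Here `short_name` is the linkage-free form of one of the glycans named in the abstract, and `parsed` separates the outer-arm fucose of A2F1G2S1 from the antennae, galactose and sialic acid counts.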
Figure 2. (a) A representative chromatogram from the human plasma N-glycome and peak assignments from the CRC cohort. Significant peaks (found on the training set of 625 patients vs. 468 controls) are coloured in red (increased in CRC) and blue (decreased in CRC). ‘*’ indicates one of the top five peak abundance changes (i.e. lowest p-value). (b) Significant peaks are marked decreased (blue) or increased (red) in all CRC and four stages of CRC.
Descriptive information for 625 CRC patients vs. 468 healthy controls.
| Feature | CRC (n = 625) | Control (n = 468) | P-value |
|---|---|---|---|
| Age | 52 (47–55) | 53 (48–56) | 2.32E-02 |
| BMI | 26 (23–29) | 28 (26–31) | |
| CRP | 0.36 (0.12–1.635) | 0.18 (0.09–0.60) | |
| KCalories | 2527 (2006–3366) | 2572 (2055–3238) | 4.63E-01 |
| Gender | 54.24/45.76 | 55.34/44.66 | 7.59E-01 |
| Family history | 67.84%/27.04%/5.12% | 96.03%/1.05%/2.93% | |
| Physical activity | 58.08%/14.08%/27.84% | 62.18%/14.96%/22.86% | 3.49E-01 |
| Smoking status | 36.96%/37.44%/25.60% | 38.25%/41.03%/20.73% | 2.62E-01 |
| NSAIDs | 60.00%/15.04%/24.96% | 59.62%/19.66%/20.73% | 6.62E-02 |
Underlined p-values show significant differences between CRC and control. To highlight biological variation for continuous variables the median and interquartile ranges (IQR) are shown. IQR shows the spread of the continuous variables. For categorical features the basic counts are shown.
Plasma glycome composition in CRC patients and controls. Only the main derived traits describing glycome composition are shown.
| Glycan trait | CRC (n = 625) (median[IQR]) | Control (n = 468) (median[IQR]) | Δ peak area* | p-values |
|---|---|---|---|---|
| G0 | 2.23 (1.59–3.02) | 2.01 (1.44–2.78) | 0.29 | 1.41E-04 |
| G1 | 8.49 (7.14–9.63) | 8.695 (7.5–9.8175) | −0.22 | 1.99E-02 |
| G2 | 65.34 (63.04–66.95) | 66.6 (64.82–68.05) | −1.79 | 1.27E-11 |
| G3 | 13.97 (12.14–15.99) | 13.295 (11.7225–14.98) | 0.63 | 6.25E-10 |
| G4 | 6.5 (5.26–8.05) | 6.02 (5.03–7.0375) | 0.81 | 2.87E-08 |
| S0neutral | 11.91 (9.79–13.73) | 12.215 (10.2075–14) | −0.18 | 1.13E-01 |
| S1 | 23.72 (21.95–25.56) | 24.29 (22.9125–25.825) | −0.72 | 4.94E-06 |
| S2 | 45.15 (43.41–46.83) | 45.465 (43.785–47.14) | −0.49 | 7.89E-01 |
| S3 | 13.99 (12.36–15.92) | 13.24 (12.005–14.5475) | 0.93 | 2.37E-11 |
| S4 | 1.96 (1.6–2.44) | 1.72 (1.45–2.02) | 0.34 | 2.74E-06 |
| CoreFall | 28.43 (25.16–31.28) | 29.135 (26.535–32.105) | −1.06 | 4.30E-03 |
| CoreFneutral | 18.07 (15.23–20.52) | 18.78 (16.4025–21.095) | −0.89 | 1.51E-03 |
| CoreFneutralG1G2 | 6.05 (4.93–7.28) | 6.685 (5.515–7.8675) | −0.65 | 3.57E-09 |
| OuterF | 14.98 (13.27–16.58) | 14.885 (13.47–16.46) | 0.07 | 6.35E-01 |
| Ball | 5.06 (4.08–5.85) | 5.115 (4.29–5.9575) | −0.16 | 2.58E-01 |
| Bneutral | 4.03 (3.61–4.52) | 3.95 (3.53–4.43) | 0.15 | 3.46E-02 |
| Tail | 2.88 (2.43–3.51) | 2.63 (2.26–2.98) | 0.46 | 9.94E-09 |
Directly measured glycan structures are available in Supplementary Table 1. A description of each derived trait is given in Supplementary Table 4. Bonferroni correction was applied for multiple testing (P-value significance threshold < 0.05/17 ≈ 0.003). *The difference between the mean peak areas (CRC – control). #Results reported on this column. G – galactose; F – fucose; B – bisecting GlcNAc; S – sialic acid. To highlight biological variation the median and interquartile ranges (IQR) are shown. IQR shows the spread of the relative abundances (i.e. summed peak areas).
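The Bonferroni threshold quoted in this footnote can be checked directly against the p-values in the table above. The sketch below only reproduces that arithmetic; the variable names are illustrative, and the p-values are copied from the table.

```python
ALPHA = 0.05
N_TRAITS = 17  # number of derived traits tested in the table

# Bonferroni-corrected significance threshold: 0.05 / 17, about 0.003.
threshold = ALPHA / N_TRAITS

# p-values copied from the derived-trait table.
p_values = {
    "G0": 1.41e-04, "G1": 1.99e-02, "G2": 1.27e-11, "G3": 6.25e-10,
    "G4": 2.87e-08, "S0neutral": 1.13e-01, "S1": 4.94e-06, "S2": 7.89e-01,
    "S3": 2.37e-11, "S4": 2.74e-06, "CoreFall": 4.30e-03,
    "CoreFneutral": 1.51e-03, "CoreFneutralG1G2": 3.57e-09,
    "OuterF": 6.35e-01, "Ball": 2.58e-01, "Bneutral": 3.46e-02,
    "Tail": 9.94e-09,
}

# Traits whose p-value falls below the corrected threshold.
significant = [trait for trait, p in p_values.items() if p < threshold]

print(f"threshold = {threshold:.4f}")
print("significant traits:", ", ".join(significant))
```

Traits such as G1, CoreFall and Bneutral fall below the nominal 0.05 level but not below the corrected threshold.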
Figure 3. Boxplots showing increased and decreased glycan abundance for the derived glycome traits. The dots represent an individual’s relative abundance for the trait. Statistically significant traits can be found in Table 2.
Comparison of the IgG glycan markers[19] with the plasma glycan markers found in this work.
| Predominant Glycan | Plasma CRC change | This work’s Peak | IgG change CRC[ | IgG Peak from[ | Likely plasma protein(s)* | Amino acid | Protein plasma concentration (% of total)^ |
|---|---|---|---|---|---|---|---|
| FA2 | Increased | GP01 | Increased | GP4 | IgG | Asn297, Asn322 (IgG3) | 40.4% |
| FA2[6]G1 | Decreased | GP04 | Decreased | GP08 | IgG | Asn297, Asn322 (IgG3) | 40.4% |
| FA2[3]G1 | Decreased | GP05 | Decreased | GP09 | IgG | Asn297, Asn322 (IgG3) | 40.4% |
| A2[3]G1S[3]1 | Decreased | GP08 | NF | NF | ? | ? | ? |
| A2G2 | Decreased | GP08 | Decreased | GP12 | IgG, Apo | Asn297 (IgG), Asn322 (IgG3) | 40.4% (IgG), 0.5% (Apo) |
| FA2G2 | Decreased | GP11 | Decreased | GP14 | IgG | Asn297 (IgG), Asn322 (IgG3) | 40.4% |
| FA2BG2 | Decreased | GP12 | Decreased | GP15 | IgG | Asn297 (IgG), Asn322 (IgG3) | 40.4% |
| FA2G2S1 | Decreased | GP18 | Decreased | GP18 | IgG, IgA, IgE, IgD, IgM, A2M | Asn297 (IgG), Asn322 (IgG3), Asn340(IgA), Asn46(IgM), Asn209(IgM), Asn272(IgM) | 40.4% (IgG), 9.0% (IgA), 0.1% (IgD), 5.0% (IgM) |
| A2F1G2S1 | Decreased | GP20 | NF | NF | Apo D, Haptoglobin | Asn98 (Apo D), Asn184(Hapto), Asn207 (Hapto), Asn241 (Hapto) | 0.3% (ApoD), 4.5% (Hapto) |
| A2BG2S2 | Increased | GP24 | Decreased | GP22 | IgG | Asn297 (IgG), Asn322 (IgG3) | 40.4% |
| FA3G3S1 | Increased | GP24 | NF | NF | ? | ? | ? |
| FA3BG3S1 | Increased | GP24 | NF | NF | ? | ? | ? |
| A3G3S2 | Increased | GP27 | NF | NF | B2, Apo D, Hapto, Sero | Asn65 (Apo D), Asn162 (B2), Asn193 (B2), Asn184 (Hapto), Asn211 (Hapto), Asn241 (Hapto), Asn432 (Sero), Asn630 (Sero) | 0.7% (B2), |
| A3BG3S2 | Increased | GP27 | NF | NF | ? | ? | ? |
| FA3G3S3 | Increased | GP32 | NF | NF | ? | ? | ? |
| A4G4S3 | Increased | GP37 | NF | NF | AGP | Asn72, Asn93, Asn103 | 2.6% (AGP) |
| A4F1G3S3 | Increased | GP38 | NF | NF | AGP | Asn93 | 2.6% (AGP) |
| A4G4S4 | Increased | GP42 | NF | NF | AGP, Apo D, CP | Asn65 (Apo D), Asn762 (CP) | 2.6% (AGP), 1.2% (CP), |
| A4F1G4S4 | Increased | GP42 | NF | NF | AGP, CP | Asn762 (CP) | 2.6% (AGP), 1.2% (CP) |
*The possible plasma protein and amino acid site involved for each glycan, derived from[21] and the previous IgG CRC study[19]; in bold the highest concentration. NF: not found in the IgG profile. For simplicity we did not separate by linkage isomers since all followed the same trend (e.g. A4G4S[3,3,3,3]4 and A4G4S[3,3,3,6]4 both increased in plasma CRC). Abbreviated proteins (UniProt ID): AGP: Alpha-1-acid glycoprotein (P02763; P19652), A2M: Alpha-2-macroglobulin (P01023), Apo: Apolipoprotein B-100 (P04114), Apo D: Apolipoprotein D (P05090), CP: Ceruloplasmin (P00450), Hapto: Haptoglobin (P00738), B2: Beta-2-glycoprotein I (P02749), Sero: Serotransferrin (P02787), IgA: Immunoglobulin A (P01876, P01877), IgD: Immunoglobulin D (P01880), IgE: Immunoglobulin E (P01854), IgG: Immunoglobulin G (P01857, P01859, P01860, P01861), IgM: Immunoglobulin M (P01876, P01877), ?: unknown. ^Approx. derived from[21].
10-fold cross validation model performance to identify CRC.
| Peaks + features | AUC | Sensitivity | Specificity | p-value |
|---|---|---|---|---|
| All peaks | 0.765 | 0.318 | 0.949 | 9.8E-02 |
| All peaks + clinical | 0.770 | 0.373 | 0.949 | * |
| All peaks + CRP | 0.763 | 0.322 | 0.949 | 8.93E-02 |
| Clinical only | 0.613 | 0.150 | 0.949 | <0.0001 |
| Clinical only - CRP | 0.529 | 0.075 | 0.949 | <0.0001 |
| CRP only | 0.569 | 0.153 | 0.949 | <0.0001 |
Discrimination models on the SOCCS dataset (625 CRC vs. 468 control). ‘All peaks’: 42 peak areas. ‘All peaks + clinical’: 42 peak areas, BMI, age, gender, smoking status, physical activity, NSAIDs intake, Kcalorie intake and CRP. ‘All peaks + CRP’: 42 peak areas + CRP only. ‘Clinical only’: BMI, age, gender, smoking status, physical activity, NSAIDs intake, Kcalorie intake and CRP. ‘Clinical only - CRP’ used all clinical features except family history and CRP. P-value column: ‘*’ marks the best model; the other entries are P-values for the AUC differing from that of the best model.
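Every model in the table is reported at the same specificity (about 0.95), which suggests that sensitivity was read off the ROC curve at a fixed specificity; this is an inference from the table, not something the source states. The sketch below shows how AUC and sensitivity at a target specificity can be computed from classifier scores. The function names and the scores are synthetic illustrations, not study data.

```python
def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control
    (ties count as half), i.e. the normalised Mann-Whitney U statistic."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_at_specificity(case_scores, control_scores, target_spec):
    """Pick the lowest threshold whose specificity reaches target_spec and
    return (threshold, specificity, sensitivity). A sample is called
    positive when its score is strictly above the threshold."""
    n_controls = len(control_scores)
    threshold = max(control_scores)
    spec = 1.0
    for candidate in sorted(set(control_scores)):
        candidate_spec = sum(s <= candidate for s in control_scores) / n_controls
        if candidate_spec >= target_spec:
            threshold, spec = candidate, candidate_spec
            break
    sens = sum(s > threshold for s in case_scores) / len(case_scores)
    return threshold, spec, sens


# Synthetic toy scores for illustration only (not from the SOCCS data).
controls = list(range(1, 21))   # 20 control scores: 1..20
cases = [5, 12, 18, 20, 22, 25]  # 6 case scores

toy_auc = auc(cases, controls)
toy_threshold, toy_spec, toy_sens = sensitivity_at_specificity(cases, controls, 0.95)
```

Fixing specificity this way makes sensitivities comparable across models, which is one plausible reading of why the specificity column is constant.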
Figure 4. Classification of CRC patients using plasma glycans. ROC curve illustrating the performance of the perceptron model in discriminating between CRC patients and healthy controls from the SOCCS retrospective study (625 CRC vs. 468 control). ‘All peaks’: 42 peak areas. ‘All peaks + clinical’: 42 peak areas, BMI, age, gender, smoking status, physical activity, NSAIDs intake, Kcalorie intake and CRP. ‘All peaks + CRP’: 42 peak areas and CRP only. ‘Clinical only’: BMI, age, gender, smoking status, physical activity, NSAIDs intake, Kcalorie intake and CRP. ‘CRP only’: the only variable used is CRP.
10-fold cross validation model performance to identify at risk groups.
| Peaks + features | AUC | Sensitivity | Specificity | p-value |
|---|---|---|---|---|
| All peaks + age | 0.651 | 0.125 | 0.950 | * |
| All peaks | 0.612 | 0.150 | 0.950 | <0.0001 |
Discrimination models tested on the FINRISK dataset. ‘All peaks’: 39 peak areas and ‘All peaks + age’: all 39 peak areas and age of each person in the FINRISK dataset. P-value column: ‘*’ best model, P-value AUC different from best model.
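Both performance tables report 10-fold cross validation. The source does not say how the folds were drawn, so the following is only a sketch of one common choice, stratified fold assignment, which keeps the case/control ratio similar in every fold. The function name `stratified_folds` is hypothetical, and the SOCCS group sizes (625 cases, 468 controls) are used purely as example inputs.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample to one of k folds, spreading every class evenly.

    Returns a list with one fold index (0..k-1) per sample.
    """
    rng = random.Random(seed)
    folds = [None] * len(labels)
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members round-robin into the k folds.
        for position, sample_index in enumerate(members):
            folds[sample_index] = position % k
    return folds


# Example inputs: 1 = CRC case, 0 = control (group sizes from the SOCCS tables).
labels = [1] * 625 + [0] * 468
folds = stratified_folds(labels, k=10, seed=0)

# Per-fold class counts; each fold serves once as the held-out test set.
cases_per_fold = [
    sum(1 for y, f in zip(labels, folds) if f == fold and y == 1)
    for fold in range(10)
]
controls_per_fold = [
    sum(1 for y, f in zip(labels, folds) if f == fold and y == 0)
    for fold in range(10)
]
```

In each round, the samples in one fold are held out for testing and the remaining nine folds are used for training, so every sample is scored exactly once.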