Christine Bollwein, Juliana Pereira Lopes Gonçalves, Kirsten Utpatel, Wilko Weichert, Kristina Schwamborn.
Abstract
Pancreatic ductal adenocarcinoma (PDAC) and cholangiocarcinoma (CC) constitute two aggressive tumor types that originate from the epithelial lining of the excretory ducts of the pancreatobiliary tract. Given their close histomorphological resemblance, a correct diagnosis can be challenging and almost impossible without clinical information. In this study, we investigated whether mass spectrometric peptide features could be employed to distinguish PDAC from CC. Three tissue microarrays of formalin-fixed, paraffin-embedded (FFPE) material comprising 41 cases of PDAC and 41 cases of CC were analyzed by matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI). The derived peptide features and their respective intensities were used to build different supervised classification algorithms: gradient boosting (GB), support vector machine (SVM), and k-nearest neighbors (KNN). On a pixel-by-pixel level, a classification accuracy of up to 95% could be achieved. The tentative identification of discriminative tryptic peptide signatures revealed proteins that are involved in the epigenetic regulation of the genome and in the tumor microenvironment. Despite the histomorphological similarities of the two tumor types, mass spectrometry imaging represents an efficient and reliable approach for the distinction of PDAC from CC, offering a promising complementary or alternative approach to existing diagnostic tools such as immunohistochemistry.
Keywords: MALDI-MSI; cholangiocarcinoma; machine learning; pancreatic ductal adenocarcinoma; peptides; proteomics; supervised classification; tandem mass spectrometry
Year: 2022 PMID: 35684402 PMCID: PMC9182561 DOI: 10.3390/molecules27113464
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
Figure 1. One exemplary core of a cholangiocarcinoma (CC) (top) and a pancreatic ductal adenocarcinoma (PDAC) (bottom), without (left side) and with annotation (right side).
Characteristics of the study population.
| Characteristic | Pancreatic Carcinoma (PDAC) | Cholangiocarcinoma (CC) | p-value |
|---|---|---|---|
| | 66 ± 11.4 | 64 ± 11.9 | 0.49 † |
| | 73.2 | 51.2 | 0.04 *‡ |
| | | | |
| | 26.8 | 70.7 | < 0.01 *‡ |
| | 73.2 | 29.3 | |
| | | | |
| | 95.1 | 63.4 | < 0.01 *‡ |
| | 4.9 | 36.6 | |
| | | | |
| | 51.2 | 51.2 | 1.00 ‡ |
| | 48.8 | 48.8 | |
| | | | |
| | 53.7 | 73.2 | 0.07 ‡ |
| | 46.3 | 26.8 | |
* Statistically significant; † Student’s t-test; ‡ chi-squared test.
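As a rough illustration of the chi-squared test named in the footnote, the sketch below computes a Pearson chi-squared statistic for a 2×2 table using only the Python standard library. The counts are an assumption: they are back-calculated from the percentages 73.2% and 51.2% under the assumption that each group contains 41 cases with complete data, and no continuity correction is applied. The function name `chi_squared_2x2` is hypothetical and not taken from the study.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With one degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Assumed counts (not reported in the table): 73.2% of 41 -> 30, 51.2% of 41 -> 21.
stat, p = chi_squared_2x2(30, 11, 21, 20)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

Under these assumptions the p-value comes out at about 0.04, in line with the rounded value in the table; with a continuity correction it would be larger.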
Figure 2. ROC curves for the binary classification task of differentiating between pancreatic ductal adenocarcinoma and cholangiocarcinoma using (a) support vector machine, (b) gradient boosting, and (c) k-nearest neighbors.
Performance metrics for the binary classification task of differentiating between pancreatic ductal adenocarcinoma and cholangiocarcinoma using gradient boosting, support vector machine, and k-nearest neighbors on the whole dataset.
| Classification Algorithm | Accuracy (CC) | Accuracy (PDAC) | FNR * (CC) | FNR * (PDAC) | FPR ** (CC) | FPR ** (PDAC) | TNR *** (CC) | TNR *** (PDAC) | TPR **** (CC) | TPR **** (PDAC) |
|---|---|---|---|---|---|---|---|---|---|---|
| Support vector machine | 0.91 | 0.91 | 0.03 | 0.24 | 0.24 | 0.03 | 0.76 | 0.97 | 0.97 | 0.76 |
| Gradient boosting | 0.88 | 0.88 | 0.02 | 0.34 | 0.34 | 0.02 | 0.67 | 0.98 | 0.98 | 0.67 |
| K-nearest neighbors | 0.86 | 0.86 | 0.14 | 0.14 | 0.14 | 0.14 | 0.86 | 0.87 | 0.87 | 0.86 |
* False negative rate; ** False positive rate; *** True negative rate; **** True positive rate.
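The four rates in the table are tied together by the confusion matrix, which is why the CC and PDAC columns mirror each other. The sketch below shows those relationships with hypothetical pixel counts that are not taken from the study; the helper name `binary_rates` is likewise an illustration only.

```python
def binary_rates(tp, fn, fp, tn):
    """Return the standard rates for one class treated as the positive class."""
    return {
        "TPR": tp / (tp + fn),  # true positive rate (sensitivity)
        "FNR": fn / (tp + fn),  # false negative rate = 1 - TPR
        "TNR": tn / (tn + fp),  # true negative rate (specificity)
        "FPR": fp / (tn + fp),  # false positive rate = 1 - TNR
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical pixel counts, chosen only for illustration.
cc_as_cc, cc_as_pdac = 970, 30
pdac_as_pdac, pdac_as_cc = 760, 240

# Treat each class in turn as the positive class.
cc = binary_rates(tp=cc_as_cc, fn=cc_as_pdac, fp=pdac_as_cc, tn=pdac_as_pdac)
pdac = binary_rates(tp=pdac_as_pdac, fn=pdac_as_cc, fp=cc_as_pdac, tn=cc_as_cc)

for name, rates in (("CC", cc), ("PDAC", pdac)):
    print(name, {k: round(v, 2) for k, v in rates.items()})
```

In a two-class problem the TPR of one class equals the TNR of the other, and the accuracy is the same whichever class is treated as positive. Accuracy also depends on the class proportions, so the hypothetical value here is not comparable to the table.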
Performance metrics for the binary classification task of differentiating between pancreatic ductal adenocarcinoma and cholangiocarcinoma using gradient boosting, support vector machine, and k-nearest neighbors on the balanced dataset.
| Classification Algorithm | Accuracy (CC) | Accuracy (PDAC) | FNR * (CC) | FNR * (PDAC) | FPR ** (CC) | FPR ** (PDAC) | TNR *** (CC) | TNR *** (PDAC) | TPR **** (CC) | TPR **** (PDAC) |
|---|---|---|---|---|---|---|---|---|---|---|
| Support vector machine | 0.89 | 0.89 | 0.16 | 0.07 | 0.07 | 0.16 | 0.93 | 0.84 | 0.84 | 0.93 |
| Gradient boosting | 0.88 | 0.88 | 0.19 | 0.07 | 0.07 | 0.19 | 0.93 | 0.82 | 0.82 | 0.93 |
| K-nearest neighbors | 0.85 | 0.85 | 0.26 | 0.07 | 0.07 | 0.26 | 0.93 | 0.74 | 0.74 | 0.93 |
* False negative rate; ** False positive rate; *** True negative rate; **** True positive rate.
Figure 3. Feature importance using the mean decrease in the impurity of gradient-boosting classification on the whole dataset (left side) and the balanced dataset (right side) (top 10 features with red bars, others with blue bars).
Tentative MS/MS identification.
| Observed | Mr (Expect) | Mr (Calc) | Error (Da) | Peptide Sequence | Protein | Modifications |
|---|---|---|---|---|---|---|
| 850.4 | 850.5 | 849.5 | 0.03 | R.HLQLAIR.N | Histone H2A | |
| 944.5 | 944.6 | 943.5 | 0.10 | R.AGLQFPVGR.I | Histone H2A | |
| 1105.5 | 1105.5 | 1104.6 | 0.17 | R.GVQGPPGPAGPR.G | Collagen alpha-1(I) chain | Oxidation ( |
| 2056.0 | 2056.0 | 2056.0 | 0.43 | K.TGPPGPAGQDGRPGPPGPPGAR.G | Collagen alpha-1(I) chain | Oxidation ( |
| 2073.0 | 2073.0 | 2072.0 | 0.24 | K.GSPGADGPAGAPGTPGPQGIAGQR.G | Collagen alpha-1(I) chain (P02452) | |
| 1198.7 | 1198.7 | 1197.7 | 0.22 | R.AVFPSIVGRPR.H | Actin * | |
* Possible underlying isoforms: ACTA1 (P68133), ACTA2 (P62736), ACTB (P60709), ACTG1 (P63261), ACTG2 (P63267), POTEI (P0CG38), POTEKP (Q9BYX7), POTEF (A5A3E0), or POTEE (Q6S8J3).
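The Mr (Calc) column can be checked by hand for the unmodified peptides: the neutral monoisotopic mass is the sum of the residue masses plus one water. The sketch below does this with standard monoisotopic residue masses. It assumes the sequence notation `X.PEPTIDE.Y` marks the flanking residues with dots, and the helper names `core_sequence` and `monoisotopic_mass` are hypothetical. Modified peptides (the rows listing oxidation) are not handled.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O for the free N- and C-termini


def core_sequence(annotated):
    """Strip flanking residues from notation such as 'R.HLQLAIR.N' (assumed format)."""
    parts = [p.strip() for p in annotated.split(".")]
    return parts[1] if len(parts) == 3 else parts[0]


def monoisotopic_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


for annotated in ("R.HLQLAIR.N", "R.AGLQFPVGR.I", "R.AVFPSIVGRPR.H"):
    seq = core_sequence(annotated)
    print(f"{seq}: {monoisotopic_mass(seq):.1f} Da")
```

For these three peptides the computed masses round to the Mr (Calc) values listed in the table.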