Nicolai Bjødstrup Palstrøm1,2, Aleksandra M Rojek1,3, Hanne E H Møller1,3, Charlotte Toftmann Hansen1,4, Rune Matthiesen5, Lars Melholt Rasmussen1,2,6, Niels Abildgaard1,4,6,7, Hans Christian Beck1,2,6.
Abstract
Amyloidosis is a rare disease caused by the misfolding and extracellular aggregation of proteins as insoluble fibrillary deposits, localized either in specific organs or systemically throughout the body. The organ targeted, the disease progression, and the outcome are highly dependent on the specific fibril-forming protein, and its accurate identification is essential to the choice of treatment. Mass spectrometry-based proteomics has become the method of choice for the identification of the amyloidogenic protein. Regrettably, this identification relies on manual and subjective interpretation of mass spectrometry data by an expert, which is undesirable and may bias the diagnosis. To circumvent this, we developed a statistical model-assisted method for the unbiased identification of amyloid-containing biopsies and for amyloidosis subtyping, based on data from the mass spectrometric analysis of amyloid-containing biopsies and corresponding controls. The Boruta feature-selection method, applied on top of a random forest classifier, was used on proteomics data from the mass spectrometric analysis of 75 laser-microdissected Congo Red-positive amyloid-containing biopsies and 78 Congo Red-negative biopsies to identify novel "amyloid signature" proteins, which included clusterin, fibulin-1, vitronectin, complement component C9, and three collagen proteins, as well as the well-known amyloid signature proteins apolipoprotein E, apolipoprotein A4, and serum amyloid P. A support vector machine (SVM) learning algorithm was trained on the mass spectrometry data from the analysis of the 75 amyloid-containing biopsies and 78 amyloid-negative control biopsies. The trained algorithm discriminated amyloid-containing biopsies from controls with an accuracy of 1.0 when applied to a blinded mass spectrometry validation data set of 103 prospectively collected amyloid-containing biopsies. Moreover, our method correctly classified the amyloidosis subtype in 102 of the 103 blinded cases.
Collectively, our model-assisted approach identified novel amyloid-associated proteins and demonstrated that mass spectrometry-based data can be used in clinical diagnostics for the unbiased and reliable model-assisted classification of amyloid deposits and of the specific amyloid subtype.
Keywords: amyloidosis; laser microdissection; machine learning; mass spectrometry; proteomics
Year: 2021 PMID: 35008745 PMCID: PMC8745254 DOI: 10.3390/ijms23010319
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Signature proteins for amyloid deposits. The Boruta feature selection method was applied to estimate the capacity of individual proteins to differentiate between Congo Red-positive (CR+) and Congo Red-negative (CR−) samples, as measured by the calculated “mean importance score”. The input was the quantitative readout from the proteomic analysis (number of peptide spectrum matches for each protein) of the 153 biopsies, which were divided into a training set (70% of the biopsies, randomly chosen among CR+ and CR− samples) and a test set (30% of the biopsies, randomly chosen among CR+ and CR− samples).
| Rank | UniProt Accession | Protein Name | Mean Importance |
|---|---|---|---|
| 1 | P10909 | Clusterin | 10.94 |
| 2 | P02743 | Serum amyloid P-component | 10.47 |
| 3 | P06727 | Apolipoprotein A-IV | 9.99 |
| 4 | P04004 | Vitronectin | 9.20 |
| 5 | P02649 | Apolipoprotein E | 8.94 |
| 6 | P02748 | Complement component C9 | 7.25 |
| 7 | P12109 | Collagen alpha-1(VI) chain | 6.25 |
| 8 | P12110 | Collagen alpha-2(VI) chain | 5.19 |
| 9 | P12111 | Collagen alpha-3(VI) chain | 4.93 |
| 10 | P23142 | Fibulin-1 | 4.83 |
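Boruta's decision rule, in its general form, compares each protein's importance with that of "shadow" features, which are randomly permuted copies of the real features. A protein that beats the best shadow in significantly more iterations than expected by chance (binomial test with p = 0.5) is confirmed, and one that does so significantly less often is rejected. The caption does not give the exact Boruta settings, so the following is only a minimal sketch under stated assumptions: it swaps the random forest importance for a simple univariate separation score so that it runs with the Python standard library alone, omits the multiple-testing adjustment and the iterative dropping of decided features found in Boruta implementations, and uses hypothetical toy PSM counts. `importance`, `upper_tail`, and `boruta_like` are illustrative names, not part of any Boruta package.

```python
import math
import random
from statistics import mean, stdev


def importance(values, labels):
    """Univariate separation score |mean(CR+) - mean(CR-)| / pooled SD.

    Stand-in for random-forest importance (an assumption of this sketch)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    pooled_sd = math.sqrt((stdev(pos) ** 2 + stdev(neg) ** 2) / 2)
    return abs(mean(pos) - mean(neg)) / (pooled_sd + 1e-9)


def upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def boruta_like(columns, labels, n_iter=20, alpha=0.05, seed=0):
    """Hypothetical shadow-feature test in the spirit of Boruta.

    columns: one list of PSM counts per protein, aligned with labels (1 = CR+, 0 = CR-).
    Returns the number of 'hits' and a decision for each protein."""
    rng = random.Random(seed)
    real = [importance(col, labels) for col in columns]
    hits = [0] * len(columns)
    for _ in range(n_iter):
        # Shadows keep each protein's value distribution but break its link to the label.
        shadow_max = 0.0
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, importance(shadow, labels))
        for j, score in enumerate(real):
            if score > shadow_max:  # a 'hit': the real protein beats every shadow
                hits[j] += 1
    decisions = []
    for h in hits:
        if upper_tail(h, n_iter) < alpha:
            decisions.append("confirmed")
        elif upper_tail(n_iter - h, n_iter) < alpha:  # equals P(X <= h) by symmetry
            decisions.append("rejected")
        else:
            decisions.append("tentative")
    return hits, decisions


# Hypothetical toy data: 6 CR- followed by 6 CR+ biopsies.
labels = [0] * 6 + [1] * 6
signal = [0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15]  # separates CR+ from CR-
noise = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]  # identical in both groups
print(boruta_like([signal, noise], labels)[1])  # → ['confirmed', 'rejected']
```

Because the stand-in score is deterministic, only the shadows vary between iterations here; with a random forest the real importances also fluctuate, which is one reason Boruta summarizes importance as a mean over iterations.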
The capability of the identified amyloid signature proteins, alone and in combination, to discriminate amyloid-containing biopsies from CR-negative samples. A support vector machine (SVM) algorithm was trained on the quantitative readout (number of peptide spectrum matches) of each identified amyloid signature protein (Table 1), or combination of proteins, from the proteomic analysis of the training-set biopsies. The test data set consisted of 22 amyloid-containing biopsies (“+”) and 23 corresponding controls without amyloid (“−”).
| Signature Protein(s) | Correct/Total | Sensitivity | Specificity | PPV | NPV | Accuracy |
|---|---|---|---|---|---|---|
| ApoA4 | +: 21/22 | 0.95 | 0.96 | 0.95 | 0.96 | 0.96 |
| ApoE | +: 22/22 | 1.00 | 0.96 | 0.96 | 1.00 | 0.98 |
| SAP | +: 21/22 | 0.95 | 1.00 | 1.00 | 0.96 | 0.98 |
| Clusterin | +: 20/22 | 0.91 | 1.00 | 1.00 | 0.92 | 0.96 |
| Vitronectin | +: 6/22 | 0.27 | 1.00 | 1.00 | 0.59 | 0.64 |
| Complement C9 | +: 2/22 | 0.09 | 1.00 | 1.00 | 0.53 | 0.56 |
| Collagen alpha-1(VI) chain | +: 14/22 | 0.64 | 0.91 | 0.88 | 0.72 | 0.78 |
| Collagen alpha-2(VI) chain | +: 13/22 | 0.59 | 0.87 | 0.81 | 0.69 | 0.73 |
| Collagen alpha-3(VI) chain | +: 13/22 | 0.59 | 0.87 | 0.81 | 0.69 | 0.73 |
| Fibulin-1 | +: 7/22 | 0.32 | 1.00 | 1.00 | 0.61 | 0.67 |
| ApoA4 and ApoE | +: 20/22 | 0.91 | 0.96 | 0.95 | 0.92 | 0.93 |
| ApoA4, ApoE, Clusterin | +: 22/22 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| ApoA4, ApoE, Vitronectin | +: 22/22 | 1.00 | 0.96 | 0.96 | 1.00 | 0.98 |
| ApoA4, ApoE, Complement C9 | +: 22/22 | 1.00 | 0.96 | 0.96 | 1.00 | 0.98 |
| ApoA4, ApoE, Collagen alpha-1(VI) chain | +: 20/22 | 0.91 | 0.96 | 0.95 | 0.92 | 0.93 |
| ApoA4, ApoE, Collagen alpha-2(VI) chain | +: 20/22 | 0.91 | 1.00 | 1.00 | 0.92 | 0.96 |
| ApoA4, ApoE, Collagen alpha-3(VI) chain | +: 21/22 | 0.95 | 0.96 | 0.95 | 0.96 | 0.96 |
| ApoA4, ApoE, Fibulin-1 | +: 20/22 | 0.91 | 1.00 | 1.00 | 0.92 | 0.96 |
| ApoA4, ApoE and SAP | +: 22/22 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
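The metrics in this table follow the standard confusion-matrix definitions. A short sketch recomputing the ApoA4 row is shown below; the table lists only the amyloid-positive counts, so the control counts used here (22 of 23 controls correctly classified) are an assumption inferred from the reported specificity of 0.96. `diagnostic_metrics` is an illustrative helper, not the authors' code.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Binary diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # amyloid biopsies called amyloid
        "specificity": tn / (tn + fp),  # controls called control
        "ppv": tp / (tp + fp),  # positive calls that truly contain amyloid
        "npv": tn / (tn + fn),  # negative calls that truly are controls
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# ApoA4 row: 21/22 amyloid biopsies correct (from the table);
# 22/23 controls correct is inferred from the specificity (assumption).
apoa4 = diagnostic_metrics(tp=21, fn=1, tn=22, fp=1)
print({k: round(v, 2) for k, v in apoa4.items()})
# → {'sensitivity': 0.95, 'specificity': 0.96, 'ppv': 0.95, 'npv': 0.96, 'accuracy': 0.96}
```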
Validation of the SVM-based models for recognizing tissue samples with amyloid deposits, tested on a blinded validation data set of 103 amyloid-containing tissue samples of different organ origin collected from amyloidosis patients.
| Amyloid Signature Proteins | Correct/Total | Accuracy |
|---|---|---|
| ApoA4 | 99/103 | 0.96 |
| ApoE | 99/103 | 0.96 |
| SAP | 103/103 | 1.00 |
| Clusterin | 97/103 | 0.94 |
| Vitronectin | 64/103 | 0.62 |
| Complement C9 | 26/103 | 0.25 |
| Collagen alpha-1(VI) chain | 83/103 | 0.81 |
| Collagen alpha-2(VI) chain | 82/103 | 0.80 |
| Collagen alpha-3(VI) chain | 84/103 | 0.82 |
| Fibulin-1 | 72/103 | 0.70 |
| ApoA4 and ApoE | 100/103 | 0.97 |
| ApoA4, ApoE, Clusterin | 101/103 | 0.98 |
| ApoA4, ApoE and Vitronectin | 102/103 | 0.99 |
| ApoA4, ApoE and Complement C9 | 102/103 | 0.99 |
| ApoA4, ApoE and Collagen alpha-1(VI) chain | 102/103 | 0.99 |
| ApoA4, ApoE and Collagen alpha-2(VI) chain | 102/103 | 0.99 |
| ApoA4, ApoE and Collagen alpha-3(VI) chain | 102/103 | 0.99 |
| ApoA4, ApoE, and Fibulin-1 | 101/103 | 0.98 |
| ApoA4, ApoE and SAP | 103/103 | 1.00 |
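The captions do not specify the SVM kernel, feature scaling, or software. As a minimal sketch, assuming a linear kernel and log-transformed PSM counts, the following trains a linear SVM with the Pegasos stochastic sub-gradient algorithm (a swapped-in solver, not necessarily what the authors used) on hypothetical toy counts for ApoA4, ApoE, and SAP, then classifies unseen samples. Because every sample in this validation set is amyloid-positive, the reported accuracy is simply the fraction of samples predicted positive, i.e. the sensitivity. `to_features`, `train_linear_svm`, and `predict` are illustrative names.

```python
import math
import random


def to_features(psm_counts):
    """log1p-transform PSM counts and append a constant 1 as a bias term (assumption)."""
    return [math.log1p(c) for c in psm_counts] + [1.0]


def train_linear_svm(samples, labels, lam=0.01, n_steps=5000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    labels must be +1 (amyloid-containing) or -1 (control)."""
    rng = random.Random(seed)
    xs = [to_features(s) for s in samples]
    w = [0.0] * len(xs[0])
    for t in range(1, n_steps + 1):
        i = rng.randrange(len(xs))
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = labels[i] * sum(wj * xj for wj, xj in zip(w, xs[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
        if margin < 1.0:  # hinge loss is active for this sample
            w = [wj + eta * labels[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def predict(w, psm_counts):
    score = sum(wj * xj for wj, xj in zip(w, to_features(psm_counts)))
    return 1 if score >= 0 else -1


# Hypothetical PSM counts for (ApoA4, ApoE, SAP); not data from the study.
amyloid = [[30, 25, 40], [18, 22, 35], [45, 30, 50], [25, 19, 28]]
controls = [[0, 2, 1], [1, 0, 0], [3, 1, 2], [0, 0, 1]]
w = train_linear_svm(amyloid + controls, [1] * 4 + [-1] * 4)
blinded = [[40, 35, 60], [2, 1, 0]]
print([predict(w, s) for s in blinded])
```

Folding the bias into the weight vector means it is regularized along with the weights, a common simplification; a library SVM would normally handle the intercept separately.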
Subtype-specific proteins included in the subtype classification model. These proteins are commonly observed at high levels in amyloid deposits of their respective subtype and were therefore selected for the classifier.
| UniProt Accession | Protein Name | Subtype |
|---|---|---|
| P0DJI8 | Serum amyloid A-1 | AA |
| P0DJI9 | Serum amyloid A-2 | AA |
| P0DOX7 | Immunoglobulin kappa light chain | AL-kappa |
| P01834 | Immunoglobulin kappa constant | AL-kappa |
| P0DOX8 | Immunoglobulin lambda-1 light chain | AL-lambda |
| P0DOY2 | Immunoglobulin lambda constant 2 | AL-lambda |
| P02766 | Transthyretin | ATTR |
SVM-based classification of prospectively collected amyloid-containing samples. The validation set consisted of 103 Congo Red-positive cases with a diagnosis confirmed by IEM: 68 ATTR, 25 AL-lambda (AL-L), 4 AL-kappa (AL-K), and 6 AA cases. With the four-protein model, four ATTR cases were misclassified as AL-L, whereas with the seven-protein model one ATTR case was misclassified as AL-K.
| Subtype | Model | Amyloidogenic Protein(s) | Correct/Total | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|---|
| AA | Four-protein | Serum amyloid A-1 protein (SA) | 6/6 | 1.00 | 1.00 | 1.00 | 1.00 |
| AA | Seven-protein | SA and serum amyloid A-2 protein | 6/6 | 1.00 | 1.00 | 1.00 | 1.00 |
| AL-K | Four-protein | Immunoglobulin kappa light chain (IgK) | 4/4 | 1.00 | 1.00 | 1.00 | 1.00 |
| AL-K | Seven-protein | IgK and Ig kappa constant | 4/4 | 1.00 | 0.99 | 0.80 | 1.00 |
| AL-L | Four-protein | Ig lambda-1 light chain (IgL-1) | 25/25 | 1.00 | 0.95 | 0.86 | 1.00 |
| AL-L | Seven-protein | IgL-1 and Ig lambda constant 2 | 25/25 | 1.00 | 1.00 | 1.00 | 1.00 |
| ATTR | Four-protein | Transthyretin | 64/68 | 0.94 | 1.00 | 1.00 | 0.90 |
| ATTR | Seven-protein | Transthyretin | 67/68 | 0.99 | 1.00 | 1.00 | 0.97 |

Overall accuracy: 0.96 for the four-protein model and 0.99 for the seven-protein model.
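The per-subtype values can be reproduced with a one-vs-rest reading of the multi-class confusion matrix: each subtype is treated in turn as the positive class and all other subtypes as negative, while overall accuracy is the share of the 103 cases assigned the correct subtype. The sketch below assumes the case counts implied by the table (68 ATTR, 25 AL-L, 4 AL-K, 6 AA) together with the misclassifications described in the caption; `one_vs_rest`, `FOUR_PROTEIN`, and `SEVEN_PROTEIN` are illustrative names, not the authors' code.

```python
SUBTYPES = ["AA", "AL-K", "AL-L", "ATTR"]

# (true subtype, predicted subtype) -> number of cases, reconstructed from the table.
FOUR_PROTEIN = {
    ("AA", "AA"): 6,
    ("AL-K", "AL-K"): 4,
    ("AL-L", "AL-L"): 25,
    ("ATTR", "ATTR"): 64,
    ("ATTR", "AL-L"): 4,  # four ATTR cases called AL-L
}
SEVEN_PROTEIN = {
    ("AA", "AA"): 6,
    ("AL-K", "AL-K"): 4,
    ("AL-L", "AL-L"): 25,
    ("ATTR", "ATTR"): 67,
    ("ATTR", "AL-K"): 1,  # one ATTR case called AL-K
}


def one_vs_rest(counts, classes):
    """Per-subtype metrics, treating each subtype in turn as the positive class."""
    total = sum(counts.values())
    per_class = {}
    for c in classes:
        tp = counts.get((c, c), 0)
        fn = sum(n for (t, p), n in counts.items() if t == c and p != c)
        fp = sum(n for (t, p), n in counts.items() if t != c and p == c)
        tn = total - tp - fn - fp
        per_class[c] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
        }
    accuracy = sum(counts.get((c, c), 0) for c in classes) / total
    return per_class, accuracy


for name, counts in [("four-protein", FOUR_PROTEIN), ("seven-protein", SEVEN_PROTEIN)]:
    per_class, acc = one_vs_rest(counts, SUBTYPES)
    print(name, round(acc, 2), {c: round(m["ppv"], 2) for c, m in per_class.items()})
```

The four ATTR cases called AL-L in the four-protein model lower AL-L specificity (74/78) and PPV (25/29) and ATTR NPV (35/39), matching the reported 0.95, 0.86, and 0.90.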