Manuel Galli¹, Italo Zoppis², Gabriele De Sio¹, Clizia Chinello¹, Fabio Pagni³, Fulvio Magni¹, Giancarlo Mauri²
Abstract
Biomarkers able to characterise and predict multifactorial diseases are still among the most important targets of "omics" investigations. In this context, Matrix-Assisted Laser Desorption/Ionisation Mass Spectrometry Imaging (MALDI-MSI) has gained considerable attention in recent years, but it also produces large amounts of complex data that must be processed and interpreted. Computational and machine learning procedures for biomarker discovery are therefore important tools, both to reduce data dimensionality and to provide predictive markers for specific diseases. For instance, protein and genetic markers supporting thyroid lesion diagnosis would have a deep impact on society, given the high frequency of indeterminate reports (THY3), which are generally managed as malignant. In this paper we show how an accurate classification of thyroid bioptic specimens can be obtained by applying a state-of-the-art machine learning approach (Support Vector Machines, SVM) to MALDI-MSI data, together with a wrapper feature selection algorithm (recursive feature elimination, RFE). The model achieves accurate discrimination using only 20 of the 144 features, which improves model performance, reliability, and computational efficiency. Finally, tissue areas rather than average proteomic profiles are classified, highlighting potentially discriminating areas of clinical interest.
Year: 2016 PMID: 27293431 PMCID: PMC4886047 DOI: 10.1155/2016/3791214
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
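The abstract describes wrapping an SVM in recursive feature elimination (RFE) to reduce 144 features to 20. The loop below is a minimal pure-Python sketch of the common SVM-RFE scheme, assuming a linear SVM whose squared weights rank the features: train, drop the lowest-ranked feature, retrain, and stop at `n_keep` features. It is not the authors' pipeline: they report a radial-kernel SVM tuned in R, and the source does not state their ranking criterion. The Pegasos-style trainer, the function names, and the toy data are all illustrative assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent for a linear SVM (no bias term).

    X: list of feature vectors; y: labels in {-1, +1}. Returns the weight vector.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # L2 shrinkage coming from the regulariser...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ...plus a hinge-loss step when the sample violates the margin
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep):
    """Recursive feature elimination: repeatedly retrain and drop the feature
    with the smallest squared SVM weight until n_keep features remain.
    Returns the original column indices of the surviving features."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[worst]
    return remaining


# Hypothetical toy data: 6 features, only columns 0 and 3 carry class signal.
_rng = random.Random(42)
X_toy, y_toy = [], []
for i in range(60):
    label = 1 if i % 2 == 0 else -1
    row = [_rng.gauss(0.0, 0.5) for _ in range(6)]
    row[0] = 2.0 * label + _rng.gauss(0.0, 0.2)
    row[3] = 1.6 * label + _rng.gauss(0.0, 0.2)
    X_toy.append(row)
    y_toy.append(label)

print(svm_rfe(X_toy, y_toy, n_keep=2))
```

Because each elimination round retrains on the surviving features, RFE is a wrapper method: the ranking adapts as correlated or redundant features are removed, at the cost of one model fit per eliminated feature.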
Patients enrolled in the study, with their cytological and histological diagnoses.
| Patient number | Cytological diagnosis | Histological diagnosis |
|---|---|---|
| Patient #1 | THY2 | Ben |
| Patient #2 | THY3 | Ben |
| Patient #3 | THY4 | PTC |
| Patient #4 | THY5 | PTC |
| Patient #5 | THY2 | Ben |
| Patient #6 | THY5 | PTC |
| Patient #7 | THY2 | Ben |
| Patient #8 | THY5 | PTC |
| Patient #9 | THY3 | PTC |
| Patient #10 | THY4 | PTC |
| Patient #11 | THY2 | Ben |
| Patient #12 | THY4 | PTC |
| Patient #13 | THY3 | Ben |
| Patient #14 | THY3 | PTC |
| Patient #15 | THY4 | PTC |
| Patient #16 | THY2 | Ben |
| Patient #17 | THY2 | Ben |
| Patient #18 | THY3 | PTC |
| Patient #19 | THY2 | Ben |
| Patient #20 | THY3 | Ben |
| Patient #21 | THY4 | PTC |
| Patient #22 | THY3 | Ben |
| Patient #23 | THY5 | PTC |
| Patient #24 | THY2 | Ben |
| Patient #25 | THY4 | PTC |
| Patient #26 | THY4 | PTC |
| Patient #27 | THY2 | Ben |
| Patient #28 | THY2 | Ben |
| Patient #29 | THY2 | Ben |
| Patient #30 | THY5 | PTC |
| Patient #31 | THY5 | PTC |
| Patient #32 | THY2 | Ben |
| Patient #33 | THY5 | PTC |
| Patient #34 | THY3 | Ben |
| Patient #35 | THY2 | Ben |
| Patient #36 | THY3 | Ben |
| Patient #37 | THY3 | Ben |
| Patient #38 | THY5 | PTC |
| Patient #39 | THY5 | PTC |
| Patient #40 | THY5 | PTC |
| Patient #41 | THY3 | Ben |
| Patient #42 | THY4 | PTC |
| Patient #43 | THY2 | Ben |
Ben: benign lesions; PTC: papillary thyroid carcinoma.
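Cross-tabulating the table makes the cohort structure explicit: every THY2 case is benign and every THY4 or THY5 case is PTC on histology, whereas the THY3 cases are mixed, consistent with THY3 being the indeterminate category described in the abstract. The snippet below transcribes the table row by row (the variable names are illustrative) and counts the pairs with `collections.Counter`.

```python
from collections import Counter

# (cytological category, histological diagnosis) for patients #1 to #43,
# transcribed row by row from the table above.
PATIENTS = [
    ("THY2", "Ben"), ("THY3", "Ben"), ("THY4", "PTC"), ("THY5", "PTC"), ("THY2", "Ben"),
    ("THY5", "PTC"), ("THY2", "Ben"), ("THY5", "PTC"), ("THY3", "PTC"), ("THY4", "PTC"),
    ("THY2", "Ben"), ("THY4", "PTC"), ("THY3", "Ben"), ("THY3", "PTC"), ("THY4", "PTC"),
    ("THY2", "Ben"), ("THY2", "Ben"), ("THY3", "PTC"), ("THY2", "Ben"), ("THY3", "Ben"),
    ("THY4", "PTC"), ("THY3", "Ben"), ("THY5", "PTC"), ("THY2", "Ben"), ("THY4", "PTC"),
    ("THY4", "PTC"), ("THY2", "Ben"), ("THY2", "Ben"), ("THY2", "Ben"), ("THY5", "PTC"),
    ("THY5", "PTC"), ("THY2", "Ben"), ("THY5", "PTC"), ("THY3", "Ben"), ("THY2", "Ben"),
    ("THY3", "Ben"), ("THY3", "Ben"), ("THY5", "PTC"), ("THY5", "PTC"), ("THY5", "PTC"),
    ("THY3", "Ben"), ("THY4", "PTC"), ("THY2", "Ben"),
]

crosstab = Counter(PATIENTS)  # missing pairs count as 0
for cyto in ("THY2", "THY3", "THY4", "THY5"):
    print(cyto, "Ben:", crosstab[(cyto, "Ben")], "PTC:", crosstab[(cyto, "PTC")])
```

Running it prints 14/0 for THY2, 8/3 for THY3, 0/8 for THY4 and 0/10 for THY5 (benign/PTC), i.e. 22 benign and 21 PTC patients in total.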
Figure 1: MALDI-MSI data cube. The intensity of a given analyte is localised as follows: the x- and y-axes represent the spatial coordinates of the 2D digitised tissue image (a mouse brain is shown in this example); the z-axis represents the mass-to-charge ratio (m/z) of the acquired spectra. For each m/z value in the spectrum, a 2D molecular image is computed by colouring each pixel according to the relative abundance (the intensity at that m/z value) of the selected compound across the tissue section.
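The data cube in Figure 1 can be held as a grid of spectra that share one m/z axis; a molecular (ion) image is then a 2D slice of the cube at the m/z of interest. The sketch below assumes a nested-list layout `cube[y][x]`, sums the intensities within a ±`tol` window around the target m/z, and optionally divides by the maximum to obtain relative abundance. The layout, the window, and the summing rule are illustrative choices, not those of any particular acquisition software.

```python
from typing import List

Spectrum = List[float]          # intensities aligned with a shared m/z axis
Cube = List[List[Spectrum]]     # cube[y][x] -> spectrum at pixel (x, y)


def ion_image(cube: Cube, mz_axis: List[float], target_mz: float,
              tol: float, relative: bool = False) -> List[List[float]]:
    """2D map of the summed intensity within target_mz +/- tol at every pixel."""
    window = [k for k, mz in enumerate(mz_axis) if abs(mz - target_mz) <= tol]
    image = [[float(sum(spec[k] for k in window)) for spec in row] for row in cube]
    if relative:
        # Scale to [0, 1] so colours reflect relative abundance across the section
        peak = max(max(row) for row in image)
        if peak > 0:
            image = [[v / peak for v in row] for row in image]
    return image


# Usage on a hypothetical 2 x 2 pixel cube with three m/z channels.
mz_axis = [1000.0, 1000.5, 2000.0]
cube = [
    [[1.0, 2.0, 0.0], [0.0, 0.0, 5.0]],
    [[3.0, 1.0, 1.0], [0.0, 4.0, 0.0]],
]
print(ion_image(cube, mz_axis, target_mz=1000.2, tol=0.5))
print(ion_image(cube, mz_axis, target_mz=1000.2, tol=0.5, relative=True))
```

With the window covering the first two channels, the absolute image is `[[3.0, 0.0], [4.0, 4.0]]` and the relative one is `[[0.75, 0.0], [1.0, 1.0]]`.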
Computational time of the classification process with and without feature selection. The same tuning parameter grid was used in both cases.
| Step | Feature selection | No feature selection |
|---|---|---|
| RFE | 75.656 | n/a |
| SVM tuning and test | 32.392 | 117.524 |
Times are displayed in seconds and calculated by the R function system.time().
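A quick check on the reported times (plain arithmetic on the table): with feature selection, the end-to-end cost is the RFE step plus the reduced tuning step, so the overall saving is modest even though tuning alone is much faster.

$$
\begin{aligned}
t_{\text{with FS}} &= 75.656 + 32.392 = 108.048\ \text{s}\\
\frac{t_{\text{tuning, no FS}}}{t_{\text{tuning, FS}}} &= \frac{117.524}{32.392} \approx 3.63\\
\frac{117.524 - 108.048}{117.524} &= \frac{9.476}{117.524} \approx 0.081
\end{aligned}
$$

That is, SVM tuning and testing run about 3.6 times faster on the selected features, while the complete run, including RFE, is about 8% faster.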
Validation performance of the SVM classifier without feature selection.
| Validation | Accuracy | Sensitivity | Specificity | PPV | NPV | ROC |
|---|---|---|---|---|---|---|
| EV | 0.273 | 0.000 | 1.000 | 0.000 | 0.273 | 0.500 |
| 2x 10-fold CV | 0.567 | 0.000 | 1.000 | 0.000 | 0.567 | 0.500 |
Here, the metrics measure the algorithm's ability to correctly identify benign lesions (the positive class) among cases with a THY3 cytological report.
EV: external validation; CV: cross-validation; PPV: positive predictive value; NPV: negative predictive value; ROC: area under the receiver operating characteristic curve.
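The metrics follow the usual confusion-matrix definitions. Assuming, as the footnote implies, that benign is the positive class, and that the external validation (EV) set consists of the 11 THY3 patients listed in the predicted-versus-true table (8 benign, 3 PTC), the EV row above is exactly what a model labelling every patient as PTC would produce. PPV is then 0/0 and appears to be reported as 0.000.

$$
\text{Sens} = \frac{TP}{TP+FN},\quad
\text{Spec} = \frac{TN}{TN+FP},\quad
\text{PPV} = \frac{TP}{TP+FP},\quad
\text{NPV} = \frac{TN}{TN+FN},\quad
\text{Acc} = \frac{TP+TN}{TP+TN+FP+FN}
$$

$$
TP = FP = 0,\ TN = 3,\ FN = 8:\quad
\text{Sens} = \tfrac{0}{8} = 0,\quad
\text{Spec} = \tfrac{3}{3} = 1,\quad
\text{Acc} = \text{NPV} = \tfrac{3}{11} \approx 0.273
$$

The 2x 10-fold CV row shows the same pattern (sensitivity 0, specificity 1, accuracy = NPV = 0.567).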
Validation performance of the SVM classifier after RFE feature selection.
| Validation | Accuracy | Sensitivity | Specificity | PPV | NPV | ROC |
|---|---|---|---|---|---|---|
| EV | 0.818 | 0.750 | 1.000 | 1.000 | 0.600 | 0.875 |
| 2x 10-fold CV | 0.713 | 0.625 | 0.775 | 0.740 | 0.767 | 0.778 |
Here, the metrics measure the algorithm's ability to correctly identify benign lesions (the positive class) among cases with a THY3 cytological report.
EV: external validation; CV: cross-validation; PPV: positive predictive value; NPV: negative predictive value; ROC: area under the receiver operating characteristic curve.
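For a classifier evaluated at a single operating point, the ROC "curve" is the polyline (0, 0) → (1 − Spec, Sens) → (1, 1), and its area reduces to the mean of sensitivity and specificity. The EV values in both tables match this formula; the 2x 10-fold CV value after RFE does not, which suggests (the source does not say) that it was computed from continuous classifier scores rather than from hard labels.

$$
\text{AUC} = \frac{(1-\text{Spec})\,\text{Sens}}{2} + \frac{\text{Spec}\,(\text{Sens}+1)}{2} = \frac{\text{Sens}+\text{Spec}}{2}
$$

$$
\text{EV, RFE: } \tfrac{0.750 + 1.000}{2} = 0.875,\qquad
\text{EV, no FS: } \tfrac{0.000 + 1.000}{2} = 0.500,\qquad
\text{CV, RFE: } \tfrac{0.625 + 0.775}{2} = 0.700 \neq 0.778
$$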
Predicted class versus actual (histological) diagnosis for the THY3 patients.
| Sample | Predicted class | True class |
|---|---|---|
| Patient #2 | Ben | Ben |
| Patient #9 | PTC | PTC |
| Patient #13 | Ben | Ben |
| Patient #14 | PTC | PTC |
| Patient #18 | PTC | PTC |
| Patient #20 | PTC | Ben |
| Patient #22 | Ben | Ben |
| Patient #34 | Ben | Ben |
| Patient #36 | Ben | Ben |
| Patient #37 | Ben | Ben |
| Patient #41 | PTC | Ben |
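The EV row of the RFE table can be recomputed directly from the predicted and true classes listed above. The sketch assumes benign is the positive class (as the table footnotes imply) and reports an undefined ratio such as 0/0 as 0.0, which appears to be the convention behind the PPV of 0.000 in the table without feature selection. `EV_RESULTS` and `binary_metrics` are illustrative names.

```python
# (patient, predicted class, true class), transcribed from the table above
EV_RESULTS = [
    ("#2", "Ben", "Ben"), ("#9", "PTC", "PTC"), ("#13", "Ben", "Ben"),
    ("#14", "PTC", "PTC"), ("#18", "PTC", "PTC"), ("#20", "PTC", "Ben"),
    ("#22", "Ben", "Ben"), ("#34", "Ben", "Ben"), ("#36", "Ben", "Ben"),
    ("#37", "Ben", "Ben"), ("#41", "PTC", "Ben"),
]


def binary_metrics(results, positive="Ben"):
    """Confusion-matrix metrics for (id, predicted, true) triples."""
    tp = sum(p == positive and t == positive for _, p, t in results)
    fp = sum(p == positive and t != positive for _, p, t in results)
    tn = sum(p != positive and t != positive for _, p, t in results)
    fn = sum(p != positive and t == positive for _, p, t in results)

    def ratio(num, den):
        return num / den if den else 0.0  # undefined (0/0) reported as 0.0

    return {
        "accuracy": ratio(tp + tn, len(results)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


print(binary_metrics(EV_RESULTS))
# A model that predicts PTC for every patient, for comparison:
print(binary_metrics([(pid, "PTC", true) for pid, _, true in EV_RESULTS]))
```

The first call gives accuracy 9/11 ≈ 0.818, sensitivity 0.75, specificity 1.0, PPV 1.0 and NPV 0.6, matching the EV row after RFE; the all-PTC call reproduces the EV row without feature selection.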
Figure 2: Graphical evaluation of the patient classification performed by the model. The green area is proportional to the number of correctly classified patients, while the blue area corresponds to the number of misclassifications.
Figure 3: Receiver Operating Characteristic (ROC) curve computed from the sensitivity (true positive rate) and specificity (true negative rate) obtained when employing the selected features.
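A ROC curve such as the one in Figure 3 is usually traced by sweeping a decision threshold over classifier scores and recording the false and true positive rates at each step; the area under it is then a trapezoidal sum. The source does not say which scores or thresholds were used, so the sketch below is generic: `roc_points`, `auc`, and the example scores are hypothetical, with label 1 marking the positive (benign) class.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR), sweeping the threshold from the highest score down.

    labels: 1 = positive, 0 = negative; higher scores mean "more likely positive".
    Tied scores are processed together so they move the curve in one step.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical decision scores for four patients (1 = benign, 0 = PTC).
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

Grouping tied scores matters: splitting a tie between a positive and a negative would otherwise make the curve, and its area, depend on the input order.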
Tuning parameters of the Support Vector Machines with and without feature selection. The best parameters were chosen according to the classification performance of the model.
| Feature selection | Kernel | Cost | Epsilon | Gamma |
|---|---|---|---|---|
| RFE | Radial | 10 | 0.1 | 0.11 |
| No RFE | Radial | 10 | 0.1 | 1.11 |
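Both tuned models use a radial kernel, which scores the similarity of two spectra x and z as exp(−γ‖x − z‖²) in the standard LIBSVM-style parameterisation (assumed here; the source does not name its SVM library). A larger γ makes similarity decay faster with distance. The sketch below, with a hypothetical `rbf_kernel` helper, compares the two tuned γ values at the same distance. Note that the two models were tuned on feature spaces of different size (20 versus 144 features), where typical squared distances differ, so the γ values are not directly comparable.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Two hypothetical feature vectors at squared distance 2.
x, z = [0.0, 0.0], [1.0, 1.0]
for gamma in (0.11, 1.11):  # tuned values reported with and without RFE
    print(gamma, round(rbf_kernel(x, z, gamma), 3))
```

At this distance the similarity is roughly 0.80 with γ = 0.11 and roughly 0.11 with γ = 1.11, while any vector always has similarity 1 with itself.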
Figure 4: Pixel-by-pixel classification of an entire thyroid cytological smear. A mass spectrum was acquired for each pixel, and each spectrum was classified individually. Green pixels correspond to spectra classified as benign (HP: hyperplastic), while red pixels correspond to spectra classified as malignant (PTC: papillary thyroid carcinoma).
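Pixel-by-pixel classification amounts to running the trained classifier on every spectrum independently and mapping each label to a colour. In the sketch below, `toy_classifier` is a hypothetical stand-in for the trained SVM (a threshold on one feature), pixels are keyed by (x, y) coordinates, and the PTC fraction is an illustrative summary rather than a statistic reported in the source.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int]
COLOURS = {"Ben": "green", "PTC": "red"}  # colour convention of Figure 4


def classify_pixels(spectra: Dict[Pixel, List[float]],
                    classify: Callable[[List[float]], str]) -> Dict[Pixel, str]:
    """Apply the classifier to the spectrum of each pixel independently."""
    return {xy: classify(features) for xy, features in spectra.items()}


def colour_map(labels: Dict[Pixel, str]) -> Dict[Pixel, str]:
    """Translate per-pixel class labels into display colours."""
    return {xy: COLOURS[label] for xy, label in labels.items()}


def ptc_fraction(labels: Dict[Pixel, str]) -> float:
    """Share of pixels classified as PTC."""
    return sum(label == "PTC" for label in labels.values()) / len(labels)


def toy_classifier(features: List[float]) -> str:
    """Hypothetical stand-in for the trained SVM: thresholds the first feature."""
    return "PTC" if features[0] > 0.5 else "Ben"


# Hypothetical 2 x 2 smear with two selected features per pixel.
spectra = {(0, 0): [0.9, 0.1], (1, 0): [0.2, 0.3], (0, 1): [0.7, 0.0], (1, 1): [0.1, 0.9]}
labels = classify_pixels(spectra, toy_classifier)
print(colour_map(labels))
print(ptc_fraction(labels))
```

Keeping the per-pixel labels, rather than averaging spectra first, is what lets the map highlight localised discriminating areas within a single smear.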