Hussain Montazery Kordy, Mohammad Hossein Miran Baygi, Mohammad Hassan Moradi.
Abstract
Pathological changes within an organ can be reflected as proteomic patterns in biological fluids such as plasma, serum, and urine. Surface-enhanced laser desorption and ionization time-of-flight mass spectrometry (SELDI-TOF MS) has been used to generate proteomic profiles from biological fluids. Mass spectrometry yields redundant, noisy data in which most data points are irrelevant features for differentiating between cancer and normal cases. In this paper, we propose a hybrid feature subset selection algorithm based on maximum discrimination and minimum correlation (MDMC) coupled with peak scoring criteria. The algorithm was applied to two independent SELDI-TOF MS datasets of ovarian cancer obtained from the NCI-FDA clinical proteomics databank, and was used to extract a set of proteins as potential biomarkers in each dataset. We applied linear discriminant analysis (LDA) to identify the important biomarkers. The selected biomarkers distinguished ovarian cancer patients from the noncancer control group with an accuracy of 100%, a sensitivity of 100%, and a specificity of 100% in both datasets. The hybrid algorithm has the advantage of increasing the reproducibility of the selected biomarkers and of finding a small set of proteins with high discrimination power.
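The idea of selecting features that discriminate well between classes while being weakly correlated with one another can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the scoring functions, so the Fisher-like discrimination score, the Pearson-correlation penalty, the greedy search, and the names `fisher_score` and `select_max_disc_min_corr` are all assumptions made for illustration, and the peak scoring step is omitted.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature two-class score: squared mean difference over summed variances.
    (Assumed discrimination measure; the paper's own criterion may differ.)"""
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + 1e-12  # guard against division by zero
    return num / den

def select_max_disc_min_corr(X, y, k):
    """Greedily pick k features with a high score and low correlation
    to the features already chosen (hypothetical helper)."""
    scores = fisher_score(X, y)
    # Absolute Pearson correlation between every pair of features (columns).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(scores))]
    while len(selected) < k:
        # Penalize each candidate by its strongest correlation with the selected set.
        max_corr = corr[:, selected].max(axis=1)
        merit = scores * (1.0 - max_corr)
        merit[selected] = -np.inf  # never pick the same feature twice
        selected.append(int(np.argmax(merit)))
    return selected

# Synthetic example: features 0 and 1 are near-duplicates, feature 2 is
# independently informative, features 3-5 are pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 6))
X[:, 0] += 3 * y
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)
X[:, 2] += 2 * y
print(select_max_disc_min_corr(X, y, k=2))
```

In this toy setting the second pick skips the redundant duplicate in favor of the independently informative feature, which is the behavior a minimum-correlation criterion is meant to produce. Real spectra would need preprocessing first, and constant-intensity columns would have to be removed before computing correlations.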
Keywords: Biomarker; classification; correlation-based weight function; feature subset selection; peak scoring; proteomics
Year: 2012 PMID: 23717808 PMCID: PMC3660712
Source DB: PubMed Journal: J Med Signals Sens ISSN: 2228-7477
Figure 1. A typical mass spectrum from normal and cancer groups: (a and b) dataset I and (c and d) dataset II
Distribution of data
Figure 2. A processed mass spectra signal: (a) original signal; (b) approximation coefficients; (c) detail coefficients; (d) estimated baseline; (e) estimated noise and (f) preprocessed signal
Figure 3. The computed sum of distances function (SDF) for dataset II (top): certain regions of SDF (a-c) are enlarged to show distinguishable differences between intensities of normal cases (solid line) and ovarian cancer patients (dashed line) in the mean spectrum
Figure 4. The percentage of recognition rates using 30 high ranked features by the LDA classifier: (a) accuracy in dataset I and (b) accuracy in dataset II
Figure 5. A histogram view of selected masses using the MDMC method: (a) histogram of selected features in dataset I and (b) histogram of selected features in dataset II
Performance results
Comparison results
Figure 6. A comparison of correlation between selected biomarkers by the MDMC algorithm and results of reported biomarkers by other workers: (a) dataset I and (b) dataset II