Ankita Thakur, Vijay Mishra, Sunil K Jain.
Abstract
Pathological changes in an organ or tissue may be reflected in proteomic patterns in serum. The early detection of cancer is crucial for successful treatment. Some cancers affect the concentration of certain molecules in the blood, which allows early diagnosis by analyzing the blood mass spectrum. Distinctive serum proteomic patterns could therefore be used to differentiate cancer samples from non-cancer ones. Several techniques have been developed for analyzing the mass-spectrum curve and have been used for the detection of prostate, ovarian, breast, bladder, pancreatic, kidney, liver, and colon cancers. In the present study, we applied data mining to the diagnosis of ovarian cancer: we identified the most informative points of the mass-spectrum curve, then used Student's t-test and neural networks to determine the differences between the curves of cancer patients and healthy people. Two serum SELDI MS data sets were used to identify serum proteomic patterns that distinguish the serum of ovarian cancer cases from that of non-cancer controls. Statistical testing and genetic algorithm-based methods were used for feature selection. The results showed that (1) data mining techniques can be successfully applied to ovarian cancer detection with reasonably high performance; (2) the discriminatory features (proteomic patterns) can differ considerably from one selection method to another.
Keywords: Neural networks; Ovarian cancer; SELDI; Serum proteomics
Year: 2011 PMID: 21886899 PMCID: PMC3163368 DOI: 10.3797/scipharm.1105-11
Source DB: PubMed Journal: Sci Pharm ISSN: 0036-8709
Sch. 1. Algorithmic approach used in the present study
Fig. 1. a: This study uses the high-resolution ovarian cancer data set generated with the WCX2 protein array. The sample set comprises 121 cancer samples and 95 normal samples. Spectra from both groups are plotted together for visual comparison: 5 spectrograms from ovarian cancer patients (blue) and 5 from control patients (green).
b: Zooming in on the region from 8450 to 8700 m/z shows some peaks that might be useful for classifying the data.
Fig. 2. Plot of the group average and the envelopes of each group. No single feature appears to discriminate the two groups perfectly.
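The group averages and envelopes in Fig. 2 can be computed point by point along the m/z axis. The following is a minimal sketch, assuming that each spectrum is a list of intensities on a shared m/z grid and that "envelope" means the per-point minimum and maximum within a group; the function names and the toy intensities are hypothetical and are not taken from the study's data.

```python
def group_profile(spectra):
    """Per-m/z average, lower envelope (min) and upper envelope (max)."""
    columns = list(zip(*spectra))  # one tuple of intensities per m/z point
    avg = [sum(c) / len(c) for c in columns]
    lower = [min(c) for c in columns]
    upper = [max(c) for c in columns]
    return avg, lower, upper


def envelopes_overlap(group_a, group_b):
    """For each m/z point, True if the [min, max] ranges of the groups overlap."""
    _, lo_a, hi_a = group_profile(group_a)
    _, lo_b, hi_b = group_profile(group_b)
    return [la <= hb and lb <= ha
            for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b)]


# Toy intensities (illustrative only): 2 spectra per group, 2 m/z points.
cancer = [[1.0, 4.0], [2.0, 5.0]]
control = [[1.5, 4.5], [2.5, 6.0]]

cancer_avg, cancer_lo, cancer_hi = group_profile(cancer)
overlap = envelopes_overlap(cancer, control)
```

An m/z point where the envelopes overlap cannot separate the two groups by a single threshold; if every point overlaps, as in this toy example, no single feature discriminates the groups perfectly.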
Fig. 3. Plot of ranked features. Note that there are significant regions at high m/z values but low intensity (∼8150 Da). Features are ranked using measures of class separability, such as Student's t-test.
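Ranking features by a t-test, as in Fig. 3, amounts to computing a two-sample t-statistic at every m/z point and sorting the points by its absolute value. The sketch below is one possible illustration, not the study's implementation: it assumes the unequal-variance (Welch) form of the statistic, and the function names and toy intensities are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t-statistic without assuming equal variances (Welch's form)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def rank_features(cancer, control):
    """Return (feature indices sorted by decreasing |t|, list of |t| scores)."""
    n_features = len(cancer[0])
    scores = []
    for j in range(n_features):
        a = [spectrum[j] for spectrum in cancer]
        b = [spectrum[j] for spectrum in control]
        scores.append(abs(welch_t(a, b)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy intensities (illustrative only): 4 spectra per group, 3 m/z points.
cancer = [[1.0, 5.0, 2.0], [1.2, 5.5, 2.1], [0.9, 4.8, 1.9], [1.1, 5.2, 2.2]]
control = [[1.1, 2.0, 2.3], [0.9, 2.4, 2.5], [1.0, 1.8, 2.2], [1.2, 2.1, 2.4]]

order, scores = rank_features(cancer, control)
```

In this toy example the second m/z point (index 1) has the largest group difference relative to its spread and is ranked first, while the first point (index 0) has identical group means and is ranked last. The top-ranked points would then serve as inputs to a classifier.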
Overall performance of the proposed method
| Active Ovarian Cancer | 100 | 11 | 100 | 99.16 |
| Controlled Ovarian Cancer | 80 | 95 | 100 | 98.50 |
The comparison with the LDA classifier method
| LDA | 85% | 71% |
| Feed Forward ANN | 98% | 96% |
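For readers unfamiliar with the feed-forward ANN named in the comparison above, the following is a minimal sketch of such a classifier, with one hidden layer trained by full-batch gradient descent. The architecture, learning rate, number of epochs, and the synthetic two-feature data are all assumptions made for illustration; this excerpt does not specify the network or training procedure used in the study, and the accuracy obtained here bears no relation to the reported figures.

```python
import math
import random


def forward(model, x):
    """Return hidden activations and the output probability for one sample."""
    W1, b1, W2, b2 = model
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    z = sum(w * hk for w, hk in zip(W2, h)) + b2
    p = 0.5 * (1.0 + math.tanh(0.5 * z))  # logistic sigmoid, overflow-safe form
    return h, p


def train(X, y, hidden=3, lr=0.5, epochs=300, seed=0):
    """Fit a one-hidden-layer network by gradient descent on cross-entropy."""
    rng = random.Random(seed)
    d, n = len(X[0]), len(X)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        gW1 = [[0.0] * d for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        for x, t in zip(X, y):
            h, p = forward((W1, b1, W2, b2), x)
            err = p - t  # gradient of the loss w.r.t. the output pre-activation
            gb2 += err
            for k in range(hidden):
                gW2[k] += err * h[k]
                back = err * W2[k] * (1.0 - h[k] ** 2)  # through tanh
                gb1[k] += back
                for i in range(d):
                    gW1[k][i] += back * x[i]
        b2 -= lr * gb2 / n
        for k in range(hidden):
            W2[k] -= lr * gW2[k] / n
            b1[k] -= lr * gb1[k] / n
            for i in range(d):
                W1[k][i] -= lr * gW1[k][i] / n
    return W1, b1, W2, b2


# Synthetic, well-separated data (illustrative only): label 1 = cancer, 0 = control.
data_rng = random.Random(1)
X = [[data_rng.gauss(1.5, 0.5), data_rng.gauss(1.5, 0.5)] for _ in range(20)]
X += [[data_rng.gauss(-1.5, 0.5), data_rng.gauss(-1.5, 0.5)] for _ in range(20)]
y = [1] * 20 + [0] * 20

model = train(X, y)
pred = [1 if forward(model, x)[1] >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
```

The accuracy here is measured on the training data only; a real evaluation would hold out test samples or use cross-validation, and in practice the inputs would be the intensities at the selected m/z points rather than synthetic values.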