Qiang Lin, Qianqian Peng, Feng Yao, Xu-Feng Pan, Li-Wen Xiong, Yi Wang, Jun-Feng Geng, Jiu-Xian Feng, Bao-Hui Han, Guo-Liang Bao, Yu Yang, Xiaotian Wang, Li Jin, Wensheng Guo, Jiu-Cun Wang.
Abstract
PURPOSE: Lung cancer is the leading cause of cancer death worldwide, but techniques for effective early diagnosis are still lacking. Proteomics technology has been applied extensively to the study of the proteins involved in carcinogenesis. In this paper, a classification method was developed based on the principal components of surface-enhanced laser desorption/ionization (SELDI) spectral data. The method was applied to SELDI spectral data from 71 lung adenocarcinoma patients and 24 healthy individuals. Unlike peak-selection-based methods, it treats each spectrum as a whole (a "unity"). The aim of this paper was to demonstrate that this unity-based classification method is more robust and powerful for diagnosis than peak-selection-based methods.
Year: 2012 PMID: 22461913 PMCID: PMC3312904 DOI: 10.1371/journal.pone.0034457
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Features of lung adenocarcinoma patients.
| Pathological parameters | Tumors | Sex: Male | Sex: Female | Age (y) |
| | | | | |
| | 5 (7.04%) | 5 | 0 | 38–64 |
| | 46 (64.79%) | 25 | 21 | 41–72 |
| | 10 (14.08%) | 4 | 6 | 40–75 |
| | 10 (14.08%) | 3 | 7 | 37–75 |
| | | | | |
| | 25 (35.21%) | 12 | 13 | 38–75 |
| | 25 (35.21%) | 14 | 11 | 51–72 |
| | 21 (29.58%) | 11 | 10 | 37–69 |
| | | | | |
| | 70 (98.59%) | 36 | 34 | 37–75 |
| | 1 (1.41%) | 1 | 0 | 42 |
| | | | | |
| I | 17 (23.94%) | 11 | 6 | 38–68 |
| II | 26 (36.62%) | 13 | 13 | 49–75 |
| III | 18 (25.35%) | 10 | 8 | 40–72 |
| III | 9 (12.68%) | 2 | 7 | 37–75 |
| IV | 1 (1.41%) | 1 | 0 | 42 |
| | | | | |
| | 28 (39.44%) | 20 | 8 | 37–75 |
| | 32 (45.07%) | 13 | 19 | 38–75 |
| | 11 (15.49%) | 4 | 7 | 41–72 |
Tumors are given as the number of cases; the numbers in parentheses are percentages of the 71 patients.
Candidate principal components.
| PC | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 |
| Contribution | 18.1 | 10.5 | 7.69 | 4.90 | 4.41 | 3.43 | 3.16 |
| P value | <0.01* | 0.39 | 0.12 | 0.48 | 0.48 | 0.03* | 0.03* |
| Fitness index | 0.54 | 0.01 | 0.03 | 0.01 | 0.01 | 0.05 | 0.06 |
Contribution of each PC to the whole variation.
P value of the coefficient testing of logistic regression analysis on each PC.
Fitness index of each logistic regression model on single PC.
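As a rough illustration of the contribution figures above, the sketch below extracts a first principal component from a few toy spectra and reports its share of the total variance. It assumes that "contribution" means the component's eigenvalue divided by the total variance, which the table does not spell out. The function `first_pc`, the power-iteration approach, and the toy intensities are hypothetical and are not taken from the paper.

```python
import math

def first_pc(rows, iters=200):
    """Return (weights, scores, contribution) for the first principal component.

    rows: list of equal-length lists, one spectrum (intensity per M/Z point) each.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Sample covariance matrix (p x p) of the centered spectra.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]

    # Power iteration: repeated multiplication converges to the leading eigenvector.
    w = [1.0] * p
    for _ in range(iters):
        v = [sum(cov[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]

    # Rayleigh quotient gives the eigenvalue; the trace is the total variance.
    eigenvalue = sum(w[a] * sum(cov[a][b] * w[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))

    # One score per spectrum: its projection onto the component weights.
    scores = [sum(c[j] * w[j] for j in range(p)) for c in centered]
    return w, scores, eigenvalue / total_variance

# Toy data (hypothetical): 4 spectra measured at 3 M/Z points.
toy_spectra = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [4.0, 8.0, 0.5],
]
weights, scores, contribution = first_pc(toy_spectra)
print(f"PC1 contribution: {contribution:.1%}")
```

The per-spectrum scores are what a downstream classifier would use as predictors, so each spectrum enters the model as a whole rather than as a handful of selected peaks.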
Summary of classification models based on principal components of SELDI spectral data.
| Model | R square | Hosmer-Lemeshow statistic | Cross validation accuracy |
| pc1 | 0.5338 | 3.01 (0.93) | 92.63% |
| pc1 pc6 | 0.5591 | 1.32 (1.00) | 92.63% |
|
|
|
|
|
| pc1 pc6 pc7 | 0.6330 | 0.33 (1.00) | 95.79% |
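The cross-validation accuracy column can be illustrated with a leave-one-out loop: each sample is held out in turn, the classifier is rebuilt on the rest, and the held-out sample is predicted. This is only a sketch under stated assumptions. The record does not say which cross-validation scheme produced these figures, and a nearest-class-mean rule stands in here for the logistic regression on principal components. All names and the toy scores are hypothetical.

```python
def nearest_mean_predict(train_scores, train_labels, score):
    """Assign `score` to the class whose mean training score is closest."""
    means = {}
    for label in set(train_labels):
        vals = [s for s, l in zip(train_scores, train_labels) if l == label]
        means[label] = sum(vals) / len(vals)
    return min(means, key=lambda label: abs(score - means[label]))

def leave_one_out_accuracy(scores, labels):
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for i in range(len(scores)):
        train_scores = scores[:i] + scores[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if nearest_mean_predict(train_scores, train_labels, scores[i]) == labels[i]:
            correct += 1
    return correct / len(scores)

# Toy principal-component scores (hypothetical): 1 = case, 0 = control.
toy_scores = [2.1, 1.8, 2.5, 0.4, 1.9, -0.3, 0.1, 0.6, -0.5]
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]
loo_accuracy = leave_one_out_accuracy(toy_scores, toy_labels)
print(f"Leave-one-out accuracy: {loo_accuracy:.2%}")
```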
Optimal classification model based on principal components of SELDI spectral data.
| Parameter | Coef (STDErr) | OR (95%CL) | P value |
| | −5.49 (1.92) | | <0.01 |
| | 4.05 (1.40) | 57.22 (3.72, 881.07) | <0.01 |
| | −5.30 (2.27) | 0.005 (<0.001, 0.43) | 0.02 |
Coef, coefficient.
STDErr, standard error.
OR, odds ratio.
CL, confidence level.
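The OR column can be related to the Coef (STDErr) column through the usual Wald approximation: OR = exp(coef), with limits exp(coef ± 1.96 × STDErr). The sketch below assumes that this is how the intervals were obtained, which the table does not state, and the function name is hypothetical. Applied to the tabulated 4.05 (1.40), it gives values close to, but not identical with, the reported 57.22 (3.72, 881.07), a gap consistent with the inputs being rounded to two decimals.

```python
import math

def odds_ratio_with_limits(coef, std_err, z=1.96):
    """Return (OR, lower, upper) from a logistic coefficient and its standard error.

    Assumption: Wald-type limits, exp(coef +/- z * std_err).
    """
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )

# Coefficients and standard errors as printed in the table above.
for coef, std_err in [(4.05, 1.40), (-5.30, 2.27)]:
    odds_ratio, lower, upper = odds_ratio_with_limits(coef, std_err)
    print(f"coef={coef:+.2f}  OR={odds_ratio:.3f}  limits=({lower:.4f}, {upper:.2f})")
```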
Figure 1. M/Z means of cases and controls and the weights of PC1 and PC7 on the spectrum.
A) The M/Z means of cases (red) and normal controls (green) at each M/Z point. B) The weights of PC1 at each M/Z point. C) The weights of PC7 at each M/Z point. Horizontal lines in Figure 1B and 1C represent 3*SD of the corresponding PC on the spectrum. The data used here are the normalized SELDI data obtained from 71 lung adenocarcinoma patients and 24 normal individuals.
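The 3*SD lines in panels B and C suggest a simple way to read the weight plots: M/Z points whose weight lies beyond the line contribute most to that component. A minimal sketch, assuming the lines sit at ±3 population standard deviations of the weights, measured from zero (the caption does not say which SD or centre was used). `points_beyond_3sd` and the toy weights are hypothetical.

```python
import statistics

def points_beyond_3sd(weights):
    """Return indices of M/Z points whose |weight| exceeds 3 * SD of all weights."""
    sd = statistics.pstdev(weights)  # assumption: population SD of the weights
    return [i for i, w in enumerate(weights) if abs(w) > 3 * sd]

# Toy weights (hypothetical): small alternating values with one prominent point.
toy_weights = [0.01, -0.01] * 10
toy_weights[7] = 0.5
flagged = points_beyond_3sd(toy_weights)
print(flagged)
```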
Figure 2. Classification method based on principal components of SELDI spectral data and experimental data.
Two cases and two normal individuals were misclassified into the opposite groups. The black squares indicate cases, and white squares with "V" shapes in the middle represent normal individuals. The data used here are the normalized SELDI data obtained from 71 lung adenocarcinoma patients and 24 normal individuals.
Figure 3. Decision-tree-based classification model and experimental data.
Two peaks identified using a decision-tree-based classification model are shown, with 2 cases misclassified into the control group. The data used here are the peaks selected through baseline subtraction, normalization, peak detection, and peak alignment of SELDI data obtained from 71 lung adenocarcinoma patients and 24 normal individuals.
Cross-validation results of DT, SVM, LDA, CART, and our method.
| Cross-validation | DT | SVM | LDA | CART | Our method |
| | | | | | |
| Sensitivity | 91.55% | 95.77% | 88.73% | 90.14% | 97.18% |
| Specificity | 87.50% | 83.33% | 91.67% | 70.83% | 91.67% |
| Accuracy | 90.53% | 92.63% | 89.47% | 85.26% | 95.79% |
| | | | | | |
| Sensitivity | 91.55% | 94.37% | 90.14% | 92.96% | 97.18% |
| Specificity | 87.50% | 83.33% | 87.50% | 79.17% | 91.67% |
| Accuracy | 90.53% | 91.58% | 89.47% | 89.47% | 95.79% |
| | | | | | |
| Sensitivity | 94.37% | 94.37% | 85.92% | 90.14% | 97.18% |
| Specificity | 79.17% | 79.17% | 87.50% | 58.33% | 91.67% |
| Accuracy | 90.53% | 90.53% | 86.32% | 82.11% | 95.79% |
| | | | | | |
| Sensitivity | 94.37% | 94.37% | 77.46% | 94.37% | 95.77% |
| Specificity | 62.50% | 58.33% | 75.00% | 54.17% | 91.67% |
| Accuracy | 86.32% | 85.26% | 76.84% | 84.21% | 94.74% |
DT, decision-tree-based classification model; SVM, support vector machine; LDA, linear discriminant approach; CART, classification and regression tree.
Within each group, the first line is the true positive rate (sensitivity), the second line is the true negative rate (specificity), and the third line is the accuracy.
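The three rates follow directly from the counts of correctly and incorrectly classified individuals. The sketch below uses the counts implied by the Figure 2 caption (two of 71 cases and two of 24 normal individuals misclassified). These counts reproduce the 97.18%, 91.67%, and 95.79% reported for "Our method" in three of the four groups, although the record does not state that Figure 2 shows cross-validated predictions. The function name is hypothetical.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# 71 cases with 2 misclassified; 24 normal individuals with 2 misclassified.
sens, spec, acc = classification_rates(tp=69, fn=2, tn=22, fp=2)
print(f"{sens:.2%} {spec:.2%} {acc:.2%}")  # prints "97.18% 91.67% 95.79%"
```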