Dong L Tong, David J Boocock, Clare Coveney, Jaimy Saif, Susana G Gomez, Sergio Querol, Robert Rees, Graham R Ball.
Abstract
INTRODUCTION: Raw spectral data from matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) MS profiling usually contain complex information that does not readily provide biological insight into disease. Associating features identified in the raw data with a known peptide is extremely difficult. Preprocessing to remove sources of uncertainty in the data is normally required before any further analysis. This study proposes an alternative yet simple solution for preprocessing raw MALDI-TOF-MS data to identify candidate marker ions. Two in-house MALDI-TOF-MS data sets from two different sample sources (melanoma serum and cord blood plasma) are used in the study.
Year: 2011 PMID: 21929822 PMCID: PMC3224566 DOI: 10.1186/1559-0275-8-14
Source DB: PubMed Journal: Clin Proteomics ISSN: 1542-6416 Impact factor: 3.988
Examples of the experiments conducted using control samples with different settings applied in the MS instrument
| Sample group | Total samples | Deflection mass | Delay time | Calibration standard | Total | Intra-sample variation (in-between |
|---|---|---|---|---|---|---|
| Control | 15 | 650 Da | 9993 ns | Internal | 198592 | 95223 points ± 824 |
| Control | 21 | 650 Da | 9993 ns | Internal | 198592 | 95213 points ± 3 |
| Control | 10 | 450 Da | 9999 ns | Internal | 198584 | 95200 points ± 825 |
| Control | 16 | 450 Da | 9999 ns | External | 198584 | 95199 points ± 3 |
| Control | 10 | 450 Da | 10003 ns | External | 198602 | 95211 points ± 3 |
Figure 1. Schematic illustration of data preprocessing step.
Figure 2. Schematic illustration of ion identification analysis for MALDI-TOF MS protein profiling.
Summary of the GANN parameters
| Parameter | Setting |
|---|---|
| Population size | 300 |
| Chromosome size | 20 features |
| Chromosome Encoding | Real-number representation |
| Fitness Function | The total number of correctly labelled samples |
| Selection | Tournament, tournament size = 2 |
| ANN architecture | 20-2-2 |
| ANN size | 48 nodes including 4 bias nodes |
| ANN learning algorithm | Feedforward |
| ANN activation function | Tanh |
| Crossover operator | Single-point, Pc = 0.5 |
| Mutation operator | Pm = 0.1 |
| Elitism strategy | Retain N-1 chromosomes in the population, where N is the total number of chromosomes in the population |
| Evaluation size | 80000 |
| Whole cycle repeat | 5000 |
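The genetic operators listed in the table can be illustrated with a minimal sketch. It reads the crossover and mutation probabilities as 0.5 and 0.1, and it assumes that each gene is a real number in the range [0, n_features) and that mutation is applied gene by gene; neither detail is stated in the table. The function names are hypothetical and this is not the authors' implementation.

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Draw k distinct chromosomes at random and return the fittest one."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]

def single_point_crossover(a, b, pc=0.5, rng=random):
    """With probability pc, swap the tails of a and b after a random cut."""
    if len(a) > 1 and rng.random() < pc:
        cut = rng.randrange(1, len(a))  # cut point in 1 .. len(a) - 1
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(chrom, n_features, pm=0.1, rng=random):
    """Assumed per-gene mutation: redraw a gene with probability pm."""
    return [rng.uniform(0, n_features) if rng.random() < pm else g
            for g in chrom]
```

With tournament size 2, selection pressure stays low: a chromosome only has to beat one randomly drawn rival to be chosen as a parent.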
Summary of the stepwise ANN parameters
| Parameter | Setting |
|---|---|
| ANN architecture | I-2-1, where the number of input nodes I is incremented by 1 on each run |
| Search method | Stepwise |
| ANN learning algorithm | Backpropagation |
| ANN activation function | Tanh |
| Learning rate | 0.1 |
| Momentum rate | 0.5 |
| Maximum epochs | 3000 |
| Window epoch | 1000 |
| Threshold for error | 0.01 |
| Random sampling | 50 |
| Maximum repeats on stepwise | 10 |
| Maximum loops on the whole modelling process | 10 |
| Cross-validation | Monte-Carlo with the ratio of 60:20:20 |
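The Monte-Carlo cross-validation setting (60:20:20, repeated for 50 random samplings) can be sketched as a repeated shuffle-and-slice. The rounding rule below is an assumption, chosen because it reproduces the train/test/validation sizes reported in the data set summary (41/14/14 for the 69 non-blind melanoma samples and 67/22/22 for the 111 non-blind cord blood samples). The function name `mccv_split` is hypothetical.

```python
import random

def mccv_split(sample_ids, rng, train_frac=0.6):
    """Shuffle the samples and slice them into train/test/validation.

    Assumed rounding: the training set gets round(train_frac * n) samples
    and the remainder is halved between test and validation.
    """
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_train = round(train_frac * len(ids))
    n_test = (len(ids) - n_train) // 2
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    valid = ids[n_train + n_test:]
    return train, test, valid

# 50 independent random samplings of the 69 non-blind melanoma samples
rng = random.Random(0)
splits = [mccv_split(range(69), rng) for _ in range(50)]
```

Each resampling draws a fresh partition, so a given sample can land in the training set in one sub-model and in the validation set in another.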
Summary of the data sets and the classification results based on 50 random sampling
| Data set | Class | Sample type | Sample size | Total peaks | Train (MCCV) | Test (MCCV) | Validation (MCCV) | Blind data set |
|---|---|---|---|---|---|---|---|---|
| Melanoma | S2 v. S3 | Serum | 99 | 2560 | 41 | 14 | 14 | 30 |
| Classification (%) | | | | | 93.21 | 97.38 | 90.62 | 90.93 |
| Cord blood | High v. Low | Plasma | 158 | 2647 | 67 | 22 | 22 | 47 |
| Classification (%) | | | | | 96.45 | 96.73 | 91.18 | 92.34 |
List of the top-10 ranked ions for melanoma data set
| Rank | Ion ( | | | Ave. Train Error | Ave. Test Error | Ave. Valid. Error |
|---|---|---|---|---|---|---|
| 1 | 1531.6 | 1531.12 | 1532.08 | 0.099 | 0.097 | 0.110 |
| 2 | 2916.61 | 2916.12 | 2917.1 | 0.0656 | 0.054 | 0.074 |
| 3 | 2425.27 | 2424.79 | 2425.75 | 0.0605 | 0.049 | 0.070 |
| 4 | 1196.57 | 1196.1 | 1197.04 | 0.0485 | 0.041 | 0.065 |
| 5 | 2917.59 | 2917.14 | 2918.05 | 0.050 | 0.045 | 0.054 |
| 6 | 1940.05 | 1939.57 | 1940.51 | 0.047 | 0.048 | 0.060 |
| 7 | 2426.25 | 2425.78 | 2426.71 | 0.036 | 0.037 | 0.063 |
| 8 | 1995.99 | 1995.51 | 1996.47 | 0.047 | 0.044 | 0.065 |
| 9 | 2543.07 | 2542.58 | 2543.57 | 0.038 | 0.043 | 0.072 |
| 10 | 1197.56 | 1197.06 | 1198.05 | 0.050 | 0.049 | 0.076 |
ANN prediction based on 30 blinded samples in the melanoma data set
| Sample label | S2 (of 50 ANN sub-models) | S3 (of 50 ANN sub-models) | ANN output | Std err in 95% CI | ANN classification | Target output |
|---|---|---|---|---|---|---|
| Blind 1 | 50 | 0 | 0 | 0 | S2 | S2 |
| Blind 2 | 50 | 0 | 0 | 0 | S2 | S2 |
| Blind 3 | 50 | 0 | 0 | 0 | S2 | S2 |
| Blind 4 | 50 | 0 | 0 | 0 | S2 | S2 |
| Blind 5 | 0 | 50 | 1 | 0 | S3 | S2 |
| Blind 6 | 40 | 10 | 0.200 | 0.115 | S2 | S2 |
| Blind 7 | 50 | 0 | 0 | 0 | S2 | S2 |
| Blind 8 | 50 | 0 | 0 | 0 | S2 | S2 |
| Blind 9 | 46 | 4 | 0.080 | 0.078 | S2 | S2 |
| Blind 10 | 50 | 0 | 0 | 0 | S2 | S2 |
| Blind 11 | 42 | 8 | 0.160 | 0.106 | S2 | S2 |
| Blind 12 | 47 | 3 | 0.060 | 0.069 | S2 | S2 |
| Blind 13 | 50 | 0 | 0 | 0 | S2 | S2 |
| Blind 14 | 49 | 1 | 0.020 | 0.040 | S2 | S2 |
| Blind 15 | 0 | 50 | 1 | 0 | S3 | S2 |
| Blind 16 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 17 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 18 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 19 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 20 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 21 | 5 | 45 | 0.900 | 0.087 | S3 | S3 |
| Blind 22 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 23 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 24 | 5 | 45 | 0.900 | 0.087 | S3 | S3 |
| Blind 25 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 26 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 27 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 28 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 29 | 0 | 50 | 1 | 0 | S3 | S3 |
| Blind 30 | 0 | 50 | 1 | 0 | S3 | S3 |
S2 refers to stage 2 melanoma and S3 to stage 3 melanoma. ANN output is the average output over 50 random samplings. Std err in 95% CI is the standard error of the ANN output at the 95% confidence interval. ANN classification is the final outcome of the model. Target output is the original group to which the sample belongs.
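In the rows above, the ANN output equals the fraction of the 50 sub-models that voted for the second class (for example, 40 votes for S2 and 10 for S3 give 0.200). A minimal sketch of that aggregation follows. The 0.5 cut-off and the function name `aggregate_votes` are assumptions made for illustration, and the sketch does not attempt to reproduce the tabulated interval values.

```python
def aggregate_votes(votes_first, votes_second, labels=("S2", "S3"),
                    threshold=0.5):
    """Combine sub-model votes into one output and a class label.

    The output is the fraction of sub-models voting for the second class;
    the threshold of 0.5 is an assumed cut-off, not taken from the source.
    """
    total = votes_first + votes_second
    output = votes_second / total
    label = labels[1] if output >= threshold else labels[0]
    return output, label

print(aggregate_votes(40, 10))  # (0.2, 'S2')
print(aggregate_votes(5, 45))   # (0.9, 'S3')
```

Passing `labels=("High", "Low")` applies the same rule to the cord blood table.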
Figure 3. ROC for model performance in the melanoma data set.
List of the top-10 ranked ions for the cord blood data set
| Rank | Ion ( | | | Ave. Train Error | Ave. Test Error | Ave. Valid. Error |
|---|---|---|---|---|---|---|
| 1 | 2914.5 | 2913.96 | 2915.01 | 0.090 | 0.093 | 0.095 |
| 2 | 1062.6 | 1062.01 | 1063.28 | 0.033 | 0.030 | 0.042 |
| 3 | 3058.5 | 3058.04 | 3058.94 | 0.042 | 0.032 | 0.038 |
| 4 | 1424.9 | 1424.42 | 1425.42 | 0.030 | 0.029 | 0.047 |
| 5 | 3460.8 | 3460.32 | 3461.31 | 0.026 | 0.027 | 0.035 |
| 6 | 3061.5 | 3060.99 | 3061.92 | 0.027 | 0.025 | 0.045 |
| 7 | 2081.1 | 2080.59 | 2081.63 | 0.027 | 0.031 | 0.047 |
| 8 | 2369.3 | 2368.86 | 2369.75 | 0.023 | 0.025 | 0.050 |
| 9 | 1073.6 | 1073.19 | 1074 | 0.023 | 0.025 | 0.050 |
| 10 | 3062.4 | 3061.96 | 3062.93 | 0.019 | 0.022 | 0.032 |
ANN prediction based on 47 blinded samples in the cord blood data set
| Sample label | High (of 50 ANN sub-models) | Low (of 50 ANN sub-models) | ANN output | Std err in 95% CI | ANN classification | Target output |
|---|---|---|---|---|---|---|
| Blind 1 | 48 | 2 | 0.040 | 0.057 | High | High |
| Blind 2 | 50 | 0 | 0 | 0 | High | High |
| Blind 3 | 50 | 0 | 0 | 0 | High | High |
| Blind 4 | 50 | 0 | 0 | 0 | High | High |
| Blind 5 | 50 | 0 | 0 | 0 | High | High |
| Blind 6 | 50 | 0 | 0 | 0 | High | High |
| Blind 7 | 35 | 15 | 0.300 | 0.132 | High | High |
| Blind 8 | 50 | 0 | 0 | 0 | High | High |
| Blind 9 | 48 | 2 | 0.040 | 0.057 | High | High |
| Blind 10 | 41 | 9 | 0.180 | 0.111 | High | High |
| Blind 11 | 48 | 2 | 0.040 | 0.057 | High | High |
| Blind 12 | 50 | 0 | 0 | 0 | High | High |
| Blind 13 | 40 | 10 | 0.200 | 0.115 | High | High |
| Blind 14 | 50 | 0 | 0 | 0 | High | High |
| Blind 15 | 46 | 4 | 0.080 | 0.078 | High | High |
| Blind 16 | 49 | 1 | 0.020 | 0.040 | High | High |
| Blind 17 | 50 | 0 | 0 | 0 | High | High |
| Blind 18 | 50 | 0 | 0 | 0 | High | High |
| Blind 19 | 41 | 9 | 0.180 | 0.111 | High | High |
| Blind 20 | 28 | 22 | 0.440 | 0.143 | High | High |
| Blind 21 | 50 | 0 | 0 | 0 | High | High |
| Blind 22 | 11 | 39 | 0.780 | 0.120 | Low | Low |
| Blind 23 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 24 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 25 | 2 | 48 | 0.960 | 0.057 | Low | Low |
| Blind 26 | 2 | 48 | 0.960 | 0.057 | Low | Low |
| Blind 27 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 28 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 29 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 30 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 31 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 32 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 33 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 34 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 35 | 2 | 48 | 0.960 | 0.057 | Low | Low |
| Blind 36 | 34 | 16 | 0.320 | 0.135 | High | Low |
| Blind 37 | 19 | 31 | 0.620 | 0.140 | Low | Low |
| Blind 38 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 39 | 5 | 45 | 0.900 | 0.087 | Low | Low |
| Blind 40 | 16 | 34 | 0.680 | 0.135 | Low | Low |
| Blind 41 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 42 | 8 | 42 | 0.840 | 0.106 | Low | Low |
| Blind 43 | 5 | 45 | 0.900 | 0.087 | Low | Low |
| Blind 44 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 45 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 46 | 0 | 50 | 1 | 0 | Low | Low |
| Blind 47 | 0 | 50 | 1 | 0 | Low | Low |
High and Low refer to the quantity of stem cells in cord blood. ANN output is the average output over 50 random samplings. Std err in 95% CI is the standard error of the ANN output at the 95% confidence interval. ANN classification is the final outcome of the model. Target output is the original group to which the sample belongs.
Figure 4. ROC for model performance in the cord blood data set.