Xutao Deng, Huimin Geng, Hesham H Ali.
Abstract
Many studies have reported inconsistent cancer biomarkers due to bioinformatics artifacts. In this paper we use multiple data sets from microarrays, mass spectrometry, protein sequences, and other biological knowledge in order to improve the reliability of cancer biomarkers. We present a novel Bayesian network (BN) model which integrates and cross-annotates multiple data sets related to prostate cancer. The main contribution of this study is a method designed to find cancer biomarkers whose presence is supported by multiple data sources and biological knowledge. Relevant biological knowledge is explicitly encoded into the model parameters, and the biomarker finding problem is formulated as a Bayesian inference problem. Besides diagnostic accuracy, we introduce reliability as another quality measurement of the biological relevance of biomarkers. Based on the proposed BN model, we develop an empirical scoring scheme and a simulation algorithm for inferring biomarkers. Fourteen genes/proteins, including prostate specific antigen (PSA), are identified as reliable serum biomarkers which are insensitive to the model assumptions. The computational results show that our method is able to find biologically relevant biomarkers with the highest reliability while maintaining competitive predictive power. In addition, by combining biological knowledge and data from multiple platforms, the number of putative biomarkers is greatly reduced to allow more-focused clinical studies.
Year: 2007 PMID: 19455243 PMCID: PMC2675834
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Study design of biomarker extraction and its application in disease prognosis. Peak detection, alignment, and peak selection are performed on the mass spectrometry data. Gene selection is performed on microarray data. Then the pre-biomarkers are filtered using a Bayesian network model to obtain final biomarkers.
Figure 2. Proposed BN model for biomarker finding.
Founder nodes (protein and PTMs), their effects on the mass spectra pattern, and their prior distributions.
| Founder node | Site / specificity | Mass shift (Da) | Prior distribution |
|---|---|---|---|
| Carboxylation | C | +58.005479 | Beta (2, 5) |
| Phosphorylation | T, S, Y | +79.966330 | Beta (2, 2) |
| Water loss | S, T | −18.010565 | Beta (2, 5) |
| Oxidation | M | +15.994915 | Beta (2, 5) |
| Doubly charged | parent peptide | | Beta (2, 10) |
| Trypsin | after K/R, except before P | – | Beta (8, 2) |
| Chymotrypsin | after F/L/W/Y, except before P | – | Beta (8, 2) |
| Loss AA after cleavage | N-terminal, 1–4 AA lost | – | Beta (2, 10) |
| Signal Position | Determined by SignalP 3.0 | – | Determined by SignalP 3.0 |
| Protein | – | – | Uniform |
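
The cleavage rules and mass shifts in the table lend themselves to a small illustration. The following is a minimal sketch in C, not the authors' simulation code: `cleavage_sites` and `shifted_mass` are hypothetical helper names, and the sketch assumes a one-letter amino acid string and covers only the "cut after these residues, except before P" rule and the additive mass shifts listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rule from the table: an enzyme cuts after any residue listed in `after`,
   unless the following residue is proline (P). */
static int cuts_after(const char *after, char residue, char next)
{
    return residue != '\0' && strchr(after, residue) != NULL && next != 'P';
}

/* Hypothetical helper: stores the 1-based positions after which the enzyme
   would cut `seq` in `sites` (at most `max_sites` entries) and returns how
   many were stored. The C-terminal residue is never reported as a cut. */
size_t cleavage_sites(const char *seq, const char *after,
                      size_t *sites, size_t max_sites)
{
    assert(seq != NULL && after != NULL && sites != NULL);
    size_t n = 0;
    size_t len = strlen(seq);
    for (size_t i = 0; i + 1 < len && n < max_sites; ++i) {
        if (cuts_after(after, seq[i], seq[i + 1]))
            sites[n++] = i + 1;
    }
    return n;
}

/* Hypothetical helper: mass after applying `count` copies of a modification
   whose shift in Da is read from the table (e.g. +15.994915 for oxidation). */
double shifted_mass(double base_mass, double shift, int count)
{
    assert(count >= 0);
    return base_mass + shift * count;
}
```

Calling `cleavage_sites(seq, "KR", ...)` applies the trypsin rule and `cleavage_sites(seq, "FLWY", ...)` the chymotrypsin rule. In the model itself each event occurs with a probability drawn from its Beta prior, which this deterministic sketch leaves out.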
Figure 3. Example of MS simulation results. The range of m/z values between 1,400 and 3,000 is magnified to show the detail of the MS pattern.
The 240 pre-biomarkers obtained from MA, their t-scores, and their HMM outputs.
| Protein ID | t-score | HMM output |
|---|---|---|
| 13904866 | 4.49 | 0.016 |
| 113950 | −6.97 | 0.001 |
| 15431295 | 5.63 | 0 |
| 15055539 | 7.32 | 0 |
| 133041 | 6.52 | 0 |
| 4506607 | 4.75 | 0 |
| 1096944 | 4.75 | 0.001 |
| 136479 | 4.35 | 0 |
| 1082889 | 4.35 | 1 |
| 4557647 | 4.33 | 0 |
| 16753225 | 5.45 | 0.006 |
| 4506699 | 4.46 | 0 |
| 1096940 | 4.46 | 0 |
| 13904870 | 4.87 | 0 |
| 15220431 | 4.75 | 0 |
| 11415026 | 6.34 | 0 |
| 14591909 | 4.55 | 0 |
| 266921 | 5.69 | 0 |
| 1350706 | 5.77 | 0 |
| 4506713 | 4.20 | 0 |
| 133825 | 4.92 | 0 |
| 15718687 | 4.67 | 0 |
| 16579885 | 4.51 | 0.001 |
| 2493600 | −4.10 | 0.002 |
| 116533 | −5.06 | 1 |
| 1095781 | −5.06 | 0.986 |
| 1708274 | 5.47 | 0 |
| 7512733 | 5.47 | 0.01 |
| 9910418 | −4.45 | 0 |
| 128117 | 4.71 | 0.998 |
| 4557575 | 5.18 | 0.97 |
| 12229574 | −6.61 | 0.998 |
| 25402878 | −6.61 | 0 |
| 7108362 | 5.72 | 0 |
| 4759146 | −4.65 | 0.996 |
| 4557543 | −4.05 | 0.998 |
| 5803003 | −4.06 | 0.002 |
| 17366160 | 4.73 | 0.064 |
| 17433099 | −4.21 | 0 |
| 7657603 | 5.61 | 0 |
| 9955963 | 9.00 | 0.024 |
| 7656967 | 4.10 | 1 |
| 17374817 | −4.23 | 0.045 |
| 121148 | 4.72 | 0.726 |
| 4885645 | −5.26 | 0 |
| 113954 | 4.11 | 0 |
| 2497437 | 4.15 | 0.351 |
| 12585545 | 4.47 | 0.001 |
| 11352059 | 4.47 | 0 |
| 12643308 | 5.28 | 0 |
| 1345650 | −4.99 | 1 |
| 4826768 | 4.69 | 0 |
| 4506701 | −4.13 | 0 |
| 17978471 | 4.23 | 0 |
| 10190714 | −4.68 | 0.003 |
| 14602449 | −4.45 | 1 |
| 7512879 | −4.45 | 0.045 |
| 11321634 | 4.48 | 0 |
| 5921743 | −4.19 | 0.005 |
| 6005824 | −4.58 | 0.024 |
| 11321603 | −5.52 | 0 |
| 4507547 | 4.79 | 0.001 |
| 7662254 | −4.58 | 0.001 |
| 7512876 | −4.58 | 0.986 |
| 18490978 | −5.60 | 0.367 |
| 11356305 | −5.60 | 0.115 |
| 11360104 | −5.60 | 0.001 |
| 9297107 | 4.06 | 0.999 |
| 50403775 | −4.37 | 0.129 |
| 3914303 | −6.61 | 0.043 |
| 2495731 | −4.83 | 0 |
| 18699734 | 4.19 | 0 |
| 115945 | −4.36 | 1 |
| 13626119 | −4.82 | 1 |
| 123057 | 10.40 | 0.278 |
| 399193 | −4.01 | 0 |
| 7657176 | 4.97 | 1 |
| 4758626 | −4.03 | 0.004 |
| 12230067 | −4.96 | 0.005 |
| 133486 | −5.75 | 0.001 |
| 132387 | −6.07 | 0 |
| 13129026 | 4.04 | 0 |
| 4503537 | 4.64 | 0 |
| 68068024 | 5.93 | 0 |
| 13129064 | −5.13 | 0.006 |
| 1585496 | −5.13 | 1 |
| 8134636 | −5.25 | 0 |
| 121735 | −4.96 | 0.015 |
| 4502109 | −4.72 | 0 |
| 4885559 | −4.43 | 0.029 |
| 11181775 | −4.10 | 0 |
| 5453722 | 4.20 | 0.373 |
| 14165437 | 4.12 | 0 |
| 11993943 | −4.19 | 0 |
| 13878821 | 4.29 | 0.75 |
| 2135080 | 4.29 | 0 |
| 1082633 | 4.29 | 0 |
| 728834 | −4.88 | 0.962 |
| 10720334 | 5.12 | 0 |
| 127442 | 5.75 | 0.235 |
| 1708887 | 4.97 | 0.999 |
| 10716563 | 5.79 | 0.999 |
| 1351211 | 4.32 | 0 |
| 113463 | 6.45 | 0.049 |
| 14249524 | 4.79 | 0.999 |
| 16418409 | 4.52 | 0.998 |
| 10445223 | 4.63 | 0 |
| 6679189 | 5.17 | 1 |
| 133948 | 5.27 | 0.014 |
| 117098 | 5.16 | 0.449 |
| 125174 | 4.53 | 0.996 |
| 5902014 | 4.97 | 0.005 |
| 4504631 | −4.18 | 0.002 |
| 1173039 | 4.32 | 0 |
| 1172922 | 4.27 | 0 |
| 4505581 | −4.33 | 0 |
| 728833 | −4.67 | 0.962 |
| 4506699 | 4.46 | 0 |
| 118504 | −4.56 | 0.845 |
| 56405387 | −5.21 | 0 |
| 135304 | 4.70 | 0 |
| 133701 | −4.72 | 0.004 |
| 4505623 | −4.87 | 0.008 |
| 11291391 | −4.87 | 0 |
| 11360002 | −4.87 | 0.003 |
| 127983 | 5.35 | 0 |
| 7661670 | 5.50 | 0.001 |
| 4502875 | 6.55 | 0.805 |
| 13432136 | 4.97 | 0.353 |
| 114322 | 6.07 | 0.001 |
| 13878805 | 4.04 | 0.997 |
| 10863909 | 4.03 | 0.999 |
| 25398579 | 4.03 | 0 |
| 5902020 | −5.38 | 0 |
| 4506427 | −5.37 | 0.999 |
| 12643622 | 7.07 | 0.627 |
| 7512940 | 7.07 | 0.004 |
| 728831 | 7.81 | 0.902 |
| 4758792 | 4.18 | 0.215 |
| 10835025 | 5.25 | 0.072 |
| 4502877 | 4.21 | 0.978 |
| 6912682 | −4.88 | 0.999 |
| 71834857 | 4.98 | 0.998 |
| 13638228 | 4.93 | 0.004 |
| 9994169 | 4.66 | 0.001 |
| 18104976 | 4.07 | 0.703 |
| 8393299 | −4.23 | 1 |
| 5453603 | 4.91 | 0.092 |
| 14916573 | 4.81 | 0 |
| 5031597 | 4.39 | 0.001 |
| 4758950 | 4.60 | 0.863 |
| 11056046 | 4.97 | 1 |
| 13637934 | −4.94 | 0 |
| 25387602 | −4.94 | 0 |
| 1703205 | 4.04 | 0 |
| 417246 | 5.19 | 0.002 |
| 119172 | 6.46 | 0 |
| 4507877 | −5.34 | 0 |
| 14916999 | 4.16 | 1 |
| 1706396 | 4.46 | 0.884 |
| 5803145 | 4.35 | 0 |
| 1345695 | −4.11 | 0 |
| 231741 | −5.46 | 0 |
| 4758032 | 4.52 | 0 |
| 4507357 | −4.04 | 0 |
| 6166568 | 4.21 | 0 |
| 13994151 | −4.42 | 0 |
| 14548187 | −5.17 | 0 |
| 4885509 | −4.34 | 0.997 |
| 129483 | 4.32 | 1 |
| 7657552 | 5.37 | 0.007 |
| 1583602 | 5.37 | 1 |
| 1589585 | 5.37 | 0.157 |
| 61252057 | −5.02 | 0.99 |
| 16306550 | −5.58 | 0 |
| 5174485 | −4.47 | 0.999 |
| 4557617 | −4.62 | 1 |
| 114392 | −4.35 | 0 |
| 16445393 | 10.05 | 0.999 |
| 133116 | 4.19 | 0.581 |
| 231475 | −4.62 | 0 |
| 6005924 | −7.54 | 0 |
| 115601 | −5.92 | 0 |
| 2495724 | −4.68 | 0 |
| 4505835 | −5.17 | 0.541 |
| 17380550 | −4.16 | 0.006 |
| 4757902 | −5.67 | 0 |
| 14149680 | −4.04 | 0 |
| 3024727 | 5.20 | 0.999 |
| 50403771 | 4.54 | 0.051 |
| 399866 | 4.88 | 0 |
| 13124879 | −4.47 | 0 |
| 6226951 | 4.14 | 0 |
| 5453541 | 5.68 | 0.999 |
| 4826878 | −4.21 | 0 |
| 5729836 | 4.63 | 0 |
| 1705731 | −5.07 | 0.031 |
| 7513030 | −5.07 | 0.001 |
| 8923881 | 4.34 | 0 |
| 3915626 | −7.40 | 1 |
| 9257222 | −4.77 | 0 |
| 1705650 | 4.92 | 0 |
| 12643880 | 4.09 | 0.958 |
| 10047134 | −5.06 | 0 |
| 68846235 | 5.49 | 0 |
| 121110 | −4.54 | 0.995 |
| 8923444 | 4.08 | 0 |
| 126047 | 5.81 | 0 |
| 5453736 | −6.37 | 0 |
| 62512184 | 4.95 | 1 |
| 2135919 | 4.95 | 0 |
| 117501 | 4.60 | 1 |
| 5032159 | −4.59 | 0.042 |
| 55977848 | 4.51 | 0.557 |
| 11348280 | 4.51 | 0.895 |
| 13878450 | −5.14 | 0 |
| 4505037 | −6.97 | 0.868 |
| 6912268 | −6.26 | 0.02 |
| 131762 | 4.74 | 0 |
| 127983 | 5.35 | 0 |
| 226527 | 5.35 | 0 |
| 4557355 | −4.41 | 0 |
| 4758936 | 4.70 | 0.996 |
| 17402909 | −4.46 | 0 |
| 12751475 | 4.82 | 0.992 |
| 1730015 | 5.55 | 0.427 |
| 11352548 | 5.55 | 0 |
| 4557617 | −4.62 | 1 |
| 226527 | −4.62 | 0 |
| 223828 | −4.62 | 0.722 |
| 17380263 | 4.12 | 0.014 |
| 2842764 | 4.02 | 0.989 |
| 113950 | −6.97 | 0.001 |
| 10835023 | −4.30 | 0.007 |
| 1352464 | −4.16 | 0.001 |
| 4758594 | −4.31 | 0.992 |
| 4557413 | −4.46 | 0 |
| 125969 | 5.58 | 0 |
| 118295 | 4.50 | 0 |
| 2829468 | −4.69 | 0 |
Figure 4. Pre-biomarkers distributed in a space defined by the scores on each data set (MA, SP, MS). Every point on the mesh has a score S(protein) = 0.1; above the mesh, S(protein) > 0.1; below the mesh, S(protein) < 0.1.
Figure 5. Distributions of candidate biomarkers and all human proteins. a. Distribution of S(MS|protein); b. Distribution of S(protein).
Six sets of parameter settings where S4 is the default setting.
| (2,2) | (2,2) | |
| (2,2) | (2,2) | (1,1) |
| (2,2) | ||
| (3,7) | (3,7) | |
| (8,2) | (8,2) | (8,2) |
| (8,2) | ||
| 20 | 50 | |
| 20 | 50 | 20 |
| 20 | ||
| 1000 | 1000 | |
| 1000 | 1000 | 1000 |
| 500 |
Figure 6. Sensitivity analysis results. Each column represents a biomarker and each row represents a parameter set. A black square indicates that the biomarker is present under that parameter set; a gray square indicates that it is absent.
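
One plausible reading of "insensitive to the model assumptions" is that a biomarker is reported only if it is selected under every parameter set. The sketch below implements that reading on a presence/absence matrix like the one in Figure 6. The function name, the row-major layout, and the all-sets criterion are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N_SETS 6 /* six parameter sets, as in the sensitivity analysis */

/* present[s * n_cand + c] is nonzero when candidate c is selected under
   parameter set s. Stores the indices of candidates selected under all
   N_SETS sets in `out` (capacity n_cand) and returns how many there are. */
size_t robust_candidates(const int *present, size_t n_cand, size_t *out)
{
    assert(present != NULL && out != NULL);
    size_t n = 0;
    for (size_t c = 0; c < n_cand; ++c) {
        int in_all = 1;
        for (size_t s = 0; s < N_SETS; ++s) {
            if (!present[s * n_cand + c]) {
                in_all = 0;
                break;
            }
        }
        if (in_all)
            out[n++] = c;
    }
    return n;
}
```

A softer criterion, such as presence in at least five of the six sets, would only need the inner loop to count hits instead of stopping at the first miss.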
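Biomarkers and their scores under the default parameter setting.
| Protein ID | Gene | Function | MS score | MA score | SP score | S(protein) |
|---|---|---|---|---|---|---|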
| 16418409 | PORIMIN | cell death | 1.00 | 0.92 | 0.99 | 0.93 |
| 12643880 | STK39 | S/T kinase | 0.40 | 0.88 | 0.95 | 0.34 |
| 71834853 | PSA | Androgens regulated | 0.34 | 0.95 | 0.99 | 0.32 |
| 16445393 | CDH12 | cell-cell adhesion | 0.31 | 0.99 | 0.99 | 0.31 |
| 7656967 | CELSR1 | protein-protein interactions | 0.33 | 0.89 | 1.00 | 0.30 |
| 11056046 | IGSF4B(TSLC1) | Immuno-globulin | 0.30 | 0.95 | 1.00 | 0.29 |
| 1708887 | LU | Immuno-globulin | 0.30 | 0.95 | 0.99 | 0.28 |
| 9297107 | NRP1 | cell growth factor | 0.28 | 0.88 | 0.99 | 0.24 |
| 62512184 | SEL1L | gene regulation | 0.25 | 0.95 | 1.00 | 0.24 |
| 4557575 | FAAH | fatty acid amides | 0.21 | 0.96 | 0.97 | 0.20 |
| 12751475 | SLC39A6 | zinc transporter | 0.21 | 0.94 | 0.99 | 0.19 |
| 10716563 | CANX | calcium ion binding | 0.18 | 0.97 | 0.99 | 0.18 |
| 14916999 | GRP78 | Protein folding | 0.20 | 0.89 | 1.00 | 0.18 |
| 117501 | CRTC | calcium ion binding | 0.17 | 0.93 | 1.00 | 0.16 |
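
The record does not give the scoring formula, but in most rows the last numeric column is close to the product of the three columns before it (for STK39, 0.40 × 0.88 × 0.95 ≈ 0.33 against a listed 0.34). The sketch below assumes such a product rule, and it also assumes that the three columns are the MS, MA, and SP scores. The 0.1 threshold is the mesh value quoted in the Figure 4 caption. None of this is confirmed as the authors' exact scheme.

```c
#include <assert.h>

/* Assumed combination rule: product of three per-source scores in [0, 1].
   The parameter labels (MS, MA, SP) are an interpretation of the table. */
double combined_score(double s_ms, double s_ma, double s_sp)
{
    assert(s_ms >= 0.0 && s_ms <= 1.0);
    assert(s_ma >= 0.0 && s_ma <= 1.0);
    assert(s_sp >= 0.0 && s_sp <= 1.0);
    return s_ms * s_ma * s_sp;
}

/* Nonzero when a combined score lies strictly above the threshold,
   e.g. the S(protein) = 0.1 mesh described for Figure 4. */
int above_threshold(double score, double threshold)
{
    return score > threshold;
}
```

Under a product rule, a weak score from any single data source pulls the combined score down, which fits the stated aim of keeping only biomarkers supported by multiple sources.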
Specification of the 14 biomarkers identified by our BN method.
| 16418409 | pro-oncosis receptor inducing membrane injury gene ( | TTSVSQNTSQJSTSTM(4)TVTHNSSVTTAASSVTI
| 4347 |
| 12643880 | STK39_HUMAN
| VKEENPEIAVSASTIPEQIQSLSVHDSQGPPNANE
| 4148 |
| APAPAAPAAPAPAPAPAPAAQAVGWPIC(1)RDAYE | 4802 | ||
| LQEVIGSGATAVVQAAL | |||
| 71834853 | prostate specific antigen isoform 3 preproprotein ( | QCVDLHVISNDVC(1)AQVHPQK | 2291 |
| 16445393 | cadherin 12, type 2 preproprotein ( | – | |
| 7656967 | cadherin EGF LAG seven-pass G-type receptor 1 ( | PVVHIQAVDADSGENARL | 1891 |
| SFAGPIGAVIIINTVTSVLSAKVSCQRK | 2859 | ||
| YVV(5)GWGIPAIVTGLAVGLDPQGYGNPDF | 2897 | ||
| ADIGGMLPGLTVRSVVVGGASEDKVSVRRGF | 3129 | ||
| DLAATQDADFHEDVIHSGSALLAPATRAAW | 3149 | ||
| 11056046 | immunoglobulin superfamily, member 4B ( | GTYLTHEAKGSDDAPDADTAIINAEGGQSGGDD
| 4056 |
| 1708887 | LU_HUMAN Lutheran blood group glycoprotein precursor ( | - | |
| 9297107 | NRP1_HUMAN Neuropilin-1 precursor (Vascular endothelial cell growth factor 165 receptor) ( | GGIAVDDISINNHISQEDC(1)AKPADLDK | 2896 |
| EGEIGKGNLGGIAVDDISINNHISQEDCAK | 3096 | ||
| GM(4)ESGEIHSDQITASSQYSTNWSAERSRL | 3243 | ||
| 62512184 | SEL1L_HUMAN Sel-1 homolog ( | KPALTAIEGTAHGEPC(1)HFPF | 2181 |
| 4557575 | fatty acid amide hydrolase ( | - | |
| 12751475 | solute carrier family 39 (zinc transporter), member 6 ( | DSQQPAVLEEEEVMIAHAHPQEVYNEY | 3155 |
| GQSDDLIHHHHDYHHILHHHHHQNHHPHSHSQR
| 4882 | ||
| 10716563 | calnexin precursor ( | GTAIVEBHDGHDDDVIDIEDDLDDVIEEVEDSKPD
| 4630 |
| 14916999 | GRP78_HUMAN 78 kDa glucose-regulated protein precursor ( | DVSLLTIDNGVFEVVATNGDTHLGGEDF | 2934 |
| NTVVPTKKSQIFSTASDNQPTVTIKVYEGERPLT
| 4356 | ||
| 117501 | CRTC_HUMAN Calreticulin precursor (CRP55) (Calregulin) (HACBP) (ERp60) ( | YTLIVRPDNTYEVKIDNSQVESGSL | 2840 |
SVM prediction performance using different biomarkers.
| 14.71 | 25.49 | 16.67 | |
| 80.77 | 75.00 | 85.51 | |
| 89.36 | 75.00 | 83.10 |
    #include <math.h>

    /* Uniform() is defined elsewhere; it must return a draw from (0, 1). */
    double Uniform(void);

    /* Gamma(alpha, 1) variate by rejection sampling; requires alpha > 1. */
    double Gamma(double alpha)
    {
        double a = 1 / sqrt(2 * alpha - 1);
        double b = alpha - log(4.0);
        double q = alpha + 1 / a;
        double theta = 4.5;
        double d = 1 + log(theta);
        for (;;) {                      /* repeat until a candidate is accepted */
            double U1 = Uniform();
            double U2 = Uniform();
            double V = a * log(U1 / (1 - U1));
            double Y = alpha * exp(V);  /* candidate value */
            double Z = U1 * U1 * U2;
            double W = b + q * V - Y;
            if (W + d - theta * Z >= 0) /* quick acceptance test */
                return Y;
            else if (W >= log(Z))       /* full acceptance test */
                return Y;
        }
    }

    /* Beta(alpha1, alpha2) variate as a ratio of two gamma variates. */
    double Beta(double alpha1, double alpha2)
    {
        double y1 = Gamma(alpha1);
        double y2 = Gamma(alpha2);
        return y1 / (y1 + y2);
    }