Seungmook Lee, Min-Seok Kwon, Hyoung-Joo Lee, Young-Ki Paik, Haixu Tang, Jae K Lee, Taesung Park.
Abstract
BACKGROUND: Quantification of protein expression by mass spectrometry (MS) has been introduced in various proteomics studies. In particular, two label-free quantification methods, spectral counting and spectral feature analysis, have been extensively investigated in a wide variety of proteomic studies. Both methods rest on peptide identification by a proteomic database search and on subsequent estimation of peptide retention time. However, they often suffer from restrictive database searches and inaccurate estimation of the liquid chromatography (LC) retention time. Furthermore, conventional peptide identification methods based on database or spectral library search algorithms, such as SEQUEST or SpectraST, have been found to provide neither the best match nor high-scoring matches. Lastly, these methods are limited in that target peptides cannot be identified unless they have previously been generated and stored in the database or spectral libraries. To overcome these limitations, we propose a novel method, namely Quantification method based on Finding the Identical Spectral set for a Homogenous peptide (Q-FISH), to estimate a peptide's abundance from its tandem mass spectrometry (MS/MS) spectra through direct comparison of experimental spectra. Intuitively, Q-FISH compares all possible pairs of experimental spectra in order to identify both known and novel proteins, significantly enhancing identification accuracy by grouping replicated spectra from the same peptide targets.
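The abstract describes comparing experimental spectra directly with one another. As a minimal sketch of one such pairwise comparison, assuming (hypothetically) that each spectrum is binned into 1-Da m/z bins up to m/z 2,000 and that similarity is the Pearson correlation of the binned intensity vectors, two spectra can be compared as follows. The bin width, m/z range, function names, and example peaks are illustrative assumptions, not parameters taken from Q-FISH.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Sum peak intensities into fixed-width m/z bins (hypothetical binning scheme)."""
    n_bins = int(max_mz / bin_width) + 1
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        if 0.0 <= mz <= max_mz:
            vector[int(mz / bin_width)] += intensity
    return vector


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) * (a - mean_x) for a in x)
    syy = sum((b - mean_y) * (b - mean_y) for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a flat (empty) spectrum carries no pattern to compare
    return sxy / math.sqrt(sxx * syy)


# Invented (m/z, intensity) peak lists: a spectrum, a replicate of the same
# peptide with small m/z and intensity shifts, and an unrelated spectrum.
spectrum = [(175.1, 50.0), (288.2, 120.0), (401.3, 80.0), (514.3, 200.0)]
replicate = [(175.2, 45.0), (288.1, 130.0), (401.4, 70.0), (514.2, 210.0)]
unrelated = [(220.0, 100.0), (333.1, 90.0), (610.4, 150.0)]

r_same = pearson(bin_spectrum(spectrum), bin_spectrum(replicate))
r_diff = pearson(bin_spectrum(spectrum), bin_spectrum(unrelated))
print(f"replicate: {r_same:.3f}, unrelated: {r_diff:.3f}")
```

Replicate spectra of the same peptide put their peaks in the same bins and correlate strongly, while unrelated spectra share few bins and correlate near zero. Fixed bins can split a peak that sits on a bin boundary, so coarser bins tolerate more m/z error at the cost of more chance overlaps.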
Year: 2011 PMID: 22034872 PMCID: PMC3234305 DOI: 10.1186/1471-2105-12-423
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Workflow chart. This figure shows a schematic of the analysis process performed by the Q-FISH algorithm.
Figure 2. Pattern plots of the reference spectrum and experimental MS/MS spectra in clustered spectral sets. Each plot shows the experimental MS/MS spectra with positive intensities (upper part) and the reference spectrum with negative intensities (lower part). (a) All nine spectra were identified as the same peptide; (b) two of the eleven spectra were not identified by SEQUEST; and (c) only four of the seven spectra were identified by SpectraST, although the pattern plots are very similar.
Results of SEQUEST & SpectraST for spectra in clustered spectral sets
| Spectral Set ID | Sample | Sequence | XCorr | RT | Precursor ion | Precursor intensity | SpectraST |
|---|---|---|---|---|---|---|---|
| S366006 | HCC-3 | SIFSAVLDELK | 2.35 | 6096.18 | 1223.4 | 25828.2 | 1 |
| | Normal-1 | SIFSAVLDELK | 2.38 | 6144.00 | 1222.6 | 12823.4 | 1 |
| | Normal-1 | SIFSAVLDELK | 2.12 | 6197.20 | 1224.2 | 385800.0 | 1 |
| | Normal-1 | SIFSAVLDELK | 2.30 | 6248.89 | 1224.2 | 145284.0 | 1 |
| | Normal-2 | SIFSAVLDELK | 1.99 | 6278.55 | 1223.4 | 6218.5 | 1 |
| | Normal-2 | SIFSAVLDELK | 2.35 | 6341.52 | 1222.3 | 14101.8 | 1 |
| | Normal-2 | SIFSAVLDELK | 1.98 | 6441.91 | 1223.1 | 2800.2 | 1 |
| | Normal-3 | SIFSAVLDELK | 2.56 | 6149.98 | 1224.4 | 421560.0 | 1 |
| | Normal-3 | SIFSAVLDELK | 2.37 | 6154.65 | 1222.3 | 23456.6 | 1 |
| S1157004 | Normal-1 | VDFPQDQLTALTGR | 2.77 | 3724.74 | 1565.1 | 106099.0 | 1 |
| | Normal-2 | VDFPQDQLTALTGR | 2.33 | 3647.59 | 1562.4 | 143286.0 | 1 |
| | Normal-1 | VDFPQDQLTALTGR | 2.46 | 3779.98 | 1562.2 | 75465.1 | 1 |
| | HCC-3 | VDFPQDQLTALTGR | 2.10 | 3854.18 | 1562.3 | 34323.8 | 1 |
| | Normal-2 | VDFPQDQLTALTGR | 2.07 | 3695.20 | 1562.7 | 159244.0 | 1 |
| | Normal-3 | VDFPQDQLTALTGR | 2.07 | 3825.73 | 1562.4 | 69159.3 | 1 |
| | Normal-3 | VDFPQDQLTALTGR | 2.24 | 3775.23 | 1562.9 | 71196.3 | 1 |
| | HCC-1 | VDFPQDQLTALTGR | 2.02 | 3930.66 | 1562.2 | 25175.7 | 1 |
| | HCC-2 | VDFPQDQLTALTGR | 1.95 | 3977.91 | 1562.2 | 12849.6 | 1 |
| | Normal-2 | M#WLSSMCSMRSAR | 1.29 | 3629.94 | 1564.7 | 86816.8 | 1 |
| | HCC-1 | VDFPQDQLTALTGR | 2.20 | 3907.36 | 1564.2 | 19403.5 | 1 |
| S65002 | HCC-1 | EILVGDVGQTVDDPYATFVK | 3.81 | 4619.64 | 1084.4 | 21166.7 | 1 |
| | HCC-3 | EILVGDVGQTVDDPYATFVK | 3.70 | 4573.01 | 1084.7 | 46939.4 | X |
| | HCC-3 | EILVGDVGQTVDDPYATFVK | 2.32 | 4579.55 | 1083.5 | 22077.0 | 1 |
| | Normal-1 | EILVGDVGQTVDDPYATFVK | 2.87 | 4516.45 | 1084.0 | 26598.5 | 1 |
| | Normal-2 | EILVGDVGQTVDDPYATFVK | 4.08 | 4461.41 | 1084.5 | 19416.7 | X |
| | Normal-2 | EILVGDVGQTVDDPYATFVK | 3.49 | 4514.74 | 1084.7 | 91100.7 | X |
| | Normal-3 | EILVGDVGQTVDDPYATFVK | 3.37 | 4548.32 | 1084.5 | 23254.4 | 1 |

These spectra were clustered by the proposed Q-FISH algorithm. In the SpectraST column, 1 marks a spectrum identified by SpectraST and X one that was not. In spectral set S366006, all spectra were identified as the same peptide sequence, SIFSAVLDELK, by both SEQUEST and SpectraST, whereas two spectra in S1157004 were not identified by SEQUEST (XCorr < 2.11). All spectra in S65002 were identified by SEQUEST with high scores, but only four of them were identified by SpectraST. If we relied only on SEQUEST or only on SpectraST, these spectra in S1157004 or S65002 would be excluded.
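The comparison in this paragraph can be tallied per spectral set. The sketch below transcribes the seven S65002 rows from the table (sample, XCorr, and whether SpectraST identified the spectrum) and counts how many spectra each engine would keep. It applies the single XCorr cut-off of 2.11 quoted above uniformly, which is a simplification, since SEQUEST filters are often charge-dependent; the function name `summarize_set` is hypothetical.

```python
from typing import List, Tuple

# (sample, XCorr, identified by SpectraST?) for spectral set S65002,
# transcribed from the table; "1" -> True, "X" -> False.
S65002: List[Tuple[str, float, bool]] = [
    ("HCC-1", 3.81, True),
    ("HCC-3", 3.70, False),
    ("HCC-3", 2.32, True),
    ("Normal-1", 2.87, True),
    ("Normal-2", 4.08, False),
    ("Normal-2", 3.49, False),
    ("Normal-3", 3.37, True),
]


def summarize_set(spectra, xcorr_cutoff):
    """Return (spectra in set, spectra passing XCorr cut-off, SpectraST hits)."""
    n_total = len(spectra)
    n_sequest = sum(1 for _, xcorr, _ in spectra if xcorr >= xcorr_cutoff)
    n_spectrast = sum(1 for _, _, hit in spectra if hit)
    return n_total, n_sequest, n_spectrast


total, by_sequest, by_spectrast = summarize_set(S65002, xcorr_cutoff=2.11)
print(f"{total} spectra: {by_sequest} pass XCorr >= 2.11, "
      f"{by_spectrast} identified by SpectraST")
```

A set-level rule, for example labelling every member with the set's consensus sequence (a hypothetical rule, not necessarily the one Q-FISH uses), would keep all seven spectra, whereas filtering on SpectraST alone keeps four.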
Figure 3. Scatter plots between different samples and within replicated samples. This figure shows scatter plots of the number of spectra in clustered sets obtained from replicated experiments on HCC and normal tissue samples. The two black boxes show the relationships between replicates for the number of spectra within the HCC samples and within the normal samples, respectively, while the gray box shows the relationships for the number of spectra in clustered sets between HCC and normal samples.
Correlation matrix and the number of shared spectral clusters between different samples and within replicated samples
| | HCC1 | HCC2 | HCC3 | Normal1 | Normal2 | Normal3 |
|---|---|---|---|---|---|---|
| HCC1 | 1.0000ᵃ | 0.8315 | 0.8125 | 0.1549 | 0.0828 | 0.1088 |
| HCC2 | | 1.0000 | 0.8048 | 0.1232 | 0.0654 | 0.0899 |
| HCC3 | | | 1.0000 | 0.1394 | 0.0654 | 0.0911 |
| Normal1 | | | | 1.0000 | 0.7178 | 0.7449 |
| Normal2 | | | | | 1.0000 | 0.7302 |
| Normal3 | | | | | | 1.0000 |

ᵃ correlation coefficient; ᵇ number of spectral clusters; ᶜ number of shared spectral clusters.
This table shows the correlation matrix of the numbers of spectra in the same cluster, between different samples and within replicated samples. The numbers of spectra in the same cluster were highly correlated within replicated samples (0.80 to 0.83 among the HCC replicates and 0.72 to 0.74 among the normal replicates), but only weakly correlated between HCC and normal samples (0.07 to 0.15).
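The contrast between within-replicate and between-group correlations can be condensed into three averages of the off-diagonal entries. The sketch below transcribes the upper triangle of the reported matrix and computes those means; grouping samples by their name prefix is an assumption of the sketch, and `group_of`/`summarize` are hypothetical helper names.

```python
from statistics import mean

# Upper triangle of the reported correlation matrix (values from the table).
CORR = {
    ("HCC1", "HCC2"): 0.8315, ("HCC1", "HCC3"): 0.8125,
    ("HCC1", "Normal1"): 0.1549, ("HCC1", "Normal2"): 0.0828,
    ("HCC1", "Normal3"): 0.1088,
    ("HCC2", "HCC3"): 0.8048,
    ("HCC2", "Normal1"): 0.1232, ("HCC2", "Normal2"): 0.0654,
    ("HCC2", "Normal3"): 0.0899,
    ("HCC3", "Normal1"): 0.1394, ("HCC3", "Normal2"): 0.0654,
    ("HCC3", "Normal3"): 0.0911,
    ("Normal1", "Normal2"): 0.7178, ("Normal1", "Normal3"): 0.7449,
    ("Normal2", "Normal3"): 0.7302,
}


def group_of(sample):
    """Assign a sample to its tissue group by name prefix."""
    return "HCC" if sample.startswith("HCC") else "Normal"


def summarize(corr):
    """Mean correlation within HCC, within normal, and between the groups."""
    within_hcc, within_normal, between = [], [], []
    for (a, b), r in corr.items():
        ga, gb = group_of(a), group_of(b)
        if ga != gb:
            between.append(r)
        elif ga == "HCC":
            within_hcc.append(r)
        else:
            within_normal.append(r)
    return mean(within_hcc), mean(within_normal), mean(between)


hcc, normal, between = summarize(CORR)
# prints: within HCC: 0.816, within normal: 0.731, between: 0.102
print(f"within HCC: {hcc:.3f}, within normal: {normal:.3f}, between: {between:.3f}")
```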
2 × 2 tables for literature search results of Q-FISH and SEQUEST
| Literature | Q-FISH: HCC | Q-FISH: Normal | Q-FISH: Total | SEQUEST: HCC | SEQUEST: Normal | SEQUEST: Total |
|---|---|---|---|---|---|---|
| Over-expressed | 25 | 17 | 42 | 34 | 26 | 60 |
| Under-expressed | 6 | 17 | 23 | 9 | 24 | 33 |
| Total | 31 | 34 | 65 | 43 | 50 | 93 |
| Accuracy | | | 64.62% | | | 62.37% |

We assume that a peptide reported in previous literature is correctly identified. We compared the results of Q-FISH and SEQUEST through a literature search, and based on these reports the 2 × 2 tables above were constructed.
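Reading the accuracy as the fraction of peptides whose direction of change agrees with the literature (over-expressed peptides called in HCC plus under-expressed peptides called in normal, divided by the total) reproduces both reported percentages. This definition is an inference, not stated in the text. The diagonal counts follow from the totals: for Q-FISH, 25 = 31 − 6 and 17 = 34 − 17; for SEQUEST, 34 = 43 − 9 and 24 = 50 − 26. The function name below is hypothetical.

```python
def direction_accuracy(over_hcc, over_normal, under_hcc, under_normal):
    """Share of peptides whose direction agrees with the literature (assumed definition)."""
    correct = over_hcc + under_normal
    total = over_hcc + over_normal + under_hcc + under_normal
    return correct / total


q_fish = direction_accuracy(25, 17, 6, 17)    # 42 of 65 agree
sequest = direction_accuracy(34, 26, 9, 24)   # 58 of 93 agree
print(f"Q-FISH: {q_fish:.2%}, SEQUEST: {sequest:.2%}")  # Q-FISH: 64.62%, SEQUEST: 62.37%
```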
Lists of differentially expressed peptides (DEPs) in HCC and normal samples

HCC sample

| Related Cancer | Gene Name | Shotgun Sequence | #(HCC)ᵃ | XCorr | Q Score | Protein Name | PMIDᶜ |
|---|---|---|---|---|---|---|---|
| HCC | AKR1B10 | IVENIQVFDFK | 2 | 2.04 | 0.95 | Aldo-keto reductase family 1 member B10 | 20388846 |
| | ALB | DVFLGMFLYEYAR | 2 | 2.04 | 0.96 | Putative uncharacterized protein ALB | 20658536 |
| | ECHDC3 | VIIISAEGPVFSSGHDLK | 2 | 2.14 | 0.95 | Isoform 1 of Enoyl-CoA hydratase domain-containing protein 3, mitochondrial | 21495032 |
| | EEF1A2 | THINIVVIGHVDSGK | 3 | 2.29 | 0.83 | Elongation factor 1-alpha 2 | 18161050 |
| | EEF2 | AYLPVNESFGFTADLR | 3 | 3.30 | 0.96 | Elongation factor 2 | 18161940 |
| | ENO1 | FTASAGIQVVGDDLTVTNPK | 33 | 2.51 | 0.61 | Isoform alpha-enolase of Alpha-enolase | 18813785 |
| | FGG | EGFGHLSPTGTTEFWLGNEK | 2 | 3.16 | 0.95 | Isoform Gamma-B of Fibrinogen gamma chain | 19596924 |
| | FN1 | SSPVVIDASTAIDAPSNLR | 2 | 2.45 | 0.96 | Isoform 1 of Fibronectin | 16820872 |
| | FTCD | EAQELSLPVVGSQLVGLVPLK | 2 | 2.98 | 0.99 | Isoform A of Formimidoyltransferase-cyclodeaminase | 18571811 |
| | GAPDH | WGDAGAEYVVESTGVFTTMEK | 5 | 3.57 | 0.96 | Glyceraldehyde-3-phosphate dehydrogenase | 20714864 |
| | HBD | FFESFGDLSSPDAVMGNPK | 2 | 2.37 | 0.96 | Hemoglobin subunit delta | 9214599 |
| | HMOX1 | ALDLPSSGEGLAFFTFPNIASATK | 2 | 2.82 | 0.90 | Heme oxygenase 1 | 20664735 |
| | HRSP12 | IEIEAVAIQGPLTTASL | 2 | 2.31 | 0.98 | Ribonuclease UK114 | 18349270 |
| | HSPA5 | NQLTSNPENTVFDAK | 4 | 2.51 | 0.97 | HSPA5 protein | 19445531 |
| | HSPA9 | VINEPTAAALAYGLDK | 2 | 2.04 | 0.93 | Stress-70 protein, mitochondrial | 18334731 |
| | | DIVMTQSPDSLAVSLGER | 2 | 2.52 | 0.99 | | |
| | HSPD1 | ALMLQGVDLLADAVAVTMGPK | 3 | 2.66 | 0.95 | 60 kDa heat shock protein, mitochondrial | 21533669 |
| | NME1 | VMLGETNPADSKPGTIR | 2 | 2.57 | 0.97 | Isoform 1 of Nucleoside diphosphate kinase A | 17594820 |
| | | EISLWFKPEELVDYK | 2 | 2.27 | 0.95 | | |
| | P4HB | VDATEESDLAQQYGVR | 2 | 2.38 | 0.81 | Protein disulfide-isomerase | 21207424 |
| | PRDX6 | LIALSIDSVEDHLAWSK | 3 | 3.48 | 0.93 | Peroxiredoxin-6 | 19893992 |
| | TKT | ILATPPQEDAPSVDIANIR | 3 | 2.16 | 0.98 | cDNA FLJ54957, highly similar to Transketolase | 17321041 |
| | VCP | LIVDEAINEDNSVVSLSQPK | 2 | 2.49 | 0.98 | Transitional endoplasmic reticulum ATPase | 12560433 |
| | VIM | EMEENFAVEAANYQDTIGR | 3 | 3.28 | 0.99 | Vimentin | 19843643 |
| Breast cancer | EEF1D | SLAGSSGPGASSGTSGDHGELVVR | 2 | 3.17 | 0.93 | Elongation factor 1-delta | 17997862 |
| | HBB | GTFATLSELHCDK | 2 | 2.09 | 0.97 | Hemoglobin subunit beta | 20097481 |
| Colon cancer | ACTN1 | GYEEWLLNEIR | 3 | 2.03 | 0.99 | Alpha-actinin-1 | 17898132 |
| | | ACLISLGYDVENDR | 2 | 2.09 | 0.94 | | |
| | ATP5B | DQEGQDVLLFIDNIFR | 2 | 2.58 | 0.98 | ATP synthase subunit beta, mitochondrial | 20080835 |
| | HMGCS2 | LMFNDFLSASSDTQTSLYK | 3 | 2.87 | 0.93 | Hydroxymethylglutaryl-CoA synthase, mitochondrial | 16940161 |
| Colorectal cancer | ATP5A1 | NVQAEEMVEFSSGLK | 2 | 2.65 | 0.95 | ATP synthase subunit alpha, mitochondrial | 9261598 |
| | | EVAAFAQFGSDLDAATQQLLSR | 3 | 2.88 | 0.87 | | |
| Leukemia | IDH1 | SIEDFAHSSFQMALSK | 2 | 2.53 | 0.97 | Isocitrate dehydrogenase [NADP] cytoplasmic | 21205756 |
| Pancreatic cancer | EPPK1 | LLEAQIATGGVIDPVHSHR | 2 | 2.64 | 0.97 | epiplakin 1 | 18498355 |
| Lung cancer | FGB | DNENVVNEYSSELEK | 3 | 2.57 | 0.97 | Fibrinogen beta chain | 20142248 |
| Cell migration | FLNB | YAPTEVGLHEMHIK | 2 | 2.02 | 0.97 | Isoform 1 of Filamin-B | 20110358 |
| | XRCC5 | YAPTEAQLNAVDALIDSMSLAK | 5 | 3.60 | 0.94 | ATP-dependent DNA helicase 2 subunit 2 | |
| | AP1B1 | LAPPLVTLLSAEPELQYVALR | 2 | 2.81 | 0.99 | Isoform A of AP-1 complex subunit beta-1 | |
| | PLEC | AGTLSITEFADMLSGNAGGFR | 2 | 2.16 | 0.89 | Isoform 1 of Plectin-1 | |
| | SDHAF2 | PAPEIFENEVMALLR | 3 | 2.41 | 0.93 | Protein EMI5 homolog, mitochondrial | |
| | TUBA4A | AFVHWYVGEGMEEGEFSEAR | 2 | 2.40 | 0.98 | Tubulin alpha-4A chain | |
| | | AVFVDLEPTVIDEVR | 2 | 2.23 | 0.98 | | |
| | TYMP | DVTATVDSLPLITASILSK | 3 | 2.84 | 0.93 | Thymidine phosphorylase | |
| | UGP2 | TLDGGLNVIQLETAVGAAIK | 2 | 2.98 | 0.94 | Isoform 1 of UTP--glucose-1-phosphate uridylyltransferase | |
| | TUBB | AILVDLEPGTMDSVR | 2 | 1.97 | 0.95 | Tubulin beta chain | |
| | TPI | VTNGAFTGEISPGMIK | 2 | 2.52 | 0.95 | Triosephosphate isomerase (Fragment) | |
| | Unknown | LFIGGLSFETTEESLR | 2 | 2.64 | 0.97 | Putative uncharacterized protein HNRNPA2B1 | |
| | | SVPTSTVFYPSDGVATEK | 3 | 2.77 | 0.93 | cDNA FLJ54957, highly similar to Transketolase | |
| | | RHVFGESDELIGQK | 2 | 2.09 | 0.96 | | |
| | | VFSNGADLSGVTEEAPLK | 2 | 2.24 | 0.90 | PRO2275 | |

Normal sample

| Related Cancer | Gene Name | Shotgun Sequence | #(normal)ᵇ | XCorr | Q Score | Protein Name | PMIDᶜ |
|---|---|---|---|---|---|---|---|
| HCC | A2M | VSVQLEASPAFLAVPVEK | 2 | 2.36 | 0.93 | Alpha-2-macroglobulin | 18959789 |
| | | LLLQQVSLPELPGEYSMK | 3 | 2.25 | 0.96 | | |
| | ACTA2 | YPIEHGIITNWDDMEK | 3 | 2.42 | 0.96 | Actin, aortic smooth muscle | 21214675 |
| | ALB | RPCFSALEVDETYVPK | 2 | 2.18 | 0.90 | Putative uncharacterized protein ALB | 20658536 |
| | ALDH2 | VAEQTPLTALYVANLIK | 2 | 2.55 | 0.86 | Aldehyde dehydrogenase, mitochondrial | 20186752 |
| | ALDH6A1 | ENTLNQLVGAAFGAAGQR | 2 | 2.46 | 0.89 | Methylmalonate-semialdehyde dehydrogenase [acylating], mitochondrial | 17786358 |
| | | LFIHESIHDEVVNR | 2 | 2.61 | 0.96 | | |
| | | VNAGDQPGADLGPLITPQAK | 2 | 3.27 | 0.98 | | |
| | ALDOB | GILAADESVGTMGNR | 3 | 2.40 | 0.85 | Fructose-bisphosphate aldolase B | 17786358 |
| | | ELSEIAQSIVANGK | 2 | 2.32 | 0.96 | | |
| | ASL | INVLPLGSGAIAGNPLGVDR | 3 | 3.18 | 0.76 | Argininosuccinate lyase | 19138817 |
| | ASS1 | NPWSMDENLMHISYEAGILENPK | 2 | 2.74 | 0.96 | Argininosuccinate synthase | 20104527 |
| | BHMT | ISGQEVNEAACDIAR | 2 | 2.23 | 0.62 | Betaine--homocysteine S-methyltransferase 1 | 19960509 |
| | | AGPWTPEAAVEHPEAVR | 2 | 2.62 | 0.93 | | |
| | C5orf33 | VATQAVEDVLNIAK | 2 | 2.23 | 0.97 | Isoform 2 of UPF0465 protein C5orf33 | 21495032 |
| | CAT | GAGAFGYFEVTHDITK | 2 | 2.17 | 0.78 | Catalase | 21324921 |
| | | FNTANDDNVTQVR | 2 | 2.40 | 0.92 | | |
| | FGG | AIQLTYNPDESSKPNMIDAATLK | 3 | 3.76 | 0.92 | Fibrinogen gamma chain | 17018627 |
| | ETFA | LEVAPISDIIAIK | 5 | 2.73 | 0.89 | Electron transfer flavoprotein alpha-subunit | 20515076 |
| | CPS1 | TVLMNPNIASVQTNEVGLK | 3 | 2.42 | 0.99 | Isoform 1 of Carbamoyl-phosphate synthase [ammonia], mitochondrial | 12143053 |
| | | FLGVAEQLHNEGFK | 3 | 2.67 | 0.97 | | |
| | | AVNTLNEALEFAK | 2 | 2.58 | 0.96 | | |
| | | VLGTSVESIMATEDR | 3 | 2.22 | 0.88 | | |
| | | IEFEGQPVDFVDPNK | 2 | 2.52 | 0.98 | | |
| | | GLNSESMTEETLK | 2 | 2.63 | 0.95 | | |
| | CYP3A7 | EMVPIIAQYGDVLVR | 2 | 2.37 | 0.80 | Cytochrome P450 variant 3A7 | 17978482 |
| | DCI | DADVQNFVSFISK | 3 | 2.20 | 0.99 | Isoform 1 of 3,2-trans-enoyl-CoA isomerase, mitochondrial | 1903293 |
| | ECHS1 | ALNALCDGLIDELNQALK | 2 | 3.33 | 0.98 | Enoyl-CoA hydratase, mitochondrial | 15492826 |
| | EIF5 | AMGPLVLTEVLFNEK | 5 | 2.41 | 0.83 | Eukaryotic translation initiation factor 5 | 19175833 |
| | FBP1 | LDVLSNDLVMNMLK | 7 | 2.34 | 0.72 | Fructose-1,6-bisphosphatase 1 | 19637194 |
| | FH | SGLGELILPENEPGSSIMPGK | 3 | 2.20 | 0.98 | Isoform Mitochondrial of Fumarate hydratase, mitochondrial | 1958270 |
| | | AAAEVNQDYGLDPK | 3 | 2.23 | 0.97 | | |
| | | IYELAAGGTAVGTGLNTR | 2 | 2.24 | 0.97 | | |
| | FLNA | ASGPGLNTTGVPASLPVEFTIDAK | 3 | 2.68 | 0.97 | Isoform 2 of Filamin-A | 21471709 |
| | HPD | SQIQEYVDYNGGAGVQHIALK | 2 | 2.99 | 0.98 | 4-hydroxyphenylpyruvate dioxygenase | 8558370 |
| | HSPA5 | SQIFSTASDNQPTVTIK | 2 | 2.16 | 0.97 | HSPA5 protein | 19445531 |
| | KRT8 | LKLEAELGNMQGLVEDFK | 59 | 2.08 | 0.43 | Keratin, type II cytoskeletal 8 | 18932288 |
| | PBLD | VNTENLLQVENTGK | 2 | 2.33 | 0.94 | Phenazine biosynthesis-like domain-containing protein | 20525558 |
| | PDIA4 | EVSQPDWTPPPEVTLVLTK | 3 | 2.49 | 0.98 | Protein disulfide-isomerase A4 | 19016532 |
| | PEBP1 | GNDISSGTVLSDYVGSGPPK | 6 | 3.51 | 0.96 | Phosphatidylethanolamine-binding protein 1 | 20739083 |
| | PHB | NITYLPAGQSVLLQLPQ | 3 | 2.56 | 0.86 | Prohibitin | 21318481 |
| | PRDX6 | ELAILLGMLDPAEK | 4 | 2.00 | 0.94 | Peroxiredoxin-6 | 19893992 |
| | SELENBP1 | NTGTEAPDYLATVDVDPK | 2 | 2.06 | 0.96 | cDNA FLJ55757, highly similar to Selenium-binding protein 1 | 21338716 |
| | SORBS1 | LTPVQVLEYGEAIAK | 2 | 2.64 | 1.00 | Isoform 9 of Sorbin and SH3 domain-containing protein 1 | 11374898 |
| | SORD | LENYPIPEPGPNEVLLR | 2 | 1.99 | 0.97 | Sorbitol dehydrogenase | 12848999 |
| | STIP1 | ALSVGNIDDALQCYSEAIK | 2 | 2.54 | 0.97 | Stress-induced-phosphoprotein 1 | 17627933 |
| | TPI1 | VAHALAEGLGVIACIGEK | 2 | 3.35 | 0.99 | Isoform 2 of Triosephosphate isomerase | 18813785 |
| | TXNDC5 | ALAPTWEQLALGLEHSETVK | 3 | 4.01 | 0.98 | Thioredoxin domain-containing protein 5 | 16574106 |
| | Vκ3 | EIVLTQSPATLSLSPGER | 2 | 2.97 | 0.97 | Rheumatoid factor D5 light chain (Fragment) | 15207089 |
| | ADH1A | FSLDALITHVLPFEK | 6 | 2.60 | 0.92 | Alcohol dehydrogenase 1A | 16054971 |
| | | ELGATECINPQDYK | 2 | 2.15 | 0.94 | | |
| | ADH4 | ISEAFDLMNQGK | 4 | 2.95 | 0.94 | Isoform 2 of Alcohol dehydrogenase 4 | 16054971 |
| | | GGVDFALDCAGGSETMK | 3 | 3.25 | 0.96 | | |
| | | FNLDALVTHTLPFDK | 8 | 2.76 | 0.95 | | |
| | | AAIAWEAGKPLCIEEVEVAPPK | 3 | 2.71 | 0.99 | | |
| | | DLHKPIQEVIIELTK | 5 | 3.08 | 0.99 | | |
| Prostate cancer | COL6A2 | YGGLHFSDQVEVFSPPGSDR | 2 | 2.33 | 0.86 | Isoform 2C2A' of Collagen alpha-2(VI) chain | 18353764 |
| | | LLTPITTLTSEQIQK | 3 | 2.57 | 0.93 | | |
| | | VAVVTYNNEVTTEIR | 5 | 2.38 | 0.67 | | |
| | | IEDGVPQHLVLVLGGK | 2 | 2.01 | 0.86 | | |
| | RPS27A | TITLEVEPSDTIENVK | 2 | 2.23 | 0.98 | ubiquitin and ribosomal protein S27a precursor | 15647830 |
| Breast cancer | EMILIN1 | LVGSGLHTVEAAGEAR | 2 | 2.47 | 0.96 | EMILIN-1 | 16243817 |
| | MYH9 | NLPIYSEEIVEMYK | 2 | 2.06 | 0.97 | Isoform 1 of Myosin-9 | 18796164 |
| | | QLLQANPILEAFGNAK | 3 | 2.80 | 0.86 | Isoform 1 of Myosin-9 | |
| | | IAEFTTNLTEEEEK | 13 | 2.29 | 0.65 | Isoform 1 of Myosin-9 | |
| Colon cancer | ALDH1A1 | GYFVQPTVFSNVTDEMR | 3 | 3.18 | 0.97 | Retinal dehydrogenase 1 | 21435460 |
| | ATP5B | TVLIMELINNVAK | 5 | 3.18 | 0.88 | ATP synthase subunit beta, mitochondrial | 20080835 |
| | ETFA | AAVDAGFVPNDMQVGQTGK | 2 | 2.13 | 0.98 | Electron transfer flavoprotein subunit alpha, mitochondrial | 16708797 |
| | | GTSFDAAATSGGSASSEK | 6 | 2.53 | 0.86 | | |
| | ANXA6 | GLGTDEDTIIDIITHR | 2 | 2.48 | 0.98 | annexin VI isoform 2 | 21137014 |
| Leukemia | GLUD1 | HGGTIPIVPTAEFQDR | 2 | 2.48 | 0.98 | Glutamate dehydrogenase 1, mitochondrial | 19683518 |
| | IDH2 | LNEHFLNTTDFLDTIK | 3 | 2.77 | 0.98 | Isocitrate dehydrogenase [NADP], mitochondrial | 21205756 |
| Gastric carcinoma | HIST4H4 | TVTAMDVVYALK | 2 | 2.03 | 0.96 | Histone H4 | 19139817 |
| Colorectal cancer | RRBP1 | TLQEQLENGPNTQLAR | 2 | 2.74 | 0.88 | Isoform 3 of Ribosome-binding protein 1 | 19425502 |
| Pancreatic cancer | ARG1 | TGLLSGLDIMEVNPSLGK | 4 | 2.71 | 0.91 | Isoform 1 of Arginase-1 | 21347333 |
| | CALM1 | VFDKDGNGYISAAELR | 3 | 2.50 | 0.93 | Calmodulin | 18852131 |
| | | EAFSLFDKDGDGTITTK | 2 | 2.62 | 0.98 | | |
| Ovarian cancer | HAAO | TQGSVALSVTQDPACK | 2 | 2.56 | 0.91 | Isoform 1 of 3-hydroxyanthranilate 3,4-dioxygenase | 19724865 |
| Lung cancer | ACY1 | TVQPKPDYGAAVAFFEETAR | 2 | 2.50 | 0.99 | cDNA FLJ60317, highly similar to Aminoacylase-1 | 8394326 |
| Cell migration | FLNB | LVSPGSANETSSILVESVTR | 2 | 3.21 | 0.99 | Isoform 1 of Filamin-B | 19915675 |
| | UGP2 | ILTTASSHEFEHTK | 2 | 3.30 | 0.93 | Isoform 1 of UTP--glucose-1-phosphate uridylyltransferase | |
| | | IQRPPEDSIQPYEK | 4 | 2.38 | 0.95 | | |
| | ALDH4A1 | EEIFGPVLSVYVYPDDKYK | 3 | 3.34 | 0.95 | Delta-1-pyrroline-5-carboxylate dehydrogenase, mitochondrial | |
| | COL14A1 | HFLENLVTAFDVGSEK | 3 | 2.39 | 0.77 | Isoform 1 of Collagen alpha-1(XIV) chain | |
| | DCTN2 | LLGPDAAINLTDPDGALAK | 2 | 2.24 | 0.94 | dynactin 2 | |
| | EEF1B2 | SPAGLQVLNDYLADK | 3 | 2.86 | 0.84 | Elongation factor 1-beta | |
| | GRHPR | IAAAGLDVTSPEPLPTNHPLLTLK | 3 | 3.11 | 0.99 | Glyoxylate reductase/hydroxypyruvate reductase | |
| | HSD17B10 | VMTIAPGLFGTPLLTSLPEK | 3 | 2.80 | 0.91 | Isoform 1 of 3-hydroxyacyl-CoA dehydrogenase type-2 | |
| | PCBD1 | VHITLSTHECAGLSER | 2 | 2.54 | 0.96 | Pterin-4-alpha-carbinolamine dehydratase | |
| | PDIA6 | GSTAPVGGGAFPTIVER | 3 | 2.05 | 0.87 | Isoform 2 of Protein disulfide-isomerase A6 | |
| | PTGR1 | HFVGYPTNSDFELK | 2 | 2.24 | 0.93 | Prostaglandin reductase 1 | |
| | | TGPLPPGPPPEIVIYQELR | 7 | 2.56 | 0.96 | | |
| | TF | SAGWNIPIGLLYCDLPEPR | 3 | 2.65 | 0.97 | Serotransferrin | |
| | | EDPQTFYYAVAVVK | 4 | 2.49 | 0.92 | | |
| | Unknown | PAHVVVGDVLQAADVDK | 2 | 2.88 | 0.96 | 22 kDa protein | |

HCC and normal samples

| Related Cancer | Gene Name | Shotgun Sequence | #(HCC)ᵃ/#(normal)ᵇ | XCorr | Q Score | Protein Name | PMIDᶜ |
|---|---|---|---|---|---|---|---|
| HCC | CPS1 | MEYDGILIAGGPGNPALAEPLIQNVR | 2/11 | 3.92 | 0.91 | carbamoyl-phosphate synthetase 1 | 12143053 |
| | | SIFSAVLDELK | 1/8 | 3.87 | 0.92 | | |
| | | IAPSFAVESIEDALK | 3/13 | 2.96 | 0.85 | | |
| | | TAVDSGIPLLTNFQVTK | 1/10 | 2.50 | 0.45 | | |
| | HBA1 | VADALTNAVAHVDDMPNALSALSDLHAHK | 1/8 | 3.67 | 0.93 | Hemoglobin subunit alpha 1 | 20572306 |
| | | VGAHAGEYGAEALER | 4/13 | 2.05 | 0.94 | | |
| | P4HB | ILFIFIDSDHTDNQR | 10/15 | 2.88 | 0.49 | Protein disulfide-isomerase | 21207424 |
| | HNRNPC | MIAGQVLDINLAAEPK | 21/9 | 2.31 | 0.46 | Heterogeneous nuclear ribonucleoprotein C (C1/C2), isoform CRA_b | 20572306 |
| | PGK1 | VSHVSTGGGASLELLEGK | 16/8 | 3.36 | 0.46 | Phosphoglycerate kinase 1 | 19200351 |
| | ACTB | DLYANTVLSGGTTMYPGIADR | 10/3 | 3.25 | 0.96 | Actin, cytoplasmic 1 | 16493704 |
| | GSTA1 | NDGYLMFQQVPMVEIDGMK | 2/6 | 2.24 | 0.83 | Glutathione S-transferase | 20604928 |
| | FABP1 | SVTELNGDIITNTMTLGDIVFK | 17/6 | 3.43 | 0.78 | Fatty acid-binding protein | 12245374 |
| | CES1 | EGYLQIGANTQAAQK | 13/1 | 2.21 | 0.76 | Isoform 1 of Liver carboxylesterase 1 | 19658107 |
| Breast cancer | LGALS7/LGALS7B | LVEVGGDVQLDSVR | 19/1 | 2.35 | 0.65 | Galectin-7/p53-induced gene 1 protein | 20382700 |
| | HBB | FFESFGDLSTPDAVMGNPK | 39/67 | 2.72 | 0.74 | Hemoglobin subunit beta | 20097481 |
| | MDH2 | VDFPQDQLTALTGR | 4/7 | 2.49 | 0.93 | Malate dehydrogenase 2 | 19485423 |
| | MYH9 | LQQELDDLLVDLDHQR | 9/15 | 2.54 | 0.56 | Myosin, heavy polypeptide 9, non-muscle, isoform CRA_a | 18796164 |
| Ovarian cancer | PSMA2 | YNEDLELEDAIHTAILTLK | 3/5 | 4.48 | 0.84 | Proteasome subunit alpha type-2 | 14960231 |
| Lung cancer | AKR1A1 | DPDEPVLLEEPVVLALAEK | 3/5 | 3.16 | 0.63 | Aldo-keto reductase family 1 | 17114299 |
| Chromophobe renal cell carcinomas | ATP5H | NLIPFDQMTIEDLNEAFPETK | 3/5 | 2.48 | 0.95 | ATP synthase subunit d, mitochondrial | 20440404 |
| Leukemia | IGKC | VDNALQSGNSQESVTEQDSK | 3/6 | 3.95 | 0.92 | Ig kappa chain C region | 12357370 |
| | RPS7 | TLTAVHDAILEDLVFPSEIVGK | 5/3 | 3.92 | 0.92 | 40S ribosomal protein S7 | |

ᵃ Number of spectral sets in HCC samples.
ᵇ Number of spectral sets in normal samples.
ᶜ PubMed identifier (PMID) of the supporting MEDLINE reference.
Table 4 lists the DEPs in the HCC samples, in the normal samples, and in both. In the HCC and normal samples, 57 and 115 reference spectra, respectively, were identified by SEQUEST; among these, 29 and 59 peptides, respectively, were known biomarkers of human liver cancer. For spectra observed in both sample types, we performed a beta-binomial test to detect DEPs. Only 84 of the 1,571 reference spectra showed heterogeneity of spectral counts between HCC and normal tissue samples, and only 22 of these 84 were identified by SEQUEST.
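The paragraph names a beta-binomial test without giving its form. As a rough, hypothetical illustration (not necessarily the test used by the authors), the sketch below implements a simplified likelihood-ratio variant. In each replicate, the count k of spectra in one spectral set, out of n spectra in total, is modelled as beta-binomial with mean proportion μ and a fixed overdispersion θ = 1/(α + β); θ = 0.001 is an arbitrary assumption. μ is fitted by grid search separately for the HCC and normal groups and jointly under the null hypothesis, and the likelihood-ratio statistic is referred to a χ² distribution with 1 degree of freedom. All function names and counts are invented for the example.

```python
import math


def log_beta(a, b):
    """log B(a, b) computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, mu, theta):
    """Log-probability of k out of n under a beta-binomial with mean mu
    and overdispersion theta = 1 / (alpha + beta)."""
    alpha = mu / theta
    beta = (1.0 - mu) / theta
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def group_loglik(counts, totals, mu, theta):
    """Sum of log-likelihoods over the replicates of one sample group."""
    return sum(betabinom_logpmf(k, n, mu, theta) for k, n in zip(counts, totals))


def betabinomial_lrt(hcc_counts, hcc_totals, normal_counts, normal_totals,
                     theta=0.001, grid_size=3000):
    """LR test of H0: mu_HCC == mu_normal with theta held fixed.

    Returns (LR statistic, p-value from chi-square with 1 df).
    """
    # Log-spaced candidate proportions in [1e-6, 1): spectral counts are
    # usually a tiny fraction of all spectra in a run.
    grid = [10 ** (-6 + 6 * i / grid_size) for i in range(grid_size)]
    best_hcc = max(group_loglik(hcc_counts, hcc_totals, mu, theta) for mu in grid)
    best_normal = max(group_loglik(normal_counts, normal_totals, mu, theta) for mu in grid)
    best_joint = max(group_loglik(hcc_counts, hcc_totals, mu, theta)
                     + group_loglik(normal_counts, normal_totals, mu, theta)
                     for mu in grid)
    stat = max(2.0 * (best_hcc + best_normal - best_joint), 0.0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Invented counts for one spectral set in three HCC and three normal
# replicates, each replicate holding 1,000 spectra in total.
stat, p = betabinomial_lrt([20, 18, 22], [1000] * 3, [2, 3, 1], [1000] * 3)
print(f"LR = {stat:.2f}, p = {p:.2e}")
```

Fixing θ keeps the sketch short; a fuller treatment would estimate the overdispersion from the replicates, which widens the null distribution when replicate counts vary a lot.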
Figure 4. Scatter plot of spectral counts between normal and HCC samples. This figure plots the number of spectra in clustered sets in the HCC and normal samples. The x axis and y axis represent the numbers of spectra observed in the HCC and normal samples, respectively. Grey triangles indicate DEPs identified by SEQUEST, whereas black circles indicate unidentified DEPs.
Validation of clustering results using the false clustering rate (FCR)

| Δ | FCR using RT information | ρ (cut-off value) | FCR for the cut-off value |
|---|---|---|---|
| 1 | 0.0288 | 0.0 | 1.0000 |
| 2 | 0.0307 | 0.1 | 0.9486 |
| 3 | 0.0380 | 0.2 | 0.8060 |
| 4 | 0.0467 | 0.3 | 0.6525 |
| | | 0.4 | 0.4515 |
| 5 | 0.0553 | 0.5 | 0.3178 |
| 6 | 0.0639 | | |
| 7 | 0.0719 | 0.7 | 0.0034 |
| 8 | 0.0806 | 0.8 | 0.0008 |
| 9 | 0.0895 | 0.9 | 0.0003 |
| 10 | 0.0981 | 1.0 | 0.0000 |

To validate the clustering results, we propose a new measure, the false clustering rate (FCR), which estimates the clustering error rate using spectral retention time (RT) information. We computed the FCR for various values of the threshold Δ, as summarized in the table above. We also used the FCR to determine the cut-off value of the correlation coefficient ρ for spectral clustering, computing it for various values of ρ. We chose ρ = 0.6, which yielded an FCR close to 0.05.
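The table reports the FCR as a function of ρ but not how spectra are grouped once a cut-off is fixed. One simple way to turn a pairwise correlation cut-off into spectral sets is single-linkage grouping with a union-find structure: any two spectra whose correlation is at least ρ end up in the same set. The sketch below assumes a precomputed, symmetric correlation matrix with hypothetical values, and single linkage; the linkage rule Q-FISH actually uses is not stated here, and `cluster_spectra` is a hypothetical name.

```python
def cluster_spectra(corr, rho=0.6):
    """Group spectra whose pairwise correlation is >= rho (single linkage)."""
    n = len(corr)
    parent = list(range(n))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once and merge the sets of correlated spectra.
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] >= rho:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical symmetric correlation matrix for five spectra.
corr = [
    [1.00, 0.92, 0.45, 0.10, 0.05],
    [0.92, 1.00, 0.71, 0.12, 0.08],
    [0.45, 0.71, 1.00, 0.20, 0.15],
    [0.10, 0.12, 0.20, 1.00, 0.88],
    [0.05, 0.08, 0.15, 0.88, 1.00],
]
print(cluster_spectra(corr, rho=0.6))  # [[0, 1, 2], [3, 4]]
print(cluster_spectra(corr, rho=0.8))  # [[0, 1], [2], [3, 4]]
```

At ρ = 0.6, spectra 0 and 2 share a set even though their own correlation is only 0.45, because both correlate with spectrum 1. This chaining is the usual trade-off of single linkage; raising ρ splits such chains, which is consistent with the lower FCR values at higher cut-offs in the table.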