Hui Zhang, Paul Loriaux, Jimmy Eng, David Campbell, Andrew Keller, Pat Moss, Richard Bonneau, Ning Zhang, Yong Zhou, Bernd Wollscheid, Kelly Cooke, Eugene C Yi, Hookeun Lee, Elaine R Peskind, Jing Zhang, Richard D Smith, Ruedi Aebersold.
Abstract
There has been considerable recent interest in proteomic analyses of plasma for the purpose of discovering biomarkers. Profiling N-linked glycopeptides is a particularly promising method because the population of N-linked glycosites represents the proteomes of plasma, the cell surface, and secreted proteins at very low redundancy and provides a compelling link between the tissue and plasma proteomes. Here, we describe UniPep (http://www.unipep.org), a database of human N-linked glycosites, as a resource for biomarker discovery.
Year: 2006 PMID: 16901351 PMCID: PMC1779586 DOI: 10.1186/gb-2006-7-8-R73
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Representative output of N-linked glycosites from the database using UniPep. UniPep contains all proteins in the International Protein Index (IPI) database (version 2.28) that have at least one N-linked glycosite, and allows users to view all predicted and identified N-linked glycosites for a specific protein. For each potential N-linked glycoprotein, a user can see the protein annotation, the predicted subcellular location, and the sequence(s) of the predicted N-linked glycosite(s). The uniqueness of a peptide is presented as its number of hits in the database, and for peptides present in multiple proteins, links to the other proteins in the database are provided. If a predicted N-linked glycosite was identified in the dataset from this study, it is listed as an identified peptide together with its PeptideProphet score [39], so that researchers can evaluate the confidence of the identification. The sequence of the queried protein is overlaid with sequence features such as the N-linked glycosites, the predicted and identified peptide sequences, the signal peptide, and transmembrane segment(s) [21].
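Reporting a peptide's uniqueness as "number of hits in the database", with links to every protein that shares it, amounts to an inverted index from peptide sequence to protein IDs. The sketch below is a minimal in-memory illustration under that assumption; the function names, protein IDs, and sequences are hypothetical and are not UniPep's actual implementation.

```python
from collections import defaultdict


def build_peptide_index(peptides_by_protein):
    """Map each peptide sequence to the set of protein IDs that contain it."""
    index = defaultdict(set)
    for protein_id, peptides in peptides_by_protein.items():
        for peptide in peptides:
            index[peptide].add(protein_id)
    return index


def uniqueness_report(index, peptide):
    """Return (number of database hits, sorted IDs of proteins containing the peptide)."""
    # .get() does not trigger the defaultdict factory, so lookups never add keys.
    proteins = sorted(index.get(peptide, ()))
    return len(proteins), proteins


# Toy input with hypothetical protein IDs and peptide sequences.
index = build_peptide_index({
    "PROT_A": ["NVSGK", "LNETR"],
    "PROT_B": ["LNETR"],
})
print(uniqueness_report(index, "LNETR"))  # (2, ['PROT_A', 'PROT_B'])
```

A peptide with one hit is unique to its protein; any count above one is what triggers the cross-links to other entries.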
Distribution of unique tryptic peptides and tryptic peptides containing the N-X-T/S motif over subcellular classes of proteins in the human protein (IPI) database
| Subcellular class | Tryptic peptides^a: number of peptides (%^b) | Tryptic peptides: number of proteins (%^c) | Peptides containing N-X-T/S: number of peptides^a (%^b) | Peptides containing N-X-T/S: number of proteins (%^c) |
| Intracellular | 510,685 (68.2%) | 26,721 (66.6%) | 32,770 (4.4%) | 17,475 (43.6%) |
| Secreted | 80,069 (10.7%) | 3,772 (9.4%) | 7,195 (1.0%) | 2,772 (6.9%) |
| Transmembrane | 114,282 (15.3%) | 6,375 (15.9%) | 10,359 (1.4%) | 4,645 (11.6%) |
| Cell surface | 70,126 (9.4%) | 3,242 (8.1%) | 5,138 (0.7%) | 2,166 (5.4%) |
| All extracellular | 264,477 (35.3%) | 13,389 (33.4%) | 22,692 (3.0%) | 9,583 (23.9%) |
| All proteins | 749,163 (100%) | 40,110 (100%) | 52,442 (7.0%) | 27,058 (67.5%) |
The human International Protein Index (IPI) database (version 2.28) contains a total of 40,110 protein entries.
^a Tryptic peptides are defined as peptide sequences that end with Arg or Lys, are not followed by proline, and fall within the mass range of 500 to 5000 Da.
^b Percentage of all tryptic peptides in the human database (749,163).
^c Percentage of all proteins in the human database (40,110).
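The footnote's definition of a tryptic peptide, combined with the N-X-T/S motif from the table title, can be sketched as an in silico digest. This is a minimal illustration, not the procedure used to build the table: it assumes no missed cleavages, monoisotopic masses of unmodified peptides (the source does not say which mass type was used), and the conventional sequon rule that X is any residue except proline. The function names are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (assumption: mass type not stated in the source).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

# N, then any residue except proline, then S or T (X != P is an assumed convention).
SEQUON = re.compile(r"N[^P][ST]")


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P; no missed cleavages.

    The C-terminal remainder is dropped because, per the footnote, a tryptic
    peptide must end with Arg or Lys.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidate_peptides(protein, min_mass=500.0, max_mass=5000.0):
    """Tryptic peptides inside the mass window, flagged by whether they carry a sequon."""
    return [
        (pep, bool(SEQUON.search(pep)))
        for pep in tryptic_digest(protein)
        if min_mass <= peptide_mass(pep) <= max_mass
    ]


print(candidate_peptides("AAKPAARGGNESKLL"))  # [('AAKPAAR', False), ('GGNESK', True)]
```

In the toy sequence the K followed by P is not cleaved, and the trailing "LL" is discarded because it does not end in K or R.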
Figure 2. Consistent analysis pipeline. Schematic of the consistent analysis pipeline used to identify high-quality N-linked glycosites by glycopeptide capture and LC-MS/MS. LC, liquid chromatography; MS/MS, tandem mass spectrometry.
False-positive and false-negative rates of peptide identifications in liver tissue predicted by PeptideProphet at different probability thresholds
| Probability score cutoff | False-negative rate | False-positive rate |
| 0.99 | 0.6042 | 0.0025 |
| 0.95 | 0.4037 | 0.0099 |
| 0.90 | 0.3297 | 0.0172 |
| 0.80 | 0.2621 | 0.0304 |
| 0.70 | 0.2252 | 0.0437 |
| 0.60 | 0.1964 | 0.0593 |
| 0.50 | 0.1713 | 0.0787 |
| 0.40 | 0.1440 | 0.1091 |
| 0.30 | 0.1262 | 0.1364 |
| 0.20 | 0.1041 | 0.1877 |
| 0.10 | 0.0724 | 0.3010 |
| 0.00 | 0.0000 | 0.9295 |
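Because PeptideProphet assigns each spectrum a probability p of being a correct identification, error rates at a cutoff can be estimated from the probabilities alone. The sketch below assumes the usual model-based estimates: the false-positive rate is the expected fraction of accepted identifications that are incorrect, Σ(1 − p)/N over accepted spectra, and the false-negative rate is 1 − Σp(accepted)/Σp(all). Whether the table was computed exactly this way is an assumption. The helper `loosest_cutoff` (a hypothetical name) then picks a threshold from the liver values tabulated above.

```python
# (probability cutoff, false-negative rate, false-positive rate) from the liver table.
LIVER_RATES = [
    (0.99, 0.6042, 0.0025), (0.95, 0.4037, 0.0099), (0.90, 0.3297, 0.0172),
    (0.80, 0.2621, 0.0304), (0.70, 0.2252, 0.0437), (0.60, 0.1964, 0.0593),
    (0.50, 0.1713, 0.0787), (0.40, 0.1440, 0.1091), (0.30, 0.1262, 0.1364),
    (0.20, 0.1041, 0.1877), (0.10, 0.0724, 0.3010), (0.00, 0.0000, 0.9295),
]


def estimated_error_rates(probs, cutoff):
    """Model-based (false-positive rate, false-negative rate) for p >= cutoff."""
    accepted = [p for p in probs if p >= cutoff]
    if not accepted:
        return 0.0, 1.0  # nothing accepted: no false positives, everything missed
    fpr = sum(1.0 - p for p in accepted) / len(accepted)
    expected_correct = sum(probs)  # expected number of correct IDs in the whole set
    fnr = 1.0 - sum(accepted) / expected_correct if expected_correct else 0.0
    return fpr, fnr


def loosest_cutoff(rates, max_fpr):
    """Lowest tabulated cutoff whose false-positive rate is at most max_fpr."""
    allowed = [row for row in rates if row[2] <= max_fpr]
    return min(allowed, key=lambda row: row[0]) if allowed else None


print(loosest_cutoff(LIVER_RATES, 0.01))  # (0.95, 0.4037, 0.0099)
```

Holding the false-positive rate near 1% in liver therefore costs a false-negative rate of about 40%, which is the trade-off the table makes explicit.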
Number of unique N-linked glycosites and percentage of sites from intracellular or extracellular proteins using different peptide probability thresholds
| Protein class | Probability ≥0.5 | ≥0.8 | ≥0.9 | ≥0.95 | ≥0.99 | Database (predicted) |
| All proteins | 5,202 | 2,870 | 2,265 | 1,895 | 1,522 | 52,442 |
| Intracellular | 2,207 | 817 | 363 | 264 | 124 | 32,770 |
| Secreted | 1,326 | 1,086 | 1,011 | 946 | 834 | 7,195 |
| Transmembrane | 976 | 523 | 408 | 337 | 263 | 10,359 |
| Cell surface | 633 | 444 | 383 | 348 | 301 | 5,138 |
| All extracellular | 2,935 | 2,053 | 1,802 | 1,631 | 1,398 | 22,692 |
Figure 3. Ratio of N-linked glycosites identified from proteins predicted to be intracellular to those identified from proteins predicted to be extracellular. Extracellular proteins include secreted, cell surface, and transmembrane proteins. The ratio is shown as a function of probability stringency.
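The ratio plotted in Figure 3 can be recomputed from the glycosite-by-class table. This sketch assumes the ratio is intracellular over all extracellular, and identifies those two rows by their database counts (32,770 and 22,692), which match the intracellular and all-extracellular N-X-T/S totals in the IPI table. The names are illustrative only.

```python
# Counts from the glycosite-by-class table; the last column is the database background.
LABELS = [">=0.5", ">=0.8", ">=0.9", ">=0.95", ">=0.99", "Database"]
INTRACELLULAR = [2207, 817, 363, 264, 124, 32770]
ALL_EXTRACELLULAR = [2935, 2053, 1802, 1631, 1398, 22692]


def intra_to_extra_ratios(intra, extra):
    """Ratio of intracellular to extracellular glycosite counts, column by column."""
    return [i / e for i, e in zip(intra, extra)]


for label, ratio in zip(LABELS, intra_to_extra_ratios(INTRACELLULAR, ALL_EXTRACELLULAR)):
    print(f"{label:>8}: {ratio:.3f}")
```

The ratio falls from about 0.75 at ≥0.5 to about 0.09 at ≥0.99, compared with about 1.44 for all predicted sites in the database. Stricter thresholds therefore enrich for sites on extracellular proteins.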
Summary of N-linked glycosites identified from different sample sources with a probability score of at least 0.99
| Sample source | Number of unique glycosites | Number of source-specific glycosites | Number of spectra used for identification |
| All | 1,522 | | 173,841 |
| Plasma | 828 | 433 | 156,814 |
| Bladder | 145 | 3 | 1,121 |
| Breast cancer cells | 369 | 135 | 2,725 |
| Liver | 202 | 13 | 964 |
| Lymphocytes | 288 | 156 | 2,847 |
| Prostate cancer cells | 71 | 4 | 108 |
| Prostate tissue | 354 | 53 | 3,804 |
| Cerebrospinal fluid | 407 | 113 | 5,453 |
Figure 4. Comparison of the number of N-linked glycosites detected in both plasma and tissues/cells or in only one of them. The figure shows the overlap between N-linked glycosites identified in plasma and those identified in tissues or cells.
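Source-specific counts and the plasma overlap both reduce to set operations over per-source glycosite sets. The sketch below assumes "source-specific" means observed in no other sample source; the glycosite IDs are hypothetical toy data. It pools every non-plasma source for simplicity, so reproducing the tissue/cell comparison of Figure 4 exactly would mean excluding cerebrospinal fluid from the pool.

```python
def others_union(sites_by_source, exclude):
    """Union of glycosite sets over every source except `exclude`."""
    return set().union(*(s for name, s in sites_by_source.items() if name != exclude))


def source_specific_counts(sites_by_source):
    """Count, per source, the glycosites not observed in any other source."""
    return {name: len(sites - others_union(sites_by_source, name))
            for name, sites in sites_by_source.items()}


def plasma_overlap(sites_by_source, plasma="Plasma"):
    """Split glycosites into plasma-only, shared, and other-sources-only counts."""
    plasma_sites = sites_by_source[plasma]
    other_sites = others_union(sites_by_source, plasma)
    return {
        "plasma_only": len(plasma_sites - other_sites),
        "shared": len(plasma_sites & other_sites),
        "other_only": len(other_sites - plasma_sites),
    }


# Toy data with hypothetical glycosite IDs.
toy = {
    "Plasma": {"s1", "s2", "s3"},
    "Liver": {"s2", "s4"},
    "Bladder": {"s4", "s5"},
}
print(source_specific_counts(toy))  # {'Plasma': 2, 'Liver': 0, 'Bladder': 1}
print(plasma_overlap(toy))  # {'plasma_only': 2, 'shared': 1, 'other_only': 2}
```

Under the same definition, the summary table implies that 828 − 433 = 395 plasma glycosites were also observed in at least one other sample source.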