Frank Desiere, Eric W Deutsch, Alexey I Nesvizhskii, Parag Mallick, Nichole L King, Jimmy K Eng, Alan Aderem, Rose Boyle, Erich Brunner, Samuel Donohoe, Nelson Fausto, Ernst Hafen, Lee Hood, Michael G Katze, Kathleen A Kennedy, Floyd Kregenow, Hookeun Lee, Biaoyang Lin, Dan Martin, Jeffrey A Ranish, David J Rawlings, Lawrence E Samelson, Yuzuru Shiio, Julian D Watts, Bernd Wollscheid, Michael E Wright, Wei Yan, Lihong Yang, Eugene C Yi, Hui Zhang, Ruedi Aebersold.
Abstract
A crucial aim upon the completion of the human genome is the verification and functional annotation of all predicted genes and their protein products. Here we describe the mapping of peptides derived from accurate interpretations of protein tandem mass spectrometry (MS) data to eukaryotic genomes and the generation of an expandable resource for integration of data from many diverse proteomics experiments. Furthermore, we demonstrate that peptide identifications obtained from high-throughput proteomics can be integrated on a large scale with the human genome. This resource could serve as an expandable repository for MS-derived proteome information.
Year: 2004 PMID: 15642101 PMCID: PMC549070 DOI: 10.1186/gb-2004-6-1-r9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Analysis pipeline for the annotation of the human genome with high-quality peptide sequences derived from high-throughput MS analysis of biological samples.
Figure 2. Visualization of PeptideAtlas peptide entries in the Ensembl DAS browser. The entries appear as light blue rectangles in a separate track at the top, labeled PeptideAtlas. The Ensembl genome browser, here showing 10 kilobases (kb) on chromosome 12, can be used to zoom into the genome down to the nucleotide level. A light blue line connects peptides that map across intron/exon boundaries. Details about the peptide, including its unique identifier, peptide sequence, best PeptideProphet probability [22] (marked SCORE) and PeptideAtlas hyperlink, are displayed.
Summary of PeptideAtlas results
| | Human | Drosophila |
| Ensembl version | 22.34d.1, 2004-06-02 | 19.3a.2, 2003-07-01 |
| Ensembl gene predictions | 23758 | 13525 from Release 3.1 FlyBase |
| Ensembl gene transcripts | 34091 | 18289 |
| PeptideAtlas version | FullHumanEns22APD0704P0.9 | Fly 2 |
| PeptideAtlas peptides | 26840 | 4406 |
| Number of experiments | 52 | 3 |
| PeptideProphet probability threshold | 0.9 | 0.9 |
| PeptideAtlas mapped peptides | 25754 | 4406 |
| PeptideAtlas mapped proteins | 9747 | 3107 |
| PeptideAtlas mapped genes | 6423 | 1876 |
| Percentage of the genome | 27 % | 14 % |
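The "Percentage of the genome" row appears to follow from two other rows of the table. The sketch below assumes it is the number of PeptideAtlas mapped genes divided by the number of Ensembl gene predictions; the table does not state the formula, so this is an inference that happens to reproduce the rounded values. The dictionary keys and the function name are illustrative only, and the second column is taken to be the fly data because of its FlyBase note.

```python
# Counts copied from the summary table; the formula is an assumption
# (mapped genes / Ensembl gene predictions), not stated in the source.
summary = {
    "human": {"mapped_genes": 6423, "gene_predictions": 23758},
    "fly": {"mapped_genes": 1876, "gene_predictions": 13525},
}


def gene_coverage_percent(mapped_genes, gene_predictions):
    """Return the share of predicted genes hit by at least one peptide."""
    return 100.0 * mapped_genes / gene_predictions


coverage = {
    organism: gene_coverage_percent(**counts)
    for organism, counts in summary.items()
}

for organism, percent in coverage.items():
    print(f"{organism}: {percent:.1f}% of predicted genes have a mapped peptide")
```

Rounded to whole percentages, these ratios agree with the 27 % and 14 % given in the table.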
Figure 3. Cumulative number of distinct peptides as a function of the number of good spectra added (identified with P ≥ 0.9). The curve is expected to saturate eventually, once most observable peptides have been cataloged. At present, however, there is no evidence of saturation: around 100 new peptides are still cataloged per 1,000 identified spectra added.
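A curve of this kind can be sketched as a running count of distinct peptide sequences over the ordered list of identified spectra. The code below is a minimal illustration, not the authors' pipeline; the function names, the `window` parameter, and the toy peptide sequences are all hypothetical.

```python
def cumulative_distinct(peptides_in_order):
    """Running count of distinct peptides after each identified spectrum."""
    seen = set()
    counts = []
    for peptide in peptides_in_order:
        seen.add(peptide)
        counts.append(len(seen))
    return counts


def discovery_rate(counts, window=1000):
    """Number of new distinct peptides among the most recent `window` spectra."""
    if not counts:
        return 0
    if len(counts) <= window:
        return counts[-1]
    return counts[-1] - counts[-1 - window]


# Toy input: one peptide sequence per identified spectrum (made-up values).
spectra = ["AAK", "LLR", "AAK", "GGK", "LLR", "VVR"]
counts = cumulative_distinct(spectra)
recent_new = discovery_rate(counts, window=3)
print(counts, recent_new)
```

Under this sketch, saturation would show up as `discovery_rate` approaching zero; a value near 100 with `window=1000` corresponds to the rate quoted in the caption.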
Figure 4. Distribution of PeptideAtlas peptides on the human genome. Each chromosome is described by three columns. The left-most column shows a chromosome's standard banding. The right-most column presents a histogram of the mapping of peptides to chromosomal regions; a line's length represents the number of peptides mapped to a chromosomal region. The central column indicates the over/under representation of peptides in a given region. Green regions represent more mapped peptides than expected at uniform random; red regions indicate fewer mapped peptides than expected at uniform random.
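The green/red coloring compares observed peptide counts with what a uniform random placement would give. The caption does not say how the expectation was computed, so the sketch below assumes the simplest null model: expected counts proportional to region length. The region names, lengths, counts, and the "neutral" label for exact ties are hypothetical.

```python
def classify_regions(observed, lengths):
    """Label each region by comparing observed and length-proportional counts.

    observed: mapping of region name to number of mapped peptides
    lengths:  mapping of region name to region length (any consistent unit)
    """
    total_peptides = sum(observed.values())
    total_length = sum(lengths.values())
    labels = {}
    for region, count in observed.items():
        # Assumed null model: peptides land uniformly along the genome.
        expected = total_peptides * lengths[region] / total_length
        if count > expected:
            label = "green"    # more peptides than expected
        elif count < expected:
            label = "red"      # fewer peptides than expected
        else:
            label = "neutral"  # exactly as expected
        labels[region] = (expected, label)
    return labels


# Toy regions of equal length, so each expects 60 / 3 = 20 peptides.
observed = {"region_a": 30, "region_b": 10, "region_c": 20}
lengths = {"region_a": 100, "region_b": 100, "region_c": 100}
labels = classify_regions(observed, lengths)
print(labels)
```

A gene-density-aware null model would give different expectations; the length-proportional choice here is only the most literal reading of "uniform random".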
Figure 5. View of the DNA-dependent protein kinase catalytic subunit PRKDC gene (ENSG00000121031), which is matched by 90 distinct peptides in PeptideAtlas.
Figure 6. Example of peptides confirming a case of alternative splicing of the lamin A/C gene (LMNA). PAp00038023 was identified as part of protein ENSP00000310687 from the SiHa human cell line experiment. PAp00042742 was identified as part of protein ENSP00000292304 from a human B-cell experiment.
Comparison of different probability thresholds that were applied to the MS results
| Probability | ≥ 0.70 | ≥ 0.90 | ≥ 0.95 | ≥ 0.99 |
| Total number of passing spectra | 245724 | 224793 | 211674 | 179410 |
| Peptides | 31290 | 26840 | 25022 | 21598 |
| Distinct peptides with protein sequence matches | 29393 | 25754 | 24172 | 21030 |
| Number of mapped proteins | 11612 | 9747 | 9016 | 8134 |
| Number of simple reduced proteins | 7097 | 5826 | 5383 | 4845 |
| False-positive estimate MS/MS spectra | 2.4% | 0.9% | 0.05% | 0.01% |
| False-positive estimate with protein sequence matches | <16% | <6% | <3% | <0.8% |
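The first two rows of this comparison can be reproduced in principle by filtering spectrum identifications at each probability cutoff and counting what remains. The sketch below assumes each identification is a (peptide, probability) pair; the function name, the data layout, and the toy values are hypothetical and are not taken from the PeptideAtlas build.

```python
THRESHOLDS = (0.70, 0.90, 0.95, 0.99)


def summarize_by_threshold(identifications, thresholds=THRESHOLDS):
    """Count passing spectra and distinct peptides at each probability cutoff."""
    rows = {}
    for threshold in thresholds:
        passing = [pep for pep, prob in identifications if prob >= threshold]
        rows[threshold] = {
            "passing_spectra": len(passing),
            "distinct_peptides": len(set(passing)),
        }
    return rows


# Toy identifications: (peptide sequence, assigned probability), made-up values.
toy = [("AAK", 0.99), ("AAK", 0.92), ("LLR", 0.75), ("GGK", 0.96), ("VVR", 0.50)]
table = summarize_by_threshold(toy)
for threshold, row in table.items():
    print(f">= {threshold:.2f}: {row}")
```

As in the table, stricter cutoffs can only keep or reduce both counts, which is the trade-off against the falling false-positive estimates.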