Shyamasree Saha¹, Eleni A Chatzimichali¹, David A Matthews², Conrad Bessant¹,³.
Abstract
PITDB is a freely available database of translated genomic elements (TGEs) that have been observed in PIT (proteomics informed by transcriptomics) experiments. In PIT, a sample is analyzed using both RNA-seq transcriptomics and proteomic mass spectrometry. Transcripts assembled from RNA-seq reads are used to create a library of sample-specific amino acid sequences against which the acquired mass spectra are searched, permitting detection of any TGE, not just those in canonical proteome databases. At the time of writing, PITDB contains over 74 000 distinct TGEs from four species, supported by more than 600 000 peptide spectrum matches. The database, accessible via http://pitdb.org, provides supporting evidence for each TGE, often from multiple experiments, together with an indication of the confidence in the TGE's observation and of its type, ranging from known protein (exact match to a UniProt protein sequence), through multiple types of protein variant including various splice isoforms, to a putative novel molecule. PITDB's modern web interface allows TGEs to be viewed individually or by species or experiment, and downloaded for further analysis. PITDB is intended for bench scientists seeking to share their PIT results, for researchers investigating novel genome products in model organisms, and for those wishing to construct proteomes for lesser-studied species.
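The abstract does not say exactly how the sample-specific sequence library is derived from the assembled transcripts. A minimal sketch, assuming the simplest approach of translating every ATG-to-stop open reading frame (ORF) on both strands of each transcript, might look like the following. The function names, the 30-residue `min_len` cutoff and the `<transcript>_orf<k>` identifiers are hypothetical, and a real PIT pipeline may rely on a dedicated ORF-prediction tool instead.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    # Codons containing ambiguous bases (e.g. N) are translated as X.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def forward_orfs(seq: str, min_len: int) -> list[str]:
    """Proteins from ATG...stop ORFs in the three forward frames.

    ORFs that run off the end of the transcript (no stop codon) are skipped.
    """
    proteins = []
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:i])  # stop codon excluded
                if len(protein) >= min_len:
                    proteins.append(protein)
                start = None
    return proteins


def build_search_library(transcripts: dict[str, str], min_len: int = 30) -> dict[str, str]:
    """Map hypothetical ORF identifiers to protein sequences for every transcript."""
    library = {}
    for tid, seq in transcripts.items():
        seq = seq.upper()
        proteins = forward_orfs(seq, min_len) + forward_orfs(reverse_complement(seq), min_len)
        for k, protein in enumerate(proteins, start=1):
            library[f"{tid}_orf{k}"] = protein
    return library
```

For example, `build_search_library({"t1": "ATGGCCAAATAG"}, min_len=3)` returns `{"t1_orf1": "MAK"}`. Starting each ORF at the first ATG of an open stretch reports the longest ORF per stop codon, which keeps the search library smaller at the cost of ignoring internal start sites.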
Year: 2018 PMID: 30053269 PMCID: PMC5753392 DOI: 10.1093/nar/gkx906
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Simplified schematic showing how the PIT workflow populates PITDB. First, TGEs are found by assembling transcripts de novo from RNA-seq data, mapping these against a genome, and then searching MS/MS data from the same sample against ORFs generated from the transcripts. ORFs with peptide evidence (TGEs) are then BLASTed against protein sequences from UniProt to classify them as known, novel, isoform, etc., and to assess the level of confidence in that classification using the factors shown in Table 1. All key results generated during the process are deposited in the integrated PITDB database, which can be accessed via the web.
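The classification step can be illustrated with a simplified sketch. PITDB classifies TGEs from BLAST alignments against UniProt; the code below swaps in exact string comparison and a position-by-position mismatch count so that it runs without BLAST. Its class labels and the `max_substitutions` threshold are hypothetical simplifications, not PITDB's actual classes or its confidence scheme.

```python
from typing import Optional


def classify_tge(
    tge: str, references: dict[str, str], max_substitutions: int = 2
) -> tuple[str, Optional[str]]:
    """Coarsely classify a TGE against reference proteins (accession -> sequence).

    Exact string comparison stands in for BLAST here; max_substitutions is a
    hypothetical threshold, not a PITDB parameter.
    """
    # 1. Identical to a reference protein or isoform.
    for acc, ref in references.items():
        if tge == ref:
            return "known", acc
    # 2. Same length, differing only at a few positions (e.g. polymorphisms).
    for acc, ref in references.items():
        if len(tge) == len(ref):
            mismatches = sum(a != b for a, b in zip(tge, ref))
            if mismatches <= max_substitutions:
                return "known_with_variation", acc
    # 3. Partial match: the TGE is a fragment of, or extends, a reference.
    for acc, ref in references.items():
        if tge in ref:
            return "isoform_shorter", acc
        if ref in tge:
            return "isoform_longer", acc
    # 4. No recognizable relationship: a candidate novel TGE.
    return "novel", None
```

The checks run from most to least specific, so a TGE is only called an isoform or novel once the stricter "known" tests have failed. A real alignment would also catch partial matches with substitutions or gaps, which simple substring containment misses.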
Table 1. Scheme used to assign confidence ratings to TGE observations that BLAST suggests are variants of known proteins.
Overview of PITDB data content at the time of writing.

| Species | Samples | Known proteins: exact match to UniProt protein or isoform | Known proteins: UniProt protein with polymorphisms | Other isoforms | High-confidence isoforms (3★ or more) | High-confidence novel TGEs (3★ or more) |
|---|---|---|---|---|---|---|
| | 31 | 3,008 | 254 | 9,615 | 77 | 2 |
| | 10 | 1,008 | 303 | 29,234 | 1,767 | 331 |
| | 8 | 2,384 | 464 | 21,534 | 123 | 20 |
| | 1 | 2,017 | 101 | 3,137 | 540 | 0 |
We have TGEs from four species: two well-studied species (human and mouse) and two without a well-established proteome (P. alecto and A. aegypti). Based on their BLAST alignments to the reference proteome of the species under study, TGEs are assigned to 19 classes: known protein, known protein with variation, 16 distinct types of novel isoform, and novel. A small percentage of identified TGEs carry variations such as single amino acid polymorphisms (SAPs), multiple amino acid alterations (ALT), insertions and deletions. Among the isoforms of known proteins, a large proportion of TGEs map only partially to an existing protein, being either longer or shorter than it.
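To make the substitution-type variations concrete, the sketch below compares a TGE with an equal-length reference protein position by position and labels each run of differing residues. It assumes that an isolated substitution corresponds to a SAP and that a run of two or more adjacent substitutions corresponds to an ALT; this is an illustrative reading of the terms, not PITDB's documented rule, and `type_substitutions` is a hypothetical name. Insertions and deletions cannot be detected this way and need a gapped alignment such as the BLAST alignments PITDB uses.

```python
def type_substitutions(tge: str, ref: str) -> list[tuple[str, int, str, str]]:
    """Return (kind, 1-based position, reference residues, TGE residues) per run.

    kind is "SAP" for a single differing residue and "ALT" for a run of
    adjacent differing residues (an assumed, illustrative definition).
    """
    if len(tge) != len(ref):
        raise ValueError("gapless comparison needs equal-length sequences")
    variants = []
    i = 0
    while i < len(ref):
        if tge[i] == ref[i]:
            i += 1
            continue
        # Extend j to the end of this run of consecutive mismatches.
        j = i
        while j < len(ref) and tge[j] != ref[j]:
            j += 1
        kind = "SAP" if j - i == 1 else "ALT"
        variants.append((kind, i + 1, ref[i:j], tge[i:j]))
        i = j
    return variants
```

For example, `type_substitutions("MKTLYIAQRR", "MKTAYIAKQR")` returns `[("SAP", 4, "A", "L"), ("ALT", 8, "KQ", "QR")]`.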
Figure 2. Examples of some key elements of PITDB’s user interface, including (A) the organism summary page for Pteropus alecto, showing the total number of TGEs etc. in numerical and graphical form and providing access to TGEs via an interactive table; (B) a summary of mass spectrometry evidence for TGE0070846 (a potential novel isoform of human Tetratricopeptide repeat protein 9C); (C) sequence variations found between TGE0000273 and the P. alecto Ras-related protein Rap-1A (UniProt accession L5K2Z3); (D) the genomic context of the TGE and peptide observations associated with mouse protein E0CY49.