Ludwig Lautenbacher1, Patroklos Samaras2, Julian Muller2, Andreas Grafberger2, Marwin Shraideh3,4, Johannes Rank3,4, Simon T Fuchs3,4, Tobias K Schmidt2, Matthew The2, Christian Dallago5,6, Holger Wittges3,4, Burkhard Rost5,7, Helmut Krcmar3,4, Bernhard Kuster2,8, Mathias Wilhelm1.
Abstract
ProteomicsDB (https://www.ProteomicsDB.org) is a multi-omics and multi-organism resource for life science research. In this update, we present our efforts to continuously develop and expand ProteomicsDB. The major focus over the last two years was improving the findability, accessibility, interoperability and reusability (FAIR) of the data as well as its implementation. For this purpose, we first release a new application programming interface (API) that provides systematic access to essentially all data in ProteomicsDB. Second, we release a new open-source user interface (UI) and show the advantages the scientific community gains from such software. With the new interface, two new visualizations of protein primary, secondary and tertiary structure as well as an updated spectrum viewer were added. Furthermore, we integrated ProteomicsDB with our deep neural network Prosit, which can predict the fragmentation characteristics and retention time of peptides. The result is an automatic processing pipeline that can be used to reevaluate database search engine results stored in ProteomicsDB. In addition, we extended the data content with experiments investigating different aspects of human biology as well as a newly supported organism.
Year: 2022 PMID: 34791421 PMCID: PMC8728203 DOI: 10.1093/nar/gkab1026
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The architecture of ProteomicsDB. The data content and data layer of ProteomicsDB are accessible via three application programming interfaces (APIs). The API4UI is used by the frontend and contains predefined requests to the data in ProteomicsDB for the purpose of data visualization. The novel vue-based visualization layer of ProteomicsDB (top left) is separated into three levels. The proteomicsdb-components package is agnostic toward ProteomicsDB and thus usable on any website. The package proteomicsdb-wrappers connects the components with ProteomicsDB and can be re-used on any website as well. The package proteomicsdb-view contains the entire vue-based frontend of ProteomicsDB. The APIv1.1 is used by external resources (top right) and will remain publicly available. The new APIv2.0 provides access to virtually any dataset stored in ProteomicsDB.
Figure 2. APIv2.0. The tables of ProteomicsDB are grouped into topic clusters (e.g. Repository and Peptide identification data, see Figure 1 data layers). Each table is available in the API as a separate entity (square boxes). To navigate between entities within (dashed black arrows) or across (solid black arrows) topic clusters, corresponding navigation properties were defined that allow the traversal of the available data. Detailed documentation of the API is available online at https://www.proteomicsdb.org/vue/apiv2/.
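The idea of entities linked by navigation properties can be illustrated with a small in-memory graph. This is only a toy sketch: the entity names, cluster assignments and links below are hypothetical and are not taken from the actual APIv2.0 schema, and no request to ProteomicsDB is made. It shows how a chain of navigation properties connects two entities and which hops stay within a topic cluster or cross into another one.

```python
from collections import deque

# Hypothetical entities and their topic clusters (not the real APIv2.0 schema).
CLUSTER = {
    "Project": "Repository",
    "Experiment": "Repository",
    "Sample": "Repository",
    "PeptideIdentification": "Peptide identification data",
    "Peptide": "Peptide identification data",
}

# Hypothetical navigation properties: entity -> entities reachable in one hop.
NAVIGATION = {
    "Project": ["Experiment"],
    "Experiment": ["Project", "Sample"],
    "Sample": ["Experiment", "PeptideIdentification"],
    "PeptideIdentification": ["Sample", "Peptide"],
    "Peptide": ["PeptideIdentification"],
}


def navigation_path(start, goal):
    """Breadth-first search for the shortest chain of navigation properties."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in NAVIGATION.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # goal is not reachable from start


def classify_hops(path):
    """Label each hop as staying 'within' a cluster or going 'across' clusters."""
    return [
        "within" if CLUSTER[a] == CLUSTER[b] else "across"
        for a, b in zip(path, path[1:])
    ]


if __name__ == "__main__":
    path = navigation_path("Project", "Peptide")
    print(" -> ".join(path))
    print(classify_hops(path))
```

In a real client, each hop would correspond to following one navigation property of an entity as described in the online API documentation.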
Figure 3. Screenshot of the new vue-based protein summary page. The organism selection is located at the top left next to the ProteomicsDB logo. In the top middle, a new universal search field, visible at all times, was added. The hamburger button on the top right opens the main navigation panel of ProteomicsDB. On the left, the protein navigation panel is shown. The protein summary page shows general information about the selected protein as well as the sequence coverage and the expression of the protein for tissues and body fluids.
Figure 4. Protein Feature Viewer. This interactive visualization depicts different information about the primary and secondary structure of the protein in separate tracks. Each of these tracks can be expanded to reveal a more detailed view, exemplified by the expanded predicted secondary structure. Each region of a track can be selected to reveal additional information, exemplified for the Furin-like-repeats domain. In the bottom left, the table shows available 3D structures from PDB for this protein. The selected structure is shown in the bottom right and the selected region (yellow highlight) is marked in red in the protein structure.
Figure 5. Spectrum viewer. The spectrum viewer (bottom) visualizes the selected peptide spectrum match from the table in the top left. The configuration element on the top right can be used for, but is not limited to, retrieving reference spectra depicted in the mirror view at the bottom. Reference spectra can be generated in real-time by Prosit or requested from ProteomeTools. In between the experimental and reference spectrum, the alignment error between an observed and reference peak is shown in parts-per-million (ppm). The spectral similarity between the experimental and reference spectrum is measured by calculating the Pearson correlation (PCC) and normalized spectral contrast angle (SA). The measures inside the brackets show the result of this comparison when taking only the peaks of either the experimental or the reference spectrum into account, whereas the values outside the brackets show the measures calculated taking all peaks from both spectra into account.
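The three quantities named in this caption can be sketched in a few lines. The following is a minimal illustration, not the code used by the spectrum viewer. It assumes that peaks have already been matched, so that both spectra are given as intensity lists of equal length with 0 where a peak is missing, and it uses the common definition of the normalized spectral contrast angle, SA = 1 - 2·arccos(cosine similarity)/π. All function names are hypothetical.

```python
import math


def ppm_error(observed_mz, reference_mz):
    """Mass error of an observed peak relative to its reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two aligned intensity lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined for a constant vector
    return cov / (sd_x * sd_y)


def spectral_angle(x, y):
    """Normalized spectral contrast angle: 1 = identical, 0 = orthogonal."""
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    cosine = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return 1 - 2 * math.acos(cosine) / math.pi


def restrict_to_peaks(experimental, reference, present_in):
    """Keep only positions where `present_in` has a nonzero intensity."""
    kept = [(e, r) for e, r, p in zip(experimental, reference, present_in) if p > 0]
    return [e for e, _ in kept], [r for _, r in kept]


if __name__ == "__main__":
    experimental = [0.0, 0.8, 1.0, 0.3, 0.1]  # made-up intensities
    reference = [0.2, 0.7, 1.0, 0.0, 0.1]     # made-up intensities
    print("all peaks:", pearson(experimental, reference),
          spectral_angle(experimental, reference))
    exp_only, ref_only = restrict_to_peaks(experimental, reference, reference)
    print("reference peaks only:", pearson(exp_only, ref_only),
          spectral_angle(exp_only, ref_only))
```

Restricting the comparison to the peaks of one spectrum, as in `restrict_to_peaks`, mirrors the distinction between the bracketed and unbracketed values described in the caption.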
Figure 6. Integration of Prosit into ProteomicsDB. (A) Depiction of the workflow implemented to enable automatic rescoring of projects in ProteomicsDB. Raw mass spectrometry data are downloaded from PRIDE. The rescoring is performed on the database search results stored in ProteomicsDB by retrieving predictions from Prosit. The resulting scores are merged by percolator and imported into ProteomicsDB where the picked protein approach is used for FDR estimation. (B) The number of proteins (right) and peptides (left) identified with (blue) and without (red) rescoring at an estimated PSM, peptide and protein FDR of 1% for 30 tissues from Wang et al. (29). (C) Distribution of target and decoy Q-scores of proteins supported by peptide identifications for all mouse proteins in ProteomicsDB. The example highlights the q-value of the Pyruvate kinase PKM (P52480).
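The FDR estimation step in panel (A) can be illustrated with a simplified target-decoy calculation. The sketch below follows the picked protein approach as it is commonly described: each target protein competes with its decoy counterpart, only the higher-scoring of the pair is kept, and q-values are derived from the resulting ranking. It is not the ProteomicsDB implementation. The function names, the score dictionaries and the pairing of targets and decoys by shared identifier are assumptions made for illustration.

```python
def picked_proteins(target_scores, decoy_scores):
    """Keep the better-scoring member of each target/decoy protein pair.

    Both arguments map a protein identifier to a score (higher is better).
    Returns a list of (protein, score, is_decoy) tuples.
    """
    picked = []
    for protein in set(target_scores) | set(decoy_scores):
        target = target_scores.get(protein, float("-inf"))
        decoy = decoy_scores.get(protein, float("-inf"))
        if target >= decoy:
            picked.append((protein, target, False))
        else:
            picked.append((protein, decoy, True))
    return picked


def q_values(picked):
    """Estimate q-values for the picked target proteins.

    The FDR at each score threshold is estimated as #decoys / #targets.
    The q-value of a protein is the minimum FDR at its own or any more
    permissive threshold.
    """
    ranked = sorted(picked, key=lambda entry: entry[1], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    result = {}
    running_min = float("inf")
    # Walk from the worst score to the best so the minimum accumulates.
    for (protein, _, is_decoy), fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        if not is_decoy:
            result[protein] = running_min
    return result


if __name__ == "__main__":
    # Made-up scores for four proteins and their decoy counterparts.
    targets = {"A": 10.0, "B": 8.0, "C": 3.0, "D": 5.0}
    decoys = {"A": 2.0, "B": 1.0, "C": 6.0, "D": 4.0}
    qvals = q_values(picked_proteins(targets, decoys))
    accepted = sorted(p for p, q in qvals.items() if q <= 0.01)
    print(qvals)
    print("accepted at 1% FDR:", accepted)
```

Filtering on `q <= 0.01` corresponds to the 1% FDR threshold mentioned in panel (B).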
Figure 7. New data added to ProteomicsDB. (A) Expression bodymap (left) of rice, illustrated using the example of Phosphoglycerate kinase (A0A0P0WP33). The individual expression values are depicted in the barplot (right). (B) Venn diagram showing the overlap of human genes for which proteomics, transcriptomics or biochemical assay data are available in ProteomicsDB. (C) Venn diagram showing the overlap of human tissues, cell lines and body fluids for which proteomics, transcriptomics or cell viability assay data are available in ProteomicsDB. (D) Barplot showing the increase in data across the depicted categories (y-axis) from 2019 to 2021.