Monique Zahn-Zabal, Pierre-André Michel, Alain Gateau, Frédéric Nikitin, Mathieu Schaeffer, Estelle Audot, Pascale Gaudet, Paula D Duek, Daniel Teixeira, Valentine Rech de Laval, Kasun Samarasinghe, Amos Bairoch, Lydie Lane.
Abstract
The neXtProt knowledgebase (https://www.nextprot.org) is an integrative resource providing both data on human proteins and the tools to explore them. To provide comprehensive and up-to-date data, we evaluate and add new data sets. We describe the incorporation of three new data sets that provide expression, function, protein-protein binary interaction, post-translational modification (PTM) and variant information. New SPARQL query examples illustrating uses of the new data have been added. neXtProt has continued to develop tools for proteomics: we have improved the peptide uniqueness checker and implemented a new protein digestion tool. Together, these tools make it possible to determine which proteases can be used to identify trypsin-resistant proteins by mass spectrometry. In terms of usability, we have finished revamping our web interface and completely rewritten our API. Our SPARQL endpoint now supports federated queries. All the neXtProt data are available via our user interface, API, SPARQL endpoint and FTP site, including the new PEFF 1.0 format files. Finally, the data on our FTP site are now licensed under CC BY 4.0 to promote their reuse.
Year: 2020 PMID: 31724716 PMCID: PMC7145669 DOI: 10.1093/nar/gkz995
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Data content of neXtProt data release 2019-08-22
| Data type | Count | Change since data release 2016-08-25 | Source(s) |
|---|---|---|---|
| Entries | 20 399 | +338 | UniProtKB |
| Isoforms (Sequences) | 42 410 | +386 | UniProtKB |
| Binary interactions | 240 010 | +99 740 | IntAct, neXtProt |
| Post-translational modifications (PTMs) | 190 921 | +48 468 | UniProtKB, neXtProt, PeptideAtlas, GlyConnect |
| Variants (including disease mutations) | 6 019 871 | +1 075 957 | UniProtKB, COSMIC, dbSNP, neXtProt, gnomAD |
| Phenotypic annotations | 19 602 | +11 588 | neXtProt |
| Entries with a molecular function | 17 177 | +654 | GOA, neXtProt |
| Entries with a biological process | 16 964 | +692 | GOA, neXtProt |
| Entries with an expression profile | 19 367 | +1 038 | Bgee, HPA, neXtProt |
| Entries with a disease | 4 553 | +637 | UniProtKB |
| Entries with proteomics data | 18 727 | +1 448 | PeptideAtlas, neXtProt |
| Entries with an experimental 3D structure | 6 505 | +765 | PDB via UniProtKB |
| Cited publications | 115 935 | +16 013 | All resources |
Coverage in neXtProt data release 2019-08-22
| Annotations | Entries with evidence from UniProtKB (%ᵃ) | Entries with evidence from any source (%ᵃ) |
|---|---|---|
| GO molecular function | 12 360 (60%) | 17 177 (84%) |
| GO biological process | 11 665 (57%) | 16 964 (83%) |
| GO cellular component | 14 581 (71%) | 18 129 (89%) |
| Subcellular location | 16 590 (81%) | 18 527 (91%) |
| Binary interactions | 8 653 (42%) | 16 411 (80%) |
| Expression | 9 876 (48%) | 19 527 (96%) |
| Peptide mapping | 0 (0%) | 18 727 (92%) |
| Antibody mapping | 0 (0%) | 16 423 (80%) |
| Post-translational modifications (PTMs) | 14 000 (69%) | 15 905 (78%) |
| Variants | 12 919 (63%) | 19 621 (96%) |
ᵃ The total number of entries is 20 399.
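The percentages in the coverage table are each entry count expressed relative to the total of 20 399 entries given in the footnote. The following minimal Python sketch illustrates that arithmetic; the function name `coverage_percent` is hypothetical, and since the rounding convention used for the published whole-number percentages is not stated, the sketch prints one decimal place instead of reproducing it.

```python
# Total number of neXtProt entries, taken from the table footnote.
TOTAL_ENTRIES = 20_399


def coverage_percent(entries_with_annotation: int, total: int = TOTAL_ENTRIES) -> float:
    """Return the share of entries carrying an annotation, in percent."""
    return 100.0 * entries_with_annotation / total


# Counts copied from the "any source" column of the coverage table.
for label, count in [("GO molecular function", 17_177), ("Peptide mapping", 18_727)]:
    print(f"{label}: {coverage_percent(count):.1f}%")
```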
Figure 1. Tabular view showing the expression data for insulin (NX_P01308). Expression data at the mRNA and protein level are displayed in the same semi-quantitative manner for easier comparison. Four levels of expression (undetected, low, medium, high) are possible. Hovering over a data point displays its expression level as text.
Figure 2. Allele frequency information in the Sequence view for BRCA1 (NX_P38398) variants. To find a gnomAD variant, search the feature table for its gnomAD ID. A link to the corresponding variant in gnomAD is found in the description of the variant and in the evidence. The allele frequency, with the allele count and allele number in brackets, as well as the homozygote count, is displayed in the evidence.
Figure 3. Protein digestion tool. (A) Input form requiring the neXtProt isoform accession number of the protein to be digested. The default digestion parameters (maximum number of missed cleavages, minimum peptide length and maximum peptide length) can be modified by the user. (B) Peptide count and unique peptide count for digestion with 27 proteases or conditions. Selecting a protease shows the peptides obtained. (C) Table displaying information about the peptides obtained with the selected digestion conditions: the peptide sequence, length, number of missed cleavages, position in the sequence, whether the peptide is unique (without taking variants into account) and whether the peptide is found in neXtProt as a natural or synthetic (SRM) peptide. A link to the neXtProt Peptide view of the entry is provided.
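To make the three digestion parameters concrete, here is a minimal Python sketch of an in silico digestion. It is not the neXtProt implementation: it assumes the commonly used simplified trypsin rule (cleave after K or R unless the next residue is P), the names `Peptide`, `trypsin_sites` and `digest` are hypothetical, and the default parameter values are illustrative only, not the tool's defaults. Peptide uniqueness, which the tool also reports, is not modelled.

```python
import re
from typing import List, NamedTuple


class Peptide(NamedTuple):
    sequence: str
    start: int             # 1-based position of the first residue
    end: int               # 1-based position of the last residue
    missed_cleavages: int


def trypsin_sites(sequence: str) -> List[int]:
    """Return the indices after which cleavage occurs (simplified trypsin rule)."""
    # Assumption: cleave C-terminal to K or R, except when followed by P.
    return [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]


def digest(sequence: str, max_missed: int = 1,
           min_len: int = 6, max_len: int = 40) -> List[Peptide]:
    """Enumerate peptides allowing up to `max_missed` missed cleavages."""
    # Fragment boundaries: sequence start, internal cleavage sites, sequence end.
    internal = [s for s in trypsin_sites(sequence) if s < len(sequence)]
    cuts = [0] + internal + [len(sequence)]
    peptides = []
    for i in range(len(cuts) - 1):
        for missed in range(max_missed + 1):
            j = i + 1 + missed
            if j >= len(cuts):
                break
            start, end = cuts[i], cuts[j]
            # Keep only peptides within the requested length window.
            if min_len <= end - start <= max_len:
                peptides.append(Peptide(sequence[start:end], start + 1, end, missed))
    return peptides


# Toy sequence (not a real protein): the R at position 3 is followed by P,
# so no cleavage occurs there.
for pep in digest("MKRPAAKGGR", max_missed=1, min_len=1, max_len=50):
    print(pep)
```

Raising `max_missed` adds longer peptides that span uncleaved sites, while the length window removes fragments that are too short or too long to be useful in a mass spectrometry experiment.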
Figure 4. Federated SPARQL query examples. Screenshot showing all queries tagged ‘federated’ in the neXtProt SNORQL interface.