Abstract
Proteome-pI 2.0 is an update of an online database containing predicted isoelectric points and pKa dissociation constants of proteins and peptides. The isoelectric point, the pH at which a particular molecule carries no net electrical charge, is an important parameter for many analytical biochemistry and proteomics techniques. Additionally, it can be obtained directly from the pKa values of individual charged residues of the protein. The Proteome-pI 2.0 database includes data for over 61 million protein sequences from 20 115 proteomes (three to four times more than the previous release). The isoelectric point for proteins is predicted by 21 methods, whereas pKa values are inferred by one method. To facilitate bottom-up proteomics analysis, individual proteomes were digested in silico with the five most commonly used proteases (trypsin, chymotrypsin, trypsin + LysC, LysN, ArgC), and the peptides' isoelectric points and molecular weights were calculated. The database enables the retrieval of virtual 2D-PAGE plots and customized fractions of a proteome based on the isoelectric point and molecular weight. In addition, isoelectric points for proteins in NCBI non-redundant (nr), UniProt, SwissProt, and Protein Data Bank are available in both CSV and FASTA formats. The database can be accessed at http://isoelectricpointdb2.org.
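The relationship between pKa values and the isoelectric point can be illustrated with a minimal sketch. It sums Henderson-Hasselbalch charges over the ionizable groups and bisects on pH until the net charge is close to zero. The pKa constants and the function names `net_charge` and `isoelectric_point` are illustrative assumptions; they are not the values or the code of any of the 21 methods used by the database.

```python
# Illustrative pKa values (an assumption; not the set used by Proteome-pI 2.0).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a linear peptide at a given pH (Henderson-Hasselbalch)."""
    # The N-terminus and basic side chains contribute +1 when protonated.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    # The C-terminus and acidic side chains contribute -1 when deprotonated.
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; the net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: the pI lies at higher pH
        else:
            high = mid  # neutral or negative: the pI lies at lower pH
    return (low + high) / 2.0
```

Different published pKa sets shift the result, which is one reason a database may report predictions from many methods side by side.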
Year: 2022 PMID: 34718696 PMCID: PMC8728302 DOI: 10.1093/nar/gkab944
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. An overview of the Proteome-pI 2.0 database. Isoelectric points and molecular weights for individual proteins from 20 115 proteomes are visualized on virtual 2D PAGE plots (top left) and can be retrieved according to the predictions from one of 21 algorithms (top right). The data for individual proteins are accompanied by dissociation constant (pKa) predictions (middle). The proteomes are digested in silico by one of the five most commonly used proteases (trypsin, chymotrypsin, trypsin + LysC, LysN, ArgC) (bottom right). Additionally, auxiliary statistics are provided (e.g. di-amino acid frequencies) (bottom left).
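As an illustration of in silico digestion, the sketch below applies a simplified and widely used trypsin rule: cleave on the C-terminal side of K or R unless the next residue is P. The function name `trypsin_digest` and the `missed_cleavages` parameter are hypothetical, and the rule is an assumption; the database's own digestion settings may differ.

```python
import re
from typing import List


def trypsin_digest(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """Return peptides from a simplified tryptic digest of one protein."""
    # Zero-width split: after K or R, but not when the next residue is P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(fragments):
                break
            peptides.append("".join(fragments[start:end]))
    return peptides
```

For example, `trypsin_digest("AKRPGK")` yields `["AK", "RPGK"]`, because the R followed by P is not cleaved.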
General statistics of the Proteome-pI 2.0 database (20 115 proteomes with 61 329 034 proteins in total)
mw, molecular weight in kDa; mean size in amino acids. For more statistics, see Supplementary Table S1. ‘Major’ and ‘minor’ refer to splicing isoforms of proteins used for calculation of the statistics.
Amino acid frequency (%) for the kingdoms of life in the Proteome-pI 2.0 database
| Kingdom | Ala | Cys | Asp | Glu | Phe | Gly | His | Ile | Lys | Leu | Met | Asn | Pro | Gln | Arg | Ser | Thr | Val | Trp | Tyr | Total amino acids |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Viruses | 7.81 | 1.29 | 6.20 | 6.46 | 3.91 | 6.72 | 1.96 | 6.05 | 6.24 | 8.28 | 2.51 | 4.99 | 4.25 | 3.62 | 5.31 | 6.47 | 6.14 | 6.66 | 1.42 | 3.71 | 122 870 810 |
| Archaea | 8.95 | 0.90 | 7.00 | 7.94 | 3.65 | 7.84 | 1.86 | 6.03 | 4.18 | 9.11 | 2.14 | 3.36 | 4.36 | 2.48 | 5.83 | 6.12 | 5.84 | 8.16 | 1.06 | 3.18 | 213 285 886 |
| Bacteria | 10.64 | 0.90 | 5.67 | 6.06 | 3.76 | 8.01 | 2.08 | 5.52 | 4.22 | 10.12 | 2.31 | 3.35 | 4.82 | 3.49 | 6.18 | 5.75 | 5.58 | 7.42 | 1.31 | 2.81 | 9 693 905 784 |
| Eukaryota | 7.38 | 1.85 | 5.34 | 6.55 | 3.79 | 6.35 | 2.50 | 4.94 | 5.64 | 9.38 | 2.27 | 4.13 | 5.56 | 4.27 | 5.71 | 8.45 | 5.56 | 6.24 | 1.24 | 2.81 | 13 901 635 566 |
| All | 8.72 | 1.46 | 5.49 | 6.36 | 3.78 | 7.04 | 2.32 | 5.19 | 5.05 | 9.67 | 2.29 | 3.81 | 5.24 | 3.94 | 5.90 | 7.33 | 5.57 | 6.74 | 1.27 | 2.81 | 23 931 698 046 |
Similar statistics for each of the 20 115 proteomes included in Proteome-pI are available online on separate subpages. Additionally, the online version of the table (http://isoelectricpointdb2.org/statistics.html) reports an error for each value, estimated with 1000 bootstraps. For di-amino acid frequencies, see Supplementary Table S3.
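A minimal sketch of how percentage frequencies and a bootstrap error of this kind could be computed is given below. It assumes that whole protein sequences are the resampling unit and that the error is the standard deviation across resamples; the source does not specify either choice, and the names `aa_frequencies` and `bootstrap_errors` are hypothetical.

```python
import random
from collections import Counter
from statistics import stdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequences):
    """Percentage frequency of each standard amino acid over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Non-standard symbols (e.g. X, U) are ignored in this sketch.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def bootstrap_errors(sequences, n_boot=1000, seed=0):
    """Standard deviation of each frequency over bootstrap resamples."""
    rng = random.Random(seed)
    samples = {aa: [] for aa in AMINO_ACIDS}
    for _ in range(n_boot):
        # Assumption: resample whole proteins with replacement.
        resample = rng.choices(sequences, k=len(sequences))
        for aa, freq in aa_frequencies(resample).items():
            samples[aa].append(freq)
    return {aa: stdev(values) for aa, values in samples.items()}
```

Each row of frequencies sums to 100 by construction, which matches the rows of the table above up to rounding.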