
PPD v1.0--an integrated, web-accessible database of experimentally determined protein pKa values.

Christopher P Toseland, Helen McSparron, Matthew N Davies, Darren R Flower.

Abstract

The Protein pK(a) Database (PPD) v1.0 provides a compendium of protein residue-specific ionization equilibria (pK(a) values), as collated from the primary literature, in the form of a web-accessible postgreSQL relational database. Ionizable residues play key roles in the molecular mechanisms that underlie many biological phenomena, including protein folding and enzyme catalysis. The PPD serves as a general protein pK(a) archive and as a source of data that allows for the development and improvement of pK(a) prediction systems. The database is accessed through an HTML interface, which offers two fast, efficient search methods: an amino acid-based query and a Basic Local Alignment Search Tool search. Entries also give details of experimental techniques and links to other key databases, such as National Center for Biotechnology Information and the Protein Data Bank, providing the user with considerable background information. The database can be found at the following URL: http://www.jenner.ac.uk/PPD.

Year:  2006        PMID: 16381845      PMCID: PMC1347398          DOI: 10.1093/nar/gkj035

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

A significant proportion of chemical reactions involving proteins are mediated through electrostatic interactions of their ionizable residues (1). Such residues greatly influence the conformation of a protein and therefore its function (2,3), as demonstrated by their roles in protein folding (4–6), enzyme catalysis and protein–protein interactions (7). With respect to enzyme catalysis, residues can act as proton donors and acceptors within the catalytic site and help stabilize transition states, with a concomitant influence on the rate of reaction (8,9). The dissociation constant (Ka) is a measure of the acidity of a compound, i.e. its ability to donate a proton. Ka values range widely, from 10^10 for the strongest acids, such as sulphuric, to 10^−50 for the weakest, such as methane. Therefore a negative logarithmic scale is usually applied (pKa = −log10 Ka), whereby the Ka values for sulphuric acid and methane become pKa values of −10 and 50, respectively. Generally, lower pKa values correspond to stronger acids. The pKa values of individual amino acid residues in proteins are determined by the ionization of their side-chain groups. For the 20 natural amino acids, pKa values range from 4.0 for the side-chain carboxyl of aspartate to 12.0 for the side-chain guanidinium group of arginine. Main-chain groups are not ionizable, although two additional ionizable groups exist at the N- and C-termini. Residues within proteins have pKa values that are moderated by their micro-environments (the nature of their near neighbours, the extent of hydrogen bonding and so on) and can take on a range of values different from that of a model residue. NMR spectroscopy is the most widely used method for determining the pKa values of individual residues, with an accuracy of ∼0.1 pH units. Although many NMR methods are available, most entries in the Protein pKa Database (PPD) are derived using 1H, 13C and 15N experiments.
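The logarithmic conversion described above can be sketched in a few lines. This is only an illustration of the definition pKa = −log10 Ka; the function names are hypothetical and are not part of PPD.

```python
import math


def pka_from_ka(ka: float) -> float:
    """Return pKa = -log10(Ka) for a positive dissociation constant."""
    if ka <= 0:
        raise ValueError("Ka must be positive")
    return -math.log10(ka)


def ka_from_pka(pka: float) -> float:
    """Invert the definition: Ka = 10 ** (-pKa)."""
    return 10.0 ** (-pka)


# The two extremes quoted in the text: sulphuric acid and methane.
for name, ka in (("sulphuric acid", 1e10), ("methane", 1e-50)):
    print(name, pka_from_ka(ka))
```

The two printed values should be close to −10 and 50, matching the figures given in the text.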
Inaccuracies in NMR experiments stem from the range of pH values tested, variations in ionic strength and the reversibility of the titration (10). In light of this, new combination methods are being used based on NMR spectroscopy coupled with site-directed mutagenesis, which leads to more accurate pKa values (10,11). The functional importance of ionizable residues has led to numerous attempts to predict individual residue-specific pKa values (12–16). pKa values are usually calculated from 3D structures using the Poisson–Boltzmann equation. However, calculated and experimentally measured pKa values can differ (13). Molecular dynamics simulations have also been used for such predictions, although these give rise to only a marginal increase in accuracy (17). As only a handful of reviews have attempted to compile residue-specific protein pKa values (10,18,19), it was decided to develop a database that would serve as a standard compendium against which to compare new experimental or theoretical results. The PPD v1.0 contains >1400 amino acid pKa values, sourced from experimental data. Cross-references to several external databases, namely the Protein Data Bank (PDB) (20), the Enzyme Nomenclature and Classification database (21) and the National Center for Biotechnology Information (NCBI) Entrez-Protein, have also been incorporated into the database.

DATABASE DEVELOPMENT

PPD v1.0 has been implemented using a postgreSQL relational database, which provides an appropriate infrastructure for all foreseeable future developments of the archive. The data were initially compiled in a Microsoft ACCESS database after exhaustive searching of the primary literature, which included keyword searches of the NCBI PubMed database. The postgreSQL database is structured into seven normalized tables, populated from a flat-file export of the ACCESS database using PERL scripts integrated with SQL. As data are continually accumulating, archiving is an on-going process: automatic, periodic updates will be made to the postgreSQL database. The PPD user interface is provided by a series of HTML pages. There are two searchable forms available within the PPD site. One offers either a broad or focussed PPD search. The other searches PPD using the Basic Local Alignment Search Tool (BLAST). These forms target either a PERL/SQL script or a CGI script, which in turn queries the database. The bespoke search engine facilitates fast, efficient and flexible data retrieval (see Searching the Database). PPD is freely available on the world wide web.
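The idea of populating normalized tables from a flat-file export can be illustrated with a minimal sketch. It is not the authors' PERL/SQL code: it uses Python's built-in sqlite3 module as a stand-in for postgreSQL, shows only two hypothetical tables rather than the seven in PPD (whose layout is not given here), and the table names, column names and sample rows are all invented placeholders.

```python
import csv
import io
import sqlite3

# Hypothetical two-table schema; PPD itself uses seven normalized tables.
SCHEMA = """
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    species    TEXT
);
CREATE TABLE pka_entry (
    entry_id   INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
    amino_acid TEXT NOT NULL,
    residue_no INTEGER,
    pka        REAL NOT NULL,
    method     TEXT
);
"""

# Placeholder tab-separated export; the values are not real PPD data.
FLAT_FILE = (
    "protein\tspecies\tamino_acid\tresidue_no\tpka\tmethod\n"
    "protein_X\tspecies_Y\tGlu\t35\t6.1\t1H NMR\n"
    "protein_X\tspecies_Y\tAsp\t52\t3.7\t1H NMR\n"
)


def load_flat_file(conn: sqlite3.Connection, flat_text: str) -> int:
    """Split each flat-file row into a protein row and a pKa row."""
    reader = csv.DictReader(io.StringIO(flat_text), delimiter="\t")
    loaded = 0
    for row in reader:
        # Store each protein once, however many residues it contributes.
        conn.execute(
            "INSERT OR IGNORE INTO protein (name, species) VALUES (?, ?)",
            (row["protein"], row["species"]),
        )
        (protein_id,) = conn.execute(
            "SELECT protein_id FROM protein WHERE name = ?", (row["protein"],)
        ).fetchone()
        conn.execute(
            "INSERT INTO pka_entry (protein_id, amino_acid, residue_no, pka, method)"
            " VALUES (?, ?, ?, ?, ?)",
            (protein_id, row["amino_acid"], int(row["residue_no"]),
             float(row["pka"]), row["method"]),
        )
        loaded += 1
    return loaded


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(load_flat_file(conn, FLAT_FILE), "rows loaded")
```

Normalizing in this way means that protein-level details are stored once and referenced by each residue-level pKa entry.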

DATABASE CONTENT

The data within PPD were sourced from the primary literature to give >1400 entries, containing pKa values for >160 proteins (Table 1). The database contains pKa values for amino acid side-chains, as well as the N- and C-termini. Data are archived for all amino acid residues, with the exception of methionine. However, most entries focus on glutamate, lysine, histidine and aspartate, which together account for >75% of the data. As these four are all key ionizable residues, the apparent bias is not driven by our selection, but by the available experimental data. Very few data are currently available for arginine: its pKa value (∼12) essentially precludes measurement by titration, as proteins will denature at such a strongly basic pH.
Table 1

Database summary

Database entries        1401
Proteins
    Total               163
    PDB structures      146
    Sequences           115
    Enzymes             49
Experiments
    Technique           13C*    1H*    15N*    2D*    RS
    Entries             2357804611256
Journals                189

RS = Raman Difference Spectroscopy and * = NMR spectroscopy.

Cross-references to key external databases are also included. These provide links to the protein sequence, using NCBI Entrez-Protein, and any relevant protein structure in the PDB (20). If applicable, the enzyme classification is also given, with links to the Enzyme Nomenclature and Classification Database, developed in line with the International Union of Biochemistry and Molecular Biology (21), providing details of the enzyme reactions. In addition, a link is given to the original literature reference via the NCBI PubMed journals database. These links provide key background knowledge associated with each archived protein. A full description of the database fields is given in Table 2.
Table 2

Content of the database entries

Entry field            Description
Protein                States the relevant protein and provides a link to the NCBI Entrez-Protein sequence
PDB                    States the protein's PDB identifier and provides a link to the structure
EC                     States the Enzyme Commission identifier and provides a link to the external database
Species                Species in which the protein is found
Protein description    Gives the basic function of the protein
Amino acid             The amino acid to which the pKa refers
Residue                The residue number to which the pKa refers
pKa                    pKa value for the corresponding residue
Method                 Experimental technique used to obtain the data, e.g. NMR
Temperature            Temperature at which the experiment was carried out
pH                     Range or fixed pH at which the experiment was carried out
Conditions             Concentrations of substances used in the experiment
Unit intervals         Intervals at which recordings were taken (pH units)
Reference              Full literature reference with link to the PubMed database

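As an illustration of how the fields in Table 2 might be held in code, the following sketch defines a simple record type. The class name, attribute names and types are assumptions made for this example (for instance, splitting the pH field into a minimum and maximum); they do not describe the actual PPD schema, and the sample values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PPDEntry:
    """Hypothetical in-memory record mirroring the Table 2 entry fields."""

    protein: str
    amino_acid: str
    residue_number: Optional[int]
    pka: float
    method: str
    pdb_id: Optional[str] = None
    ec_number: Optional[str] = None
    species: Optional[str] = None
    protein_description: Optional[str] = None
    temperature: Optional[str] = None
    ph_min: Optional[float] = None   # a fixed pH is stored as ph_min == ph_max
    ph_max: Optional[float] = None
    conditions: Optional[str] = None
    unit_interval: Optional[float] = None
    reference: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject an inverted pH range when both bounds are known.
        if (self.ph_min is not None and self.ph_max is not None
                and self.ph_min > self.ph_max):
            raise ValueError("ph_min must not exceed ph_max")

    @property
    def fixed_ph(self) -> bool:
        """True when the experiment was run at a single pH."""
        return self.ph_min is not None and self.ph_min == self.ph_max


entry = PPDEntry(protein="protein_X", amino_acid="Glu", residue_number=35,
                 pka=6.1, method="1H NMR", ph_min=2.0, ph_max=9.0)
print(entry.fixed_ph)
```

Keeping the experimental conditions optional reflects the statement in the text that conditions are recorded only when available.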
The ability to carry out accurate predictions of pKa values depends on having access to a high quality source of data; a principal aim of PPD is to provide such a source. Only experimentally determined pKa values are cited in PPD; predicted pKa values are not included. The quality of data contained in PPD v1.0 is largely dependent upon the accuracy of each experimental determination; thus, it contains only values from certain selected techniques: NMR spectroscopy, Raman Difference spectroscopy and UV spectroscopy. Protein pKa values are dependent on both intrinsic and extrinsic factors. Intrinsic factors include invariant properties of the protein investigated, such as sequence and structure. Extrinsic factors include the experimental conditions used, such as the temperature, the range of pH tested and protein concentrations, as well as the experimental method. Thus we attempt to record all relevant experimental conditions when available. As logistic considerations preclude us from undertaking independent verification of the data, we are obliged to trust the values reported in the literature. It should be noted that the phenomenon of cooperative deprotonation can create circumstances under which a pKa value cannot be used as a parameter that describes the ionization behaviour of the corresponding group (22–24).

SEARCHING THE DATABASE

Two methods to search PPD are available: an amino acid query-based interface (Figure 1) and a BLAST (25) interface. The implementation of a bespoke search system allows the user to perform extensive or focussed searches from a single user interface. The simplest search, using the amino acid query interface, would specify one amino acid residue only. A complex search would accommodate up to four amino acids and pKa ranges, along with experimental method, protein name and species. The search engine lets the user choose how results are presented. The default option returns amino acids and their associated properties (Figure 1B), while the second option returns proteins which contain the specified amino acids (Figure 1C).
Figure 1

Overview of the amino acid query search. The amino acid nominations are entered in (A). (B) shows the default result presentation, from which the pKa data (D) for the specified residues can be accessed. (C) shows the alternative presentation, with the display of proteins containing the nominated amino acid(s).
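The amino acid query described above (one to four amino acids with optional pKa ranges, plus optional method, protein name and species) could be translated into a parameterized SQL statement along the following lines. This is a sketch under stated assumptions, not the PPD search script: the single `entries` table, its column names, the use of OR to combine amino acid criteria and the sample rows are all hypothetical, and sqlite3 stands in for postgreSQL.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple

# (amino acid, minimum pKa or None, maximum pKa or None)
Criterion = Tuple[str, Optional[float], Optional[float]]


def build_query(criteria: Sequence[Criterion],
                method: Optional[str] = None,
                protein: Optional[str] = None,
                species: Optional[str] = None) -> Tuple[str, List[object]]:
    """Return a parameterized SELECT and its parameters."""
    if not 1 <= len(criteria) <= 4:
        raise ValueError("between one and four amino acids are expected")
    clauses, params = [], []
    for amino_acid, low, high in criteria:
        parts = ["amino_acid = ?"]
        params.append(amino_acid)
        if low is not None:
            parts.append("pka >= ?")
            params.append(low)
        if high is not None:
            parts.append("pka <= ?")
            params.append(high)
        clauses.append("(" + " AND ".join(parts) + ")")
    # Assumption: an entry matches if it satisfies any one criterion.
    where = "(" + " OR ".join(clauses) + ")"
    # Column names come from this fixed tuple, never from user input.
    for column, value in (("method", method), ("protein", protein),
                          ("species", species)):
        if value is not None:
            where += f" AND {column} = ?"
            params.append(value)
    sql = ("SELECT protein, amino_acid, residue_no, pka FROM entries "
           f"WHERE {where} ORDER BY protein, residue_no")
    return sql, params


def demo_connection() -> sqlite3.Connection:
    """In-memory table filled with placeholder rows (not real PPD data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (protein TEXT, species TEXT, "
                 "amino_acid TEXT, residue_no INTEGER, pka REAL, method TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?, ?)", [
        ("protein_X", "species_Y", "Glu", 35, 6.1, "1H NMR"),
        ("protein_X", "species_Y", "Asp", 52, 3.7, "1H NMR"),
        ("protein_Z", "species_Y", "Glu", 7, 4.2, "13C NMR"),
    ])
    return conn


sql, params = build_query([("Glu", 4.0, 5.0)])
print(demo_connection().execute(sql, params).fetchall())
```

Passing user-supplied values as bound parameters rather than interpolating them into the SQL string is a standard precaution for a web-facing form.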

The alternative search interface is based on BLAST (25). A local database of protein sequences found in PPD was compiled from SwissProt (26) and an additional postgreSQL table was created to hold this data. The local database is searched using the NCBI BLASTP and BLASTX programs (25), allowing input of either protein or nucleotide sequences. The HTML front-end connects to a web server-based PL/CGI script which interacts with the BLASTP or BLASTX programs. The output contains links to PPD entries, which are created using SwissProt (26) accession codes.
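The BLAST workflow above can be sketched as three steps: choose a program for the input, run it against the local sequence database, and turn the hits into links. The sketch below is not the PPD PL/CGI script. It assumes the current NCBI BLAST+ command-line programs (`blastp`, `blastx`) with tabular output (`-outfmt 6`), assumes that the subject identifiers in the local database are bare SwissProt accession codes, and uses a hypothetical relative link format. It only builds the command and parses output text, so it runs without BLAST installed.

```python
from typing import List, NamedTuple

NUCLEOTIDE_LETTERS = set("ACGTUN")


def choose_program(sequence: str) -> str:
    """Crude heuristic: all-nucleotide letters -> blastx, otherwise blastp.

    A protein made up only of A, C, G, T and N would be misclassified.
    """
    letters = {c for c in sequence.upper() if c.isalpha()}
    if letters and letters <= NUCLEOTIDE_LETTERS:
        return "blastx"
    return "blastp"


def blast_command(sequence: str, query_path: str, db_path: str,
                  evalue: float = 1e-5) -> List[str]:
    """Build an argument list suitable for subprocess.run()."""
    return [choose_program(sequence), "-query", query_path, "-db", db_path,
            "-evalue", str(evalue), "-outfmt", "6"]


class Hit(NamedTuple):
    accession: str
    percent_identity: float
    evalue: float
    link: str


def parse_tabular(output: str) -> List[Hit]:
    """Parse default 12-column tabular BLAST output into Hit records."""
    hits = []
    for line in output.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        accession = fields[1]                  # sseqid column
        hits.append(Hit(accession,
                        float(fields[2]),      # pident column
                        float(fields[10]),     # evalue column
                        f"entry?accession={accession}"))  # hypothetical link
    return hits


print(blast_command("MKTAYIAKQR", "query.fasta", "ppd_sequences"))
```

Keying the links on accession codes mirrors the description in the text, where the output links to PPD entries via SwissProt accession codes.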

FUTURE WORK

There is an obvious need to extend the number of entries through continuous addition of data from new, and newly-identified, publications. The database also needs to be maintained, ensuring links to external databases remain current. Initially, as with all databases, random errors will occur owing to human error during data acquisition or will be extant within the original experimental data. The database will be assessed for errors and inconsistencies, thus maintaining, as far as possible, the overall veracity of our data. As mentioned, we have tried to maintain a high degree of accuracy, through rigorous data selection; however, user feedback will foster improvements. Moreover, feedback focussing on the search interfaces and the general infrastructure will allow us to develop appropriately both the database and its interface in an efficient and ergonomic manner.

DISCUSSION AND CONCLUSIONS

The PPD is a unique compilation of protein pKa values sourced from experimental data only. PPD is novel: no database of its kind currently exists. Compared with other post-genomic databases, the size of PPD is limited, but this reflects its highly focused nature: the burgeoning of such focussed databases is a continuing trend in modern bioinformatics (27,28). The relatively modest size of the database will increase as new data are published. Access to PPD data is given through an interface available via the world wide web and includes both a BLAST search and an amino acid query search system. The BLAST search, which is linked to pKa entries and external databases, allows PPD to be a cohesive and integrated source of protein information. PPD facilitates data-driven in silico prediction methods addressing the relationship between ionizable groups and protein function, be that protein–protein interaction, protein folding or enzyme catalysis. A brief summary of pKa data for each amino acid is shown in Table 3, which also includes both the mean and SD of the corresponding measured pKa values. From the PPD data, we have shown the distribution of pKa values for the six most frequent residues: glutamic acid, lysine, tyrosine, aspartic acid, histidine and cysteine (Figure 2). Certain residues (aspartate, glutamate, lysine and histidine) have pKa values which show relatively narrow distributions, while other residues (cysteine and tyrosine) show a wider dispersion of values; however, this may only be a reflection of the amount of data available for these residues. While it is clear that mean values closely approximate model values, the corresponding SDs are high, reflecting the wide distribution of ionization states in actual proteins. Aspartate, for example, has a mean pKa of 3.6 versus a model value of 4.0, yet the SD is 1.4. As the data for each residue increase, trends in residue-specific pKa data will become more evident and more certain.
Table 3

pKa data associated with each amino acid

Amino acids
    Residue              Asp     Cys     Glu     His     Lys     Tyr     N-terminus   C-terminus
    Number of entries    282     25      297     404     207     65      26           38
    Mean pKa             3.6     6.87    4.29    6.33    10.45   9.61    8.71         3.19
    SD                   1.43    2.61    1.05    1.35    1.19    2.16    1.49         0.76
Figure 2

Distribution pattern of pKa values. Each column represents a count of pKa values for the specified amino acid and pKa.
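The per-residue summary in Table 3 and the counts plotted in Figure 2 correspond to simple grouped statistics. The following sketch shows one way such a summary could be computed. It is an illustration only: the function names are hypothetical, the sample standard deviation is an assumption (the text does not say whether a sample or population SD was used), the bin width is arbitrary, and the input values are placeholders rather than PPD data. The model pKa values are those quoted in the Introduction.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, stdev

# Model side-chain values quoted in the Introduction.
MODEL_PKA = {"Asp": 4.0, "Arg": 12.0}


def summarise(entries):
    """Map each amino acid to (count, mean pKa, sample SD)."""
    grouped = defaultdict(list)
    for amino_acid, pka in entries:
        grouped[amino_acid].append(pka)
    summary = {}
    for amino_acid, values in grouped.items():
        # stdev() needs at least two values; report 0.0 for a single entry.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[amino_acid] = (len(values), mean(values), sd)
    return summary


def histogram(values, bin_width=1.0):
    """Count values per bin, keyed by the lower edge of each bin."""
    return Counter(math.floor(v / bin_width) * bin_width for v in values)


# Placeholder measurements, not taken from PPD.
sample = [("Asp", 3.0), ("Asp", 5.0), ("His", 6.5)]
for amino_acid, (n, avg, sd) in summarise(sample).items():
    shift = avg - MODEL_PKA[amino_acid] if amino_acid in MODEL_PKA else None
    print(amino_acid, n, round(avg, 2), round(sd, 2), shift)
```

Comparing each mean with its model value, as in the last loop, is the same comparison the text makes for aspartate.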

In recent years, there has been an impetus to accumulate data on all scales from the atomic to the genomic; this has led to a rapid increase in the number of databases. Databases are increasingly forming the backbone of science in general and post-genomic biology in particular. PPD v1.0 was developed to provide an easily accessible compilation of protein pKa values. Despite the small size of PPD, the data it contains have utility across many different disciplines and, we hope, the database will grow over time into a comprehensive protein pKa resource.
  27 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  The ionization of a buried glutamic acid is thermodynamically linked to the stability of Leishmania mexicana triose phosphate isomerase.

Authors:  A M Lambeir; J Backmann; J Ruiz-Sanz; V Filimonov; J E Nielsen; I Kursula; B V Norledge; R K Wierenga
Journal:  Eur J Biochem       Date:  2000-05

3.  Insensitivity of perturbed carboxyl pK(a) values in the ovomucoid third domain to charge replacement at a neighboring residue.

Authors:  W R Forsyth; A D Robertson
Journal:  Biochemistry       Date:  2000-07-11       Impact factor: 3.162

4.  Electrostatic contributions to protein-protein interactions: fast energetic filters for docking and their physical basis.

Authors:  R Norel; F Sheinerman; D Petrey; B Honig
Journal:  Protein Sci       Date:  2001-11       Impact factor: 6.725

5.  The pKa of His-24 in the folding transition state of apomyoglobin.

Authors:  M Jamin; B Geierstanger; R L Baldwin
Journal:  Proc Natl Acad Sci U S A       Date:  2001-05-15       Impact factor: 11.205

6.  Mechanistic roles of Thr134, Tyr160, and Lys 164 in the reaction catalyzed by dTDP-glucose 4,6-dehydratase.

Authors:  B Gerratana; W W Cleland; P A Frey
Journal:  Biochemistry       Date:  2001-08-07       Impact factor: 3.162

7.  Continuum electrostatic analysis of irregular ionization and proton allocation in proteins.

Authors:  Assen Koumanov; Heinz Rüterjans; Andrey Karshikoff
Journal:  Proteins       Date:  2002-01-01

8.  Ionization properties of titratable groups in ribonuclease T1. I. pKa values in the native state determined by two-dimensional heteronuclear NMR spectroscopy.

Authors:  N Spitzner; F Löhr; S Pfeiffer; A Koumanov; A Karshikoff; H Rüterjans
Journal:  Eur Biophys J       Date:  2001-07       Impact factor: 1.733

9.  Ionization properties of titratable groups in ribonuclease T1. II. Electrostatic analysis.

Authors:  A Koumanov; N Spitzner; H Rüterjans; A Karshikoff
Journal:  Eur Biophys J       Date:  2001-07       Impact factor: 1.733

10.  Empirical relationships between protein structure and carboxyl pKa values in proteins.

Authors:  William R Forsyth; Jan M Antosiewicz; Andrew D Robertson
Journal:  Proteins       Date:  2002-08-01
