
PCDDB: new developments at the Protein Circular Dichroism Data Bank.

Lee Whitmore, Andrew John Miles, Lazaros Mavridis, Robert W Janes, B A Wallace.

Abstract

The Protein Circular Dichroism Data Bank (PCDDB) has been in operation for more than 5 years as a public repository for archiving circular dichroism spectroscopic data and associated bioinformatics and experimental metadata. Since its inception, many improvements and new developments have been made in data display, searching algorithms, data formats, data content, auxiliary information, and validation techniques, as well as, of course, an increase in the number of holdings. It provides a site (http://pcddb.cryst.bbk.ac.uk) for authors to deposit experimental data as well as detailed information on methods and calculations associated with published work. It also includes links for each entry to bioinformatics databases. The data are freely available to accessors either as single files or as complete data bank downloads. The PCDDB has found broad usage by the structural biology, bioinformatics, analytical and pharmaceutical communities, and has formed the basis for new software and methods developments.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2016        PMID: 27613420      PMCID: PMC5210608          DOI: 10.1093/nar/gkw796

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The Protein Circular Dichroism Data Bank (PCDDB) (1) is an open access data bank for the deposition and dissemination of circular dichroism (CD) spectra and synchrotron radiation circular dichroism (SRCD) spectra and metadata. The data bank was created in 2009 (first as an accession-only data bank, and later as a deposition data bank), and has been in continuous use since then. Accessors of the data do not have to register to access or download files, although depositors must register (as in other deposition data banks) to ensure good practice and traceability of the data. Any user can make an account for the purposes of logging repeat searches, saving subsets of files, or receiving notification of content enhancements. The PCDDB entries include spectral data (including both raw and final processed CD spectra, and associated HT/HV curves (which are, effectively, pseudo-absorbance spectra)), detailed information on the sample and experimental conditions used to obtain the spectra, and links to related bioinformatics databases, including the Protein Data Bank (PDB) (2), the UniProt sequence database (3), the CATH protein structure classification system (4), and the Enzyme Classification (EC) database (5), where available. Plots of the spectra (Figure 1) are available for display online. Entries may also include annotations in the ‘keywords’ which can provide useful identifiers; for example, ‘SMP180’ which indicates the spectrum is a component of the membrane protein reference dataset (6) used in empirical secondary structure analyses. The accession ID system allows for grouping of related spectra, such as thermal melt series of spectra undertaken for stability studies. Entries include information on the associated publication describing the work (citation to the original work is required of any user of the data). Each entry includes a validation section, as a guide to the data quality for both the depositor and user. 
The validation takes the form of a series of individual tests on the data and metadata, as well as providing an overall validation level. Individual entries can be downloaded in several ASCII formats, or the entire contents of the database can be downloaded as a single compressed archive file. The entries now include collections of proteins that cover both a wide range of secondary structures and folds (7), as well as more specialised types (such as membrane proteins (6) and beta-sheet rich proteins (8)), in addition to individual spectra of folded and unfolded proteins.
Figure 1.

Plots of overlaid selected circular dichroism spectra from the PCDDB, including (upper right hand corner) their PCDDB ID codes.

A number of individual journals (Biochemistry, Biophysical Journal), publishers (Nature Publishing Group and PLOS journals), research councils, and funding agencies (e.g., http://www.bbsrc.ac.uk/funding/apply/application-guidance/justification-resources/resources/) suggest that PCDDB depositions be made in association with their publications or funding. The PCDDB is an approved component member of biosharing.org (https://biosharing.org/biodbcore-000613). The records within the data bank include a high proportion of SRCD spectra, a direct consequence of SRCD beamline scientists (and synchrotrons) encouraging their users to deposit data (9). It is noteworthy that the PCDDB remains the only publicly available database for protein circular dichroism spectra.

NEW FEATURES

The features of the PCDDB were described in the initial report of its release (1). Since that time, a number of new or modified features, procedures, and associated materials have been developed based on user and developer suggestions, in order to make the resource more complete, user-friendly, and informative. These new features are described below.

Entry naming formats

The naming convention for PCDDB entries has been updated (on the basis of a recommendation from the International Scientific Advisory Board): entries now begin with the initial letters CD (i.e. CDxxxxxxxxxx), to signify their association with this data bank. The letters are followed by 10 digits, which can be thought of as, sequentially, a seven-digit main code, a one-digit revision number and a two-digit series number. Grouped spectra (e.g. a thermal melt series) may therefore all share the same seven-digit main code and one-digit revision number but have sequential two-digit series numbers, so that they are recognisable as part of related experiments. Old codes are aliased to these new ones in the searching facility.
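The layout of the new identifiers can be illustrated with a short Python sketch. It assumes only the structure described above (the letters CD, then a seven-digit main code, a one-digit revision number and a two-digit series number). The function names and the example identifiers are hypothetical and are not taken from the PCDDB software or its holdings.

```python
import re
from typing import NamedTuple


class PcddbId(NamedTuple):
    main_code: str  # seven-digit main code (kept as text to preserve leading zeros)
    revision: int   # one-digit revision number
    series: int     # two-digit series number


# "CD" followed by exactly 10 digits, split 7 + 1 + 2.
_ID_PATTERN = re.compile(r"CD(\d{7})(\d)(\d{2})")


def parse_pcddb_id(entry_id: str) -> PcddbId:
    """Split a new-style identifier into its three numeric parts."""
    match = _ID_PATTERN.fullmatch(entry_id.strip().upper())
    if match is None:
        raise ValueError(f"not a new-style PCDDB ID: {entry_id!r}")
    main_code, revision, series = match.groups()
    return PcddbId(main_code, int(revision), int(series))


def same_group(id_a: str, id_b: str) -> bool:
    """True if two IDs share main code and revision (e.g. one thermal melt series)."""
    a, b = parse_pcddb_id(id_a), parse_pcddb_id(id_b)
    return (a.main_code, a.revision) == (b.main_code, b.revision)


# Made-up identifiers, used only to show the grouping logic.
print(parse_pcddb_id("CD0001234001"))
print(same_group("CD0001234001", "CD0001234002"))
```

Keeping the main code as a string avoids losing leading zeros; old-style codes are deliberately rejected here, since their format is not described in this report.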

New database fields

Additional database fields have been added: the sample supplier (person or company), mutation details (compared to wildtype), continuous or stepped scan (a relevant experimental detail), and final spectrum calibrated (to determine what processing has been applied to the spectra). These fields were added to improve data traceability and enable reproducibility. As both conventional CD and SRCD spectra may be deposited, additional fields specific for SRCD data collection have also been added.

Deposition improvements

Depositors are now sent notifications of release, and periodic reminders of entries that are still in pre-release. If depositions are released prior to publication, updates can be made to the entry to note the citing publication by contacting the PCDDB via email (PCDDB@mail.cryst.bbk.ac.uk). To make depositing multiple similar files easier and faster, depositors may now use previous entries as templates (to be modified rather than re-created). To aid new depositors, tutorials about deposition, both online (http://www.youtube.com/user/ThePcddb) and in print (10), have been created.

Validation

Each entry includes an online validation report, and following experience gained in analysing depositions, the validation procedures have been improved and extended. A validation report is provided to the depositor prior to the release of the entry so that they may assess the quality and completeness of their deposition, and make any changes/additions they deem necessary prior to release. Validation reports include flag (minor issues) or fail (major issues) tags for individual items and overall, and a completeness label. The PCDDB also provides the facility for publication reviewers (with permission) to anonymously access unreleased entries as a guide when considering a paper for publication. Alternatively, depositors can provide journals with the validation report (in .pdf format) created as part of the deposition process, as additional data for reviewing purposes.
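How per-item tags might roll up into an overall tag can be sketched as follows. The rule used here (the overall status is the most severe status among the individual items) is an assumption made for illustration; the report does not state how the PCDDB derives its overall validation level, and the item names are hypothetical.

```python
from enum import IntEnum
from typing import Mapping


class Status(IntEnum):
    PASS = 0
    FLAG = 1  # minor issue
    FAIL = 2  # major issue


def overall_status(item_results: Mapping[str, Status]) -> Status:
    """Assumed rule: the overall tag is the worst tag among the individual tests."""
    return max(item_results.values(), default=Status.PASS)


# Hypothetical test names; the real validation tests are not listed in this report.
report = {
    "wavelength_range": Status.PASS,
    "ht_cutoff": Status.FLAG,
    "metadata_complete": Status.PASS,
}
print(overall_status(report).name)
```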

Enhanced searching

The database metadata can be searched using a number of parameters (Table 1 lists the current and new search terms and parameters), including protein names, source organism, a range of experimental parameters, and bioinformatics information (including PDB IDs, UniProt codes, EC number, and CATH class), with the option of excluding or including entries that have ‘failed’ or ‘flagged’ validation status. It can now also be searched for proteins with specific ranges of secondary structure types (for example, >50% helix). Keywords can be searched for protein type (e.g., membrane protein) and for whether the entry is included in a CD analysis reference database (such as SP175 (7) or SMP180 (6)). Additionally, a novel facility for searching the spectral data using a spectrum as the input has been added; this enables identification of related spectra, which may have value in studies seeking to identify structurally related proteins. Spectral matching relevant to circular dichroism is not necessarily as straightforward as ranking the potential matches by minimal RMSD differences, so several search methods are enabled, following the concepts previously described for the DichroMatch web server (11). Finally, the database may now be queried by deposition date (before, after, or between specific dates), which can help accessors identify new entries.
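For orientation, the simplest matching scheme mentioned above, ranking candidates by RMSD from a query spectrum, can be sketched in Python. This is only the naive baseline that the text says is not always sufficient; it is not the DichroMatch method or the PCDDB search code. Spectra are assumed to be dictionaries mapping wavelength in nm to a CD value in consistent units, and all names and numbers are invented.

```python
import math
from typing import Dict, List, Tuple

Spectrum = Dict[int, float]  # wavelength (nm) -> CD value, same units throughout


def rmsd(spec_a: Spectrum, spec_b: Spectrum) -> float:
    """Root-mean-square difference over the wavelengths both spectra share."""
    shared = sorted(set(spec_a) & set(spec_b))
    if not shared:
        raise ValueError("spectra share no wavelengths")
    total = sum((spec_a[w] - spec_b[w]) ** 2 for w in shared)
    return math.sqrt(total / len(shared))


def rank_by_rmsd(query: Spectrum, library: Dict[str, Spectrum]) -> List[Tuple[str, float]]:
    """Return (entry label, RMSD) pairs, closest match first."""
    scores = [(label, rmsd(query, spec)) for label, spec in library.items()]
    return sorted(scores, key=lambda pair: pair[1])


# Toy numbers, not real spectra.
query = {200: 1.0, 210: -2.0}
library = {"A": {200: 1.0, 210: -2.0}, "B": {200: 4.0, 210: 2.0}}
ranked = rank_by_rmsd(query, library)
print(ranked[0][0])
```

RMSD is sensitive to differences in magnitude (for example from concentration or pathlength errors) as well as in shape, which is one reason a single RMSD ranking may not suffice.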
Table 1.

Popular search parameters (all text fields support * wildcards, and most values include options for greater than, less than, or between)

Identity: PCDDBID; Deposition date (YYYYMMDD); Protein name
Sample characteristics: Source organism; Expression system; Molecular weight (Da)
Spectral/experimental characteristics: Minimum wavelength (nm); Protein concentration, min or max (mg/ml); Instrument/synchrotron; Temperature (°C); Protein purity (%)
Sample secondary structures: Alpha helix (%); 3–10 helix (%); Pi helix (%); Beta strand (%); Beta bridge (%); Hydrogen-bonded bend (%); Hydrogen-bonded turn (%); Irregular (%)
Bioinformatics information: Keyword/phrase; PDB ID; UniProt ID; Enzyme Classification (EC) number; CATH classification
Deposition information: Depositor/principal investigator name; Depositor address
Utility: Show all entries; Show all entries with a PDB record
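The kinds of query listed in Table 1 (a * wildcard on text fields, and greater than, less than, or between on numeric fields and dates) can be mimicked on downloaded metadata with a small Python sketch. This is not the PCDDB search implementation. The record layout, field names, case-insensitive matching and inclusive bounds are all assumptions, and the records are invented.

```python
import re
from typing import List, Optional


def text_matches(value: str, pattern: str) -> bool:
    """True if value matches pattern, where * stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.IGNORECASE) is not None


def in_range(value: float, minimum: Optional[float] = None,
             maximum: Optional[float] = None) -> bool:
    """Inclusive bounds; omit one bound for 'greater than' or 'less than' style queries."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def search(records: List[dict], name_pattern: str = "*",
           min_helix: Optional[float] = None, max_helix: Optional[float] = None,
           date_from: Optional[int] = None, date_to: Optional[int] = None) -> List[str]:
    """Return the ids of records that satisfy every supplied criterion."""
    hits = []
    for rec in records:
        date = int(rec["deposition_date"])  # YYYYMMDD sorts correctly as an integer
        if (text_matches(rec["protein_name"], name_pattern)
                and in_range(rec["alpha_helix"], min_helix, max_helix)
                and in_range(date, date_from, date_to)):
            hits.append(rec["id"])
    return hits


# Invented records for illustration only; not real PCDDB entries.
RECORDS = [
    {"id": "entry-1", "protein_name": "Example helical protein",
     "alpha_helix": 62.0, "deposition_date": "20120514"},
    {"id": "entry-2", "protein_name": "Example sheet protein",
     "alpha_helix": 5.0, "deposition_date": "20150930"},
]
print(search(RECORDS, name_pattern="*helical*", min_helix=50))
```

The wildcard is translated by hand rather than with a glob library so that only * is treated as special, matching the description in the table caption.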

Enhanced operations on search results and enhanced plots

Search results can now be selected for inclusion in subsets via checkboxes. These subsets can be downloaded, and if the user is logged in (an option for accessors), they can be saved as lists for future use. Online plots are not intended to be of publication quality, since users can download the spectral files for production purposes; nevertheless, it was recognised that the ability to compare spectra of several proteins easily online was important. Hence, plotting spectra of multiple entries in the subset list on a single figure is now possible (Figure 1).

Associated tools/software

The main page of the site includes links to associated analytical tools such as the DichroWeb (12,13), 2Struc (14) and DichroMatch (11) analysis websites. There is also a link to the ValiDichro standalone data testing website (http://valispec.cryst.bbk.ac.uk/circularDichroism/ValiDichro/upload.html) (15), which enables users to perform validation analyses equivalent to the data checking software utilised in the curation of PCDDB entries (but in this case, not requiring deposition), as an aid to experimental good practice.

Associated YouTube informational videos on the ‘PCDDB Channel’

Instructional videos have been created to provide additional related information for users of the PCDDB. The channel may be accessed from the PCDDB home page or from the YouTube icon on the footer of all pages. The videos now include the following topics: An Introduction to the PCDDB, and information on how to make depositions (https://www.youtube.com/watch?v=NTblyIhwjog), information on analysing CD spectra using the DichroWeb server (https://www.youtube.com/watch?v=QZat_Wr2NGM), how to calibrate CD spectra (https://www.youtube.com/watch?v=ovY6yVxw-tI), how to calibrate CD instruments (https://www.youtube.com/watch?v=PEIDelWvSsg), how to calibrate the pathlengths of CD sample cells (https://www.youtube.com/watch?v=PEIDelWvSsg), and how to clean and load CD cells (https://www.youtube.com/watch?v=OhD50eiLzWI). Users are invited to submit suggestions for additional informational videos (to PCDDB@mail.cryst.bbk.ac.uk).

Information pages

A glossary of terms commonly used in the field has been added and may be accessed via the homepage. Software updates are noted on the ‘Version History’ page (http://pcddb.cryst.bbk.ac.uk/verhist.php), which is part of the ‘about’ section and may be of particular use to software developers who access this database programmatically.

Spectrum of the Month feature

To recognise the contributions of our depositors, a selected ‘Featured Spectrum of the Month’ is displayed on the front page, with links to its entry. Featured spectra include both new and existing entries, along with graphic illustrations of their crystal structures (when available) from associated PDB codes. The selected spectra are chosen based on their having interesting spectral features that may be novel or representative of new classes of entries, or when a protein is highly accessed due to popular interest in its structure.

Bulletins

Below the ‘Featured Spectrum’, notices of relevant meetings and workshops are posted (suggestions from external users may be offered by contacting the PCDDB developer). Software and database holding updates are listed on the left-hand ‘Information’ panel.

USES

As a resource for structural biology, biochemistry and bioinformatics, the PCDDB has been accessed both by users obtaining individual spectra for specific applications and also as complete (or selected component) downloads. A number of the types of applications for the PCDDB data that were envisioned in the initial report describing the development of the PCDDB (1) have now been realised. Individual files have been used for biochemical/structural biology studies of specific and related proteins, including comparisons with spectra of other proteins of known structure (16,17), comparisons of environmental effects on protein structure, where the structure of a given protein is known only in a single environment (crystal) (18), and using spectra and calculated secondary structure (and also thermal melt profiles) to identify the structure and stability of a protein as an aid to crystallisation studies (19). As complete or selected subsets of downloads, they have been used for new method developments (both bioinformatics and physical methods), including: developing and/or testing new algorithms for secondary structure determination (8,20–22), using the ratio of the 222 and 200 nm peaks as a measure of identifying folded vs. unfolded proteins (23), proposing a new method for determining membrane protein helical content from the spectral slope between 230–240 nm (24), and testing of calculations of vibronic structure contributions to far UV CD spectra (25). Based on the large number of downloads that have been undertaken thus far, it is expected that many additional uses will be identified by users in the future.
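Two of the spectral quantities mentioned above, the ratio of the signals at 222 and 200 nm and the slope of the spectrum between 230 and 240 nm, are simple to compute from a downloaded spectrum. The Python sketch below shows only that arithmetic; it does not reproduce the cited methods (23,24), whose thresholds and calibrations are not given in this report. The spectrum is assumed to be a dictionary of wavelength (nm) to CD value, and the numbers are invented.

```python
from typing import Dict

Spectrum = Dict[int, float]  # wavelength (nm) -> CD value


def peak_ratio(spectrum: Spectrum, numerator_nm: int = 222,
               denominator_nm: int = 200) -> float:
    """Ratio of the CD values at two wavelengths that are present in the spectrum."""
    return spectrum[numerator_nm] / spectrum[denominator_nm]


def spectral_slope(spectrum: Spectrum, start_nm: int = 230, end_nm: int = 240) -> float:
    """Least-squares slope (CD units per nm) over the given wavelength window."""
    points = [(w, v) for w, v in spectrum.items() if start_nm <= w <= end_nm]
    if len(points) < 2:
        raise ValueError("need at least two points in the window")
    n = len(points)
    mean_w = sum(w for w, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    numerator = sum((w - mean_w) * (v - mean_v) for w, v in points)
    denominator = sum((w - mean_w) ** 2 for w, _ in points)
    return numerator / denominator


# Toy values, not a real spectrum.
toy = {200: -4.0, 222: -8.0, 230: -5.0, 235: -2.5, 240: 0.0}
print(peak_ratio(toy), spectral_slope(toy))
```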

FUTURE DEVELOPMENTS

Our aim is to expand the PCDDB holdings beyond traditional solution circular dichroism spectroscopy on proteins, by including entries in the rapidly growing spectral areas of oriented CD (oCD) and oriented SRCD (oSRCD) (26). The data bank will also be expanded to include other sample types, including nucleic acids (both RNAs and DNAs) and peptides. These will require the addition of new fields in the spectral parameters, different fields in the sample characteristics and bioinformatics links, and different validation procedures (likely including machine learning to identify parameter outliers). As a result, we will be adding new experts in these areas to our advisory boards.

CONCLUSIONS

The PCDDB is an ongoing resource for the deposition and dissemination of CD spectral data and associated metadata of proteins. Its entries are linked to a range of other bioinformatics resources such as the Protein Data Bank, the CATH protein classification database, the EC database, and sequence databases. New developments are constantly being added (often in response to users’ requests made via the contact email PCDDB@mail.cryst.bbk.ac.uk). It has had >1 million files downloaded, and has been accessed by >10,000 unique users in the ∼5 years since its inception, and has been used for individual structural biology/biochemistry studies as well as for new bioinformatics methods developments.
  25 in total

1.  Thermodynamic dissection of the intrinsically disordered N-terminal domain of human glucocorticoid receptor.

Authors:  Jing Li; Hesam N Motlagh; Carolyn Chakuroff; E Brad Thompson; Vincent J Hilser
Journal:  J Biol Chem       Date:  2012-06-04       Impact factor: 5.157

2.  IntEnz, the integrated relational enzyme database.

Authors:  Astrid Fleischmann; Michael Darsow; Kirill Degtyarenko; Wolfgang Fleischmann; Sinéad Boyce; Kristian B Axelsen; Amos Bairoch; Dietmar Schomburg; Keith F Tipton; Rolf Apweiler
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  DICHROWEB, an online server for protein secondary structure analyses from circular dichroism spectroscopic data.

Authors:  Lee Whitmore; B A Wallace
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

4.  Methamphetamine binds to α-synuclein and causes a conformational change which can be detected by nanopore analysis.

Authors:  Omid Tavassoly; Jeremy S Lee
Journal:  FEBS Lett       Date:  2012-07-04       Impact factor: 4.124

5.  Accurate secondary structure prediction and fold recognition for circular dichroism spectroscopy.

Authors:  András Micsonai; Frank Wien; Linda Kernya; Young-Ho Lee; Yuji Goto; Matthieu Réfrégiers; József Kardos
Journal:  Proc Natl Acad Sci U S A       Date:  2015-06-02       Impact factor: 11.205

6.  CAPITO--a web server-based analysis and plotting tool for circular dichroism data.

Authors:  Christoph Wiedemann; Peter Bellstedt; Matthias Görlach
Journal:  Bioinformatics       Date:  2013-05-15       Impact factor: 6.937

7.  Supersaturation-limited amyloid fibrillation of insulin revealed by ultrasonication.

Authors:  Hiroya Muta; Young-Ho Lee; József Kardos; Yuxi Lin; Hisashi Yagi; Yuji Goto
Journal:  J Biol Chem       Date:  2014-05-20       Impact factor: 5.157

8.  Vibronic structure in the far-UV electronic circular dichroism spectra of proteins.

Authors:  Zhuo Li; David Robinson; Jonathan D Hirst
Journal:  Faraday Discuss       Date:  2015       Impact factor: 4.008

9.  PCDDB: the Protein Circular Dichroism Data Bank, a repository for circular dichroism spectral and metadata.

Authors:  Lee Whitmore; Benjamin Woollett; Andrew John Miles; D P Klose; Robert W Janes; B A Wallace
Journal:  Nucleic Acids Res       Date:  2010-11-11       Impact factor: 16.971

10.  ValiDichro: a website for validating and quality control of protein circular dichroism spectra.

Authors:  Benjamin Woollett; Lee Whitmore; Robert W Janes; B A Wallace
Journal:  Nucleic Acids Res       Date:  2013-04-26       Impact factor: 16.971

