The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data.

Helen Berman, Kim Henrick, Haruki Nakamura, John L Markley.

Abstract

The worldwide Protein Data Bank (wwPDB) is the international collaboration that manages the deposition, processing and distribution of the PDB archive. The online PDB archive is a repository for the coordinates and related information for more than 38 000 structures, including proteins, nucleic acids and large macromolecular complexes that have been determined using X-ray crystallography, NMR and electron microscopy techniques. The founding members of the wwPDB are RCSB PDB (USA), MSD-EBI (Europe) and PDBj (Japan) [H.M. Berman, K. Henrick and H. Nakamura (2003) Nature Struct. Biol., 10, 980]. The BMRB group (USA) joined the wwPDB in 2006. The mission of the wwPDB is to maintain a single archive of macromolecular structural data that are freely and publicly available to the global community. Additionally, the wwPDB provides a variety of services to a broad community of users. The wwPDB website at http://www.wwpdb.org/ provides information about services provided by the individual member organizations and about projects undertaken by the wwPDB.

Year: 2006 PMID: 17142228 PMCID: PMC1669775 DOI: 10.1093/nar/gkl971

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

HISTORY AND BACKGROUND

The Protein Data Bank (PDB) was founded in 1971 to provide a repository for three-dimensional (3D) structure data of experimentally determined biological macromolecules (1–3). The PDB archive contains 3D coordinate data, information about the chemical content such as polymer sequence and ligand chemistry, information about the experiment used to derive the structure and some qualitative descriptions of the structure. When the PDB was in its infancy, the archive contained seven structures stored as loosely structured free text. Today, the PDB archive contains close to 40 000 structures and relies upon strict ontologies that define the content of these entries. The data contained in the PDB are generated and submitted by scientists from around the globe to sites in the United States, Europe and Asia. The worldwide PDB (wwPDB) was established in 2003 to formally recognize the international nature of the PDB archive (2,4) and to ensure that the data files remain uniform in content and format. The founding members are the RCSB PDB (USA) (1), the Macromolecular Structure Database at the European Bioinformatics Institute (MSD-EBI) (5) and the Protein Data Bank Japan (PDBj) at Osaka University. These wwPDB sites share responsibility for data deposition, processing and distribution of the PDB archive, and agree to support a single, standardized archive of structural data (Table 1). The BioMagResBank (BMRB) at the University of Wisconsin-Madison (USA) (6) became a member in 2006 and will serve as a deposition site for both primary experimental data and PDB data.

Table 1

wwPDB Data deposition and access sites

Site	Access PDB FTP	Deposit data	Main website
RCSB PDB
MSD-EBI
PDBj
BMRB

A wwPDB Advisory Committee (wwPDBAC) consists of representatives appointed by each member site as well as representatives of the international X-ray, NMR and electron microscopy (EM) communities. The wwPDBAC meets yearly and advises on policies governing the content, format and distribution of the PDB data files. The wwPDB website (http://www.wwpdb.org/) contains the formal agreement for the operation of the wwPDB organization, links to the deposition and access sites, and news and announcements about policies and projects related to the wwPDB.

MEMBER DEPOSITION SITES

Advances in every step from protein cloning, expression, labeling and purification through to structure determination have resulted in a rapid increase in the rate at which new protein structures are determined. Progress is also being made in structure determinations of nucleic acids, particularly RNA molecules. A key feature of the wwPDB is that its tools can efficiently capture and curate data even as the amount deposited grows exponentially (Table 1). Although the sites are physically dispersed and use three different tools for data capture and processing (ADIT, ADIT-NMR and AutoDep), all the data are annotated and processed using common standards. To ensure that the core data are represented uniformly, the wwPDB sites actively collaborate to exchange core reference information (e.g. the dictionary description for ligands) and to ensure that standard practices are followed. The annotators at all sites communicate daily by video teleconference and email and also exchange visits; they are currently extending and updating the annotation manuals, which will be made publicly available. Every week, the data processed at each site are forwarded to the RCSB PDB for inclusion in the archive. At present, the RCSB PDB is the archive keeper and as such has sole write access to the PDB archive. Statistics about the PDB structures deposited and processed by the wwPDB are available online and are summarized in Tables 2 and 3.

Table 2

PDB structures deposited and processed by year and site (as of August 28, 2006)

Year	Total depositions	Deposited to RCSB PDB	Deposited to PDBj	Deposited to EBI	Processed by RCSB PDB	Processed by PDBj	Processed by EBI
2000	2983	2445	10	528	2294	161	528
2001	3286	2673	118	495	2407	384	495
2002	3563	2769	289	505	2401	657	505
2003	4830	3488	673	669	3135	1026	669
2004	5508	3796	900	812	3083	1613	812
2005	6677	4506	1166	1005	3562	2110	1005
2006	4728	3239	725	764	2659	1305	764
Total	31 575	22 916	3881	4778	19 545	7252	4778
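Each row of Table 2 splits the same yearly total two ways: by the site that received the deposition and by the site that processed it. The short Python sketch below transcribes the rows (the data structure and function names are illustrative, not part of any wwPDB tool) and checks that both splits add up to the total, then computes how much each site processed relative to what it received. In aggregate, the table implies that PDBj processed more entries than it received in every year, RCSB PDB correspondingly fewer, and EBI exactly what it received.

```python
# Rows transcribed from Table 2 (counts as of August 28, 2006; 2006 is a partial year).
# Each value is (total, deposited_to, processed_by); the inner tuples follow the
# table's column order (RCSB PDB, PDBj, EBI).
TABLE2 = {
    2000: (2983, (2445, 10, 528), (2294, 161, 528)),
    2001: (3286, (2673, 118, 495), (2407, 384, 495)),
    2002: (3563, (2769, 289, 505), (2401, 657, 505)),
    2003: (4830, (3488, 673, 669), (3135, 1026, 669)),
    2004: (5508, (3796, 900, 812), (3083, 1613, 812)),
    2005: (6677, (4506, 1166, 1005), (3562, 2110, 1005)),
    2006: (4728, (3239, 725, 764), (2659, 1305, 764)),
}
SITES = ("RCSB PDB", "PDBj", "EBI")


def inconsistent_years(table):
    """Years whose per-site deposition or processing counts do not sum to the total."""
    return [year for year, (total, dep, proc) in table.items()
            if sum(dep) != total or sum(proc) != total]


def processing_share(table, site):
    """Fraction of each year's depositions that the given site processed."""
    idx = SITES.index(site)
    return {year: proc[idx] / total for year, (total, _dep, proc) in table.items()}


def net_transfer(table, site):
    """Processed minus deposited at a site; a positive value means the site
    processed entries that were deposited at another site."""
    idx = SITES.index(site)
    return {year: proc[idx] - dep[idx] for year, (_total, dep, proc) in table.items()}


if __name__ == "__main__":
    print("inconsistent rows:", inconsistent_years(TABLE2))
    shares = processing_share(TABLE2, "PDBj")
    transfers = net_transfer(TABLE2, "PDBj")
    for year in TABLE2:
        print(f"{year}: PDBj processed {shares[year]:.1%}, net transfer {transfers[year]:+d}")
```

On these figures, PDBj's share of processing rises from about 5% of depositions in 2000 to about 32% in 2005. The 2006 row covers only part of the year, so its absolute counts are not comparable with earlier years, although the shares are.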

Table 3

PDB structures released per year (experimentally solved structures only, as of August 28, 2006)

Year	Total
2000	2632
2001	2840
2002	3018
2003	4185
2004	5230
2005	5421
2006	4154
Total	27 480
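One simple way to put a number on the growth described above is a compound annual growth rate: the constant yearly rate r for which the 2000 release count, grown by (1 + r) each year, reaches the 2005 count. The sketch below applies it to Table 3 and leaves out 2006 because that row covers only January to August 28. A constant rate is a modelling assumption; the actual yearly increments are uneven (for example, releases rose by only 191 from 2004 to 2005).

```python
import math

# Experimentally solved structures released per year, transcribed from Table 3.
# 2006 is excluded because it covers only January to August 28, 2006.
RELEASES = {2000: 2632, 2001: 2840, 2002: 3018, 2003: 4185, 2004: 5230, 2005: 5421}


def compound_growth_rate(counts, start, end):
    """Constant yearly rate r with counts[start] * (1 + r) ** (end - start) == counts[end]."""
    return (counts[end] / counts[start]) ** (1 / (end - start)) - 1


def doubling_time(rate):
    """Years needed to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    r = compound_growth_rate(RELEASES, 2000, 2005)
    print(f"{r:.1%} per year; doubling time {doubling_time(r):.1f} years")
```

For the 2000 to 2005 window this works out to roughly 15.5% per year, or a doubling time of a little under five years.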

DATA ACCESS: MEMBER FTP AND WEBSITES

The ‘PDB archive’ is the collection of flat files that are maintained in three different formats: the legacy PDB file format; the PDB exchange format, which follows the mmCIF syntax; and the PDBML/XML format (7), which is a direct translation of the PDB exchange format. Each wwPDB site distributes the same PDB archive via FTP, and the archive is updated weekly. Each year, a time-stamped snapshot of the PDB archive is added to a separate snapshot site; these snapshots provide a frozen copy of the archive as it appeared at that time for research and historical purposes. The most recent snapshot was added in January 2006. It includes the 34 421 experimentally determined coordinate files that were current (i.e. not obsolete) as of January 3, 2006, and the directory containing the frozen content as of January 6, 2005. Scripts are available to download all, or part, of a snapshot automatically. In addition to providing access to the PDB archive, each wwPDB site provides databases and websites that offer different views and analyses of the structural data contained within the PDB archive (8–14).
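The legacy PDB file format is a fixed-column text format: each coordinate record is one line, and every field sits at fixed character columns (for example, the x coordinate occupies columns 31–38). As an illustration, the Python sketch below reads a single ATOM or HETATM record from those columns. The `AtomRecord` type, the `parse_atom_line` function and the sample record are hypothetical and made up for this example; the sketch assumes occupancy and B-factor are present and does not handle hybrid-36 serial numbers or any other record types.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain_id: str
    res_seq: int
    i_code: str
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one ATOM/HETATM record. PDB columns are 1-based and inclusive,
    so columns 31-38 become the Python slice [30:38]."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not a coordinate record: {record!r}")
    # Pad short lines so that optional trailing columns slice to blanks.
    line = line.rstrip("\n").ljust(80)
    return AtomRecord(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        alt_loc=line[16].strip(),        # column 17
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21].strip(),       # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        i_code=line[26].strip(),         # column 27
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        b_factor=float(line[60:66]),     # columns 61-66
        element=line[76:78].strip(),     # columns 77-78
    )


# A made-up record laid out on the fixed columns (78 characters long).
SAMPLE = "ATOM      1  N   MET A   1      38.198  19.582  28.998  1.00 42.09           N"

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE))
```

Because the fields are positional rather than whitespace-delimited, splitting on spaces would break whenever adjacent numeric fields run together, which is one reason the column slices are used here.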

DATA UNIFORMITY

wwPDB members collaborate to ensure the uniformity of the PDB archive. The PDB Exchange Dictionary consolidates content from a variety of dictionaries and includes extensions to describe NMR, EM and protein production data (15). wwPDB data processing, exchange and annotation rely on this dictionary and the mmCIF format (16), which help make the data more consistent across the archive. In the past, querying across the complete PDB archive has been limited by missing, erroneous and inconsistently reported data, nomenclature and functional annotation. The evolution of experimental methods, of functional knowledge of proteins and of the methods used to process these data has introduced various inconsistencies into the PDB archive and has inspired different versions of the PDB format. Over the years, the MSD-EBI, PDBj and the RCSB PDB have each worked individually on correcting errors in the archive. Under the wwPDB banner, these groups are now working to integrate all remediation efforts into a single consistent collection of data files. This work includes improving the representation of PDB small molecule data, assessing the required chemical definitions and their correspondences in PDB entries, resolving any remaining differences in the macromolecular sequences assigned by each group and resolving differences in primary citation assignments. The BMRB has been collaborating with MSD-EBI and RCSB PDB on standardizing restraint data associated with PDB depositions (17,18). The remediated data (PDB V.2) will be made available for public review in 2007 and will form the basis of the wwPDB websites. The data released before remediation (PDB V.1) will continue to be available for the historical record.
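Because the exchange files follow mmCIF syntax, their content is organized as `_category.item` names defined in the dictionary, with single values written as name/value pairs and tabular data written as `loop_` blocks. The Python sketch below collects those values from a small, hypothetical entry (the ID `XXXX` is a placeholder; the item names are standard PDBx/mmCIF items). It is a simplified reader written for illustration, not a full CIF parser: it assumes one data block, values on the same line as their item name, and no semicolon-delimited multi-line text fields. For real archive files, a dedicated mmCIF library is the safer choice.

```python
import shlex


def parse_mmcif_items(text: str) -> dict[str, list[str]]:
    """Map each '_category.item' name to its list of values (simplified mmCIF)."""
    items: dict[str, list[str]] = {}
    # Drop blank lines and '#' comment lines.
    lines = [ln.strip() for ln in text.splitlines()
             if ln.strip() and not ln.strip().startswith("#")]
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("data_"):
            i += 1  # block header; this sketch assumes a single data block
        elif line == "loop_":
            i += 1
            names = []
            while i < len(lines) and lines[i].startswith("_"):
                names.append(lines[i])
                i += 1
            if not names:
                raise ValueError("loop_ without item names")
            # Gather every value token up to the next item, loop or block,
            # then deal them out to the loop's columns in order.
            tokens = []
            while i < len(lines) and not lines[i].startswith(("_", "loop_", "data_")):
                tokens.extend(shlex.split(lines[i]))  # handles quoted values
                i += 1
            for name in names:
                items.setdefault(name, [])
            for k, token in enumerate(tokens):
                items[names[k % len(names)]].append(token)
        elif line.startswith("_"):
            name, *rest = shlex.split(line)
            items.setdefault(name, []).extend(rest[:1])
            i += 1
        else:
            raise ValueError(f"unsupported mmCIF construct: {line!r}")
    return items


SAMPLE_CIF = """\
data_XXXX
#
_entry.id XXXX
_exptl.method 'X-RAY DIFFRACTION'
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.num
_entity_poly_seq.mon_id
1 1 MET
1 2 ALA
1 3 GLY
#
"""

if __name__ == "__main__":
    for name, values in parse_mmcif_items(SAMPLE_CIF).items():
        print(name, values)
```

Keying everything on dictionary-defined item names, rather than on column positions as in the legacy format, is what lets the same content be translated mechanically into other representations such as PDBML.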

PHASING OUT THEORETICAL MODEL DEPOSITIONS TO THE PDB ARCHIVE

Effective October 15, 2006, PDB depositions were restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules. This policy was recommended by a working group composed of structural and computational biologists and endorsed by the wwPDB Advisory Committee. Thus, theoretical model depositions (i.e. models determined purely in silico, for example by homology or ab initio methods) will no longer be accepted.

NEWS AND ANNOUNCEMENTS

The News section of the wwPDB website reports the outcomes of wwPDBAC meetings and policy statements affecting the PDB data files. A recent example is the announcement of the policy for the archiving of in silico models (19).

REFERENCES

1. BioMagResBank databases DOCR and FRED containing converted and filtered sets of experimental NMR restraints and coordinates from over 500 protein PDB structures.

Authors: Jurgen F Doreleijers; Aart J Nederveen; Wim Vranken; Jundong Lin; Alexandre M J J Bonvin; Robert Kaptein; John L Markley; Eldon L Ulrich
Journal: J Biomol NMR Date: 2005-05 Impact factor: 2.835

2. Outcome of a workshop on archiving structural models of biological macromolecules.

Authors: Helen M Berman; Stephen K Burley; Wah Chiu; Andrej Sali; Alexei Adzhubei; Philip E Bourne; Stephen H Bryant; Roland L Dunbrack; Krzysztof Fidelis; Joachim Frank; Adam Godzik; Kim Henrick; Andrzej Joachimiak; Bernard Heymann; David Jones; John L Markley; John Moult; Gaetano T Montelione; Christine Orengo; Michael G Rossmann; Burkhard Rost; Helen Saibil; Torsten Schwede; Daron M Standley; John D Westbrook
Journal: Structure Date: 2006-08 Impact factor: 5.006

3. PDBML: the representation of archival macromolecular structure data in XML.

Authors: John Westbrook; Nobutoshi Ito; Haruki Nakamura; Kim Henrick; Helen M Berman
Journal: Bioinformatics Date: 2004-10-27 Impact factor: 6.937

4. PQS: a protein quaternary structure file server.

Authors: K Henrick; J M Thornton
Journal: Trends Biochem Sci Date: 1998-09 Impact factor: 13.807

5. The Protein Data Bank: a computer-based archival file for macromolecular structures.

Authors: F C Bernstein; T F Koetzle; G J Williams; E F Meyer; M D Brice; J R Rodgers; O Kennard; T Shimanouchi; M Tasumi
Journal: J Mol Biol Date: 1977-05-25 Impact factor: 5.469

6. Creation of a nuclear magnetic resonance data repository and literature database.

Authors: E L Ulrich; J L Markley; Y Kyogoku
Journal: Protein Seq Data Anal Date: 1989

7. GASH: an improved algorithm for maximizing the number of equivalent residues between two protein structures.

Authors: Daron M Standley; Hiroyuki Toh; Haruki Nakamura
Journal: BMC Bioinformatics Date: 2005-09-08 Impact factor: 3.169

8. E-MSD: improving data deposition and structure quality.

Authors: M Tagari; J Tate; G J Swaminathan; R Newman; A Naim; W Vranken; A Kapopoulou; A Hussain; J Fillon; K Henrick; S Velankar
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. The RCSB PDB information portal for structural genomics.

Authors: Andrei Kouranov; Lei Xie; Joanna de la Cruz; Li Chen; John Westbrook; Philip E Bourne; Helen M Berman
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

10. The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema.

Authors: Nita Deshpande; Kenneth J Addess; Wolfgang F Bluhm; Jeffrey C Merino-Ott; Wayne Townsend-Merino; Qing Zhang; Charlie Knezevich; Lie Xie; Li Chen; Zukang Feng; Rachel Kramer Green; Judith L Flippen-Anderson; John Westbrook; Helen M Berman; Philip E Bourne
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971
