Stephen K Burley, Helen M Berman, Cole Christie, Jose M Duarte, Zukang Feng, John Westbrook, Jasmine Young, Christine Zardecki.
Abstract
The Protein Data Bank (PDB) is one of two archival resources for experimental data central to biomedical research and education worldwide (the other key Primary Data Archive in biology being the International Nucleotide Sequence Database Collaboration). The PDB currently houses >134,000 atomic level biomolecular structures determined by crystallography, NMR spectroscopy, and 3D electron microscopy. It was established in 1971 as the first open-access, digital-data resource in biology, and is managed by the Worldwide Protein Data Bank partnership (wwPDB; wwpdb.org). US PDB operations are conducted by the RCSB Protein Data Bank (RCSB PDB; RCSB.org; Rutgers University and UC San Diego) and funded by NSF, NIH, and DoE. The RCSB PDB serves as the global Archive Keeper for the wwPDB. During calendar 2016, >591 million structure data files were downloaded from the PDB by Data Consumers working in every sovereign nation recognized by the United Nations. During this same period, the RCSB PDB processed >5300 new atomic level biomolecular structures plus experimental data and metadata coming into the archive from Data Depositors working in the Americas and Oceania. In addition, RCSB PDB served >1 million RCSB.org users worldwide with PDB data integrated with ∼40 external data resources providing rich structural views of fundamental biology, biomedicine, and energy sciences, and >600,000 PDB101.rcsb.org educational website users around the globe. RCSB PDB resources are described in detail together with metrics documenting the impact of access to PDB data on basic and applied research, clinical medicine, education, and the economy.
Keywords: 3D electron microscopy; FAIR principles; NMR spectroscopy; PDB; PDBx/mmCIF; Protein Data Bank; RCSB; Research Collaboratory for Structure Bioinformatics; Worldwide Protein Data Bank; biocuration; chemical component dictionary; crystallography; data archive; data deposition; integrative/hybrid methods; macromolecular structure; metadata; open access; validation; wwPDB
Year: 2017 PMID: 29067736 PMCID: PMC5734314 DOI: 10.1002/pro.3331
Source DB: PubMed Journal: Protein Sci ISSN: 0961-8368 Impact factor: 6.725
Figure 1. Week in the life of the RCSB PDB, showing the progression from data deposition at wwPDB regional data centers to preparation and finalization of weekly releases by the RCSB PDB acting as PDB Archive Keeper, followed by Stage I (partial) and Stage II (full) Global PDB Data Release.
Figure 2. OneDep system workflow.
Figure 3. PDB depositions by geography (January 2000–June 2017) and breakdown of funding source for US depositions.
Selected RCSB PDB Resources
| Category | Description |
|---|---|
| Resources to search the PDB archive | Simple top bar search: search by PDB ID, author name, keyword, sequence, or ligand. Autosuggestions are displayed automatically; for example, a query for “protease” opens a suggestion box that organizes possible matches by UniProt molecule name, structural domain, Enzyme Classification, and so forth |
| | Pressing return or the search button performs a plain text search of PDB data, RCSB PDB news, and Molecule of the Month articles |
| | Advanced Search: combine multiple searches on specific types of data with a logical AND or OR (e.g., structures with UniProt molecule name “HIV-1 Protease” and resolution less than 1.5 Å) |
| | Chemical Component Searching: search for ligand descriptions, or for structures containing particular small molecules, by ID, InChI descriptor, formula, or chemical structure |
| | Annotation browsers: provide access to structures in the PDB archive using different hierarchical classification trees, including the Anatomical Therapeutic Chemical (ATC) Classification System from the WHO Collaborating Centre for Drug Statistics Methodology and Membrane Proteins identified using the mpstruc database |
| | Sequence searching: find structures matching or similar to a given sequence, either by entering a target sequence or by using a particular chain in a PDB structure |
| Search results | To explore individual entries returned from a search, Structure Summary pages provide an overview of key information about an entry, with options to download data; search for similar structures sharing the same data (e.g., classification, author, organism, ligand); view the entry in 3D; explore external annotations; study the sequence; compare the sequence and 3D structure with other PDB structures; and access information about the experiment |
| | Query results containing multiple entries can be further refined (by organism, molecule name, taxonomy, etc.), sorted (by size, release date, resolution, etc.), or explored entry by entry. A variety of tabular reports can be generated (described below) |
| Resources for visualization | Structure Summary pages offer links to interactive 3D views in NGL |
| | Protein Feature View is a graphical summary of a full-length protein sequence from UniProt and its correspondence to PDB entries, annotations from external databases (such as Pfam), homology model information from the Protein Model Portal, predicted regions of protein disorder, and hydrophobic regions |
| | Human Gene View is a tool for visualizing correspondences between the human genome and PDB structures |
| | Pathway View maps metabolic pathway components, identifying those with PDB structures and homology models |
| Resources for analysis | Tabular reports, customizable to include ∼150 data items, can be generated for a set of structures. Summary reports can also be generated |
| | Comparison Tool |
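
The Advanced Search, refinement, and tabular-report features in the table compose sub-queries with a logical AND or OR, then sort the hits and tabulate chosen fields. The Python sketch below illustrates that composition on a small in-memory list. It is not the RCSB PDB search API: the `Entry` fields, the helpers `all_of`, `any_of`, `molecule_is`, `resolution_below`, `search`, and `tabular_report`, and the EX01 to EX03 records are all hypothetical placeholders, not real PDB data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


# Hypothetical, simplified entry record; field names are illustrative
# and do not correspond to the RCSB PDB schema.
@dataclass(frozen=True)
class Entry:
    pdb_id: str
    molecule_name: str
    organism: str
    resolution: float  # in Å


# A sub-query is any function that says whether an entry matches.
Predicate = Callable[[Entry], bool]


def all_of(*preds: Predicate) -> Predicate:
    """Logical AND: an entry matches only if every sub-query matches."""
    return lambda e: all(p(e) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Logical OR: an entry matches if at least one sub-query matches."""
    return lambda e: any(p(e) for p in preds)


def molecule_is(name: str) -> Predicate:
    return lambda e: e.molecule_name == name


def resolution_below(limit: float) -> Predicate:
    return lambda e: e.resolution < limit


def search(entries: Iterable[Entry], query: Predicate,
           sort_key: str = "resolution") -> List[Entry]:
    """Keep entries that satisfy the query, sorted by one field."""
    hits = [e for e in entries if query(e)]
    return sorted(hits, key=lambda e: getattr(e, sort_key))


def tabular_report(entries: Iterable[Entry], columns: List[str]) -> str:
    """Render a tab-separated report restricted to the chosen columns."""
    rows = ["\t".join(columns)]
    rows += ["\t".join(str(getattr(e, c)) for c in columns) for e in entries]
    return "\n".join(rows)


# Placeholder entries (hypothetical, not real PDB data).
ENTRIES = [
    Entry("EX01", "HIV-1 Protease", "Human immunodeficiency virus 1", 1.9),
    Entry("EX02", "HIV-1 Protease", "Human immunodeficiency virus 1", 1.2),
    Entry("EX03", "Lysozyme C", "Gallus gallus", 1.1),
]

# "HIV-1 Protease" AND resolution less than 1.5 Å.
query = all_of(molecule_is("HIV-1 Protease"), resolution_below(1.5))
hits = search(ENTRIES, query)
print(tabular_report(hits, ["pdb_id", "resolution"]))
```

Swapping `all_of` for `any_of` turns the query into a logical OR; for example, `any_of(molecule_is("HIV-1 Protease"), molecule_is("Lysozyme C"))` matches all three placeholder entries, returned in order of resolution. Representing each sub-query as a predicate keeps the AND/OR combinators independent of the fields being searched.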
Figure 4. Data integrated from external resources enables research. Information about publications, sequence annotations, drug interactions, and more is updated regularly so that Data Consumers can browse the entire PDB archive by external annotations, access annotations for individual structures, and visualize data in 2D and 3D. Examples shown, clockwise from the upper right: images from PoseView [32] are available on Structure Summary pages; the Gene View tool illustrates correspondences between the human genome and PDB structures; metabolic pathway maps in the Pathway View identify pathway components with PDB structures and homology models [10].
Figure 5. Fraction of published PDB structures cited in subject-area publications. The impact of individual structures can also be assessed using PDB archive data download statistics. An RCSB PDB study completed in July 2017 documented that each PDB structure has been downloaded an average of ∼30,400 times since 2007. Some PDB structures are extremely “popular”: the top 1% of downloaded structures have each been downloaded an average of ∼105,000 times since 2007. Individual structure download statistics are provided on the wwPDB website (www.wwpdb.org/stats/search).
Figure 6. Category-normalized citation impact of publications in different subject categories citing Berman et al. (2000); data from Clarivate Analytics [37].
Figure 7. Representative PDB structures exemplifying their impact on our understanding of Fundamental Biology, Biomedicine, and Energy Research. (a) Nucleosome Core Particle (PDB ID 1aoi [41]); (b) Major Histocompatibility Complex 1 (1hla [43]); (c) Photosystem II (1s5l [47]).
Impact of US‐Funded PDB Structures as Measured by Number of Publications and Citations Thereof, with Fraction Represented Within the Top 40% of Global PDB Downloads (Since 2007)
| Funding source | PDB depositions funded | Publications | Citations | % in top 40% of PDB downloads |
|---|---|---|---|---|
| NSF-funded | 4,577 | 1,841 | 56,026 | 97% |
| NIH-funded | 47,347 | 21,340 | 874,367 | 98% |
| DoE-funded | 11,201 | 4,406 | 104,493 | 98% |
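
As a worked reading of the table, dividing citations by publications gives the mean number of citations per publication for each agency. These ratios are derived here for illustration and are not reported in the source; the inputs are copied from the table, and the function name `citations_per_publication` is a hypothetical helper.

```python
# Figures copied from the table; the ratios computed below are derived
# for illustration only and are not reported in the source.
FUNDING = {
    # agency: (PDB depositions funded, publications, citations)
    "NSF": (4577, 1841, 56026),
    "NIH": (47347, 21340, 874367),
    "DoE": (11201, 4406, 104493),
}


def citations_per_publication(publications: int, citations: int) -> float:
    """Mean number of citations received per publication."""
    return citations / publications


for agency, (_depositions, pubs, cites) in FUNDING.items():
    ratio = citations_per_publication(pubs, cites)
    print(f"{agency}: {ratio:.2f} citations per publication")
```

With the table's figures this gives roughly 30.4 (NSF), 41.0 (NIH), and 23.7 (DoE) citations per publication.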
Figure 8. (a) XFEL serial crystallography reveals what happens when adenine binds to a riboswitch [66]; (b) integrative/hybrid methods (I/HM) multi-scale structural model of the nuclear pore Nup84 complex [67].