David R Armstrong1, John M Berrisford1, Matthew J Conroy1, Aleksandras Gutmanas1, Stephen Anyango1, Preeti Choudhary1, Alice R Clark1, Jose M Dana1, Mandar Deshpande1, Roisin Dunlop1, Paul Gane1, Romana Gáborová2, Deepti Gupta1, Pauline Haslam1, Jaroslav Koča2, Lora Mak1, Saqib Mir1, Abhik Mukhopadhyay1, Nurul Nadzirin1, Sreenath Nair1, Typhaine Paysan-Lafosse1,3, Lukas Pravda1, David Sehnal2, Osman Salih4, Oliver Smart1, James Tolchard1, Mihaly Varadi1, Radka Svobodova-Vařeková2, Hossam Zaki1, Gerard J Kleywegt1,4, Sameer Velankar1.
Abstract
The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ∼100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.
Year: 2020 PMID: 31691821 PMCID: PMC7145656 DOI: 10.1093/nar/gkz990
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Image of cholesterol (PDB three-letter code: CLR) generated from the PDB Chemical Component Dictionary, using the pdbeccdutils process. (B) Chemical scaffold (in green) of cholesterol, identified and highlighted using the pdbeccdutils process.
Summary of data enrichment collaborations
| Enriched data | Enrichment process | Main outcomes |
|---|---|---|
| Mapping to Rfam | • Data incorporated into the PDBe database and displayed on relevant PDBe entry pages. | • Assignment of over 5000 RNA chains to 98 Rfam families in >1500 structures in the PDB. |
| Identification of cofactors | • PDBe worked with the Thornton group at EMBL-EBI to set up a process to identify cofactor and cofactor-like molecules in the PDB. | • Identified 417 unique cofactor-like small molecules. |
| Preliminary Pfam domain assignments | • Collaboration with Pfam team to implement provisional domain-assignment process at PDBe. | • As of September 2019, over 16 000 PDB entries had Pfam domains assigned which are not in the official latest Pfam release (V32.0, September 2018). |
| Preliminary CATH domain assignments (CATHb) | • CATHb data released by CATH team provides preliminary CATH structural domain assignment for PDB structures on a weekly basis. | • As of August 2019, around 30 000 new entries have CATH domains assigned which are not in the official full CATH release (V4.2, September 2017). |
| Standardized information on crystallographic cell dimensions (NIGGLI) | • Standardization of cell dimensions using Niggli reduction ( | • Standardized cell dimensions in PDBe's search used by Phaser ( |
Figure 2. New features added to the PDBe search. (A) The advanced search form supports 122 fields in queries. (B) Sequence searching using HMMER. (C) Autocomplete option to find relevant search terms for each search field.
Figure 3. New features in presenting search results, including: the number of facets (left) increased to 39 to improve filtering of results; access to 3D visualization and file downloads directly from the search results (green text); alignment and statistics provided for sequence searches; and links to PDBe-KB aggregated views.
Additions and updates to the PDBe REST API and FTP sites
| PDBe REST API (pdbe.org/api) | • New endpoints for protein information. |
| PDBe enriched FTP ( | • mmCIF-format assembly files added ( |
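As a rough illustration of programmatic access through the REST API listed above, the following Python sketch builds a request for one entry and reads its title. Several details are assumptions rather than statements from the paper: the base URL `https://www.ebi.ac.uk/pdbe/api`, the entry-summary path `pdb/entry/summary/{id}`, and a JSON response keyed by the lower-case PDB ID with a list of records carrying a `title` field. The function names are hypothetical, and the live API documentation should be consulted before relying on any of this.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of the PDBe REST API (the table gives pdbe.org/api).
API_BASE = "https://www.ebi.ac.uk/pdbe/api"


def summary_url(pdb_id: str) -> str:
    """Build the URL of the assumed entry-summary endpoint for one PDB entry."""
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{API_BASE}/pdb/entry/summary/{quote(pdb_id)}"


def entry_title(payload: dict, pdb_id: str) -> str:
    """Read the title from a payload assumed to look like {id: [{"title": ...}]}."""
    records = payload.get(pdb_id.strip().lower(), [])
    if not records:
        raise KeyError(f"no summary record for {pdb_id}")
    return records[0]["title"]


def fetch_title(pdb_id: str, timeout: float = 10.0) -> str:
    """Download the summary for one entry and return its title (needs network)."""
    with urlopen(summary_url(pdb_id), timeout=timeout) as response:
        payload = json.load(response)
    return entry_title(payload, pdb_id)
```

With network access, `fetch_title("5ezi")` would request the summary for the entry shown in Figure 4A. URL construction and response parsing are kept separate so that each can be checked without contacting the server.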
Figure 4. Improved visualization of nucleic acids and carbohydrates in LiteMol. (A) PDB entry 5ezi (48), the structure of a malaria antigen in complex with an antibody, focused on the branched glycan in the structure. The glycan is shown using the 3D-SNFG representation. (B) PDB entry 5x2g (49), structure of a CRISPR-Cas9 complex of protein with nucleic acids, using the new visualization for RNA and DNA.