Literature DB >> 19759860

DECOMP: a PDB decomposition tool on the web.

Rafael Ordog¹, Zoltán Szabadka, Vince Grolmusz.

Abstract

The protein databank (PDB) contains high quality structural data for computational structural biology investigations. We have earlier described a fast tool (the decomp_pdb tool) for identifying and marking missing atoms and residues in PDB files. The tool also automatically decomposes PDB entries into separate files describing ligands and polypeptide chains. Here, we describe a web interface named DECOMP for the tool. Our program correctly identifies multi-monomer ligands, and the server also offers the preprocessed ligand-protein decomposition of the complete PDB for downloading (up to size: 5GB) AVAILABILITY: http://decomp.pitgroup.org.

Entities: Chemical

Keywords: PDB; SEQRES; decomposition; ligands; server; web tool

Year: 2009 PMID： 19759860 PMCID： PMC2737496 DOI： 10.6026/97320630003413

Source DB: PubMed Journal: Bioinformation ISSN： 0973-2063

Background

The Protein Data Bank [1] started to function as the depository of the crystallographic data, complementing journal publications: researchers solved the structure of a protein, wrote a paper on the result, and deposited the data of the solution in the publicly available PDB. The irregularities of the structure deposited (such as lacking atomic coordinates, broken chains, unidentified substructures) are mostly remarked in the cited publications and also in the remark-fields of the PDB file. The textual annotations in the scientific publication elsewhere or in the remark-fields in the very same PDB-file, however, make the automatic processing of the protein-structures very difficult. This statement may be a little bit confusing, since atoms, carrying the HET label are not supposed to be in the peptide-chain, so those structures that contains HET atoms other than the oxygen of the water would qualify for being a complex. Unfortunately, this is not the case. Metal ions, modified residues (in a surprisingly large number), and small molecules added in the crystallization all contain heteroatoms, and they are frequently not considered to be ligands. With our decomp_pdb program [2] protein-ligand complexes are identified reliably, and the ligands are deposited in separate files. Missing residues and atoms in chains are handled properly, that is, even if several atoms are missing from a chain our algorithm will still not recognize the parts as distinct chains. Placeholders are inserted into chains for missing residues/atoms (an example is given in Figure 2), denoting that the objects were not measured crystallographically, but - according to the more reliable sequence information - they should be there. This way our algorithm "repairs'' faulty PDB's, or recognizes that flexible chain sequences are present. We should remark, that missing atoms are usually a sign of mobile loop or string in the protein-crystal, since flexible atoms will not give usable electron density maps. Consequently, mapping missing atoms this way may help to automatically identify flexible protein parts. Ligands are identified without using the HET-atom labels, properly handling modified residues and small artifacts, due to crystallization protocols. CONECT records of the ligand-atoms are computed automatically (these records for the ligands generally are not present in the PDB file).

Figure 2

Atoms or residues are frequently missing at the beginning or at the end of polypeptide chains. In this example a missing residue and six missing atoms are identified at the beginning of chain B of pdb entry 10gs.

Methodology

Our program selects atoms from the PDB entry that are part of a protein or DNA chain. We do not use the chain-identifier for this purpose. However, we use SEQRES data and refined graph-theoretical algorithms described elsewhere [2]. It selects the water molecules, and removes them from the set of possible ligand atoms. Then metal and other small ions are selected, that will not be considered as ligands. A complete list of residue names that were considered as ions (so not as ligands) is given in the file ion_list.txt. All the remaining atoms will form the set of ligand atoms. Within this set, we use a graph-theory component detecting algorithm, so a ligand is defined as a connected component of the graph formed by the ligand atoms as vertices and the covalent bonds between the ligand atoms as the edges.

Functionality

The DECOMP tool correctly identifies ligand molecules, even if they are composed of more than one monomers. For example, when decomposing PDB entry 10GS with options “Export ligands”, the file 10gs.pdb.out.lig.3 contains the 3- monomer GLU-BCS-PG9 molecule correctly (Figure 1).

Figure 1

The DECOMP_PDB output-ligand 10gs.pdb.out.lig.3 contains the 3-monomer GLU-BCS-PG9 molecule correctly, in one single file, even if it contains three monomer ID’s.

Utility

Provide a list of PDB codes in the appropriate box at the web server and check the desired options. The PDB codes should be separated either by “spaces” or “new line” characters. Press the “schedule job” button and the request will be inserted into a queue. Progress is monitored in the “Log window”. The result will be a link in the “Log window” to a tar.gz file. The result file contains one directory for each of the pdb’s listed. Each of these directories contains an error log with “.pdb.error” extension, the decomposed pdb file with “.pdb” extension, and if “Export ligands” or “Export ions” option was specified, than a separate file is present for each of the ligands or ions. An error file is presented if there was a fatal error while processing the PDB file. The result files are usually viewed by popular PDB viewer tools. A preprocessed, constantly updated compressed file can be downloaded with the results when the entire PDB file has be decomposed. The result files are stored for 3 days, and log files are stored for 30 days in the server.

2 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. High throughput processing of the structural information in the protein data bank.

Authors: Zoltan Szabadka; Vince Grolmusz
Journal: J Mol Graph Model Date: 2006-09-12 Impact factor: 2.518

2 in total

1. Identification of high-affinity anti-CD16A allotype-independent human antibody domains.

Authors: Wei Li; Hongjia Yang; Dimiter S Dimitrov
Journal: Exp Mol Pathol Date: 2016-10-03 Impact factor: 3.362

2. Escape from humoral immunity is associated with treatment failure in HIV-1-infected patients receiving long-term antiretroviral therapy.

Authors: Yabo Ouyang; Qianqian Yin; Wei Li; Zhenpeng Li; Desheng Kong; Yanling Wu; Kunxue Hong; Hui Xing; Yiming Shao; Shibo Jiang; Tianlei Ying; Liying Ma
Journal: Sci Rep Date: 2017-07-24 Impact factor: 4.379

2 in total