Glen van Ginkel1, Lukáš Pravda1, José M Dana1, Mihaly Varadi1, Peter Keller2, Stephen Anyango1, Sameer Velankar3. 1. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK. 2. Global Phasing Ltd., Sheraton House, Castle Park, Cambridge, CB3 0AX, UK. 3. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK. sameer@ebi.ac.uk.
Abstract
BACKGROUND: Biomacromolecular structural data outgrew the legacy Protein Data Bank (PDB) format which the scientific community relied on for decades, yet the use of its successor PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF) is still not widespread. Perhaps one of the reasons is the availability of easy to use tools that only support the legacy format, but also the inherent difficulties of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit macromolecular structure data and their associated annotations such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods, it is necessary to fully adopt the new format as soon as possible. RESULTS: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is heavily employed in the processes of the Protein Data Bank in Europe. The package is freely available both from the PyPI repository ( http://pypi.org/project/pdbecif ) and from GitHub ( https://github.com/pdbeurope/pdbecif ) along with rich documentation and many ready-to-use examples. CONCLUSIONS: PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with 3rd party libraries as well as adopted for broad scientific analyses.
BACKGROUND: Biomacromolecular structural data outgrew the legacy Protein Data Bank (PDB) format which the scientific community relied on for decades, yet the use of its successor PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF) is still not widespread. Perhaps one of the reasons is the availability of easy to use tools that only support the legacy format, but also the inherent difficulties of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit macromolecular structure data and their associated annotations such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods, it is necessary to fully adopt the new format as soon as possible. RESULTS: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is heavily employed in the processes of the Protein Data Bank in Europe. The package is freely available both from the PyPI repository ( http://pypi.org/project/pdbecif ) and from GitHub ( https://github.com/pdbeurope/pdbecif ) along with rich documentation and many ready-to-use examples. CONCLUSIONS:PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with 3rd party libraries as well as adopted for broad scientific analyses.
Authors: David R Armstrong; John M Berrisford; Matthew J Conroy; Aleksandras Gutmanas; Stephen Anyango; Preeti Choudhary; Alice R Clark; Jose M Dana; Mandar Deshpande; Roisin Dunlop; Paul Gane; Romana Gáborová; Deepti Gupta; Pauline Haslam; Jaroslav Koča; Lora Mak; Saqib Mir; Abhik Mukhopadhyay; Nurul Nadzirin; Sreenath Nair; Typhaine Paysan-Lafosse; Lukas Pravda; David Sehnal; Osman Salih; Oliver Smart; James Tolchard; Mihaly Varadi; Radka Svobodova-Vařeková; Hossam Zaki; Gerard J Kleywegt; Sameer Velankar Journal: Nucleic Acids Res Date: 2020-01-08 Impact factor: 16.971
Authors: Stephen K Burley; Charmi Bhikadiya; Chunxiao Bi; Sebastian Bittrich; Li Chen; Gregg V Crichlow; Cole H Christie; Kenneth Dalenberg; Luigi Di Costanzo; Jose M Duarte; Shuchismita Dutta; Zukang Feng; Sai Ganesan; David S Goodsell; Sutapa Ghosh; Rachel Kramer Green; Vladimir Guranović; Dmytro Guzenko; Brian P Hudson; Catherine L Lawson; Yuhe Liang; Robert Lowe; Harry Namkoong; Ezra Peisach; Irina Persikova; Chris Randle; Alexander Rose; Yana Rose; Andrej Sali; Joan Segura; Monica Sekharan; Chenghua Shao; Yi-Ping Tao; Maria Voigt; John D Westbrook; Jasmine Y Young; Christine Zardecki; Marina Zhuravleva Journal: Nucleic Acids Res Date: 2021-01-08 Impact factor: 16.971