Andrius Merkys, Antanas Vaitkus, Justas Butkus, Mykolas Okulič-Kazarinas, Visvaldas Kairys, Saulius Gražulis.
Abstract
A syntax-correcting CIF parser, COD::CIF::Parser, is presented that can parse CIF 1.1 files and accurately report the position and the nature of the discovered syntactic problems. In addition, the parser is able to automatically fix the most common and the most obvious syntactic deficiencies of the input files. Bindings for Perl, C and Python programming environments are available. Based on COD::CIF::Parser, the cod-tools package for manipulating the CIFs in the Crystallography Open Database (COD) has been developed. The cod-tools package has been successfully used for continuous updates of the data in the automated COD data deposition pipeline, and to check the validity of COD data against the IUCr data validation guidelines. The performance, capabilities and applications of different parsers are compared.
Keywords: CIF parsers; Crystallography Open Database; Perl
Year: 2016; PMID: 26937241; PMCID: PMC4762566; DOI: 10.1107/S1600576715022396
Source DB: PubMed; Journal: J Appl Crystallogr; ISSN: 0021-8898; Impact factor: 3.304
Key–value pairs of a hash that represents a single CIF data block as constructed by COD::CIF::Parser
| Key | Value |
|---|---|
| | Scalar. String denoting the name of a CIF data block. |
| | Array. Lower-cased data names present in the CIF data block. |
| | Hash table. Keys are equal to the values of the … |
| | Hash table. Keys are equal to the values of the … |
| | Hash table. Keys are equal to the values of the … |
| | Array of arrays. Each inner array corresponds to a loop from the CIF data block and contains a list of tags present in the loop. |
| | Hash. Keys are equal to the values of the … |
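The shape of such a data block can be illustrated with a minimal Python sketch. This is not the COD::CIF::Parser API: the function `make_data_block` and the key names `name`, `tags`, `values` and `loops` are assumptions chosen for illustration, and the content of the tag-keyed hash (a list of values per data name) is likewise assumed rather than taken from the table.

```python
def make_data_block(name, items, loops):
    """Build a plain dict shaped like the data block described in the table.

    name  -- name of the data block (scalar string)
    items -- mapping of data name -> list of values (hypothetical input form)
    loops -- list of lists of data names, one inner list per loop
    """
    # Data names are stored lower-cased, in the order they were supplied.
    tags = [tag.lower() for tag in items]
    return {
        "name": name,                      # scalar: data block name
        "tags": tags,                      # array: lower-cased data names
        # Hash keyed by the entries of "tags" (assumed to hold the values).
        "values": {tag.lower(): list(vals) for tag, vals in items.items()},
        # Array of arrays: one inner list of tags per loop.
        "loops": [[tag.lower() for tag in loop] for loop in loops],
    }


if __name__ == "__main__":
    block = make_data_block(
        "example",
        {
            "_Cell_Length_A": ["10.5(2)"],
            "_atom_site_label": ["C1", "O1"],
            "_atom_site_type_symbol": ["C", "O"],
        },
        [["_atom_site_label", "_atom_site_type_symbol"]],
    )
    for key, value in block.items():
        print(key, "=>", value)
```

Keeping the tag list separate from the tag-keyed hash preserves the original order of data names, while the hash gives constant-time lookup of a data item by name.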
Figure 1. An example of a CIF input for parsing.
Figure 2. An internal CIF data structure created by COD::CIF::Parser after processing the CIF from Fig. 1.
Figure 3. The top-level grammar rules defining error message syntax for COD::CIF::Parser.
Figure 4. Examples of COD::CIF::Parser diagnostic messages.
Figure 5. Tests of CIF and STAR parsers. Crosses denote detected syntax errors, slashes denote warnings, and dashes (‘–’) mark special cases in which a program hangs indefinitely and has to be stopped manually. In the column ‘CIF conforming’, crosses mark files that do not conform to CIF syntax.
Total parsing time of 350 598 CIFs from the COD
| Parser | Run time (min) |
|---|---|
| | 19.73 |
| | 3.95 |
| | 3.62 |
| | 39.75 |
| | 3062.22 |
| | 5.15 |
| | 2.68 |
| | 3.67 |
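To put the run times in perspective, they can be converted into throughput. The short Python sketch below does this arithmetic using only the total of 350 598 CIFs and the run times listed in the table; the helper names `files_per_second` and `slowdown_vs_fastest` are hypothetical and introduced here for illustration.

```python
TOTAL_CIFS = 350_598  # number of COD files parsed, from the table title

# Run times in minutes, in the order they appear in the table.
RUN_TIMES_MIN = [19.73, 3.95, 3.62, 39.75, 3062.22, 5.15, 2.68, 3.67]


def files_per_second(run_time_min, total_files=TOTAL_CIFS):
    """Average number of files parsed per second for a given total run time."""
    return total_files / (run_time_min * 60.0)


def slowdown_vs_fastest(run_times_min):
    """Each run time expressed as a multiple of the shortest run time."""
    fastest = min(run_times_min)
    return [t / fastest for t in run_times_min]


if __name__ == "__main__":
    for t, rel in zip(RUN_TIMES_MIN, slowdown_vs_fastest(RUN_TIMES_MIN)):
        print(f"{t:9.2f} min  {files_per_second(t):8.1f} files/s  {rel:7.1f}x")
```

With these figures, the shortest run time (2.68 min) corresponds to roughly 2180 files per second and the longest (3062.22 min) to about 1.9 files per second.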