
Code generation through annotation of macromolecular structure data.

J Biggs, C Pu, P Bourne.

Abstract

The maintenance of software that uses a rapidly evolving data annotation scheme is time-consuming and expensive. At the same time, without current software the annotation scheme itself becomes limited and is less likely to be widely adopted. A solution to this problem has been developed for the macromolecular Crystallographic Information File (mmCIF) annotation scheme. The approach could be generalized to a variety of annotation schemes used or proposed for molecular biology data. mmCIF provides a highly structured and complete annotation for describing NMR and X-ray crystallographic data and the resulting macromolecular structures. This annotation is maintained in the mmCIF dictionary, which currently contains over 3,200 terms. A major challenge is to maintain code for converting between mmCIF and Protein Data Bank (PDB) annotations while both continue to evolve. The solution has been to define a simple domain-specific language (DSL), which is added to the extensive annotation already found in the mmCIF dictionary. The DSL calls specific mapping modules for each category of data item in the mmCIF dictionary. Adding or changing the mapping between PDB and mmCIF data items is straightforward, since data categories (and hence mapping modules) correspond to elements of macromolecular structure familiar to the experimentalist. Each time a change is made to the macromolecular annotation, the appropriate change is made to the easily located and modifiable mapping modules. A code generator is then called, which reads the mapping modules and creates a new executable for performing the data conversion. In this way, code is easily kept current by individuals who have limited programming skill but understand macromolecular structure and the details of the annotation scheme. Most importantly, the conversion process becomes part of the global dictionary and is not open to a variety of interpretations by different research groups writing code based on dictionary contents. Details of the DSL and code generator are provided.
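
The abstract does not reproduce the DSL itself, so the following is only a minimal Python sketch of the general idea: per-category mapping rules are kept as data, and a generator turns them into conversion code. The rule format, the names `MAPPING_MODULES`, `generate_converter_source`, and `build_converters`, and the choice of Python are all assumptions made for illustration, not the authors' implementation. The mmCIF item names and the fixed-column ranges of the PDB ATOM record are used only as familiar examples.

```python
# Hypothetical mapping modules, one entry per mmCIF category (assumed format).
# Each rule: (mmCIF item, PDB record name, first column, last column),
# with columns 1-based and inclusive, as in the fixed-width PDB format.
MAPPING_MODULES = {
    "atom_site": [
        ("_atom_site.id", "ATOM", 7, 11),
        ("_atom_site.label_atom_id", "ATOM", 13, 16),
        ("_atom_site.label_comp_id", "ATOM", 18, 20),
        ("_atom_site.Cartn_x", "ATOM", 31, 38),
    ],
}


def generate_converter_source(modules):
    """Emit Python source with one PDB-to-mmCIF function per category."""
    lines = []
    for category, rules in modules.items():
        lines.append(f"def convert_{category}(record):")
        lines.append("    items = {}")
        for item, record_name, first, last in rules:
            # Convert the 1-based inclusive column range to a Python slice.
            lines.append(f"    if record.startswith({record_name!r}):")
            lines.append(
                f"        items[{item!r}] = record[{first - 1}:{last}].strip()"
            )
        lines.append("    return items")
        lines.append("")
    return "\n".join(lines)


def build_converters(modules):
    """Regenerate and load the converters after any change to the rules."""
    namespace = {}
    exec(generate_converter_source(modules), namespace)
    return {
        name: fn for name, fn in namespace.items() if name.startswith("convert_")
    }


# Example ATOM record assembled field by field so the columns line up.
example_record = (
    "ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A"
    + "   1" + " " + "   " + "  27.340" + "  24.430" + "   2.614"
)
converters = build_converters(MAPPING_MODULES)
atom_items = converters["convert_atom_site"](example_record)
```

In this sketch, a change to the annotation scheme is handled by editing a rule in `MAPPING_MODULES` and calling `build_converters` again, so the conversion logic is never edited by hand.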

Year: 1997. PMID: 9322015.

Source DB: PubMed. Journal: Proc Int Conf Intell Syst Mol Biol. ISSN: 1553-0833.


