V Ramraj, G Evans, J M Diprose, R M Esnouf.
Abstract
When embarking upon X-ray diffraction data collection from a potentially novel macromolecular crystal form, it can be useful to ascertain whether the measured data reflect a crystal form that is already recorded in the Protein Data Bank and, if so, whether it is part of a large family of related structures. Providing such information to crystallographers conveniently and quickly, as soon as the first images have been recorded and the unit cell characterized at an X-ray beamline, has the potential to save time and effort and to point to possible search models for molecular replacement. Given an input unit cell, and optionally a space group, Nearest-cell rapidly scans the Protein Data Bank and retrieves near-matches.
Year: 2012 PMID: 23151636 PMCID: PMC3498934 DOI: 10.1107/S0907444912040590
Source DB: PubMed Journal: Acta Crystallogr D Biol Crystallogr ISSN: 0907-4449
Figure 1. Schematic showing Nearest-cell’s logic. (1) The input cell is first converted to P1 if required. (2) It is then compared with every known P1 cell in the PDB using MATFIT (McLachlan, 1972; Kabsch, 1976, 1978); the schematic in box 2a shows an example superposition with one permutation of the database P1 cell (O′ superposed on O, A′ on A, B′ on B and C′ on C). If the lowest r.m.s. difference of all six superpositions is less than the specified cutoff (see §2.2.1), the database cell qualifies as a positive match. (3) The family-clustering algorithm clusters PDB entries into families of sequence similarity. Results are then displayed to the user with each family represented by the PDB entry with the smallest r.m.s. difference from the input. Families can be expanded to show all hits, as shown in Fig. 2.
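The comparison in step (2) can be illustrated with a minimal sketch. This is not the MATFIT code used by Nearest-cell: it assumes NumPy, uses a standard SVD-based Kabsch superposition of the four points O, A, B and C, and generates the six permutations by relabelling the cell parameters (so that each rebuilt cell stays right-handed). The function names and the example cells are hypothetical, and no particular cutoff value is implied.

```python
import itertools
import numpy as np

def cell_to_points(cell):
    """Return the points O, A, B, C (4x3 array) for a cell (a, b, c, alpha, beta, gamma).

    Lengths in angstroms, angles in degrees; a lies along x, b in the xy plane.
    """
    a, b, c, alpha, beta, gamma = cell
    al, be, ga = np.radians([alpha, beta, gamma])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    cz = np.sqrt(max(c * c - cx * cx - cy * cy, 0.0))
    return np.array([
        [0.0, 0.0, 0.0],                          # O
        [a, 0.0, 0.0],                            # A
        [b * np.cos(ga), b * np.sin(ga), 0.0],    # B
        [cx, cy, cz],                             # C
    ])

def kabsch_rmsd(P, Q):
    """R.m.s. difference after least-squares superposition of P onto Q (proper rotation)."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def best_rmsd(query_cell, db_cell):
    """Lowest r.m.s. difference over the six axis permutations of the database cell."""
    q_pts = cell_to_points(query_cell)
    best = float("inf")
    for p in itertools.permutations(range(3)):
        # Relabelling the axes permutes lengths and angles in the same way.
        permuted = [db_cell[i] for i in p] + [db_cell[3 + i] for i in p]
        best = min(best, kabsch_rmsd(cell_to_points(permuted), q_pts))
    return best

def is_match(query_cell, db_cell, cutoff):
    """A database cell qualifies if its best r.m.s. difference is below the cutoff."""
    return best_rmsd(query_cell, db_cell) < cutoff
```

With this sketch, two made-up orthorhombic cells that differ only in the labelling of a and b, for example (10, 20, 30, 90, 90, 90) and (20, 10, 30, 90, 90, 90), superpose with an r.m.s. difference of essentially zero under one of the six permutations.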
Figure 2. Typical output from Nearest-cell, shown as part of Diamond’s fast_dp report for a thaumatin unit cell. The results are appended to the end of a fast_dp run. Family 1 contained 46 thaumatin unit cells clustered together, showing the effectiveness of the family-clustering algorithm for reducing the number of results displayed to the user (inset). Note that this family contains two exact matches (r.m.s. difference = 0.00 Å).
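The way hits are collapsed into families for display can be sketched as follows. This is only an illustration of the reporting step described in the captions: it assumes that each hit already carries a family label (the sequence-similarity clustering itself is not reproduced here), and the function name, record layout and PDB-style identifiers are hypothetical placeholders.

```python
from collections import defaultdict

def summarise_families(hits):
    """Group (pdb_id, family, rmsd) hits by family.

    Each family is represented by the member with the smallest r.m.s.
    difference; families are returned best match first.
    """
    families = defaultdict(list)
    for pdb_id, family, rmsd in hits:
        families[family].append((rmsd, pdb_id))

    summary = []
    for family, members in families.items():
        members.sort()  # smallest r.m.s. difference first
        rep_rmsd, rep_id = members[0]
        summary.append({
            "family": family,
            "representative": rep_id,
            "rmsd": rep_rmsd,
            "size": len(members),
            "members": [pdb_id for _, pdb_id in members],
        })
    summary.sort(key=lambda entry: entry["rmsd"])
    return summary

# Placeholder hits: identifiers and values are invented for illustration only.
example_hits = [
    ("AAAA", 1, 0.12),
    ("BBBB", 1, 0.00),
    ("CCCC", 2, 0.45),
    ("DDDD", 1, 0.30),
]
report = summarise_families(example_hits)
```

Showing one line per family and expanding members on request keeps the report short when many entries share the same crystal form, which is the effect described for the thaumatin example.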