Hui-Meng Lu1,2, Da-Chuan Yin1,2, Yong-Ming Liu1,2, Wei-Hong Guo1,2, Ren-Bin Zhou1.
Abstract
The number of protein structure entries has grown far more slowly than the number of sequence entries, partly because obtaining diffraction-quality protein crystals remains a bottleneck for structure determination by X-ray crystallography. The first step toward crystallizing a protein is to find suitable chemical reagents, which is not easy: exhaustive trial-and-error tests of numerous reagent combinations mixed with the protein solution are usually needed to find workable crystallization conditions. Any approach that helps identify suitable reagents is therefore useful. In this paper, we analyze the relationship between protein sequence similarity and crystallization reagents using information from existing databases. We extracted reagent and sequence information from the Biological Macromolecule Crystallization Database (BMCD) and the Protein Data Bank (PDB), classified the proteins into clusters according to sequence similarity, and statistically analyzed the relationship between sequence similarity and crystallization reagents. The results showed a pronounced positive correlation between them. Based on this correlation, it is possible to predict chemical reagents suitable for use in crystallization screens for a specific protein.
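
The comparison described in the abstract (reagent agreement inside sequence-similarity clusters versus inside random groups) can be illustrated with a minimal sketch. The record does not give the paper's definition of reagent consistency, so the mean pairwise Jaccard similarity used here is an assumed stand-in, and the function names and toy reagent sets are hypothetical.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity of two reagent sets (assumed stand-in metric)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def reagent_consistency(entries):
    """Mean pairwise Jaccard similarity of the reagent sets in one group.

    Returns None for groups with fewer than two entries, because no
    pairwise comparison is possible.
    """
    pairs = list(combinations(entries, 2))
    if not pairs:
        return None
    return mean(jaccard(a, b) for a, b in pairs)


# Hypothetical toy data: each set lists the reagents of one crystallization entry.
cluster = [
    {"PEG 4000", "NaCl"},
    {"PEG 4000", "NaCl", "Tris"},
    {"PEG 4000", "MgCl2"},
]
random_group = [
    {"PEG 4000", "NaCl"},
    {"ammonium sulfate"},
    {"MPD", "Tris"},
]

cluster_score = reagent_consistency(cluster)
random_score = reagent_consistency(random_group)
print(f"cluster: {cluster_score:.3f}, random: {random_score:.3f}")
```

In this toy case the cluster shares reagents across entries and scores higher than the random group, which is the direction of the effect the abstract reports; the numbers themselves carry no meaning beyond the illustration.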
Keywords: X-ray crystallography; crystallization reagents; molecular structure; protein crystallization; protein sequence similarity
Year: 2012 PMID: 22949812 PMCID: PMC3431810 DOI: 10.3390/ijms13089514
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Figure 1. The main crystallization reagents and their frequencies in the non-redundant dataset from the Biological Macromolecule Crystallization Database (BMCD).
The numbers of clusters and entries of the large sequence similarity clusters (LSSC) datasets.
| LSSC dataset | Number of clusters | Number of entries | Average cluster size |
|---|---|---|---|
| LSSC30 | 173 | 3,921 | 22.7 |
| LSSC40 | 144 | 3,006 | 20.9 |
| LSSC50 | 122 | 2,433 | 19.9 |
| LSSC60 | 105 | 2,068 | 19.7 |
| LSSC70 | 87 | 1,757 | 20.2 |
| LSSC80 | 81 | 1,586 | 19.6 |
| LSSC90 | 70 | 1,340 | 19.1 |
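
The last column of the table above follows from the other two: the average cluster size is the number of entries divided by the number of clusters. The short check below recomputes it from the reported counts; the variable and function names are illustrative only.

```python
# (clusters, entries, reported average) copied from the LSSC table.
LSSC_TABLE = {
    "LSSC30": (173, 3921, 22.7),
    "LSSC40": (144, 3006, 20.9),
    "LSSC50": (122, 2433, 19.9),
    "LSSC60": (105, 2068, 19.7),
    "LSSC70": (87, 1757, 20.2),
    "LSSC80": (81, 1586, 19.6),
    "LSSC90": (70, 1340, 19.1),
}


def average_cluster_size(clusters, entries):
    """Entries per cluster, rounded to one decimal place as in the table."""
    return round(entries / clusters, 1)


for name, (clusters, entries, reported) in LSSC_TABLE.items():
    computed = average_cluster_size(clusters, entries)
    status = "matches" if computed == reported else "differs from"
    print(f"{name}: {computed} {status} reported {reported}")
```

Raising the similarity level shrinks both the number of clusters and the number of entries, while the average cluster size stays close to 20.
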
Figure 2. (a) The reagent consistency within each LSSC (S) and random groups (S). (Error bar: standard error of mean; ** p < 0.001 of the t-test results.) (b) The reagent consistency against the sequence similarity level of the LSSC and random datasets. (Error bar: standard error of mean; dashed line: the linear regression line between reagent consistency and sequence similarity.)
Figure 3. The range of mean V values in each cluster in the LSSC datasets (from 0.032 to 0.989) was wider than the range in the random datasets (from 0.347 to 0.871). (Group numbers: 1~173 belonged to the LSSC30 or RAN30 datasets, 194~298 belonged to the LSSC60 or RAN60 datasets, 319~388 belonged to the LSSC90 or RAN90 datasets; solid black square: mean V in each cluster in the LSSC datasets; hollow red triangle: mean V in each group in the random datasets.)
Comparison of V variance (VARj) between LSSC and Random datasets.
| Datasets | | Group number under | Group size | Proportion of lower |
|---|---|---|---|---|
| LSSC30 | 0.109 | 132 | 173 | 76.3% |
| LSSC60 | 0.111 | 81 | 105 | 77.1% |
| LSSC90 | 0.097 | 53 | 70 | 75.7% |
| RAN30 | 0.099 | 93 | 173 | 53.8% |
| RAN60 | 0.101 | 58 | 105 | 55.2% |
| RAN90 | 0.098 | 38 | 70 | 54.3% |
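
The percentage column in the table above can be reproduced from the two integer columns. The sketch assumes that the first integer is the number of groups falling under the reference value and the second is the total number of groups in the dataset; this reading is an inference from the numbers (the totals 173, 105, and 70 equal the cluster counts of LSSC30, LSSC60, and LSSC90), since the column headers are incomplete in this record. The names are illustrative.

```python
# (groups under the reference value, total groups, reported percentage)
VAR_TABLE = {
    "LSSC30": (132, 173, 76.3),
    "LSSC60": (81, 105, 77.1),
    "LSSC90": (53, 70, 75.7),
    "RAN30": (93, 173, 53.8),
    "RAN60": (58, 105, 55.2),
    "RAN90": (38, 70, 54.3),
}


def proportion_lower(groups_under, total_groups):
    """Percentage of groups under the reference value, to one decimal place."""
    return round(100 * groups_under / total_groups, 1)


for name, (under, total, reported) in VAR_TABLE.items():
    computed = proportion_lower(under, total)
    status = "matches" if computed == reported else "differs from"
    print(f"{name}: {computed}% {status} reported {reported}%")
```

Under this reading, roughly three quarters of the LSSC groups fall under the reference value, compared with a little over half of the random groups.
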
Figure 4. The analysis strategy and process of this work.