Abstract
Composite data sets measured on different objects are usually affected by random errors, but may also be influenced by systematic (genuine) differences in the objects themselves, or the experimental conditions. If the individual measurements forming each data set are quantitative and approximately normally distributed, a correlation coefficient is often used to compare data sets. However, the relations between data sets are not obvious from the matrix of pairwise correlations since the numerical value of the correlation coefficient is lowered by both random and systematic differences between the data sets. This work presents a multidimensional scaling analysis of the pairwise correlation coefficients which places data sets into a unit sphere within low-dimensional space, at a position given by their CC* values [as defined by Karplus & Diederichs (2012), Science, 336, 1030-1033] in the radial direction and by their systematic differences in one or more angular directions. This dimensionality reduction can not only be used for classification purposes, but also to derive data-set relations on a continuous scale. Projecting the arrangement of data sets onto the subspace spanned by systematic differences (the surface of a unit sphere) allows, irrespective of the random-error levels, the identification of clusters of closely related data sets. The method gains power with increasing numbers of data sets. It is illustrated with an example from low signal-to-noise ratio image processing, and an application in macromolecular crystallography is shown, but the approach is completely general and thus should be widely applicable.
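The placement described in the abstract can be illustrated with a minimal sketch. It assumes the fit is a least-squares match between each pairwise correlation and the scalar product of the two corresponding low-dimensional vectors, with the diagonal of the correlation matrix left out. The abstract does not name the optimizer, so plain gradient descent is used here as a stand-in, and all function names, the step size, and the synthetic test case are hypothetical.

```python
import math

def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def objective(r, x):
    """Sum over pairs i < j of (r[i][j] - x_i . x_j)^2; the diagonal of r is ignored."""
    n = len(x)
    return sum((r[i][j] - dot(x[i], x[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

def fit_positions(r, dim=2, step=0.02, iterations=5000):
    """Place one vector per data set by fixed-step gradient descent (illustrative only)."""
    n = len(r)
    # Deterministic start: short vectors pointing in different directions.
    x = [[0.5 * math.cos(i + k * math.pi / 2) for k in range(dim)] for i in range(n)]
    for _ in range(iterations):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                resid = r[i][j] - dot(x[i], x[j])
                for k in range(dim):
                    grad[i][k] -= 2.0 * resid * x[j][k]
        x = [[x[i][k] - step * grad[i][k] for k in range(dim)] for i in range(n)]
    return x

def make_point(length, angle_deg):
    """Synthetic 2D position: length mimics data quality, angle the systematic difference."""
    a = math.radians(angle_deg)
    return [length * math.cos(a), length * math.sin(a)]

# Hypothetical test case: two groups of three data sets, 60 degrees apart,
# with different vector lengths (different random-error levels) inside each group.
truth = [make_point(length, angle) for angle in (0.0, 60.0) for length in (0.9, 0.6, 0.3)]
r = [[1.0 if i == j else dot(truth[i], truth[j]) for j in range(6)] for i in range(6)]

fitted = fit_positions(r)
lengths = [math.hypot(*p) for p in fitted]                         # radial coordinate
angles = [math.degrees(math.atan2(p[1], p[0])) for p in fitted]    # angular coordinate
```

A fit of this kind is determined only up to a common rotation or reflection, so the differences between the fitted angles carry the information, not the angles themselves. Dividing each fitted vector by its length projects it onto the unit circle, which corresponds to the angular comparison that the abstract describes as independent of the random-error level.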
Keywords: classification; correlation coefficient; dimensionality reduction; eigenanalysis; isomorphism; random and systematic error; sparse data
Year: 2017 PMID: 28375141 PMCID: PMC5379934 DOI: 10.1107/S2059798317000699
Source DB: PubMed Journal: Acta Crystallogr D Struct Biol ISSN: 2059-7983 Impact factor: 7.652
Terms used in this paper
| Term | Meaning | Example(s) |
|---|---|---|
| | Number of value pairs for correlation-coefficient calculation between data sets | Number of unique reflections common to two data sets; number of image pixels within a mask |
| | Number of experiments | Number of data sets; number of images |
| | Dimension of reduced space | 2 |
| | Bivariate scalar function of data sets | Correlation coefficient between data sets |
| | Bivariate scalar function of the representation of data sets | Scalar product |
| | Experimental data of data set | Reflection intensities of data set |
| | Representation of data set | Points in plane representing images (Fig. 1) |
| CC | Correlation coefficient | CC1/2, CC* |
| σ | Estimated standard deviation | Estimated error of intensity value |
| Systematic difference | Changes of experimental result owing to features of particular object; may be common to some data sets. Systematic differences lead to non-isomorphism/inhomogeneity. | Different conformation of molecule leading to different image or diffraction |
| Random error/difference | Unpredictable change of experimental result arising from effects that cannot be controlled by the experimenter and are unrelated to changes in other measurements of the same data set or in other data sets | Poisson statistics in photon-counting experiments; electronic noise in measurement apparatus; statistical variation within samples drawn from a homogeneous population |
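The table lists CC1/2 and CC* without stating how they are related. As a small sketch, the relation given by Karplus & Diederichs (2012) is CC* = sqrt(2 CC1/2 / (1 + CC1/2)). The function name `cc_star` and the decision to reject inputs outside the range 0 to 1 are assumptions made for this illustration.

```python
import math

def cc_star(cc_half):
    """Convert CC1/2 into CC* using CC* = sqrt(2 * CC1/2 / (1 + CC1/2))."""
    # Assumption for this sketch: only values between 0 and 1 are accepted,
    # because the square root is not real for negative CC1/2.
    if not 0.0 <= cc_half <= 1.0:
        raise ValueError("cc_half must lie between 0 and 1")
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))
```

Because CC* is never smaller than CC1/2 on this range, a data set with a modest CC1/2 still sits noticeably away from the origin in the radial direction described in the abstract.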
Figure 1. Portrait of A. Einstein (Wikimedia). (a) Example of portrait with added noise; the signal-to-noise ratio is 1:9. (b) Symmetric result of averaging of noisy images and mirror images. (c) Histograms of correlation coefficients (red, between images of the same type; blue, between images of different types; green, sum of both histograms). (d) Result of two-dimensional analysis: each cross represents one image. Arrows point to images with a 1:13 signal-to-noise ratio. The axes are unitless; only the relevant area of the possible range (a circle with radius 1) is shown. The angle between the two prototypic directions is 65°; its cosine agrees with the correlation of 0.43 between the image and its mirror. (e) The result of averaging the 50 noisy original images; the overall noise level is reduced by averaging. (f) As (e), but for the 50 noisy mirror images.
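The agreement between the 65° angle and the correlation of 0.43 quoted in the caption can be checked with a few lines of arithmetic. This sketch assumes that the modelled correlation between two data sets is the scalar product of their vectors, that is, the product of the two lengths and the cosine of the enclosed angle. Both function names are hypothetical.

```python
import math

def correlation_from_angle(angle_deg, length_a=1.0, length_b=1.0):
    """Scalar product of two vectors with the given lengths and enclosed angle."""
    return length_a * length_b * math.cos(math.radians(angle_deg))

def angle_from_correlation(cc):
    """Angle in degrees between two unit-length vectors whose scalar product is cc."""
    return math.degrees(math.acos(cc))

cos_65 = correlation_from_angle(65.0)          # cosine of the angle read from the plot
angle_for_043 = angle_from_correlation(0.43)   # angle implied by the noise-free correlation
```

With vector lengths below 1, as expected for noisy images, the same angle yields a proportionally smaller correlation, which is why the raw pairwise correlations between noisy images fall well below 0.43.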
Figure 2. (a) Analysis of original photosystem I XFEL data shows two clusters corresponding to the two possible indexing modes. (b) Analysis of properly indexed photosystem I XFEL data; projection on the xy plane. (c) Analysis of properly indexed photosystem I XFEL data; projection on the yz plane.