Abstract
The goal of proteomics is the complete characterization of all proteins. Efforts to characterize subcellular location have been limited to assigning proteins to general categories of organelles. We have previously designed numerical features to describe location patterns in microscope images and developed automated classifiers that distinguish major subcellular patterns with high accuracy (including patterns not distinguishable by visual examination). The results suggest the feasibility of automatically determining which proteins share a single location pattern in a given cell type. We describe an automated method that selects the best feature set to describe images for a given collection of proteins and constructs an effective partitioning of the proteins by location. An example for a limited protein set is presented. As additional data become available, this approach can produce for the first time an objective systematics for protein location and provide an important starting point for discovering sequence motifs that determine localization.
Year: 2005 PMID: 16046813 PMCID: PMC1184054 DOI: 10.1155/JBB.2005.87
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Figure 1. Flow chart for clustering protein subcellular location patterns.
Figure 2. Selected images from the 3D 3T3 image dataset. Tagged protein names are shown with a hyphen followed by a clone number if the same protein was tagged in more than one clone in the dataset. Representative images are shown for (a) Atp5a1-1, (b) Ewsh, (c) Glut1, (d) Tubb2-1, (e) Canx, and (f) Hmga1-1. The top portion of each panel shows a projection on the x-y plane and the bottom shows a projection on the x-z plane.
Figure 3. Histograms of selected features before z-score normalization. Examples of features with (a) a roughly Gaussian distribution (3D-SLF11.6, average object to center of fluorescence distance), (b) a roughly Poisson distribution (3D-SLF11.23, texture feature average of co-occurrence matrix sum variance), and (c) a bimodal distribution (3D-SLF11.37, texture feature range of co-occurrence matrix sum entropy).
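The z-score normalization mentioned in the caption rescales each feature to zero mean and unit standard deviation so that features on very different scales contribute comparably to distance calculations. A minimal sketch follows, assuming one row per image and one column per feature. The function name `zscore_columns`, the toy values, and the use of the population standard deviation are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Return a copy of rows with every column rescaled to mean 0, SD 1.

    rows: list of equal-length lists (one row per image, one column per
    feature). Constant columns are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # population SD (an assumption)
    return [
        [(value - m) / s if s > 0 else 0.0
         for value, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Toy feature matrix: 3 images x 2 features on very different scales.
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
normalized = zscore_columns(features)
```

Normalization does not change the shape of a feature's distribution, so a skewed or bimodal feature remains skewed or bimodal afterwards.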
Algorithm 1. Procedure: clustering on confusion matrix (ConfusionMatrix, threshold).
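Only the name and arguments of this procedure appear here, so the sketch below is one plausible reading rather than the published algorithm. It assumes that classes a classifier frequently confuses are greedily merged until no remaining pair is confused at a rate at or above `threshold`. The merge criterion (cross-assigned images as a fraction of the pair's total images) and the function name are assumptions made for illustration.

```python
def cluster_on_confusion_matrix(confusion, threshold):
    """Greedily merge classes that a classifier confuses with each other.

    confusion[i][j]: number of images of class i predicted as class j.
    threshold: minimum fraction of a pair's images that must be
               cross-assigned for the pair to be merged (assumed criterion).
    Returns a list of clusters, each a sorted list of original class indices.
    """
    matrix = [list(row) for row in confusion]  # work on a copy
    clusters = [[i] for i in range(len(matrix))]

    while len(clusters) > 1:
        best_pair, best_rate = None, 0.0
        for i in range(len(matrix)):
            for j in range(i + 1, len(matrix)):
                total = sum(matrix[i]) + sum(matrix[j])
                if total == 0:
                    continue
                rate = (matrix[i][j] + matrix[j][i]) / total
                if rate > best_rate:
                    best_pair, best_rate = (i, j), rate
        if best_pair is None or best_rate < threshold:
            break  # no remaining pair is confused often enough
        i, j = best_pair
        # Fold row j into row i, then column j into column i.
        matrix[i] = [a + b for a, b in zip(matrix[i], matrix[j])]
        del matrix[j]
        for row in matrix:
            row[i] += row[j]
            del row[j]
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Toy example: classes 0 and 1 are often confused, class 2 is distinct.
confusion = [
    [45, 5, 0],
    [6, 44, 0],
    [0, 1, 49],
]
merged = cluster_on_confusion_matrix(confusion, threshold=0.05)
```

In the toy example, classes 0 and 1 exchange 11 of their 100 images and are merged, while class 2 stays separate.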
Optimal feature set for distinguishing the 3D 3T3 images (3D-SLF18). The features are listed in decreasing order of discriminating power as evaluated by SDA.
| Feature name | Feature description |
| 3D-SLF11.16 | The fraction of fluorescence in above threshold pixels that are along an edge |
| 3D-SLF11.19 | Average of correlation |
| 3D-SLF11.23 | Average of sum variance |
| 3D-SLF11.31 | Range of contrast |
| 3D-SLF11.5 | Ratio of maximum object volume to minimum object volume |
| 3D-SLF11.28 | Average of info measure of correlation 1 |
| 3D-SLF11.3 | Average object volume (average number of above threshold pixels per object) |
| 3D-SLF11.21 | Average of inverse difference moment |
| 3D-SLF11.24 | Average of sum entropy |
| 3D-SLF11.33 | Range of sum of squares of variance |
| 3D-SLF11.22 | Average of sum average |
| 3D-SLF11.29 | Average of info measure of correlation 2 |
| 3D-SLF11.25 | Average of entropy |
| 3D-SLF11.34 | Range of inverse difference moment |
| 3D-SLF11.2 | Euler number of the cell |
| 3D-SLF11.41 | Range of info measure of correlation 1 |
| 3D-SLF11.27 | Average of difference entropy |
| 3D-SLF11.26 | Average of difference variance |
| 3D-SLF11.37 | Range of sum entropy |
| 3D-SLF11.40 | Range of difference entropy |
| 3D-SLF11.35 | Range of sum average |
| 3D-SLF11.36 | Range of sum variance |
| 3D-SLF11.20 | Average of sum of squares of variance |
| 3D-SLF11.32 | Range of correlation |
| 3D-SLF11.4 | Standard deviation (SD) of object volumes |
| 3D-SLF11.38 | Range of entropy |
| 3D-SLF11.10 | SD of absolute value of the horizontal component of object to protein center of fluorescence (COF) distances |
| 3D-SLF11.9 | Average absolute value of the horizontal component of object to COF distance |
| 3D-SLF11.18 | Average of contrast |
| 3D-SLF11.13 | SD of signed vertical component of object to protein center of fluorescence (COF) distances |
| 3D-SLF11.6 | Average object to COF distance |
| 3D-SLF11.17 | Average of angular second moment |
| 3D-SLF11.42 | Range of info measure of correlation 2 |
| 3D-SLF11.12 | Average signed vertical component of object to protein center of fluorescence (COF) distances |
Comparison of clustering methods and distance functions. The agreement between the sets of clusters resulting from the four clustering methods described in the text was measured using the κ test. The standard deviations of the statistic under the null hypothesis were estimated to range between 0.014 and 0.023 from multiple simulations.
| Clustering approaches compared | z-scored Euclidean distance | Mahalanobis distance |
| | 1 | 0.5397 |
| | 0.4171 | 0.3634 |
| Consensus versus ConfMat | 0.4171 | 0.1977 |
| | 0.2055 | 0.1854 |
| Consensus versus visual | 0.2055 | 0.1156 |
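Agreement between two clusterings of the same proteins can be scored with a κ statistic. The sketch below shows one common way to do this, assuming κ is computed over pairs of proteins, with each pair labeled "same cluster" or "different cluster" under each clustering. Whether the paper applied κ in exactly this pairwise form is an assumption, and the function name is hypothetical.

```python
from itertools import combinations


def pairwise_kappa(labels_a, labels_b):
    """Cohen's kappa on pairwise co-membership of two clusterings.

    labels_a, labels_b: cluster label of each item under clustering A and B.
    Every pair of items is rated 'together' or 'apart' by each clustering;
    kappa corrects the observed agreement for agreement expected by chance.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same two or more items")
    pairs = list(combinations(range(len(labels_a)), 2))
    same_a = [labels_a[i] == labels_a[j] for i, j in pairs]
    same_b = [labels_b[i] == labels_b[j] for i, j in pairs]
    n = len(pairs)
    observed = sum(a == b for a, b in zip(same_a, same_b)) / n
    p_a = sum(same_a) / n  # fraction of pairs kept together by A
    p_b = sum(same_b) / n  # fraction of pairs kept together by B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # degenerate case: both clusterings are trivial
    return (observed - expected) / (1 - expected)


# Identical partitions (up to relabeling) give kappa = 1.
kappa_same = pairwise_kappa([0, 0, 1, 1], [1, 1, 0, 0])
# Partitions that split every pair differently give a negative kappa.
kappa_diff = pairwise_kappa([0, 0, 1, 1], [0, 1, 0, 1])
```

A value near 1 indicates near-identical clusterings, a value near 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.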
Figure 4. A consensus subcellular location tree generated on the 3D 3T3 image dataset using SDA-selected 3D-SLF11 features. The columns show the protein names (if known), human observations of subcellular location, and subcellular location inferred from gene ontology (GO) annotations. The sum of the lengths of horizontal edges connecting two proteins represents the distance between them in the feature space. Proteins for which the location described by human observation differs significantly from that inferred from GO annotations are marked (**).