William J Cukierski, Kaustav Nandy, Prabhakar Gudla, Karen J Meaburn, Tom Misteli, David J Foran, Stephen J Lockett.
Abstract
BACKGROUND: Correct segmentation is critical to many applications within automated microscopy image analysis. Despite the availability of advanced segmentation algorithms, variations in cell morphology, sample preparation, and acquisition settings often lead to segmentation errors. This manuscript introduces a ranked-retrieval approach using logistic regression to automate selection of accurately segmented nuclei from a set of candidate segmentations. The methodology is validated on an application of spatial gene repositioning in breast cancer cell nuclei. Gene repositioning is analyzed in patient tissue sections by labeling sequences with fluorescence in situ hybridization (FISH), followed by measurement of the relative position of each gene from the nuclear center to the nuclear periphery. This technique requires hundreds of well-segmented nuclei per sample to achieve statistical significance. Although the tissue samples in this study contain a surplus of available nuclei, automatic identification of the well-segmented subset remains a challenging task.
Year: 2012 PMID: 22971117 PMCID: PMC3484015 DOI: 10.1186/1471-2105-13-232
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. (a,b) There are significant differences in sample appearance, even for tissue processed under identical conditions.
Figure 2. Overview of the automated steps to extract gene position information from images, which can then be employed for high-throughput studies. (a) The fluorescence image is acquired. (b) A multistage watershed segmentation algorithm creates a set of candidate segmentations. (c) A logistic regression assigns a probability to each candidate, screening out those with a low likelihood of being well segmented. (d) Examples of highly ranked vs. poorly ranked segmented objects. Blue: true segmentations of nuclei. Red border: automatically deduced candidate segmentations. Green/Red dots: FISH-labeled genes. (e) Gene position measurements, such as the radial probability distributions, are made using the correctly segmented nuclei borders. (f) A confusion matrix illustrating potential outcomes of a binary classification. The red dotted line represents a candidate segmentation. False positives are the most critical source of error in a ranked retrieval of nuclei, potentially creating incorrect gene position measurements.
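The screening step in panel (c) can be illustrated with a minimal sketch. It assumes two hypothetical features per candidate (solidity and ellipse error ratio, both named in the feature table), toy training labels, a plain gradient-descent fit, and the 0.1 probability cutoff mentioned for the test datasets. The function names and all numeric values are illustrative assumptions, not the authors' implementation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias stored last) by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Probability that a candidate with feature vector x is well segmented."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def rank_candidates(w, candidates, threshold=0.1):
    """Return (index, probability) pairs above the threshold, best first."""
    scored = [(i, predict_proba(w, x)) for i, x in enumerate(candidates)]
    kept = [(i, p) for i, p in scored if p > threshold]
    return sorted(kept, key=lambda t: t[1], reverse=True)

# Toy data (assumed): [solidity, ellipse error ratio]; label 1 = well segmented.
X_train = [[0.98, 0.02], [0.95, 0.05], [0.97, 0.03],
           [0.70, 0.40], [0.60, 0.55], [0.75, 0.35]]
y_train = [1, 1, 1, 0, 0, 0]

weights = fit_logistic(X_train, y_train)
ranked = rank_candidates(weights, [[0.96, 0.04], [0.65, 0.50], [0.90, 0.10]])
for index, prob in ranked:
    print(f"candidate {index}: p(well segmented) = {prob:.3f}")
```

Raising the threshold trades yield for fewer false positives, which the caption identifies as the most damaging error type for downstream gene position measurements.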
Four categories of features were extracted and tested in the automatic pipeline
| Category | Feature | Description |
| Morphological | Area | Number of pixels in segmentation mask |
| | Perimeter | Length of segmentation boundary |
| | Perimeter/Area | Perimeter to area ratio |
| | Solidity | Ratio of area to the convex hull area |
| | CH Perimeter / Perimeter | Ratio of convex hull perimeter to object perimeter |
| | Max Circle Ratio | Ratio of the area of the largest inscribed circle to the total area |
| | Ellipse Eccentricity | Eccentricity of the ellipse that has the same second-moments as the mask |
| | Ellipse Error Ratio | Area difference between best-fit ellipse and boundary |
| | Length | Measure of elongation |
| | Width | Measure of breadth |
| | Mean Pairwise Distance | Mean all-pairs distance between points on perimeter |
| | Polar Histogram | Measures isotropy of border points at a given angle |
| | Num. Severe Corners | Number of strong corners in segmentation contour |
| | Box-counting Dimension | Fractal dimension of the perimeter |
| | Erosion Profile | Identifies segmentations with narrow passages separating large areas |
| | Elliptical Fourier | Number of elliptical Fourier coefficients to reconstruct the mask to within 10% area error |
| Texture | Mean Intensity | Mean of grayscale intensity inside nucleus |
| | Intensity Range | Range of grayscale intensity inside nucleus |
| | Entropy | Global entropy of grayscale values inside nucleus |
| | Gray-level Co-occurrence | Statistics of the gray-level co-occurrence matrix |
| FISH | Num. FISH | Number of FISH spots |
| | FISH/Area | Number of FISH spots normalized by area |
| | FISH CH Area | Ratio of convex hull area formed on FISH spots to total area |
| | FISH Boundary | Measures whether the FISH convex hull intersects the nuclear boundary |
| | Mean FISH Distance | Mean distance between FISH spots |
| Contextual | Num. Nuclei | Number of candidate nuclei in the image |
| | Intensity Ratio | Mean intensity of band surrounding the nucleus compared to the mean intensity inside |
| | Betweenness Centrality | Betweenness centrality of nucleus in a graph connecting nuclei in the image |
| | Number Neighbors | Number of neighbors connected to the nucleus |
| | Mean Edge Distance | Mean edge distance to 1st-level neighbors |
a Summarized here are brief descriptions of the features.
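A few of the morphological features in the table (Area, Perimeter, Perimeter/Area, Solidity) are simple enough to sketch directly from their descriptions. The example below is a minimal illustration on a binary mask stored as a list of rows. It assumes that perimeter is measured by counting exposed pixel edges and that the convex hull is taken over pixel corners; the paper may use different conventions, and all function names are hypothetical.

```python
def area(mask):
    """Number of foreground pixels in the segmentation mask."""
    return sum(sum(row) for row in mask)

def perimeter(mask):
    """Count pixel edges that separate foreground from background (assumed convention)."""
    h, w = len(mask), len(mask[0])
    edges = 0
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if rr < 0 or rr >= h or cc < 0 or cc >= w or not mask[rr][cc]:
                    edges += 1
    return edges

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in order around the hull."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def solidity(mask):
    """Ratio of mask area to the area of the convex hull of its pixel corners."""
    corners = []
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                corners += [(c, r), (c + 1, r), (c, r + 1), (c + 1, r + 1)]
    return area(mask) / polygon_area(convex_hull(corners))

# Toy masks (assumed): a compact square and a concave L shape.
square = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
l_shape = [[1, 1, 1], [1, 0, 0], [1, 0, 0]]

for name, mask in (("square", square), ("L shape", l_shape)):
    a, p = area(mask), perimeter(mask)
    print(name, a, p, round(p / a, 3), round(solidity(mask), 3))
```

The concave L shape has lower solidity than the square, which is the kind of signal that could help separate merged or fragmented candidates from well-segmented nuclei.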
Figure 3. (a) ROC curves for each leave-one-out experiment trial, with the ground truth taken from all three reviewers. (b) ROC performance as a function of the number of reviewers. (c) Sorted posterior probability values for each dataset, scaled to the domain [0,1]. The area under these curves is a representation of the yield of well-segmented nuclei from the dataset. (d) The effect of training set size on the regression performance, reported for 500 repetitions.
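For readers less familiar with ROC analysis, the area under an ROC curve can be computed without tracing the curve: it equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below is a minimal illustration of that identity; the labels and scores are made-up values, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for well segmented, 0 otherwise.
    scores: classifier outputs, higher meaning more likely well segmented.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy example (assumed values): one negative outranks one positive.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, so a rank-based formulation would be preferable for datasets with tens of thousands of labels.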
ROC values of manual labels were constructed
| ( | 0.868 | |
| ( | 0.905 | |
| ( | 0.936 |
a ROC areas for the manual agreement (N = 43,956).
Comparison of the automated method with manual analysis shows acceptable agreement
| D1 | NCBD | 63 | 0.1132 | 0.1918 | 0.1363 | 0.1321 | | |
| D2 | Cancer | 164 | 0.2689 | 0.4360 | 0.0000 | 0.0000 | | |
| D3 | Cancer | 115 | 0.0050 | 0.0001 | 0.0003 | 0.0000 | | |
| D4 | Cancer | 133 | 0.0000 | 0.0001 | 0.0000 | 0.0000 | | |
| D5 | Cancer | 188 | 0.0237 | 0.0000 | 0.0000 | 0.0000 | X | |
| D6 | Cancer | 147 | 0.0004 | 0.0182 | 0.5239 | 0.5019 | X | |
| D7 | Cancer | 63 | 0.0000 | 0.0000 | 0.0724 | 0.0013 | X | |
| D8 | Cancer | 174 | 0.0002 | 0.0000 | 0.3017 | 0.3576 | | |
| D9 | Cancer | 120 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | | |
| D10 | Cancer | 117 | 0.0001 | 0.0005 | 0.0157 | 0.0201 | | |
| D11 | Cancer | 87 | 0.0001 | 0.0007 | 0.0000 | 0.0286 | X | |
| D12 | Cancer | 153 | 0.0009 | 0.0048 | 0.0000 | 0.0130 | X | |
| D13 | Cancer | 37 | 0.0000 | 0.0004 | 0.0526 | 0.9827 | | |
| | | | | | | Total: | 11/13 | 10/13 |
| | | | | | | | 84.60% | 76.90% |
a 13 test datasets with their tissue type (NCBD = non-cancerous breast disease) and the number of nuclei screened with a probability above 0.1.
Figure 4. (a) Scatterplot of the normalized man vs. machine EDT. This is a comparison of the nEDT vs. cEDT values between fully manual and machine segmentation of 1086 nuclei, comprising 3720 gene markers. (b) Normalized EDT error as a function of position within the nucleus (0 = center, 1 = edge). (c) Scatterplot of the cumulative man vs. machine EDT. (d) Cumulative EDT error as a function of position within the nucleus. FISH spots close to the periphery have larger error for cEDT, while those for nEDT are not position dependent. The red line is the median, the box extends to the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted as red “+”s. Outliers above 0.3 are not shown, to improve visualization.
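The caption does not define nEDT or cEDT, so the following sketch rests on explicit assumptions: EDT is taken to mean the Euclidean distance from a nuclear pixel to the nearest background pixel, the normalized position is that distance divided by the maximum distance in the nucleus, and the cumulative position is the fraction of nuclear pixels lying deeper than the spot. Both are oriented so that 0 is the center and values near 1 are the edge, matching the axis convention in the caption. The function names are hypothetical and the brute-force distance computation is only suitable for small masks.

```python
import math

def distance_to_background(mask):
    """Brute-force Euclidean distance from each foreground pixel to the nearest
    background pixel. The mask must contain at least one background pixel."""
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    dist = {}
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[(r, c)] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist

def normalized_position(dist, spot):
    """Assumed nEDT-style position: 0 at the deepest pixel, approaching 1 at the edge."""
    d_max = max(dist.values())
    return 1.0 - dist[spot] / d_max

def cumulative_position(dist, spot):
    """Assumed cEDT-style position: fraction of nuclear pixels deeper than the spot."""
    deeper = sum(1 for d in dist.values() if d > dist[spot])
    return deeper / len(dist)

# Toy nucleus (assumed): a 5x5 square padded by one background pixel on each side.
mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]
dist = distance_to_background(mask)

for spot in [(3, 3), (2, 3), (1, 3)]:
    print(spot,
          round(normalized_position(dist, spot), 3),
          round(cumulative_position(dist, spot), 3))
```

Under these assumptions, the cumulative measure depends on how many pixels share each distance value, so it responds differently to boundary errors than the normalized measure does.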