Fei Dong1, Humayun Irshad2, Eun-Yeong Oh2, Melinda F Lerwill3, Elena F Brachtel3, Nicholas C Jones3, Nicholas W Knoblauch2, Laleh Montaser-Kouhsari2, Nicole B Johnson2, Luigi K F Rao3, Beverly Faulkner-Jones2, David C Wilbur3, Stuart J Schnitt2, Andrew H Beck2.
Abstract
The categorization of intraductal proliferative lesions of the breast based on routine light microscopic examination of histopathologic sections is in many cases challenging, even for experienced pathologists. The development of computational tools to aid pathologists in the characterization of these lesions would have great diagnostic and clinical value. As a first step to address this issue, we evaluated the ability of computational image analysis to accurately classify ductal carcinoma in situ (DCIS) and usual ductal hyperplasia (UDH) and to stratify nuclear grade within DCIS. Using 116 breast biopsies diagnosed as DCIS or UDH from the Massachusetts General Hospital (MGH), we developed a computational method to extract 392 features corresponding to the mean and standard deviation in nuclear size and shape, intensity, and texture across 8 color channels. We used L1-regularized logistic regression to build classification models to discriminate DCIS from UDH. The top-performing model contained 22 active features and achieved an area under the receiver operating characteristic curve (AUC) of 0.95 in cross-validation on the MGH dataset. We applied this model to an external validation set of 51 breast biopsies diagnosed as DCIS or UDH from the Beth Israel Deaconess Medical Center (BIDMC), and the model achieved an AUC of 0.86. The top-performing model contained active features from all color spaces and from the three classes of features (morphology, intensity, and texture), suggesting the value of each for prediction. We built models to stratify grade within DCIS and obtained strong performance for stratifying low nuclear grade vs. high nuclear grade DCIS (AUC = 0.98 in cross-validation) with only moderate performance for discriminating low nuclear grade vs. intermediate nuclear grade and intermediate nuclear grade vs. high nuclear grade DCIS (AUC = 0.83 and 0.69, respectively).
These data show that computational pathology models can robustly discriminate benign from malignant intraductal proliferative lesions of the breast and may aid pathologists in the diagnosis and classification of these lesions.
Year: 2014 PMID: 25490766 PMCID: PMC4260962 DOI: 10.1371/journal.pone.0114885
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
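The abstract names L1-regularized logistic regression as the classifier and reports the number of "active" (nonzero-weight) features. The sketch below illustrates how an L1 penalty drives weights to exactly zero. It is a minimal illustration under stated assumptions, not the authors' implementation: the solver (proximal gradient descent), the function names, the step size, and the synthetic two-feature data are all hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(rows, labels, lam, step=0.5, n_iter=500):
    """Fit weights and an unpenalized intercept by proximal gradient descent."""
    n, d = len(rows), len(rows[0])
    weights, intercept = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = intercept + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * x[j] / n
        intercept -= step * grad_b
        # Gradient step on the log-loss, then soft-thresholding for the L1 term.
        weights = [soft_threshold(w - step * g, step * lam)
                   for w, g in zip(weights, grad_w)]
    return weights, intercept


# Synthetic data (hypothetical): feature 0 separates the classes, feature 1 is noise-like.
rows = [[(i - 19.5) / 20.0, math.sin(i)] for i in range(40)]
labels = [1 if i >= 20 else 0 for i in range(40)]

for lam in (1.0, 0.1, 0.01):
    w, _ = fit_l1_logistic(rows, labels, lam)
    active = sum(1 for wj in w if wj != 0.0)
    print(f"lambda={lam}: {active} active feature(s)")
```

A larger penalty leaves fewer active features, which is the trade-off summarized by the top axis of Figure 2A.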
Figure 1. Image processing of UDH (A), low grade DCIS (B), and high grade DCIS (C) includes conversion of hematoxylin and eosin-stained images to binary images via automated threshold and watershed segmentation (D-F).
Elliptical approximations of identified nuclear objects are shown (G-I). Quantitative measurements (perimeter and circularity in this example) of nuclear distributions, where each data point represents one nuclear object, are shown as scatterplots and contour plots of two-dimensional kernel density estimations for each case (J-L).
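To make the per-nucleus measurements concrete, the sketch below computes circularity from an elliptical approximation and then summarizes the per-nucleus values as a case-level mean and standard deviation, mirroring the `Mean_*` / `SD_*` naming in the feature table. Several points are assumptions rather than details given in the source: circularity is taken as the common definition 4π·area/perimeter², the ellipse perimeter uses Ramanujan's approximation, the sample (n−1) standard deviation is used, and the axis lengths and the `Circ` feature name are hypothetical.

```python
import math
import statistics


def ellipse_area(major, minor):
    """Area of an ellipse given its full major and minor axis lengths."""
    return math.pi * (major / 2.0) * (minor / 2.0)


def ellipse_perimeter(major, minor):
    """Ramanujan's approximation to the perimeter of an ellipse."""
    a, b = major / 2.0, minor / 2.0
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


def circularity(area, perimeter):
    """4*pi*area / perimeter**2: 1.0 for a circle, smaller for elongated shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


def summarize_case(values, name):
    """Collapse per-nucleus values into case-level mean and SD features."""
    return {
        f"Mean_{name}": statistics.mean(values),
        f"SD_{name}": statistics.stdev(values),  # sample SD (an assumption)
    }


# Hypothetical (major, minor) axis lengths in pixels for three nuclei in one case.
nuclei_axes = [(10.0, 10.0), (12.0, 8.0), (14.0, 6.0)]
circ_values = [circularity(ellipse_area(ma, mi), ellipse_perimeter(ma, mi))
               for ma, mi in nuclei_axes]
case_features = summarize_case(circ_values, "Circ")
print(case_features)
```

Each nucleus contributes one point to the scatterplots in panels J-L; the case-level summary is what a classifier would consume.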
Features and weights in the DCIS vs. UDH classification model.
| Feature Name | Summary Function | Feature Class | Color Space | Weight |
|---|---|---|---|---|
| Mean_Variance_Red | Mean | Intensity | Red (RGB) | 0.000875 |
| Mean_Kurtosis_Green | Mean | Intensity | Green (RGB) | −3.15038 |
| Mean_Kurtosis_BR | Mean | Intensity | BlueRatio | 0.002031 |
| Mean_Skewness_BR | Mean | Intensity | BlueRatio | 0.523952 |
| SD_Mean_Red | SD | Intensity | Red (RGB) | −0.05881 |
| SD_Variance_Green | SD | Intensity | Green (RGB) | 0.000543 |
| Mean_Minor | Mean | Morphology | N/A | −0.13152 |
| Mean_Round | Mean | Morphology | N/A | −28.1278 |
| Mean_IDM_Blue | Mean | Texture | Blue (RGB) | 101.3997 |
| Mean_ClusterShade_HSV | Mean | Texture | V (HSV) | 0.000004 |
| Mean_GLN_Lab | Mean | Texture | L (Lab) | −0.074964 |
| Mean_LGLRE_Lab | Mean | Texture | L (Lab) | −5239.083 |
| Mean_SRLGLE_Lab | Mean | Texture | L (Lab) | 0.000001 |
| Mean_LRLGLE_Lab | Mean | Texture | L (Lab) | −329.1062 |
| Mean_GLN_Luv | Mean | Texture | L (Luv) | −0.000184 |
| Mean_LGLRE_Luv | Mean | Texture | L (Luv) | −0.058987 |
| Mean_SRLGLE_Luv | Mean | Texture | L (Luv) | −35.01663 |
| Mean_LRLGLE_Luv | Mean | Texture | L (Luv) | −0.563487 |
| Mean_GLN_HE | Mean | Texture | H (HE) | −0.592543 |
| SD_Inertia_Red | SD | Texture | Red (RGB) | 0.000525 |
| SD_Entropy_Blue | SD | Texture | Blue (RGB) | −7.797402 |
| SD_IDM_HSV | SD | Texture | V (HSV) | −89.83901 |
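As a reading aid for the weights above, the following sketch shows how a logistic regression model turns a weighted sum of features into a probability. Only three of the 22 weights are copied from the table; the intercept, the feature values, and the function name are hypothetical, and the table does not state the feature scaling or which class the positive direction corresponds to, so the output here is illustrative only.

```python
import math

# Three of the 22 weights, copied from the table above (ASCII minus signs).
WEIGHTS = {
    "Mean_Minor": -0.13152,
    "Mean_Round": -28.1278,
    "Mean_Skewness_BR": 0.523952,
}


def predict_probability(features, weights, intercept=0.0):
    """Logistic model: sigmoid of the intercept plus the weighted feature sum."""
    z = intercept + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical case-level feature values; the intercept is not given in the table.
example_case = {"Mean_Minor": 10.0, "Mean_Round": 0.8, "Mean_Skewness_BR": 1.0}
probability = predict_probability(example_case, WEIGHTS)
print(f"predicted probability: {probability:.3e}")
```

Because the weights act on features with very different numeric ranges, their magnitudes alone (for example, −5239.083 versus 0.000001) do not indicate relative importance.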
Figure 2. Learning a logistic regression model to distinguish DCIS vs. UDH on the MGH Dataset.
A. Area under the receiver operating characteristic curve (AUC) based on predictions made on held-out cases in cross-validation with respect to log of the regularization parameter λ (bottom) and the number of features active in the model (top). Dotted lines indicate λ of model with maximum AUC (left) and largest λ such that the AUC is within one standard error of the maximum (right). B. Receiver operating characteristic (ROC) curve based on predictions made in cross-validation for the λ value that maximized the AUC in cross-validation for discriminating DCIS vs UDH on the MGH dataset.
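The two dotted lines described in panel A correspond to two ways of choosing λ from a cross-validation curve. A minimal sketch of both rules follows; the function name and the (λ, mean AUC, standard error) values are hypothetical and are not taken from the paper.

```python
def select_lambdas(path):
    """Return (lambda with maximum AUC, largest lambda within one SE of that maximum).

    path is a list of (lambda, mean_auc, standard_error) tuples.
    """
    best = max(path, key=lambda row: row[1])
    threshold = best[1] - best[2]  # maximum AUC minus its standard error
    lambda_1se = max(lam for lam, auc, _ in path if auc >= threshold)
    return best[0], lambda_1se


# Hypothetical cross-validation results along a regularization path.
cv_path = [
    (0.001, 0.91, 0.02),
    (0.010, 0.95, 0.02),
    (0.050, 0.94, 0.02),
    (0.100, 0.92, 0.03),
    (0.500, 0.70, 0.05),
]
lambda_max_auc, lambda_1se = select_lambdas(cv_path)
print(lambda_max_auc, lambda_1se)
```

The one-standard-error choice trades a small loss in AUC for a stronger penalty, and therefore a sparser model.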
Figure 3. Validation of DCIS vs. UDH model on the BIDMC Dataset.
A. Each point represents a case from the validation dataset. The cases are ranked based on the probability of UDH, which is indicated on the y-axis. The red points represent cases of DCIS and the black points represent cases of UDH. B. Receiver operating characteristic (ROC) curve based on predictions made on the BIDMC validation dataset for discriminating DCIS vs UDH.
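The ranking in panel A and the ROC curve in panel B are linked: the AUC equals the probability that a randomly chosen case of one class receives a higher score than a randomly chosen case of the other. The sketch below computes the AUC that way, by direct pairwise comparison. The scores and labels are hypothetical and are not the BIDMC data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of UDH for six cases (1 = UDH, 0 = DCIS).
prob_udh = [0.95, 0.80, 0.70, 0.40, 0.30, 0.10]
is_udh = [1, 1, 0, 1, 0, 0]
print(roc_auc(prob_udh, is_udh))
```

In this toy example, one UDH case (score 0.40) ranks below one DCIS case (score 0.70), so 8 of the 9 pairs are ordered correctly.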
Figure 4. Learning a logistic regression model to distinguish DCIS vs. UDH on the MGH and BIDMC Datasets.
A. Area under the receiver operating characteristic curve (AUC) based on predictions made on held-out cases in cross-validation with respect to log of the regularization parameter λ (bottom) and the number of features active in the model (top). B. Receiver operating characteristic (ROC) curve based on predictions made in cross-validation for the λ value that maximized the AUC in cross-validation for discriminating DCIS vs UDH on the MGH and BIDMC datasets.
Classification performance for DCIS and UDH classification models across a range of classification tasks and using varying subsets of features.
| Classification Task | Total Features | Selected Features | AUC (%) |
|---|---|---|---|
| UDH vs High Grade DCIS | 392 | 26 | 97 |
| UDH vs Low Grade DCIS | 392 | 29 | 95 |
| UDH vs Intermediate Grade DCIS | 392 | 46 | 90 |

| Classification Task | Total Features | Selected Features | AUC (%) |
|---|---|---|---|
| Low Grade vs High Grade DCIS | 392 | 17 | 98 |
| Low Grade vs Intermediate Grade DCIS | 392 | 8 | 83 |
| Intermediate Grade vs High Grade DCIS | 392 | 37 | 69 |

| Feature Class | Total Features | Selected Features | AUC (%) |
|---|---|---|---|
| Textural Features | 288 | 27 | 91 |
| Morphological Features | 24 | 16 | 89 |
| Intensity Features | 80 | 24 | 85 |

| Color Channel | Total Features | Selected Features | AUC (%) |
|---|---|---|---|
| Red Channel | 46 | 28 | 90 |
| V (HSV) Channel | 46 | 26 | 89 |
| L (Luv) Channel | 46 | 23 | 88 |
| L (Lab) Channel | 46 | 23 | 88 |
| BlueRatio Image | 46 | 21 | 85 |
| Green Channel | 46 | 31 | 84 |
| Blue Channel | 46 | 28 | 84 |
| H (H&E) | 46 | 37 | 82 |
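The feature counts in this table can be cross-checked against the 392 features and 8 color channels stated in the abstract. The short check below does that arithmetic. It assumes that morphological features are independent of color channel (consistent with the "N/A" color space listed for them in the feature-weight table) and that texture and intensity features are split evenly across channels.

```python
# Counts taken from the table above and from the abstract.
n_channels = 8
features_per_channel = 46
texture_total, morphology_total, intensity_total = 288, 24, 80
all_features = 392

# The three feature classes account for every feature.
assert texture_total + morphology_total + intensity_total == all_features

# Channel-specific features plus channel-independent morphology features.
assert n_channels * features_per_channel + morphology_total == all_features

# Assuming an even split across channels: 36 texture + 10 intensity = 46.
texture_per_channel = texture_total // n_channels
intensity_per_channel = intensity_total // n_channels
assert texture_per_channel + intensity_per_channel == features_per_channel

print(texture_per_channel, intensity_per_channel)
```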