Will Fischer, Sanketh S Moudgalya, Judith D Cohn, Nga T T Nguyen, Garrett T Kenyon.
Abstract
BACKGROUND: Histopathology images of tumor biopsies present unique challenges for applying machine learning to the diagnosis and treatment of cancer. The pathology slides are high resolution, often exceeding 1 GB, have non-uniform dimensions, and often contain multiple tissue slices of varying sizes surrounded by large empty regions. The locations of abnormal or cancerous cells, which may constitute a small portion of any given tissue sample, are not annotated. Cancer image datasets are also extremely imbalanced, with most slides being associated with relatively common cancers. Since deep representations trained on natural photographs are unlikely to be optimal for classifying pathology slide images, which have different spectral ranges and spatial structure, we here describe an approach for learning features and inferring representations of cancer pathology slides based on sparse coding.
Keywords: Cancer pathology slides; Deep learning; Locally Competitive Algorithm; Sparse coding; TCGA; Transfer learning; Unsupervised learning
Year: 2018 PMID: 30577746 PMCID: PMC6302377 DOI: 10.1186/s12859-018-2504-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Matched tumor/non-tumor tissue images
| Tissue of origin | Tumor type | Count |
|---|---|---|
| Adrenal gland | Pheochromocytoma and Paraganglioma | 6 |
| Bile duct | Cholangiocarcinoma | 18 |
| Bladder | Bladder Urothelial Carcinoma | 45 |
| Breast | Breast Invasive Carcinoma | 429 |
| Colon | Colon Adenocarcinoma | 130 |
| Colon | Rectum Adenocarcinoma | 27 |
| Cervix | Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma | 6 |
| Stomach | Stomach Adenocarcinoma | 68 |
| Head and neck | Head and Neck Squamous Cell Carcinoma | 116 |
| Lung | Lung Adenocarcinoma | 179 |
| Lung | Lung Squamous Cell Carcinoma | 115 |
| Liver | Liver Hepatocellular Carcinoma | 118 |
| Esophagus | Esophageal Carcinoma | 16 |
| Pancreas | Pancreatic Adenocarcinoma | 8 |
| Prostate | Prostate Adenocarcinoma | 124 |
| Kidney | Kidney Chromophobe | 69 |
| Kidney | Kidney Renal Clear Cell Carcinoma | 214 |
| Kidney | Kidney Renal Papillary Cell Carcinoma | 78 |
| Sarcoma | Sarcoma | 4 |
| Melanoma (skin) | Skin Cutaneous Melanoma | 2 |
| Thyroid | Thyroid Carcinoma | 114 |
| Thymus | Thymoma | 4 |
| Uterus | Uterine Corpus Endometrial Carcinoma | 54 |
For each tumor from a given patient, at least one slide image was labeled as cancerous (“primary tumor”) and at least one image as “normal” (adjacent samples or clean margin).
Fig. 1. Preprocessing of TCGA pathology slides. Full-extent low-resolution images were used to determine image coordinates; full-resolution image slices were used to generate sparse representations. Top: initial image; center: fast Fourier transform versus all-white, to determine optically dark regions of the image; bottom: non-overlapping image slices representing a succession of darkest remaining portions of the image. Full-resolution regions of interest (ROIs; colored boxes) were extracted from the SVS file; the four darkest ROIs from each image were used for the analyses reported here.
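The ROI selection described in the Fig. 1 caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the Fourier-based darkness measure is replaced here by a plain mean intensity per tile, tiles lie on a fixed grid (so they never overlap), and the names `darkest_tiles`, `gray`, and `tile` are hypothetical.

```python
import numpy as np

def darkest_tiles(gray, tile, k=4):
    """Return (row, col) origins of the k darkest grid tiles.

    Assumptions (not from the paper): `gray` is a 2-D low-resolution
    intensity image with a bright background, and darkness is measured
    by the mean intensity of each tile.
    """
    rows, cols = gray.shape[0] // tile, gray.shape[1] // tile
    # Crop to a whole number of tiles, then average within each tile.
    cropped = gray[: rows * tile, : cols * tile]
    means = cropped.reshape(rows, tile, cols, tile).mean(axis=(1, 3))
    # Rank tiles by mean intensity, darkest first, and keep the first k.
    order = np.argsort(means, axis=None, kind="stable")[:k]
    grid_r, grid_c = np.unravel_index(order, means.shape)
    return [(int(r) * tile, int(c) * tile) for r, c in zip(grid_r, grid_c)]
```

The returned origins are in low-resolution pixel units; mapping them to full-resolution coordinates in the SVS file would require scaling by the downsampling factor, which is not specified here.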
Fig. 2. Sample region-of-interest (ROI) images. Each group of 8 small images contains ROIs derived from contemporaneous normal and tumor tissue samples from a single patient; within each group, the top row of 4 represents normal tissue; the bottom row, tumor tissue. Groups represent the following tumor types (left to right): row 1, adrenal, bile duct, bladder, stomach; row 2, breast, breast, colon, colon; row 3, lung, liver, pancreas, thyroid; row 4, prostate, prostate, kidney, kidney. Some sample pairs show overt tumor signatures (e.g., tissue disorganization, densely packed nuclei associated with rapid proliferation), but other samples lack such obvious features.
Fig. 3. Distribution of feature coefficients. Histogram giving the percentage of non-zero activation coefficients for each of the 512 feature maps (each 512×512), averaged over a large set of ROIs.
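The quantity plotted in Fig. 3 can be computed along the following lines. This is a sketch under assumptions: the coefficient array layout `(n_rois, n_features, height, width)` and the function name `percent_active` are hypothetical, and the tolerance `tol` is an illustrative parameter rather than a value from the paper.

```python
import numpy as np

def percent_active(coeffs, tol=0.0):
    """Percentage of non-zero coefficients per feature map.

    coeffs: array of shape (n_rois, n_features, height, width)
            (an assumed layout, not taken from the paper).
    """
    active = np.abs(coeffs) > tol
    # Average over ROIs and spatial positions, leaving one value per feature.
    return 100.0 * active.mean(axis=(0, 2, 3))
```

A histogram of the returned vector (one entry per feature map) gives a distribution of the kind the caption describes.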
Fig. 4. Feature dictionary. Dictionary of 512 convolutional feature kernels learned from the complete set of tumor and non-tumor image ROIs.
Fig. 5. Image reconstructions. Samples of reconstructed images based on convolutional feature kernels and weights (coefficients). Top: original images; bottom: reconstructions.
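Reconstruction from a convolutional sparse code amounts to summing each kernel, scaled by its coefficient, at every location where that coefficient is non-zero. The sketch below assumes a single-channel image, stride 1, and "full" output padding; the stride, color handling, and normalization actually used for Fig. 5 are not stated here, and `reconstruct` is a hypothetical name.

```python
import numpy as np

def reconstruct(coeffs, kernels):
    """Sum of kernels placed at each active coefficient location.

    coeffs:  (n_features, H, W) sparse coefficient maps.
    kernels: (n_features, kh, kw) feature kernels.
    Assumes stride 1 and a single channel (illustrative simplifications).
    """
    _, H, W = coeffs.shape
    _, kh, kw = kernels.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    # Only non-zero coefficients contribute, so decoding cost scales
    # with the number of active coefficients rather than the map size.
    for k, i, j in zip(*np.nonzero(coeffs)):
        out[i : i + kh, j : j + kw] += coeffs[k, i, j] * kernels[k]
    return out
```

Comparing the output with the original ROI, for example by mean squared error, gives a simple measure of how faithful the sparse representation is.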
Summary of classification performances
| Approach | Classification score |
|---|---|
| Sparse coding, SVM | 84.23% |
| RESNET-152 | 85.48 ± 0.36% |
| Sparse coding, MLP | 93.32 ± 0.21% |