
An Expandable Informatics Framework for Enhancing Central Cancer Registries with Digital Pathology Specimens, Computational Imaging Tools, and Advanced Mining Capabilities.

David J Foran1,2, Eric B Durbin3,4, Wenjin Chen1, Evita Sadimin1,2, Ashish Sharma5, Imon Banerjee5, Tahsin Kurc6, Nan Li5, Antoinette M Stroup7, Gerald Harris7, Annie Gu5, Maria Schymura8, Rajarsi Gupta6, Erich Bremer6, Joseph Balsamo6, Tammy DiPrima6, Feiqiao Wang6, Shahira Abousamra9, Dimitris Samaras9, Isaac Hands4, Kevin Ward10, Joel H Saltz6.   

Abstract

BACKGROUND: Population-based state cancer registries are an authoritative source for cancer statistics in the United States. They routinely collect a variety of data, including patient demographics, primary tumor site, stage at diagnosis, first course of treatment, and survival, on every cancer case that is reported across all U.S. states and territories. The goal of our project is to enrich NCI's Surveillance, Epidemiology, and End Results (SEER) registry data with high-quality population-based biospecimen data in the form of digital pathology, machine-learning-based classifications, and quantitative histopathology imaging feature sets (referred to here as Pathomics features).
MATERIALS AND METHODS: As part of the project, the underlying informatics infrastructure was designed, tested, and implemented through close collaboration with several participating SEER registries to ensure consistency with registry processes, computational scalability, and ability to support creation of population cohorts that span multiple sites. Utilizing computational imaging algorithms and methods to both generate indices and search for matches makes it possible to reduce inter- and intra-observer inconsistencies and to improve the objectivity with which large image repositories are interrogated.
RESULTS: Our team has created and continues to expand a well-curated repository of high-quality digitized pathology images corresponding to subjects whose data are routinely collected by the collaborating registries. Our team has systematically deployed and tested key, visual analytic methods to facilitate automated creation of population cohorts for epidemiological studies and tools to support visualization of feature clusters and evaluation of whole-slide images. As part of these efforts, we are developing and optimizing advanced search and matching algorithms to facilitate automated, content-based retrieval of digitized specimens based on their underlying image features and staining characteristics.
CONCLUSION: To meet the challenges of this project, we established the analytic pipelines, methods, and workflows to support the expansion and management of a growing repository of high-quality digitized pathology and information-rich, population cohorts containing objective imaging and clinical attributes to facilitate studies that seek to discriminate among different subtypes of disease, stratify patient populations, and perform comparisons of tumor characteristics within and across patient cohorts. We have also successfully developed a suite of tools based on a deep-learning method to perform quantitative characterizations of tumor regions, assess infiltrating lymphocyte distributions, and generate objective nuclear feature measurements. As part of these efforts, our team has implemented reliable methods that enable investigators to systematically search through large repositories to automatically retrieve digitized pathology specimens and correlated clinical data based on their computational signatures. Copyright:
© 2022 Journal of Pathology Informatics.

Keywords:  Cancer registries; computational imaging; deep-learning; digital pathology

Year:  2022        PMID: 35136672      PMCID: PMC8794027          DOI: 10.4103/jpi.jpi_31_21

Source DB:  PubMed          Journal:  J Pathol Inform


INTRODUCTION

The NCI’s Surveillance, Epidemiology, and End Results (SEER) program is a coordinated system of 19 cancer registries that is charged with providing timely and accurate data regarding cancer incidence, mortality, treatment, and survival. Pathology datasets currently available in the SEER registries are qualitative in nature, consisting of scoring and staging data captured in normal registry abstracts and pathology reports. Such datasets are generally subject to inter-observer variability, which can result in biases in population-wide studies of cancer incidence, mortality, survival, and prevalence. The main goal of our project is to enrich SEER registry data with high-quality population-based digital biospecimen data in the form of pathology tissue images and detailed computational tissue characterizations and features (also referred to as Pathomics features) derived from the images. Examples of Pathomics data include detailed characterizations of cancer and stromal nuclei and quantification and mapping of tumor-infiltrating lymphocytes (TILs), along with a supplementary histology classification generated through deep-learning algorithms. These data will augment existing registry data with quantitative features obtained directly from clinically acquired whole slide tissue images and provide detailed and nuanced information on tumor histology. The scientific premise motivating this work is that the incorporation of quantitative digital pathology into the cancer registries will result in a valuable population-wide dataset that can provide additional insight into the underlying characteristics of cancer. Next Generation Sequencing (NGS) technologies have captured much of the clinical community’s attention for their capacity to inform personalized choices of treatment and therapy. A major limitation of NGS technologies is that they obliterate the spatial information present within and throughout the tumor environment. 
Histopathology and immunostaining localization techniques preserve this information, which is invaluable in making accurate determinations. In fact, it is through the process of histopathology examination that tumor margins/volumes are determined by pathologists prior to the NGS analysis. These parameters are subsequently used to help guide decisions regarding appropriate cut-offs for allele frequencies and drive other components of the overall analysis. Pathomics features extracted from high-resolution pathology images are a quantitative surrogate of what is described in a pathology report. The important distinction is that these features are reproducible, unlike human observations, which are highly qualitative and subject to a high degree of inter- and intra-observer variability. The importance of increasing reproducibility and reducing inter-observer variability in pathology studies has been previously reported.[1-26] Moreover, many studies have demonstrated that quantitative image characterizations (e.g., nuclear features, patterns of TILs) are promising biomarkers which can be used to predict outcome and treatment response, if available in a large population.[27-39] These biomarkers integrated with clinical and genomics data can provide new opportunities to enhance our understanding of cancer incidence, mortality, and survival, along with statistical characterizations of lifetime risk, and to improve prediction and assessment of therapeutic effectiveness. Our project began as a collaboration among investigators within the state cancer registries of New Jersey, Georgia, and Kentucky. The consortium of partnering sites has recently expanded to include the newly established New York Cancer Registry. 
In this collaborative effort, we are implementing a framework of data curation and analysis workflows, computational imaging tools, and informatics infrastructure to support the creation and management of a well-curated, integrated repository of high-quality digitized pathology images and Pathomics features, for subjects whose data are being collected by the registries. The framework is being developed in close collaboration with SEER registries to ensure that it is scalable and in line with existing registry processes and can support queries and the creation of population cohorts that span multiple registries. In our framework, whole slide tissue images in the repository are systematically processed to compute Pathomics data and to establish linkages with registry data. The current set of Pathomics data includes (1) quantification of TILs, (2) segmentation and computational description of cancerous and stromal nuclei, (3) segmentation of tumor regions, (4) characterization of regional Gleason grade for prostate cancer, and (5) identification of non-small cell lung cancer (NSCLC) adenocarcinoma subtypes. This initial set is primarily motivated by an increasing number of scientific studies that investigate TILs and the relationships among TILs, tumors, and nuclear structure of tissue.[40-45] Such investigations can provide important information to advance our understanding of immune response in many cancer types. In the future, additional Pathomics features, such as the spectral and spatial signatures of staining characteristics exhibited by the digitized specimens, will be incorporated into our framework. 
The informatics infrastructure for this project is being built on open-source software and leverages modern software technologies, such as containerization and web-based applications, for a scalable, extensible implementation.[46,47] The infrastructure facilitates visualization of high-resolution whole slide tissue images along with associated Pathomics datasets. User authentication and access controls are implemented to thwart unauthorized access to data. The informatics infrastructure is being expanded to include tools to support content-based image retrieval. Presently, the repository manages diagnostic whole slide tissue images and analysis results obtained from 772 prostate cases, 1410 NSCLC cases, 70 breast cancer cases, and 48 lymphoma cases from the New Jersey State Cancer Registry and from 198 breast cancer cases from the Georgia State Cancer Registry. The scientific validation of the proposed environment will be undertaken through performance studies led by investigators throughout the four collaborating sites with an overarching focus on breast cancer, colorectal cancer, lymphoma, melanoma, NSCLC, and prostate cancer. We are confident that this repository will enable effective integration of pathology imaging and feature data as an invaluable resource in SEER registries. In the rest of the paper, we describe the design and implementation of the key components of the framework: the data curation and analysis processes, the initial set of image analysis methods, and the underlying informatics infrastructure for data management and visualization.

MATERIALS AND METHODS

Aggregation, quality control, and linkage of image data

The first component of our framework is the curation of pathology imaging data and linkage with other data from the cancer registries. Image quality control is an essential step, because specimen preparation protocols and tissue scanning procedures may result in imaging artifacts and variations in image quality. We devised and refined a workflow to facilitate the collection and quality control of digitized tissue specimens and linkage of images with correlated data extracted from the cancer registries. Here we describe the workflow deployed at Rutgers and the New Jersey SEER registry; the other sites (Georgia, Kentucky, and New York) are incrementally adopting analogous workflows as approved by their SEER registries and Institutional Review Boards (IRBs). Figure 1 depicts an instance of the workflow. Specimen retrieval and imaging are coordinated at the Biomedical Informatics Shared Resource (BISR) of Rutgers Cancer Institute of New Jersey (RCINJ). Breast, colorectal, lung, melanoma, and prostate cancer cases that are suitable for the project and exhibit well-defined tumor types and diagnoses are selected by a pathologist at the RCINJ and Rutgers Robert Wood Johnson Medical School. Cases within approximately a 2-year window are retrieved from onsite storage, whereas others are requested from offsite storage with the help of the BioSpecimens Repository Service of RCINJ. After a certified pathologist selects suitable slides according to the requirements of each cancer type (e.g., prostate cancer specimens are selected according to the Gleason grade), the specimens are imaged with an Olympus VS120 whole slide scanner, with no protected health information appearing in the image filename, image metadata, or the images themselves.
Figure 1

Workflow for assembling linked image/data cohorts

Team members from the BISR and NJCR perform cross-specialty review of the data for quality control. A secure, IRB-approved, Oracle-based (Redwood Shores, CA, USA) Clinical Research Data Warehouse is used at Rutgers to facilitate review of imaging and correlated clinical information on an individual patient basis or as part of large cohorts. The data warehouse has been commissioned to house multimodal data (genomics, digital pathology, radiology images). It orchestrates aggregation of information originating from multiple data sources including Electronic Medical Records, Clinical Trial Management Systems, Tumor Registries, Biospecimen Repositories, Radiology and Pathology archives, and Next Generation Sequencing services [Figure 2]. Innovative solutions were implemented in the warehouse to detect and extract unstructured clinical information that was embedded in paper/text documents, including synoptic pathology reports. The Warehouse receives objective oversight by a standing Data Governance Council.[48] An Informatica-based (Redwood City, CA, USA) extraction, transformation, and load (ETL) interface has been developed to automatically populate the Data Warehouse with data elements originating from the multi-modal data sources. This past year our team worked closely with the Google Healthcare team to successfully create and test an instance of the Data Warehouse on the Google Cloud Platform (GCP). In May 2020, we demonstrated the scalability of the cloud-based ETL, Warehouse, and Data Mart. As part of the project, our team will expand the use of the Warehouse by configuring it to integrate digitized pathology specimens with data originating from all of the collaborating cancer registries.
Figure 2

Clinical Research Data Warehouse workflow. The research data warehouse aggregates information from multiple data sources such as electronic health records, tumor registries, and radiology and pathology archives. It facilitates review of imaging data and linked clinical data on a single patient or cohort basis

The images and cases are linked through deidentified ID sequences. The New Jersey State Cancer Registry receives the deidentified ID as well as case information including specific surgery number and date, so that after data retrieval and decoding of encrypted fields, the deidentified ID is linked with clinical data associated with the case and, more specifically, with the diagnostic surgery. This ensures that the cancer specimen images are associated with the correct staging of the disease at the time of diagnosis so that they can be used in downstream research. The total corpus of data comprising the linked data sets encompasses more than 150 data elements, including the decoded NAACCR data, as shown in Table 1. The de-identified images are analyzed through a set of deep-learning analysis pipelines as described in the subsequent sections.
Table 1

Representative categories and linked data elements

Source | Category | Representative elements
Cancer Registry | Demographics | age_at_dx, sex, marital_status_at_dx, race, nhia, napiia, county_at_dx, etc.
Cancer Registry | Vital information | vital_status, date_of_death, primary_cause
Cancer Registry | Tumor information | Primary_site, laterality, grade, diagnosis_confirmation
Cancer Registry | Tumor extension and metastasis | cs_extension, cs_tumor_size, cs_lymph_nodes, cs_mets_at_dx
Cancer Registry | Pathology info and tumor staging | histology_icdo3, behavior_icdo3, clinical and pathology staging in AJCC 6, 7, 8 and SEER staging
Cancer Registry | Site-specific data | cs_site_specific factors
Cancer Registry | Tumor treatments | Surgical, radiation, hormone, BRM, and other cancer treatment information
Imaging | Pathology images | Digitized representative diagnostic slides in Olympus (.vsi) and Philips (.svs?) whole slide image formats, including image metadata such as imaging device, optical settings and configuration, specimen staining, etc.
Imaging | Computational imaging signatures | Tumor-infiltrating lymphocytes; tumor pattern segmentation; tumor and stromal nuclei segmentation; spatial and spectral signatures
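
The linkage of images to registry records through a deidentified ID can be pictured as a simple keyed join. The following is a minimal sketch under stated assumptions, not the registries' actual implementation: the field names (`deid_id`, `image_file`, `primary_site`, `grade`), the sample records, and the helper `link_images_to_cases` are all hypothetical and chosen only for illustration.

```python
def link_images_to_cases(images, registry_rows):
    """Join image records to registry records on a shared deidentified ID.

    Both inputs are lists of dicts; every field name here is hypothetical.
    Returns (linked_records, unmatched_image_files).
    """
    # Index the registry rows by deidentified ID for constant-time lookup.
    by_id = {row["deid_id"]: row for row in registry_rows}
    linked, unmatched = [], []
    for img in images:
        case = by_id.get(img["deid_id"])
        if case is None:
            # Keep track of images that have no registry match for review.
            unmatched.append(img["image_file"])
            continue
        # Merge registry fields and image fields into one linked record.
        linked.append({**case, **img})
    return linked, unmatched


# Made-up example records (no real patient data).
images = [
    {"deid_id": "A001", "image_file": "slide_0001.vsi"},
    {"deid_id": "A002", "image_file": "slide_0002.vsi"},
    {"deid_id": "Z999", "image_file": "slide_0003.vsi"},
]
registry_rows = [
    {"deid_id": "A001", "primary_site": "C61.9", "grade": "2"},
    {"deid_id": "A002", "primary_site": "C34.1", "grade": "3"},
]
linked, unmatched = link_images_to_cases(images, registry_rows)
```

Returning the unmatched files separately, rather than silently dropping them, reflects the quality-control emphasis of the workflow: a slide without a registry match is something a reviewer would likely want to see.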

Extraction of Pathomics features

Development of tissue image analysis methods is a highly active area of research and implementation. A variety of analysis methods for segmentation and classification of objects, regions, and structures (such as nuclei, tumors, glands) in tissue images has been developed. Excellent overviews of existing techniques can be found in several review papers.[49-55] Deep-learning-based analysis approaches have become popular, because deep-learning methods have been shown to outperform traditional image analysis methods in many application domains, including digital pathology. Our current tissue image analysis library consists of deep-learning methods developed by our group to classify patterns of TILs,[56,57] segment tumor regions, classify tumor subtypes,[58,59] and segment nuclei in whole slide images (WSIs) of hematoxylin and eosin-stained tissue samples.[60,61] We should note that the analysis functionality is not limited to methods implemented by our group. We have started with these methods because (1) they are based on state-of-the-art convolutional neural network architectures, such as VGG16,[62] Inception V4,[63] ResNet,[64] and U-Net,[65] (2) they have achieved high accuracy scores, and (3) they have been previously used, refined, and validated in generating large, curated Pathomics datasets. For example, the TIL models were developed in close collaboration with pathologists, who generated a large set of training data, evaluated analysis results, and helped refine the models. The final models were employed to produce and publish a TIL dataset from 5202 WSIs from 13 cancer types.[56,57] The nucleus segmentation model was developed following a similar approach, with one difference. 
In addition to manually annotated segmentations, a synthetic data generation method, based on generative adversarial networks,[66] was used to significantly increase the diversity and size of training data.[60] The model trained with the combined manual and synthetic training data was used to generate a quality-controlled dataset of 5 billion segmented nuclei in 5060 WSIs from 10 cancer types[61] in The Cancer Genome Atlas (TCGA) repository. We plan to expand the suite of analysis methods and incorporate state-of-the-art methods developed by other groups over time. Indeed, at the time of writing this manuscript, we are in the process of integrating and validating Hover-Net[67] in the framework for segmentation and classification of nuclei. The current suite of TIL analysis models can resolve TIL distributions in a WSI at the level of 50 × 50 µm² patches. The characteristics of tumor regions and the relationship between tumor regions and lymphocyte cells can be used to determine cancer stage and evaluate response to treatment. Our current models can segment tumor regions in lung, prostate, pancreatic, and breast cancer types and can classify tumor and non-tumor regions at the level of 88 × 88 µm² patches. The model for prostate cancer can segment and label a tumor subregion with one of three Gleason-based categories: Benign, Grade 3, and Grade 4+5. The lung tumor segmentation model is able to segment and label a tumor subregion with one of six categories: acinar, benign, lepidic, micropapillary, mucinous, and solid. Nucleus segmentation is one of the core digital pathology analysis steps. The shape and texture properties and spatial distributions of nuclei in tissue specimens are used in cancer diagnosis and staging. Our nucleus segmentation model can detect nuclei and delineate their boundaries in WSIs. After a WSI has been processed by the segmentation model, we compute a set of shape, intensity, and texture features. 
We use the PyRadiomics library[68] to compute the patch-level features.
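
Because the patch sizes above are stated in micrometers, the corresponding size in pixels depends on the scan resolution. The sketch below illustrates that arithmetic only. The resolution of 0.25 microns per pixel, the slide dimensions, the use of non-overlapping patches, and the helper names `patch_size_px` and `patch_grid` are assumptions made for the example, not values or code taken from the pipelines described here.

```python
import math


def patch_size_px(patch_um, mpp):
    """Side length in pixels of a square patch that is patch_um micrometers
    wide, for a scan resolution of mpp microns per pixel."""
    return round(patch_um / mpp)


def patch_grid(width_px, height_px, patch_um, mpp):
    """Number of (columns, rows) of non-overlapping patches needed to cover
    a slide of the given pixel dimensions (partial edge patches included)."""
    side = patch_size_px(patch_um, mpp)
    return math.ceil(width_px / side), math.ceil(height_px / side)


# Assumed scan resolution: 0.25 microns per pixel (hypothetical).
til_side = patch_size_px(50, 0.25)    # TIL patches, 50 x 50 micrometers
tumor_side = patch_size_px(88, 0.25)  # tumor patches, 88 x 88 micrometers
# Assumed slide size of 100,000 x 80,000 pixels (hypothetical).
cols, rows = patch_grid(100_000, 80_000, 50, 0.25)
```

Under these assumptions a single slide yields a grid of several hundred patches per side, which gives a sense of why patch-level results are summarized as heatmaps rather than inspected one patch at a time.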

Management, visualization, and review of Pathomics features

Our data analysis workflow implements an iterative train-predict-review-refine process to curate robust Pathomics features. This process is based on our earlier work in curating large Pathomics datasets[57,59,61] and is carried out as part of the training and prediction phases of the deep-learning analysis pipelines. We developed a set of tools to enable the iterative process and to provide support for the management, indexing, and interactive viewing of WSIs and analysis results. The tools are implemented as a set of web-based applications and services in the PRISM and QuIP software platforms.[46,47] Using these tools, pathologists can inspect the output of a tumor or TIL analysis pipeline as full-resolution heatmap overlays on WSIs. A heatmap is a spatial representation of prediction probabilities assigned to individual image patches by the deep-learning model; the probability value indicates how likely a patch is to be class-positive (e.g., TIL-positive, tumor-positive). Figure 3 shows example heatmaps generated from the TIL (upper figure) and tumor (lower figure) analysis pipelines. Nuclear segmentation results can be viewed as polygons, which represent the boundaries of segmented nuclei as overlays on the images in QuIP [Figure 4].
Figure 3

TIL and tumor analysis results displayed as a heatmap on the whole slide tissue image. TIL analysis results on the left and the tumor segmentation results on the right. The red color indicates a higher probability of a patch being TIL-positive (or tumor-positive) and the blue color indicates a lower probability

Figure 4

Segmented nuclei overlaid as polygons shown in blue on the WSI. Each polygon represents the boundary of a segmented nucleus

Figure 5 shows how the iterative process is executed with QuIP. For example, after a set of WSIs is processed by the TIL and tumor segmentation models, the source WSIs and the heatmaps are loaded into QuIP for management and visualization. The heatmaps and WSIs are also transformed into feature maps. Feature maps are lower resolution representations of the heatmaps and WSIs in a four-panel image. Figure 6 illustrates an example feature map which combines TIL results from a VGG16 model and tumor segmentation results from a ResNet model. The upper left corner of the image is the low-resolution tissue image, the upper right corner is the tumor segmentation map, the lower left corner represents the TIL map, and the lower right corner is the combined and thresholded TIL and tumor maps. Feature maps allow a pathologist to review results more efficiently than examining full-resolution images and maps. If the pathologist sees potential problems with the results during this review, they use the web applications in QuIP to visualize the WSIs and heatmaps at higher resolutions. If the review necessitates refinements to the model, additional training data are generated and added to the training dataset: the pathologist can annotate regions in an image using web-based visualization and annotation tools, and patches extracted from these annotations are reviewed and labeled to create additional training data. The model is refined by re-training the method with the updated training dataset.
Figure 5

The iterative workflow starts with a set of patches which are extracted from whole slide tissue images and labeled for initial model training. Predictions from the trained model are reviewed as feature maps and heatmaps. The heatmaps are annotated to generate additional labeled patches which are added to the training dataset. The deep learning network is retrained with the updated training dataset to refine the model

Figure 6

A feature map representation of TIL and tumor analysis results generated from a WSI in the Cancer Genome Atlas repository. The low-resolution version of the input WSI is displayed in the upper left corner. The upper right corner is the tumor segmentation map. The TIL map is displayed in the lower left corner. The lower right corner is the combined and thresholded TIL and tumor maps
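
The combined and thresholded panel of a feature map can be thought of as a per-patch label built from the two probability maps. The sketch below is one plausible way to do this, offered only as an illustration: the threshold of 0.5, the label encoding, the toy probability grids, and the helpers `combine_maps` and `til_fraction_in_tumor` are assumptions and are not taken from QuIP or the published pipelines.

```python
def combine_maps(til_prob, tumor_prob, til_thr=0.5, tumor_thr=0.5):
    """Threshold two patch-level probability grids and merge them.

    Label encoding (hypothetical): 0 = neither, 1 = tumor only,
    2 = TIL only, 3 = both tumor-positive and TIL-positive.
    """
    combined = []
    for til_row, tumor_row in zip(til_prob, tumor_prob):
        row = []
        for til_p, tumor_p in zip(til_row, tumor_row):
            label = (1 if tumor_p >= tumor_thr else 0) + (2 if til_p >= til_thr else 0)
            row.append(label)
        combined.append(row)
    return combined


def til_fraction_in_tumor(combined):
    """Fraction of tumor-positive patches that are also TIL-positive."""
    labels = [label for row in combined for label in row]
    tumor_patches = [label for label in labels if label in (1, 3)]
    if not tumor_patches:
        return 0.0
    return sum(1 for label in tumor_patches if label == 3) / len(tumor_patches)


# Toy 2 x 2 probability grids (made-up values).
til = [[0.9, 0.1], [0.6, 0.2]]
tumor = [[0.8, 0.7], [0.1, 0.3]]
combined = combine_maps(til, tumor)
fraction = til_fraction_in_tumor(combined)
```

A single summary number such as the fraction computed here is a simplification; the four-panel layout exists precisely so that a reviewer can see where TIL-positive and tumor-positive patches coincide.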


RESULTS

The current implementation of the framework—the curation and analysis workflows, analysis methods, and informatics infrastructure—has been successfully deployed. The workflows and analytic methods have received IRB approval at all collaborating institutions. The framework has been employed to create a repository of diagnostic images from 772 prostate cases, 1410 NSCLC cases, 70 breast cancer cases, and 48 lymphoma cases from the New Jersey State Cancer Registry and from 198 breast cancer cases from the Georgia State Cancer Registry. The repository also contains results from TIL and tumor segmentation for each image and more than 2.5 billion segmented nuclei from all of the images. For each image, there are two TIL analysis results (one generated from the VGG16 network and the other from the Inception V4 network). The images and Pathomics data are managed by an instance of QuIP running at Stony Brook for interactive visualization of images and Pathomics features. All of the results and images are also stored in Box folders to facilitate bulk data downloads.

DISCUSSION AND CONCLUSIONS

Evaluation of cancer control interventions in prevention, screening, and treatment and their effects on population trends in incidence and mortality hinges on accurate, reproducible, and nuanced pathology characterizations. Diagnostic and treatment guidelines also specify detailed measurements of TILs, nuclear grade (i.e., evaluation of the size and shape of the nucleus in the tumor cells), mitoses, and IHC staining, which are currently not included in cancer registry data abstraction. Presently, the SEER Pathology workflow, depicted in Figure 7, begins with normal registry abstracts and electronic pathology (e-Path) reports securely transmitted to the SEER registries. Although scoring and staging data are captured and made available through the registries, numerous studies have shown a high level of inter-observer variability among the diagnostic classifications rendered by pathologists, which can potentially give rise to biases when conducting population-wide studies. As the diagnosis of cancer and the assessment of immune response to therapy are made through tissue studies, the integration of pathology imaging in SEER registries is critical to precisely classify tumors and predict tumor response to therapies.
Figure 7

Pathology image workflow. WSIs are de-identified and analyzed by deep-learning analysis pipelines deployed in containers. Image data are linked to the SEER Registry database to enhance it with quantitative imaging features (such as TIL distributions and tumor segmentations) extracted by deep-learning models. De-identified images and imaging features can then be used for data mining and research purposes

Whole slide tissue scanning technologies have advanced significantly over the past 20 years.[69] They are capable of imaging tissue specimens at high resolution in several minutes, and with advanced auto-focussing mechanisms and automated slide trays, they can process batches of tissue samples with little to no manual intervention. Several studies have evaluated the utility of imaged tissue data in pathology workflows.[70-75] The Food and Drug Administration has approved a number of digital pathology systems for diagnostic use.[76] We expect that digital pathology will be employed increasingly as part of routine pathology workflows at hospitals and medical research centers. As institutions adopt digital WSIs into their pathology workflows, we can envision that the images and molecular reports will also be securely transmitted to the SEER registries. Within the SEER registry, images will be automatically processed by the suite of feature extraction pipelines appropriate for the type of cancer. The SEER database will be enhanced with quantitative features and the accompanying pipeline distribution version. SEER*DMS will be used to link and integrate cancer abstracts, e-Path reports, WSIs, and Pathomics feature sets from all reporting facilities. De-identified images and annotations will then be extracted for data mining and research use. Our work on building a repository of curated WSIs and Pathomics features is an important step toward realizing this capability. 
Availability of tissue images and Pathomics datasets will also provide an invaluable resource for medical education and pathology training and will facilitate multi-disciplinary approaches, improved quality control, and more efficient remote and collaborative access to tissue information.[77,78] The first phase of our project focussed on the collection of cases and correlated pathology specimens from the archives of the New Jersey State Cancer Registry and Rutgers Cancer Institute of New Jersey and on targeted prostate and NSCLC cases. To date, we have established a repository of (1) high-quality digitized pathology images for subjects whose data are already being routinely collected by the collaborating registries and (2) Pathomics features consisting of patterns of TILs, tumor region segmentations and classifications, and segmented nuclei. We have completed the initial linkages with registry data, thus enabling the creation of information-rich population cohorts containing objective imaging and clinical attributes that can be mined. As part of the second phase of the effort, we have increased the number of contributing state registries to include Georgia, Kentucky, and New York and we have simultaneously expanded the scope of cancers under study by including melanoma, breast, and colorectal cancers. We will also build upon our team’s previous research efforts to design, develop, and optimize algorithms and methods that can quickly and reliably search through a growing reference library of cases to automatically identify and retrieve previously analyzed lesions which exhibit the most similar characteristics to a given query case for clinical decision support[20-22,25,79-86] and to conduct more granular comparisons of tumors within and across patient populations. 
One of the potential advantages of this approach over purely alphanumeric search strategies is that it will enable investigators to systematically interrogate the data while visualizing the most relevant digitized pathology specimens.[32,33] As part of the next phase of our project, we plan to evaluate the full range of automated algorithms and methods for their capacity to enable clinicians and investigators to quickly and reliably answer questions such as: (a) What level of morphological variation is detected among a given set of tumors or specimens? (b) What changes in computational biomarker signatures occur at onset and key stages of disease progression? (c) What is the likely prognosis for a given patient population?
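The idea of retrieving previously analyzed cases that most closely resemble a query case can be illustrated with a minimal nearest-neighbor sketch in Python. This is not the retrieval method used in the project, which is not specified in this section. It assumes that each case has already been reduced to a fixed-length numeric feature vector and that Euclidean distance is an acceptable similarity measure. The function names and the example case identifiers are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_similar(query, library, k=3):
    """Return the k library cases closest to the query, nearest first.

    library maps a case identifier to its feature vector. The result is a
    list of (case_id, distance) pairs.
    """
    scored = ((euclidean(query, vec), case_id) for case_id, vec in library.items())
    return [(case_id, dist) for dist, case_id in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    # Toy two-dimensional feature vectors, for illustration only.
    reference_library = {
        "case_A": [0.10, 0.90],
        "case_B": [0.80, 0.20],
        "case_C": [0.15, 0.85],
    }
    for case_id, dist in retrieve_similar([0.10, 0.90], reference_library, k=2):
        print(case_id, round(dist, 3))
```

In practice, features measured on different scales would need to be normalized before distances are compared, and a linear scan such as this one would likely be replaced by an indexed search as the reference library grows.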

Software availability

The QuIP software and analysis methods are available as open-source code for use by other research groups. The QuIP software platform can be downloaded and built from https://github.com/SBU-BMI/quip_distro. The code for the analysis methods can be accessed from links at https://github.com/SBU-BMI/histopathology_analysis.

Financial support and sponsorship

This work is supported, in part, by UG3CA225021, UH3CA225021, U24 CA215109, U24CA180924-01A1, and 5UL1TR003017 grants from the National Institutes of Health and generous contributions to Stony Brook from Bob Beals and Betsy Barton. Additional support was provided through funding from the U.S. Department of Veterans Affairs – Boston Healthcare System through contract IPA-RU-092920. This work leveraged resources from XSEDE, which is supported by NSF grant ACI-1548562, including the Bridges system (NSF ACI-1445606) at the Pittsburgh Supercomputing Center. Services, results, and/or products in support of the research were generated by Rutgers Cancer Institute of New Jersey Biomedical Informatics Shared Resource NCI-CCSG P30CA072720-5917.

Conflicts of interest

There are no conflicts of interest.