Mohamed Amgad, Elisabeth Specht Stovgaard, Eva Balslev, Jeppe Thagaard, Weijie Chen, Sarah Dudgeon, Ashish Sharma, Jennifer K Kerner, Carsten Denkert, Yinyin Yuan, Khalid AbdulJabbar, Stephan Wienert, Peter Savas, Leonie Voorwerk, Andrew H Beck, Anant Madabhushi, Johan Hartman, Manu M Sebastian, Hugo M Horlings, Jan Hudeček, Francesco Ciompi, David A Moore, Rajendra Singh, Elvire Roblin, Marcelo Luiz Balancin, Marie-Christine Mathieu, Jochen K Lennerz, Pawan Kirtani, I-Chun Chen, Jeremy P Braybrooke, Giancarlo Pruneri, Sandra Demaria, Sylvia Adams, Stuart J Schnitt, Sunil R Lakhani, Federico Rojo, Laura Comerma, Sunil S Badve, Mehrnoush Khojasteh, W Fraser Symmans, Christos Sotiriou, Paula Gonzalez-Ericsson, Katherine L Pogue-Geile, Rim S Kim, David L Rimm, Giuseppe Viale, Stephen M Hewitt, John M S Bartlett, Frédérique Penault-Llorca, Shom Goel, Huang-Chun Lien, Sibylle Loibl, Zuzana Kos, Sherene Loi, Matthew G Hanna, Stefan Michiels, Marleen Kok, Torsten O Nielsen, Alexander J Lazar, Zsuzsanna Bago-Horvath, Loes F S Kooreman, Jeroen A W M van der Laak, Joel Saltz, Brandon D Gallas, Uday Kurkure, Michael Barnes, Roberto Salgado, Lee A D Cooper.
Abstract
Assessment of tumor-infiltrating lymphocytes (TILs) is increasingly recognized as an integral part of the prognostic workflow in triple-negative (TNBC) and HER2-positive breast cancer, as well as many other solid tumors. This recognition has come about thanks to standardized visual reporting guidelines, which helped to reduce inter-reader variability. Now, there are ripe opportunities to employ computational methods that extract spatio-morphologic predictive features, enabling computer-aided diagnostics. We detail the benefits of computational TIL assessment and the readiness of TIL scoring for computational assessment, and we outline considerations for overcoming key barriers to clinical translation in this arena. Specifically, we discuss: 1. ensuring computational workflows closely capture visual guidelines and standards; 2. challenges and thoughts on standards for the assessment of algorithms, including training, preanalytical, analytical, and clinical validation; 3. perspectives on how to realize the potential of machine learning models and to overcome the perceptual and practical limits of visual scoring.
Keywords: Breast cancer; Cancer imaging; Prognostic markers; Tumour biomarkers; Tumour immunology
Year: 2020 PMID: 32411818 PMCID: PMC7217824 DOI: 10.1038/s41523-020-0154-2
Source DB: PubMed Journal: NPJ Breast Cancer ISSN: 2374-4677
Sample CTA algorithms from the published literature.
| Stain | Approach | Ref | Data set | Method | Ground truth | Notes |
|---|---|---|---|---|---|---|
| H&E | Patch classification | [ | Multiple sites; TCGA data set | CNN | Labeled patches (yes/no TILs); annotations are open-access | Strengths: large-scale study with investigation of spatial TIL maps. AV includes molecular correlates. Limitations: does not distinguish sTIL and iTIL; does not classify individual TILs*. Other: we defined CTA TIL score as fraction of patches that contain TILs, and found this to be correlated with VTA ( |
| H&E | Semantic segmentation | [ | Breast; TCGA data set | FCN | Traced region boundaries (exhaustive); annotations are open-access | Strengths: large sample size and regions; investigates inter-rater variability at different experience levels; delineation of tumor, stroma and necrosis regions. Limitations: only detects dense TIL infiltrates*; does not classify individual TILs*. |
| H&E | Semantic segmentation + object detection | [ | Breast; private data set | Seeding + FCN | Traced region boundaries (exhaustive); labeled and segmented nuclei within labeled regions | Strengths: mostly follows TIL-WG VTA guidelines. AV includes correlation with consensus VTA scores and inter-pathologist variability. Limitations: heavy ground truth requirement*; underpowered CV; limited manually annotated slides. |
| H&E | Object detection | [ | Breast; METABRIC data set | SVM using morphology features | Labeled nuclei; qualitative density scores | Strengths: robust analysis and exploration of molecular TIL correlates. Limitations: limited number of individually labeled nuclei; does not distinguish TILs in different histologic regions*. |
| H&E | Object detection | [ | Breast; private data set | RG and MRF | Labeled patches (low/medium/high density) | Strengths: explainable model and modular pipeline. Limitations: does not distinguish sTIL and iTIL; does not classify individual TILs; limited AV sample size. |
| H&E | Object detection | [ | NSCLC; private data sets | Watershed + SVM classifier | Labeled nuclei | Strengths: explainable model; robust CV; captures spatial TIL clustering. Limitations: limited AV; does not distinguish sTIL and iTIL. |
| H&E | Object detection + inferred TIL localization | [ | Breast; METABRIC + private data sets | SVM classifier using morphology features | Labeled nuclei; qualitative density scores | Strengths: infers TIL localization using spatial localization; robust CV; investigation of spatial TIL patterns. Limitations: limited number of individually labeled nuclei; not clear whether spatial clusters correspond 1:1 to histologic regions. |
| IHC | Object detection + manual regions | [ | Colon; private data set | Complex pipeline (non-DL) | Overall density estimates | Strengths: CTA within manual regions, including invasive margin. Limitations: unpublished AV. |
| IHC | Object detection | [ | Multiple; private data set | Multiple DL pipelines | Labeled nuclei within FOV (exhaustive) | Strengths: large-scale, robust AV; systematic benchmarking. Limitations: no CV; does not distinguish TILs in different regions*. |
This non-exhaustive list has been restricted to H&E and chromogenic IHC, although excellent works exist showing CTA based on other approaches like multiplexed immunofluorescence [21–23]. Published CTA algorithms vary markedly in their approach to TIL scoring, the robustness of their validation, their interpretability, and their consistency with published VTA guidelines. The strengths and limitations of each publication are highlighted; general limitations (related to the broad approach used, not the specific paper) are marked with an asterisk (*). Going forward, nuanced approaches are needed, ideally incorporating workflows for robust quantification and validation as presented in this paper. Different approaches have different ground truth requirements (illustrated in Fig. 1, panel f), hence the need for large-scale ground truth data sets. We encourage all future CTA publications to make their data sets open access whenever possible. Of note are two major efforts: 1. a group of scientists, including the US FDA and the TIL-WG, is collaborating to crowdsource pathologists and collect images and pathologist annotations that can be qualified through the FDA Medical Device Development Tools program; 2. the TIL-WG is organizing a challenge to validate CTA algorithms against clinical trial outcome data (CV).
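AV analytical validation, CNN convolutional neural network, CTA computational TIL assessment, CV clinical validation, DL deep learning, FCN fully convolutional network, FOV field of view, iTIL intratumoral TIL score, MRF Markov random field, NSCLC non-small cell lung cancer, RG region growing, sTIL stromal TIL score, SVM support vector machine, VTA visual TIL assessment.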
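The table notes one patch-classification variant in which the CTA TIL score is the fraction of patches that contain TILs, and which was checked for correlation with VTA. A minimal Python sketch of that idea follows. It assumes per-patch yes/no TIL calls already produced by a CNN, and it uses Spearman rank correlation as one possible agreement measure (an illustrative choice, not necessarily the statistic used in any cited study). The slide data and function names are hypothetical.

```python
from typing import Sequence


def patch_fraction_score(patch_has_tils: Sequence[bool]) -> float:
    """CTA TIL score: fraction of a slide's tissue patches classified as containing TILs."""
    if len(patch_has_tils) == 0:
        raise ValueError("slide has no tissue patches")
    return sum(patch_has_tils) / len(patch_has_tils)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean = (len(rx) + 1) / 2  # mean of ranks 1..n; tie averaging preserves it
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sx = sum((a - mean) ** 2 for a in rx) ** 0.5
    sy = sum((b - mean) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


# Hypothetical per-patch CNN calls and pathologist (VTA) sTIL scores for three slides.
patches = {
    "slide_A": [True, False, False, False],
    "slide_B": [True, True, False, False],
    "slide_C": [True, True, True, False],
}
vta = {"slide_A": 10.0, "slide_B": 30.0, "slide_C": 60.0}

slides = sorted(patches)
cta = [patch_fraction_score(patches[s]) for s in slides]
rho = spearman_rho(cta, [vta[s] for s in slides])
print(cta, round(rho, 3))
```

A rank correlation is used here because the patch fraction and a visual percentage are on different scales; a high correlation shows that slides are ordered consistently, not that the two scores are interchangeable.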
Fig. 1 Outline of the visual (VTA) and computational (CTA) procedure for scoring TILs in breast carcinomas.
TIL scoring is a complex procedure, and breast carcinomas are used as an example. Specific guidelines for scoring different tumors are provided in the references. Steps involved in VTA and/or CTA are tagged with these abbreviations. CTA according to TIL-WG guidelines involves TIL scoring in different tissue compartments. a Invasive edge is determined (red) and key confounding regions like necrosis (yellow) are delineated. b Within the central tumor, tumor-associated stroma is determined (green). Other considerations and steps are involved depending on histologic subtype, slide quality, and clinical context. c Determination of regions for inclusion or exclusion in the analysis, in accordance with published guidelines. d Final score is estimated (visually) or calculated (computationally). In breast carcinomas, stromal TIL score (sTIL) is used clinically. Intratumoral TIL score (iTIL) is subject to more VTA variability, which has hampered the generation of evidence demonstrating prognostic value; perhaps CTA of iTILs will prove less variable and, consequently, prognostic. e The necessity of diverse pathologist annotations for robust analytical validation of computational models. Desmoplastic stroma may be misclassified as tumor regions; vacuolated tumor may be misclassified as stroma; intermixed normal acini or ducts, DCIS/LCIS, and blood vessels may be misclassified as tumor; plasma cells are sometimes misclassified as carcinoma cells. Note that while the term “TILs” includes lymphocytes, plasma cells and other small mononuclear infiltrates, lumping these categories may not be optimal from an algorithm design perspective; plasma cells tend to be morphologically different from lymphocytes in nuclear texture, size, and visible cytoplasm. f Various computational approaches may be used for scoring.
The more granular the algorithm, the more accurate and useful it is likely to be but, as a trade-off, the more it relies on exhaustive manual annotations from pathologists. The least granular approach is patch classification, followed by region delineation (segmentation), then object detection (individual TILs). A robust computational scoring algorithm likely utilizes a combination of these (and related) approaches.
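The compartment-wise scoring in panels a to d, combined with the region-plus-detection strategy suggested in panel f, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a published algorithm. It assumes a hypothetical per-pixel region map from a segmentation model (tumor, tumor-associated stroma, and excluded regions such as necrosis) and a co-registered per-pixel nucleus-class map from a detection model. It approximates sTIL as the percentage of stromal area covered by TIL nuclei, in the spirit of the area-based TIL-WG definition (percentage of stromal area occupied by mononuclear inflammatory cells), and it computes iTIL analogously within tumor regions, which simplifies the visual iTIL definition. All label codes and names are illustrative.

```python
import numpy as np

# Hypothetical region codes produced by a semantic segmentation model.
TUMOR, STROMA, EXCLUDED = 1, 2, 3  # EXCLUDED: e.g. necrosis, DCIS/LCIS, normal ducts
# Hypothetical nucleus classes produced by a detection/segmentation model (0 = no nucleus).
LYMPHOCYTE, PLASMA_CELL, CARCINOMA_CELL = 1, 2, 3
# Lymphocytes and plasma cells stay separate classes in the model; they are pooled only here.
TIL_CLASSES = (LYMPHOCYTE, PLASMA_CELL)


def compartment_til_scores(region_map: np.ndarray, nucleus_map: np.ndarray) -> dict[str, float]:
    """Percentage of stromal (sTIL) and tumor (iTIL) area covered by TIL nuclei."""
    if region_map.shape != nucleus_map.shape:
        raise ValueError("maps must be co-registered at the same resolution")
    til = np.isin(nucleus_map, TIL_CLASSES)
    scores = {}
    for name, code in (("sTIL", STROMA), ("iTIL", TUMOR)):
        compartment = region_map == code
        area = np.count_nonzero(compartment)
        covered = np.count_nonzero(til & compartment)
        # Excluded regions never enter either the numerator or the denominator.
        scores[name] = 100.0 * covered / area if area else float("nan")
    return scores


# Toy 10x10 tile: left half tumor, right half stroma, top row necrosis (excluded).
region = np.full((10, 10), STROMA)
region[:, :5] = TUMOR
region[0, :] = EXCLUDED
nuclei = np.zeros((10, 10), dtype=int)
nuclei[1:, :5] = CARCINOMA_CELL  # tumor nests
nuclei[1:4, 5:8] = LYMPHOCYTE    # 9 TIL pixels in stroma
nuclei[1, 0:3] = PLASMA_CELL     # 3 TIL pixels inside tumor nests
nuclei[0, :] = LYMPHOCYTE        # TILs in the excluded necrotic row are ignored
scores = compartment_til_scores(region, nuclei)
print(scores)
```

Keeping lymphocytes and plasma cells as separate model classes and pooling them only through `TIL_CLASSES` reflects the caption's point that lumping these categories may not be optimal for algorithm design. A density-style variant (TILs per unit stromal area) would count detected nuclei instead of pixels.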
Fig. 2 Conceptual pathology report for computational TIL assessment (CTA).
CTA reports might include global TIL estimates, broken down by key histologic regions, and estimates of classifier confidence. CTA reports are inseparably linked to WSI viewing systems, where the algorithmic segmentations and localizations supporting the calculated scores are displayed so that the attending pathologist can verify them. Other elements, like local TIL estimates, TIL clustering results, and survival predictions, may also be included.
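The report elements described for Fig. 2 (global estimates, per-region breakdown, classifier confidence, optional extras) can be captured in a small data structure. The Python sketch below is hypothetical: the class, its fields, and the 0.8 confidence cut-off used to prompt pathologist review are illustrative assumptions, not part of the proposed report.

```python
from dataclasses import dataclass, field


@dataclass
class CTAReport:
    """Hypothetical container for the elements of a conceptual CTA report."""

    slide_id: str
    global_scores: dict[str, float]               # e.g. {"sTIL": 20.0}, in percent
    region_scores: dict[str, dict[str, float]]    # per histologic region
    confidence: float                             # classifier confidence in [0, 1]
    extras: dict[str, str] = field(default_factory=dict)  # e.g. clustering, survival

    def needs_review(self, threshold: float = 0.8) -> bool:
        """Flag low-confidence analyses; the threshold is an arbitrary placeholder."""
        return self.confidence < threshold

    def render(self) -> str:
        """Plain-text summary to display alongside the WSI viewer overlays."""
        lines = [f"CTA report for {self.slide_id}"]
        for name, value in self.global_scores.items():
            lines.append(f"  {name} (global): {value:.1f}%")
        for region, scores in self.region_scores.items():
            for name, value in scores.items():
                lines.append(f"  {name} ({region}): {value:.1f}%")
        lines.append(f"  Classifier confidence: {self.confidence:.2f}")
        for key, value in self.extras.items():
            lines.append(f"  {key}: {value}")
        if self.needs_review():
            lines.append("  Low confidence: verify segmentations in the WSI viewer")
        return "\n".join(lines)


report = CTAReport(
    slide_id="slide_A",
    global_scores={"sTIL": 20.0},
    region_scores={"invasive edge": {"sTIL": 35.0}, "central tumor": {"sTIL": 12.5}},
    confidence=0.72,
)
text = report.render()
print(text)
```

Keeping scores and confidence in one structure makes it straightforward to surface low-confidence cases for the attending pathologist, who confirms them against the displayed segmentations rather than accepting the numbers at face value.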