Thomas E Tavolara1, M K K Niazi2, Adam C Gower3, Melanie Ginese4, Gillian Beamer4, Metin N Gurcan1. 1. Center for Biomedical Informatics, Wake Forest School of Medicine, 486 Patterson Avenue, Winston-Salem, NC 27101, United States. 2. Center for Biomedical Informatics, Wake Forest School of Medicine, 486 Patterson Avenue, Winston-Salem, NC 27101, United States. Electronic address: mniazi@wakehealth.edu. 3. Department of Medicine, Boston University School of Medicine, 72 E. Concord St Evans Building, Boston, MA 02118, United States. 4. Department of Infectious Disease and Global Health, Tufts University Cummings School of Veterinary Medicine, 200 Westboro Rd., North Grafton, MA 01536, United States.
Abstract
BACKGROUND: Machine learning sustains successful application to many diagnostic and prognostic problems in computational histopathology. Yet, few efforts have been made to model gene expression from histopathology. This study proposes a methodology which predicts selected gene expression values (microarray) from haematoxylin and eosin whole-slide images as an intermediate data modality to identify fulminant-like pulmonary tuberculosis ('supersusceptible') in an experimentally infected cohort of Diversity Outbred mice (n=77). METHODS: Gradient-boosted trees were utilized as a novel feature selector to identify gene transcripts predictive of fulminant-like pulmonary tuberculosis. A novel attention-based multiple instance learning model for regression was used to predict selected genes' expression from whole-slide images. Gene expression predictions were shown to be sufficiently replicated to identify supersusceptible mice using gradient-boosted trees trained on ground truth gene expression data. FINDINGS: The model was accurate, showing high positive correlations with ground truth gene expression on both cross-validation (n = 77, 0.63 ≤ ρ ≤ 0.84) and external testing sets (n = 33, 0.65 ≤ ρ ≤ 0.84). The sensitivity and specificity for gene expression predictions to identify supersusceptible mice (n=77) were 0.88 and 0.95, respectively, and for an external set of mice (n=33) 0.88 and 0.93, respectively. IMPLICATIONS: Our methodology maps histopathology to gene expression with sufficient accuracy to predict a clinical outcome. The proposed methodology exemplifies a computational template for gene expression panels, in which relatively inexpensive and widely available tissue histopathology may be mapped to specific genes' expression to serve as a diagnostic or prognostic tool. FUNDING: National Institutes of Health and American Lung Association.
BACKGROUND: Machine learning sustains successful application to many diagnostic and prognostic problems in computational histopathology. Yet, few efforts have been made to model gene expression from histopathology. This study proposes a methodology which predicts selected gene expression values (microarray) from haematoxylin and eosin whole-slide images as an intermediate data modality to identify fulminant-like pulmonary tuberculosis ('supersusceptible') in an experimentally infected cohort of Diversity Outbred mice (n=77). METHODS: Gradient-boosted trees were utilized as a novel feature selector to identify gene transcripts predictive of fulminant-like pulmonary tuberculosis. A novel attention-based multiple instance learning model for regression was used to predict selected genes' expression from whole-slide images. Gene expression predictions were shown to be sufficiently replicated to identify supersusceptible mice using gradient-boosted trees trained on ground truth gene expression data. FINDINGS: The model was accurate, showing high positive correlations with ground truth gene expression on both cross-validation (n = 77, 0.63 ≤ ρ ≤ 0.84) and external testing sets (n = 33, 0.65 ≤ ρ ≤ 0.84). The sensitivity and specificity for gene expression predictions to identify supersusceptible mice (n=77) were 0.88 and 0.95, respectively, and for an external set of mice (n=33) 0.88 and 0.93, respectively. IMPLICATIONS: Our methodology maps histopathology to gene expression with sufficient accuracy to predict a clinical outcome. The proposed methodology exemplifies a computational template for gene expression panels, in which relatively inexpensive and widely available tissue histopathology may be mapped to specific genes' expression to serve as a diagnostic or prognostic tool. FUNDING: National Institutes of Health and American Lung Association.
Authors: Alena Arlova; Chengcheng Jin; Abigail Wong-Rolle; Eric S Chen; Curtis Lisle; G Thomas Brown; Nathan Lay; Peter L Choyke; Baris Turkbey; Stephanie Harmon; Chen Zhao Journal: J Pathol Inform Date: 2022-01-20