Gael Dournes1,2,3, Chase S Hall4,3, Matthew M Willmering5, Alan S Brody5, Julie Macey2, Stephanie Bui6, Baudouin Denis de Senneville7, Patrick Berger8,2, François Laurent8,2, Ilyes Benlala8,2, Jason C Woods5,9.
1. Université de Bordeaux, INSERM, Centre de Recherche Cardio-Thoracique de Bordeaux, U1045, CIC 1401, Bordeaux, France. gael.dournes@chu-bordeaux.fr
2. CHU Bordeaux, Service d'Imagerie Thoracique et Cardiovasculaire, Service des Maladies Respiratoires, Service d'Exploration Fonctionnelle Respiratoire, CIC 1401, Pessac, France.
3. These two authors contributed equally to this work.
4. Division of Pulmonary, Critical Care and Sleep Medicine, Dept of Internal Medicine, University of Kansas School of Medicine, Kansas City, KS, USA.
5. Center for Pulmonary Imaging Research, Division of Pulmonary Medicine and Dept of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
6. Bordeaux University Hospital, Hôpital Pellegrin-Enfants, Paediatric Cystic Fibrosis Reference Center (CRCM), CIC 1401, Bordeaux, France.
7. Université de Bordeaux, Mathematical Institute of Bordeaux (IMB), UMR CNRS 5251, Talence, France.
8. Université de Bordeaux, INSERM, Centre de Recherche Cardio-Thoracique de Bordeaux, U1045, CIC 1401, Bordeaux, France.
9. Dept of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA.
Abstract
BACKGROUND: Chest computed tomography (CT) remains the imaging standard for demonstrating cystic fibrosis (CF) airway structural disease in vivo. However, visual scoring systems used as an outcome measure are time-consuming, require training and lack high reproducibility. Our objective was to validate a fully automated artificial intelligence (AI)-driven scoring system of CF lung disease severity.
METHODS: Data were retrospectively collected in three CF reference centres, between 2008 and 2020, in 184 patients aged 4-54 years. An algorithm using three 2D convolutional neural networks was trained with 78 patients' CT scans (23 530 CT slices) for the semantic labelling of bronchiectasis, peribronchial thickening, bronchial mucus, bronchiolar mucus and collapse/consolidation. 36 patients' CT scans (11 435 CT slices) were used for testing versus ground-truth labels. The method's clinical validity was assessed in an independent group of 70 patients with or without lumacaftor/ivacaftor treatment (n=10 and n=60, respectively) with repeat examinations. Similarity and reproducibility were assessed using the Dice coefficient, correlations using the Spearman test, and paired comparisons using the Wilcoxon rank test.
RESULTS: The overall pixelwise similarity of AI-driven versus ground-truth labels was good (Dice 0.71). All AI-driven volumetric quantifications had moderate to very good correlations with a visual imaging score (p<0.001) and fair to good correlations with forced expiratory volume in 1 s % predicted at pulmonary function tests (p<0.001). Significant decreases in peribronchial thickening (p=0.005), bronchial mucus (p=0.005) and bronchiolar mucus (p=0.007) volumes were measured in patients treated with lumacaftor/ivacaftor. Conversely, bronchiectasis (p=0.002) and peribronchial thickening (p=0.008) volumes increased in patients not receiving lumacaftor/ivacaftor. The reproducibility was almost perfect (Dice >0.99).
CONCLUSION: AI allows fully automated volumetric quantification of CF-related modifications over an entire lung. The novel scoring system could provide a robust disease outcome in the era of effective CF transmembrane conductance regulator modulator therapy.
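For readers unfamiliar with the similarity metric named in METHODS, the Dice coefficient between a predicted label mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch in plain Python, not the authors' implementation: the function names `dice_coefficient` and `label_volume_ml`, the integer label codes, the flattened-mask representation and the voxel size are all assumptions made for illustration, and the abstract does not state how per-label results were pooled into the overall value.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for one label over two flattened masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    n_pred = sum(1 for p in pred if p == label)
    n_truth = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if n_pred + n_truth == 0:
        # Assumed convention: a label absent from both masks counts as full agreement.
        return 1.0
    return 2.0 * overlap / (n_pred + n_truth)


def label_volume_ml(mask: Sequence[int], label: int, voxel_volume_mm3: float) -> float:
    """Volume of one label: voxel count times voxel volume, converted from mm^3 to mL."""
    n_voxels = sum(1 for v in mask if v == label)
    return n_voxels * voxel_volume_mm3 / 1000.0


# Hypothetical label codes: 0 = background, 1 = bronchiectasis, 2 = bronchial mucus.
predicted = [0, 1, 1, 2, 2, 0]
reference = [0, 1, 0, 2, 2, 2]

print(dice_coefficient(predicted, reference, label=1))
print(dice_coefficient(predicted, reference, label=2))
print(label_volume_ml(predicted, label=2, voxel_volume_mm3=0.5))
```

In this toy example the bronchial mucus label overlaps on two pixels out of two predicted and three reference pixels, giving a Dice of 4/5. The same function applied to two runs of the algorithm on the same scan would serve as a reproducibility measure in the sense used in RESULTS.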