Identification and characterisation of interstitial lung disease using CT can be difficult. For the general radiologist, such cases can be infrequent, and the various types of interstitial lung disease and their radiological correlates might be unfamiliar, especially given that diagnostic criteria change over time. However, radiological evaluation of interstitial lung disease and further characterisation of pulmonary fibrosis can be difficult even for the subspecialist radiologist. There are multiple reasons for this, but a crucial factor is the confident detection of one specific pattern of abnormality: honeycombing. The presence of honeycombing is the key distinction between a typical usual interstitial pneumonia CT pattern and a probable usual interstitial pneumonia CT pattern in the current Fleischner diagnostic criteria for idiopathic pulmonary fibrosis, and between usual interstitial pneumonia and possible usual interstitial pneumonia in the older 2011 American Thoracic Society (ATS)/European Respiratory Society (ERS)/Japanese Respiratory Society/Latin American Thoracic Association guidelines. However, honeycombing is a source of substantial interobserver disagreement in the evaluation of CT studies in patients with suspected interstitial lung disease, and studies have found only fair to moderate interobserver agreement for overall CT classification of pulmonary fibrosis.5, 6

In The Lancet Respiratory Medicine, Simon Walsh and colleagues used a deep learning algorithm to classify patients with idiopathic pulmonary fibrosis on CT. Deep learning is a subset of machine learning, which, in turn, is part of the broader concept of artificial intelligence. Machine learning allows computers to extract patterns from appropriately classified (labelled) input data (typically images for radiology) and generate similar labels for newly presented unknown data.
However, instead of being taught specific rules, the computer identifies relationships between input and output on its own. A simple example would be the use of machine learning to classify photographs of pets as either a dog or a cat. The algorithm is first trained with expert-labelled images (the training set, often with a held-out validation subset used for tuning) and then tested using new images whose labels are withheld from it (the testing set). Deep learning algorithms use an artificial neural network of layered connections to evaluate and process data from the raw input to the desired output classification. Early neural networks had few layers (typically fewer than five), but deep learning algorithms can have large numbers of layers, many of them specialised to augment or diminish specific input features.9, 10 Machine learning techniques have been used to address an increasing variety of medical questions over the past few years, and imaging applications are especially common, given the richness and complexity of imaging data.

In the study by Walsh and colleagues, the algorithm developed for the classification of pulmonary fibrosis was assessed using a cohort of 150 high-resolution CT studies obtained from patients with fibrotic lung disease. This cohort had previously been evaluated and classified by 91 subspecialist radiologists using the 2011 ATS/ERS guidelines. With the majority opinion as the reference standard, the algorithm's accuracy (73·3%) was slightly greater than the median accuracy of the radiologists (70·7% [IQR 65·3–74·7]), and the algorithm outperformed 60 (66%) of the 91 radiologists.
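The evaluation just described, in which each reader and the algorithm are scored against the majority opinion, can be illustrated with a minimal sketch. The reader names, the four toy scans, and all labels below are hypothetical and are not data from the study; the way ties in the vote are resolved is also an assumption, since the text does not say how they were handled.

```python
from collections import Counter
from statistics import median

def majority_label(votes):
    """Most common label among the readers' votes for one scan.

    Assumption: ties go to the label that appears first in the vote list.
    """
    return Counter(votes).most_common(1)[0][0]

def accuracy(predicted, reference):
    """Fraction of scans on which predicted labels match the reference."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

# Hypothetical toy data: three readers, four scans, 2011 guideline categories.
reader_labels = {
    "reader_a": ["UIP", "possible UIP", "inconsistent with UIP", "UIP"],
    "reader_b": ["UIP", "UIP", "inconsistent with UIP", "possible UIP"],
    "reader_c": ["UIP", "possible UIP", "inconsistent with UIP", "UIP"],
}
algorithm_labels = ["UIP", "possible UIP", "UIP", "UIP"]

n_scans = len(algorithm_labels)

# Reference standard: the majority opinion of the readers for each scan.
reference = [
    majority_label([labels[i] for labels in reader_labels.values()])
    for i in range(n_scans)
]

# Score the algorithm and every reader against the same reference.
algorithm_accuracy = accuracy(algorithm_labels, reference)
reader_accuracies = {
    name: accuracy(labels, reference) for name, labels in reader_labels.items()
}
median_reader_accuracy = median(reader_accuracies.values())
outperformed = sum(acc < algorithm_accuracy for acc in reader_accuracies.values())

print(algorithm_accuracy, median_reader_accuracy, outperformed)
```

Because each reader contributes to the majority opinion, readers are scored against a reference they helped to define, whereas the algorithm is not. This is worth keeping in mind when comparing the two.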
The algorithm provided prognostic discrimination between usual interstitial pneumonia and non-usual interstitial pneumonia diagnoses (hazard ratio 2·88, 95% CI 1·79–4·61, p<0·0001) similar to that of the majority opinion of the thoracic radiologists (2·74, 1·67–4·48, p<0·0001).

Although the results show that deep learning methods can classify fibrotic lung disease with essentially equivalent performance to subspecialist radiologists, there are several limitations. In general, the performance of a deep learning algorithm improves with the number of images used for training and with the accuracy of the expert labelling. First, the training set for this study consisted of only 929 scans, which is very small compared with the datasets used for most machine learning tasks. This issue was partly addressed by subdivision of each of the studies into roughly 500 individual four-image sets (so-called montages), each of which was analysed separately. Second, only a single radiologist did the expert labelling of the training set, which could introduce that reader's bias into the algorithm training. Furthermore, as the authors discuss, there is not necessarily an independent gold standard for the diagnosis of usual interstitial pneumonia. Despite these limitations, the overall performance of the algorithm was remarkable.

A strength of deep learning algorithms is their ability to evaluate diverse data types to identify relationships. It would make sense that the addition of non-radiological data (laboratory and pulmonary function test data, findings on clinical examination, and demographic data) might further strengthen our ability to make predictions regarding outcomes in patients with interstitial lung disease, such as response to specific therapies and mortality, which could substantially improve patient management. From this standpoint, the study represents a great initial step.
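The subdivision of each study into roughly 500 four-image montages, mentioned among the limitations above, can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how slices were chosen, so the sketch assumes random sampling of unique four-slice combinations. The function name `sample_montages`, its parameters, and the 120-slice study are hypothetical.

```python
import random
from math import comb

def sample_montages(n_slices, n_montages=500, slices_per_montage=4, seed=0):
    """Draw unique combinations of slice indices from one CT study.

    Assumption: slices are picked uniformly at random without replacement
    within a montage. The study's actual sampling scheme may differ.
    """
    rng = random.Random(seed)
    # Never ask for more montages than there are distinct combinations.
    target = min(n_montages, comb(n_slices, slices_per_montage))
    montages = set()
    while len(montages) < target:
        picked = rng.sample(range(n_slices), slices_per_montage)
        montages.add(tuple(sorted(picked)))
    return sorted(montages)

# Hypothetical study with 120 axial slices.
montages = sample_montages(n_slices=120)
print(len(montages), montages[0])

# With 929 training scans and roughly 500 montages per scan, the number
# of training examples is on the order of 929 * 500.
print(929 * 500)
```

Resampling in this way multiplies the number of training examples, but montages drawn from the same scan are not independent, so the 929 scans remain the effective limit on patient-level diversity.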
Ganesh Raghu; Harold R Collard; Jim J Egan; Fernando J Martinez; Juergen Behr; Kevin K Brown; Thomas V Colby; Jean-François Cordier; Kevin R Flaherty; Joseph A Lasky; David A Lynch; Jay H Ryu; Jeffrey J Swigris; Athol U Wells; Julio Ancochea; Demosthenes Bouros; Carlos Carvalho; Ulrich Costabel; Masahito Ebina; David M Hansell; Takeshi Johkoh; Dong Soon Kim; Talmadge E King; Yasuhiro Kondoh; Jeffrey Myers; Nestor L Müller; Andrew G Nicholson; Luca Richeldi; Moisés Selman; Rosalind F Dudden; Barbara S Griss; Shandra L Protzko; Holger J Schünemann. Am J Respir Crit Care Med. 2011-03-15.
Morgan P McBee; Omer A Awan; Andrew T Colucci; Comeron W Ghobadi; Nadja Kadom; Akash P Kansagra; Srini Tridandapani; William F Auffermann. Acad Radiol. 2018-03-30.