Implementation of artificial intelligence in the histological assessment of pulmonary subsolid nodules.

Jiajun Deng¹, Mengmeng Zhao¹, Qiuyuan Li¹, Yikai Zhang², Minjie Ma³, Chuanyi Li⁴, Jun Wang⁵, Yunlang She¹, Yan Jiang¹, Yunzeng Zhang⁶, Tingting Wang⁷, Chunyan Wu⁸, Likun Hou⁸, Sheng Zhong⁹, Shengxi Jin¹⁰, Dahong Qian⁵, Dong Xie¹, Yuming Zhu¹, Yasmeen K Tandon¹¹, Annemiek Snoeckx^12,13, Feng Jin¹⁴, Bentong Yu¹⁵, Guofang Zhao^16,17, Chang Chen^1,3,18.

Abstract

BACKGROUND: Clinical management of subsolid nodules (SSNs) is guided by the suspicion of tumor invasiveness. We sought to develop an artificial intelligence (AI) algorithm for invasiveness assessment of lung adenocarcinoma manifesting as radiological SSNs. We investigated the performance of this algorithm in classifying SSNs according to invasiveness.
METHODS: A retrospective chest computed tomography (CT) dataset of 1,589 SSNs was constructed to develop (85%) and internally test (15%) the proposed AI diagnostic tool, SSNet. Diagnostic performance was evaluated in the hold-out test set and was further tested in an external cohort of 102 SSNs. Three thoracic surgeons and three radiologists were required to evaluate the invasiveness of SSNs on both test datasets to investigate the clinical utility of the proposed SSNet.
RESULTS: In the differentiation of invasive adenocarcinoma (IA), SSNet achieved an area under the curve [AUC; 0.914, 95% confidence interval (CI): 0.813-0.987] similar to that of the 6 doctors (0.900, 95% CI: 0.867-0.922). When interpreting with the assistance of SSNet, the sensitivity of junior doctors, the specificity of a senior doctor, and their accuracy were significantly improved. In the external test, SSNet (AUC: 0.949, 95% CI: 0.884-1.000) achieved a better AUC than doctors (AUC: 0.883, 95% CI: 0.826-0.939), whose AUC increased (AUC: 0.908, 95% CI: 0.847-0.982) with SSNet assistance. In the histological subtype classifications, SSNet achieved better performance than practicing doctors. The AUCs of doctors were significantly improved with the assistance of SSNet in both 4-category and 3-category classifications to 0.836 (95% CI: 0.811-0.862) and 0.852 (95% CI: 0.825-0.882), respectively.
CONCLUSIONS: The AI diagnostic system achieved non-inferior performance to doctors, and will potentially improve diagnostic performance and efficiency in SSN evaluation. 2021 Translational Lung Cancer Research. All rights reserved.

Keywords: Artificial intelligence (AI); computed tomography (CT); lung adenocarcinoma; pulmonary subsolid nodules (SSNs)

Year: 2021 PMID: 35070762 PMCID: PMC8743520 DOI: 10.21037/tlcr-21-971

Source DB: PubMed Journal: Transl Lung Cancer Res ISSN: 2218-6751

Introduction

Reductions in mortality with low-dose computed tomography (CT) have been reported in a number of lung cancer screening trials (1-3). Consequently, lung cancer screening has been increasingly implemented over the past two decades. With the increased use of CT, pulmonary subsolid nodules (SSNs) are increasingly being detected. Imaging assessment of the invasiveness of SSNs is essential in the clinical management of patients. However, the histological prediction of SSNs, which have been reported to have a 9% detection rate in screening trials, poses several challenges (4,5). The degree of invasiveness is used as the basis for clinical management decisions. Lung adenocarcinoma appearing as an SSN can present with a variety of morphological and imaging features, which can be related to different degrees of invasiveness and prognosis. Reported evidence of high intra-observer and interobserver variability in the invasiveness classification of SSNs has highlighted concerns about undertreatment and overtreatment. Therefore, an accurate diagnostic system or assistant tool can have a beneficial clinical impact (6). To overcome these diagnostic challenges, a number of solutions for malignancy evaluation have been previously proposed (7-11), based on radiological density, morphological features, and clinical features. Risk-assessment tools based on clinical and radiological features have been used to determine cancer risk and standardize clinical management recommendations (8,12). Additionally, quantitative analyses that depend on accurate delineation of nodule borders and feature engineering have been carried out to evaluate malignancy (13-16). However, these methods rely on subjective interpretation or manual segmentation, so automated assessment remains an unsolved problem. Recently, advanced AI models have demonstrated specialist-level classification performance in medical image diagnosis (17-24).
AI models, which automatically learn to map representative features from medical image data to a specific task, have recently been introduced as a novel technique (25,26). The development of an accurate AI system could reduce the inconsistency among doctors with different expertise and provide management decision support. There is limited research on developing AI algorithms for classifying the invasiveness of pulmonary nodules (7,27,28). Attempts at assessing SSN invasiveness have been limited to binary classification or simple comparison with doctors (27,29). Previous studies constructed algorithms based on 2D images rather than 3D volumes, which has limited the performance of AI techniques. Evidence from external validation of AI systems for invasiveness classification is rare. Moreover, the clinical utility of AI-assisted diagnostic models needs to be investigated (6). In the present study, we aimed to elucidate the applicability and reliability of a 3D AI algorithm to assess the invasiveness of SSNs by comparing it against both the diagnostic performance of chest radiologists and thoracic surgeons and our previously developed feature-based radiomic signature (10). We investigated the practicality of our proposed AI algorithm by evaluating the improvement of prediction performance when the proposed method served as a second opinion. To further investigate the clinical utility, the proposed AI system was validated in an external cohort with chest radiologists and thoracic surgeons. To our knowledge, this is the first investigation of how AI assists doctors in SSN malignancy evaluation. We present the following article in accordance with the STARD reporting checklist (available at https://dx.doi.org/10.21037/tlcr-21-971).

Methods

Patient selection and study materials

Consecutive patients who underwent pulmonary resection for lung adenocarcinoma between January 2013 and December 2015 in Shanghai Pulmonary Hospital were retrospectively collected. Using descriptive terms including “subsolid nodule”, “part-solid nodule”, “non-solid nodule”, “mixed nodule”, “ground-glass nodule” or “ground glass opacity”, we retrieved the preoperative CT examinations of patients and 4,679 scans were confirmed. The CT scans were reviewed and SSNs were included under the following criteria: (I) the maximum diameter of lesion ≤3 cm on thin-section CT images (<1.5 mm) within 2 weeks prior to the surgery; (II) pulmonary nodules were histopathologically confirmed as atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), or invasive adenocarcinoma (IA) according to the lung tumor classification; (III) patients without a history of malignancy or surgery. For patients with multiple lesions, lesions without a corresponding confirmed pathological diagnosis were excluded. A total of 1,471 patients with 1,589 SSNs from Shanghai Pulmonary Hospital (Shanghai, China) were included in the present study. A total of 1,349 (85%) nodules from 1,262 patients comprised the development set, including a training subset (n=1,191, 75%) and an internal subset (n=158, 10%); 240 nodules (15%) from 209 patients comprised a hold-out test dataset, whose data were unseen during training. To independently test the diagnostic value of the proposed framework, an external test dataset of 100 patients with 102 SSNs was included according to the same criteria from Hwa Mei Hospital (Ningbo, China). The workflow of patient inclusion is shown in Figure S1. In the internal test dataset, 14 (5.8%) patients were diagnosed with AAH, 67 (27.9%) with AIS, 55 (22.9%) with MIA, and 104 (43.4%) with IA.
In the external test dataset, 5 (4.9%) patients were diagnosed with AAH, 25 (24.5%) with AIS, 24 (23.5%) with MIA, and 48 (47.1%) with IA. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). This retrospective study was approved by the Shanghai Pulmonary Hospital Institutional Review Board (No. L20-344). The need for informed consent was waived.
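The partition of the 1,589 nodules described above can be illustrated with a minimal sketch. It only reproduces the reported subset sizes; the helper name `split_nodules`, the random seed, and shuffling at the nodule level are assumptions, since the text does not state how the split was randomized or whether it was grouped by patient.

```python
import random

def split_nodules(n_total, n_val, n_test, seed=0):
    """Shuffle nodule indices and cut them into train/validation/test subsets.

    Hypothetical helper: the seed and the nodule-level shuffle are assumptions,
    not details taken from the study.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

# Subset sizes reported in the text: 1,191 / 158 / 240 of 1,589 nodules.
train, val, test = split_nodules(n_total=1589, n_val=158, n_test=240)
fractions = [len(subset) / 1589 for subset in (train, val, test)]
print(len(train), len(val), len(test), fractions)
```

The resulting fractions are close to the reported 75%, 10%, and 15%. When one patient contributes several nodules, a split grouped by patient may be preferable to avoid sharing a patient between subsets; whether this was done here is not stated.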

Data extraction and annotations

Chest CT images were acquired on two different scanners: Somatom Definition AS+ (Siemens Medical Systems, Germany, n=1,263) and iCT256 (Philips Medical Systems, Netherlands, n=308). All image data were reconstructed with a slice thickness of <1.5 mm (30) and a matrix of 512 × 512. All CT scans were downloaded from our picture archiving and communications system (PACS) as digital imaging and communications in medicine (DICOM) images. Personal information of patients in the CT images, including name, medical number, and hospital name, was removed, and the images were converted into NIfTI (NII) format using in-house software. The lung CT images in NII format were imported into 3D Slicer (version 4.8.0, Brigham and Women’s Hospital) for labelling. The region of interest (ROI) of each SSN was annotated with a bounding box enclosing the SSN by 2 junior thoracic surgery doctors (Y.S. and J.D. with 4 and 2 years of experience, respectively), and consensus on the ROI was then reached by discussion with an expert chest radiologist (J.S. with 28 years of experience). Bronchi and pulmonary vessels were excluded as far as possible from the ROI. The image data of the ROI were then extracted in the “rcsv” format for further analysis. Each segmented ROI was labelled with its histologic subtype (AAH, AIS, MIA, or IA). The histologic slides and results were reviewed separately by two experienced pathologists (L.H. and C.W.) using hematoxylin and eosin slides of the study cases in the absence of any clinical or radiologic information. These histologic labels were reported in accordance with the updated classification of lung cancer (31).
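As an illustration of the bounding-box ROI described above, the following sketch computes the index ranges of a cuboid around a nodule center inside a CT volume. The function name `cuboid_slices`, the box size, and the example coordinates are hypothetical; the study's own extraction was done in 3D Slicer and its exact parameters are not given here.

```python
def cuboid_slices(center, size, shape):
    """Return one slice per axis for a box of `size` voxels around `center`.

    The box is shifted inward where necessary so that it stays inside a
    volume of the given `shape`. All names and values are illustrative.
    """
    slices = []
    for c, s, n in zip(center, size, shape):
        start = max(c - s // 2, 0)          # do not start before the volume
        start = min(start, max(n - s, 0))   # do not run past the far border
        stop = min(start + s, n)
        slices.append(slice(start, stop))
    return tuple(slices)

# Hypothetical nodule near the first slices of a 120 x 512 x 512 volume.
roi = cuboid_slices(center=(10, 250, 300), size=(32, 64, 64), shape=(120, 512, 512))
print(roi)
```

The returned tuple can index a 3D array directly (for example `volume[roi]` with a NumPy array) to obtain the cropped ROI.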

Study design

As illustrated in Figure 1, an AI algorithm for invasiveness assessment, SSNet, was constructed for the histological classification of lung adenocarcinoma appearing as pulmonary SSNs on CT scans (details of algorithm development are shown in Appendix 1). SSNet was developed and tested by using a retrospective dataset of patients with available CT images and corresponding histological diagnoses (31). The diagnostic performance of SSNet was evaluated by comparing it with that of our previously reported radiomic feature-based signature (10) (Appendix 1) and that of three chest radiologists and three thoracic surgeons with experience ranging from junior to senior (clinical experience of three to more than 20 years). The 6 doctors were asked to evaluate the SSNs again with the prediction of SSNet to investigate the clinical utility based on performance improvements. Both the diagnostic accuracy and clinical utility for SSNs were further examined in an external test cohort.

Figure 1

Flowchart of the study design. Artificial intelligence diagnostic tool, SSNet, was first developed and validated using retrospective datasets, then evaluated in an external dataset for its clinical utility. SSNs, subsolid nodules. ROI, region of interest.

Clinical interpretation of CT images

All included cases were reviewed independently in a blinded fashion by six doctors ranging from junior to senior level in thoracic surgery and imaging (Y.S. and T.W. at junior level with less than 5 years of experience, J.M. and J.S. at intermediate level with 5–15 years of experience, and D.X. and X.S. at senior level with more than 25 years of experience in thoracic imaging). All readers were asked to subjectively assign each SSN to one of the four histological subtypes; the predicted labels were then regrouped according to the different tasks for evaluation. To evaluate the increments of diagnostic performance assisted by the SSNet, the six doctors were asked to re-evaluate the SSNs at least 4 weeks after the first evaluation with the predictions of SSNet as a second opinion. Inter-observer variability and diagnostic performance were compared against those from the first evaluation. Clinical interpretation was done without time constraint in RadiAnt Viewer (version 4.6.9, Medixant, https://www.radiantviewer.com). Raters were free to adjust the display window setting and use the electronic caliper provided in the software.
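The regrouping of four-subtype labels into the coarser tasks can be sketched as below. The dictionary and function names are hypothetical; the groupings follow the tasks named in the text (IA versus non-IA, and pre-invasive versus minimally invasive versus invasive), with AAH and AIS treated as the pre-invasive group.

```python
# Hypothetical mappings from the four histological subtypes to the coarser tasks.
THREE_CATEGORY = {"AAH": "pre-invasive", "AIS": "pre-invasive", "MIA": "MIA", "IA": "IA"}
BINARY = {"AAH": "non-IA", "AIS": "non-IA", "MIA": "non-IA", "IA": "IA"}

def regroup(labels, mapping):
    """Map each four-category label to its group for a given task."""
    return [mapping[label] for label in labels]

# Illustrative reader output for five nodules (not study data).
reader_labels = ["AIS", "MIA", "IA", "AAH", "IA"]
three_class = regroup(reader_labels, THREE_CATEGORY)
two_class = regroup(reader_labels, BINARY)
print(three_class)
print(two_class)
```

Collecting one four-category answer per nodule and deriving the other tasks from it means each reader only has to read every case once per session.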

Performance testing and statistical analysis

In the discrimination of IA from non-IA, including AAH, AIS, and MIA, the diagnostic performance of SSNet was compared to practicing doctors and radiomic signature using area under receiver-operating characteristic (ROC) curve (AUC) metric (Appendix 1). Comparisons of the diagnostic performance between SSNet and the practicing doctors were also done in 3- and 4-category classifications in a similar manner. Statistically significant differences in AUCs were assessed with Bonferroni-corrected confidence intervals (CIs; 1–0.05/n). Interobserver variability in participant level was evaluated by kappa concordance index. Performance metrics of sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of each method were measured. Area under the precision-recall curve (AUPRC) and the F1 score were used to evaluate the multiple-category classification performance and reported as the macro average and micro average. Performance-evaluation metrics of practicing doctors were reported in group level and participant level, respectively. McNemar’s test was used to compare the statistical difference of sensitivity, specificity, and accuracy between the performance of SSNet and that of practicing doctors, as well as between the performance of practicing doctors with and without the interpretation assistance of SSNet in the binary classification task. Statistical analyses were performed in MedCalc (version 15.2.0; Mariakerke, Belgium), SPSS (version 23.0; IBM, Armonk, NY, USA), and R software (version 3.6.2; https://www.r-project.org/). P<0.05 was considered statistically significant.
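Two of the calculations named above can be sketched in a few lines: the Bonferroni-corrected confidence level (1 − 0.05/n) and McNemar's test on paired readings. This is a minimal illustration, not the MedCalc/SPSS/R procedure used in the study; the exact binomial form of McNemar's test is an assumption (the text does not say whether the exact or chi-square version was used), and the discordant counts in the example are invented.

```python
from math import comb

def bonferroni_ci_level(n_comparisons, alpha=0.05):
    """Confidence level 1 - alpha/n used for a Bonferroni-corrected interval."""
    return 1 - alpha / n_comparisons

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value from the two discordant-pair counts.

    b: cases classified correctly only by method 1
    c: cases classified correctly only by method 2
    """
    n = b + c
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as extreme as k.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

levels = {n: bonferroni_ci_level(n) for n in (2, 3, 4)}
p_value = mcnemar_exact(b=1, c=15)  # hypothetical discordant counts
print(levels, p_value)
```

For n = 2, 3, and 4 the levels are 0.975, about 0.983, and 0.9875, in line with the 97.5%, 98.3%, and 98.8% intervals quoted for Table 1.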

Results

Baseline information

The internal dataset consisted of 1,589 SSNs from 1,471 patients (median age: 57 years, range, 23–82 years) and the external test cohort included 102 SSNs from 100 patients (median age: 56 years, range, 28–75 years). The distribution of histological subtypes was similar between the 2 test datasets. There was no significant difference in age and sex between the 2 cohorts (Table S1).

Diagnostic performance in invasive classification

In the differentiation of IA from minimally invasive/pre-invasive lesions, the ROC curves for the 3 approaches are illustrated in Figure 2, and comparisons of AUCs are reported in Table 1. The SSNet algorithm (AUC: 0.914, 95% CI: 0.813–0.987) performed as well as practicing doctors (AUC: 0.900, 95% CI: 0.867–0.922). The radiomic signature, with an AUC of 0.845 (95% CI: 0.806–0.883), was significantly inferior to the practicing doctors.

Figure 2

ROC curves showing the diagnostic performance in binary (A,D), 3-category (B,E), and 4-category (C,F) classifications. (A-C) ROC curves measure performance at the methodology level, including practicing doctors with and without SSNet serving as a second viewer. (D-F) ROC curves measure performance at the participant level of practicing doctors, indicating the performance improvement with the assistance of SSNet. ROC, receiver-operating characteristic; AUC, area under ROC curve.

Table 1

Diagnostic performance and clinical utility in the internal and external test

Tasks	AUC	95% CI	Difference (Bonferroni-corrected CI)	Advantage
Internal test
Two classifications
SSNet	0.914	0.813–0.987	–	–
Human (unassisted)	0.900	0.867–0.922	–0.014 (–0.090 to 0.060)*	No difference
Human (assisted)	0.937	0.911–0.970	0.037 (–0.078 to –0.014)^†	Human (assisted)
Radiomics	0.845	0.806–0.883	0.067 (–0.034 to 0.145)*	No difference
Radiomics	0.845	0.806–0.883	0.071 (0.032 to 0.110)^‡	Human (unassisted)
Three classifications
SSNet	0.874	0.832–0.909	–	–
Human (unassisted)	0.844	0.816–0.864	–0.030 (0.000 to 0.087)*	SSNet
Human (assisted)	0.852	0.825–0.882	0.008 (–0.015 to 0.042)^†	No difference
Four classifications
SSNet	0.869	0.824–0.892	–	–
Human (unassisted)	0.835	0.817–0.862	–0.034 (0.012 to 0.098)*	SSNet
Human (assisted)	0.836	0.811–0.862	0.001 (–0.030 to 0.036)^†	Human (assisted)
External test
Two classifications
SSNet	0.949	0.884–1.000	–	–
Human (unassisted)	0.883	0.826–0.939	–0.066 (0.037 to 0.212)*	SSNet
Human (assisted)	0.908	0.847–0.982	0.025 (–0.092 to 0.029)^†	Human (assisted)

*, AUC difference was calculated as the AUC of the algorithm minus the AUC of the doctors (unassisted) or the AUC of radiomics. †, AUC difference was calculated as the AUC of the doctors (assisted) minus the AUC of the doctors (unassisted). ‡, AUC difference was calculated as the AUC of the doctors (unassisted) minus the AUC of the radiomics. To account for multiple hypothesis testing, the Bonferroni corrected CI (1−0.05/n, 97.5% for 2 classifications; 98.3% for 3 classifications; 98.8% for 4 classifications) around the difference was computed. AUC, area under the receiver-operating characteristic curve; CI, confidence interval.

In the differentiation of pre-invasive, minimally invasive, and invasive lesions, the ROC curves for SSNet and practicing doctors are illustrated in Figure 2, and comparisons of AUCs are reported in Table 1. The SSNet algorithm (AUC: 0.874, 95% CI: 0.832–0.909) performed better than practicing doctors (AUC: 0.844, 95% CI: 0.816–0.864). In the differentiation of all 4 histological subtypes of lung adenocarcinoma, the ROC curves for SSNet and practicing doctors are illustrated in Figure 2, and comparisons of AUCs are reported in Table 1. The SSNet algorithm (AUC: 0.869, 95% CI: 0.824–0.892) performed better than practicing doctors (AUC: 0.835, 95% CI: 0.817–0.862).

Performance of SSNet in assisting readers

When doctors read with the assistance of SSNet, their AUCs were 0.937, 0.852, and 0.836 for the binary (IA versus non-IA), 3-category (pre-invasive, minimally invasive, and invasive lesions), and 4-category (AAH, AIS, MIA, and IA) classifications, respectively (Figure 2; Table 1). Compared with the diagnostic performance of subjective evaluation only, increments of AUCs were 0.037, 0.008, and 0.001, respectively, and those in multiple-category classification were statistically significant (Table 1). Specifically, in the differentiation of IA, the sensitivity of 1 of the junior doctors improved from 0.750 to 0.885 (P=0.004), and the specificity of 1 of the senior doctors increased from 0.897 to 0.949 (P=0.039). In the multiple-category classification, improvements in multiple statistics were observed in 3 practicing doctors (Tables S2,S3). Improvements were seen in the evaluation metrics in all classes for 3-category classification and were more frequent in classes of lower-grade invasiveness in the 4-category classification. Kappa statistics improved from 0.480 to 0.496 for 4-category classification but decreased from 0.601 to 0.596 for 3-category classification.
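The kappa values are reported as Fleiss' kappa for the 6 readers. A minimal sketch of that statistic is given below, using the standard textbook formula rather than the study's SPSS/R code; the function name `fleiss_kappa` and the toy rating counts are illustrative and are not study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to category j;
    every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall category proportions.
    total = n_subjects * n_raters
    proportions = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_expected = sum(p * p for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: 4 nodules, 6 readers, columns = (IA, non-IA).
toy_counts = [[6, 0], [0, 6], [5, 1], [2, 4]]
kappa = fleiss_kappa(toy_counts)
print(kappa)
```

The sketch does not handle the degenerate case in which every rating falls into a single category, where the chance-agreement term equals 1 and the ratio is undefined.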

Performance evaluation in details

In discriminating invasive lesions, performance-evaluation results, including sensitivity, specificity, and accuracy, are shown in Figure 3 and Table 2. At the approach level, SSNet achieved a sensitivity of 0.933, which was higher than the micro-average sensitivity of practicing doctors (0.846) and that of the radiomic signature (0.885). SSNet accuracy was 0.921, which was higher than the micro-average accuracy of practicing doctors (0.919) and that of the radiomic signature (0.866). For the 3-category classification, SSNet had better performance for most of the evaluation metrics when compared with practicing doctors as a group and individually (Figure 3; Tables S2,S4). For the 4-category classification, SSNet demonstrated better performance for all evaluation metrics when compared with group or individual performance of practicing doctors (Figure 3; Tables S3,S4). SSNet maintained better performance in terms of micro-average AUPRC than junior doctors and 1 of the mid-career doctors. A mid-career and a senior doctor achieved a higher micro-average AUPRC than SSNet.
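The micro-average sensitivity of the six readers can be understood as a pooled rate: true positives summed over readers, divided by the total number of IA readings. The sketch below back-calculates per-reader true-positive counts from the unassisted sensitivities in Table 2 and the 104 IA nodules of the internal test set; these counts are reconstructed by rounding and are an assumption, not figures reported by the authors.

```python
def pooled_rate(hits, totals):
    """Micro-average: pool the counts over readers before dividing."""
    return sum(hits) / sum(totals)

# Unassisted sensitivities of the six readers (Table 2, internal test).
reported_sensitivity = [0.750, 0.875, 0.702, 0.894, 0.933, 0.923]
n_ia = 104  # IA nodules in the internal test set

# Reconstructed true-positive counts (assumption: nearest integer).
true_positives = [round(s * n_ia) for s in reported_sensitivity]
micro_sensitivity = pooled_rate(true_positives, [n_ia] * len(true_positives))
print(true_positives, round(micro_sensitivity, 3))
```

Because every reader evaluated the same nodules, the pooled rate coincides with the plain mean of the six sensitivities, and it reproduces the reported micro-average of 0.846.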

Figure 3

Box graph demonstrating the evaluation metrics in binary (A), 3-category (B-D), and 4-category (E-H) classifications. 1, performance of SSNet; 2, performance of previously constructed radiomic signature; 3–8, performance of practicing doctors without artificial intelligence interpretation; and 9–14, performance of practicing doctors with artificial intelligence interpretation. NPV, negative predictive value; PPV, positive predictive value.

Table 2

Comparison of SSNet, radiomic signature, and practicing doctors to differentiate invasive adenocarcinoma in the internal and external test

Performance metrics	SSNet	Radiomics	Unassisted: Junior 1	Unassisted: Junior 2	Unassisted: Middle 1	Unassisted: Middle 2	Unassisted: Senior 1	Unassisted: Senior 2	Unassisted: Micro average	Assisted: Junior 1	Assisted: Junior 2	Assisted: Middle 1	Assisted: Middle 2	Assisted: Senior 1	Assisted: Senior 2	Assisted: Micro average
Retrospective
Sensitivity	0.933	0.885	0.750	0.875	0.702	0.894	0.933	0.923	0.846	0.885	0.885	0.798	0.731	1.000	0.769	0.845
McNemar’s test			<0.001*	0.146*	<0.001*	0.424*	1.000*	1.000*		0.004^†	1.000^†	0.052^†	0.001^†	0.016^†	0.002^†
Specificity	0.794	0.673	0.860	0.831	0.934	0.794	0.816	0.897	0.855	0.816	0.831	0.912	0.941	0.772	0.949	0.870
McNemar’s test			0.049*	0.302*	<0.001*	1.000*	0.250*	0.003*		0.146^†	1.000^†	0.508^†	<0.001^†	0.180^†	0.039^†
Accuracy	0.921	0.866	0.897	0.919	0.909	0.912	0.929	0.952	0.919	0.916	0.921	0.926	0.919	0.931	0.931	0.880
McNemar’s test			<0.001*	0.054*	<0.001*	0.596*	0.250*	0.012*		0.001^†	1.000^†	0.031^†	<0.001^†	0.007^†	<0.001^†
Kappa^‡			0.718							0.701
Prospective
Sensitivity	0.958		0.708	0.854	0.625	0.875	0.958	0.896	0.819	0.875	0.813	0.729	0.667	1.000	0.729	0.802
McNemar’s test			<0.001*	0.063*	<0.001*	0.289*	1.000*	0.375*		0.039^†	0.688^†	0.227^†	0.006^†	0.500^†	0.039^†
Specificity	0.796		0.815	0.852	0.907	0.759	0.815	0.833	0.830	0.759	0.778	0.870	0.944	0.778	0.944	0.846
McNemar’s test			1.000*	0.453*	0.031*	0.727*	1.000*	0.688*		0.453^†	0.289^†	0.727^†	0.006^†	0.727^†	0.031^†
Accuracy	0.932		0.867	0.921	0.873	0.897	0.938	0.926	0.904	0.897	0.885	0.891	0.897	0.938	0.915	0.904
McNemar’s test			0.004*	0.039*	0.001*	0.804*	1.000*	0.227*		0.019^†	0.791^†	0.167^†	<0.001^†	0.344^†	0.001^†
Kappa^‡			0.632							0.649

1, 2 represent doctors 1 and 2. *, McNemar’s test P value for comparison of evaluation metrics between SSNet and practicing doctors alone; †, McNemar’s test P value for comparison of evaluation metrics between practicing doctors with and without the assistance of SSNet; ‡, Kappa value was calculated as Fleiss’ kappa for the 6 readers.

External test for diagnostic performance

The primary outcome was evaluated for 102 SSNs. SSNet demonstrated excellent diagnostic performance, with an AUC of 0.949 (95% CI: 0.884–1.000), which was better than that of the practicing doctors (AUC: 0.883, 95% CI: 0.826–0.939) (Figure 4; Table 1). The sensitivity and accuracy of SSNet in differentiating IA were 0.958 and 0.932, respectively, which were significantly higher than the micro-average sensitivity (0.819) and accuracy (0.904) of practicing doctors as a group. In the evaluation of clinical utility, the AUC of practicing doctors was significantly improved from 0.883 to 0.908 with the assistance of SSNet (Figure 4; Table 1). Sensitivity and specificity micro averages of practicing doctors also improved. At the participant level, the accuracy of a junior doctor significantly improved from 0.867 to 0.897 (Table 2). The kappa statistic also improved from 0.632 to 0.649.

Figure 4

ROC curves showing the diagnostic performance (A) for invasive adenocarcinoma discrimination in prospective validation by SSNet and practicing doctors (B). ROC, receiver-operating characteristic; AUC, area under ROC curve.

Discussion

Clinical management differs between invasive and pre-invasive SSNs. Therefore, a risk-prediction tool that distinguishes invasive lesions from pre-invasive ones is important. In the present study, we demonstrated that a simple 3D AI diagnostic tool, SSNet, based on CT images enabled the differentiation of IA from pre-invasive/minimally invasive lesions and histological subtype classification in lesions that appeared as SSNs (<3 cm) on chest CT. Performance was similar between SSNet, the radiomic signature, and doctors in binary discrimination on the internal test, but SSNet was better than doctors on the external test and in the classification of more than two categories. In addition, the use of SSNet enhanced doctors’ SSN interpretations. Evaluation by the radiomic signature, which was previously developed for discriminating IAs, did not reach its optimal performance in our internal test. A possible reason for the performance discrepancy is the spectrum effect (25,32). The population used for the construction of the radiomic signature had a higher proportion of invasive lesions (>50%), while the rate of IA was relatively low in the present study (<50%) (10). A similar performance drop was also seen in another validation experiment using a similar population (33). Previous models developed to differentiate invasive lesions by CT features had an AUC of 0.64–0.91 (7,10,34-38). However, these models have not been validated in an external cohort. The performance of SSN evaluation models based on predefined features is limited by the subjectivity of and proficiency in CT interpretation, whereas an AI-based evaluation model is able to learn representative features from raw medical images without radiological features being specified. Recently, Wang et al. proposed an AI system using a 3D convolutional neural network for differentiating pre-invasive lesions from IA appearing as SSNs no larger than 3 cm (29).
In their study, the proposed architecture, with an AUC of 0.892, outperformed 4 radiologists, who yielded AUCs between 0.805 and 0.867. However, the model was not designed for further discrimination of specific lung adenocarcinoma histological subtypes, and it was not fully investigated or externally validated for its clinical utility. In our study, the 3D SSNet utilized the volumetric data of thin-section CT from 1,471 patients and the proposed AI system achieved a competitive AUC of 0.914 in terms of differentiating IA from pre-invasive and minimally invasive lesions compared with doctors. In addition, the external diagnostic evaluation found that SSNet outperformed practicing doctors in IA discrimination. Concerns remain regarding the actual help of an AI-based evaluation system in clinical practice. To date, few studies have investigated the benefits of an AI system in assessing invasiveness (7,20,39,40). In the present study, we investigated the improvement with the assistance of AI interpretation as a reference. Based on our results, the performance of practicing doctors improved with the assistance of SSNet in invasive discrimination and multiple-category invasiveness assessments. Although the AUPRC, a metric evaluating a classifier’s performance on imbalanced data, of SSNet was lower than that of some of the practicing doctors in terms of multiple-category classification, the accuracy of doctors improved with SSNet. These findings support the role of the AI system as a second viewer that would increase the AUPRC in diagnosing cases that doctors might misinterpret or miss. In the present study, SSNet achieved better performance than practicing doctors in identifying lesions with a lower grade of invasiveness, which was shown in the 3- and 4-category classifications. The F1 scores for the AAH and MIA classes were relatively low owing to the small number of samples, which could limit the performance of the AI system.
It is recommended that patients with AAH are routinely followed up or undergo resection after comprehensive evaluation. MIA is defined as a small, predominantly lepidic nodule with ≤5 mm of invasion. Lim et al. inferred that invasion ≤5 mm might not contribute greatly to the emergence of increased attenuation on CT scans (41). Therefore, MIA appearing as SSNs can easily be misclassified as AIS or IA by doctors and the AI system (Figure S2). In the present study, SSNet never misclassified AAH as IA or IA as AAH (Figure S2). SSNet achieved a higher AUPRC in differentiating histological subtypes than junior doctors in multiple-category classification. Although the AUPRC of SSNet was not higher than those of intermediate and senior doctors, the reduced time in interpretation of SSNs and the improved diagnostic performance would streamline the workflow and reduce subjectivity bias. Additionally, several underestimated predictions can be corrected by the AI system, as radiological features indicating malignancy could be absent or not fully identified (Figure S2). This is also supported by the highest sensitivity achieved by SSNet and by the confusion matrices. The SSN evaluation process is simple. Only a cuboid box that fully encloses the SSN, centered on its center of mass, is required for the invasiveness evaluation. Therefore, the incorporation of SSNet into the workflow would improve efficiency and accuracy in identifying SSNs and reduce the workload of radiologists. The present study has several limitations. First, the data of this study only reflect this particular study population and cannot be extrapolated to a screening setting. Second, although the proposed AI algorithm showed discriminative ability in classifying the invasiveness of SSNs, the image features that dominate the model's decision-making were not revealed because of the black-box nature of deep learning algorithms.
Further explainable deep learning models with good performance are necessary to improve transparency and reliability for humans. Additionally, the generalizability of our proposed method should be confirmed with more external and prospective validation. In the present study, the test dataset was split prior to model development and treated as a hold-out dataset to evaluate diagnostic performance. An external test cohort was used for validation. It should be acknowledged that although geographically external validation was applied, the SSN interpretation was done in a non-clinical environment and would not change clinical decisions made for the included patients. There is a potential decision threshold shift that could bias the actual clinical utility of the AI diagnostic tool. Therefore, further prospective studies of clinical decision making with long-term follow-up are needed to confirm the clinical utility of this diagnostic AI system and the value of such a system, designed for clinical management decision support, in improving patient outcomes. Multi-institutional randomized controlled trials are critical to test the benefit of incorporating AI into the workflow. In conclusion, the proposed SSNet is helpful in evaluating SSNs and can be used to assess invasiveness. The implementation of SSNet in practice has the potential to improve patient care workflow and optimize clinical decision support. The safety and feasibility of AI-assisted tools in supporting clinical decisions for SSNs warrant evaluation in long-term, multi-institutional trials.

40 in total

1. Computed Tomography-Based Score Indicative of Lung Cancer Aggression (SILA) Predicts the Degree of Histologic Tissue Invasion and Patient Survival in Lung Adenocarcinoma Spectrum.

Authors: Cyril Varghese; Srinivasan Rajagopalan; Ronald A Karwoski; Brian J Bartholmai; Fabien Maldonado; Jennifer M Boland; Tobias Peikert
Journal: J Thorac Oncol Date: 2019-05-04 Impact factor: 15.609

2. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.

Authors: Varun Gulshan; Lily Peng; Marc Coram; Martin C Stumpe; Derek Wu; Arunachalam Narayanaswamy; Subhashini Venugopalan; Kasumi Widner; Tom Madams; Jorge Cuadros; Ramasamy Kim; Rajiv Raman; Philip C Nelson; Jessica L Mega; Dale R Webster
Journal: JAMA Date: 2016-12-13 Impact factor: 56.272

3. Subsolid Lung Nodules: Potential for Overdiagnosis.

Authors: Hans-Ulrich Kauczor; Oyunbileg von Stackelberg
Journal: Radiology Date: 2019-09-17 Impact factor: 11.105

4. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests.

Authors: D F Ransohoff; A R Feinstein
Journal: N Engl J Med Date: 1978-10-26 Impact factor: 91.245

5. Analysis of CT morphologic features and attenuation for differentiating among transient lesions, atypical adenomatous hyperplasia, adenocarcinoma in situ, minimally invasive and invasive adenocarcinoma presenting as pure ground-glass nodules.

Authors: Lin Qi; Ke Xue; Cheng Li; Wenjie He; Dingbiao Mao; Li Xiao; Yanqing Hua; Ming Li
Journal: Sci Rep Date: 2019-10-10 Impact factor: 4.379

6. Reduced Lung-Cancer Mortality with Volume CT Screening in a Randomized Trial.

Authors: Harry J de Koning; Carlijn M van der Aalst; Pim A de Jong; Ernst T Scholten; Kristiaan Nackaerts; Marjolein A Heuvelmans; Jan-Willem J Lammers; Carla Weenink; Uraujh Yousaf-Khan; Nanda Horeweg; Susan van 't Westeinde; Mathias Prokop; Willem P Mali; Firdaus A A Mohamed Hoesein; Peter M A van Ooijen; Joachim G J V Aerts; Michael A den Bakker; Erik Thunnissen; Johny Verschakelen; Rozemarijn Vliegenthart; Joan E Walter; Kevin Ten Haaf; Harry J M Groen; Matthijs Oudkerk
Journal: N Engl J Med Date: 2020-01-29 Impact factor: 91.245

7. A nomogram for predicting the risk of invasive pulmonary adenocarcinoma for patients with solitary peripheral subsolid nodules.

Authors: Chenghua Jin; Jinlin Cao; Yu Cai; Lijie Wang; Kai Liu; Weiyu Shen; Jian Hu
Journal: J Thorac Cardiovasc Surg Date: 2016-10-24 Impact factor: 5.209

8. Persistent pure ground-glass opacity lung nodules ≥ 10 mm in diameter at CT scan: histopathologic comparisons and prognostic implications.

Authors: Hyun-Ju Lim; Soomin Ahn; Kyung Soo Lee; Joungho Han; Young Mog Shim; Sookyoung Woo; Jae-Hun Kim; Miyeon Yie; Ho Yun Lee; Chin A Yi
Journal: Chest Date: 2013-10 Impact factor: 9.410

9. Development and Validation of a Deep Learning-based Automatic Detection Algorithm for Active Pulmonary Tuberculosis on Chest Radiographs.

Authors: Eui Jin Hwang; Sunggyun Park; Kwang-Nam Jin; Jung Im Kim; So Young Choi; Jong Hyuk Lee; Jin Mo Goo; Jaehong Aum; Jae-Joon Yim; Chang Min Park
Journal: Clin Infect Dis Date: 2019-08-16 Impact factor: 9.079

Review 10. Applications of hyperspectral imaging in the detection and diagnosis of solid tumors.

Authors: Yating Zhang; Xiaoqian Wu; Li He; Chan Meng; Shunda Du; Jie Bao; Yongchang Zheng
Journal: Transl Cancer Res Date: 2020-02 Impact factor: 1.241

1 in total

1. [Clinical Study of Artificial Intelligence-assisted Diagnosis System in Predicting the Invasive Subtypes of Early-stage Lung Adenocarcinoma Appearing as Pulmonary Nodules].

Authors: Zhipeng Su; Wenjie Mao; Bin Li; Zhizhong Zheng; Bo Yang; Meiyu Ren; Tieniu Song; Haiming Feng; Yuqi Meng
Journal: Zhongguo Fei Ai Za Zhi Date: 2022-04-20
