A random forest algorithm predicting model combining intraoperative frozen section analysis and clinical features guides surgical strategy for peripheral solitary pulmonary nodules.

Liqiang Qian¹, Yinjie Zhou², Wanqin Zeng³, Xiaoke Chen¹, Zhengping Ding¹, Yujia Shen³, Yifeng Qian⁴, Davide Tosi⁵, Mario Silva⁶, Yuchen Han⁷, Xiaolong Fu³.

Abstract

Background: Intraoperative frozen section (FS) analysis has been used to guide the extent of resection in patients with solitary pulmonary nodules (SPNs), but its accuracy varies greatly among hospitals. Artificial intelligence (AI) and multidimensional data technology have developed rapidly in recent years, while surgeons still need better methods to guide the surgical strategy for SPNs. We established prediction models combining FS results with multidimensional perioperative clinical features using logistic regression analysis and the random forest (RF) algorithm to determine the extent of SPN resection more accurately.
Methods: Patients with peripheral SPNs who underwent FS-guided surgical resection at the Shanghai Chest Hospital (January 2017 to December 2018) were retrospectively examined (N=3,098). The accuracy of the intraoperative FS-guided resection extent was analyzed and used as Model 1. The clinical features of the patients (sex, age, CT features, tumor markers, smoking history, lesion size, and nodule location) were collected, and Models 2 and 3 were established using logistic regression and the RF algorithm, respectively, to combine the FS results with the clinical features. We confirmed the performance of these models in an external validation cohort of 117 patients from Hwa Mei Hospital, University of Chinese Academy of Science (Ningbo No. 2 Hospital). We compared the effectiveness of the three models in classifying SPNs into low- and high-risk groups.
Results: The accuracy of FS analysis was 61.3%. Model 3 exhibited the best diagnostic accuracy and had an area under the curve of 0.903 in the internal validation cohort and 0.919 in the external validation cohort. The calibration plots and net reclassification index (NRI) of Model 3 also exhibited significantly better performance than the other models. Improved diagnostic accuracy was observed in both the internal and external validation cohorts.
Conclusions: Using an RF algorithm, clinical characteristics can be combined with intraoperative FS analysis to significantly improve intraoperative judgment accuracy for low- and high-risk tumors. The resulting model may serve as a reliable complementary method when FS evaluation is equivocal, improving the accuracy of the extent of surgical resection.
2022 Translational Lung Cancer Research. All rights reserved.

Keywords: Solitary pulmonary nodule (SPN); diagnostic accuracy; frozen section (FS); random forest (RF); surgical resection

Year: 2022 PMID: 35832446 PMCID: PMC9271446 DOI: 10.21037/tlcr-22-395

Source DB: PubMed Journal: Transl Lung Cancer Res ISSN: 2218-6751

Introduction

Most solitary pulmonary nodules (SPNs) are identified in the early stage through pathological diagnosis and are potentially curable. However, the accurate diagnosis of SPNs is clinically challenging because lesions may represent inflammation, infection, benign lung tumors, or other non-malignant conditions (1). Many SPNs with ground glass opacity (GGO) components are diagnosed as lung adenocarcinomas or precancerous lesions, such as atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), or invasive adenocarcinoma (IAC) (2). The extent of surgical resection for SPNs varies according to the diagnosis. For malignant lung tumors classified as high risk, the standardized surgical method involves lobectomy and systemic node dissection because of the probability of postoperative recurrence and metastasis (3). However, following the publication of the International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society classification in 2011 (4), several studies, such as Zhang et al. (5), have reported that early-stage lung adenocarcinomas (e.g., AAH, AIS, and MIA) are associated with good prognosis, and sublobar resection without lymphadenectomy is currently considered a more appropriate surgical procedure. The same applies to benign tumors and some low-grade malignant tumors (such as carcinoid tumors) (6), which are classified as low risk. Therefore, the classification of SPNs determines the extent of surgical resection, which is crucial for planning a tailored surgical approach that aims for minimal invasiveness while maintaining radical intent. Nonetheless, it is difficult to diagnose SPNs preoperatively because of the significant uncertainties associated with computed tomography (CT), bronchoscopy, and needle biopsy (7,8). Frozen sections (FS) of specimens resected during surgery have therefore become the primary diagnostic modality for SPNs.
FS are used to determine both the benign or malignant nature of SPNs and the extent of tumor infiltration for low-risk or high-risk malignant tumors. As they guide surgeons in determining the extent of surgical resection, high diagnostic accuracy is critical. Whether FS accurately determine the properties and infiltration degree of SPNs remains controversial. Liu et al. suggested that intraoperative FS accurately determine the degree of tumor infiltration and guide the resection strategy in patients with lung adenocarcinoma (9). However, other studies have found that determining the tumor infiltration degree of lung adenocarcinoma solely based on FS carries an error rate relative to the final pathology (FP) (10,11). Better predictions of the final pathological outcome of lung adenocarcinoma have been achieved by combining FS results with tumor diameter (12). Furthermore, SPNs may represent pathological diagnoses other than lung adenocarcinoma, rendering FS-guided diagnosis more challenging. Currently, no large-scale studies have investigated the accuracy of intraoperative FS in determining SPN properties and guiding surgical resection. In recent years, artificial intelligence and machine learning (ML) have been widely used in various fields, including medicine (13). Artificial intelligence and ML algorithms analyze large volumes of data by learning a decisional process, which can be continuously refined for improved performance (14). The random forest (RF) algorithm is an important ML algorithm. It is essentially an ensemble learning algorithm based on bagging. Its basic principle is to combine multiple weak classifiers and to obtain the final result by voting on or averaging their outputs, so that the overall model achieves higher accuracy and better generalization.
Clinical features such as sex, age, CT features, tumor markers, smoking history, lesion size, and nodule location are well suited as inputs to the weak classifiers (decision trees) of the RF algorithm. This real-world study aimed to evaluate the accuracy of the extent of SPN resection under intraoperative FS guidance and to establish, using logistic regression analysis and the RF algorithm, a model combining FS results with multidimensional perioperative clinical information. We then verified whether this model could improve the accuracy of intraoperative SPN classification. We present the following article in accordance with the STARD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-22-395/rc).
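The bagging-and-voting principle described above can be illustrated with a minimal sketch. It is not the model used in this study (which was built in R): the weak classifiers here are one-split decision stumps rather than full trees, and the feature layout, toy rows, and function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_indices):
    """Pick the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_indices:
        for t in sorted({r[f] for r in rows}):
            for high_label in (0, 1):
                preds = [high_label if r[f] > t else 1 - high_label for r in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, high_label)
    _, f, t, high_label = best
    return lambda r: high_label if r[f] > t else 1 - high_label

def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]       # sample with replacement
        feats = rng.sample(range(len(rows[0])), n_features)  # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all weak classifiers."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]

# Hypothetical toy data: [FS says high risk (0/1), diameter in cm, GGO present (0/1)]
ROWS = [[1, 1.8, 0], [1, 2.2, 0], [1, 2.5, 0], [1, 2.9, 0],
        [0, 0.6, 1], [0, 0.8, 1], [0, 0.9, 1], [0, 1.0, 1]]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = high risk, 0 = low risk

forest = fit_forest(ROWS, LABELS)
```

An odd number of stumps avoids tied votes in this two-class setting; real RF implementations grow deeper trees and usually average class probabilities.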

Methods

Study cohort and data collection

This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013) and approved by the Committee of Medical Ethics of Shanghai Chest Hospital (approval number KS21002, 2021-2) and Hwa Mei Hospital, University of Chinese Academy of Science (Ningbo No. 2 Hospital, approval number YJ-NBEY-KY-2021-140-01, 2021-9). Informed consent was waived because of the retrospective nature of the study. We retrospectively analyzed all peripheral SPN (located in the outer 1/3 of the lung field) resections performed under the guidance of intraoperative FS at the Shanghai Chest Hospital between January 2017 and December 2018. An external cohort from Hwa Mei Hospital, University of Chinese Academy of Science (Ningbo No. 2 Hospital) between January 2017 and December 2018 was also collected and used for external validation. The research design of this study and the exclusion criteria are shown in Figure 1. Preoperative tests (contrast-enhanced chest CT, abdominal CT or ultrasonography, brain magnetic resonance imaging or CT, and radionuclide bone scan for most patients and positron emission tomography or CT for the rest) were performed to assess the clinical stage of the lesion. Clinicopathologic data, such as sex, age at surgery, CT features (GGO component and pleural indentation), presence of tumor markers, smoking history, lesion size measured in fresh specimens, nodule location, resection type, FS diagnosis, and FP, were collected.

Figure 1

Flowchart of patient inclusion. CT, computed tomography; FS, frozen section.

CT scans and tumor markers

CT imaging and tumor marker assessments were performed approximately 1 week (6.8±3.2 days) before surgery. Most chest CT scans were contrast-enhanced (some GGOs were not). CT scans in Shanghai Chest Hospital were obtained using Brilliance iCT and Brilliance 64 CT scanners (Philips Healthcare, Eindhoven, Netherlands). Each nodule was reviewed twice by two radiologists (YLM and TGY) with 15 and 10 years of experience, respectively, and CT features were distinguished based on the presence of GGO (nodule with/without GGO component) and pleural indentation. The standard values of tumor markers were as follows: carcinoembryonic antigen, 0–5 ng/mL; carbohydrate antigen 19-9, 0–5 ng/mL; cytokeratin 19 fragment, 0–1.5 ng/mL; neuron-specific enolase, 0–25 ng/mL; and cancer antigen 125, 0–35 U/mL.

Evaluation of FS and final pathology findings

After the tumors were removed via sublobar resection, pathologists immediately performed FS diagnosis of the specimens, and if the lesion was diagnosed as adenocarcinoma, it was further classified as AAH, AIS, MIA, or IAC. After FS diagnosis, the specimens were immersed in 10% neutral buffered formalin and embedded in paraffin. All FS diagnoses were compared with the final pathologic diagnoses of the corresponding permanent paraffin sections. The pathological diagnoses were made according to the 2015 World Health Organization classification for lung tumors and the 2011 International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society classification. The pathologies of pulmonary nodules were divided into the following seven categories: AAH, AIS, MIA, IAC, other types of malignant tumors (squamous cell carcinoma, large-cell carcinoma, and lymphoepithelioma-like carcinoma), low-grade malignant tumors (carcinoid), and benign findings (pulmonary hamartoma, adenoma, granuloma, aspergilloma, tuberculosis, and inflammation). They were then divided into two groups: high-risk (IAC and other types of malignant tumors) and low-risk (AAH, AIS, MIA, low-grade malignant tumors, and benign findings) groups. Compared with FP, the concordance of the FS results was defined as follows: “correct” (consistent with FP), “underestimated”, “overestimated”, “error” (misjudged between benign and malignant), and “equivocal” (or deferred).
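The two-group stratification and the risk-level concordance labels defined above can be written as a small lookup, shown here as an illustrative sketch. The category strings and function names are hypothetical, and the reading of "overestimated" as FS high risk with FP low risk (and the reverse for "underestimated") is an assumption based on the usual meaning of these terms.

```python
# Seven pathological categories mapped to the two risk groups defined in the text.
HIGH_RISK = {"IAC", "other malignancy"}
LOW_RISK = {"AAH", "AIS", "MIA", "low-grade malignancy", "benign"}

def risk_group(category):
    """Return 'high' or 'low' for one of the seven pathological categories."""
    if category in HIGH_RISK:
        return "high"
    if category in LOW_RISK:
        return "low"
    raise ValueError(f"unknown category: {category}")

def concordance_by_risk(fs, fp):
    """Compare an FS diagnosis with the final pathology at the risk-group level."""
    if fs == "equivocal":            # deferred to permanent sections
        return "equivocal"
    fs_risk, fp_risk = risk_group(fs), risk_group(fp)
    if fs_risk == fp_risk:
        return "correct"
    # Assumption: FS high / FP low counts as overestimation, the reverse as underestimation.
    return "overestimated" if fs_risk == "high" else "underestimated"
```

Under this scheme an FS call of AIS with an FP of MIA counts as correct at the risk-group level, because both fall in the low-risk group, even though the exact pathological types differ.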

Surgical procedures

During surgery, sublobar resection (including wedge resection and segmentectomy) was first performed, followed by FS pathological examination. If the FS pathological result indicated a high-risk classification, subsequent lobectomy and lymph node dissection were performed. However, if the FS pathological result was equivocal or deferred, the surgical team determined the extent of resection based on experience.

Model establishment and statistical analysis

The dataset of Shanghai Chest Hospital was divided into two cohorts by the date of surgery as follows: (I) a training cohort including patients who underwent surgery from January 2017 to June 2018; and (II) an internal validation cohort including patients who underwent surgery from July 2018 to December 2018. An independent dataset from Hwa Mei Hospital, University of Chinese Academy of Science (Ningbo No. 2 Hospital) was used as an external validation cohort (15). We used cross-validation for calibration. First, we established a univariate logistic regression model based on FS alone for the diagnosis of high- or low-risk groups (Model 1), which was then combined with patient characteristics, including age, sex, smoking history, maximum SPN diameter on CT, presence or absence of a GGO component, pleural indentation, and tumor marker results, to derive a multivariable logistic regression model (Model 2). Finally, RF binary classification models were trained using the same features as in Model 2 (Model 3). The classification performance of these models was evaluated using confusion matrix analysis, which included accuracy, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). Youden’s index was applied to calculate the sensitivity and specificity. The models were also evaluated using receiver operating characteristic (ROC) curves to calculate the area under the curve (AUC) and its 95% confidence interval (CI). The classification threshold was set to 0.5, meaning that a case with an output probability larger than 0.5 was assigned to the high-risk group. The DeLong test was applied to compare the AUC values of the three models. We also generated calibration plots and determined the net reclassification index (NRI). Results were compared in both the internal and external validation cohorts.
All statistical analyses, model building, and model evaluation were performed in R (version 3.5.2; R Foundation for Statistical Computing, Vienna, Austria; http://www.r-project.org) using the caret package. Statistical significance was defined as a two-sided P value <0.05.
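For readers less familiar with the evaluation measures named above, the following sketch shows how an AUC and a Youden-optimal cut-off can be computed from predicted probabilities. It is an illustration only: the study's analyses were run in R, the function names are hypothetical, and the scores and labels below are invented toy values.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative
    (ties count one half); equivalent to the Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(scores, labels):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1,
    where a case is called high risk when its score is >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(s >= t and y == 1 for s, y in zip(scores, labels))
        tn = sum(s < t and y == 0 for s, y in zip(scores, labels))
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:               # keep the first threshold reaching the maximum
            best_t, best_j = t, j
    return best_t, best_j

# Invented toy probabilities of being high risk (1 = high risk, 0 = low risk)
SCORES = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
LABELS = [0, 0, 0, 1, 0, 1, 1, 1]

toy_auc = auc(SCORES, LABELS)
toy_cut, toy_j = youden_threshold(SCORES, LABELS)
```

In this toy example the Youden-optimal cut-off differs from a fixed 0.5 threshold, which shows why the choice of threshold should be stated alongside the reported sensitivity and specificity.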

Results

Patient data

A total of 8,163 patients with pulmonary nodules underwent surgical resection at Shanghai Chest Hospital from January 2017 to December 2018. Data were analyzed for the 3,098 patients with peripheral SPNs ≤3 cm who met the inclusion criteria, and their clinicopathological characteristics are summarized in Table 1.

Table 1

Clinicopathologic characteristics of patients included in the study (N=3,098)

Characteristics	Total (N=3,098)	AAH (n=16)	AIS (n=432)	MIA (n=634)	IAC (n=1,385)	Other infiltrative malignancies (n=62)	Low-grade malignancies (n=5)	Benign (n=564)	P
Age (mean ± SD) (years)	54.57±10.96	55.3±9.84	49.3±11.4	50.7±11.7	58.1±9.14	63.4±7.0	63.2±4.9	53.2±10.7	<0.001
Sex, n (%)									<0.001
Male	1,237 (39.9)	3 (18.8)	138 (31.9)	176 (27.8)	576 (41.6)	53 (85.5)	2 (40.0)	289 (51.2)
Female	1,861 (60.1)	13 (81.2)	294 (68.1)	458 (72.2)	809 (58.4)	9 (14.5)	3 (60.0)	275 (48.8)
Surgical methods, n (%)									<0.001
Wedge resection	1,104 (35.6)	11 (68.8)	267 (61.8)	284 (44.8)	62 (4.5)	1 (1.6)	2 (40.0)	477 (84.6)
Segmentectomy	536 (17.3)	3 (18.8)	133 (30.8)	215 (33.9)	95 (6.9)	2 (3.2)	2 (40.0)	86 (15.2)
Lobectomy	1,458 (47.1)	2 (12.6)	32 (7.4)	135 (21.3)	1,228 (88.6)	59 (95.2)	1 (20.0)	1 (0.2)
Location of tumor, n (%)									0.002
RUL	1,106 (35.7)	9 (56.3)	174 (40.3)	236 (37.2)	513 (37.0)	18 (29.0)	1 (20.0)	155 (27.5)
RML	84 (2.7)	1 (6.3)	9 (2.1)	21 (3.3)	18 (1.3)	0	0	35 (6.2)
RLL	541 (17.5)	0	50 (11.6)	106 (16.7)	238 (17.2)	7 (11.3)	2 (40.0)	138 (24.5)
LUL	860 (27.8)	5 (31.2)	141 (32.6)	178 (28.1)	391 (28.3)	23 (37.1)	0	122 (21.6)
LLL	507 (16.3)	1 (6.2)	58 (13.4)	93 (14.7)	225 (16.2)	14 (22.6)	2 (40.0)	114 (20.2)
Maximum diameter of tumor, n (%)									<0.001
≤1 cm	1,341 (43.3)	13 (81.3)	407 (94.2)	478 (75.4)	151 (10.9)	3 (4.9)	3 (60.0)	286 (50.7)
1< d ≤2 cm	1,226 (39.6)	2 (12.5)	23 (5.3)	151 (23.8)	801 (57.8)	26 (41.9)	1 (20.0)	222 (39.4)
2< d ≤3 cm	531 (17.1)	1 (6.2)	2 (0.5)	5 (0.8)	433 (31.3)	33 (53.2)	1 (20.0)	56 (9.9)
Lymph node situation, n (%)									0.957
N0	3,007 (97.1)	16 (100.0)	432 (100.0)	634 (100.0)	1,301 (93.9)	55 (88.7)	5 (100.0)	564 (100.0)
N1	31 (1.0)	0	0	0	29 (2.1)	2 (3.2)	0	0
N2	60 (1.9)	0	0	0	55 (4.0)	5 (8.1)	0	0
CT imaging, n (%)
GGO component									<0.001
With	2,378 (76.8)	16 (100.0)	432 (100.0)	634 (100.0)	1,017 (73.4)	1 (1.6)	1 (20.0)	277 (49.1)
Without	720 (23.2)	0	0	0	368 (26.6)	61 (98.4)	4 (80.0)	287 (50.9)
Pleural indentation									<0.001
Yes	1,135 (36.6)	0	88 (20.4)	182 (28.7)	826 (59.6)	38 (61.3)	0	1 (0.2)
No	1,963 (63.4)	16 (100.0)	344 (79.6)	452 (71.3)	559 (40.4)	24 (38.7)	5 (100.0)	563 (99.8)
Smoking history, n (%)									0.001
Yes	239 (7.7)	2 (12.5)	12 (2.8)	26 (4.1)	119 (8.6)	44 (71.0)	0	36 (6.4)
No	2,859 (92.3)	14 (87.5)	420 (97.2)	608 (95.9)	1,266 (91.4)	18 (29.0)	5 (100.0)	528 (93.6)
Tumor biomarkers (mean ± SD)
CEA	2.76±7.9	2±1.18	1.77±1.51	1.88±1.26	3.66±11.5	4.37±6.12	2.29±1.27	2.66±1.74	0.438
CA19-9	2.48±1.17	2.24±1.24	2.28±0.98	2.38±1.09	2.54±1.11	3.11±1.27	2.83±0.87	2.57±1.44	<0.001
CYFRA21-1	0.85±0.85	0.98±0.80	0.812±0.50	0.847±0.82	0.861±0.98	1.06±0.71	0.68±0.11	0.843±0.79	0.711
NSE	18.14±6.68	16.5±6.27	17.9±6.62	17.7±6.44	18.5±7.05	17.00±4.61	17±3.62	18±6.28	0.685
CA125	12.06±15.4	11±5.53	13.6±35.2	12.1±10.3	11.5±7.83	11±4.54	10.5±3.64	12.3±9.05	0.433

Minimum 1 cm adenocarcinoma with lymph node metastasis (N1, N2), 1.5 cm squamous carcinoma with lymph node metastasis (N1). AAH, atypical adenomatous hyperplasia; AIS, adenocarcinoma in situ; MIA, minimally invasive adenocarcinoma; IAC, invasive adenocarcinoma; SD, standard deviation; RUL, right upper lobe; RML, right middle lobe; RLL, right lower lobe; LUL, left upper lobe; LLL, left lower lobe; CT, computed tomography; GGO, ground glass opacity; CEA, carcinoembryonic antigen; CA19-9, carbohydrate antigen 19-9; CYFRA21-1, cytokeratin 19 fragment antigen 21-1; NSE, neuron-specific enolase; CA125, cancer antigen 125.

Lymph node metastasis was observed in IAC as small as 1 cm in diameter on CT (both N1 and N2) and in squamous carcinoma as small as 1.5 cm in diameter (N1), whereas it was not observed in patients with AAH/AIS/MIA or other malignant tumors <1 cm who underwent systemic lymphadenectomy or lymph node sampling. Results for tumor markers (including carcinoembryonic antigen, carbohydrate antigen 19-9, cytokeratin 19 fragment, neuron-specific enolase, and cancer antigen 125) were considered in both the training cohort (n=2,059) and internal validation cohort (n=963). The clinicopathological characteristics of the external cohort are summarized in Table S1.

FS and surgical procedure accuracy

The comparison of FS and FP results is shown in Table 2. The FS concordance with FP by pathological type was as follows: AAH, 81.3%; AIS, 34.3%; MIA, 8.8%; IAC, 77.2%; other types of malignancy, 88.7%; low-grade malignancy, 40%; and benign, 98.4%. When stratified by pathological type, FS results compared with FP results were classified as follows: correct, 1,898 (61.3%); underestimated, 54 (1.7%); overestimated, 100 (3.2%); error, 12 (0.4%); and equivocal, 1,034 (33.4%). When stratified by high or low risk, FS results compared with FP results were classified as follows: correct, 2,022 (65.3%); underestimated, 19 (0.6%); overestimated, 23 (0.7%); and equivocal, 1,034 (33.4%).
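As a quick arithmetic check, the percentages in the preceding paragraph follow directly from the reported counts and the cohort size of 3,098. The short sketch below recomputes them; the dictionary and function names are illustrative only.

```python
TOTAL = 3098  # patients with peripheral SPNs included in the analysis

# Counts as reported in the text, stratified by pathological type and by risk group.
BY_TYPE = {"correct": 1898, "underestimated": 54, "overestimated": 100,
           "error": 12, "equivocal": 1034}
BY_RISK = {"correct": 2022, "underestimated": 19, "overestimated": 23,
           "equivocal": 1034}

def pct(count, total=TOTAL):
    """Percentage of the cohort, rounded to one decimal place."""
    return round(100 * count / total, 1)

type_pct = {k: pct(v) for k, v in BY_TYPE.items()}
risk_pct = {k: pct(v) for k, v in BY_RISK.items()}
```

Both sets of counts sum to 3,098, so every patient falls into exactly one concordance class under each stratification.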

Table 2

Comparison of frozen section and final pathology results

Frozen section results	Final pathology results, n (%)
Frozen section results	AAH (n=16)	AIS (n=432)	MIA (n=634)	IAC (n=1,385)	Other types of malignancy (n=62)	Low-grade malignancy (n=5)	Benign (n=564)
AAH	13 (81.3*)	13 (3.0)	3 (0.5)	4 (0.3)	0	0	0
AIS	0	148 (34.3*)	23 (3.6)	2 (0.1)	0	0	0
MIA	0	77 (17.8)	56 (8.8*)	9 (0.6)	0	0	0
IAC	1 (6.2)	12 (2.8)	10 (1.6)	1,068 (77.2*)	1 (1.6)	0	0
Other types of malignancy	0	0	0	0	55 (88.7*)	1 (20.0)	0
Low-grade malignancy	0	0	0	0	0	2 (40.0*)	1 (0.2)
Benign	0	4 (0.9)	1 (0.2)	2 (0.1)	2 (3.2)	1 (20.0)	555 (98.4*)
Equivocal	2 (12.5)	178 (41.2)	541 (85.3)	300 (21.7)	4 (6.5)	1 (20.0)	8 (1.4)

*, indicates the frozen section accuracy for each type of solitary pulmonary nodule. AAH, atypical adenomatous hyperplasia; AIS, adenocarcinoma in situ; MIA, minimally invasive adenocarcinoma; IAC, invasive adenocarcinoma.

Tumor size, MIA pathology, and GGO components were identified as risk factors for incorrect FS determination in the univariate and multivariate regression analyses (Table 3). The accuracy of the extent of surgical resection was as follows: correct surgical extent, 81.2% (n=2,516), and incorrect surgical extent, 18.8% (n=582). Of the 1,034 patients with equivocal FS results, the extent of resection was correct in 494 (47.8%) and incorrect in 540 (52.2%; 277 too large and 263 too small).

Table 3

Univariable and multivariable analyses of factors contributing to incorrect frozen section diagnosis

Variable	Univariate analysis		Multivariate analysis
Variable	OR (95% CI)	P	OR (95% CI)	P
Age	0.97 (0.97, 0.98)	<0.001	1.01 (1.00, 1.02)	0.246
Maximum diameter	0.29 (0.25, 0.33)	<0.001	0.38 (0.30, 0.47)	<0.001
Sex
Male	Reference		Reference
Female	1.80 (1.54, 2.11)	<0.001	1.13 (0.91, 1.40)	0.272
Location
LUL	Reference		Reference
LLL	0.97 (0.77, 1.23)	0.819	1.30 (0.95, 1.76)	0.096
RUL	1.15 (0.95, 1.38)	0.147	1.20 (0.94, 1.52)	0.138
RML	0.95 (0.58, 1.51)	0.825	1.14 (0.55, 2.35)	0.725
RLL	0.81 (0.64, 1.02)	0.075	0.95 (0.70, 1.30)	0.769
Pathology
AAH	Reference		Reference
AIS	3.40 (1.08, 14.99)	0.059	3.99 (1.24, 17.80)	0.035
MIA	28.77 (9.05, 127.41)	<0.001	43.50 (13.36, 195.34)	<0.001
IAC	1.29 (0.41, 5.64)	0.696	5.15 (1.58, 23.14)	0.013
Other types of cancer	0.46 (0.11, 2.42)	0.320	8.30 (1.72, 47.26)	0.010
Low-grade malignancy	1.08 (0.05, 11.69)	0.950	3.55 (0.13, 49.48)	0.367
Benign	0.06 (0.02, 0.31)	<0.001	0.16 (0.04, 0.79)	0.013
Pleural indentation
No	Reference		Reference
Yes	0.77 (0.66, 0.90)	0.001	0.86 (0.69, 1.06)	0.154
GGO component
No	Reference		Reference
Yes	16.26 (11.55, 23.72)	<0.001	4.27 (2.85, 6.62)	<0.001
History of smoking
No	Reference		Reference
Yes	0.51 (0.37, 0.70)	<0.001	0.95 (0.61, 1.46)	0.807

OR, odds ratio; CI, confidence interval; LUL, left upper lobe; LLL, left lower lobe; RUL, right upper lobe; RML, right middle lobe; RLL, right lower lobe; AAH, atypical adenomatous hyperplasia; AIS, adenocarcinoma in situ; MIA, minimally invasive adenocarcinoma; IAC, invasive adenocarcinoma; GGO, ground glass opacity.
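For orientation, the odds ratios in Table 3 relate to the logistic regression coefficients in the standard way shown below. The Wald form of the confidence interval is an assumption made here for illustration; the text does not state how the intervals were computed.

```latex
\mathrm{OR}_j = e^{\beta_j},
\qquad
95\%\ \mathrm{CI} = \left( e^{\beta_j - 1.96\,\mathrm{SE}(\beta_j)},\; e^{\beta_j + 1.96\,\mathrm{SE}(\beta_j)} \right)
```

For example, the multivariate OR of 4.27 for the GGO component corresponds to a coefficient of about ln 4.27 ≈ 1.45, whereas an OR below 1, such as 0.38 for maximum diameter, corresponds to a negative coefficient, meaning that larger nodules had lower odds of an incorrect FS diagnosis.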

Models

As shown in Figure 2A, the AUC for Model 1 was 0.633 in the internal validation cohort (95% CI: 0.603–0.662), whereas those for Models 2 and 3 were 0.889 (0.869–0.909) and 0.903 (0.884–0.922), respectively. Comparison among the three models revealed that the AUC of Model 3 was significantly larger than those of Models 1 and 2 (P<0.001 and P=0.012, respectively). The classification performance of the models was also evaluated using the NRI and calibration plots (1,000 replications). The comparison of the NRI between Models 2 and 3 is presented in Figure 2B, and the calibration plots for the internal validation cohort are presented in Figure 2C-2E. The NRI between these two models was 0.06 (0.03–0.10), indicating that Model 3 exhibited significantly better reclassification performance. The AUCs of the three models in the external validation cohort are shown in Figure 3A: the AUC for Model 1 was 0.639 (95% CI: 0.553–0.726), whereas those for Models 2 and 3 were 0.889 (0.830–0.948) and 0.919 (0.871–0.967), respectively. Comparison among the three models revealed that the AUC of Model 3 was significantly larger than that of Model 1 (P<0.001), with no significant difference between Models 2 and 3 (P=0.196). The comparison of the NRI between Models 2 and 3 is presented in Figure 3B, and the calibration plots for the external validation cohort are presented in Figure 3C-3E. The NRI between Models 2 and 3 was 0.10 (0.01–0.27), also indicating that Model 3 exhibited significantly better reclassification performance.
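The NRI reported above summarises how often a new model moves cases into the correct risk group compared with a reference model. The sketch below shows one common two-category form; which NRI variant was used in this study is not stated, so the formula, function name, and toy data should be read as assumptions for illustration.

```python
def categorical_nri(old_pred, new_pred, labels):
    """Two-category NRI: net upward moves among events plus net downward moves
    among non-events (predictions: 1 = high risk, 0 = low risk)."""
    events = [(o, n) for o, n, y in zip(old_pred, new_pred, labels) if y == 1]
    nonevents = [(o, n) for o, n, y in zip(old_pred, new_pred, labels) if y == 0]
    up_ev = sum(n > o for o, n in events) / len(events)
    down_ev = sum(n < o for o, n in events) / len(events)
    up_ne = sum(n > o for o, n in nonevents) / len(nonevents)
    down_ne = sum(n < o for o, n in nonevents) / len(nonevents)
    return (up_ev - down_ev) + (down_ne - up_ne)

# Invented toy predictions from a reference model and a new model
LABELS   = [1, 1, 1, 1, 0, 0, 0, 0]
OLD_PRED = [1, 0, 0, 1, 0, 1, 0, 0]
NEW_PRED = [1, 1, 0, 1, 0, 0, 0, 1]

toy_nri = categorical_nri(OLD_PRED, NEW_PRED, LABELS)
```

A positive value means the new model reclassifies more cases in the right direction than in the wrong one; confidence intervals such as those reported above are typically obtained by bootstrap resampling.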

Figure 2

ROC curve, NRI, and calibration plot of the three models in the internal validation cohort. (A) AUC for the three models; (B) NRI of the three models; (C) calibration plots of Model 1; (D) calibration plots of Model 2; (E) calibration plots of Model 3. AUC, area under the ROC curve; NRI, net reclassification index; Pr, probability; ROC, receiver operating characteristic.

Figure 3

ROC curve, NRI, and calibration plot of the three models in the external validation cohort. (A) AUC for the three models; (B) NRI of the three models; (C) calibration plots of Model 1; (D) calibration plots of Model 2; (E) calibration plots of Model 3. AUC, area under the ROC curve; NRI, net reclassification index; Pr, probability; ROC, receiver operating characteristic.

We conducted confusion matrix analysis and assessed the accuracy, sensitivity, specificity, NPV, and PPV of the three models, and their comparison is shown in Table 4. Model 3 exhibited the best diagnostic accuracy, with >80% accuracy, sensitivity, specificity, PPV, and NPV in both the internal and external validation cohorts.

Table 4

Diagnostic accuracy of the different models

Cohorts	Model	Accuracy (%)	Sensitivity (%)	Specificity (%)	PPV (%)	NPV (%)
Internal validation	Model 1	62.82	76.82	49.70	58.88	69.58
	Model 2	79.65	78.97	80.28	78.97	80.28
	Model 3	82.76	84.98	80.68	80.49	85.14
External validation	Model 1	62.39	73.47	54.41	53.73	74.00
	Model 2	82.05	77.55	85.29	79.17	84.06
	Model 3	87.18	85.71	88.24	84.00	89.55

NPV, negative predictive value; PPV, positive predictive value.
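The measures in Table 4 all derive from the four cells of a confusion matrix. As a worked illustration, the sketch below uses cell counts for Model 3 in the external validation cohort (N=117) that were back-calculated here from the reported percentages; these counts are an assumption, not values reported by the study, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity, PPV, and NPV as percentages."""
    total = tp + fn + tn + fp
    return {
        "accuracy": round(100 * (tp + tn) / total, 2),
        "sensitivity": round(100 * tp / (tp + fn), 2),   # high-risk cases detected
        "specificity": round(100 * tn / (tn + fp), 2),   # low-risk cases detected
        "ppv": round(100 * tp / (tp + fp), 2),
        "npv": round(100 * tn / (tn + fn), 2),
    }

# ASSUMPTION: counts inferred from the Table 4 percentages, not reported in the study.
model3_external = diagnostic_metrics(tp=42, fn=7, tn=60, fp=8)
```

With these inferred counts the function returns the external-validation row for Model 3 in Table 4 (87.18, 85.71, 88.24, 84.00, and 89.55).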

Discussion

In this research, we aimed to establish an ML model to determine the invasion status of SPNs to aid surgeons in decision-making regarding the extent of surgical resection and lymphadenectomy. CT screening assists with identifying some early-stage lung cancers, particularly those associated with favorable histology (16), and there is increasing interest in sublobar resection, which has been shown to preserve lung function, reduce perioperative morbidity, and leave the option of resection for a subsequent primary lung cancer (17,18). To date, the optimal extent of surgical resection and lymphadenectomy remains controversial. Sublobar resection without lymph node dissection may be the preferred surgical procedure for some low-grade malignancies and early-stage lung adenocarcinomas (5,6). However, “spread through air spaces” (19) and lymph node metastases may still be present in some lung malignancies with smaller diameters (≤1 cm) (20), as also seen in this study, and such tumors require lobectomy and lymphadenectomy. It was recently shown that it is inappropriate to decide on surgical strategies solely based on imaging, because many GGO-predominant nodules may also be IACs, and the extent of infiltration cannot be determined based on the amount of GGO component (7). Although FS may represent a better choice for guiding the surgical strategy, its small specimen volume makes FS-guided determination more challenging. It is also difficult to interpret lung tissue FS because of severely distorted architecture, ice crystal formation, and the complete collapse of the alveolar spaces during cryosection. This issue is of particular concern for the determination of MIA, in which stromal invasion is ≤5 mm. Overestimating the invasive component of MIA leads to a diagnosis of IAC, and neglecting the invasive component leads to a diagnosis of AIS. In this study, a GGO component, smaller diameter, and MIA pathology were also found to be risk factors for incorrect FS determination.
Various reports have shown that the accuracy of FS diagnosis varies across hospitals (9,10,12,21). Large-scale medical centers, such as the Shanghai Chest Hospital (>17,000 thoracic surgeries in 2020), have a high surgical volume, which requires a very short FS turnaround time (usually <30 min). Consequently, the accuracy of FS pathology was 61.3% in this study, which improved slightly to 65.3% when tumors were stratified into high- and low-risk groups. In the real-world setting, the diagnosis of “atypia, defer to permanent sections” is often made by the surgical pathologist when examining minute pulmonary lesions on FS because it avoids possible diagnostic errors and potential medico-legal exposure. In this study, the rate of such cases was as high as 33.4%. While many pathologists have tried to adopt new methods, such as the inflation method, to improve the accuracy of FS (22,23), the number of cases in these studies was too limited to establish a definitive method of FS. Interestingly, the correct surgical extent was determined in 81.2% of patients, suggesting that even with ambiguous FS results, surgeons often made accurate judgments, either empirically or with reference to other factors. Studies have shown that combining intraoperative FS results with tumor diameter may significantly increase judgment accuracy (12), and some investigators have also used radiomics methods combined with intraoperative FS to determine the infiltration degree of adenocarcinoma (24,25). ML has played an increasingly important role in classification and prediction problems and has achieved excellent results in the diagnosis and treatment of heart failure (26), survival prediction in patients with breast cancer (27), medical imaging (28), and biomedicine (29).
Meanwhile, RF, an ensemble learning method based on decision trees, has been reported to achieve high accuracy among current algorithms, to run efficiently on large databases, and to generate an internal unbiased estimate of the generalization error as forest building progresses; it is also an effective method for estimating missing data while maintaining accuracy (30). As such models are robust and can deal with nonlinear problems, we were motivated to investigate whether a more accurate determination of SPNs could be achieved using logistic regression analysis and an RF algorithm to build models. In clinical practice, classification of tumors into low- and high-risk groups is sufficient for surgeons. Therefore, the models were set to determine the low- and high-risk groups rather than the exact pathological results. In the selection of clinical features, cigarette smoking was included because it is a major risk factor for lung cancer, as cigarettes contain numerous carcinogens, mutagens, and other toxicants (31). Regarding preoperative imaging, we selected the GGO component and pleural indentation as two indicators, as both are frequently associated with lung malignancies (32,33). Additionally, increases in tumor marker levels have been associated with certain lung malignancies (34,35). Model 2 (logistic regression) and Model 3 (RF algorithm), both of which combine clinical features with FS results, performed better than Model 1. Model 3 was optimal, showing an increase in accuracy from 62.82% to 82.76% in the internal validation cohort, with marked improvements in PPV and specificity. In the ROC analysis, the AUC also increased from 0.633 to 0.903 in the internal validation cohort, and the calibration plots and NRI confirmed these results.
In the independent external validation cohort, Model 3 increased the accuracy from 62.39% to 87.18% and the AUC from 0.639 to 0.919, a statistically significant difference, and the calibration plots and NRI again confirmed these results. Across the internal and external validation cohorts, Model 3 showed a significant advantage in determining the low- or high-risk group. Therefore, we conclude that single-dimensional information (such as FS, CT, and others) is insufficient to determine the nature of SPNs accurately, and the combination of multidimensional data is required to make a synergistic judgment and improve accuracy. Furthermore, as RF-based ML models may significantly improve the validity of the judgment, this method may effectively help surgeons decide on the extent of surgical resection under the current situation in which imaging histology and CT image texture analysis are not widely used.

Study limitations

First, the judgment accuracy of Model 3, at 82.76%, remains insufficient, although this may be improved by increasing the amount of data when using the RF algorithm. Second, the imaging features analyzed in this study included only the GGO component and pleural indentation because these features are easily accessible in real-world clinical practice and helpful in both large medical centers and small hospitals. However, advancements in radiomics techniques may allow future studies to use the vast information contained within CT images. Notably, deep learning techniques offer a potential solution for interpreting these complex and ever-increasing data in CT images. Our previous study identified epidermal growth factor receptor mutation status in patients with lung adenocarcinoma from CT images using a three-dimensional deep convolutional neural network (36). Applying deep learning and extracting additional CT data may improve the accuracy of the model in this study. Third, we validated the classification results of the model using an external validation cohort; Model 3 still exhibited the best classification results, with a larger AUC than that of Model 3 in the internal validation cohort. However, the AUC of Model 3 was not significantly larger than that of Model 2, likely because of the relatively small number of cases included in the external validation cohort. Therefore, it is reasonable to expect that Model 3 may exhibit better classification results when applied to a larger external population. Future studies may overcome these limitations by conducting multicenter, standardized trials and exploring more suitable ways of combining large amounts of clinical data and FS to identify strategies that may increase the accuracy of intraoperative classification in patients with SPNs.
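The effect of cohort size on the uncertainty of an AUC can be illustrated with the Hanley-McNeil approximation for the standard error of a single AUC. This is a rough sketch under stated assumptions: the split of the 117 external patients into high- and low-risk cases is not given in the text, so the class counts below are hypothetical, and the function name is invented for this example. A formal comparison of two models on the same patients would need a paired method (for example DeLong's test); the text does not say which test was used.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


reported_auc = 0.919  # external-cohort AUC of Model 3 reported in the text

# Hypothetical class splits: the true split of the 117 patients is not stated.
se_small = hanley_mcneil_se(reported_auc, n_pos=58, n_neg=59)    # 117 patients
se_large = hanley_mcneil_se(reported_auc, n_pos=580, n_neg=590)  # tenfold cohort

print(f"SE with 117 patients:   {se_small:.4f}")
print(f"SE with 1,170 patients: {se_large:.4f}")
```

Under these assumptions the standard error shrinks roughly with the square root of the cohort size, which is why a difference in AUC between two models is harder to establish as significant in a cohort of about a hundred patients than in a larger one.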
In conclusion, our results suggest that an RF model combining clinical characteristics and intraoperative FS may significantly improve the accuracy of SPN classification. The model may also be used as a reliable complementary method when FS evaluation is equivocal, resulting in a more accurate extent of surgical resection. This may help surgeons make more accurate surgical decisions and avoid unnecessary loss of lung function and related complications. Future studies should consider using deep learning to quantitatively analyze paraffin sections (as used to determine neurological tumor pathology) (37) and intraoperative FS images and to incorporate them into the model, improving model accuracy and increasing the objectivity of intraoperative FS analysis.

37 in total

Review 1. Machine Learning for Medical Imaging.

Authors: Bradley J Erickson; Panagiotis Korfiatis; Zeynettin Akkus; Timothy L Kline
Journal: Radiographics Date: 2017-02-17 Impact factor: 5.333

2. Artificial Intelligence, Machine Learning, Deep Learning, and Cognitive Computing: What Do These Terms Mean and How Will They Impact Health Care?

Authors: Stefano A Bini
Journal: J Arthroplasty Date: 2018-02-27 Impact factor: 4.757

Review 3. Evaluation of the solitary pulmonary nodule.

Authors: Ashleigh Cruickshank; Geoff Stieler; Faisal Ameer
Journal: Intern Med J Date: 2019-03 Impact factor: 2.048

4. Evaluation of Cyfra 21-1: a potential tumor marker for non-small cell lung carcinomas.

Authors: D Karnak; G Ulubay; O Kayacan; S Beder; E Ibis; G Oflaz
Journal: Lung Date: 2001 Impact factor: 2.584

5. Identifying epidermal growth factor receptor mutation status in patients with lung adenocarcinoma by three-dimensional convolutional neural networks.

Authors: Jun-Feng Xiong; Tian-Ying Jia; Xiao-Yang Li; Wen Yu; Zhi-Yong Xu; Xu-Wei Cai; Ling Fu; Jie Zhang; Bin-Jie Qin; Xiao-Long Fu; Jun Zhao
Journal: Br J Radiol Date: 2018-08-13 Impact factor: 3.039

Review 6. Carcinoembryonic antigen (CEA) as tumor marker in lung cancer.

Authors: M Grunnet; J B Sorensen
Journal: Lung Cancer Date: 2011-12-06 Impact factor: 5.705

7. Surgical treatment of non-small cell lung cancer 1 cm or less in diameter.

Authors: Daniel L Miller; Charles M Rowland; Claude Deschamps; Mark S Allen; Victor F Trastek; Peter C Pairolero
Journal: Ann Thorac Surg Date: 2002-05 Impact factor: 4.330

8. Precise Diagnosis of Intraoperative Frozen Section Is an Effective Method to Guide Resection Strategy for Peripheral Small-Sized Lung Adenocarcinoma.

Authors: Shilei Liu; Rui Wang; Yang Zhang; Yuan Li; Chao Cheng; Yunjian Pan; Jiaqing Xiang; Yawei Zhang; Haiquan Chen; Yihua Sun
Journal: J Clin Oncol Date: 2015-11-23 Impact factor: 44.544

9. Pulmonary function changes after different extent of pulmonary resection under video-assisted thoracic surgery.

Authors: Zhitao Gu; Huimin Wang; Teng Mao; Chunyu Ji; Yangwei Xiang; Yan Zhu; Ping Xu; Wentao Fang
Journal: J Thorac Dis Date: 2018-04 Impact factor: 2.895

10. Clinical Significance of Pleural Attachment and Indentation of Subsolid Nodule Lung Cancer.

Authors: Hyung-Jun Kim; Jun Yeun Cho; Yeon Joo Lee; Jong Sun Park; Young-Jae Cho; Ho Il Yoon; Jin-Haeng Chung; Sukki Cho; Kwhanmien Kim; Kyung Won Lee; Jae Ho Lee; Choon-Taek Lee
Journal: Cancer Res Treat Date: 2019-03-25 Impact factor: 4.679