Literature DB >> 35392654

Comprehensive Analysis of Clinical Logistic and Machine Learning-Based Models for the Evaluation of Pulmonary Nodules.

Kai Zhang1, Zihan Wei1,2, Yuntao Nie1, Haifeng Shen1, Xin Wang1,2, Jun Wang1, Fan Yang1, Kezhong Chen1.   

Abstract

Introduction: Over the years, multiple models have been developed for the evaluation of pulmonary nodules (PNs). This study aimed to comprehensively investigate clinical models for estimating the malignancy probability in patients with PNs.
Methods: PubMed, EMBASE, Cochrane Library, and Web of Science were searched for studies reporting mathematical models for PN evaluation published up to March 2020. Eligible models were summarized, and a network meta-analysis was performed on externally validated models (PROSPERO database CRD42020154731). A malignancy-prevalence cut-off of 40% was used to separate patients into high-prevalence (HP) and low-prevalence (LP) groups, and a subgroup analysis was performed.
Results: A total of 23 original models were proposed in the 42 included articles. Age and nodule size were the variables most often used in the models, whereas positron emission tomography-computed tomography results were used whenever they had been collected. The Mayo model was validated in 28 studies. The area under the curve values of the four most often used models (PKU, Brock, Mayo, VA) were 0.830, 0.785, 0.743, and 0.750, respectively. HP models performed better in HP patients, with a pooled sensitivity and specificity of 0.83 (95% confidence interval [CI]: 0.78-0.88) and 0.71 (95% CI: 0.63-0.79), whereas LP models achieved a pooled sensitivity and specificity of only 0.70 (95% CI: 0.60-0.79) and 0.70 (95% CI: 0.62-0.77). For LP patients, the pooled sensitivity and specificity decreased from 0.68 (95% CI: 0.57-0.78) and 0.93 (95% CI: 0.87-0.97) to 0.57 (95% CI: 0.21-0.88) and 0.82 (95% CI: 0.65-0.92) when LP models were replaced with HP models. Compared with the clinical models, artificial intelligence-based models have shown promising preliminary results.
Conclusions: Mathematical models can facilitate the evaluation of lung nodules. Nevertheless, a suitable model should be used in an appropriate cohort to achieve an accurate result.
© 2022 The Authors.

Keywords:  Lung cancer; Machine learning; Network meta-analysis; Prediction model; Pulmonary nodules

Year:  2022        PMID: 35392654      PMCID: PMC8980995          DOI: 10.1016/j.jtocrr.2022.100299

Source DB:  PubMed          Journal:  JTO Clin Res Rep        ISSN: 2666-3643


Introduction

A pulmonary nodule (PN) is defined as an approximately round lesion, less than 3 cm in diameter, surrounded by pulmonary parenchyma. PNs have become increasingly common with the increased use of computed tomography (CT).1, 2, 3 Although most nodules are benign, some are lung cancers, and lung cancer is the leading cause of cancer-related death worldwide. The incidence of cancer in patients with solitary PNs is considered to range from 3.2% to 4.5%. Therefore, the main goal of PN management is to identify patients with malignant nodules and administer proper treatment. Current guidelines for the management of PNs recommend a systematic approach to PN assessment on the basis of clinical and radiographic characteristics.7, 8, 9 The evaluation can be carried out either by experienced clinicians or by mathematical models developed to quantify the probability that a PN is malignant. For patients with a high risk of malignant PNs, more aggressive interventions such as surgery and CT-guided biopsy are recommended, whereas regular serial high-resolution CT is recommended for PNs with a low risk of malignancy. Over the years, multiple models have been developed for the evaluation of PNs. Nevertheless, owing to inconsistent results and a lack of direct comparisons, no consensus has been reached on the diagnostic value of these models. Moreover, with the development of deep learning, artificial intelligence (AI)-based models have emerged, yet few articles have compared them with mathematical models. To perform a comprehensive analysis, we reviewed current clinical mathematical models that estimate the probability of malignancy of PNs and conducted a network meta-analysis of the diagnostic accuracy of the most often used models. We also summarized AI-based models that reported area under the curve (AUC) values and compared them with the mathematical models.

Materials and Methods

Search Strategy

First, we searched the Medical Subject Headings term database of the National Center for Biotechnology Information for all possible expressions for “lung cancer” and proposed possible expressions for “prediction model.” Then, we combined these expressions to search the PubMed, EMBASE, Cochrane Library, and Web of Science databases up to March 30, 2020, without language restrictions. The specific search strategy was as follows: (“Clinical Model” or “Clinical Prediction Model” or “Mathematical Model” or “Mathematical Prediction Model” or “Prediction Model” or “Gurney Model” or “Mayo Clinic Model” or “Herder Model” or “VA Model” or “PKU Model” or “Brock Model” or “TREAT Model” or “Bayesian Inference Malignancy Calculator” or “BIMC”) and (“Pulmonary Neoplasms” or “Lung Neoplasm” or “Pulmonary Neoplasm” or “Lung Cancer” or “Pulmonary Cancer” or “Pulmonary Cancers” or “Cancer of the Lung” or “Cancer of Lung” or “Pulmonary Nodule” or “Lung Nodule”). Titles and abstracts were screened to identify papers on prediction models for assessing the cancer probability of PNs. Full texts were then retrieved to extract data for calculation. This analysis was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. The study design was registered in the PROSPERO database (CRD42020154731).
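The strategy above has the shape (synonym OR synonym ...) AND (synonym OR synonym ...). A minimal Python sketch of assembling that string, assuming uppercase Boolean operators and no database-specific field tags; PubMed, EMBASE, Cochrane Library, and Web of Science each have their own syntax, so the output would need adapting per database. The helper names `or_block` and `build_query` are hypothetical.

```python
# Synonym lists copied from the search strategy described in the text.
MODEL_TERMS = [
    "Clinical Model", "Clinical Prediction Model", "Mathematical Model",
    "Mathematical Prediction Model", "Prediction Model", "Gurney Model",
    "Mayo Clinic Model", "Herder Model", "VA Model", "PKU Model",
    "Brock Model", "TREAT Model", "Bayesian Inference Malignancy Calculator",
    "BIMC",
]
DISEASE_TERMS = [
    "Pulmonary Neoplasms", "Lung Neoplasm", "Pulmonary Neoplasm",
    "Lung Cancer", "Pulmonary Cancer", "Pulmonary Cancers",
    "Cancer of the Lung", "Cancer of Lung", "Pulmonary Nodule", "Lung Nodule",
]


def or_block(terms):
    """Quote each synonym and join them with OR inside one pair of parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*concepts):
    """Combine one OR-block per concept with AND (hypothetical helper)."""
    return " AND ".join(or_block(c) for c in concepts)


query = build_query(MODEL_TERMS, DISEASE_TERMS)
```

Keeping each concept as a plain list makes it easy to add a synonym once and regenerate the string for every database.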

Selection Criteria

Reviews, case studies, editorials, meeting abstracts, and search results unrelated to the search criteria were excluded. All articles that proposed or validated prediction models for cancer probability assessment were included, and further screening was conducted after reading the full texts. The exclusion criteria for full-text screening were as follows: (1) the models were not built to predict the cancer probability of lung nodules; (2) the models were not built with mathematical methods; (3) the models did not take clinical information into consideration; and (4) the data were insufficient for analysis.
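The full-text exclusion criteria work as an ordered checklist. A minimal sketch, assuming a hypothetical `Candidate` record whose boolean fields stand for criteria (1) to (4); each flag is a reviewer judgment that the code only records, it does not infer it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical full-text screening record; each flag is a reviewer judgment."""
    study_id: str
    predicts_nodule_malignancy: bool       # criterion (1)
    built_with_mathematical_method: bool   # criterion (2)
    uses_clinical_information: bool        # criterion (3)
    sufficient_data: bool                  # criterion (4)


def exclusion_reason(c: Candidate) -> Optional[int]:
    """Return the number of the first exclusion criterion met, or None if included."""
    checks = (
        c.predicts_nodule_malignancy,
        c.built_with_mathematical_method,
        c.uses_clinical_information,
        c.sufficient_data,
    )
    for number, passed in enumerate(checks, start=1):
        if not passed:
            return number
    return None
```

Reporting only the first criterion met mirrors how a PRISMA flow diagram assigns each excluded article a single reason.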

Data Extraction

The following basic information was extracted: first author’s name, year of publication, country, size of the study, characteristics of the included patients, prevalence of malignancy among the patients, average nodule size, models compared, and the results of the comparisons (including the AUC, sensitivity, and specificity of each compared model). For papers whose published materials did not report the sensitivity and specificity of the compared models, we requested the data from the authors by e-mail. We then calculated the numbers of true positives, false negatives, false positives, and true negatives from the data acquired. The AUC values of other PN evaluation methods, including biomarkers, imaging, and physician assessments, were also collected for further analysis. Two authors (KZ and ZW) independently determined study eligibility and extracted data, and any discrepancies between them were resolved by discussion with a third author (KC).
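When a study reports only sample size, malignancy prevalence, sensitivity, and specificity, the 2x2 counts can be back-calculated. A sketch, assuming all four statistics refer to the same set of nodules; `reconstruct_2x2` is a hypothetical helper, not the authors' procedure, and rounding means the counts may differ slightly from those in the original study.

```python
def reconstruct_2x2(n, prevalence, sensitivity, specificity):
    """Back-calculate (TP, FN, FP, TN) from summary statistics.

    n: number of nodules; prevalence, sensitivity, specificity: fractions in [0, 1].
    Python's round() rounds exact halves to the nearest even integer.
    """
    for name, value in (("prevalence", prevalence), ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    malignant = round(n * prevalence)   # nodules with cancer
    benign = n - malignant              # nodules without cancer
    tp = round(malignant * sensitivity)
    fn = malignant - tp
    tn = round(benign * specificity)
    fp = benign - tn
    return tp, fn, fp, tn


# Hypothetical example: 200 nodules, 50% malignant, sensitivity 0.80, specificity 0.70.
counts = reconstruct_2x2(200, 0.50, 0.80, 0.70)
```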

Evaluation of AI-Based Models

Recently, AI-based models have been reported to perform fairly well in PN evaluation. Therefore, we also analyzed AI-based models in this study. The AUC values reported for AI-based models were collected, even when these articles were excluded from the main network meta-analysis. Although, as of March 30, 2020, no article had directly compared mathematical models with AI-based models, we summarized the AUC values of AI-based models and analyzed their trend.

Statistical Analysis

All models compared in the studies were included, and the variables used in the models were reviewed. The AUC values were compared using a network plot. The most often used externally validated models (Brock, Mayo, PKU, VA) were selected for a network meta-analysis; the summary receiver operating characteristic (SROC) curve was plotted with the method proposed by Reitsma et al., and the area under the SROC curve (AUSROC) was calculated. The sensitivity and specificity of each model were also pooled using an analysis of variance (ANOVA) model, and the diagnostic odds ratio (DOR) and superiority index were calculated. In this article, we considered a model most often used if it had been used in at least five independent cohorts. We noticed that the malignancy rates of PNs in the included articles fell into two distributions, >40% and <25% (Supplementary Fig. 1); therefore, we used a cut-off value of 40% to separate patients into high-prevalence (HP) and low-prevalence (LP) groups and performed a subgroup analysis. Accordingly, models developed using HP nodules (malignancy rate >40%) were defined as HP models, and models developed using LP nodules (malignancy rate <25%) were defined as LP models. When we encountered several studies from the same medical center, we included only the study using data from multiple hospitals to avoid duplicate patients. All analyses were performed using R software (R version 3.6.1 [2019-07-05], The R Foundation for Statistical Computing) with the packages “mada” and “meta4diag”.
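The HP/LP rule is a single threshold on each cohort's observed malignancy rate. A sketch, assuming the rule is applied per cohort; because the text reports no cohorts with rates between 25% and 40%, this sketch simply labels every cohort at or below 40% as LP. The name `prevalence_group` and the example cohorts are hypothetical.

```python
HP_CUTOFF = 0.40  # malignancy-rate cut-off stated in the text


def prevalence_group(malignant: int, total: int, cutoff: float = HP_CUTOFF) -> str:
    """Label a cohort 'HP' if its malignancy rate exceeds the cut-off, otherwise 'LP'."""
    if total <= 0 or not 0 <= malignant <= total:
        raise ValueError("need 0 <= malignant <= total and total > 0")
    return "HP" if malignant / total > cutoff else "LP"


# Hypothetical cohorts: (malignant nodules, total nodules)
cohorts = {"cohort_a": (67, 100), "cohort_b": (23, 100)}
labels = {name: prevalence_group(m, n) for name, (m, n) in cohorts.items()}
```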

Quality of Evidence

The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was developed by the QUADAS-2 group to evaluate the quality of diagnostic accuracy studies. It comprises four domains: patient selection, index test, reference standard, and flow and timing. Two reviewers (KZ and ZW) used this tool to evaluate the methodological quality of the eligible studies (Supplementary Table 1).

Results

Our search returned 1816 articles, of which 42 were eligible after assessment (Fig. 1A). Searching the reference lists did not reveal additional relevant articles. The status of the data collection is summarized in Supplementary Figure 2.
Figure 1

Process of study selection and summarization of all variables collected and used in eligible models. (A) PRISMA flow diagram of the study selection process. (B) All variables collected by eligible models. The variables are summarized in a pyramid chart and separated into five levels. Variables with a higher frequency occupy a higher level. The frequency is labeled after the variable names. (C) All variables used by the models. The variables are summarized in a pyramid chart and separated into four levels. Variables with a higher frequency occupy a higher level. The frequency is labeled after the variable names. BMI, body mass index; CEA, carcinoembryonic antigen; CT, computed tomography; CTR, consolidation/tumor ratio; FEV1, forced expiratory volume in the first second; FVC, forced vital capacity; miRNA, microRNA; NSE, neuron-specific enolase; PET, positron emission tomography; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; SCCA, squamous cell carcinoma antigen.

The characteristics of all 42 articles are summarized in Table 1. These articles proposed a total of 23 original models (Supplementary Table 2), 10 of which were externally validated. Most articles used logistic regression to generate a new model, and a few used Bayesian analysis. The variables collected to propose a new model are summarized in Figure 1B, and the variables used in the final models are summarized in Figure 1C. Among the 23 articles that proposed a new model, 22 collected data on patient age and 21 on nodule size, and most of them used these variables in the final model. Although sex was also frequently collected, it was seldom used in the final model. Moreover, although only a few models collected the level of uptake or the maximum standardized uptake value from positron emission tomography (PET)-CT, all of them used this result in the final model.
The CT characteristics of PNs, including lobulation, calcification, cavitation, border, and pleural retraction sign, were also widely used in the final models.
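Because most of these models are logistic regressions, each maps a linear combination of predictors (for example age, nodule diameter, and spiculation) to a malignancy probability through the logistic function p = 1 / (1 + exp(-x)). A minimal sketch; the intercept and coefficients below are hypothetical placeholders chosen for illustration and do not reproduce the Mayo, VA, PKU, Brock, or any other published model. Bayesian calculators such as BIMC work differently and are not covered.

```python
import math


def malignancy_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(coef * value)))).

    `features` and `coefficients` are dicts keyed by predictor name;
    every feature must have a coefficient (a KeyError is raised otherwise).
    """
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical, illustrative coefficients only (not any published model).
EXAMPLE_INTERCEPT = -6.0
EXAMPLE_COEFFICIENTS = {"age_years": 0.04, "diameter_mm": 0.13, "spiculation": 1.0}

nodule = {"age_years": 65, "diameter_mm": 15, "spiculation": 1}  # spiculation: 1 = present
p = malignancy_probability(nodule, EXAMPLE_COEFFICIENTS, EXAMPLE_INTERCEPT)
```

With positive coefficients, a larger nodule or an older patient raises the predicted probability; the probability threshold that triggers biopsy or surgery rather than CT surveillance is a clinical choice not specified here.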
Table 1

Characteristics of Eligible Studies

IDArticleCountryModelsStudy PopulationSubgroupSample SizePrevalence of Malignancy, %Average Nodule Size (mm)
1Gurney et al.18,aUnited StatesGurneybPathologically confirmed SPNsHP6667B 15M 29
2Swensen et al.19,aUnited StatesMayobPathologically confirmed SPNsLP62923B 11.6M 17.8
3Herder et al.20,aNetherlandHerder,b MayobSPN without benign calcifications, referred for PET scanHP10657
4Gould et al.21,aUnited StatesVAbPNs measured between 7 and 30 mm on CTHP3755417.03
5Schultz et al.22United StatesMayo,b VAbSPNs confirmed by pathology or 2-y follow-up, age between 39 and 87 yHP1514415
6Li et al.23,aPeople’s Republic of ChinaPKU,b Mayo,b VAbPathologically confirmed SPNs after surgeryHP3715319.8
7Tian et al.24,aPeople’s Republic of ChinaR Tian et al.SPNs with PET resultHP1057112.8
8McWilliams et al.25,aCanadaBrockbPNs from current or former smokers, ages between 50 and 75 yLP7008 (PanCan)14.3
LP5021 (BCCA)13.7
9Xiao et al.26People’s Republic of ChinaMayo,b VA,b PKUbPathologically confirmed SPNs after surgeryHP1077319.3
10Deppen et al.27,aUnited StatesTREAT, MayobNodules from VUMC cohort and VA cohortHP492 (VUMC)7228
HP226 (VA)9329
11Zhang et al.28People’s Republic of ChinaPKU,b Mayo,b VAbNodule count < 5, mGGO, and solid, no metastasisHP15481
12Al-Ameri et al.29United KingdomHerder,a Mayo,a VA,a BrockaPNs confirmed by pathology or 2-y follow-up, without pure GGOHP24441
13Vachani et al.30,aUnited StatesA Vachani et al.PNs confirmed by pathology or 2-y follow-up, age >40 yHP1415513
14Soardi et al.31,aItalyBIMC, GurneybSPNs with PET result, without calcificationHP3435814.9
15Yang et al.32,aPeople’s Republic of ChinaJ Yang et al., PKU,b Mayo,b VAbPathologically confirmed SPNs after surgeryHP2526717
16Zhang et al.33,aPeople’s Republic of ChinaGMUFH,a Mayo,b VA,b Brock,b PKUbPathologically confirmed SPNsHP12060
17Chen et al. 201634,aPeople’s Republic of ChinaJ Chen et al., PKU,b Mayo,b VAbPathologically confirmed SPNsHP2006817.41
HP89 (Validation)7918.91
18Perandini et al.35ItalyHerder,b BIMCSPNs with PET result, without calcificationHP1805417.8
19Perandini et al.36ItalyMayo,b Gurney,b PKU,b BIMCPathologically confirmed SPNsHP2855515.36
20Soardi et al.37ItalyBIMC, MayobSPNs from three medical centersHP2005415.89
21Chen et al.38People’s Republic of ChinaMayo,b PKUbPathologically confirmed PNs after surgeryHP4176
22Yang et al.39,aPeople’s Republic of ChinaLi Yang et al., VA,b MayobSPN referred to CT-guided biopsyHP107846718.43
HP344 (Validation)6918.16
23Tanner et al.40United StatesMayo,b VAbSPN with progression in 60 d, age > 40 yHP3374715.8
24W Yu (2017)41,aPeople’s Republic of ChinaW Yu et al.Pathologically confirmed GGOHP3624671.6
HP206 (Validation)701.5
25Lin et al.42People’s Republic of ChinaMayobPNs from current or former smokers, ages between 55 and 74 yHP135 (JPHTCM)5115.14
HP126 (BVAMC)5014.365
26She et al.43,aPeople’s Republic of ChinaY She et al., VA,b Mayo,b PKU,b BrockbPathologically confirmed solid SPNs after surgeryHP89946717.3
HP899 (Validation)6617.3
27Yang et al.44KoreaMayo,b VA,b Brock,b HerderbNodule count < 5, mGGO, and solid, no metastasisHP2427720
28Kim et al.45KoreaBrockSingle subsolid nodules confirmed as AAH or AIS or MIA or IPAHP101 (GGO)58B 11.1M 14.2
HP309 (mGGO)91B 13.6M 17.6
29Wang et al.46,cPeople’s Republic of ChinaZU,b Mayo,b VAbSPNs with PET resultHP1776718.89
30Nair et al.47United StatesBrock,b Mayo,b VAbNodules from NLSTLP2196 (Set 1)912.1
LP6568 (Set 2)37.6
31Ying et al.48,cPeople’s Republic of ChinaYing et al., MayobPathologically confirmed microsized SPN (<10 mm)HP102476
HP10 (Validation)60
32Winter et al.49United StatesA Winter et al., BrockbNodules from NLSTLP787936.89
33Xiao et al.50,aPeople’s Republic of ChinaCJFH, Mayo,b VA,b Brock,b PKU,b GMUFHbPathologically confirmed nonsolid SPNs after surgeryHP3628717.6
34Kim et al.51,aKoreaH Kim et al., BrockbPathologically confirmed subsolid nodules after surgeryHP32147215.7
HP106 (Validation)7215.8
35Uthoff et al.52United StatesMayo,b VA,b Brock,b PKUbSPNs, age between 40 and 87 yLP31722B 9.2M 16.3
36Xi et al.53,aPeople’s Republic of ChinaK Xi et al.Pathologically confirmed SPNsHP4070B 19M 25.1
HP5275B 14.0M 18.3
37Hammer et al.54United StatesBrockbGGO and PSN from NLSTLP4346
38Marcus et al.55,aUnited KingdomUKLSbNodules from UKLS trialLP10135
39Cui et al.56,aPeople’s Republic of ChinaMayo,b Brock,b VA bSPNs confirmed by pathology or 2-y follow-upHP2777317
40Guo et al.57People’s Republic of ChinaGLCI, Mayo,b PKU,b Herder,b ZU bSPNs with PET resultHP31246918.6
HP159 (Validation)80
41González Maldonado et al.11GermanyBrock,b UKLS,b Mayo,b VA,b PKUbNodules from LUSI trialLP39032B 4.0M 9.4
42Li et al.58People’s Republic of ChinaBrock,b Mayo,b VA,b PKUbPathologically confirmed PNs after surgeryHP49686

B stands for benign and M stands for malignant.

AAH, atypical adenomatous hyperplasia; AIS, adenocarcinoma in situ; BCCA, British Columbia Cancer Agency; BIMC, Bayesian inference malignancy calculator; BVAMC, Baltimore VA Medical Center; CJFH, China-Japan Friendship Hospital; CT, computed tomography; GGO, ground-glass opacity; GLCI, Guangdong Lung Cancer Institute; GMUFH, The First Affiliated Hospital of Guangzhou Medical University; HP, high prevalence; ID, identification; IPA, invasive pulmonary adenocarcinoma; JPHTCM, Jiangsu Province Hospital of Traditional Chinese Medicine; LP, low prevalence; LUSI, German Lung Cancer Screening Intervention; mGGO, mixed ground-glass opacity; MIA, minimally invasive adenocarcinoma; NLST, National Lung Screening Trial; PanCan, Pan-Canadian Early Detection of Lung Cancer; PET, positron emission tomography; PKU, Peking University; PN, pulmonary nodule; PSN, part-solid nodule; SPN, solitary pulmonary nodule; TREAT, thoracic research evaluation and treatment; UKLS, UK Lung Cancer Screening; VA, Department of Veterans Affairs; VUMC, Vanderbilt University Medical Center; ZU, Zhejiang University.

a Models first established in the article.

b Externally validated model.

For the descriptive analysis of AUC values, we included models that had been validated by at least two external sources; seven models qualified. A detailed comparison is shown in Figure 2A (AUC; Supplementary Table 3). Among the models compared more than 10 times, the PKU model had the higher AUC in 26 of its 34 comparisons. Although only a few articles compared the BIMC model with other models, the BIMC model yielded an excellent AUC value and outperformed its comparator in every comparison. We then selected the most often used models (used in at least five independent cohorts) for a network meta-analysis and plotted their SROC curves (Fig. 2B); the PKU model yielded the best AUC. The AUSROC values for the PKU, Brock, VA, and Mayo models were 0.830, 0.785, 0.750, and 0.743, respectively. The diagnostic odds ratio and superiority index were also calculated and revealed similar tendencies (Fig. 2B). Figure 2C compares the pooled sensitivity and specificity of all externally validated models for which diagnostic values were available; the most often used models displayed better and more balanced sensitivity and specificity.
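The diagnostic odds ratio condenses sensitivity and specificity into one number: DOR = [sens / (1 - sens)] / [(1 - spec) / spec], which equals (TP x TN) / (FP x FN) for a 2x2 table. A minimal sketch; the 0.5 continuity correction for zero cells is a common convention assumed here, not necessarily what the mada package applies, and the superiority index is not covered.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = [sens / (1 - sens)] / [(1 - spec) / spec]."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)


def dor_from_counts(tp, fn, fp, tn, correction=0.5):
    """DOR = (TP * TN) / (FP * FN); adds `correction` to every cell if any cell is zero."""
    if 0 in (tp, fn, fp, tn):
        tp, fn, fp, tn = (v + correction for v in (tp, fn, fp, tn))
    return (tp * tn) / (fp * fn)
```

Both routes give the same value for the same table (for example sensitivity 0.80 and specificity 0.70 correspond to TP 80, FN 20, FP 30, TN 70), so either can be used depending on what a study reports.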
Figure 2

Comparisons of the AUC values, SROC curves, and diagnostic values among the models. (A) AUC comparison of seven models validated by at least two external sources. Each circular node represents a validated model. The area of the node is proportional to the total number of comparisons in eligible studies. The ratio of the number of comparisons in which the model performed better to the total number of comparisons is listed inside the node. Each line represents a type of head-to-head comparison, and the color of the line matches that of the winning model. The width of the lines is proportional to the number of head-to-head comparisons. (B) SROC curves of models with sufficient external validation (at least five independent cohorts). The solid line depicts the SROC curve plotted by the method proposed by Reitsma et al., and individual observations are marked with round points. The summary point is marked with a triangle on the SROC curve, and its 95% confidence region is plotted with a dotted line. Different colors are assigned to each model. AUC values are listed in parentheses after the model names in the figure legend. Results of the network meta-analysis using the ANOVA model are listed below. (C) Comparison of the pooled sensitivity and specificity of the validated models. The value in each cell is defined as the pooled sensitivity or specificity of the model in the same row divided by the pooled sensitivity or specificity of the model in the same column. Cells with the model name are marked in orange, and cells containing the sensitivity and specificity values are marked in yellow and blue, respectively. ANOVA, analysis of variance; AUC, area under the curve; BIMC, Bayesian Inference Malignancy Calculator; DOR, diagnostic odds ratio; SROC, summary receiver operating characteristic.

Because the prevalence of lung cancer differs considerably between HP and LP patients, we performed further analyses after separating patients into HP and LP groups.
The PKU model remained the best of the four most often used models for HP patients (AUSROC 0.826; Fig. 3A) and had balanced diagnostic values (Fig. 3B). For LP patients, the Mayo model yielded the best AUSROC (0.928; Fig. 3C), with a smaller confidence region around its summary point. Therefore, according to the available data, the Mayo model is suitable for cancer risk prediction in LP patients, with acceptable diagnostic value (Fig. 3D).
Figure 3

Subgroup analysis based on patient characteristics. (A) SROC curves of models with sufficient external validation (at least five independent cohorts) used in HP patients. (B) Comparison of the pooled sensitivity and specificity of the validated models in HP patients. (C) SROC curves of models with sufficient external validation used in LP patients. (D) Comparison of the pooled sensitivity and specificity of the validated models in LP patients. In the SROC plots, the solid line depicts the SROC curve plotted, and individual observations are marked with a round point. The summary point is marked with a triangle point on the SROC curve, and its 95% confidence region is plotted with a dotted line. Different colors are assigned to each model. AUC values are listed in parentheses after the model names in the figure legend. In the comparison of the diagnostic value, the value in each cell is defined as the pooled sensitivity/specificity of the model in the same row divided by the pooled sensitivity/specificity of the model in the same column. Cells with the model name are marked in orange, and cells containing the sensitivity and specificity values are marked in yellow and blue, respectively. AUC, area under the curve; HP, high prevalence; LP, low prevalence; SROC, summary receiver operating characteristic.

A subgroup analysis was conducted to investigate how the prevalence of malignancy in different cohorts affects model performance. As shown in Figure 4, HP models performed better in predicting the cancer probability of PNs in HP patients, with a pooled sensitivity and specificity of 0.83 (95% confidence interval [CI]: 0.78–0.88; Fig. 4A) and 0.71 (95% CI: 0.63–0.79; Fig. 4B), than LP models, which had a pooled sensitivity and specificity of 0.70 (95% CI: 0.60–0.79; Fig. 4G) and 0.70 (95% CI: 0.62–0.77; Fig. 4H). For LP patients, the pooled sensitivity and specificity decreased from 0.68 (95% CI: 0.57–0.78; Fig. 4E) and 0.93 (95% CI: 0.87–0.97; Fig. 4F) to 0.57 (95% CI: 0.21–0.88; Fig. 4C) and 0.82 (95% CI: 0.65–0.92; Fig. 4D) when LP models were replaced with HP models. Overall, the pooled sensitivity and specificity were 0.82 (95% CI: 0.77–0.87; Supplementary Fig. 3A) and 0.72 (95% CI: 0.65–0.79; Supplementary Fig. 3B) for HP models and 0.70 (95% CI: 0.61–0.77; Supplementary Fig. 3C) and 0.76 (95% CI: 0.68–0.83; Supplementary Fig. 3D) for LP models, respectively.
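The forest plots pool sensitivity and specificity across studies. The authors used the R packages mada and meta4diag; the sketch below swaps in a simpler, univariate technique, DerSimonian-Laird random-effects pooling of logit-transformed proportions, to illustrate the general idea. It treats sensitivity and specificity separately and will not reproduce the reported estimates. The function name and the blanket 0.5 continuity correction are assumptions.

```python
import math
from statistics import NormalDist


def _expit(t):
    """Inverse logit: map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-t))


def pool_logit_proportions(events, totals, level=0.95):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    events[i] / totals[i] is, e.g., TP / (TP + FN) (sensitivity) or
    TN / (TN + FP) (specificity) for study i. A 0.5 continuity correction
    is added to every study (a simplification). Returns (pooled, lower, upper).
    """
    if len(events) != len(totals) or not events:
        raise ValueError("events and totals must be non-empty and of equal length")
    y, v = [], []
    for e, n in zip(events, totals):
        e_c, f_c = e + 0.5, n - e + 0.5       # corrected events and non-events
        y.append(math.log(e_c / f_c))         # logit of the corrected proportion
        v.append(1.0 / e_c + 1.0 / f_c)       # approximate within-study variance
    w = [1.0 / vi for vi in v]                # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]    # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return _expit(mu), _expit(mu - z * se), _expit(mu + z * se)
```

A bivariate model such as Reitsma's additionally captures the correlation between sensitivity and specificity across studies, which is why it is preferred for SROC curves.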
Figure 4

Subgroup analysis of the effect of study population on the models. (A) Forest plot of the pooled sensitivity when the HP model is used on HP patients. (B) Forest plot of the pooled specificity when the HP model is used on HP patients. (C) Forest plot of the pooled sensitivity when the HP model is used on LP patients. (D) Forest plot of the pooled specificity when the HP model is used on LP patients. (E) Forest plot of the pooled sensitivity when the LP model is used on LP patients. (F) Forest plot of the pooled specificity when the LP model is used on LP patients. (G) Forest plot of the pooled sensitivity when the LP model is used on HP patients. (H) Forest plot of the pooled specificity when the LP model is used on HP patients. FN, false negative; FP, false positive; HP, high prevalence; LP, low prevalence; TN, true negative; TP, true positive.

To explore the influence of PET-CT on the diagnostic performance of the models, we performed a subgroup analysis according to whether PET-CT results were used. Models using PET-CT results as a variable had a higher pooled sensitivity of 0.88 (95% CI: 0.77–0.95; Supplementary Fig. 4A) than models that did not use PET-CT (0.73, 95% CI: 0.68–0.77; Supplementary Fig. 4C). Nevertheless, the pooled specificity of models with PET-CT results appeared lower: 0.71 (95% CI: 0.49–0.89; Supplementary Fig. 4B) versus 0.76 (95% CI: 0.71–0.80; Supplementary Fig. 4D) for models without PET-CT. Some of the included articles also compared the diagnostic value of the models with that of various biomarkers, imaging methods, and physician assessment. The AUC values of each method were collected and analyzed. The average AUC value of the models was higher than that of the other methods, although the difference was not significant (Supplementary Fig. 5A). For the evaluation of AI-based models, 11 articles were included; five of these models were developed using HP patients, and the remaining six using LP patients from screening projects.
The AUC values of the AI-based models were compared with those of the mathematical models, biomarkers, imaging, and physicians (Supplementary Fig. 5B). In recent years, the average AUC of the AI-based models rose from 0.831 (±0.071) in 2017 to 0.919 in 2020, whereas the AUC of the mathematical models appeared more stable. Further regression of the AUC values over time suggested that the AUC values of the AI-based models increased over the years (p = 0.074; Supplementary Fig. 5C), whereas those of the mathematical models did not. Although this trend did not reach statistical significance, it is also supported by studies conducted on the same data set. Nevertheless, this may indicate that well-trained AI models could eventually outperform current methods in PN evaluation. External validation of the AI-based models is still needed.
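The article does not specify the regression used for the AUC-versus-year trend. Assuming a simple linear regression of AUC on year, the sketch below computes the slope (AUC change per year), the intercept, and the t statistic of the slope; the function name `auc_trend` is hypothetical.

```python
import math


def auc_trend(years: list[float], aucs: list[float]) -> tuple[float, float, float]:
    """Least-squares fit AUC = intercept + slope * year.

    Returns (slope per year, intercept, t statistic for the slope).
    """
    n = len(years)
    if n != len(aucs) or n < 3:
        raise ValueError("need at least three paired observations")
    mx = sum(years) / n
    my = sum(aucs) / n
    sxx = sum((x - mx) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, aucs))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares gives the standard error of the slope (n - 2 df).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, aucs))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    if se_slope == 0:
        t_stat = math.nan if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se_slope
    return slope, intercept, t_stat
```

Converting the t statistic into a p-value requires the t distribution with n − 2 degrees of freedom; `scipy.stats.linregress(x, y)` returns the slope together with its two-sided `pvalue` if SciPy is available.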

Discussion

With the increasing use of CT in lung cancer screening, accurately estimating cancer probability has become increasingly important in the management of PNs, both for inpatients in surgical departments and for outpatients who participate in CT screening. In view of this, we summarized all clinical mathematical models for the evaluation of PNs and, for the first time, conducted a network meta-analysis of them. To ensure objectivity and fairness, we contacted the authors of all published articles that lacked the desired data (Supplementary Fig. 2). As the first probability model to use logistic regression, the Mayo model has become the most externally validated of all models (Table 2). It was built from a retrospective data set of 419 patients, with more than 20 variables taken into consideration. Owing to the large number of variables collected, the Mayo model has remained reasonably accurate over the years. Among the most often used models, the PKU model yields the best AUC. It was the first model built on a Chinese population and is the only Eastern-population model that has been validated in a Western population. Unlike the Mayo cohort, all patients enrolled in developing the PKU model had a defined pathologic diagnosis and comprehensive radiographic characteristics.
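The AUC values used to rank the models summarize discrimination: the empirical AUC equals the probability that a randomly chosen malignant nodule receives a higher predicted risk than a randomly chosen benign one, with ties counted as one half (the normalized Mann–Whitney U statistic). The sketch below computes it from predicted probabilities and pathology labels; the inputs are hypothetical, and the included studies may have estimated AUC differently.

```python
def auc_mann_whitney(scores: list[float], labels: list[int]) -> float:
    """Empirical AUC from predicted risks (scores) and labels (1 = malignant, 0 = benign)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign nodule")
    wins = 0.0
    # Compare every malignant/benign pair; O(n_pos * n_neg) is fine for a sketch.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of malignant from benign nodules.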
Table 2

Summary of Highlights of Different Models

First established model: Gurney et al.¹ (first model using Bayesian analysis); Mayo (1997) (first model using logistic regression)
Model with the largest sample size: Brock (7008 nodules)
Most validated model: Mayo (compared in 28 articles)
Best-performing model: BIMC (among all validated models); PKUPH (among all models validated by ≥5 cohorts)
Model with the most variables collected: Mayo (23 variables)
Models externally validated when established: Brock, TREAT
Models compared with physicians: Gurney, Mayo, VA, Brock
Models with a nomogram or a web calculator: Y She et al., Herder, BIMC, GLCI
Samples with the highest and lowest cancer rates: highest, TREAT; lowest, Brock
Models with the highest and lowest cut-off values (as reported in the original article): highest, CJFH (0.794); lowest, W Yu et al. (0.3649)
Model that has been compared with AI models: Brock (compared with a CNN-based AI model by David Baldwin et al.; the AI model had better results in HP patients)

AI, artificial intelligence; BIMC, Bayesian inference malignancy calculator; CJFH, China-Japan Friendship Hospital; CNN, convolutional neural networks; GLCI, Guangdong Lung Cancer Institute; HP, high prevalence; PKUPH, Peking University People’s Hospital; TREAT, thoracic research evaluation and treatment; VA, Department of Veterans Affairs.

Owing to variation in the research cohorts, models proposed in past studies can be separated into two categories. The first category comprises models developed in populations undergoing lung cancer screening; in these models, benign nodules account for most of the PNs used for model development. The other category comprises models developed in patients treated in clinics or surgical departments; these patients have already undergone preliminary screening, during which only people with HP nodules are admitted for further treatment. Therefore, the malignancy rate differs considerably between the two categories. As shown in Supplementary Figure 1, the malignancy rate in the first category is below 25%, whereas in the other category it is usually above 40%. As a result, the effectiveness of models established in different populations differs, and it is not fair to compare different models in a single population. The original cut-off value may no longer be suitable if a model is not used in its target population (Supplementary Table 4). Nevertheless, previous studies often failed to compare these two types of models in different populations. For the first time, we distinguished between the HP group and the LP group on the basis of the probability of malignancy. Furthermore, subgroup analysis revealed that, for both HP and LP models, sensitivity and specificity dropped whenever the models were not used in their target populations.
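One reason a cut-off tuned in one population may not transfer to another is that, even if sensitivity and specificity stayed fixed, the predictive values of a positive or negative result shift with prevalence (Bayes' theorem). The sketch below shows this, together with a helper that labels a cohort HP or LP at the 40% malignancy-rate cut-off; treating a rate of exactly 40% as HP is an assumption. It does not model the drop in sensitivity and specificity themselves that the subgroup analysis observed.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """PPV and NPV of a test with fixed sensitivity and specificity (Bayes' theorem)."""
    tp = sensitivity * prevalence              # malignant and test-positive
    fp = (1 - specificity) * (1 - prevalence)  # benign but test-positive
    fn = (1 - sensitivity) * prevalence        # malignant but test-negative
    tn = specificity * (1 - prevalence)        # benign and test-negative
    ppv = tp / (tp + fp) if tp + fp > 0 else float("nan")
    npv = tn / (tn + fn) if tn + fn > 0 else float("nan")
    return ppv, npv


def prevalence_group(n_malignant: int, n_total: int, threshold: float = 0.40) -> str:
    """Label a cohort HP or LP by its malignancy rate.

    ASSUMPTION: a rate exactly equal to the threshold is counted as HP.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return "HP" if n_malignant / n_total >= threshold else "LP"
```

With sensitivity and specificity both 0.80, the PPV is 0.80 at a prevalence of 50% but only about 0.17 at 5%, so the same model output carries very different meaning in HP and LP cohorts.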
Given these subgroup results, we recommend that the most suitable model be used for each cohort to achieve the best result. Nevertheless, our analysis has several limitations. First, although AUC is the most important indicator of a model's discrimination, AUC alone cannot comprehensively describe a model. For example, calibration is also important for the clinical use of a model, but not enough data were provided for a subgroup analysis in our research. Moreover, most of the sensitivity and specificity values were obtained at cut-offs chosen with the Youden index. Although the Youden index maximizes the sum of sensitivity and specificity, in some cases one would prefer additional sensitivity at the cost of some specificity, or vice versa, which makes the Youden index unsuitable for some clinical scenarios. Another limitation lies in the cohorts included in the analysis: their prevalence was much higher than that of some recent screening cohorts, which may lead to bias when the results are applied to those cohorts. Moreover, owing to the lack of data, some results have wide 95% CIs, which makes the conclusions less definitive, and more cohorts (especially outpatient cohorts) are needed for further validation to achieve a more accurate comparison of models. It is noteworthy that PET-CT results were included in the final model whenever they were collected, suggesting that a positive PET-CT result is a strong indicator of malignancy. Nevertheless, problems remain for PET-CT: (1) although PET-CT improves sensitivity, it may also yield false-positive results in inflammation, tuberculosis, and other conditions; (2) PET-CT is only recommended for solid nodules and not for ground-glass opacities, restricting the clinical application of such models to solid nodules; and (3) because of the high cost of PET-CT in some countries, the result is not available in all situations, which also limits clinical application.
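The Youden index mentioned among the limitations is J = sensitivity + specificity − 1, and the cut-off is the threshold that maximizes J. A minimal sketch over the observed predicted risks follows; the rule "positive if risk ≥ threshold" and the tie-breaking toward the lowest threshold (favouring sensitivity) are assumptions, and a clinician preferring extra sensitivity would instead fix a target sensitivity and pick the corresponding threshold.

```python
def youden_cutoff(scores: list[float],
                  labels: list[int]) -> tuple[float, float, float, float]:
    """Return (threshold, J, sensitivity, specificity) maximizing Youden's J.

    A nodule is called malignant when its predicted risk >= threshold.
    Ties in J keep the lowest threshold, i.e. the more sensitive cut-off.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both malignant and benign nodules")
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best
```

Because J weights sensitivity and specificity equally, the selected cut-off ignores both the prevalence of the target cohort and the different clinical costs of missed cancers and unnecessary procedures.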
These limitations of PET-CT also explain why most researchers view it as a preclinical evaluation rather than a standard procedure for lung cancer diagnosis. A few studies evaluated both models and physician assessments. Our analysis of these articles found that the models performed better than the clinicians, although the differences were not significant (Supplementary Fig. 5A). It is important to note that in these studies the physicians were experienced and familiar with the models, which might have introduced bias. The greatest strength of the models is that they are stable and easy to use widely. In fact, many doctors in small or rural hospitals do not have sufficient experience in differentiating benign from malignant PNs. We believe that the models are more accurate than these doctors and thus are of value in clinical application. Another advantage of prediction models is objectivity. A physician's judgment of the same nodule may vary with circumstances (environment, emotional state, physical state, etc.), but a model's prediction stays the same, making its assessment more objective and repeatable. We believe that a better model could further aid both experienced and younger clinicians in clinical work and help reach more objective conclusions for patients. Although guidelines recommend using models for PN evaluation, the clinical application of these prediction models remains limited. An important reason is that the mathematical formula is impractical in clinical practice, and it is time consuming to calculate the cancer probability of each nodule encountered. In fact, models can be converted into easy-to-read forms, such as nomograms or web calculators. With a web calculator in particular, the malignancy risk of a PN can be calculated in less than 10 seconds using only the patient's clinical information as input.
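A web calculator and a nomogram are two presentations of the same logistic equation, p = 1 / (1 + exp(−(b0 + Σ b_i·x_i))). The sketch below shows both under stated assumptions: the coefficients in `EXAMPLE_COEFS` and the axis ranges are hypothetical placeholders, not those of Mayo, Brock, VA, PKU, or any other published model, and the nomogram uses one common scaling convention (the predictor with the widest effect spans 0–100 points), which may differ from the nomograms in Supplementary Figure 6.

```python
import math
from dataclasses import dataclass


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# HYPOTHETICAL coefficients for illustration only; not any published model.
EXAMPLE_COEFS = {"intercept": -6.0, "age_years": 0.04, "diameter_mm": 0.12}


def malignancy_probability(patient: dict[str, float], coefs: dict[str, float]) -> float:
    """Web-calculator core: logistic(b0 + sum of b_i * x_i)."""
    z = coefs["intercept"] + sum(
        beta * patient[name] for name, beta in coefs.items() if name != "intercept"
    )
    return logistic(z)


@dataclass
class Nomogram:
    coefs: dict[str, float]
    refs: dict[str, float]  # predictor value that scores 0 points
    scale: float            # points per unit of linear predictor
    offset: float           # linear predictor when every predictor scores 0 points


def build_nomogram(coefs: dict[str, float],
                   ranges: dict[str, tuple[float, float]]) -> Nomogram:
    """Scale each predictor so the one with the widest effect spans 0-100 points."""
    names = [n for n in coefs if n != "intercept"]
    spans = {n: abs(coefs[n]) * (ranges[n][1] - ranges[n][0]) for n in names}
    widest = max(spans.values())
    if widest <= 0:
        raise ValueError("at least one predictor must have a non-zero effect")
    # The zero-point end of each axis is the end that contributes least to risk.
    refs = {n: ranges[n][0] if coefs[n] >= 0 else ranges[n][1] for n in names}
    offset = coefs["intercept"] + sum(coefs[n] * refs[n] for n in names)
    return Nomogram(coefs, refs, 100.0 / widest, offset)


def points(nomo: Nomogram, name: str, value: float) -> float:
    """Points read off one predictor's axis."""
    return nomo.scale * nomo.coefs[name] * (value - nomo.refs[name])


def total_points_to_probability(nomo: Nomogram, total: float) -> float:
    """Bottom axis of the nomogram: total points back to a probability."""
    return logistic(nomo.offset + total / nomo.scale)
```

Summing the points for a hypothetical patient such as `{"age_years": 60, "diameter_mm": 20}` and passing the total to `total_points_to_probability` returns the same probability as `malignancy_probability`; the nomogram only replaces the arithmetic with reading values off scaled axes.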
Nevertheless, only a few models had a web version at the time of publication. Supplementary Figure 6 illustrates the typical calculation process with the mathematical formula, a nomogram, and a web calculator, showing that the nomogram and web calculator are clearer and easier to use clinically. In addition, decision-making in clinical practice cannot depend exclusively on the risk probability given by a model. On the one hand, clinical treatment is also affected by many other factors, such as patient preference. On the other hand, a model is built from a population-level analysis; owing to the heterogeneity of patients' nodules, it might not be accurate for individuals. The role of the model is only to provide a relatively reliable reference for clinical judgment; it cannot completely replace clinicians in making the final decision. In recent years, AI has started to play a role in cancer risk prediction for PNs, and AI-based prediction models have been compared with traditional mathematical models. Therefore, we summarized all reported computer-aided diagnosis (CADx) systems with their AUC values (Supplementary Table 5). As we were preparing this meta-analysis, Baldwin et al. reported a comparison of a CADx system with the Brock model, finding that the CADx system achieved a better AUC. Nevertheless, that analysis was conducted in an HP cohort, so the accuracy of the Brock model, which was developed in LP cohorts, might have been underestimated. Our results suggest that future studies should compare AI models with mathematical models developed in populations with the same background, which would be a fairer way to evaluate whether such AI models truly outperform mathematical models. Despite the outstanding AUC values reported, problems remain for CADx systems. First, because researchers seldom make their models available for external validation, prospective studies validating their efficacy in different populations are lacking.
In addition, because open data sets lack clinical information, all current CADx systems can only predict malignancy risk from radiographic characteristics and cannot take clinical features into consideration as clinicians or models do. In some cases, such as the evaluation of PNs in patients with a history of cancer, the judgment of a CADx system may be notably biased. Therefore, further exploration and improvement of CADx systems are still needed. In some ways, mathematical models and machine learning models are both statistical models. Deep learning, the typical method used in nodule detection, uses simple functions such as the sigmoid function or the rectified linear unit (ReLU) function as the activation function inside individual neurons. Combining a large number of neurons yields a multilayer network, but its purpose is still to separate nodules into benign and malignant nodules. In effect, such a network maps individual observations into a higher-dimensional space and finds a function to fit them, which is essentially the idea behind mathematical models. In fact, discovering new regression functions is how humans fit the observations, whereas training neural networks is how machines fit them. In some ways, AI is an extension of mathematical models. Our results suggest that, with continued training, the diagnostic efficiency of AI may improve further and eventually exceed the prediction accuracy of mathematical models. Nevertheless, so far there is not enough evidence that the accuracy of AI will improve in the future, and more comparisons between AI-based models and mathematical models in the same population are still needed. Therefore, for now, the widely validated mathematical models remain the most convenient and relatively accurate way to assist PN management.
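The claim that a neural network generalizes a mathematical model can be made concrete: a single neuron with a sigmoid activation computes exactly the logistic-regression form used by models such as Mayo, and a hidden ReLU layer simply transforms the inputs before that final neuron. The toy sketch below uses hypothetical, untrained weights and fully connected layers; real CADx systems are convolutional networks operating on CT images, which this does not model.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable sigmoid activation."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(z: float) -> float:
    """Rectified linear unit activation."""
    return max(0.0, z)


def dense(inputs: list[float], weights: list[list[float]],
          biases: list[float], activation) -> list[float]:
    """Fully connected layer: neuron j outputs activation(sum_i W[j][i] * x[i] + b[j])."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def two_layer_net(features: list[float], hidden_w: list[list[float]],
                  hidden_b: list[float], out_w: list[float], out_b: float) -> float:
    """ReLU hidden layer followed by one sigmoid output neuron (a risk in [0, 1])."""
    hidden = dense(features, hidden_w, hidden_b, relu)
    return dense(hidden, [out_w], [out_b], sigmoid)[0]
```

With no hidden layer, `dense(features, [coefficients], [intercept], sigmoid)[0]` is a logistic regression model; training a network amounts to fitting many such weights to data instead of a handful of regression coefficients.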
In conclusion, we systematically reviewed and analyzed a variety of prediction models of PNs. The Mayo model is the most widely used and validated model, whereas the PKU model yields the best AUC among the most often used models. Because of the discrepant development cohorts among the models, it is vital that the most suitable model is used on the appropriate cohorts, and mixing models might lead to decreased accuracy. Nomograms or web calculators are intuitive and preferred by clinicians, but their clinical application needs to be further investigated.

CRediT Authorship Contribution Statement

Kai Zhang: Methodology, Formal analysis, Data curation, Writing - original draft, Writing - review & editing. Zihan Wei: Methodology, Formal analysis, Data curation, Writing - original draft. Yuntao Nie: Formal analysis, Data curation, Validation. Haifeng Shen: Formal analysis, Data curation. Xin Wang: Data curation. Jun Wang: Conceptualization, Supervision. Fan Yang: Methodology, Supervision. Kezhong Chen: Conceptualization, Methodology, Supervision, Writing - review & editing.
References

1. Clinical practice. The solitary pulmonary nodule.

Authors: David Ost; Alan M Fein; Steven H Feinsilver
Journal: N Engl J Med; Date: 2003-06-19

2. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews.

Authors: Johannes B Reitsma; Afina S Glas; Anne W S Rutjes; Rob J P M Scholten; Patrick M Bossuyt; Aeilko H Zwinderman
Journal: J Clin Epidemiol; Date: 2005-10

3. ANOVA model for network meta-analysis of diagnostic test accuracy data.

Authors: Victoria N Nyaga; Marc Aerts; Marc Arbyn
Journal: Stat Methods Med Res; Date: 2016-09-20

4. Accuracy of Models to Identify Lung Nodule Cancer Risk in the National Lung Screening Trial.

Authors: Viswam S Nair; Vandana Sundaram; Manisha Desai; Michael K Gould
Journal: Am J Respir Crit Care Med; Date: 2018-05-01

5. Reduced lung-cancer mortality with low-dose computed tomographic screening.

Authors: Denise R Aberle; Amanda M Adams; Christine D Berg; William C Black; Jonathan D Clapp; Richard M Fagerstrom; Ilana F Gareen; Constantine Gatsonis; Pamela M Marcus; JoRean D Sicks
Journal: N Engl J Med; Date: 2011-06-29

6. Evaluation of Prediction Models for Identifying Malignancy in Pulmonary Nodules Detected via Low-Dose Computed Tomography.

Authors: Sandra González Maldonado; Stefan Delorme; Anika Hüsing; Erna Motsch; Hans-Ulrich Kauczor; Claus-Peter Heussel; Rudolf Kaaks
Journal: JAMA Netw Open; Date: 2020-02-05

7. Predicting lung cancer prior to surgical resection in patients with lung nodules.

Authors: Stephen A Deppen; Jeffrey D Blume; Melinda C Aldrich; Sarah A Fletcher; Pierre P Massion; Ronald C Walker; Heidi C Chen; Theodore Speroff; Catherine A Degesys; Rhonda Pinkerman; Eric S Lambright; Jonathan C Nesbitt; Joe B Putnam; Eric L Grogan
Journal: J Thorac Oncol; Date: 2014-10

8. Novel and convenient method to evaluate the character of solitary pulmonary nodule-comparison of three mathematical prediction models and further stratification of risk factors.

Authors: Fei Xiao; Deruo Liu; Yongqing Guo; Bin Shi; Zhiyi Song; Yanchu Tian; Chaoyang Liang
Journal: PLoS One; Date: 2013-10-29

9. Assessment of the cancer risk factors of solitary pulmonary nodules.

Authors: Li Yang; Qiao Zhang; Li Bai; Ting-Yuan Li; Chuang He; Qian-Li Ma; Liang-Shan Li; Xue-Quan Huang; Gui-Sheng Qian
Journal: Oncotarget; Date: 2017-04-25

10. External validation of a convolutional neural network artificial intelligence tool to predict malignancy in pulmonary nodules.

Authors: David R Baldwin; Jennifer Gustafson; Lyndsey Pickup; Carlos Arteta; Petr Novotny; Jerome Declerck; Timor Kadir; Catarina Figueiras; Albert Sterba; Alan Exell; Vaclav Potesil; Paul Holland; Hazel Spence; Alison Clubley; Emma O'Dowd; Matthew Clark; Victoria Ashford-Turner; Matthew Ej Callister; Fergus V Gleeson
Journal: Thorax; Date: 2020-03-05
