Lung cancer risk prediction models based on pulmonary nodules: A systematic review.

Zheng Wu1, Fei Wang1, Wei Cao1, Chao Qin1, Xuesi Dong1, Zhuoyu Yang1, Yadi Zheng1, Zilin Luo1, Liang Zhao1, Yiwen Yu1, Yongjie Xu1, Jiang Li1,2, Wei Tang3, Sipeng Shen4,5, Ning Wu3,6, Fengwei Tan7, Ni Li1,2, Jie He1,7.   

Abstract

BACKGROUND: Screening with low-dose computed tomography (LDCT) is an efficient way to detect lung cancer at an earlier stage but has a high false-positive rate. Several pulmonary nodule risk prediction models have been developed to address this problem. This systematic review aimed to compare the quality and accuracy of these models.
METHODS: The keywords "lung cancer," "lung neoplasms," "lung tumor," "lung carcinoma," "predict," "assessment," "risk," and "nodule" were used to identify relevant articles published before February 2021. All studies with multivariate risk models developed and validated on human LDCT data were included. Informal publications and studies with incomplete procedures were excluded. Information was extracted from each publication and assessed.
RESULTS: A total of 41 articles and 43 models were included. External validation was performed for 23.2% (10/43) of the models. Deep learning algorithms were applied in 62.8% (27/43) of the models; 60.0% (15/25) of the deep learning-based studies compared their algorithms with traditional methods and achieved better discrimination. Models based on Asian and Chinese populations were usually built on single-center or small-sample retrospective studies, and the majority of the Asian models (12/15, 80.0%) were not validated using external datasets.
CONCLUSION: The existing models showed good discrimination for identifying high-risk pulmonary nodules but lacked external validation. Deep learning algorithms are increasingly being used, with good performance. More research is required to improve the quality of deep learning models, particularly for the Asian population.
© 2022 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.

Keywords:  early detection and early diagnosis; lung cancer; prediction; pulmonary nodule; screening

Year:  2022        PMID: 35137543      PMCID: PMC8888150          DOI: 10.1111/1759-7714.14333

Source DB:  PubMed          Journal:  Thorac Cancer        ISSN: 1759-7706            Impact factor:   3.500


INTRODUCTION

Lung cancer places a significant burden on health care systems. In 2020, lung cancer caused 1.8 million deaths worldwide. In China, lung cancer remains the most commonly diagnosed cancer and the leading cause of cancer death. The overall 5‐year survival rate of lung cancer ranges from 10% to 20% in most countries. However, prognosis depends largely on the stage of the disease at diagnosis: although the 5‐year survival rate at stage I is above 80%, it is close to 0% for stage IV disease. Therefore, early diagnosis and treatment are important to reduce mortality from lung cancer, improve quality of life, and reduce the economic burden of the disease. Screening with low‐dose computed tomography (LDCT) has been shown to be an efficient way to detect lung cancer at an earlier stage and reduce lung cancer mortality. Several lung cancer screening trials have been conducted worldwide. The National Lung Screening Trial (NLST) in the United States showed that LDCT screening can detect potentially cancerous lung nodules at an early stage, leading to a 20% reduction in lung cancer mortality. Nevertheless, the false‐positive rate of nodule detection by LDCT was extremely high at 96.4%, leading to unnecessary radiation exposure from follow‐up imaging tests, invasive biopsies, medical expenses, and anxiety among patients. Therefore, it is of paramount importance to identify the individuals at higher risk of developing lung cancer based on the pulmonary nodules identified on LDCT scans, so that appropriate examination and management can be recommended. In current lung cancer screening programs, further examinations are recommended solely on the basis of nodule size on the LDCT scan. Although this method of categorizing pulmonary nodules is easy to implement clinically, it may lead to a high rate of false‐positive results.
In contrast, risk prediction models based on pulmonary nodule size, calcification, density, and other relevant imaging information may facilitate the identification of high‐risk groups, significantly reduce the false‐positive rate, and improve the efficiency of screening programs. This approach is therefore now recommended by several clinical guidelines to reduce the high false‐positive rate of LDCT screening. As a result, several statistical models have been developed in recent years to predict the risk of lung cancer based on pulmonary nodules identified on LDCT. However, without a systematic evaluation of these models, it remains unclear which, if any, should be used clinically. Therefore, in this study, we reviewed the published literature to identify multivariable statistical models used to predict the risk of lung cancer from pulmonary nodules identified on LDCT. We also compared the effectiveness, reliability, bias, and potential for extrapolation of the models used in these studies.

METHODS

Search strategy

A literature search was conducted using the PubMed, Cochrane, Embase, and Web of Science electronic databases. The keywords “lung cancer” or “lung neoplasms” or “lung tumor” or “lung carcinoma” and “predict” or “assessment” or “risk” and “nodule” were used to identify all relevant articles published in English from January 1960 to February 2021. We also hand‐searched the reference lists of eligible studies to identify additional relevant publications. Further detail about the search strategy used in this study is available in Table S1.

Review methods and selection criteria

Two reviewers independently screened all titles and abstracts and made decisions regarding the potential eligibility of the research articles for full text review. Discrepancies in judgment were resolved by a third reviewer. Studies were eligible if they reported on the development of multivariable risk prediction models for the development of lung cancer based on the pulmonary nodules identified on LDCT and included a detailed description of the procedures used to evaluate and validate the model. Studies with an incomplete description of the procedures used to develop, validate, and evaluate the model were excluded. Informal publications such as conference abstracts were also excluded.

Data extraction

The models used in the studies were divided into two categories: traditional and deep learning models. In the traditional models, raw data (i.e., original image features) were translated into a finite number of feature descriptors (e.g., size, type, or density of nodules) that could be used as predictors of lung cancer. The association between lung cancer risk and each descriptor was tested and quantified, and an appropriate statistical risk model was then developed. In the deep learning algorithm‐based models, raw data were used directly, the representations needed for detection or classification were discovered automatically, and the association between lung cancer risk and the descriptors was partly unexplainable. For each of the included studies, basic information about the research methodology, the variables used to develop the models, and the methods used to evaluate the models were extracted. The basic information included the first author, publication year, study design, study method, target population, inclusion criteria of participants and nodules, and the number of normal and lung cancer cases used for modeling. The model variables extracted from the studies included clinical and epidemiological characteristics, such as age, sex, smoking, family history, occupational exposure, or history of chronic respiratory diseases; imaging characteristics of nodules, such as size, density, or shape; and tumor biomarkers, such as neuron‐specific enolase (NSE) or carcinoembryonic antigen (CEA). For the studies based on deep learning algorithms, it was not possible to extract these variables because of the method used to develop the risk model. The model evaluation criteria included the type of validation (external or internal), the sample size used for validation, the area under the receiver operating characteristic curve (AUC), calibration results, sensitivity, specificity, and the risk threshold.
The findings of either the Hosmer‐Lemeshow test or the expected‐to‐observed ratio (excellent, poor, or uncalibrated) were also recorded. Furthermore, for all deep learning models, we recorded how their performance (AUC, sensitivity, or specificity) on the same dataset compared with existing prediction methods or with clinical guidelines published by professional bodies, such as the American College of Radiology Lung Imaging Reporting and Data System (ACR Lung‐RADS), based on the conclusions in the original text.
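The discrimination measures extracted from each study (sensitivity and specificity at a risk threshold, and the AUC) can be illustrated with a minimal Python sketch. The labels, predicted risks, and the 0.5 threshold below are made-up illustrative values, not data from any included study, and the function names are hypothetical. The AUC is computed here as the proportion of cancer/non-cancer pairs in which the cancer case receives the higher predicted risk, with ties counted as one half.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], risks: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when risk >= threshold is called positive."""
    tp = fn = tn = fp = 0
    for label, risk in zip(labels, risks):
        predicted_positive = risk >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], risks: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    positives = [r for y, r in zip(labels, risks) if y == 1]
    negatives = [r for y, r in zip(labels, risks) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


# Illustrative data only: 1 = lung cancer, 0 = benign nodule.
example_labels = [1, 1, 1, 0, 0, 0, 0, 1]
example_risks = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]

sens, spec = sensitivity_specificity(example_labels, example_risks, threshold=0.5)
example_auc = auc(example_labels, example_risks)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, AUC={example_auc:.3f}")
```

Unlike sensitivity and specificity, the AUC does not depend on the chosen threshold, which is one reason it can be compared across models that report different risk cut-offs.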

Quality assessment

The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) method was used to evaluate the quality of evidence in traditional models. This method assesses the quality of the publication based on the risk of bias, consistency, accuracy, directness, and publication bias.

Data synthesis

The sample size used in each study was recorded when available and estimated for evaluation purposes when not available. If several models were trained on the same dataset, the model with the highest AUC was selected. A small number of events relative to the number of predictors may leave a study with insufficient power to detect a significant association, resulting in unstable models. To assess this problem, we calculated the events per variable (EPV) for traditional models. EPV was defined as the number of events divided by the number of predictor variables included in the multivariable model. An EPV value <10 suggests limited statistical power. Because it was not possible to record and name the variables used in the deep learning models, their EPV could not be calculated.
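The EPV definition above can be written as a short Python sketch. The function names are hypothetical, and the predictor counts in the example (9 and 20) are illustrative assumptions chosen so that the results agree with EPV values listed in Table 1; they are not taken from the original studies.

```python
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """EPV = number of events / number of predictor variables in the model."""
    if n_predictors <= 0:
        raise ValueError("the model must contain at least one predictor")
    return n_events / n_predictors


def has_limited_power(epv: float, cutoff: float = 10.0) -> bool:
    """Flag limited statistical power using the EPV < 10 rule stated above."""
    return epv < cutoff


# Illustrative inputs: 102 lung cancer cases with an assumed 9 predictors,
# and 52 cases with an assumed 20 predictors.
epv_a = events_per_variable(102, 9)
epv_b = events_per_variable(52, 20)

print(f"EPV = {epv_a:.2f}, limited power: {has_limited_power(epv_a)}")
print(f"EPV = {epv_b:.2f}, limited power: {has_limited_power(epv_b)}")
```

Under these assumptions, the first model would clear the cutoff and the second would be flagged as having limited statistical power.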

RESULTS

Study characteristics and quality assessment

The literature search revealed a total of 3230 publications, of which 630 were found to be duplicated and were, therefore, removed from the evaluation. A total of 2293 articles that did not meet our criteria were excluded from the screening. After evaluating the full texts of the remaining 307 articles, 41 articles met the eligibility criteria and were included for further analysis (Figure 1).
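The selection counts reported above can be checked with simple arithmetic. The following Python sketch uses only the numbers stated in the text; the variable names are illustrative, and the number of articles excluded after full‐text review is derived here rather than reported in the text.

```python
# Counts stated in the text for the literature selection process.
identified = 3230            # publications found by the search
duplicates = 630             # duplicated records removed
excluded_on_screening = 2293 # records not meeting the criteria
included = 41                # articles included in the review

# Articles remaining for full-text evaluation.
full_text_reviewed = identified - duplicates - excluded_on_screening

# Derived value: articles excluded after full-text evaluation.
excluded_after_full_text = full_text_reviewed - included

print(full_text_reviewed, excluded_after_full_text)
```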
FIGURE 1

Flow chart of literature search

After evaluating the articles, 43 models were identified. Overall, the models were based on more than 20 000 Asian, North American, and European participants (Figure 2(a)). From 2018 onward, the number of relevant studies grew rapidly. As a result, over half (67.4%, 29/43) of all models were released in 2018 or later (Figure 3).
FIGURE 2

Characteristics of existing models: (a) size and distribution of training sets used for modeling; (b) number and distribution of existing models; (c) number and distribution of models seeking validation in different ways; (d) number and distribution of models from different regions and data sources; and (e) frequency of risk factors used in traditional final models

FIGURE 3

AUCs and confidence intervals of existing models by regions and time periods

Most models (58.1%, 25/43) were developed based on deep learning algorithms, and the remaining models (41.9%, 18/43) were developed using traditional methods such as logistic regression (Figure 2(b)). In recent years, the use of deep learning algorithms has increased significantly (Table 2).
TABLE 2

Basic information and development of models based on the deep learning algorithm

First author | Year | Study design | Targeted population | Inclusion criteria of participants | Inclusion criteria of nodules | Sample size | Cases of lung cancer | Data source
Yoganand Balagurunathan [14] | 2019 | Screening trial | American | 55–74 years old and smoker | ≥4 mm | 244 | 78 | Multicenter
Gerard A. Silvestri [15] | 2018 | Cohort study | American and Canadian | >40 years old | 8–30 mm | 178 | 29 | Multicenter
Chao Zhang [16] | 2019 | Cohort study | American and Chinese | | | Unspecified | Unspecified | Multicenter
Johanna Uthoff [17] | 2019 | Cohort study | American | | | 363 | 74 | Multicenter
Ilaria Bonavita [18] | 2020 | Cohort study | American | | | Unspecified | Unspecified | Multicenter
Parnian Afshar [19] | 2020 | Cohort study | American | | | 1010 | Unspecified | Multicenter
Huafeng Wang [20] | 2018 | Cohort study | American | | | 1018 | Unspecified | Multicenter
Jason L. Causey [21] | 2018 | Cohort study | American | | | 1018 | Unspecified | Multicenter
Samuel Hawkins 1 [22] | 2016 | Screening trial | American | 55–74 years old and smoker | ≥4 mm | 600 | 200 | Multicenter
Samuel Hawkins 2 [22] | 2016 | Screening trial | American | 55–74 years old and smoker | ≥4 mm | 600 | 200 | Multicenter
Andrew V. Kossenkov [23] | 2019 | Cohort study | American | Smoker | 6–20 mm | 583 | 293 | Multicenter
G. A. Soardi [24] | 2015 | Cohort study | American | | ≤30 mm | 311 | 199 | Single‐center
Zuohong Wu [25] | 2021 | Cohort study | Chinese | | ≤30 mm | 995 | 772 | Single‐center
Stéphane Chauvie [26] | 2020 | Screening trial | Chinese | 45–75 years old and smoker | | 234 | 32 | Multicenter
Shulong Li [27] | 2019 | Cohort study | American | | | 1010 | Unspecified | Multicenter
Rekka Mastouri [28] | 2021 | Cohort study | American | | | Unspecified | Unspecified | Multicenter
Yin‐Chen Hsu [29] | 2020 | Cohort study | Chinese | | | 836 | 27 | Single‐center
Jiabao Liu [30] | 2020 | Cohort study | Chinese | | 6–30 mm | 879 | 601 | Multicenter
Rahul Paul [31] | 2020 | Cohort study | American | 55–74 years old and smoker | ≥4 mm | 261 | 85 | Multicenter
Muahammad Bilal Zia [32] | 2020 | Cohort study | American | | | 1010 | Unspecified | Multicenter
Yi‐Ming Xu [33] | 2020 | Cohort study | American | 55–74 years old and smoker | ≥4 mm | 1109 | 926 | Multicenter
Subba R. Digumarthy [34] | 2019 | Cohort study | American | | | 36 | Unspecified | Single‐center
Yangwei Xiang [35] | 2019 | Cohort study | Chinese | | | 588 | 462 | Single‐center
Liting Mao [36] | 2019 | Cohort study | Chinese | | | 294 | 61 | Single‐center
Shaun Daly [37] | 2013 | Cohort study | American | | | 136 | 69 | Single‐center

Only 23% (10/43) of the models were externally validated (Figure 2(c)). Data from multiple sources were used to develop the models in half of the studies (Figure 2(d)). Thirty‐three studies used data from cohort studies to develop the models, whereas in eight studies the models were constructed using data from screening trials (Tables 3 and 4). Almost all studies (97.6%, 40/41) had medium to very low credibility, largely because of publication bias, indirectness, and imprecision (Table S2).
TABLE 3

Validation of traditional models

First author | Year | Type of validation | Calibration | Sample size | AUC a | Threshold | Sensitivity | Specificity
Annette McWilliams [38] | 2013 | External | Excellent | 1090 | 0.970 | 0.05 | 0.71 | 0.96
Barbara Nemesure [39] | 2019 | Internal | Not calibrated | 1455 | 0.860 | | 0.73 | 0.81
Michael W. Marcus [40] | 2019 | Internal | Excellent | 1013 | 0.882 | | |
Martin Tammemagi [41] | 2018 | External | Excellent | 3680 | 0.947 | | |
Vineet K. Raghu [42] | 2019 | External | Not calibrated | 126 | 0.882 | 0.61 | 0.28 | 1.00
Joan E. Walter [43] | 2018 | Internal | Excellent | 809 | 0.850 | | |
Xianfeng Li [44] | 2017 | Internal | Not calibrated | 39 | 0.921 | | |
Michal Reid [45] | 2019 | External | Excellent | 45 | 0.810 | | |
Michael K. Gould [46] | 2007 | Internal | Excellent | 375 | 0.790 | | |
Sungmin Zo [47] | 2020 | Internal | Excellent | 157 | 0.952 | | |
Xiao‐Bo Chen [48] | 2019 | External | Excellent | 216 | 0.848 | | |
Stephen J. Swensen [49] | 1997 | Internal | Excellent | 210 | 0.833 | 0.10 | 0.93 | 0.47
| | | | | | 0.40 | 0.51 | 0.90
Man Zhang [50] | 2015 | Internal | Not calibrated | 120 | 0.910 | 0.55 | 0.87 | 0.85
Bin Zheng 1 [51] | 2015 | Internal | Not calibrated | 198 | 0.808 | | |
Bin Zheng 2 [51] | 2015 | Internal | Not calibrated | 84 | 0.845 | | |
Jingsi Dong [52] | 2014 | Internal | Not calibrated | 1679 | 0.935 | | |
Yun Li [53] | 2012 | External | Not calibrated | 145 | 0.874 | 0.46 | 0.95 | 0.70
Li Yang [54] | 2017 | Internal | Not calibrated | 344 | 0.784 | | 0.70 | 0.79

a AUC, area under curve.

TABLE 4

Validation of models based on the deep learning algorithm

First author | Year | Sample size | Type of validation | AUC a | Threshold | Sensitivity | Specificity
Yoganand Balagurunathan [14] | 2019 | 235 | Internal | 0.850 | | 0.54 | 0.91
Gerard A. Silvestri [15] | 2018 | 178 | Internal | 0.760 | 0.05 | 0.97 | 0.44
Chao Zhang [16] | 2019 | Unspecified | External | 0.855 | | 0.84 | 0.83
Johanna Uthoff [17] | 2019 | 100 | External | 0.965 | 0.38 | 1.00 | 0.96
Ilaria Bonavita [18] | 2020 | Unspecified | Internal | Unspecified | | |
Parnian Afshar [19] | 2020 | 1010 | Internal | 0.964 | | 0.95 | 0.90
Huafeng Wang [20] | 2018 | 1018 | Internal | 0.970 | | |
Jason L. Causey [21] | 2018 | 1018 | Internal | 0.993 | | |
Samuel Hawkins 1 [22] | 2016 | 600 | Internal | 0.83 | | |
Samuel Hawkins 2 [22] | 2016 | 600 | Internal | 0.79 | | |
Andrew V. Kossenkov [23] | 2019 | 158 | External | 0.825 | | 0.69 | 0.84
G. A. Soardi [24] | 2015 | 311 | Internal | 0.893 | | |
Zuohong Wu [25] | 2021 | 995 | Internal | 0.851 | | 0.88 | 0.64
Stéphane Chauvie [26] | 2020 | 234 | Internal | Unspecified | | 0.90 | 1.00
Shulong Li [27] | 2019 | 1010 | Internal | 0.931 | | 0.83 | 0.92
Rekka Mastouri [28] | 2021 | Unspecified | Internal | 0.92 | | 0.92 | 0.92
Yin‐Chen Hsu [29] | 2020 | 836 | Internal | 0.873 | | 0.75 | 0.85
Jiabao Liu [30] | 2020 | 879 | Internal | 0.938 | 0.58 | 0.84 | 0.91
Rahul Paul [31] | 2020 | 261 | Internal | 0.960 | | |
Muahammad Bilal Zia [32] | 2020 | 1010 | Internal | Unspecified | | 0.91 | 0.91
Yi‐Ming Xu [33] | 2020 | 1109 | Internal | Unspecified | | 0.93 | 0.89
Subba R. Digumarthy [34] | 2019 | 36 | Internal | 0.708 | | |
Yangwei Xiang [35] | 2019 | 588 | Internal | 0.890 | | 0.90 | 0.80
Liting Mao [36] | 2019 | 294 | Internal | 0.970 | | 0.81 | 0.92
Shaun Daly [37] | 2013 | 81 | External | 0.676 | | 0.95 | 0.25

a AUC, area under curve.

Development and performance of traditional models

The model from the Mayo Clinic in the United States, published in 1997, was the first model used to predict the risk of cancer from pulmonary nodules. Since then, 18 traditional models have been developed to predict the pathological characteristics of pulmonary nodules. Seven of these models were based on the North American population, two on the European population, and nine on the Asian population. Of the nine Asian models evaluated in this review, eight were based on the Chinese population (Table 1).
TABLE 1

Basic information and development of traditional models

First author | Year | Study design | Study method | Target population | Inclusion criteria of participants | Inclusion criteria of nodules | Sample size | Cases of lung cancer | EPV b | Data source
Annette McWilliams [38] | 2013 | Screening trial | Logistic regression | Canadian | 50–74 years old | ≥1 mm | 1871 | 102 | 11.33 | Multicenter
Barbara Nemesure [39] | 2019 | Cohort study | Cox regression | American | | | 1469 | 85 a | 6.54 | Single‐center
Michael W. Marcus [40] | 2019 | Screening trial | Logistic regression | English | 50–75 years old | ≥3 mm | 1013 | 52 | 2.60 | Multicenter
Martin Tammemagi [41] | 2018 | Screening trial | Logistic regression | Canadian | 50–74 years old | ≥1 mm | 1871 | 111 | 10.10 | Multicenter
Vineet K. Raghu [42] | 2019 | Cohort study | Logistic regression | American | Smoker | | 92 | 50 | 10.00 | Multicenter
Joan E. Walter [43] | 2018 | Screening trial | Logistic regression | Dutch/Belgian | 50–75 years old and smoker | | 809 | 50 a | 7.14 | Multicenter
Xianfeng Li [44] | 2017 | Cohort study | Fisher discriminant analysis | Chinese | 20–80 years old | 5–30 mm | 39 | 20 | 1.00 | Single‐center
Michal Reid [45] | 2019 | Cohort study | Logistic regression | American | ≥18 years old | ≤30 mm | 301 | 200 | 10.00 | Single‐center
Michael K. Gould [46] | 2007 | Cohort study | Logistic regression | American | | 7–30 mm | 375 | 204 | 13.60 | Multicenter
Sungmin Zo [47] | 2020 | Cohort study | Logistic regression | Korean | | | 157 | 90 | 5.29 | Single‐center
Xiao‐Bo Chen [48] | 2019 | Cohort study | Logistic regression | Chinese | | 8–20 mm | 493 | 214 | 11.26 | Single‐center
Stephen J. Swensen [49] | 1997 | Cohort study | Logistic regression | American | | 4–30 mm | 419 | 145 a | 8.06 | Single‐center
Man Zhang [50] | 2015 | Cohort study | Logistic regression | Chinese | | ≤30 mm | 314 | 248 | 14.59 | Multicenter
Bin Zheng 1 [51] | 2015 | Cohort study | Logistic regression | Chinese | | ≤30 mm and GGO <50% | 405 | 367 | 11.84 | Single‐center
Bin Zheng 2 [51] | 2015 | Cohort study | Logistic regression | Chinese | | ≤30 mm and GGO ≥50% | 159 | 166 | 5.35 | Single‐center
Jingsi Dong [52] | 2014 | Cohort study | Logistic regression | Chinese | | | 1679 | 1296 | 58.91 | Single‐center
Yun Li [53] | 2012 | Cohort study | Logistic regression | Chinese | | | 371 | 229 | 15.27 | Unspecified
Li Yang [54] | 2017 | Cohort study | Logistic regression | Chinese | | | 1078 | 721 | 65.55 | Single‐center

a Approximate number.

b EPV, events per variable; GGO, ground‐glass opacity.

Traditional models included numerous imaging features, such as nodule size, type, location, shape, and margin, to determine the pathological characteristics of the pulmonary nodules. In addition, basic information such as age, gender, family history of cancer, and smoking status was also commonly used. However, biomarkers were used in only seven models (Figure 2(e)). Logistic regression analysis was used to develop most (16/18) traditional models. The other two models were developed using either Cox regression analysis or Fisher linear discriminant analysis. Most models (14/18) were based on cohort studies, and the remaining four were constructed using screening trial data (Table 1). Based on the regression analysis, nodule size, nodule margin, smoking status, and patient age were statistically significant in more than half of all models. The addition of biomarkers to the models improved the AUC and statistical significance in three of the seven evaluated models, as shown in Table 5. These findings suggest that although biomarkers were not widely used to develop traditional models, they may have an important role in improving the accuracy of these models.
TABLE 5

Variables of traditional models

Variables a | First authors of models
Models, in column order: Annette McWilliams [38]; Barbara Nemesure [39]; Michael W. Marcus [40]; Martin Tammemagi [41]; Vineet K. Raghu [42]; Joan E. Walter [43]; Xianfeng Li [44]; Michal Reid [45]; Michael K. Gould [46]; Sungmin Zo [47]; Xiao‐Bo Chen [48]; Stephen J. Swensen [49]; Man Zhang [50]; Bin Zheng 1 [51]; Bin Zheng 2 [51]; Jingsi Dong [52]; Yun Li [53]; Li Yang [54]
Basic characteristics
Age | 01101111011100111
Sex | 101100000011001
Personal history of other cancer | 1100100001
Family history of lung cancer | 0011000000110
Family history of other cancer | 00010000110
BMI b | 0000
Exposure to asbestos | 010
FVC b | 1
History of respiratory diseases | 1100000
Smoking | 1100100001110101
Clinical symptoms | 000
Time since previous lung cancer was diagnosed | 0
FEV1 b | 011
Biomarkers
Squamous cell carcinoma antigen | 0
NSE b | 00
CEA b | 1000001
CYFRA21‐1 b | 1011
MiRNA‐21‐5p b | 10
MiR‐574‐5p | 10
Laboratory indicators | 00
Ferritin | 0
Imaging information
Size | 110111001111111
Volume | 111
Density | 111010000
Location | 0011101010100100
Count | 00010
Margin (spiculate) | 110101111110011
Satellite lesions | 11001
Calcification | 0011011
Cavitation | 000
Shape | 00000000101
Enhancement | 100
Pleural indentation | 1000
Bronchus sign | 10
Vascular signs | 00
Emphysema | 001100
Vessels sign | 0
Vessel number | 1
Tracheal signs |
Previous CT scan | 0
Previous X‐ray | 0
Vacuole signs |
Associated pleural effusion | 00
Enlarged hilar or mediastinal lymph nodes | 00
Visibility in retrospect | 0
Carbohydrate antigen | 0
Neuron‐specific enolase | 0

a 0 depicts the inclusion of a variable into the model as a candidate variable; 1 depicts retention in the final model.

b BMI, body mass index; FVC, forced vital capacity; FEV1, forced expiratory volume in one second; NSE, neuron‐specific enolase; CEA, carcinoembryonic antigen; CYFRA21‐1, cytokeratin fragment antigen 21‐1; MiR(NA), microRNA.

The AUCs of the models ranged from 0.676 to 0.970. Most models (77.8%, 14/18) showed good discrimination, with an AUC of 0.8 or higher. Calibration was assessed in nine models, and the results indicated a good fit. Most studies (61.1%, 11/18) had an EPV of 10 or higher, suggesting sufficient statistical power. Only six of the 18 models were validated using external datasets. However, five of these models were validated using external data from similar populations in the same countries, and only one model was validated using data from participants of different origins. The latter model achieved good discrimination, with an AUC of 0.970 (Tables 1 and 3). Compared with the European and American models, the Chinese models lacked external validation. Most of the data used to develop the Chinese models were obtained from single‐center or small‐sample retrospective cohort studies, and only two of these models were validated using an external dataset. However, the discrimination ability of the Chinese models was good, with seven of eight models achieving an AUC higher than 0.8, and two models reported excellent calibration. In addition, all Chinese models had an EPV higher than 10. More details can be found in Tables 1 and 3 and Figures 2 and 3.

Development and performance of the deep learning algorithms

The first study reporting on the development and performance of a deep learning algorithm for the discrimination of pulmonary nodules was published in 2013. Only biomarkers were included in the development of this model, and its prediction ability was limited, with an AUC of 0.676. The majority of the deep learning models (84%, 21/25) were developed in 2018 or later and were based on the imaging features of the nodules. This improved the models' prediction ability, especially when the model was supplemented by epidemiological parameters and biomarkers (Figure 3). The AUC was reported for 21 of the 25 deep learning models. However, only about half of these models (12 of 21) reported confidence intervals (Table 4). The reported AUCs ranged from 0.676 to 0.970. Most of the deep learning models (68.0%, 17/25) had good discrimination ability, with an AUC higher than 0.8, whereas the other four models (16.0%) had an AUC below 0.8. The majority of the models (84.0%, 21/25) were not validated externally (Table 2). Only seven of 18 deep learning models were developed in Asia. Furthermore, all Asian models achieved high discrimination, with an AUC above 0.8. However, the sample size of the Asian models was generally small, and only one of these models was validated using an external dataset (Tables 2 and 4).

Comparison of deep learning models with traditional models

The discrimination ability of 60.0% (15/25) of the deep learning models was compared with traditional methods. All deep learning models achieved higher or similar discrimination abilities when compared with traditional methods (Table 6).
TABLE 6

Comparison between existing methods and models based on the deep learning algorithm

First author | Objects for comparison | Indicators for comparison | Superior methods
Yoganand Balagurunathan [14] | None | |
Gerard A. Silvestri [15] | Traditional models | AUC a | Deep learning
Gerard A. Silvestri [15] | Clinician | AUC | Deep learning
Chao Zhang [16] | Clinician | Accuracy, sensitivity, and specificity | Deep learning
Johanna Uthoff [17] | None | |
Ilaria Bonavita [18] | Clinician | F1 score | Deep learning
Parnian Afshar [19] | None | |
Huafeng Wang [20] | None | |
Jason L. Causey [21] | Clinician | AUC | Similar
Samuel Hawkins 1,2 [22] | Lung‐RADS | AUC | Deep learning
Samuel Hawkins 1,2 [22] | Traditional models | AUC | Similar
Andrew V. Kossenkov [23] | Traditional models | AUC | Deep learning
G. A. Soardi [24] | None | |
Zuohong Wu [25] | Traditional models | AUC | Deep learning
Stéphane Chauvie [26] | Lung‐RADS | PPV a, sensitivity, and specificity | Deep learning
Stéphane Chauvie [26] | Traditional models | PPV, sensitivity, and specificity | Deep learning
Shulong Li [27] | None | |
Rekka Mastouri [28] | None | |
Yin‐Chen Hsu [29] | Lung‐RADS | AUC | Deep learning
Jiabao Liu [30] | Clinician | AUC | Deep learning
Rahul Paul [31] | None | |
Muahammad Bilal Zia [32] | None | |
Yi‐Ming Xu [33] | Clinician | Sensitivity | Deep learning
Subba R. Digumarthy [34] | None | |
Yangwei Xiang [35] | Traditional models | AUC | Deep learning
Liting Mao [36] | ACR‐Lung‐RADS a | Accuracy, sensitivity, and specificity | Deep learning
Shaun Daly [37] | Traditional models | AUC | Deep learning

a AUC, area under curve; ACR‐Lung‐RADS, American College of Radiology Lung Imaging Reporting and Data System; PPV, positive predictive value.

DISCUSSION

LDCT can be used to diagnose lung cancer at an early stage via the identification and classification of pulmonary nodules into different risk categories. However, current pulmonary nodules classification guidelines are based solely on nodule size and density. Other important biomarkers and patient characteristics are mostly ignored, resulting in a very high false‐positive rate, over diagnosis, and unnecessary treatment. , , Various traditional and deep learning models based on clinical, biological, and epidemiological factors have been developed to overcome this problem. To our knowledge, in this manuscript, we present the first systemic review comparing the development, validation, and performance of these models in the characterization of pulmonary nodules identified on LDCT. In this systemic review, we evaluated the performance of 43 models derived from 41 research articles based on over 20 000 subjects. Our findings indicate that the majority of the traditional and deep learning models achieved an AUC higher than 0.8, suggesting that these models can be used to identify the high‐risk population effectively and hence, reduce the false‐positive rate and the harms of over diagnosis and treatment. Since 1997, the development of pulmonary nodule risk prediction models has increased rapidly. Most early models were developed using statistical methods such as regression analysis. Although imaging features such as nodule size, type, location, shape, and margin provide valuable information on the pathological characteristics of the nodules, our findings indicate that the incorporation of clinical characteristics such as age and smoking status can significantly improve the performance of these models. The first study confirming this finding was performed at the Mayo Clinic. Since then, various traditional statistic‐based models incorporating both imaging and patient characteristics have been developed. 
Subsequent models also incorporated clinical indicators such as forced vital capacity (FVC) and forced expiratory volume (FEV)1, and serum biomarkers such as CEA and NSE, to further improve the prediction efficacy on the models. , , , , Variables including age, size of the nodules, and margin of the nodules should be considered as a priory in machine‐learning analyses, as they were consistently considered as predictors of lung cancer in traditional studies. A limited number of studies incorporated other risk factors such as exposure of asbestos, satellite lesions, bronchus sign, and volume of nodules (Table 5). However, the main limitation of these risk factors is the limited sample size that limits the generalizability of the model. A large number of models were based on single‐center and retrospective studies with small sample sizes or data obtained from old studies. Biomarkers were not commonly used in the development of the predictive risk factor model (Table 5, Figure 2(e)). Nodule volume might have been an effective predictor, , but was generally not taken into consideration by current models. Because most studies were retrospective, it was not possible to incorporate time‐dependent variables such as variations in biomarkers and nodule size over time into the model. Therefore, time‐dependent factors, such as the nodule volume growth rate, were also ignored by most studies. Deep learning models can learn from various heterogeneous variables to generate homogeneous groups with similar features. These features can be mapped with similar survival models to obtain accurate predictions. Various studies , , , also suggest that compared with the traditional pulmonary nodule prediction models or expert judgment by clinicians, the use of deep learning algorithms has obvious advantages on discrimination (Table 6). 
However, although pulmonary nodule risk models based on deep learning algorithms have been used as early as 1993, they have not been widely used to predict pulmonary nodules until recent years as they still have several limitations. One of the main limitations of deep learning algorithms is that they require large amounts of data, advanced imaging equipment, top‐ranked statisticians, and research funds to develop. Despite the high discrimination ability of the deep learning algorithm models evaluated in our systemic review, the GRADE scores of these models were generally low because of their limited sample size, high level of bias, inaccuracy, and indirectness (Table S2). Furthermore, it is difficult to identify the specific variables used to develop the deep learning prediction model, potentially limiting the quality and authenticity of these models. Few studies were based on the Asian population. The majority of the Asian studies were based on a single center, had a limited sample size, and lacked external validation, which limited the quality of evidence (Tables 3 and 4, Figure 2). It is important to note that the accepted European and United States models may not be suitable for the Asian and Chinese populations because of large population differences, as suggested by Uthoff et al. and Nair et al. Validation of traditional models AUC, area under curve. Validation of models based on the deep learning algorithm AUC, area under curve. Variables of traditional models 0 depicts the inclusion of a variable into the model as a candidate variable; 1 depicts retention in the final model. bBMI, body mass index; FVC, forced vital capacity; FEV1, forced expiratory volume in one second; NSE, neuron‐specific enolase; CEA, carcinoembryonic antigen; CEFRA21‐1, cytokeratin fragment antigen 21‐1; MiR(NA), MicroRNA. 
[Table: Comparison between existing methods and models based on the deep learning algorithm. AUC, area under curve; ACR‐Lung‐RADS, American College of Radiology Lung Imaging Reporting and Data System; PPV, positive predictive value.]

Our systematic review has several limitations that have to be acknowledged. First, variations between studies, including sample size, research design, data source, and imaging acquisition criteria, made it difficult to quantify, integrate, and extrapolate the results of the different studies. Some of the studies included in our analysis had high publication bias, particularly those that lacked external validation. Additionally, cultural and social risk factors were ignored by most models. Studies evaluating a single risk factor were also excluded from this analysis, although these variables were highly predictive of lung cancer and represent the latest trend in the field. Furthermore, most of the existing models were based on the entire population. Therefore, subgroup analysis based on important risk factors such as smoking status and tumor histology is recommended to improve the prediction performance of current models and to adapt these tools to the specific characteristics of the population being studied. However, this type of research requires large datasets, highlighting the need for further large‐scale multicenter prospective studies. Future studies should also focus on developing deep learning models based on decentralized and deparametric data. These methods process the raw data directly and therefore reduce heterogeneity while improving performance compared with traditional models.
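The subgroup analysis recommended above can be illustrated with a minimal sketch that computes the AUC (the discrimination measure reported throughout this review) for a whole cohort and then separately by smoking status. The rank‐based AUC definition is standard; the scores, labels, and subgroup names below are invented for illustration and do not come from any reviewed study.

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen malignant
    nodule (label 1) receives a higher score than a randomly chosen
    benign nodule (label 0). Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(scores, labels, groups):
    """Compute the AUC separately within each subgroup."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = auc([scores[i] for i in idx], [labels[i] for i in idx])
    return result


# Hypothetical predicted risks, outcomes, and smoking status.
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["smoker"] * 4 + ["never"] * 4

print(auc(scores, labels))                      # 0.75
print(auc_by_subgroup(scores, labels, groups))  # {'never': 0.5, 'smoker': 1.0}
```

In this toy example the overall AUC of 0.75 hides a large difference between subgroups, which is the kind of heterogeneity a whole‐population model can mask. Reliable subgroup estimates require far more cases than shown here.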

CONCLUSION

The incidence of lung cancer is increasing, particularly in developing countries. The models evaluated in our study were all developed in Europe, Asia, and the United States. These models showed good discrimination for identifying high‐risk pulmonary nodules, particularly when they combined imaging features with clinical and behavioral characteristics and biomarkers. This highlights the need to develop models based on the unique characteristics of different populations, particularly those in developing countries, to reduce the global lung cancer burden. The use of deep learning algorithms has increased markedly in the last few years, and these algorithms generally performed better than traditional models. However, more research is required to improve the quality of deep learning models, particularly for the Asian population, because these models were often based on single‐center studies and lacked external validation. Further research should also focus on improving the quality of current screening guidelines by incorporating clinical and epidemiological factors into the evaluation of pulmonary nodules.

CONFLICT OF INTEREST

The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

Supporting Information: Appendix S1.
REFERENCES (10 of 61 shown)

Review 1.  Deep learning.

Authors:  Yann LeCun; Yoshua Bengio; Geoffrey Hinton
Journal:  Nature       Date:  2015-05-28       Impact factor: 49.962

2.  Use of GRADE for assessment of evidence about prognosis: rating confidence in estimates of event rates in broad categories of patients.

Authors:  Alfonso Iorio; Frederick A Spencer; Maicon Falavigna; Carolina Alba; Eddie Lang; Bernard Burnand; Tom McGinn; Jill Hayden; Katrina Williams; Beverly Shea; Robert Wolff; Ton Kujpers; Pablo Perel; Per Olav Vandvik; Paul Glasziou; Holger Schunemann; Gordon Guyatt
Journal:  BMJ       Date:  2015-03-16

3.  Development of a Risk Prediction Model to Estimate the Probability of Malignancy in Pulmonary Nodules Being Considered for Biopsy.

Authors:  Michal Reid; Humberto K Choi; Xiaozhen Han; Xiaofeng Wang; Sanjay Mukhopadhyay; Lei Kou; Usman Ahmad; Xiaoqiong Wang; Peter J Mazzone
Journal:  Chest       Date:  2019-03-30       Impact factor: 9.410

4.  Determining the likelihood of malignancy in solitary pulmonary nodules with Bayesian analysis. Part I. Theory.

Authors:  J W Gurney
Journal:  Radiology       Date:  1993-02       Impact factor: 11.105

5.  Results of initial low-dose computed tomographic screening for lung cancer.

Authors:  Timothy R Church; William C Black; Denise R Aberle; Christine D Berg; Kathy L Clingan; Fenghai Duan; Richard M Fagerstrom; Ilana F Gareen; David S Gierada; Gordon C Jones; Irene Mahon; Pamela M Marcus; JoRean D Sicks; Amanda Jain; Sarah Baum
Journal:  N Engl J Med       Date:  2013-05-23       Impact factor: 91.245

6.  Development and validation of clinical diagnostic models for the probability of malignancy in solitary pulmonary nodules.

Authors:  Jingsi Dong; Nan Sun; Jiagen Li; Ziyuan Liu; Baihua Zhang; Zhaoli Chen; Yibo Gao; Fang Zhou; Jie He
Journal:  Thorac Cancer       Date:  2014-03-03       Impact factor: 3.500

7.  Predicting Malignancy Risk of Screen-Detected Lung Nodules-Mean Diameter or Volume.

Authors:  Martin Tammemagi; Alex J Ritchie; Sukhinder Atkar-Khattra; Brendan Dougherty; Calvin Sanghera; John R Mayo; Ren Yuan; Daria Manos; Annette M McWilliams; Heidi Schmidt; Michel Gingras; Sergio Pasian; Lori Stewart; Scott Tsai; Jean M Seely; Paul Burrowes; Rick Bhatia; Ehsan A Haider; Colm Boylan; Colin Jacobs; Bram van Ginneken; Ming-Sound Tsao; Stephen Lam
Journal:  J Thorac Oncol       Date:  2018-10-25       Impact factor: 15.609

8.  3D-MCN: A 3D Multi-scale Capsule Network for Lung Nodule Malignancy Prediction.

Authors:  Parnian Afshar; Anastasia Oikonomou; Farnoosh Naderkhani; Pascal N Tyrrell; Konstantinos N Plataniotis; Keyvan Farahani; Arash Mohammadi
Journal:  Sci Rep       Date:  2020-05-14       Impact factor: 4.379

9.  Assessment of Plasma Proteomics Biomarker's Ability to Distinguish Benign From Malignant Lung Nodules: Results of the PANOPTIC (Pulmonary Nodule Plasma Proteomic Classifier) Trial.

Authors:  Gerard A Silvestri; Nichole T Tanner; Paul Kearney; Anil Vachani; Pierre P Massion; Alexander Porter; Steven C Springmeyer; Kenneth C Fang; David Midthun; Peter J Mazzone
Journal:  Chest       Date:  2018-03-01       Impact factor: 9.410

10.  Nomogram For The Prediction Of Malignancy In Small (8-20 mm) Indeterminate Solid Solitary Pulmonary Nodules In Chinese Populations.

Authors:  Xiao-Bo Chen; Rui-Ying Yan; Ke Zhao; Da-Fu Zhang; Ya-Jun Li; Lin Wu; Xing-Xiang Dong; Ying Chen; De-Pei Gao; Ying-Ying Ding; Xi-Cai Wang; Zhen-Hui Li
Journal:  Cancer Manag Res       Date:  2019-11-06       Impact factor: 3.989


Review 1.  Lung cancer risk prediction models based on pulmonary nodules: A systematic review.

Authors:  Zheng Wu; Fei Wang; Wei Cao; Chao Qin; Xuesi Dong; Zhuoyu Yang; Yadi Zheng; Zilin Luo; Liang Zhao; Yiwen Yu; Yongjie Xu; Jiang Li; Wei Tang; Sipeng Shen; Ning Wu; Fengwei Tan; Ni Li; Jie He
Journal:  Thorac Cancer       Date:  2022-02-08       Impact factor: 3.500

