
Highly Sensitive Marker Panel for Guidance in Lung Cancer Rapid Diagnostic Units.

Sonia Blanco-Prieto1, Loretta De Chiara1, Mar Rodríguez-Girondo2,3, Lorena Vázquez-Iglesias1, Francisco Javier Rodríguez-Berrocal1, Alberto Fernández-Villar4, María Isabel Botana-Rial4, María Páez de la Cadena1.   

Abstract

While evidence for lung cancer screening implementation in Europe is awaited, Rapid Diagnostic Units have been established in many hospitals to accelerate the early diagnosis of lung cancer. We seek to develop an algorithm to detect lung cancer in a symptomatic population attending such unit, based on a sensitive serum marker panel. Serum concentrations of Epidermal Growth Factor, sCD26, Calprotectin, Matrix Metalloproteinases -1, -7, -9, CEA and CYFRA 21.1 were determined in 140 patients with respiratory symptoms (lung cancer and controls with/without benign pathology). Logistic Lasso regression was performed to derive a lung cancer prediction model, and the resulting algorithm was tested in a validation set. A classification rule based on EGF, sCD26, Calprotectin and CEA was established, able to reasonably discriminate lung cancer with 97% sensitivity and 43% specificity in the training set, and 91.7% sensitivity and 45.4% specificity in the validation set. Overall, the panel identified with high sensitivity stage I non-small cell lung cancer (94.7%) and 100% small-cell lung cancers. Our study provides a sensitive 4-marker classification algorithm for lung cancer detection to aid in the management of suspicious lung cancer patients in the context of Rapid Diagnostic Units.

Year:  2017        PMID: 28117344      PMCID: PMC5259733          DOI: 10.1038/srep41151

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Lung cancer (LC) is the most common cause of cancer-related death worldwide, accounting for 13% of new cancer diagnoses and 19% of total deaths1. Despite recent advances in treatment, this neoplasia carries an extremely poor prognosis, with an overall 5-year survival rate of 13%2, a consequence of the difficulty of detecting it at an early stage3. Therefore, early detection, when surgery may still be curative, is the best way to reduce LC mortality24. In contrast to the United States, where screening with low-dose computed tomography (CT) has been recommended for high-risk individuals5, Europe has no LC screening recommendations so far6. The main reasons include the high rate of false-positive results789 and the pending results of ongoing randomized controlled trials (reviewed in Ruchalski et al.9). In Spain, reducing the time to diagnosis and staging is a priority8101112, since both radiological imaging (CT, PET) and invasive procedures for histological confirmation (bronchoscopy, thoracic needle aspiration or thoracentesis) are required in the diagnostic work-up of an LC patient1314. Rapid or Quick Diagnostic Units (RDU/QDU) have been established within the public health system, with the main objective of accelerating the early diagnosis of potentially severe diseases such as cancer, avoiding hospitalisations for purely diagnostic purposes, minimizing hospital-related morbidity, reducing costs, and improving patient satisfaction1516. In Spain, approximately 40–50% of the patients attending LC-RDUs have non-cancerous lung pathologies1112. Consequently, clinical decision-making in the LC-RDU setting would benefit from non-invasive markers that help predict LC risk in symptomatic individuals, distinguishing patients with cancer, who should undergo confirmatory diagnostic procedures, from those without cancer, in whom a conservative approach could be applied, initially avoiding such procedures.
Recent reviews have indicated that blood-based markers would be an ideal tool to detect early-stage LC and complement CT imaging1718. We previously described a three-marker panel comprising EGF (Epidermal Growth Factor), soluble CD26 (sCD26) and Calprotectin (CAL), which showed a considerable discriminatory capacity to detect patients at high risk for LC (83% sensitivity and 87% specificity)19. In this new study, together with EGF, sCD26 and CAL, five additional serum markers were evaluated: MMP-1, −7, −9 (Matrix Metalloproteinases −1, −7, −9), CEA (Carcinoembryonic Antigen) and CYFRA 21.1, with the aim of improving our previous diagnostic algorithm. These molecules cover a spectrum of biological functions implicated in cancer development and progression, as summarized in Table 1. Since this novel panel is intended for use in an LC-RDU managed by consultants receiving referrals from primary care doctors, a high sensitivity to detect LC among symptomatic patients is imperative.
Table 1

Markers Selected for the Development of a Diagnostic Panel for Lung Cancer.

| Marker | Function in Cancer | Usefulness in LC diagnosis |
|---|---|---|
| Epidermal Growth Factor (EGF) | Binding of EGF to receptor promotes tumour growth and progression42 | Suitable discrimination of LC/NSCLC from healthy and benign lung pathologies1920 |
| sCD26 | Immune regulation and co-stimulatory activities43; tumour suppressor protein in NSCLC44 | Suitable diagnostic potential for LC vs healthy21, healthy/benign19 or NSCLC-MPE/NSCLC-PMPE22 |
| Calprotectin (CAL) | Antimicrobial and pro-inflammatory functions, tumour development45; mediates lung metastases46 | Promising marker in LC derived by pleural effusion23 and LC19 |
| Matrix Metalloproteinases (MMP-1, −7, −9) | Extracellular matrix degradation leading to cancer invasion and metastasis, processing of growth factors, regulation of apoptosis and angiogenesis, tumour-associated inflammation and immune escape47 | MMP-1: Poor diagnostic capacity for LC vs healthy controls in plasma48 |
| | | MMP-7: Moderate diagnostic potential for NSCLC vs healthy and benign lung disease49 |
| | | MMP-9: Good diagnostic potential for LC vs control and benign lung affections5051 |
| Carcinoembryonic Antigen (CEA) | Belongs to immunoglobulin superfamily, acting in cell adhesion and innate immunity52 | Moderate diagnostic potential for LC vs non-malignant pathologies42425 |
| CYFRA 21.1 | Fragment of Cytokeratin 19, constituent of cytoskeleton and expressed in epithelial differentiation53 | Moderate diagnostic potential for LC vs non-malignant pathologies42425 |

Abbreviations: LC = Lung Cancer, NSCLC = Non Small Cell Lung Cancer, MPE = Malignant Pleural Effusion, PMPE = paramalignant pleural effusion.

Results

Marker Levels in Lung Cancer and Controls

Serum levels of the 8 markers analyzed are shown in Table 2, including the median and range for the control group (healthy and benign) and LC. After correction for multiple testing, serum concentrations of EGF, CAL, MMP-1, MMP-7, MMP-9, CEA and CYFRA 21.1 were significantly elevated in LC compared to controls (Mann-Whitney U test, P = 0.001 for EGF, CAL, MMP-9, CEA and CYFRA 21.1; P = 0.047 for MMP-1 and P = 0.013 for MMP-7), while sCD26 levels were notably lower in malignancy relative to controls (Mann-Whitney U test, P = 0.001).
Table 2

Serum Markers in Lung Cancer and Controls in the Training Set.

| Marker | Control/Case^a | Median | Range | P^b | Adjusted effect (95% CI)^c | AUC (95% CI) |
|---|---|---|---|---|---|---|
| EGF (pg/mL) | Control | 336.86 | 40.13–1187.06 | | | |
| | Healthy | 247.60 | 40.13–972.90 | | | |
| | Benign | 419.67 | 43.25–1187.06 | | | |
| | LC | 571.70 | 40.75–1716.30 | 0.001 | 0.238 (0.140 to 0.336)* | 0.698 (0.615–0.773) |
| sCD26 (ng/mL) | Control | 471.50 | 122.00–1092.00 | | | |
| | Healthy | 509.00 | 237.00–886.00 | | | |
| | Benign | 431.50 | 122.00–1092.00 | | | |
| | LC | 356.00 | 136.00–1192.00 | 0.001 | −0.097 (−0.148 to −0.047)* | 0.711 (0.629–0.785) |
| CAL (ng/mL) | Control | 127.87 | 7.56–421.23 | | | |
| | Healthy | 105.47 | 7.56–362.29 | | | |
| | Benign | 158.13 | 33.13–421.23 | | | |
| | LC | 221.21 | 48.33–438.32 | 0.001 | 0.263 (0.190 to 0.336)* | 0.759 (0.679–0.827) |
| MMP-1 (pg/mL) | Control | 5459.61 | 1186.61–23960.37 | | | |
| | Healthy | 4664.94 | 1186.61–23635.33 | | | |
| | Benign | 6420.51 | 1207.70–23960.37 | | | |
| | LC | 7133.87 | 1450.34–41668.33 | 0.047 | 0.089 (−0.008 to 0.187) | 0.597 (0.511–0.679) |
| MMP-7 (pg/mL) | Control | 21761.35 | 5026.14–79977.27 | | | |
| | Healthy | 21237.20 | 10383.69–60968.91 | | | |
| | Benign | 22286.74 | 5026.14–79977.27 | | | |
| | LC | 26710.65 | 5383.18–79809.13 | 0.013 | 0.020 (−0.040 to 0.081) | 0.624 (0.539–0.705) |
| MMP-9 (ng/mL) | Control | 177.60 | 52.79–3611.59 | | | |
| | Healthy | 177.19 | 52.79–743.25 | | | |
| | Benign | 181.74 | 57.41–3611.59 | | | |
| | LC | 311.86 | 21.06–1914.00 | 0.001 | 0.270 (0.167 to 0.373)* | 0.729 (0.648–0.801) |
| CEA (pg/mL) | Control | 837.26 | 170.84–4070.71 | | | |
| | Healthy | 929.50 | 171.03–4070.71 | | | |
| | Benign | 559.51 | 170.84–2828.15 | | | |
| | LC | 2051.64 | 141.16–136039.19 | 0.001 | 0.494 (0.333 to 0.656)* | 0.744 (0.663–0.814) |
| CYFRA 21.1 (pg/mL) | Control | 227.05 | 0.00–19314.33 | | | |
| | Healthy | 0.00 | 0.00–16592.63 | | | |
| | Benign | 1052.47 | 0.00–19314.33 | | | |
| | LC | 3007.09 | 0.00–173410.17 | 0.001 | 1.080 (0.587 to 1.573)* | 0.734 (0.653–0.805) |

Abbreviations: LC = Lung Cancer.

aSample size in training set: Control n = 72 (Healthy n = 36, Benign n = 36), LC n = 68.

bMann-Whitney U test for comparison between the cancer and control groups, corrected by the Benjamini-Hochberg method to control the false discovery rate under multiple comparisons.

cAdjusted effects and 95% confidence intervals of the case/control status on each of the log-transformed markers, considered as outcome in a linear regression model adjusted for gender, age and smoking. *P-value < 0.001.
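
The Benjamini-Hochberg correction named in footnote b can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `benjamini_hochberg` and the raw P-values in the usage line are assumptions made for the example (the paper reports only corrected P-values).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    For the P-value of rank k (1 = smallest) among m tests, the adjusted
    value is min over all ranks j >= k of p_(j) * m / j, capped at 1.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative raw P-values only (hypothetical, not taken from the study).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

An adjusted value below the chosen threshold (for example 0.05) is then read as significant after correction.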

All marker levels differed significantly between healthy controls and cancer subjects (Mann-Whitney U test, P = 0.002 for EGF, sCD26, CAL, MMP-9, CEA and CYFRA 21.1; P = 0.018 for MMP-1 and MMP-7). However, when comparing patients with benign pathologies and cancer, the differences in MMP-1 and MMP-7 were not significant (Mann-Whitney U test, P = 0.448 for MMP-1 and P = 0.090 for MMP-7). In multivariate linear regression models adjusted for gender, age and smoking status, a significant association between the occurrence of LC and the markers was again observed, except for MMP-1 and MMP-7, when considering both the healthy group and all controls. However, only CAL, MMP-9 and CEA maintained a significant association with LC relative to benign pathologies.

Correlation between the eight markers analysed was also explored using an annotated heatmap (Supplementary Figure S1). Correlations ranged from a minimum of 0.038 (EGF and CYFRA 21.1) to a maximum of 0.489 (CAL and MMP-9). Moderate correlations were also observed for several markers, such as EGF with CAL and MMP-9, and the negative correlation of sCD26 with CAL.

The performance of the candidate markers was evaluated by means of ROC curves (Table 2). CAL showed the best potential to discriminate LC from controls (AUC 0.759), followed by CEA (AUC 0.744), CYFRA 21.1 (AUC 0.734) and MMP-9 (AUC 0.729). EGF and sCD26 exhibited AUCs in the range of 0.7, while MMP-1 and MMP-7 demonstrated poor discriminatory capacity (AUC 0.597 and 0.627, respectively).
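
The ROC analysis and the Mann-Whitney U test used above are closely linked: the AUC of a single marker equals the U statistic divided by the number of case-control pairs. The sketch below illustrates that identity with a plain pairwise count; the function name `auc_from_pairs` and the toy concentrations are assumptions for illustration, not study data.

```python
def auc_from_pairs(case_values, control_values):
    """AUC as the fraction of case-control pairs ranked correctly.

    A pair counts 1 when the case value exceeds the control value and
    0.5 when they tie; the sum is the Mann-Whitney U statistic for the
    cases, and U / (n_cases * n_controls) is the area under the ROC curve.
    """
    u_statistic = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                u_statistic += 1.0
            elif case == control:
                u_statistic += 0.5
    return u_statistic / (len(case_values) * len(control_values))


# Toy values (hypothetical): higher marker levels in cases.
print(auc_from_pairs([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
```

For a marker that is lower in cancer, such as sCD26, the same function can be applied with the two groups swapped, or equivalently the result can be subtracted from 1.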

Marker Levels by Cancer Histology and Stage

As displayed in Table 3, LC cases were evaluated based on histology. After correction for multiple testing, statistically significant differences relative to controls were found for both non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) for sCD26, CEA and CYFRA 21.1 (Mann-Whitney U test, NSCLC vs controls: P = 0.002 for sCD26, CEA and CYFRA 21.1; SCLC vs controls: P = 0.040 for sCD26, P = 0.002 for CEA and P = 0.012 for CYFRA 21.1). Levels of EGF, CAL, MMP-7 and MMP-9 differed between NSCLC and controls (Mann-Whitney U test, P = 0.002 for EGF, CAL and MMP-9, P = 0.048 for MMP-7), but not between SCLC and controls (Mann-Whitney U test, P = 0.798 for EGF and MMP-9, P = 0.112 for CAL and P = 0.056 for MMP-7).
Table 3

Distribution of Serum Markers in Lung Cancer by Histology and Stage and Comparison with Controls in the Training Set.

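| Marker | Control/Case^a | Median | Range | P^b | Adjusted effect (95% CI)^c |
|---|---|---|---|---|---|
| EGF (pg/mL) | Control | 336.86 | 40.13–1187.06 | | |
| | NSCLC | 601.97 | 144.23–1176.15 | 0.002 | 0.277 (0.168 to 0.386)* |
| | NSCLC, Early (I + II) | 785.28 | 388.23–1159.73 | 0.002 | 0.409 (0.236 to 0.581)* |
| | NSCLC, Late (III + IV) | 528.51 | 144.23–1176.15 | 0.002 | 0.231 (0.109 to 0.354)* |
| | SCLC | 295.48 | 40.75–1716.30 | 0.798 | −0.039 (−0.293 to 0.216) |
| | SCLC, Limited | 55.97 | 40.75–201.63 | | |
| | SCLC, Extended | 440.64 | 264.30–1716.30 | | |
| sCD26 (ng/mL) | Control | 471.50 | 122.00–1092.00 | | |
| | NSCLC | 356.00 | 136.00–945.00 | 0.002 | −0.095 (−0.153 to −0.038)* |
| | NSCLC, Early (I + II) | 432.00 | 206.00–640.00 | 0.235 | −0.032 (−0.114 to 0.049) |
| | NSCLC, Late (III + IV) | 341.00 | 136.00–945.00 | 0.002 | −0.116 (−0.179 to −0.054)* |
| | SCLC | 294.00 | 208.00–1092.00 | 0.040 | −0.073 (−0.184 to 0.037) |
| | SCLC, Limited | 370.00 | 339.00–1192.00 | | |
| | SCLC, Extended | 288.00 | 208.00–541.00 | | |
| CAL (ng/mL) | Control | 127.87 | 7.56–421.23 | | |
| | NSCLC | 221.06 | 87.19–438.32 | 0.002 | 0.250 (0.160 to 0.341)* |
| | NSCLC, Early (I + II) | 193.72 | 120.73–340.67 | 0.017 | 0.188 (0.033 to 0.343)* |
| | NSCLC, Late (III + IV) | 238.59 | 87.19–438.32 | 0.002 | 0.275 (0.172 to 0.378)* |
| | SCLC | 245.69 | 48.33–422.20 | 0.112 | 0.203 (−0.014 to 0.419) |
| | SCLC, Limited | 97.32 | 91.06–422.20 | | |
| | SCLC, Extended | 279.34 | 48.33–355.19 | | |
| MMP-1 (pg/mL) | Control | 5459.61 | 1186.61–23960.37 | | |
| | NSCLC | 7132.52 | 1450.34–41668.33 | 0.093 | 0.071 (−0.049 to 0.192) |
| | NSCLC, Early (I + II) | 7800.42 | 1784.41–30180.33 | 0.235 | 0.087 (−0.106 to 0.279) |
| | NSCLC, Late (III + IV) | 7132.52 | 1450.34–41668.33 | 0.131 | 0.066 (−0.065 to 0.198) |
| | SCLC | 7135.23 | 2220.44–18900.42 | 0.279 | 0.083 (−0.158 to 0.323) |
| | SCLC, Limited | 7135.23 | 4890.35–18900.42 | | |
| | SCLC, Extended | 8114.52 | 2220.44–14246.29 | | |
| MMP-7 (pg/mL) | Control | 21761.35 | 5026.14–79977.27 | | |
| | NSCLC | 26485.45 | 5383.18–79809.13 | 0.048 | 0.021 (−0.052 to 0.095) |
| | NSCLC, Early (I + II) | 24615.56 | 8551.00–51000.43 | 0.726 | −0.031 (−0.136 to 0.074) |
| | NSCLC, Late (III + IV) | 26680.26 | 5383.18–79809.13 | 0.023 | 0.039 (−0.041 to 0.118) |
| | SCLC | 28177.70 | 8231.19–43429.13 | 0.056 | 0.052 (−0.082 to 0.186) |
| | SCLC, Limited | 26874.44 | 26710.65–40252.39 | | |
| | SCLC, Extended | 30110.14 | 8231.19–43429.13 | | |
| MMP-9 (ng/mL) | Control | 177.60 | 52.79–3611.59 | | |
| | NSCLC | 340.19 | 21.06–1914.00 | 0.002 | 0.257 (0.135 to 0.379)* |
| | NSCLC, Early (I + II) | 379.94 | 174.70–1688.00 | 0.002 | 0.285 (0.101 to 0.470)* |
| | NSCLC, Late (III + IV) | 299.65 | 21.06–1914.00 | 0.002 | 0.244 (0.106 to 0.382)* |
| | SCLC | 225.61 | 65.58–786.34 | 0.798 | 0.013 (−0.235 to 0.262) |
| | SCLC, Limited | 115.93 | 65.58–253.19 | | |
| | SCLC, Extended | 298.21 | 76.78–786.34 | | |
| CEA (pg/mL) | Control | 837.26 | 170.84–4070.71 | | |
| | NSCLC | 1783.93 | 141.16–136039.19 | 0.002 | 0.467 (0.266 to 0.668)* |
| | NSCLC, Early (I + II) | 1093.08 | 353.26–21684.85 | 0.140 | 0.154 (−0.050 to 0.358) |
| | NSCLC, Late (III + IV) | 2750.49 | 141.16–136039.19 | 0.002 | 0.567 (0.348 to 0.787)* |
| | SCLC | 3704.96 | 1147.58–82300.26 | 0.002 | 0.818 (0.541 to 1.094)* |
| | SCLC, Limited | 2092.47 | 1147.58–82300.26 | | |
| | SCLC, Extended | 3884.15 | 1667.40–29844.72 | | |
| CYFRA 21.1 (pg/mL) | Control | 227.05 | 0.00–19314.33 | | |
| | NSCLC | 2910.66 | 0.00–173410.17 | 0.002 | 0.939 (0.348 to 1.531)* |
| | NSCLC, Early (I + II) | 1181.22 | 0.00–7309.16 | 0.104 | 0.503 (−0.382 to 1.387) |
| | NSCLC, Late (III + IV) | 4791.34 | 0.00–173410.17 | 0.002 | 1.105 (0.459 to 1.752)* |
| | SCLC | 5886.75 | 0.00–12228.10 | 0.012 | 1.285 (0.154 to 2.415)* |
| | SCLC, Limited | 1754.30 | 219.64–6610.22 | | |
| | SCLC, Extended | 5965.40 | 0.00–12228.10 | | |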

Abbreviations: NSCLC = Non-Small Cell Lung Cancer, SCLC = Small Cell Lung Cancer.

aSample size in training set: Control n = 72, NSCLC n = 59 (Early stage n = 16, Late stage n = 43), SCLC n = 9 (Limited stage n = 3, Extended stage n = 6).

bMann-Whitney U test for comparison between the control group and lung cancer stratified by histology and stage, corrected by the Benjamini-Hochberg method to control the false discovery rate under multiple comparisons.

cAdjusted effects and 95% confidence intervals of histology and stage on each of the log-transformed markers, considered as outcome in a linear regression model adjusted for gender, age and smoking. *P-value statistically significant.

The potential of the markers to detect early-stage LC was analysed according to tumour stage (Table 3). EGF, CAL and MMP-9 were the only molecules significantly altered after multiple testing correction in NSCLC stage I + II (Mann-Whitney U test, P = 0.002 for EGF and MMP-9, P = 0.017 for CAL), suggesting their usefulness for diagnosis at the earliest stages. For late-stage NSCLC, all markers except MMP-1 displayed significant differences (Mann-Whitney U test, P = 0.002 for EGF, sCD26, CAL, MMP-9, CEA and CYFRA 21.1, P = 0.023 for MMP-7). Regarding SCLC, a dramatic reduction of EGF, CAL and MMP-9 levels in limited SCLC prevents the distinction from non-cancer patients. After adjusting for the common risk factors gender, age and smoking, the same associations were maintained, except that the association of SCLC with sCD26 was lost and MMP-7 was no longer significantly associated with either LC histology or NSCLC stage.

Association between Clinical Parameters and Marker Levels

The association of marker concentrations with clinical variables gender, age and smoking status is presented in Supplementary Table S1. sCD26 levels were significantly higher in women relative to men (Mann-Whitney U test, P = 0.018), with no other marker influenced by gender. Older age was associated significantly with higher levels of MMP-7 and CYFRA 21.1 (Mann-Whitney U test, P < 0.001 and P = 0.004, respectively), whereas sCD26 levels diminished with age. Patients with smoking habits had significantly increased serum concentrations of EGF and MMP-7 in relation to never smokers (Mann-Whitney U test, P = 0.028 for EGF and P = 0.026 for MMP-7).

Multimarker Panel and Classification Algorithm for Lung Cancer

Lasso regression was employed to simultaneously derive a multivariate panel of markers and an optimal cut-off for LC, with the criterion of maximizing specificity for a predefined sensitivity of 95%. The resulting classification rule and the optimal Lasso penalization parameter are available in Supplementary Material S1 and Supplementary Figure S2, respectively. Additionally, Supplementary Table S2 includes several diagnostic measurements for the proposed model. Differences between LC and controls in the proportion of males, smoking and age, together with the influence of these variables on marker levels, motivated their inclusion in the model. Application of the Lasso procedure to the training set led to a 4-marker panel composed of EGF, sCD26, CAL and CEA. A clinical model composed of gender, age and smoking was also established by logistic regression for comparison. Performance and ROC curves of this marker panel and of the single markers for LC diagnosis, together with the clinical model, are presented in Table 4 and Fig. 1, respectively.

The 4-marker panel demonstrated a good capacity to discriminate LC patients from controls, with an AUC of 0.873 and 97% sensitivity and 43% specificity for LC detection at a cut-off of 0.266. This combination of markers outperforms the individual markers in terms of specificity. The clinical model displayed a lower discriminatory ability than our proposed multivariate panel (AUC = 0.717 (0.637–0.799), DeLong test P-value < 0.0001). At the desired sensitivity of 95%, the decision rule based on the clinical model renders a poor specificity of 26%. Based on our model and assuming an LC prevalence of 44.4%, corresponding to the RDU of the Pneumology Service of EOXI Vigo, a high Negative Predictive Value (NPV) of 94.7% was reached, together with a moderate Positive Predictive Value (PPV) of 57.6%.
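
The cut-off criterion described above (maximize specificity subject to a minimum sensitivity) can be sketched for a fixed set of model scores. This is a simplified illustration under stated assumptions: the study tunes the Lasso penalty and the cut-off jointly, whereas this sketch covers only the cut-off step; `choose_cutoff` and the toy scores are hypothetical.

```python
def choose_cutoff(scores, labels, min_sensitivity=0.95):
    """Pick the cut-off that maximizes specificity at a required sensitivity.

    scores: predicted probabilities; labels: 1 = cancer, 0 = control.
    A subject is called positive when score > cutoff.
    Returns (cutoff, sensitivity, specificity).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    # Candidate cut-offs: every observed score, plus one value below the
    # minimum so that "everyone positive" is always a feasible rule.
    candidates = [min(scores) - 1.0] + sorted(set(scores))
    best = None
    for cutoff in candidates:
        sensitivity = sum(s > cutoff for s in cases) / len(cases)
        specificity = sum(s <= cutoff for s in controls) / len(controls)
        if sensitivity >= min_sensitivity:
            if best is None or specificity > best[2]:
                best = (cutoff, sensitivity, specificity)
    return best


# Toy scores (hypothetical): four cases followed by four controls.
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.4, 0.6]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(choose_cutoff(toy_scores, toy_labels, min_sensitivity=1.0))
```

Fixing the sensitivity first and accepting whatever specificity results reflects the stated priority of the unit: missing as few cancers as possible.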
Table 4

Performance of the Four-Marker Panel and EGF, sCD26, CAL and CEA in the Diagnosis of Lung Cancer.

| Model | Cut-off | Sn (%) | Sp (%) | PPV^a (%) | NPV^a (%) | AUC (95% CI)^b |
|---|---|---|---|---|---|---|
| Training Set | | | | | | |
| Multivariate Algorithm: EGF, sCD26, CAL, CEA | >0.266 | 97 | 43 | 57.6 | 94.7 | 0.873 (0.811–0.925) |
| EGF | >178.48 pg/mL | 95 | 22.2 | 49.4 | 84.8 | 0.698 (0.615–0.773) |
| sCD26 | ≤637.2 ng/mL | 95 | 13.9 | 46.8 | 77.7 | 0.711 (0.629–0.785) |
| CAL | >96.37 ng/mL | 95 | 30.6 | 52.2 | 88.4 | 0.759 (0.679–0.827) |
| CEA | >258.2 pg/mL | 95 | 11.1 | 46 | 73.6 | 0.744 (0.663–0.814) |
| Clinical Model^c | >0.237 | 95 | 26.4 | 50.8 | 86.9 | 0.717 (0.637–0.799) |
| Validation Set | | | | | | |
| Multivariate Algorithm: EGF, sCD26, CAL, CEA | >0.266 | 91.7 | 45.4 | 57.3 | 87.3 | 0.837 (0.718–0.936) |
| Clinical Model^c | >0.237 | 91.7 | 27.3 | 50.2 | 80.5 | 0.659 (0.488–0.816) |

Abbreviations: Sn = Sensitivity, Sp = Specificity, PPV = Positive Predictive Value, NPV = Negative Predictive Value.

aPositive and negative predictive values were estimated assuming a prevalence of lung cancer of 44.4% (QDU of the Pneumology Service of Hospital Álvaro Cunqueiro EOXI Vigo).

bAUC and 95% CI evaluated in the training set are not protected against overfitting.

cClinical model includes gender, age and smoking.
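
The predictive values in Table 4 follow from sensitivity, specificity and the assumed prevalence through Bayes' rule. The sketch below reproduces that arithmetic for the training-set panel; the function name `predictive_values` is an assumption for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence.

    PPV = Sn * prev / (Sn * prev + (1 - Sp) * (1 - prev))
    NPV = Sp * (1 - prev) / (Sp * (1 - prev) + (1 - Sn) * prev)
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Training-set panel: Sn 97%, Sp 43%, assumed prevalence 44.4%.
ppv, npv = predictive_values(0.97, 0.43, 0.444)
print(f"PPV = {100 * ppv:.1f}%, NPV = {100 * npv:.1f}%")  # PPV = 57.6%, NPV = 94.7%
```

Because both values depend on prevalence, they would change if the panel were applied in a population with a different proportion of LC.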

Figure 1

ROC Curve Analysis for Lung Cancer Prediction in the Training Set.

ROC curves are shown for each individual marker included in the classification algorithm, together with the clinical model and the 4-marker panel derived from logistic Lasso regression. Training set included 68 lung cancer cases and 72 controls (36 healthy and 36 benign respiratory pathologies).

To further verify the performance of the 4-marker panel for prediction of LC, the resulting classification algorithm developed in the training set was tested in an independent validation set. Descriptive statistics for each marker of the panel are given in Supplementary Table S3 according to histology and stage. In the validation set the marker panel showed an AUC of 0.837, with a sensitivity of 91.7% and a moderately higher specificity of 45.4%, based on the 0.266 cut-off established (Table 4). Regarding the clinical model, the inferior discriminatory capacity was again evidenced by the AUC of 0.659 (0.488–0.816) (DeLong test P-value = 0.0003).
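
Applying a frozen classification rule to a new set of subjects can be sketched as below. The actual coefficients are given in Supplementary Material S1 and are not reproduced here, so the example uses abstract placeholder predictors and weights; only the 0.266 cut-off comes from the text. All function names are hypothetical, and the sketch assumes a standard logistic model.

```python
import math


def predicted_probability(predictors, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    linear = intercept + sum(coefficients[name] * value
                             for name, value in predictors.items())
    return 1.0 / (1.0 + math.exp(-linear))


def classify(probability, cutoff=0.266):
    """Positive (suspected LC) when the probability exceeds the cut-off."""
    return probability > cutoff


def sensitivity_specificity(probabilities, labels, cutoff=0.266):
    """Evaluate the frozen rule; labels: 1 = cancer, 0 = control."""
    calls = [classify(p, cutoff) for p in probabilities]
    n_cases = sum(1 for y in labels if y == 1)
    n_controls = sum(1 for y in labels if y == 0)
    true_pos = sum(1 for c, y in zip(calls, labels) if y == 1 and c)
    true_neg = sum(1 for c, y in zip(calls, labels) if y == 0 and not c)
    return true_pos / n_cases, true_neg / n_controls


# Placeholder predictors and weights (hypothetical, NOT the published model).
placeholder_weights = {"marker_a": 1.0, "marker_b": -1.0}
p = predicted_probability({"marker_a": 0.5, "marker_b": 0.5},
                          placeholder_weights, intercept=0.0)
print(p, classify(p))
```

The key point of validation is that neither the weights nor the cut-off are re-estimated on the new subjects.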

Classification Accuracy of the 4-Marker Panel for Lung Cancer and Control Subgroups

To assess the performance of our classification rule in more depth, we examined its ability to correctly classify specific subgroups of LC patients and controls (Table 5). Training and validation populations were combined, and sensitivity for the histological subgroups and stages was calculated at a fixed 43% specificity (0.266 cut-off). The classification rule identified stage I NSCLC (94.7%) and stage II NSCLC (100%) with high sensitivity, similar to that for the advanced stages III and IV (95.2% and 94.6%, respectively). The most prevalent NSCLC type, adenocarcinoma (ADC), was also detected with high sensitivity (93.2%), as were Squamous Cell Carcinoma (SqCC) and Large Cell Carcinoma (LCC) (100% for both). All patients with SCLC were likewise detected (100% sensitivity).
Table 5

Classification Accuracy of the Multivariate Algorithm for Subgroups of Patients in the Combined Set (Training and Validation Set).

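| Group | Subgroup | Cases correctly classified^a / Total cases | % Sn at 43% Sp |
|---|---|---|---|
| Lung Cancer | | 88/92 | 95.6 |
| NSCLC | | 76/80 | 95 |
| | I | 18/19 | 94.7 |
| | II | 3/3 | 100 |
| | III | 20/21 | 95.2 |
| | IV | 35/37 | 94.6 |
| | ADC | 41/44 | 93.2 |
| | SqCC | 20/20 | 100 |
| | LCC | 13/13 | 100 |
| | BAC | 1/2 | 50 |
| | ND | 1/1 | 100 |
| SCLC | | 12/12 | 100 |

| Group | Subgroup | Cases correctly classified^a / Total cases | % Sp at 95% Sn |
|---|---|---|---|
| Control | | 41/94 | 43.6 |
| Healthy | | 18/44 | 40.9 |
| Benign | | 23/50 | 46 |
| | RI | 19/41 | 46.3 |
| | ILD | 4/9 | 44.4 |

Accuracy in the Classification of Cancer and Controls regarding CT imaging

| Absence of Nodules in CT scan | Subgroup | Cases correctly classified^a / Total cases | % Sn at 43% Sp |
|---|---|---|---|
| Lung Cancer | | 7/7 | 100 |
| NSCLC | | 6/6 | 100 |
| | I | 1/1 | 100 |
| | II | 2/2 | 100 |
| | III | 1/1 | 100 |
| | IV | 2/2 | 100 |
| SCLC | Extended | 1/1 | 100 |

| Presence of Nodules in CT scan | Subgroup | Cases correctly classified^a / Total cases | % Sp at 95% Sn |
|---|---|---|---|
| Control | | 1/3 | 33.3 |
| Healthy | | 0/1 | 0 |
| Benign | | 1/2 | 50 |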

Abbreviations: Sn = Sensitivity, Sp = Specificity, NSCLC = Non Small Cell Lung Cancer, ADC = Adenocarcinoma, SqCC = Squamous Cell Carcinoma, LCC = Large Cell Carcinoma, BAC = Bronchioloalveolar Carcinoma, ND = Not Differentiated Carcinoma, SCLC = Small Cell Lung Cancer, RI = Respiratory Infection, ILD = Interstitial Lung Disease.

aCut-off p = 0.266 for a sensitivity of 97% and specificity of 43% in the training set.

Among non-cancerous patients, the panel correctly classified 41 out of 94 controls (43.6%), yielding a specificity of 40.9% for healthy subjects and 46% for benign conditions of the lung. Table 5 also includes the classification accuracy based on the results of the CT scan, specifically when no mass was detected. When, additionally, no nodules were found, the panel correctly classified all LC cases (7/7; 100% sensitivity). Conversely, in the presence of nodules, our panel correctly classified 1 out of 3 controls (33.3% specificity).
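
The subgroup figures in Table 5 are simple proportions of correctly classified subjects, and the subgroup counts should add up to their parent rows. The sketch below encodes the counts from Table 5 and checks that consistency; the dictionary layout and helper names are assumptions for illustration.

```python
# (correctly classified, total) per row, copied from Table 5.
TABLE5 = {
    "Lung Cancer": (88, 92), "NSCLC": (76, 80), "SCLC": (12, 12),
    "I": (18, 19), "II": (3, 3), "III": (20, 21), "IV": (35, 37),
    "ADC": (41, 44), "SqCC": (20, 20), "LCC": (13, 13),
    "BAC": (1, 2), "ND": (1, 1),
    "Control": (41, 94), "Healthy": (18, 44), "Benign": (23, 50),
    "RI": (19, 41), "ILD": (4, 9),
}


def percent_correct(row):
    """Percentage of subjects in a row that the rule classified correctly."""
    correct, total = TABLE5[row]
    return 100.0 * correct / total


def pooled(rows):
    """Sum the (correct, total) counts over several rows."""
    return (sum(TABLE5[r][0] for r in rows), sum(TABLE5[r][1] for r in rows))


# Subgroups should add up to their parent rows.
print(pooled(["NSCLC", "SCLC"]) == TABLE5["Lung Cancer"])
print(pooled(["I", "II", "III", "IV"]) == TABLE5["NSCLC"])
print(pooled(["Healthy", "Benign"]) == TABLE5["Control"])
print(round(percent_correct("Control"), 1))
```

For cancer rows the percentage is a sensitivity; for control rows it is a specificity, since a control is classified correctly when the rule returns a negative result.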

Discussion

Classification algorithms capable of guiding clinical decision-making constitute a valuable tool that can help predict LC, besides complement CT imaging1718. In a previous work we described a three-marker panel for high-risk patients including the molecules EGF, sCD26 and CAL, and gender and age as confounders, and their implication in lung carcinogenesis was enclosed1920212223. Here we provide an improved classification algorithm achieving a superior sensitivity for LC in the context of RDU. In this refined algorithm, besides the smoking status, the routinely used CEA was incorporated, corroborating its diagnostic capacity especially for late-stage tumours4242526. Briefly, our approach involves the measurement of EGF, sCD26, CAL and CEA to generate a classification score for each individual to predict LC. As for colorectal and breast cancer, LC could also benefit from screening programs. However, at this time in Europe there are no LC screening recommendations though The European Society of Radiology and the European Respiratory Society recommend screening within a clinical trial or in routine clinical practice at certified medical centres6. Instead, the strategy implemented in many European hospitals to achieve an early detection is the acceleration in the time to diagnosis in the so called Rapid Diagnostic Units for LC1011121516. Consequently, we intended to design a marker-based classification algorithm to be used in these Units, where the priority is to detect all LC cases (high sensitivity), in order to select those patients that should be immediately submitted to more invasive tests. Individual analyses evidenced the usefulness of EGF, sCD26, CAL and CEA among the 8 molecules assayed, with AUCs between 0.698–0.759 for the training set and 0.716-0.871 for the validation set, headed in both cohorts by CAL. Among the four markers, differences were more frequent comparing NSCLC and controls, even at early stages as in the case of EGF and CAL. 
In relation to SCLC, sCD26 and CEA were the markers that better differentiated this histological group. The individual diagnostic potential of the four markers resulted in a modestly specific signature for the detection of LC when combined through a multivariate logistic Lasso regression approach that provided, by design, desirable sensitivity. This strategy demonstrated 97% sensitivity in the training set and for a >0.266 cut-off the classification algorithm showed a specificity of 43%. In the validation set sensitivity resulted in a fine 91.7% and 45.4% specificity. This modest specificity is of value in the clinical context of RDU with patients with respiratory symptoms and/or LC suspicion. Performance of our marker panel also outperforms that of a clinical model constituted by gender, age and smoking. The classification accuracy including training and validation cohorts showed an overall sensitivity of 95.6% for LC. Among the 95% of NSCLC patients correctly classified, 94.7% of stage I tumours were detected. Regarding SCLC, the classification algorithm was effective for all the cases. Among controls, overall specificity resulted 43.6% and was not greatly affected by the nature of the controls themselves. Given the clinical dilemma of indeterminate nodules detected on CT-based screening due to elevated false positive rates8, we also evaluated our algorithm according to the absence/presence of nodules. All LC cases (100%) that had a negative CT-scan were correctly classified (6 out of 6 NSCLC and the SCLC case). On the contrary, among controls bearing nodules, our panel classified correctly 1 out of 3 patients. It should be noted that CT-scan data was available for all LC cases but only for 13.8% of controls (3 healthy and 10 benign cases), limiting the analysis. 
In the last years several diagnostic multianalyte panels have been proposed for LC, with variable criteria for patient selection such as inclusion limited to NSCLC or absence of controls bearing benign pulmonary pathologies. Studies comparable to ours, at least with similar study population, are scarce. Molina et al.25 proposed a six marker panel (CEA, CA 15.3, Squamous Cell Carcinoma Antigen –SCC–, CYFRA 21.1, Neuron Specific EnolaseNSE– and Progastrin-releasing Peptide –ProGrp–) for patients with suspected LC based on the criterion of any of the markers elevated, proving a sensitivity of 88.5% and specificity of 82%, not validated. Other studies document protein models combined with CT imaging techniques. Yang et al.24 reported for high-risk patients with no lesions on CT scan a panel, which resulted positive when at least one of the markers CEA, SCC, CYFRA 21.1 and Progastrin-releasing Peptide was altered, yielding a sensitivity of 76.6% and specificity of 94.4%, though they do not report data on another independent sample set. The algorithm established by Patz et al.27 based on the combination of nodule size and CEA, alpha-antitrypsin and SCC, rendered acceptable performance for classifying patients with indeterminate nodules (92% sensitivity and 74% specificity). To date, only two blood tests based on marker panels have been translated into clinical or commercial setting. The EarlyCTD-Lung, which measures autoantibodies, was developed for the early detection of LC in high-risk population or as adjunct to CT28. Its performance was demonstrated in clinical practice, yielding 41% sensitivity and 87% specificity. The PAULA’s test (Protein Assays Using Lung cancer Analytes) is a 4-marker panel comprising three tumour antigens (CEA, CA125 and CYFRA 21.1) and one autoantibody (NY-ESO1), intended for early NSCLC tumours in high-risk patients. 
In a validation set the panel discriminated NSCLC (with 67% early-stage) from healthy controls with a sensitivity of 77% and specificity of 80%. However its clinical applicability is limited since benign conditions were not included4. None of the cited studies pursued such a high sensitivity as we do, which would probably derive in a diminished specificity. In these circumstances, we would affirm the promising value of our 4-marker panel. Our model building procedure is based on regularized regression models which are intended to be more flexible and resistant to overfitting compared to stepwise approaches293031. Furthermore, by design, our method identifies models which guarantee the optimization of the derived classification rule by choosing the penalty parameter and cut-off which maximizes specificity, assuring a predefined sensitivity. The adaptive nature of our method constitutes one of the strengths of our study. Alternative model building techniques established by first choosing a logistic model based on a reduced set of variables by minimizing the AIC or BIC, and then determining a cut-off depending on the given classification setting, are less flexible, and in general our method outperforms these approaches since it is specifically designed for optimizing classification performance. Moreover, approaches based on exhaustive evaluation of all possible sub-models become rapidly unfeasible when increasing the number of candidate markers, whereas our approach, since it relies on shrinkage, is expected to perform well in such situations. In Supplementary Table S4 we have included two logistic regression models (two-stage) and derived classification rules for selected 90% and 95% sensitivities. As observed, the obtained models and classification rules are not uniformly optimal and their performance varies according to the classification situation. 
For example, the classification rule based on BIC performs well for the cut-off that provides 95% sensitivity, while its performance is considerably worse for the cut-off corresponding to 90% sensitivity. Alternatively, if we focus on the AIC as optimal criterion, this method outperforms alternative Lasso-based and BIC for 90% sensitivity, but it presents an inferior performance when we focus on higher sensitivities. Given the complex challenge of developing an optimal diagnostic panel for LC, a proper study design is also of crucial importance. Besides the consciousness in the statistical approach aforementioned, the inclusion of both benign and healthy individuals in the control group, as well as the two main histological tumours (NSCLC and SCLC) is a strong point of our study. Another important feature is that samples from all the individuals were prospectively collected at their first visit to the Pneumology Service in the presence of respiratory symptoms, reflecting the clinical setting of a RDU. For the refinement of the diagnostic algorithm we have also included information related to tobacco, which constitutes a well-established risk factor32 and is usually not contemplated in studies developing diagnostic panels. One of the advantages of our classification algorithm is that only 4 molecules comprise the panel, and 2 of them are already established in hospitals: CEA is routinely measured for various types of tumours, while CAL is also quantified for its utility in inflammatory colon processes33. This makes our 4-marker panel simple and affordable to guide clinical decision-making and complement CT scan. Additionally, we are currently working on an interactive web application to facilitate the implementation of the classification algorithm in the biomedical community, based on the Shiny web application for R34. Regarding the limitations of the study, the number of patients was modest for both training and validation sets, particularly for SCLC cases. 
A possible shortcoming could be the lack of information related to tobacco consumption, which perhaps could have contributed to the improvement of the diagnostic algorithm. In summary, we defined a modestly specific 4-marker classification algorithm that provided, by design, the desired sensitivity for the detection of LC, conceived to be useful among symptomatic high-risk individuals referred to LC-RDUs. The next step along the complicated road towards clinical implementation is the validation of our panel in a large, multicentre cohort.

Methods

Study Population

Between May 2007 and January 2011, 186 patients with respiratory symptoms were prospectively recruited at the Pneumology Service of Hospital Álvaro Cunqueiro EOXI Vigo (Spain). The study population included patients finally diagnosed with LC, and a control cohort comprising subjects diagnosed with benign lung disease and healthy subjects with no respiratory pathology. Exclusion criteria included relapse or progression of a previously diagnosed cancer, and chemo- or radiotherapy treatment. Clinical guidelines from the American College of Chest Physicians were followed for LC diagnosis1314. Histological assessment of tumours followed the WHO criteria35 and staging was performed according to the 7th edition of TNM36. Recruited individuals were divided into a training set for panel development and a separate set for validation of the algorithm. The training set consisted of 140 individuals and included 68 LC cases (80.9% men, median age 69.5 years). The control cohort included 72 subjects with a median age of 61 years, 63.6% of whom were men. The validation set consisted of 46 individuals (24 LC and 22 controls). Patient demographics are outlined in Supplementary Table S5. The study followed the clinical-ethical practices of the Spanish Government and the Helsinki Declaration, and was approved by the Galician Ethical Committee for Clinical Research. All patients provided written informed consent.

Determination of Markers Concentration

Blood samples were collected from all patients at their first visit to the Service, when bronchoscopy was performed. Serum was obtained and stored at −20 °C until analysis. Concentrations of EGF (R&D Systems, Minneapolis, USA), sCD26 (eBioscience, Wien, Austria) and Calprotectin (Hycult Biotechnology, Uden, the Netherlands) were measured using enzyme-linked immunosorbent assays (ELISA). Absorbance readings were collected on an EnVision Multilabel Plate Reader (Perkin Elmer). To measure serum MMPs, CEA and CYFRA 21.1, multiplexed bead-based immunoassays were used. Levels of MMP-1, MMP-7 and MMP-9 were determined with the Human MMP Panel 2 Magnetic Bead Kit, while CEA and CYFRA 21.1 were part of the Circulating Cancer Biomarker Magnetic Bead Panel 1 (EMD Millipore, Missouri, USA). Fluorescence was read on a Luminex 200™ with BioPlex Manager™ software (Bio-Rad, Hercules, CA), using a 5-parameter logistic fit to derive the protein concentration in samples.
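To illustrate the 5-parameter logistic (5PL) fit mentioned above, the following minimal Python sketch shows one common parameterization of the 5PL standard curve and its inverse, which converts a measured signal back into a concentration. The parameter names (`a`, `b`, `c`, `d`, `g`) and all numeric values are hypothetical; the exact parameterization and fitted values used by the instrument software are not given in the text.

```python
def five_pl(x, a, b, c, d, g):
    """Signal predicted by a 5PL curve at concentration x (x >= 0).

    a: response at zero concentration, d: response at infinite concentration,
    c: mid-range concentration scale, b: slope factor, g: asymmetry factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Concentration that yields signal y (y strictly between a and d)."""
    ratio = (a - d) / (y - d)
    return c * (ratio ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical standard-curve parameters, for illustration only.
params = dict(a=20.0, b=1.5, c=100.0, d=10000.0, g=0.8)

signal = five_pl(50.0, **params)              # signal of a 50-unit standard
recovered = inverse_five_pl(signal, **params)  # back-calculated concentration
```

In practice the five parameters would first be estimated from the standards of each plate; only the back-calculation step is sketched here.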

Statistical Methods

Individual Marker Analysis

The non-parametric Mann-Whitney U test was used for two-sample group comparisons of continuous variables, while Fisher's exact test was applied for comparison of qualitative variables. The Benjamini-Hochberg method for controlling the false discovery rate37 was used to correct P-values for multiple group comparisons. Linear regression models were used to study the association of markers with LC presence, histology and stage, adjusted for the risk factors gender, age and smoking. The discriminatory ability of markers for LC was evaluated by Receiver Operating Characteristic (ROC) curves, providing the Area Under the Curve (AUC). All tests were two-sided and P-values ≤ 0.05 were considered statistically significant. Statistical software SPSS 22.0 (SPSS Inc., Chicago, IL) and R program package (Wirtschafts Universität, Wien, Austria) were used to perform these analyses.
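Two of the quantities named above can be sketched in a few lines of Python: Benjamini-Hochberg adjusted P-values, and the AUC computed through its equivalence with the Mann-Whitney statistic (the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half). This is a minimal illustration with made-up numbers, not the SPSS or R routines used in the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def auc(case_values, control_values):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    wins = sum(
        (c > h) + 0.5 * (c == h)
        for c in case_values
        for h in control_values
    )
    return wins / (len(case_values) * len(control_values))


# Made-up example values.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
marker_auc = auc([3.0, 4.0, 5.0], [1.0, 2.0, 4.0])
```

The pairwise AUC is quadratic in the sample size, which is adequate for cohorts of this size; rank-based implementations are preferable for larger data sets.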

Marker Panel Selection and Classification Algorithm Development

Marker concentrations were log10-transformed before multivariate analysis to reduce skewness. We derived a classification rule from a multivariate combination of the studied markers using logistic Lasso regression38, fitted in the training set and including age, gender and smoking as fixed effects. This procedure was also used to obtain a clinical model in which only the variables age, gender and smoking were included. Lasso regularization imposes a penalty on the maximum likelihood estimates of the usual regression coefficients so that they are shrunk towards zero. In fact, some of the resulting coefficients can be exactly zero, and hence Lasso shrinkage performs automatic variable selection. The amount of shrinkage is controlled by the penalization parameter, which is selected to maximize the out-of-sample performance of the model (in terms of some pre-defined criterion). In our algorithm, we simultaneously chose the penalty parameter and the cut-off point that provide the classification rule with maximum specificity, given a fixed value of sensitivity (95%). Namely, we used 10-fold cross-validation in the training set, and for each possible value of the penalty parameter we applied the resulting estimated coefficients to the out-of-sample data, obtaining case probability scores for each observation of the training set. Each of these sets of scores was subsequently dichotomized to guarantee the desired level of sensitivity. Finally, we chose the penalty parameter that maximized the specificity. Further details concerning the Lasso procedure and the proposed algorithm are provided in Supplementary Material S1. For prediction of a new individual’s diagnosis, the selected classification rule was applied. 
Based on the coefficients of the regression model, the classification algorithm calculates for a new patient a single score, the estimated predicted probability (p) of presenting lung cancer, as a function of marker concentrations and demographic variables. A new individual is classified as cancer if p is higher than the cut-off estimated in the training set, and as non-cancer if the resulting score is below the cut-off. The Lasso regression model was applied to the training and validation sets to obtain their probability scores, and the diagnostic performance of the classification rule was evaluated by providing sensitivity, specificity and predictive values. ROC curves were constructed for both the Lasso-based marker model and the clinical model, providing the AUC. The DeLong test was applied to compare the AUC values of these models39. For the sake of comparison, we evaluated the performance of two alternative two-stage methods that first select an optimal logistic model by exhaustive sub-model evaluation, minimizing the Akaike information criterion (AIC)40 or the Bayesian information criterion (BIC)41, respectively, and then determine the cut-off that guarantees the desired level of sensitivity. All multivariate calculations were performed using the R program package (Wirtschafts Universität, Wien, Austria).
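The selection step described in this section can be sketched as follows. The sketch assumes that out-of-sample probability scores have already been obtained by cross-validation for each candidate penalty value; fitting the Lasso models themselves is not shown. All function names, coefficients and scores are hypothetical, the toy target sensitivity is 75% rather than the 95% used in the study, and treating a score exactly equal to the cut-off as positive is an assumption, since the text only specifies "higher than" and "below".

```python
import math


def predicted_probability(intercept, coefficients, features):
    """Logistic score p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefficients[k] * features[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


def cutoff_for_sensitivity(case_scores, target):
    """Largest cut-off at which at least `target` of the cases score >= cut-off."""
    ranked = sorted(case_scores, reverse=True)
    needed = math.ceil(target * len(ranked) - 1e-9)  # cases that must be detected
    return ranked[max(needed, 1) - 1]


def specificity(control_scores, cutoff):
    """Fraction of controls scoring below the cut-off."""
    return sum(s < cutoff for s in control_scores) / len(control_scores)


def select_rule(scores_by_penalty, labels, target):
    """Pick the (penalty, cut-off) pair with maximum specificity at the target sensitivity."""
    best = None
    for penalty, scores in scores_by_penalty.items():
        cases = [s for s, y in zip(scores, labels) if y == 1]
        controls = [s for s, y in zip(scores, labels) if y == 0]
        cutoff = cutoff_for_sensitivity(cases, target)
        spec = specificity(controls, cutoff)
        if best is None or spec > best[2]:
            best = (penalty, cutoff, spec)
    return best


def classify(p, cutoff):
    """Apply the classification rule to a new individual's score (ties count as cancer)."""
    return "cancer" if p >= cutoff else "non-cancer"


# Toy out-of-sample scores for two hypothetical penalty values (1 = LC, 0 = control).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_by_penalty = {
    0.01: [0.9, 0.8, 0.5, 0.3, 0.6, 0.4, 0.2, 0.1],
    0.10: [0.9, 0.6, 0.45, 0.4, 0.55, 0.5, 0.3, 0.2],
}
penalty, cutoff, spec = select_rule(scores_by_penalty, labels, target=0.75)

# Hypothetical one-marker model applied to a new patient (log10-transformed input).
p_new = predicted_probability(-1.0, {"log10_marker": 2.0},
                              {"log10_marker": math.log10(10.0)})
label_new = classify(p_new, cutoff)
```

Fixing the sensitivity first and then comparing penalties by specificity is what distinguishes this procedure from the more common choice of the penalty by cross-validated deviance or AUC.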

Additional Information

How to cite this article: Blanco-Prieto, S. et al. Highly Sensitive Marker Panel for Guidance in Lung Cancer Rapid Diagnostic Units. Sci. Rep. 7, 41151; doi: 10.1038/srep41151 (2017). Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
  43 in total

Review 1.  CD26, let it cut or cut it down.

Authors:  I De Meester; S Korom; J Van Damme; S Scharpé
Journal:  Immunol Today       Date:  1999-08

Review 2.  The carcinoembryonic antigen (CEA) family: structures, suggested functions and expression in normal and malignant tissues.

Authors:  S Hammarström
Journal:  Semin Cancer Biol       Date:  1999-04       Impact factor: 15.707

3.  Tumour-mediated upregulation of chemoattractants and recruitment of myeloid cells predetermines lung metastasis.

Authors:  Sachie Hiratsuka; Akira Watanabe; Hiroyuki Aburatani; Yoshiro Maru
Journal:  Nat Cell Biol       Date:  2006-11-26       Impact factor: 28.824

4.  Invasive mediastinal staging of lung cancer: ACCP evidence-based clinical practice guidelines (2nd edition).

Authors:  Frank C Detterbeck; Michael A Jantz; Michael Wallace; Johan Vansteenkiste; Gerard A Silvestri
Journal:  Chest       Date:  2007-09       Impact factor: 9.410

5.  Noninvasive staging of non-small cell lung cancer: ACCP evidenced-based clinical practice guidelines (2nd edition).

Authors:  Gerard A Silvestri; Michael K Gould; Mitchell L Margolis; Lynn T Tanoue; Douglas McCrory; Eric Toloza; Frank Detterbeck
Journal:  Chest       Date:  2007-09       Impact factor: 9.410

Review 6.  S100A8/A9: a Janus-faced molecule in cancer therapy and tumorgenesis.

Authors:  Saeid Ghavami; Seth Chitayat; Mohammad Hashemi; Mehdi Eshraghi; Walter J Chazin; Andrew J Halayko; Claus Kerkhoff
Journal:  Eur J Pharmacol       Date:  2009-10-14       Impact factor: 4.432

7.  Determination of the serum matrix metalloproteinase-9 (MMP-9) and tissue inhibitor of matrix metalloproteinase-1 (TIMP-1) in patients with either advanced small-cell lung cancer or non-small-cell lung cancer prior to treatment.

Authors:  Cynthia Jumper; Everardo Cobos; Charles Lox
Journal:  Respir Med       Date:  2004-02       Impact factor: 3.415

8.  Quick diagnosis units: a potentially useful alternative to conventional hospitalisation.

Authors:  Xavier Bosch; Jesús Aibar; Santiago Capell; Antonio Coca; Alfons López-Soto
Journal:  Med J Aust       Date:  2009-11-02       Impact factor: 7.738

9.  Prognostic significance of matrix metalloproteinase-1 levels in peripheral plasma and tumour tissues of lung cancer patients.

Authors:  Min Li; Ting Xiao; Ying Zhang; Lin Feng; Dongmei Lin; Yu Liu; Yousheng Mao; Suping Guo; Naijun Han; Xuebing Di; Kaitai Zhang; Shujun Cheng; Yanning Gao
Journal:  Lung Cancer       Date:  2010-01-08       Impact factor: 5.705

10.  Role for dipeptidyl peptidase IV in tumor suppression of human non small cell lung carcinoma cells.

Authors:  Umadevi V Wesley; Shakuntala Tiwari; Alan N Houghton
Journal:  Int J Cancer       Date:  2004-05-10       Impact factor: 7.396

