Adrian Levitsky1,2, Maria Pernemalm2, Britt-Marie Bernhardson1, Jenny Forshed2, Karl Kölbeck3, Maria Olin3, Roger Henriksson4, Janne Lehtiö2, Carol Tishelman1,5,6, Lars E Eriksson7,8,9.
Abstract
The aim of this study was to identify a combination of early predictive symptoms/sensations attributable to primary lung cancer (LC). An interactive e-questionnaire comprising pre-diagnostic descriptors of first symptoms/sensations was administered to patients referred for suspected LC. Respondents were included in the present analysis only if they later received a primary LC diagnosis or had no cancer, and inclusion of each descriptor required ≥4 observations. Fully-completed data from 506/670 individuals later diagnosed with primary LC (n = 311) or no cancer (n = 195) were modelled with orthogonal projections to latent structures (OPLS). After analysing the 145/285 descriptors meeting inclusion criteria through randomised seven-fold cross-validation (six-fold training set: n = 433; test set: n = 73), 63 provided best LC prediction. The most-significant LC-positive descriptors included a cough that varied over the day, back pain/aches/discomfort, early satiety, appetite loss, and having less strength. Upon combining the descriptors with the background variables current smoking, a cold/flu or pneumonia within the past two years, female sex, older age, a history of COPD (positive LC-association); antibiotics within the past two years, and a history of pneumonia (negative LC-association); the resulting 70-variable model had accurate cross-validated test set performance: area under the ROC curve = 0.767 (descriptors only: 0.736/background predictors only: 0.652), sensitivity = 84.8% (73.9/76.1%, respectively), specificity = 55.6% (66.7/51.9%, respectively). In conclusion, accurate prediction of LC was found through 63 early symptoms/sensations and seven background factors. Further research and precision in this model may lead to a tool for referral and LC diagnostic decision-making.
Year: 2019 PMID: 31712735 PMCID: PMC6848139 DOI: 10.1038/s41598-019-52915-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. CONSORT flow diagram: The PEX-LC lung cancer investigation cohort. This figure is based on the CONSORT 2010 flow diagram. As this was not a randomised intervention trial, it has been modified to suit this cohort study accordingly. Primary LC: primary lung cancer (no other cancer); NC: no cancer; NSCLC: non-small cell lung cancer (adenocarcinoma, n = 200; squamous cell carcinoma, n = 45; not otherwise specified (NOS), n = 5; other NSCLC (adenosquamous lung carcinoma (n = 4), large cell neuroendocrine carcinoma (n = 3); large cell carcinoma, adenoid cystic carcinoma of the lung, adenoid carcinoma with neuroendocrine differentiation, and mucoepidermoid carcinoma of the lung (n = 1, respectively)); SCLC: Small cell lung cancer (includes one individual with combined SCLC) (n = 24); Other LC: carcinoid, n = 9; no histology, n = 17.
*Not meeting inclusion criteria: translator required (n = 50), consent withdrawn/missing (n = 15); missing data (n = 5); other reasons such as pain, illness, or other medical condition (n = 25).
¹ Other reasons: Limited time of the visit or lack of resources (staff) at the clinic (n = 47); hospitalisations (n = 34); deaths (n = 20).
² Other: Medical records non-consent (n = 4); unconfirmed, possible lung cancer (n = 3); undiagnosed cancer (n = 2); death before clinical investigation (n = 1); participant withdrew clinical investigation (n = 2); previous lung cancer (n = 1); incomplete modules (n = 12).
Primary LC: Current/previous comorbidities include Crohn’s disease, diabetes, gout, lymphedema, pulmonary fibrosis, fibromyalgia, sarcoidosis (n = 1, respectively); rheumatoid arthritis (n = 2); asbestos-related disease (n = 3); heart disease or anaemia (n = 4, respectively); chronic bronchitis (n = 5); angina pectoris (n = 15); emphysema (n = 17); pulmonary oedema (n = 33); asthma (n = 35); chronic obstructive lung disease (COPD, n = 70); pneumonia (n = 73); no comorbidities/unknown (n = 113).
NC (no malignant cancer): Diagnoses included Castleman’s disease, empyema, systemic lupus erythematosus, gout, polymyositis, previous granulomatosis with polyangiitis, haemoptysis, tuberculosis (n = 1, respectively); benign hamartoma, resected benign hamartoma, tularaemia (n = 2, respectively); diabetes, sarcoidosis (n = 3, respectively). Current/previous conditions, NC: asbestos-related disease, bronchitis, kidney failure or lung embolism (n = 1, respectively); anaemia (n = 3); chronic bronchitis (n = 5); emphysema (n = 6); angina pectoris (n = 7); pulmonary oedema (n = 25); heart disease or COPD (n = 26, respectively); asthma (n = 34); pneumonia (n = 58); no diagnosis/unknown (n = 73).
Table 1. Patient characteristics in the total PEX-LC cohort.
| Variable | Analysed (n = 506)ᵃ | Excluded (n = 164)ᵃ | P value |
|---|---|---|---|
| Age, years (median (IQR)) | 70 (63–75) | 72 (64.3–78) | |
| Sex, females | 249 (49.2) | 80 (48.8) | 0.924 |
| Current smokers* | 148 (29.2) | 28 (17.1) | |
| Confirmed history of asthma | 68 (13.4) | 13 (7.9) | 0.060 |
| Confirmed history of COPD | 93 (18.4) | 20 (12.2) | 0.066 |
| Confirmed history of pneumonia | 126 (24.9) | 38 (23.2) | 0.654 |
| Antibiotics, past 2 years | 193 (38.1) | 52 (31.7) | 0.137 |
| Cold/flu/pneumonia, past 2 years | 351 (69.4) | 104 (63.4) | 0.156 |
To compare patient characteristics between the individuals fulfilling study criteria (lung cancer or no cancer = analysed) and the remainder of the cohort (excluded), chi-squared tests (Fisher’s exact tests if expected counts < 5) were utilised to compare proportional data (e.g. proportion of females or current smokers), and Independent Samples Mann-Whitney U tests were utilised to compare continuous data (age).
ᵃ All variables are recorded in numbers (% proportions in parentheses), unless specified. *Current smokers includes individuals who recently quit smoking (within the past 1 year). IQR: interquartile range; COPD: chronic obstructive pulmonary disease. History of asthma, COPD or pneumonia, respectively, are physician-confirmed. Bolded two-sided p-values < 0.050 were considered statistically significant.
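The chi-squared comparison described for the proportional data can be sketched from the counts in the table. This is a minimal illustration, not the authors' analysis code: it assumes a Pearson chi-squared test without continuity correction (the text does not say whether a correction was applied), and the function name `chi2_2x2_pvalue` is hypothetical. Under that assumption, the sex and asthma rows come out close to the tabulated P values.

```python
import math

def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for observed, row, col in ((a, row1, col1), (b, row1, col2),
                               (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-squared survival function is erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Counts taken from the table: "yes" vs. "no" in analysed (n = 506) and excluded (n = 164).
chi2_sex, p_sex = chi2_2x2_pvalue(249, 506 - 249, 80, 164 - 80)
chi2_asthma, p_asthma = chi2_2x2_pvalue(68, 506 - 68, 13, 164 - 13)

print(f"Sex, females: chi2 = {chi2_sex:.4f}, p = {p_sex:.3f}")
print(f"History of asthma: chi2 = {chi2_asthma:.4f}, p = {p_asthma:.3f}")
```

The age comparison (Mann-Whitney U) cannot be reproduced this way, since it needs the individual ages rather than the summary median and IQR.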
Figure 2. Variable selection flow diagram for the PEX-LC analysis. *The first exclusion step removed variables with limited observations (<4 observations of “yes” per variable for each outcome: lung cancer (LC) vs. no cancer). These variables are shown in Supplementary Table S1. **For step 1 of background variable removal for potentially-analysable results, the majority were not included due to lack of significant univariate associations to LC and/or were not previously-reported LC risk signs (n = 35/39). Ordinal smoking status (never-smokers, past smokers, current smokers), living alone, and university-level education were not included due to potentially overfitting the model, and weight loss was not included due to a large proportion of missing data. These variables are shown in S1 Table. ***For step 2 of background variable removal, the majority had principal component analysis loadings and orthogonal projections to latent structures variable importance for the projection (VIP) scores < 1 (n = 8). The past smokers (vs. non-smokers) variable was not included due to the potential risk of overfitting the model, as current smokers included those who quit smoking within the past 1 year. These variables are shown in Supplementary Table S2. ¹ Descriptors with minimal model contribution (Supplementary Table S2) were sequentially removed (n = 82) until maximal model performance could be achieved with 70 variables. The final model selection process including performance of additional models by variable count is shown in Supplementary Fig. S2A,B.
Table 2. Identified descriptors and background factors for maximal lung cancer prediction performance.
| BACKGROUND | |
|---|---|
| | |
| Confirmed history of COPD | |
| | |
| Confirmed history of pneumonia* | |
| | |
| 5: Wheezing/panting* | 30: Breathing worse when I lay down* |
| 7: Gasped for breath | 31: Breathing worse due to high humidity |
| 12: Felt thickness in throat | 33: Breathing worse due to coldness* |
| 21: Breathing sound: Whistled | 35: Breathing worse during certain times of the day* |
| | |
| 3: Sudden, loud cough* | 11: Needed to clear my throat* |
| 4: Hacking cough* | |
| | |
| 5: Wheezing cough* | 35: Cough varied over the year |
| 6: Irritating, dry cough | 63: Cough occurred/worsened when I exerted myself* |
| 7: Coughed until I lost my breath, choked and/or vomited* | 64: Cough occurred/worsened when I breathed deeply* |
| 8: Cough attacks* | 68: Cough worsened by high humidity |
| 10: Small coughs* | |
| | |
| 3: Decreased amount* | 24: Thin, fluid-like consistency* |
| 6: White mucus or sputum* | 25: Taffy-like/viscous consistency* |
| | |
| 3: Hurting: Comes and goes | 67: Heartburn |
| | |
| 201: Pain/aches/discomfort: Throat* | |
| 9: Aches: Comes and goes | 204: Pain/aches/discomfort: Shoulder blade |
| 10_11_12: Aches: Positional/breathing-based | 207: Pain/aches/discomfort: Shoulder(s) |
| 14: Pain: Consistent | 210: Pain/aches/discomfort: Neck |
| 16_17_18: Pain: Positional/breathing-based* | 213: Pain/aches/discomfort: Chest |
| 27: Cramping aches/pains: Comes and goes* | |
| | |
| 39: Dull aches/pain: Comes and goes | 227: Pain/aches/discomfort: Moves around* |
| 49: Tenderness | |
| | |
| 1: Voice got more hoarse | |
| 4: Legs cannot cope | |
| | |
| 11: Felt constant tiredness, weakness, or lack of energy* | 6: Cleared my throat more when I talked* |
| | |
| 1: More difficult to distinguish smells | |
| 2: Enjoyed food less than before | 2: Lost sense of smell* |
| | |
| 1: Chills* | 1: Cramps in calves |
| 4: Felt cold | 10: Drier skin* |
| 13: Night sweats | 13: Drier mouth |
| 19: Feeling unfit | |
Variables included in the final model (n = 70) are shown, including 7 background variables and 63 descriptors. Numbers indicate the identifiers of each of the included descriptors for each respective module and serve as a key to the regression coefficients shown in Supplementary Fig. S3. Of the original 285 descriptors, 145 met inclusion criteria (at least 4 observations in each group, lung cancer or no cancer). Additionally excluded descriptors (n = 82) and background variables (n = 9) for model finalisation are indicated in Supplementary Table S2. History of chronic obstructive pulmonary disease (COPD) and history of pneumonia, respectively, are physician-confirmed. Bolded descriptors reached significance in terms of regression coefficients and 95% jack-knifed confidence intervals (ordered by strength of association to lung cancer, see Supplementary Fig. S3).
*Indicates variables that had an average regression coefficient with an inverse association to lung cancer (n = 28).
Table 3. Lung cancer prediction performance from orthogonal projections to latent structures (OPLS).
| Model | AUC | AUC2 | C | R2X | R2 | Q2 | Sens | Spec |
|---|---|---|---|---|---|---|---|---|
| Full model, 70 variables | 0.767 | 0.695 | 2 | 42.3 | 62.4 | 58.1 | 84.8* | 55.6* |
| Descriptors only, 63 variables | 0.736 | 0.670 | 2 | 32.7 | 56.0 | 50.1 | 73.9 | 66.7 |
| Background only, 7 variables | 0.652 | 0.568 | 2 | 79.9 | 51.7 | 50.9 | 76.1 | 51.9 |
Table headings: AUC: Area under the receiver operating characteristic (ROC) curve, cross-validation (CV) test set; AUC2: AUC, training set; C: Number of orthogonal components; R2X: Percent explained X variance (for all independent variables); R2: Percent explained Y variance (lung cancer); Q2: Cross-validated R2 (CV test set); Sens/Spec: Percent sensitivity and specificity, respectively, of the model in the CV test set, based on the optimal cutoff according to Youden’s index.
Model abbreviations: Full model: Final model with 70 variables (63 descriptors and seven background variables), built on maximal explained variance (R2 and Q2). After initially projecting all 145 descriptors (symptoms/sensations), candidates were then chosen in OPLS by visual inspection of regression coefficients and variable importance for the projection (VIP) values, with sequential removal of descriptors with no model contribution (S1 Table). The seven background variables were selected after demonstrating principal component analysis loadings > 0.1 and OPLS VIP values > 1. A full list of the final 70 variables is shown in Table 2. All sensitivity/specificity values are selected from the cutoff with the largest Youden’s index. Sensitivity was preferred in this study.
*The maximum Youden’s index for this model (0.426) favoured specificity: sensitivity = 50%, specificity = 92.6%. Because sensitivity was prioritised in this study, the cutoff with the largest Youden’s index that favoured sensitivity (0.402) was selected: sensitivity = 84.8%; specificity = 55.6%.
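The cutoff choice described in this footnote can be illustrated with Youden's index, J = sensitivity + specificity − 1. The sketch below uses only the two operating points reported for the full model; the helper names and the minimum-sensitivity threshold of 0.80 are hypothetical and are not stated in the source. Recomputing from the rounded percentages gives about 0.404 for the sensitivity-favouring cutoff, slightly different from the reported 0.402, which may reflect rounding of the published values.

```python
def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1 (inputs as fractions, not percent)."""
    return sensitivity + specificity - 1.0

# The two operating points reported for the full model (rounded percentages).
operating_points = [
    {"sensitivity": 0.500, "specificity": 0.926},
    {"sensitivity": 0.848, "specificity": 0.556},
]

def best_cutoff(points, min_sensitivity=0.0):
    """Return the point with the largest J among those meeting a sensitivity floor."""
    eligible = [p for p in points if p["sensitivity"] >= min_sensitivity]
    return max(eligible, key=lambda p: youden_index(p["sensitivity"], p["specificity"]))

# Unconstrained maximum: the specificity-favouring point.
overall = best_cutoff(operating_points)
# Hypothetical sensitivity floor of 0.80 (an assumption for illustration only).
sensitive = best_cutoff(operating_points, min_sensitivity=0.80)

for label, point in (("overall", overall), ("sensitivity-favouring", sensitive)):
    j = youden_index(point["sensitivity"], point["specificity"])
    print(f"{label}: sens = {point['sensitivity']:.3f}, "
          f"spec = {point['specificity']:.3f}, J = {j:.3f}")
```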
Figure 3. Receiver operating characteristic (ROC) curves for lung cancer prediction performance from orthogonal projections to latent structures (OPLS) modelling. ROC curves of lung cancer prediction performance were calculated from CV test set lung cancer prediction scores compared to diagnostic outcome (lung cancer or no cancer). Areas under the ROC curves are shown in Table 3. For a detailed description of the full model and included variables, see Table 2. Background only_7 var (blue broken line): Seven background variables only. Full model_70 var (purple line): Final model, including 63 descriptors + seven background variables. Descriptors only_63 var (green broken line): 63 descriptors only.
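As a rough illustration of how an area under the ROC curve follows from prediction scores and diagnostic outcomes, the sketch below uses the pairwise (Mann-Whitney) formulation: the AUC equals the fraction of lung cancer/no cancer pairs in which the lung cancer case has the higher score, with ties counted as half. The scores are made-up values for illustration only and are not data from the study; `roc_auc` is a hypothetical helper, not the software the authors used.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the probability that a positive case outscores a negative case."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical prediction scores (NOT study data).
lc_scores = [0.9, 0.8, 0.4]   # individuals later diagnosed with lung cancer
nc_scores = [0.7, 0.3, 0.2]   # individuals with no cancer

auc = roc_auc(lc_scores, nc_scores)
print(f"AUC = {auc:.3f}")
```

In this toy example, eight of the nine case pairs are ranked correctly, so the AUC is 8/9.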
Figure 4. Orthogonal projections to latent structures (OPLS) 3D scores plot. Individual scores for the training set (A, n = 433) and predicted scores (PS) for the cross-validated test set (B, n = 73) are shown for the final model. All three of the OPLS model components are plotted, including the predictive component (t[1]) and the two orthogonal components (to[1] & [2]) (total R2X variance = 42.3%: t[1] = 23.6%, to[1] = 12.7%, to[2] = 6%). Predictive explained R2Y variance (lung cancer: training set): 62.4%; cross-validated explained Q2 variance (lung cancer: cross-validated test set): 58.1%. A total of 63 descriptors of symptoms and sensations were included together with seven background variables (Table 2). Coloured circles indicate lung cancer (red) or no cancer (blue). Outliers are indicated beyond the 95% confidence interval ellipse.
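The training and test set sizes quoted above are consistent with holding out one fold of a seven-fold split of the 506 analysed individuals. The following sketch shows that arithmetic with a randomised split. The seed, the round-robin assignment and the function name `k_fold_indices` are assumptions for illustration; the source does not describe how the randomisation was implemented.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin dealing keeps fold sizes within one of each other.
    return [indices[i::k] for i in range(k)]

folds = k_fold_indices(506, 7)
fold_sizes = [len(fold) for fold in folds]

# Hold out the first fold as the test set; the other six folds form the training set.
test_fold = folds[0]
training_set = [index for fold in folds[1:] for index in fold]

print("fold sizes:", fold_sizes)
print("test:", len(test_fold), "training:", len(training_set))
```

Because 506 is not divisible by seven, two folds contain 73 individuals and five contain 72, so a 73-person test fold leaves 433 for training.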