Neil N Trivedi¹, Mehrdad Arjomandi¹, James K Brown¹, Tess Rubenstein¹, Abigail D Rostykus¹, Stephanie Esposito², Eden Axler³, Mike Beggs⁴, Heng Yu⁴, Luis Carbonell⁴, Alice Juang⁴, Sandy Kamer⁴, Bhavin Patel⁴, Shan Wang⁴, Amanda L Fish⁴, Zaid Haddad⁵, Alan Hb Wu⁶.
Abstract
BACKGROUND: The increase in lung cancer screening is intensifying the need for a noninvasive test to characterize the many indeterminate pulmonary nodules (IPNs) it uncovers. Correctly identifying noncancerous nodules is needed to reduce overdiagnosis and overtreatment. Conversely, malignant nodules identified early may represent a potentially curable form of lung cancer.
Keywords: biomarkers; diagnosis; lung cancer; pulmonary nodules; risk models
Year: 2018 PMID: 32913898 PMCID: PMC7480946 DOI: 10.15761/brcp.1000173
Source DB: PubMed Journal: Biomed Res Clin Pract ISSN: 2397-9631
Figure 1. Diagram of the model development process and subject cohort designations.
Subject set demographics. Diagnosis P-value is the significance of the difference in each characteristic between subjects with a benign and a malignant diagnosis.
| Characteristic | Training Set | Testing Set | Independent Validation Set |
|---|---|---|---|
| Number of subjects, n | 121 | 59 | 97 |
| Age, mean (SD)[range], years | 62.7 (9.1) [25–80] | 65.9 (9.2) [33–85] | 60.1 (8.4) [42–83] |
| Benign | 60.8 (10.8) [25–76] | 63.6 (10.6) [33–85] | 57.3 (8.1) [42–74] |
| Malignant | 63.8 (7.9) [46–80] | 67.5 (7.8) [51–82] | 62.8 (7.9) [50–83] |
| Diagnosis P-value | 0.117 | 0.133 | < 0.001 |
| Male, n (%) | 73 (60) | 42 (71) | 58 (60) |
| Benign | 33 (45) | 19 (45) | 28 (48) |
| Malignant | 40 (55) | 23 (55) | 30 (52) |
| Diagnosis P-value | 0.321 | 0.513 | 0.853 |
| Female, n (%) | 48 (40) | 17 (29) | 39 (40) |
| Benign | 10 (21) | 5 (29) | 20 (51) |
| Malignant | 38 (79) | 12 (71) | 19 (49) |
| Diagnosis P-value | < 0.001 | 0.04 | |
| Pack-years, mean (SD) | 49 (30) | 45 (48) | 60 (36) |
| Benign | 44 (37) | 48 (42) | 57 (36) |
| Malignant | 51 (26) | 53 (34) | 63 (37) |
| Diagnosis P-value | 0.349 | 0.65 | 0.455 |
| Size, mean (SD)[range], mm | 16.2 (7.1) [4–30] | 16.4 (6.4) [5–28] | 16.0 (6.0) [4–29] |
| Benign | 11.7 (5.6) [4–25] | 12.9 (5.6) [5–26] | 14.9 (6.1) [4–28] |
| Malignant | 18.7 (6.6) [6–30] | 18.7 (5.8) [8–28] | 17.2 (5.6) [8–29] |
| Diagnosis P-value | < 0.001 | < 0.001 | 0.05 |
| Location upper lobe, n (%) | 64 (53) | 38 (64) | 55 (57) |
| Benign | 17 (27) | 14 (37) | 27 (49) |
| Malignant | 47 (73) | 24 (63) | 28 (51) |
| Diagnosis P-value | < 0.001 | 0.04 | 1 |
| Benign nodule diagnosis, n (%) | 43 (36) | 24 (41) | 48 (49) |
| Granuloma | 3 (7) | 0 (0) | 5 (10) |
| CT scan stable | 12 (28) | 8 (33) | 41 (85) |
| Not reported | 28 (65) | 16 (67) | 2 (4) |
| Malignant nodule type, n (%) | 78 (64) | 35 (59) | 49 (51) |
| Adenocarcinoma | 18 (23) | 13 (37) | 28 (57) |
| Squamous cell | 8 (10) | 4 (11) | 14 (29) |
| NSCLC | 3 (4) | 1 (3) | 3 (6) |
| Other | 0 (0) | 0 (0) | 4 (8) |
| Not reported | 49 (63) | 17 (49) | 0 (0) |
| Malignant nodule stage, n (%) | | | |
| Stage I & II | 68 (87) | 29 (83) | 48 (98) |
| Stage III & IV | 7 (9) | 5 (14) | 1 (2) |
| Not reported | 3 (4) | 1 (3) | 0 (0) |
| Race/ethnicity, n (%) | | | |
| White | 96 (79) | 51 (86) | 91 (94) |
| Black | 18 (15) | 5 (8) | 4 (4) |
| Asian | 4 (3) | 1 (2) | 0 (0) |
| Hispanic | 1 (1) | 1 (2) | 0 (0) |
| Other | 2 (2) | 0 (0) | 2 (2) |
| Not reported | 0 (0) | 1 (2) | 0 (0) |
Performance of the SVM model and the VA model in the independent validation cohort (prevalence of malignancy: 50.5%). The SVM model uses a single cutoff of 0.50; the VA model uses a low cutoff of 0.05 and a high cutoff of 0.65.
| Model result, n | SVM: All | SVM: Malignant | SVM: Benign | VA: All | VA: Malignant | VA: Benign |
|---|---|---|---|---|---|---|
| Total | 97 | 49 | 48 | 97 | 49 | 48 |
| High Risk | 78 | 46 | 32 | 29 | 18 | 11 |
| Intermediate Risk | 0 | 0 | 0 | 68 | 31 | 37 |
| Low Risk | 19 | 3 | 16 | 0 | 0 | 0 |
| Correct | 62 | 46 | 16 | 18 | 18 | 0 |
Dx Yield is the percentage of all samples whose score falls below the lower cutoff, i.e., (True Negatives + False Negatives)/Total number of samples. Risk-predicted Yield is the percentage of all samples given a low-risk or high-risk result, excluding samples given an intermediate-risk result. PPV = Positive Predictive Value and NPV = Negative Predictive Value, defined as:

$$\text{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})}$$

$$\text{NPV} = \frac{\text{specificity} \times (1 - \text{prevalence})}{\text{prevalence} \times (1 - \text{sensitivity}) + \text{specificity} \times (1 - \text{prevalence})}$$
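The footnote's definitions can be checked against the validation-cohort counts. This is a minimal Python sketch, assuming that for the single-cutoff SVM model a high-risk result counts as a positive call and a low-risk result as a negative call; `ppv`, `npv` and `yields` are illustrative helper names, not functions from the study.

```python
from __future__ import annotations


def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    num = sensitivity * prevalence
    return num / (num + (1 - specificity) * (1 - prevalence))


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from sensitivity, specificity and prevalence."""
    num = specificity * (1 - prevalence)
    return num / (prevalence * (1 - sensitivity) + num)


def yields(high: int, intermediate: int, low: int) -> tuple[float, float]:
    """Return (Dx yield, risk-predicted yield) from counts per risk category."""
    total = high + intermediate + low
    dx_yield = low / total                       # (TN + FN) / total: below the lower cutoff
    risk_predicted_yield = (high + low) / total  # excludes intermediate-risk results
    return dx_yield, risk_predicted_yield


# SVM model, validation cohort (single cutoff, so no intermediate results)
tp, fn = 46, 3    # malignant subjects: high risk / low risk
fp, tn = 32, 16   # benign subjects: high risk / low risk
prevalence = (tp + fn) / (tp + fn + fp + tn)  # 49 / 97
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

svm_ppv = ppv(sensitivity, specificity, prevalence)
svm_npv = npv(sensitivity, specificity, prevalence)
svm_dx, svm_rp = yields(high=78, intermediate=0, low=19)
va_dx, va_rp = yields(high=29, intermediate=68, low=0)

print(f"prevalence={prevalence:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"SVM: PPV={svm_ppv:.3f} NPV={svm_npv:.3f} Dx yield={svm_dx:.3f} risk-predicted yield={svm_rp:.3f}")
print(f"VA:  Dx yield={va_dx:.3f} risk-predicted yield={va_rp:.3f}")
```

Because the prevalence plugged into the formulas is the cohort's own (49/97), the formula PPV reduces to TP/(TP + FP) = 46/78 and the formula NPV to TN/(TN + FN) = 16/19; the formulas become useful when a different population prevalence is assumed.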
Figure 2. Distributions of protein biomarkers and clinical factors in subjects with a benign (blue boxes) or malignant (red boxes) diagnosis (Dx), by cohort set. Natural log of the (A) EGFR (Epidermal Growth Factor Receptor), (B) ProSB (Pro-Surfactant Protein B), and (C) TIMP1 (Tissue Inhibitor of Metalloproteinases 1) plasma levels; (D) base-10 logarithm of (lung nodule diameter + 1 cm); and (E) subject age. The Training cohort set consists of 2/3 of the combined subjects from the 8 medical centers, and the Testing cohort set consists of the remaining 1/3. The Validation cohort set consists of the subjects from Vanderbilt University Medical Center. The p-value above each pair of boxes indicates the significance of the difference between diagnoses within a cohort set; differences with p ≥ 0.05 are marked not significant (ns).
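The transformations plotted in panels (A) to (D) can be written out directly. This is a minimal Python sketch, assuming the plasma levels are positive raw concentrations (their units are not given here) and that the nodule diameter is expressed in centimetres, so the table's mean size of about 16 mm becomes 1.6 cm; `transform_subject` and the example values are hypothetical, not study data.

```python
from __future__ import annotations

import math


def transform_subject(egfr: float, prosb: float, timp1: float,
                      diameter_cm: float) -> dict[str, float]:
    """Apply the Figure 2 transformations to one subject's raw values.

    Assumption: plasma levels are positive and diameter is in centimetres.
    """
    if min(egfr, prosb, timp1) <= 0:
        raise ValueError("plasma levels must be positive to take a logarithm")
    return {
        "ln_egfr": math.log(egfr),              # natural log of EGFR level
        "ln_prosb": math.log(prosb),            # natural log of ProSB level
        "ln_timp1": math.log(timp1),            # natural log of TIMP1 level
        "size": math.log10(diameter_cm + 1.0),  # log10(diameter + 1 cm)
    }


# Hypothetical raw values chosen so the logs are easy to read
features = transform_subject(egfr=math.e, prosb=1.0, timp1=math.e ** 2, diameter_cm=1.6)
print(features)
```

Adding 1 cm before taking the base-10 logarithm keeps the transformed size non-negative for any diameter and compresses the range of larger nodules.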
Figure 3. Variable importance for modeling the likelihood that a lung nodule is malignant, measured as the mean decrease in Gini impurity from a random forest analysis. Nodule size is the base-10 logarithm of (lung nodule diameter + 1 cm). TIMP1 (Tissue Inhibitor of Metalloproteinases 1), ProSB (Pro-Surfactant Protein B), and EGFR (Epidermal Growth Factor Receptor) are the natural-log-transformed plasma levels. Sex is coded as 0 = female and 1 = male. Age is the subject's age divided by 10. Cancer Hx is the subject's history of cancer other than lung cancer. Location of the nodule in the upper lobe of either lung is coded as 0 = no and 1 = yes.
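The mean-decrease-in-Gini importance in Figure 3 is built from a per-split quantity. This is a minimal Python sketch of that quantity: the Gini impurity of a node and the weighted decrease produced by one split. A random forest accumulates such decreases over every split on a given variable and averages them across trees. The function names and toy labels are hypothetical, and the study's own random forest software and settings are not described here.

```python
from __future__ import annotations

from collections import Counter


def gini(labels: list[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a node's class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_decrease(parent: list[int], left: list[int], right: list[int]) -> float:
    """Decrease in Gini impurity when parent is split into left and right.

    Mean-decrease-in-Gini importance accumulates this decrease, weighted by
    the node's share of samples, over every split on a variable and averages
    it across trees (implementations differ in details such as normalization).
    """
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy node with 1 = malignant and 0 = benign (hypothetical labels, not study data)
parent = [1, 1, 1, 1, 0, 0, 0, 0]
imperfect = split_decrease(parent, [1, 1, 1, 0], [1, 0, 0, 0])
perfect = split_decrease(parent, [1, 1, 1, 1], [0, 0, 0, 0])
print(gini(parent), imperfect, perfect)
```

A variable that repeatedly produces splits like the perfect one above (for example, nodule size in this cohort) accumulates a large total decrease and therefore ranks high in importance.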
Figure 4. Receiver operating characteristic (ROC) plot of sensitivity vs. 1 − specificity across all cutoffs from 0 to 1 for the SVM model and the VA model in the training cohort set. The SVM model ROC curve (blue line) has an area under the curve (AUC) of 0.86; the VA model ROC curve (red line) has an AUC of 0.77. The gray diagonal line is the reference curve of no discrimination, with an AUC of 0.50.
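An ROC curve like those in Figure 4 comes from sweeping a cutoff over the model score. This is a minimal Python sketch, assuming a score at or above the cutoff is called malignant and using an evenly spaced grid of 101 cutoffs from 0 to 1; the AUC is approximated with the trapezoidal rule. The function names and example scores are hypothetical, not the study's code or data.

```python
from __future__ import annotations


def roc_points(scores: list[float], labels: list[int],
               n_steps: int = 100) -> list[tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs for cutoffs 0, 1/n_steps, ..., 1.

    A score >= cutoff is called malignant (positive); labels use 1 = malignant.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one malignant and one benign subject")
    points = {(0.0, 0.0), (1.0, 1.0)}  # curve endpoints
    for i in range(n_steps + 1):
        cutoff = i / n_steps
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.add((fp / neg, tp / pos))
    # Both rates fall as the cutoff rises, so sorting orders points along the curve
    return sorted(points)


def auc(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under a ROC curve given as sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical model scores and diagnoses (1 = malignant), not study data
scores = [0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0]
print(f"AUC = {auc(roc_points(scores, labels)):.2f}")
```

With a grid fine enough to separate every distinct score (or with the observed scores used directly as cutoffs), this trapezoidal AUC matches the rank-based estimate: the probability that a randomly chosen malignant subject scores higher than a randomly chosen benign one.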