Stéphane Busca, Julia Salleron, Romain Boidot, Jean-Louis Merlin, Alexandre Harlé.
Abstract
Diagnosis of lung cancer can sometimes be challenging and is of major interest since effective molecular-guided therapies are available. Compounds of tobacco smoke may generate a specific substitutional signature in lung, which is the most exposed organ. To predict whether a tumor is of lung origin or not, we developed and validated the EASILUNG (Exome And SIgnature LUNG) test based on the relative frequencies of somatic substitutions on coding non-transcribed DNA strands from whole-exome sequenced tumors. Data from 7,796 frozen tumor samples (prior to any treatment) from 32 TCGA solid cancer groups were used for its development. External validation was carried out on a local dataset of 196 consecutive routine exome results. Eight out of the 12 classes of substitutions were required to compute the EASILUNG signature that demonstrated good calibration and good discriminative power with a sensitivity of 83% and a specificity of 72% after recalibration on the external validation dataset. This innovative test may be helpful in medical decision-making in patients with unknown primary tumors potentially of lung origin and in the diagnosis of lung cancer in smokers.
Year: 2019 PMID: 31601974 PMCID: PMC6786985 DOI: 10.1038/s41598-019-51155-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Distribution of the relative frequencies of the 12 substitution classes in the development dataset for the 4,908 tumors with a somatic substitution prevalence of ≥1 sub/Mb.
| | All (n = 4908) | Other location (n = 4206) | Lung cancer (n = 702) | p-value |
|---|---|---|---|---|
| %CA >12 | 13.47% (661) | 7.01% (295) | 52.14% (366) | <0.0001 |
| %GT >13 | 15.14% (743) | 9.49% (399) | 49% (344) | <0.0001 |
| %CG ≥1 | 82.46% (4047) | 84.38% (3549) | 70.94% (498) | <0.0001 |
| %GC ≥1 | 81.32% (3991) | 83.05% (3493) | 70.94% (498) | <0.0001 |
| %CT, median (IQR) | 22 (15; 30) | 24 (18; 31) | 11 (0; 16) | <0.0001 |
| %GA, median (IQR) | 22 (15; 30) | 25 (18; 32) | 11 (0; 16) | <0.0001 |
| %AC, median (IQR) | 1 (0; 2) | 1 (0; 3) | 1 (0; 1) | <0.0001 |
| %TG, median (IQR) | 1 (0; 3) | 1 (0; 3) | 0 (0; 1) | <0.0001 |
| %AG >5 | 43.44% (2132) | 47.81% (2011) | 17.24% (121) | <0.0001 |
| %TC >5 | 42.97% (2109) | 47.0% (1977) | 18.8% (132) | <0.0001 |
| %AT >3 | 26.67% (1309) | 24.82% (1044) | 37.75% (265) | <0.0001 |
| %TA >3 | 26.55% (1303) | 24.37% (1025) | 39.6% (278) | <0.0001 |
Results presented as percentage and frequency %(n) unless otherwise indicated. IQR: interquartile range; %CA = relative frequency of C > A substitutions (%); %GT = relative frequency of G > T substitutions (%); %CG = relative frequency of C > G substitutions (%); %GC = relative frequency of G > C substitutions (%); %CT = relative frequency of C > T substitutions (%); %GA = relative frequency of G > A substitutions (%); %AC = relative frequency of A > C substitutions (%); %TG = relative frequency of T > G substitutions (%); %AG = relative frequency of A > G substitutions (%); %TC = relative frequency of T > C substitutions (%); %AT = relative frequency of A > T substitutions (%); %TA = relative frequency of T > A substitutions (%).
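The relative frequencies tabulated above are the percentages of each of the 12 substitution classes among a tumor's somatic substitutions. The following is a minimal sketch of that computation, assuming each substitution is supplied as a (reference base, alternate base) pair already oriented on the strand of interest. The function name `relative_frequencies` and the toy input are illustrative and are not taken from the source.

```python
from collections import Counter
from itertools import permutations

# The 12 substitution classes, keyed as in the table (e.g. "CA" means C > A).
CLASSES = ["".join(p) for p in permutations("ACGT", 2)]

def relative_frequencies(substitutions):
    """Return {class: percentage} for an iterable of (ref, alt) base pairs."""
    counts = Counter(ref + alt for ref, alt in substitutions)
    total = sum(counts[c] for c in CLASSES)
    if total == 0:
        raise ValueError("no single-base substitutions supplied")
    return {c: 100.0 * counts[c] / total for c in CLASSES}

# Hypothetical toy tumor: 5 C>A, 3 G>T and 2 C>T substitutions.
toy = [("C", "A")] * 5 + [("G", "T")] * 3 + [("C", "T")] * 2
freqs = relative_frequencies(toy)
print({c: v for c, v in freqs.items() if v > 0})
```

Strand assignment and any rounding of the percentages are not specified in this record, so both are left to the caller here.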
Bivariate analysis and full multivariate model with the 12 substitution classes using logistic regression for the 4,908 tumors with a somatic substitution prevalence of ≥1 sub/Mb.
| Relative frequencies of substitutions in tumors | Bivariate analyses: odds ratio and 95% CI | Bivariate analyses: p-value | Full multivariate model: odds ratio and 95% CI | Full multivariate model: p-value |
|---|---|---|---|---|
| %CA >12 vs ≤12 | 14.44 [11.95; 17.46] | <0.001 | 4.86 [3.59; 6.58] | <0.001 |
| %GT >13 vs ≤13 | 9.17 [7.66; 10.98] | <0.001 | 1.62 [1.2; 2.19] | 0.001 |
| %CG ≥1 vs 0 | 0.45 [0.38; 0.54] | <0.001 | 1.52 [0.75; 3.06] | 0.24 |
| %GC ≥1 vs 0 | 0.50 [0.42; 0.60] | <0.001 | 1.58 [0.78; 3.22] | 0.20 |
| %CT | 0.84 [0.83; 0.85] | <0.001 | 0.91 [0.88; 0.93] | <0.001 |
| %GA | 0.84 [0.83; 0.85] | <0.001 | 0.93 [0.91; 0.95] | <0.001 |
| %AC | 0.67 [0.63; 0.72] | <0.001 | 0.82 [0.76; 0.90] | <0.001 |
| %TG | 0.65 [0.61; 0.70] | <0.001 | 0.83 [0.76; 0.90] | <0.001 |
| %AG >5 vs ≤5 | 0.23 [0.19; 0.28] | <0.001 | 0.43 [0.33; 0.58] | <0.001 |
| %TC >5 vs ≤5 | 0.26 [0.21; 0.32] | <0.001 | 0.44 [0.33; 0.59] | <0.001 |
| %AT >3 vs ≤3 | 1.84 [1.55; 2.17] | <0.001 | 0.99 [0.74; 1.33] | 0.96 |
| %TA >3 vs ≤3 | 2.04 [1.72; 2.40] | <0.001 | 1.15 [0.86; 1.53] | 0.36 |
CI = confidence interval; vs = versus; %CA = relative frequency of C > A substitutions (%); %GT = relative frequency of G > T substitutions (%); %CG = relative frequency of C > G substitutions (%); %GC = relative frequency of G > C substitutions (%); %CT = relative frequency of C > T substitutions (%); %GA = relative frequency of G > A substitutions (%); %AC = relative frequency of A > C substitutions (%); %TG = relative frequency of T > G substitutions (%); %AG = relative frequency of A > G substitutions (%); %TC = relative frequency of T > C substitutions (%); %AT = relative frequency of A > T substitutions (%); %TA = relative frequency of T > A substitutions (%).
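For the dichotomized classes, the bivariate odds ratios can be cross-checked against the counts in the distribution table, because a logistic regression with a single binary predictor yields the cross-product odds ratio of the corresponding 2 × 2 table. The sketch below assumes a Wald 95% confidence interval (z = 1.96). The function name is illustrative, and the software the authors used is not stated in this record.

```python
import math

def odds_ratio_wald(exposed_cases, total_cases, exposed_controls, total_controls, z=1.96):
    """Cross-product odds ratio with a Wald confidence interval from 2 x 2 counts."""
    a = exposed_cases                      # lung cancer, above threshold
    b = total_cases - exposed_cases        # lung cancer, at or below threshold
    c = exposed_controls                   # other location, above threshold
    d = total_controls - exposed_controls  # other location, at or below threshold
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts from the distribution table: %CA > 12 in 366/702 lung vs 295/4206 other tumors,
# and %GT > 13 in 344/702 lung vs 399/4206 other tumors.
ca = odds_ratio_wald(366, 702, 295, 4206)  # table reports 14.44 [11.95; 17.46]
gt = odds_ratio_wald(344, 702, 399, 4206)  # table reports 9.17 [7.66; 10.98]
print([round(x, 2) for x in ca])
print([round(x, 2) for x in gt])
```

The same check does not apply to the full multivariate model, whose odds ratios are adjusted for the other substitution classes.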
Final logistic regression defining the EASILUNG score.
| | Odds ratio‡ and 95% CI | β coefficient‡ | EASILUNG score points◊ |
|---|---|---|---|
| %CA >12 vs ≤12 | 5.78 [4.36; 7.66] | 1.7547 | 175 |
| %GT >13 vs ≤13 | 1.86 [1.4; 2.47] | 0.6209 | 62 |
| %CT | 0.92 [0.9; 0.94] | −0.0863 | −9 |
| %GA | 0.94 [0.92; 0.96] | −0.0631 | −6 |
| %AC | 0.84 [0.77; 0.92] | −0.1719 | −17 |
| %TG | 0.84 [0.78; 0.92] | −0.1704 | −17 |
| %AG >5 vs ≤5 | 0.48 [0.36; 0.63] | −0.7433 | −74 |
| %TC >5 vs ≤5 | 0.48 [0.36; 0.64] | −0.7317 | −73 |
‡Odds ratios correspond to the exponential function of the regression coefficient, exp(β).
◊β coefficient × 100, rounded to the nearest integer.
CI = confidence interval; %CA = relative frequency of C > A substitutions (%); %GT = relative frequency of G > T substitutions (%); %CG = relative frequency of C > G substitutions (%); %GC = relative frequency of G > C substitutions (%); %CT = relative frequency of C > T substitutions (%); %GA = relative frequency of G > A substitutions (%); %AC = relative frequency of A > C substitutions (%); %TG = relative frequency of T > G substitutions (%); %AG = relative frequency of A > G substitutions (%); %TC = relative frequency of T > C substitutions (%); %AT = relative frequency of A > T substitutions (%); %TA = relative frequency of T > A substitutions (%).
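The two footnoted relationships (odds ratio = exp(β), score points = β × 100 rounded to the nearest integer) can be verified directly from the β coefficients in the table. In this short sketch the dictionary keys are informal labels chosen for illustration.

```python
import math

# β coefficients copied from the final logistic regression table.
BETAS = {
    "CA>12": 1.7547,
    "GT>13": 0.6209,
    "CT": -0.0863,
    "GA": -0.0631,
    "AC": -0.1719,
    "TG": -0.1704,
    "AG>5": -0.7433,
    "TC>5": -0.7317,
}

# Odds ratio = exp(β), rounded to two decimals as in the table.
odds_ratios = {name: round(math.exp(beta), 2) for name, beta in BETAS.items()}
# Score points = β × 100, rounded to the nearest integer.
points = {name: round(beta * 100) for name, beta in BETAS.items()}

for name in BETAS:
    print(f"{name:>6}  OR={odds_ratios[name]:.2f}  points={points[name]:+d}")
```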
Calculating the EASILUNG score with two examples from the TCGA.
| Substitution classes | %CA∆ | %GT◊ | %CT | %GA | %AC | %TG | %AG* | %TC‡ | |
|---|---|---|---|---|---|---|---|---|---|
| EASILUNG signature = | +175 | +62 | −9 × (%CT) | −6 × (%GA) | −17 × (%AC) | −17 × (%TG) | −74 | −73 | |
| Example 1(a) | 27% | 19% | 8% | 12% | 0% | 0% | 3% | 5% | |
| EASILUNG score = | +175 | +62 | −72 | −72 | +0 | +0 | +0 | +0 | =93 |
| Example 2(b) | 2% | 2% | 27% | 37% | 4% | 1% | 6% | 5% | |
| EASILUNG score = | +0 | +0 | −243 | −222 | −68 | −17 | −74 | +0 | =−624 |
%CA = relative frequency of C > A substitutions (%); %GT = relative frequency of G > T substitutions (%); %CT = relative frequency of C > T substitutions (%); %GA = relative frequency of G > A substitutions (%); %AC = relative frequency of A > C substitutions (%); %TG = relative frequency of T > G substitutions (%); %AG = relative frequency of A > G substitutions (%); %TC = relative frequency of T > C substitutions (%).
∆For samples with %CA > 12.
◊For samples with %GT > 13.
*For samples with %AG > 5.
‡For samples with %TC > 5.
(a) Example 1: tumor sample barcode = TCGA-75-7027-01A-11D-1945-08 (lung cancer) with 27% C > A substitutions (+175 since > 12), 19% G > T substitutions (+62 since > 13), 8% C > T substitutions (−9 × 8), 12% G > A substitutions (−6 × 12), 0% A > C substitutions (−17 × 0), 0% T > G substitutions (−17 × 0), 3% A > G substitutions (0 since ≤ 5), 5% T > C substitutions (0 since ≤ 5).
(b) Example 2: tumor sample barcode = TCGA-75-7027-01A-11D-1945-08 (non-lung cancer) with 2% C > A substitutions (+0 since ≤ 12), 2% G > T substitutions (+0 since ≤ 13), 27% C > T substitutions (−9 × 27), 37% G > A substitutions (−6 × 37), 4% A > C substitutions (−17 × 4), 1% T > G substitutions (−17 × 1), 6% A > G substitutions (−74 since > 5), 5% T > C substitutions (0 since ≤ 5).
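The scoring rule used in the two worked examples can be written as a short function. This is a sketch under the thresholds and point values given in the table above; the function name `easilung_score` and the dictionary layout are illustrative choices, not part of the source.

```python
def easilung_score(pct):
    """EASILUNG score from relative frequencies (in %) keyed by substitution class."""
    score = 0
    if pct["CA"] > 12:      # +175 points for samples with %CA > 12
        score += 175
    if pct["GT"] > 13:      # +62 points for samples with %GT > 13
        score += 62
    score -= 9 * pct["CT"]  # continuous terms: points per percentage point
    score -= 6 * pct["GA"]
    score -= 17 * pct["AC"]
    score -= 17 * pct["TG"]
    if pct["AG"] > 5:       # -74 points for samples with %AG > 5
        score -= 74
    if pct["TC"] > 5:       # -73 points for samples with %TC > 5
        score -= 73
    return score

# Percentages from the two worked examples.
example_1 = {"CA": 27, "GT": 19, "CT": 8, "GA": 12, "AC": 0, "TG": 0, "AG": 3, "TC": 5}
example_2 = {"CA": 2, "GT": 2, "CT": 27, "GA": 37, "AC": 4, "TG": 1, "AG": 6, "TC": 5}

print(easilung_score(example_1))  # 93
print(easilung_score(example_2))  # -624
```

The thresholds are strict inequalities, so a value exactly at the cut-off (such as 5% T > C in both examples) contributes no points.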
Figure 1. Description of the EASILUNG score in the development dataset. This includes the 4,908 tumors with a somatic substitution prevalence of ≥1 sub/Mb and shows the estimated probability of lung cancer calculated from the EASILUNG score according to the values of the score. Logit(lung cancer) = 1.216 + 0.010 × EASILUNG score. Probability of lung cancer = 1/(1 + exp[−logit(lung cancer)]).
Figure 2. Description of the EASILUNG score in the validation dataset. This includes the 195 tumors with a somatic substitution prevalence of ≥1 sub/Mb and shows the estimated probability of lung cancer calculated from the EASILUNG signature according to the values of the score. Logit(lung cancer) = 2.094 + 0.008 × EASILUNG score. Probability of lung cancer = 1/(1 + exp[−logit(lung cancer)]).
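The logit equations in the two figure captions convert an EASILUNG score into an estimated probability of lung cancer. The sketch below applies both calibrations to the scores of the two worked examples (93 and −624). The intercepts and slopes are the rounded values printed in the captions, so the resulting probabilities are approximate; the names `DEVELOPMENT`, `VALIDATION` and `lung_probability` are illustrative.

```python
import math

# (intercept, slope) pairs as printed in the figure captions.
DEVELOPMENT = (1.216, 0.010)  # Figure 1, development dataset
VALIDATION = (2.094, 0.008)   # Figure 2, recalibrated on the validation dataset

def lung_probability(score, calibration):
    """Estimated probability of lung cancer: 1 / (1 + exp(-logit))."""
    intercept, slope = calibration
    logit = intercept + slope * score
    return 1.0 / (1.0 + math.exp(-logit))

for score in (93, -624):
    p_dev = lung_probability(score, DEVELOPMENT)
    p_val = lung_probability(score, VALIDATION)
    print(f"score={score:+d}  development={p_dev:.3f}  validation={p_val:.3f}")
```

Under the development calibration, the score of 93 maps to a probability of roughly 0.9 and the score of −624 to a probability below 0.01. This record does not give the probability cut-off behind the reported sensitivity and specificity, so none is assumed here.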