Daniel J Craig, Thomas Morrison, Sadik A Khuder, Erin L Crawford, Leihong Wu, Joshua Xu, Thomas M Blomquist, James C Willey.
Abstract
BACKGROUND: Standardized Nucleic Acid Quantification for SEQuencing (SNAQ-SEQ) is a novel method that utilizes synthetic DNA internal standards spiked into each sample prior to next generation sequencing (NGS) library preparation. This method was applied to analysis of normal-appearing airway epithelial cells (AEC) obtained by bronchoscopy in an effort to define a somatic mutation field effect associated with lung cancer risk. There is a need for biomarkers that reliably detect those at highest lung cancer risk, thereby enabling more effective screening by annual low-dose CT. The purpose of this study was to test the hypothesis that lung cancer risk is characterized by increased prevalence of low variant allele frequency (VAF) somatic mutations in lung cancer driver genes in AEC.
Keywords: Biomarker; Low-frequency variant detection; Lung Cancer; Next generation sequencing
Year: 2019 PMID: 31711466 PMCID: PMC6844032 DOI: 10.1186/s12885-019-6313-x
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Patient Demographics
| Sample # | Cancer Status | Pack Years | Sex | Age Range | Smoking Status | Diagnosis |
|---|---|---|---|---|---|---|
| 946 | CA | 45 | F | 50–59 | Former | NSCLC-SQ |
| 167 | CA | 50 | F | 60–69 | Unknown | NSCLC |
| 947 | CA | 45 | M | 60–69 | Former | SCLC |
| 146 | CA | 46.5 | F | 60–69 | Former | NSCLC |
| 887 | CA | 28 | F | 70–79 | Current | NSCLC-AD |
| 885 | CA | 90 | M | 70–79 | Current | SCLC |
| 940 | CA | 60 | M | 70–79 | Former | NSCLC-AD |
| 191 | CA | NA* | M | 70–79 | Current | NSCLC-SQ |
| 147 | CA | 75 | M | 70–79 | Former | SCLC |
| 128 | CA | 40 | F | 50–59 | Current | NSCLC |
| 923 | CA | 15 | M | 70–79 | Former | NSCLC |
| 210 | NC | 34 | M | 40–49 | Current | Noncancer |
| 886 | NC | 0 | F | 40–49 | Never | Noncancer |
| 952 | NC | 30 | M | 50–59 | Former | Noncancer |
| 157 | NC | 100 | M | 60–69 | Unknown | Noncancer |
| 943 | NC | 0 | F | 60–69 | Never | Noncancer |
| 956 | NC | 20 | M | 60–69 | Current | Noncancer |
| 884 | NC | 54 | M | 70–79 | Former | Noncancer |
| 883 | NC | 0 | M | 80–89 | Never | Noncancer |
*Not available: the exact pack-year smoking history for this patient was not recorded. However, the patient was recorded as an active smoker of 2 packs per day (PPD) at the time of lung cancer diagnosis at age 75 and had advanced-stage COPD, so there is compelling circumstantial evidence of a large smoking history.
Fig. 1 Mutations identified in patient specimens. Sample mutation signal versus IS sequencing error. Variant allele frequency (VAF) of sample mutations (red triangle) relative to VAF of corresponding nucleotide-specific error variants in 19 IS replicates (black circle). VAF = site-specific variant allele reads/total allele reads
Target- and cohort-specific mutation prevalence
| Target | CA-SMK | NC-SMK | NC-NON | NC-TOT | Average (All Subjects) |
|---|---|---|---|---|---|
| BRAF_15 | 6.7 × 10⁻³ | 0 | 0 | 0 | 3.9 × 10⁻³ |
| EGFR_18 | 0 | 0 | 0 | 0 | 0 |
| EGFR_19 | 0 | 0 | 0 | 0 | 0 |
| EGFR_20 | 3.9 × 10⁻² | 3.4 × 10⁻² | 4.5 × 10⁻² | 3.8 × 10⁻² | 3.8 × 10⁻² |
| EGFR_21 | 1.7 × 10⁻³ | 0 | 0 | 0 | 9.9 × 10⁻⁴ |
| ERBB2 | 1.1 × 10⁻² | 1.4 × 10⁻² | 1.4 × 10⁻² | 1.4 × 10⁻² | 1.2 × 10⁻² |
| KRAS_2 | 0 | 0 | 0 | 0 | 0 |
| PIK3CA_10 | 4.2 × 10⁻³ | 0 | 0 | 0 | 2.4 × 10⁻³ |
| TP53_5 | 2.2 × 10⁻² | 4.7 × 10⁻³ | 0 | 2.9 × 10⁻³ | 1.4 × 10⁻² |
| TP53_6 | 2.2 × 10⁻² | 0 | 3.1 × 10⁻³ | 1.2 × 10⁻³ | 1.3 × 10⁻² |
| TP53_7 | 1.3 × 10⁻² | 2.9 × 10⁻³ | 0 | 1.8 × 10⁻³ | 8.5 × 10⁻³ |
| Average (All Targets) | 1.2 × 10⁻² | 4.7 × 10⁻³ | 5.3 × 10⁻³ | 4.9 × 10⁻³ | 8.9 × 10⁻³ |
Mutations were defined as substitutions with VAF (variant allele reads/total allele reads) > 5 × 10⁻⁴ and significantly above IS VAF (i.e., background noise) based on contingency table analysis. Mutation prevalence was defined as mutations/target bp/subject (see Methods)
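The mutation-calling rule and the prevalence definition in the footnote above can be sketched in a few lines. This is only an illustration under stated assumptions: the source says "contingency table analysis" without naming the test, so a Pearson chi-square test on a 2 × 2 table (sample versus internal standard, variant versus non-variant reads) is assumed here, as is the significance level `alpha = 0.05`. All function names and the read counts in the usage note are hypothetical.

```python
import math

def vaf(variant_reads: int, total_reads: int) -> float:
    """Variant allele frequency = variant allele reads / total allele reads."""
    return variant_reads / total_reads if total_reads else 0.0

def chi2_2x2_pvalue(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square p-value (1 d.o.f., no continuity correction)
    for the 2 x 2 table [[a, b], [c, d]]. The choice of test is an assumption."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def is_mutation(sample_var: int, sample_total: int,
                is_var: int, is_total: int,
                min_vaf: float = 5e-4, alpha: float = 0.05) -> bool:
    """Call a mutation when the sample VAF exceeds min_vaf and is
    significantly above the internal-standard (IS) background VAF."""
    sample_vaf = vaf(sample_var, sample_total)
    if sample_vaf <= min_vaf or sample_vaf <= vaf(is_var, is_total):
        return False
    p = chi2_2x2_pvalue(sample_var, sample_total - sample_var,
                        is_var, is_total - is_var)
    return p < alpha

def prevalence(n_mutations: int, target_bp: int, n_subjects: int) -> float:
    """Mutation prevalence = mutations / target bp / subject."""
    return n_mutations / target_bp / n_subjects
```

For example, with hypothetical counts, `is_mutation(60, 50000, 5, 50000)` passes both the VAF threshold and the background test, whereas `is_mutation(20, 50000, 5, 50000)` fails because its VAF (4 × 10⁻⁴) is below the 5 × 10⁻⁴ cut-off.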
Fig. 2 Inter-cohort comparison of TP53 mutation mean prevalence. a Mean mutation prevalence among subjects within each cohort in each separate TP53 exon 5, 6, or 7 (mutations/target base/subject). b Cohort- and substitution-specific mean mutation prevalence for the combined three TP53 exon targets. c Number of mutations at TP53 hotspot sites. Inset: number of mutations according to mutation type. Mutations were defined as those with VAF (variant allele reads/total allele reads) > 0.05% and significantly above IS background VAF based on contingency table analysis (see Methods). TP53 mutations in CA-SMK subjects were enriched significantly at “hotspot” lung cancer driver mutation sites (p = 0.002)
Statistical analysis of target specific inter-cohort differences in mutation prevalence
| Target | CA-SMK vs. NC-TOT | CA-SMK vs. NC-SMK | CA-SMK vs. NC-NON | NC-SMK vs. NC-NON |
|---|---|---|---|---|
| BRAF_15 | 0.12 | 0.4 | 0.54 | 1 |
| EGFR_18 | N/A | N/A | N/A | N/A |
| EGFR_19 | N/A | N/A | N/A | N/A |
| EGFR_20 | 0.72 | 0.78 | 0.96 | 0.74 |
| EGFR_21 | 0.39 | 0.76 | 0.83 | 1 |
| ERBB2 | 0.35 | 0.73 | 0.8 | 1 |
| KRAS_2 | N/A | N/A | N/A | N/A |
| PIK3CA_10 | 0.062 | 0.27 | 0.41 | 1 |
| TP53_5 | 0.022 | 0.27 | 0.1 | 0.77 |
| TP53_6 | 0.0083 | 0.037 | 0.333 | 0.849 |
| TP53_7 | 0.028 | 0.25 | 0.16 | 0.9 |
| TP53_Total | 0.0019 | 0.047 | 0.043 | 0.92 |
Differences in mutation prevalence for each target across cohorts were tested with the Kruskal-Wallis test, and the p-values presented were adjusted for multiple comparisons using the Nemenyi test. Mutations were defined as those with VAF (variant allele reads/total allele reads) > 5 × 10⁻⁴ and significantly above IS background VAF based on contingency table analysis
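As a rough illustration of the omnibus step named in the footnote above, the following sketch computes a tie-corrected Kruskal-Wallis H statistic for three cohorts. It is not the authors' analysis: it uses per-subject TP53 mutation counts (exons 5, 6, and 7 summed from the distribution table in this record) rather than prevalence, it omits the Nemenyi pairwise adjustment, and it relies on the chi-square approximation with 2 degrees of freedom, so it is not expected to reproduce the tabulated p-values. The function name `kruskal_wallis` is hypothetical.

```python
import math
from collections import Counter

def kruskal_wallis(groups):
    """Return (H, p) for exactly three groups, with tie correction.
    p uses the chi-square approximation with 2 d.o.f.: p = exp(-H / 2)."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    values = sorted(v for g in groups for v in g)
    n = len(values)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = Counter(values)
    correction = 1.0 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # all observations identical
    h /= correction
    return h, math.exp(-h / 2.0)

# Per-subject TP53 mutation counts (TP53_5 + TP53_6 + TP53_7), by cohort.
ca_smk = [1, 7, 1, 1, 2, 7, 14, 8, 8, 1, 7]
nc_smk = [0, 1, 1, 1, 0]
nc_non = [0, 1, 0]

h_stat, p_value = kruskal_wallis([ca_smk, nc_smk, nc_non])
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

The closed-form p-value is used only because three cohorts give 2 degrees of freedom; a general implementation would need the chi-square survival function for arbitrary degrees of freedom, and pairwise comparisons would need a post hoc procedure such as the Nemenyi test.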
Distribution of mutations across targets and samples
| Sample | Diagnosis | Cohort | BRAF_15 | EGFR_18 | EGFR_19 | EGFR_20 | EGFR_21 | ERBB2 | KRAS_2 | PIK3CA_10 | TP53_5 | TP53_6 | TP53_7 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 946 | CA | SMK | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 |
| 167 | CA | SMK | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 1 | 3 | 3 | 11 |
| 947 | CA | SMK | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 5 |
| 146 | CA | SMK | 1 | 0 | 0 | 3 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 7 |
| 887 | CA | SMK | 0 | 0 | 0 | 4 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 7 |
| 885 | CA | SMK | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 4 | 2 | 1 | 10 |
| 940 | CA | SMK | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 6 | 6 | 2 | 15 |
| 191 | CA | SMK | 0 | 0 | 0 | 3 | 0 | 2 | 0 | 1 | 2 | 5 | 1 | 14 |
| 147 | CA | SMK | 1 | 0 | 0 | 3 | 0 | 1 | 0 | 1 | 4 | 4 | 0 | 14 |
| 128 | CA | SMK | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 5 |
| 923 | CA | SMK | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 2 | 4 | 1 | 9 |
| 210 | NC | SMK | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 4 |
| 886 | NC | NON | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 3 |
| 952 | NC | SMK | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 3 |
| 157 | NC | SMK | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 4 |
| 943 | NC | NON | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 5 |
| 956 | NC | SMK | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 5 |
| 884 | NC | SMK | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 |
| 883 | NC | NON | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 4 |
| Total | | | 3 | 0 | 0 | 43 | 1 | 17 | 0 | 4 | 23 | 27 | 11 | 129 |
Inter-target and inter-sample distribution of mutations. Mutations were defined as those with VAF (variant allele reads/total allele reads) > 5 × 10⁻⁴ (0.05%) and significantly above IS background VAF based on contingency table analysis
Fig. 3 Effect of VAF cut-off on TP53 mutation prevalence detected in AEC of subjects with or without cancer. Hotspot regions in TP53 exons 5, 6, and 7 were targeted. Variants were binned according to VAF lower limit and cumulative variants in cancer (solid symbol) or non-cancer subjects (open symbol) above the indicated VAF threshold were plotted
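The cumulative binning described in the Fig. 3 caption can be sketched as follows. The VAF values and thresholds in the usage note are hypothetical and serve only to show the mechanics. Treating each threshold as an inclusive lower limit (`>=`) is an assumption, since the caption does not say whether the limit is inclusive.

```python
def cumulative_counts(vafs, thresholds):
    """For each VAF lower limit, count the variants at or above it."""
    return {t: sum(1 for v in vafs if v >= t) for t in thresholds}

def compare_cohorts(cancer_vafs, noncancer_vafs, thresholds):
    """Return rows of (threshold, cancer count, non-cancer count)."""
    ca = cumulative_counts(cancer_vafs, thresholds)
    nc = cumulative_counts(noncancer_vafs, thresholds)
    return [(t, ca[t], nc[t]) for t in thresholds]
```

For instance, `cumulative_counts([0.0006, 0.001, 0.004, 0.02], [0.0005, 0.001, 0.01])` counts 4, 3, and 1 variants at the three thresholds, which shows how raising the cut-off shrinks the number of variants retained in either cohort.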
Fig. 4 Inter-cohort comparison of subject-specific mutation prevalence (mutations/target base/subject) in (a) TP53 exons only or (b) TP53 exons, PIK3CA, and BRAF
Inter-cohort comparison of type-specific substitution mutations across all TP53 exons
| Mutation | CA-SMK¹ | NC-SMK² | NC-NON³ | NC-TOT⁴ |
|---|---|---|---|---|
| C > A | 17 (2.0 × 10⁻³)* | 1 (1.2 × 10⁻⁴) | 0 | 1 (1.2 × 10⁻⁴) |
| C > G | 1 (1.2 × 10⁻⁴) | 1 (1.2 × 10⁻⁴) | 0 | 1 (1.2 × 10⁻⁴) |
| C > T | 27 (3.2 × 10⁻³)*** | 1 (1.2 × 10⁻⁴) | 1 (1.2 × 10⁻⁴) | 2 (2.4 × 10⁻⁴) |
| T > A | 3 (3.6 × 10⁻⁴) | 0 | 0 | 0 |
| T > C | 9 (1.1 × 10⁻⁴)* | 0 | 0 | 0 |
| T > G | 0 | 0 | 0 | 0 |
¹CA-SMK: cancer subject, present or past smoker. ²NC-SMK: non-cancer subject, present or past smoker. ³NC-NON: non-cancer subject, never smoker. ⁴NC-TOT: all non-cancer subjects, smokers and non-smokers
*p < 0.05; **p < 0.01; ***p < 0.005
Mutation number with prevalence in parentheses (mutations/target bp/subject) for each substitution type. Mutations were called as described in Methods section, after testing for significance of mutation VAF above background and using a VAF of 0.05% as a minimum threshold
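The six substitution types in the table above follow the common convention of reporting each change relative to the pyrimidine (C or T) of the base pair, so that, for example, G > T on one strand is counted as C > A. A minimal sketch of that collapsing step is shown below. Whether the authors applied exactly this normalization is an assumption, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-base substitution into one of six
    pyrimidine-referenced classes, e.g. ('G', 'T') -> 'C > A'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the change on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref} > {alt}"

def tally_substitutions(calls):
    """Count mutations per class from (ref, alt) pairs."""
    return Counter(substitution_class(r, a) for r, a in calls)
```

Under this convention, `tally_substitutions([("G", "T"), ("C", "A"), ("A", "G")])` yields two C > A calls and one T > C call.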
Fig. 5 Inter-cohort comparison of EGFR mutation mean prevalence. a Mean mutation prevalence among subjects within each cohort in each separate EGFR exon 18, 19, 20, or 21 (mutations/target base/subject). b Cohort- and substitution-specific mean mutation prevalence for the combined four EGFR exon targets. c Number of mutations at EGFR hotspot sites. Inset: number of mutations according to mutation type. Mutations were defined as those with VAF (variant allele reads/total allele reads) > 5 × 10⁻⁴ (0.05%) and significantly above IS background VAF based on contingency table analysis (see Methods)