Identification of heritable rare variants associated with early-stage lung adenocarcinoma risk.

Rui Fu1,2, Jia-Tao Zhang3, Rong-Rong Chen4, Hong Li5, Zai-Xian Tai4, Hao-Xiang Lin4, Jian Su2, Xiang-Peng Chu2, Chao Zhang1,2, Zhen-Bin Qiu2, Zi-Hao Chen1,2, Wen-Fang Tang2,6, Song Dong2, Xue-Ning Yang2, Guo-Qing Zhang5, Guo-Ping Zhao5, Yi-Long Wu2, Wen-Zhao Zhong1,2.   

Abstract

Background: In East Asia, the number of patients with adenocarcinoma, especially those presenting with ground-glass nodules (GGNs), is gradually increasing. Family aggregation of pulmonary GGNs is not uncommon; however, genetic predisposition in these patients remains poorly understood and identification of genes involved in the cause of these early-stage lung cancers might contribute to understanding of the underlying mechanisms and potential prevention strategies.
Methods: Fifty patients with early-stage lung adenocarcinoma (LUAD) presenting as GGNs and a first-degree family history of lung cancer (FHLC) from 34 independent families were enrolled in this study. Germline mutations of these patients were analyzed with whole exome sequencing (WES) and compared with those of 39 age- and sex-matched patients with sporadic lung cancer and 678 local healthy people. We used a stepwise variant filtering strategy, gene-based burden testing, and enrichment analysis to investigate rare but potentially pathogenic heritable mutations. Somatic tumor mutations were analyzed to consolidate germline findings.
Results: In total, 1,571 single nucleotide variants (SNVs) and 238 frameshifts with a minor allele frequency (MAF) <0.01, which were rare, recurrent, and potentially damaging candidates, were finally identified through the filtering in the GGN cohort. Pathway analysis showed the extracellular matrix to be the top dysregulated pathway. Gene-based burden testing of these highly disruptive risk-conferring heritable variants showed that MSH5 [odds ratio (OR), 9.28, 95% confidence interval (CI): 2.49-35.87], MMP9 (OR, 8.11, 95% CI: 2.22-28.43), and CYP2D6 (OR, 8.09, 95% CI: 2.68-24.92) were significantly enriched in our cohort (P<0.05). The number of rare damaging germline variants in non-smoking patients was significantly higher than that in smoking-affected patients (Spearman's ρ=−0.39, P=0.02).
Conclusions: Heritable, potentially deleterious, and rare candidate variants of MSH5, MMP9 and CYP2D6 were significantly associated with early-stage LUAD presenting with GGNs. Nonsmoking patients likely have a higher genetic predisposition to this type of cancer than smoking-affected patients. These results have extended our understanding of the underlying mechanisms of early-stage LUAD. 2022 Translational Lung Cancer Research. All rights reserved.

Keywords:  Lung cancer; adenocarcinoma; genetic predisposition; germline mutation; ground-glass nodule (GGN)

Year:  2022        PMID: 35529798      PMCID: PMC9073742          DOI: 10.21037/tlcr-21-789

Source DB:  PubMed          Journal:  Transl Lung Cancer Res        ISSN: 2218-6751


Introduction

Lung cancer is the malignancy with the highest morbidity and mortality worldwide (1). During the past two decades, the proportions of patients with adenocarcinoma, women, nonsmokers, and patients with a family history of malignant tumors have significantly increased in China (2). In the United States, the incidence of lung cancer was also higher in young females than in young males, and the changes in epidemiological trends have not been fully explained by sex differences in smoking behaviors or in outdoor air pollution exposure (3). Although tobacco smoking is the major etiological component in lung cancer, an inherited predisposition might act independently or in concert with smoking (4). Therefore, susceptibility to lung adenocarcinoma (LUAD) needs further study. In the past twenty years, the promotion of low-dose computed tomography (CT) has increased the detection rate of pulmonary ground-glass nodules (GGNs) (5). Compared with lung cancers presenting as solid nodules, those presenting as GGNs are characterized by indolent growth and better prognosis. The pathological diagnosis is usually pre-invasive or early-stage LUAD, including atypical adenomatous hyperplasia, adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), or invasive adenocarcinoma (IAC) (6,7). Few studies have examined the familial genetic susceptibility of early-stage LUAD, and potentially high-risk heritable variants in pre-invasive and invasive LUAD remain largely unknown. Previous studies have shown that some damaging germline mutations could lead to LUAD familial aggregation (8-10). Familial LUAD is more likely to carry germline epidermal growth factor receptor (EGFR) mutations along with other cancer predisposition mutations, and potential genetic modifiers might contribute to somatic mutation (11-13).
Nevertheless, reported damaging mutations, for example germline EGFR mutations, explain only a small proportion of patients with LUAD and familial aggregation and did not specifically cover the population with GGNs (13,14). Dozens of susceptibility loci implicated in lung cancer have been identified in genome-wide association studies (GWAS) (15). However, they could only explain a limited proportion of the genetic component of lung cancer pathogenesis, with modest odds ratios (ORs) (1.1–1.4) (16,17). This has also been referred to as missing heritability and is due in part to the fact that GWAS focus on common alleles [minor allele frequency (MAF) >0.05]. In brief, many genetic studies offered limited genetic explanations for LUAD or did not focus on early-stage LUAD manifesting as GGNs. In contrast, previous studies have reported that rare and deleterious variants with MAF <0.01 and modest-to-high effect sizes may have an important role in the etiology of complex traits and can account for the missing heritability that common variants cannot explain (18-21). Some low-frequency coding variants at lung cancer risk loci evaluated by exome sequencing were shown to be associated with lung carcinogenesis (22). Whole exome sequencing of selected individuals with extreme phenotypes, such as those with a family history, is an economical approach to identifying rare causal variants in targeted loci (18). Therefore, in our study, we sequenced selected cases of pre-invasive and invasive LUAD manifesting as GGNs in patients with a first-degree family history of lung cancer (FHLC) to reveal rare and potentially heritable carcinogenic variants among East Asian patients with early-stage LUAD (Figure 1). The following article was presented in accordance with the STROBE reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-21-789/rc).
Figure 1

General view of the study. FHLC, first-degree family history of lung cancer; LUAD, lung adenocarcinoma; GGO, ground-glass opacity; WES, whole exome sequencing; ECM, extracellular matrix; M-CAP, Mendelian Clinically Applicable Pathogenicity.

Methods

Study design and population

This study enrolled 50 patients with a first-degree relative family history of histologically confirmed lung cancer from 2019 to 2020. All patients were pathologically diagnosed with pre-invasive LUAD or IAC manifesting as GGNs on CT. Peripheral blood and tumor samples were collected for whole exome sequencing (WES) and further analysis. In addition, blood samples from 39 age- and sex-matched patients with sporadic LUAD (without FHLC) and 678 local healthy people were collected retrospectively. Considering the impact of secondhand smoking in the family, we divided all the families into non-smoking families (no smoker in the family) and smoking families (at least one person in the family who lived with the patient and had a smoking history). Patients who lived in smoking families or had smoking habits themselves were defined as smoking-affected patients. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Institutional Review Board of Guangdong Provincial People’s Hospital (No. GDREC2019523H), and written informed consent was obtained from all individual participants.

Library preparation, capture enrichment, exome sequencing, and variant identification

Serial peripheral blood (2–4 mL) was sampled and collected in ethylenediaminetetraacetic acid vacutainer tubes (BD Biosciences, Franklin Lakes, NJ, USA). Buffy coat DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA, USA). Forty-four patients with GGNs had tumor tissues available for somatic mutation analysis, and frozen tissue or serial sections from formalin-fixed paraffin-embedded tumor tissues were used for tumor genomic DNA extraction using the QIAamp DNA mini kit (Qiagen). DNA concentration was measured using a Qubit fluorometer and the Qubit dsDNA High Sensitivity Assay Kit (Invitrogen, Carlsbad, CA, USA). Before library construction, 1 µg of buffy coat DNA was sheared to 300 bp fragments using a Covaris S2 ultrasonicator (Covaris, Woburn, MA, USA). Following sonication, 200 ng of DNA from each sample was used for library construction. Samples underwent 2 enzymatic steps followed by the NEBNext Ultra II End Repair/dA-Tailing Module (New England Biolabs, Ipswich, MA, USA). Successful adapter ligation was confirmed with an 8-cycle polymerase chain reaction (PCR) using KAPA HiFi HotStart ReadyMix (KAPA Biosystems, Wilmington, MA, USA) with PCR primers containing a custom-synthesized barcode sequence (10 bp), which was used as a unique sample identifier. The adapter-ligated and indexed DNA fragments from 1–2 libraries were mixed in equal amounts to obtain a single pool containing 4.5 µg of DNA. DNA libraries of the peripheral blood were hybridized to the xGen Exome Research Panel (v1, Integrated DNA Technologies, Skokie, IL, USA) according to the manufacturer’s instructions. Each peripheral blood DNA sample was sequenced on the Geneplus-2000 sequencing platform (Geneplus, Beijing, China) using paired-end reads according to the manufacturer’s instructions. The mean sequencing depth was 211 for the germline WES and 401 for the somatic WES. Low-quality reads and reads containing adaptor sequences were removed.
Clean data were mapped to the human reference genome hg19 using the Burrows-Wheeler Aligner (BWA, version 0.7.10) (23). SNPs and indels were called following the Genome Analysis Toolkit best practices (24).

Variant annotation and filtering

Germline variants were annotated for mutation types, transcripts, and allele frequencies of the healthy population in public databases using the Variant Effect Predictor (VEP) tool (25). To identify the most likely rare damaging candidate variants, we filtered variants before analysis by removing non-functional variants; keeping variants with allele frequencies <0.01 in all populations and in East Asian populations of the ExAC (26), 1000 Genomes (27), and gnomAD (26) databases, and assessing the allele frequency in 678 healthy Chinese individuals; keeping variants predicted as damaging and deleterious by PolyPhen-2 (28) and SIFT (29); and keeping variants meeting the family segregation rule (i.e., all the patients tested in the same family must carry the variant). We also predicted the clinical pathogenicity of variants using the M-CAP score, which dismisses variants of uncertain significance (30). Somatic variants were filtered to exclude synonymous variants, known germline variants in the patient, and variants that occur at a population frequency of >1% in the Exome Sequencing Project (31).
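The stepwise germline filter described above can be sketched as a chain of predicates. This is only an illustrative sketch, not the authors' pipeline: the `Variant` record, its field names, and the single `predicted_damaging` flag (standing in for the PolyPhen-2 and SIFT calls) are hypothetical, and applying the M-CAP cutoff only to missense variants is an assumption based on the description of M-CAP as a missense classifier. The 0.01 allele-frequency cutoff and the >0.025 M-CAP threshold are taken from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Consequence classes retained by the filter (names are illustrative labels).
FUNCTIONAL = {"frameshift", "missense", "splicing", "stop_gain"}
MAF_CUTOFF = 0.01     # keep variants with allele frequency < 0.01
MCAP_CUTOFF = 0.025   # keep missense variants with M-CAP score > 0.025


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated-variant record; not a real VEP output schema."""
    gene: str
    consequence: str
    max_pop_af: float            # highest frequency across reference databases
    predicted_damaging: bool     # stand-in for the PolyPhen-2 / SIFT calls
    mcap: Optional[float]        # None when no M-CAP score is available
    carriers: FrozenSet[str]     # IDs of tested patients carrying the variant


def passes_filters(v: Variant, tested_relatives: set) -> bool:
    """Apply the filtering steps in the order given in the text."""
    if v.consequence not in FUNCTIONAL:
        return False
    if v.max_pop_af >= MAF_CUTOFF:
        return False
    if not v.predicted_damaging:
        return False
    # Family segregation rule: every tested patient in the family is a carrier.
    if not set(tested_relatives) <= v.carriers:
        return False
    # Assumption: the M-CAP threshold is applied to missense variants only.
    if v.consequence == "missense" and (v.mcap is None or v.mcap <= MCAP_CUTOFF):
        return False
    return True


if __name__ == "__main__":
    family = {"P1", "P2"}  # hypothetical patient IDs
    v = Variant("GENE_A", "missense", 0.002, True, 0.69, frozenset(family))
    print(passes_filters(v, family))
```

Keeping each step as a separate early return mirrors the stepwise counts reported per stage, so a counter could be added after each check to reproduce that kind of tally.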

Pathway enrichment analysis

Germline variants associated with biochemical metabolic pathways and signal transduction pathways were analyzed using pathway enrichment analyses based on the Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Ontology (GO), and Reactome Pathway databases using the clusterProfiler (32), DOSE (33), and ReactomePA (34) packages, respectively. All these enrichment analyses used a hypergeometric model, and P values were adjusted with the Benjamini-Hochberg method to control the false discovery rate. Statistical significance was established at an adjusted P value of <0.05.
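For readers unfamiliar with the statistics behind such enrichment tools, the following is a minimal standard-library sketch of a hypergeometric over-representation test followed by Benjamini-Hochberg adjustment. It is not the code of clusterProfiler, DOSE, or ReactomePA; the function names and the toy gene counts are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K belong to the pathway."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy numbers (hypothetical): 20,000 background genes, 500 candidate genes,
    # and three pathways given as (pathway size, candidate genes in pathway).
    pathways = {"pathway_1": (100, 12), "pathway_2": (200, 6), "pathway_3": (50, 1)}
    raw = [hypergeom_upper_tail(hits, 20000, size, 500) for size, hits in pathways.values()]
    for name, p_adj in zip(pathways, benjamini_hochberg(raw)):
        print(name, p_adj)
```

A pathway would then be called enriched when its adjusted P value falls below 0.05, matching the significance rule stated above.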

Gene-based burden testing

Gene-based burden testing was performed for the targeted genes (genotype present or genotype absent) in the case subjects (34 unrelated index patients from 34 families) and the sporadic LUAD cohort compared to the healthy cohort. We calculated ORs with 95% confidence intervals (CIs) using Fisher’s exact tests and corrected the P values for multiple testing by applying the Benjamini and Hochberg approach against the total number of genes in the test. Statistical significance was established at an adjusted P value of <0.05.
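As a worked illustration of the burden test, the sketch below computes the sample odds ratio and an unadjusted two-sided Fisher's exact P value for a 2×2 carrier table using only the standard library. The counts are the carrier and non-carrier numbers reported in Table 1 (for example, MMP9: 3 of 34 index patients versus 8 of 678 healthy controls). The function names are hypothetical, the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption about the test variant, and the confidence-interval method and the multiple-testing adjustment used in the study are not reproduced.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the table [[a, b], [c, d]] = [[case-mut, case-wt], [ctrl-mut, ctrl-wt]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Unadjusted two-sided Fisher's exact P value for a 2x2 table."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # (GGN-mut, GGN-wt, Control-mut, Control-wt) as listed in Table 1.
    table1_counts = {"MMP9": (3, 31, 8, 670), "MSH5": (3, 31, 7, 671), "CYP2D6": (4, 30, 11, 667)}
    for gene, counts in table1_counts.items():
        print(gene, round(odds_ratio(*counts), 3), fisher_exact_two_sided(*counts))
```

The sample odds ratios obtained this way (about 8.105, 9.276, and 8.085) agree with the OR column of Table 1; the P values printed here are raw and therefore not directly comparable with the adjusted P values in the table.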

Statistical analysis

Spearman’s rank correlation coefficient was used to assess the association between clinicopathological characteristics and germline variants. Statistical significance was established at P<0.05 (two-sided). For pathway enrichment analyses and gene-based burden testing, P values were adjusted with the Benjamini-Hochberg method to control the false discovery rate, and statistical significance was established at an adjusted P value of <0.05 (two-sided).
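Spearman's coefficient is the Pearson correlation of the ranks, with tied values given their average rank. The sketch below shows that calculation with the standard library only; the function names and the toy data (a binary smoking indicator against a per-family variant count) are hypothetical and are not study data.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    smoking_affected = [0, 0, 1, 1]    # hypothetical indicator per family
    variant_count = [50, 40, 30, 20]   # hypothetical filtered-variant counts
    print(spearman_rho(smoking_affected, variant_count))
```

A negative coefficient in such a setup means that the smoking-affected group tends to carry fewer filtered variants, which is the direction of association the study tests for.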

Results

Patient clinical information

In total, 50 patients with GGNs from 34 independent families were recruited for this study (Figure S1). Fourteen families had two members and one family had three members enrolled with available samples; the remaining 19 families each contributed one patient. Most patients were female (n=29) and non-smokers (n=40), and the mean age at GGN diagnosis was 51 (range, 30–75) years. All cases presented as GGNs on CT and were pathologically confirmed as pre-invasive LUAD or IAC (Table S1). Among the 50 patients, 44 had tumor tissues available for somatic mutation analysis; the exceptions were patients G0002, G0104, G0004, G0008, G0120, and G0121, whose lesions were too small and had to be used entirely for pathological diagnosis, with no surplus available for WES. Of the 39 patients with sporadic LUAD, most were female (n=27) and non-smokers (n=33), and the mean age at lung cancer diagnosis was 53 (range, 22–79) years (Table S2). In the healthy cohort, most people were female (n=436), and the mean age was 48 (range, 17–67) years.

Inheritable carcinogenetic variants of patients with GGNs

A total of 435,980 germline single nucleotide variants (SNVs) and 119,189 indels were identified by WES, with a mean of 82,880 SNVs [standard deviation (SD), 48,259; range, 44,366–156,617] and 14,460 indels (SD, 11,387; range, 6,229–34,072) for each patient from the GGN cohort. The variants were further filtered using a stepwise filtering strategy covering read quality and mutation classifications, including frameshift, missense, splicing, and stop gain. SNVs and indels with MAFs >0.01 in any of the ExAC, 1000 Genomes, or gnomAD databases, or in an internal exome data cohort of local healthy individuals, were filtered out. Furthermore, 3,786 SNVs and 440 frameshifts were predicted as potentially damaging or deleterious by PolyPhen and SIFT, and we identified 2,325 SNVs and 238 frameshifts meeting the family segregation criteria. We then manually checked the allele frequency in the Allele Frequency Aggregator database to exclude variants with MAF >0.01 in the Asian population. As most of the SNVs were missense mutations, we used M-CAP (30), a pathogenicity classifier for rare missense variants in the human genome with a high sensitivity to dismiss variants of uncertain significance, using >0.025 as the pathogenicity threshold. Finally, we retained 1,571 SNVs and 238 frameshifts, which were defined as rare, recurrent, and potentially pathogenic candidates (Figure 2). With the same filtering steps except the family segregation criteria, the sporadic lung cancer cohort had 2,391 SNVs and 342 frameshifts left, while the 678 healthy controls had 32,329 SNVs and 4,643 frameshifts left (Table S3).
Figure 2

Workflow and annotation pipeline for the identification of candidate variants. WES, whole exome sequencing; SNV, single nucleotide variant; MAF, minor allele frequency; M-CAP, Mendelian Clinically Applicable Pathogenicity; SIFT, Sorting Intolerant From Tolerant.

The KEGG pathway analysis of the 1,571 filtered SNVs and 238 filtered frameshifts indicated that “focal adhesion” and “extracellular matrix (ECM)-receptor interaction” were significantly enriched in the mutated genes (Figure 3A). The GO enrichment analysis showed that the top 3 dysregulated biological processes were associated with “ECM,” “extracellular structure organization,” and “collagen-containing ECM” (Figure 3B). The Reactome pathway analysis demonstrated that “degradation of the ECM” and “ECM proteoglycans” were among the top dysregulated pathways (Figure 3C). Collectively, these results suggest that rare, potentially damaging, and heritable variants associated with the ECM are possibly related to the risk of early-stage LUAD (adjusted P value <0.05).
Figure 3

Pathway analysis of the filtered 1,571 SNVs and 238 frameshifts indicated enrichment of mutations in ECM pathway related genes. (A) KEGG pathway analysis; (B) GO enrichment analysis; (C) Reactome pathway analysis. ECM, extracellular matrix; SNV, single nucleotide variant; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; ABC, ATP-binding cassette.

We further examined the distribution of the 1,571 filtered SNVs and 238 filtered frameshifts from 34 families. The number of variants varied remarkably among different families (median 40; range, 15–90). We analyzed the correlations between the number of filtered variants and clinicopathological characteristics using Spearman’s rank correlation coefficient (Figure 4A). As expected, the pure or mixed GGN subtype demonstrated on CT was significantly associated with the pathological diagnosis (Spearman’s ρ=0.52, P<0.001). This was consistent with the proposition that the solid component of GGNs can predict the invasiveness of early-stage LUAD (35). Interestingly, we found that the number of variants was significantly associated with smoking history (Spearman’s ρ=−0.39, P=0.02), with fewer variants in smoking families and more in non-smoking families. This suggests that many more innate genetic predisposition factors are needed for non-smoking patients to develop pre-invasive and invasive LUAD manifesting as GGNs.
Figure 4

Signature of germline and somatic mutations. (A) Correlation of mutation numbers with clinicopathological characteristics showed the association of smoking and the number of germline mutations. ***, P<0.001; *, P<0.05. (B) Somatic mutation signature of patients with or without smoking; (C) Somatic mutation signature of different pathologic types; (D) Number of somatic mutations in different pathologic types of early-stage pulmonary adenocarcinoma; (E) Landscape of somatic driver mutations; (F) Distribution of the filtered variants. Chr, chromosome; inner circle, presented in ≥3 families; 2nd inner circle, presented in ≥2 families; 3rd inner circle, presented in ≥1 family. pGGO, pure ground-glass opacity; mGGO, mixed ground-glass opacity; AIS, adenocarcinoma in situ; MIA, minimally invasive adenocarcinoma; IAC, invasive adenocarcinoma.

While examining the somatic mutation signatures, we noticed there were more C>A mutations and fewer C>T mutations in the smoking patients (P=0.0044 and P=0.0042, respectively; Figure 4B). In contrast, the mutation signatures were similar in AIS, MIA, and IAC (Figure 4C). The median number of somatic mutations was 28 (range: 4 to 88). As expected, AIS had fewer somatic mutations than IAC (median 31 vs. 47, P=0.00278; Figure 4D). EGFR mutations were the most prominent and significant variations, followed by those in MED12, FOXA2, OR1S1 (n=6), ERBB2, POFUT2, TGFB1, TP53, and SP4 (n=5) (Figure 4E). Actionable EGFR mutations were found in 7 of the 18 patients with pure GGNs and 11 of the 26 patients with mixed GGNs or solid tumors (38.89% vs. 57.69%, P=0.36) (available online: https://cdn.amegroups.cn/static/public/tlcr-21-789-1.pdf).

Recurrent predisposition germline variants in adenocarcinoma families

Of the 1,571 filtered SNVs and 238 filtered frameshifts, 35 SNVs and 10 frameshifts were present in ≥2 families (Table S4). When aggregating the variant data at the gene level, there were 338 SNVs and 49 frameshifts in 192 genes present in ≥2 families, and 79 SNVs and 10 frameshifts in 31 genes present in ≥3 families (Figure 4F). Gene-based burden testing showed that MMP9 (OR, 8.11; 95% CI: 2.22–28.43), MSH5 (OR, 9.28; 95% CI: 2.49–35.87), and CYP2D6 (OR, 8.09; 95% CI: 2.68–24.92) were significantly enriched in our cohort (Table 1).
Table 1

Gene-based burden testing of the genes with recurrent mutations

Gene | Existing_variation | Family ID | Case ID | Gene-based burden testing (P)† | OR | 95% CI | GGN-mut‡ | GGN-wt§ | Control-mut¶ | Control-wt†† | LC without FHLC-mut‡‡ | LC without FHLC-wt§§ | Gene-based burden testing (P, not adjusted)¶¶
ABCA4 | rs61749446, rs1413097229, rs201471607 | 5, 6, 45 | G0005, G0106, G0006, G0045 | 0.3431 |  |  | 3 | 31 | 33 | 645 | 1 | 38 | 1
ALDH6A1 | rs369485559, COSM5927708, rs370897364 | 8, 10, 37 | G0008, G0010, G0137, G0037 | 0.0186 | 8.1050 | 2.221–28.43 | 3 | 31 | 8 | 670 | 0 | 39 | 1
ARHGEF10 | rs146766107, rs187607027, – | 21, 38, 41 | G0121, G0021, G0038, G0041 | 0.1327 |  |  | 3 | 31 | 14 | 664 | 1 | 38 | 0.36217479
CATSPERG | rs200132227 | 5, 10, 36 | G0005, G0010, G0036 | 0.2028 |  |  | 3 | 31 | 20 | 658 | 1 | 38 | 0.46187873
COL9A1 | rs1422617430, rs375684014, rs767544695 | 2, 11, 18 | G0102, G0002, G0111, G0011, G0018 | 0.2653 |  |  | 3 | 31 | 26 | 652 | 0 | 39 | 1
CRIPAK | rs528457959 | 15, 21, 46 | G0015, G0121, G0021, G0046 | 0.2653 |  |  | 3 | 31 | 27 | 651 | 2 | 37 | 0.66036729
CYP2D6 | rs202102799, rs532668079, rs185772085 | 34, 36, 19, 27 | G0134, G0034, G0036, G0019, G0027 | 0.0291 | 8.0850 | 2.678–24.92 | 4 | 30 | 11 | 667 | 2 | 37 | 0.0984435
DIAPH3 | rs145827856, rs760815388, rs770994435 | 8, 38, 39 | G0008, G0038, G0039 | 0.2653 |  |  | 3 | 31 | 28 | 650 | 0 | 39 | 1
DPYD | rs570122671, COSM50544, – | 4, 8, 41 | G0104, G0004, G0008, G0041 | 0.1401 |  |  | 3 | 31 | 15 | 663 | 0 | 39 | 1
DTHD1 | rs529758698, rs577534478 | 7, 41, 46 | G0007, G0041, G0046 | 0.0291 | 13.0300 | 3.294–52.20 | 3 | 31 | 5 | 673 | 0 | 39 | 1
DYSF | rs185596534, rs200195517, rs141536854, rs759505768 | 3, 5, 13, 41 | G0103, G0003, G0005, G0113, G0013, G0041 | 0.9141 |  |  | 4 | 30 | 67 | 611 | 2 | 37 | 1
ECE2 | rs779580606, rs368866385, rs772740984 | 5, 36, 37 | G0005, G0036, G0137, G0037 | 0.2653 |  |  | 3 | 31 | 26 | 652 | 1 | 38 | 0.57150265
EP400 | rs183260874, rs575639601, rs760508158 | 3, 6, 41 | G0103, G0003, G0106, G0006, G0041 | 0.5841 |  |  | 3 | 31 | 40 | 638 | 2 | 37 | 0.33890905
EPPK1 | rs782582986, –, – | 6, 15, 31 | G0106, G0006, G0015, G0031 | 1.0000 |  |  | 3 | 31 | 79 | 599 | 1 | 38 | 0.71897502
KIAA1217 | rs761928869, rs41279868, rs780371689 | 7, 20, 30 | G0007, G0120, G0020, G0030 | 0.2653 |  |  | 3 | 31 | 28 | 650 | 0 | 39 | 1
KRT73 | rs116282210 | 3, 8, 36 | G0103, G0003, G0008, G0036 | 0.1750 |  |  | 3 | 31 | 18 | 660 | 0 | 39 | 1
MMP9 | rs752547204, rs573936612, rs771359021 | 2, 40, 46 | G0102, G0002, G0040, G0046 | 0.0186 | 8.1050 | 2.221–28.43 | 3 | 31 | 8 | 670 | 0 | 39 | 1
MSH5 | rs561487480, rs746903566 | 18, 32, 38 | G0132, G0032, G0018, G0038 | 0.0491 | 9.2760 | 2.492–35.87 | 3 | 31 | 7 | 671 | 0 | 39 | 1
MYO1H | rs759230534, rs544074593, rs758198897 | 15, 27, 36 | G0015, G0027, G0036 | 0.2653 |  |  | 3 | 31 | 28 | 650 | 0 | 39 | 1
MYO7A | rs117966637, rs375182858 | 4, 5, 6 | G0104, G0004, G0005, G0106, G0006 | 1.0000 |  |  | 3 | 31 | 64 | 614 | 1 | 38 | 1
MYOM2 | rs140558918, rs755061905, – | 8, 10, 40 | G0008, G0010, G0040 | 0.9141 |  |  | 3 | 31 | 53 | 625 | 3 | 36 | 0.21962473
NIPAL1 | rs572130928, rs190045000, rs777029979 | 7, 42, 43 | G0007, G0042, G0043 | 0.1327 |  |  | 3 | 31 | 14 | 664 | 0 | 39 | 1
OBSCN | rs371324697, rs776567153, rs370234174, rs781156170, rs772564832, rs1378528618, – | 1, 3, 5, 8, 31, 36 | G0101, G0201, G0001, G0103, G0003, G0005, G0008, G0031, G0031, G0036 | 1.0000 |  |  | 6 | 28 | 213 | 465 | 4 | 35 | 0.63787766
PKD1 | rs146096401, rs578031762, rs111244530 | 17, 38, 40 | G0117, G0017, G0038, G0040 | 1.0000 |  |  | 3 | 31 | 123 | 555 | 2 | 37 | 0.56663464
PLEC | rs543632870, rs782618187, rs549098011, –, – | 15, 21, 34, 38, 40 | G0015, G0121, G0021, G0134, G0034, G0038, G0040 | 1.0000 |  |  | 5 | 29 | 166 | 512 | 1 | 38 | 0.11075302
SAMM50 | rs78038328 | 4, 15, 18 | G0004, G0104, G0015, G0018 | 0.1475 |  |  | 3 | 31 | 16 | 662 | 1 | 38 | 0.43046147
SHANK3 | rs1461926484, rs376862893, –, –, – | 5, 27, 43 | G0005, G0027, G0027, G0027, G0043 | 0.0186 | 21.7700 | 4.857–94.84 | 3 | 31 | 3 | 675 | 1 | 38 | 0.20084246
TBL3 | rs745933962, rs749858979, rs1232676880 | 19, 26, 43, 46 | G0019, G0026, G0043, G0046 | 0.0540 |  |  | 4 | 30 | 16 | 662 | 0 | 39 | 1
TDRD12 | rs1350637914 | 30, 31, 45 | G0030, G0031, G0045 | 0.1182 |  |  | 3 | 31 | 12 | 666 | 1 | 38 | 0.32510298
TRIO | rs146453151, rs200262568 | 10, 36, 38 | G0010, G0036, G0038 | 0.2304 |  |  | 3 | 31 | 22 | 656 | 2 | 37 | 0.15411722
WDR81 | rs1485459176, rs774204130, rs146081272 | 8, 12, 36 | G0008, G0112, G0012, G0036 | 0.5841 |  |  | 3 | 31 | 38 | 640 | 0 | 39 | 0.61673108

†, P values of gene-based burden testing in GGNs cohort compared to the healthy cohort; ‡, number of mutant type variants in GGNs cohort; §, number of wild type variants in GGN cohort; ¶, number of mutant type variants in healthy cohort; ††, number of wild type variants in healthy cohort; ‡‡, number of mutant type variants in sporadic LUAD cohort; §§, number of wild type variants in sporadic LUAD cohort; ¶¶, P values of gene-based burden testing in sporadic LUAD cohort compared to the healthy cohort. OR, odds ratio; GGN, ground-glass nodule; LC, lung cancer; FHLC, first-degree family history of lung cancer.

MMP9 encodes matrix metallopeptidase 9 (707 amino acids), spans 7.6 kb, and contains 13 exons. MMP9 plays an essential role in local proteolysis of the ECM and in leukocyte migration (protein ID: P14780). We identified 3 rare missense variants in MMP9 (G615W, T246I, C373W) (available online: https://cdn.amegroups.cn/static/public/tlcr-21-789-2.pdf). Since the over-expression of MMPs can destroy the basement membrane, tumor cells or their accompanying stromal cells bearing MMPs are better able to penetrate endothelial basement membranes and become invasive (36). The expression level of MMP9 increases from AIS to IAC, especially in the non-invasive phase, suggesting that increased expression of MMP9 occurs before non-invasive lesions become invasive tumors (37). CYP2D6 encodes cytochrome P450 2D6 (446 amino acids), spans 4 kb, and contains 9 exons. The R441H substitution is located in exon 9 and affects a highly evolutionarily conserved site in the crystal structure of CYP2D6 (protein ID: Q9Y512) (Figure S2). This variant had an M-CAP score of 0.690 (available online: https://cdn.amegroups.cn/static/public/tlcr-21-789-2.pdf). The M-CAP scores of the other 2 variants (R474W, Y355C) were 0.095 and 0.082, respectively, which indicates that they are possibly pathogenic as well.
MSH5 encodes mutS homolog 5 (834 amino acids), which contains 25 exons and functions in the DNA mismatch repair pathway. Notably, GWAS have identified susceptibility loci for lung carcinogenesis in this gene (22,38,39). The A685T substitution in MSH5 affects a highly evolutionarily conserved site (protein ID: O43196) (Figure S3). It was identified in 2 non-smoking female patients with multiple GGNs diagnosed with pre-invasive LUAD from family 32, and 1 non-smoking female patient with a GGN diagnosed as invasive LUAD from family 18. The M-CAP scores of A685T and the other variant (R287H) were 0.089 and 0.128, respectively (available online: https://cdn.amegroups.cn/static/public/tlcr-21-789-2.pdf). Considering the function of MSH5 in the DNA repair pathway (40,41), we further explored other DNA repair genes besides MSH5 in our cohort. We found rare recurrent germline mutations in APEX1, FANCM, MNAT1, MSH4, PNKP, and RAD54L (Table S5). As most DNA repair genes serve as tumor suppressors, we further queried whether patients with rare germline mutations in DNA repair genes had somatic mutations in these genes as well. Indeed, patients from families 9, 17, 31, 38, 40, and 42 had both somatic and germline mutations in DNA repair genes (Table S5).

Discussion

The number of patients with LUAD presenting as GGNs is gradually increasing in East Asian populations, and their genetic predisposition remains unclear. This study analyzed the germline variants of patients with FHLC and pre-invasive or invasive LUAD presenting as GGNs, patients with sporadic LUAD, and East Asian healthy people without cancer, using a stepwise variant filtering strategy. Using WES data and gene-based burden testing, we identified rare, heritable, and potentially pathogenic candidates in early-stage LUAD. Pathway enrichment analyses showed that germline variants in genes associated with the ECM, particularly MMP9, may contribute to the carcinogenesis of LUAD presenting as GGNs. Numerous studies have demonstrated the crucial role of different stromal components during cancer development and metastasis (42). Genetic and epigenetic changes, such as aberrant promoter methylation or aberrant miRNA expression, lead to misexpression of collagens, laminins, proteoglycans, proteases, and integrins in the tumor microenvironment (43). Changes in biomechanical properties of the ECM are involved in the development of cancer (44). Focal adhesion complexes, as adaptors linking the ECM to the actomyosin cytoskeleton, can help cells perceive external environmental forces and lead to many functional consequences (36). From hyperplasia and carcinoma in situ to invasive lesions, oncogenic transformation involves a series of genetic and epigenetic changes, including genetic mutations and expression changes of different ECM adhesion receptors and growth factor receptors. Moreover, it modifies the ability of tumor cells to sense and respond to external forces and mechanical properties of the ECM (44). MMP9 is an important ECM enzyme. External carcinogens could induce production of MMP9 and epithelial-mesenchymal transition progression in lung cancer by activating the Shp2/ERK1/2/JNK/Smad2/3 signaling pathways (45).
With age or certain diseases, MMPs may be deregulated at genetic or post-genetic levels and destabilize the ECM dynamics, which is a characteristic of cancer (36). Interestingly, our results also showed that smoking-affected patients carried fewer filtered potentially damaging germline variants than those without a smoking history. To some extent, this is consistent with a previous finding that familial mutation carriers reported fewer pack-years than other patients with lung cancer (21). Therefore, without a smoking history, many more innate genetic predisposition factors are needed for the development of pre-invasive and invasive LUAD manifesting as GGNs. We speculate that this observation may also explain the different mutations (46) and growth patterns (47) in smokers and non-smokers with malignant GGNs. Moreover, germline variants in DNA repair genes have been reported in a wide range of cancers. In a real-world study, the pathogenic germline variants of patients with lung cancer were most commonly found in DNA repair genes (48), which are associated with lung cancer through several repair pathways, including chromatin structure, homologous recombination, DNA polymerases, ubiquitination, and changing sensitivity to DNA-damaging agents (49). The Cancer Genome Atlas has reported that 2.5% to 4.5% of patients with LUAD carry potentially damaging germline variants of 8 genes, which fall most frequently in DNA repair pathways (14). In this study, heritable rare variants in MSH5 were significantly enriched in this East Asian population, and germline variants in a total of 19 DNA repair genes, including MSH5, MSH4, and BRCA2, were identified in 30 patients (Table S5).
Several candidate variants, including MSH5 (A685T), ANKRD (P429L), KRT73 (R212C), and NUPL2 (Y174C), which lie in high-risk loci detected by GWAS, were also identified through the stepwise filtering. This supports the rationale of our filtering strategy and confirms the existence of heritable, potentially pathogenic germline variants in East Asian patients with early-stage LUAD and FHLC.

CYP2D6 is a member of the CYP450 superfamily of enzymes that participates in the metabolism of many common carcinogenic agents of lung cancer, such as tobacco, nitrosamine, nicotine-derived nitrosamine ketone, nicotine, and cotinine (50,51). Moreover, the A allele and AA genotype of CYP2D6 rs1065852 are associated with an increased risk of lung cancer development (52). The CYP2D6 locus is also associated with a higher risk of lung cancer or carcinogenesis in the Chinese population (52). One explanation is that certain genotypes of this gene are associated with higher levels of carcinogen-DNA adducts and 7-methyl-dGMP, which bind to DNA and may induce more gene mutations and increase the lifetime risk of lung cancer, mostly in non-smokers (53). Rare variants of CYP2D6 were significantly enriched in this early-stage LUAD cohort. SNPs that affect the metabolism of carcinogenic agents influence how individuals respond to lung carcinogens. This may partially explain why some non-smokers still developed LUAD while some heavy smokers remained free of lung cancer.

One of the main advantages of this study is that the recruited patients with GGNs were pathologically diagnosed with pre-invasive or invasive LUAD, and they all had first-degree relatives with lung cancer. Chen et al. reported that YAP1-mutant carriers had a higher predisposition for GGNs (10); however, as the nodules were not pathologically diagnosed, their conclusions should be interpreted with caution.
By contrast, we analyzed genetic susceptibility in patients with FHLC whose GGNs were pathologically confirmed.

This study has some limitations. First, the number of GGN patients who had first-degree relatives with lung cancer was small, so we could not exclude potential selection bias, and statistical power was limited. Second, the lack of validation of the identified mutations in a separate large-scale cohort limits the relevance of our findings, although the results of this study can serve as a preliminary basis for further research. Third, it was difficult to provide direct evidence that a specific SNP could increase the risk of lung cancer, because a single SNP or gene generally has only a mild effect in the complex pathogenesis of lung cancer. Last, the lack of relatives limited the analysis of transmission within families. In addition, because of the age-dependent penetrance of cancer, it was difficult to use "non-cancer" relatives as true negative controls to filter out variants. Therefore, we used the family segregation rule as an alternative to transmission analysis.

In summary, using WES, a stepwise filtering strategy, and gene-based burden testing, we presented a global view of germline variants in patients with pre-invasive or invasive LUAD presenting as GGNs. Our results indicated that rare, recurrent, heritable, and potentially highly disruptive risk-conferring variants of MSH5, MMP9, and CYP2D6 may have contributed to the formation of LUAD. Non-smoking patients probably have a higher genetic predisposition than smoking-affected patients. In the future, it will be necessary to perform validation studies in a larger cohort and to conduct functional verification of potentially high-risk candidate mutations in order to explore the high-risk genes in this unique lung cancer subtype, identify the populations at risk, and guide screening for early-stage LUAD.
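For readers unfamiliar with gene-based burden testing, the odds ratios and 95% confidence intervals reported for genes such as MSH5, MMP9, and CYP2D6 can be illustrated with a minimal sketch. It assumes a simple 2x2 carrier table per gene with a Woolf (log-based) interval and a Haldane-Anscombe correction for empty cells; the text does not state which estimator was actually used, and the function name `burden_odds_ratio` and all counts below are hypothetical rather than taken from the study.

```python
import math


def burden_odds_ratio(case_carriers, case_total,
                      control_carriers, control_total, z=1.96):
    """Odds ratio and Woolf confidence interval for one gene.

    A "carrier" is a person with at least one qualifying rare variant
    in the gene. This is an illustrative estimator, not necessarily
    the one used in the study.
    """
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if min(a, b, c, d) < 0:
        raise ValueError("carrier counts cannot exceed group totals")

    # Haldane-Anscombe correction: add 0.5 to every cell if any is zero,
    # so the odds ratio and its standard error stay finite.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 4 carriers among 50 cases, 12 among 689 controls.
or_value, ci_low, ci_high = burden_odds_ratio(4, 50, 12, 689)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

A gene is usually considered enriched when the lower bound of the interval exceeds 1. With few carriers the interval is wide, which matches the limited statistical power acknowledged in the limitations above.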
  51 in total

1.  The association between baseline clinical-radiological characteristics and growth of pulmonary nodules with ground-glass opacity.

Authors:  Yoshihisa Kobayashi; Yukinori Sakao; Gautam A Deshpande; Takayuki Fukui; Tetsuya Mizuno; Hiroaki Kuroda; Noriaki Sakakura; Noriyasu Usami; Yasushi Yatabe; Tetsuya Mitsudomi
Journal:  Lung Cancer       Date:  2013-11-01       Impact factor: 5.705

2.  Germline Mutations in DNA Repair Genes in Lung Adenocarcinoma.

Authors:  Erin M Parry; Dustin L Gable; Susan E Stanley; Sara E Khalil; Valentin Antonescu; Liliana Florea; Mary Armanios
Journal:  J Thorac Oncol       Date:  2017-08-24       Impact factor: 15.609

3.  CT Screening for Lung Cancer: Nonsolid Nodules in Baseline and Annual Repeat Rounds.

Authors:  David F Yankelevitz; Rowena Yip; James P Smith; Mingzhu Liang; Ying Liu; Dong Ming Xu; Mary M Salvatore; Andrea S Wolf; Raja M Flores; Claudia I Henschke
Journal:  Radiology       Date:  2015-06-23       Impact factor: 11.105

Review 4.  Human DNA repair genes, 2005.

Authors:  Richard D Wood; Michael Mitchell; Tomas Lindahl
Journal:  Mutat Res       Date:  2005-09-04       Impact factor: 2.433

5.  EGFR and ERBB2 Germline Mutations in Chinese Lung Cancer Patients and Their Roles in Genetic Susceptibility to Cancer.

Authors:  Shun Lu; Yongfeng Yu; Ziming Li; Ruoying Yu; Xue Wu; Hairong Bao; Yan Ding; Yang W Shao; Hong Jian
Journal:  J Thorac Oncol       Date:  2019-01-02       Impact factor: 15.609

6.  R331W Missense Mutation of Oncogene YAP1 Is a Germline Risk Allele for Lung Adenocarcinoma With Medical Actionability.

Authors:  Hsuan-Yu Chen; Sung-Liang Yu; Bing-Ching Ho; Kang-Yi Su; Yi-Chiung Hsu; Chi-Sheng Chang; Yu-Cheng Li; Shi-Yi Yang; Pin-Yen Hsu; Hao Ho; Ya-Hsuan Chang; Chih-Yi Chen; Hwai-I Yang; Chung-Ping Hsu; Tsung-Ying Yang; Kun-Chieh Chen; Kuo-Hsuan Hsu; Jeng-Sen Tseng; Jiun-Yi Hsia; Cheng-Yen Chuang; Shinsheng Yuan; Mei-Hsuan Lee; Chia-Hsin Liu; Guan-I Wu; Chao A Hsiung; Yuh-Min Chen; Chih-Liang Wang; Ming-Shyan Huang; Chong-Jen Yu; Kuan-Yu Chen; Ying-Huang Tsai; Wu-Chou Su; Huei-Wen Chen; Jeremy J W Chen; Chien-Jen Chen; Gee-Chen Chang; Pan-Chyr Yang; Ker-Chau Li
Journal:  J Clin Oncol       Date:  2015-06-08       Impact factor: 44.544

7.  A method and server for predicting damaging missense mutations.

Authors:  Ivan A Adzhubei; Steffen Schmidt; Leonid Peshkin; Vasily E Ramensky; Anna Gerasimova; Peer Bork; Alexey S Kondrashov; Shamil R Sunyaev
Journal:  Nat Methods       Date:  2010-04       Impact factor: 28.547

Review 8.  The impact of rare and low-frequency genetic variants in common disease.

Authors:  Lorenzo Bomba; Klaudia Walter; Nicole Soranzo
Journal:  Genome Biol       Date:  2017-04-27       Impact factor: 13.583

9.  Evidence that DNA repair genes, a family of tumor suppressor genes, are associated with evolution rate and size of genomes.

Authors:  Konstantinos Voskarides; Harsh Dweep; Charalambos Chrysostomou
Journal:  Hum Genomics       Date:  2019-06-07       Impact factor: 4.639

10.  The mutational constraint spectrum quantified from variation in 141,456 humans.

Authors:  Konrad J Karczewski; Laurent C Francioli; Grace Tiao; Beryl B Cummings; Jessica Alföldi; Qingbo Wang; Ryan L Collins; Kristen M Laricchia; Andrea Ganna; Daniel P Birnbaum; Laura D Gauthier; Harrison Brand; Matthew Solomonson; Nicholas A Watts; Daniel Rhodes; Moriel Singer-Berk; Eleina M England; Eleanor G Seaby; Jack A Kosmicki; Raymond K Walters; Katherine Tashman; Yossi Farjoun; Eric Banks; Timothy Poterba; Arcturus Wang; Cotton Seed; Nicola Whiffin; Jessica X Chong; Kaitlin E Samocha; Emma Pierce-Hoffman; Zachary Zappala; Anne H O'Donnell-Luria; Eric Vallabh Minikel; Ben Weisburd; Monkol Lek; James S Ware; Christopher Vittal; Irina M Armean; Louis Bergelson; Kristian Cibulskis; Kristen M Connolly; Miguel Covarrubias; Stacey Donnelly; Steven Ferriera; Stacey Gabriel; Jeff Gentry; Namrata Gupta; Thibault Jeandet; Diane Kaplan; Christopher Llanwarne; Ruchi Munshi; Sam Novod; Nikelle Petrillo; David Roazen; Valentin Ruano-Rubio; Andrea Saltzman; Molly Schleicher; Jose Soto; Kathleen Tibbetts; Charlotte Tolonen; Gordon Wade; Michael E Talkowski; Benjamin M Neale; Mark J Daly; Daniel G MacArthur
Journal:  Nature       Date:  2020-05-27       Impact factor: 69.504

