
Analysis of whole-exome data of cfDNA and the tumor tissue of non-small cell lung cancer.

Yuanzhou Wu1, Qunqing Chen1, Qiangzu Zhang2, Man Li3, Hui Li1, Longfei Jia1, Yang Huang1, Jian Zhang4.   

Abstract

BACKGROUND: Non-small cell lung cancer (NSCLC) has the highest cancer mortality rate in the world, but there is currently no effective method for dynamic monitoring. Gene mutation is an important factor in tumorigenesis and can be detected by high-throughput sequencing. This study aimed to analyze the driver genes in the tumors of NSCLC patients by whole-exome sequencing and to compare the tumor subclones at different time points.
METHODS: From January 2016 to December 2018, we collected tumor tissues, para-cancer tissues, and peripheral blood samples (for cell-free DNA [cfDNA] detection) from 87 NSCLC patients and performed whole-exome sequencing. A detailed gene mutation map of NSCLC was drawn from the second-generation sequencing data, and new driver genes were identified. In addition, we performed subclonal analysis of tumors from different stages in the same patient to further characterize tumor heterogeneity.
RESULTS: Clonal analysis based on cfDNA was similar to clonal analysis based on the tissue samples, so real-time monitoring of tumor changes can be carried out by monitoring cfDNA.
CONCLUSIONS: This study provides data on the gene mutation profile of NSCLC and shows the importance of cfDNA for analyzing tumor subclonal information. © 2021 Annals of Translational Medicine. All rights reserved.


Keywords:  Cell-free DNAs (cfDNA); non-small cell lung cancer (NSCLC); subcloning; whole-exome sequencing

Year:  2021        PMID: 34734005      PMCID: PMC8506706          DOI: 10.21037/atm-21-4117

Source DB:  PubMed          Journal:  Ann Transl Med        ISSN: 2305-5839


Introduction

Lung cancer (LC) is a malignant tumor with high mortality and is subdivided into non-small cell LC (NSCLC), which accounts for ~85% of all lung cancers, and small cell LC (SCLC), which accounts for ~15%. NSCLC is further divided mainly into adenocarcinoma and squamous cell carcinoma, which can be molecularly stratified according to specific gene mutations and their expression in the tumor (1). If NSCLC is detected early and surgically removed, the prognosis can be good, with a 5-year survival rate of 70–90% (2). However, most patients (~75%) already have advanced disease at diagnosis (3), and although there have been significant advances in the treatment of advanced lung cancer in recent years, the survival rate remains low. Some NSCLC patients currently receive targeted therapy, but the biggest obstacle is the inevitable development of drug resistance, which tumor cells acquire through different mechanisms, including target gene mutations, activation of complementary bypass pathways, and phenotypic transformation (4,5). During their development and evolution, tumor cells produce different subclones, which leads to tumor heterogeneity (6-8). Both diagnosis and postoperative monitoring of LC are important for the patient’s prognosis, but diagnosing and monitoring LC at an early stage remains difficult because there are often no obvious symptoms. Although low-dose computed tomography (LDCT) is the widely recommended method for LC screening and detection, it carries radiation risks. Therefore, a noninvasive screening tool that can detect LC earlier is desirable. Circulating cell-free DNA (cfDNA) refers to small double-stranded DNA fragments released from normal or tumor cells into the peripheral blood or other body fluids (9). In patients with tumors, cfDNA levels increase notably, and tumor-specific mutations derived from cancer cells can be identified in it (10).
Compared with traditional tissue sampling, blood cfDNA sampling is faster, more convenient, easier to perform, minimally invasive, and inexpensive (11). As a feasible alternative to tissue biopsy, cfDNA-based liquid biopsy has been used for molecular target identification, prediction of response and prognosis, and monitoring of drug resistance during targeted therapy for LC. It can therefore also be used to dynamically monitor treatment and prognosis. Intratumor heterogeneity refers to the subclonal diversity of tumor cells within a single tumor, whereas intertumor heterogeneity refers to the diversity between primary and secondary tumors (12-14). Tumor heterogeneity thus manifests not only between primary and secondary tumors but also between different tumor cells in the same tumor tissue. Different cell populations, and cells at different stages of the same tumor, differ in many characteristics, such as gene mutations, gene expression, and epigenetic state (15). Tumor heterogeneity creates a high degree of complexity and genetic diversity within tumor tissues, causing different treatment sensitivities within the same tumor type (16). Heterogeneity between patients is related to individual genetic and phenotypic variation, which can explain why patients respond differently to treatment. The development of high-throughput sequencing technology provides a good method for studying tumor heterogeneity and its evolution. Owing to its high throughput and accuracy, second-generation sequencing can accurately analyze changes in both tumor gene mutations and gene expression, and third-generation sequencing, with its long reads, can more conveniently and accurately detect structural variations at the genome and transcriptome levels (17).
The development of single-cell sequencing and spatial transcriptomics has further improved the ability to investigate the spatial heterogeneity of tumors (18-20). At present, however, there is no effective method for the dynamic monitoring of NSCLC. In this study, we aimed to provide ideas and methods for the real-time monitoring of tumors. Tumor tissue, para-cancer tissue, and peripheral blood samples were collected from NSCLC patients and subjected to whole-exome sequencing. The driver genes in the tumors of NSCLC patients were analyzed from the sequencing data, and the tumor subclones at different time points were compared. After systematic analysis, gene mutation maps of the tumor tissues and cfDNA of the NSCLC patients were plotted. We present the following article in accordance with the MDAR reporting checklist (available at https://dx.doi.org/10.21037/atm-21-4117).

Methods

Clinical data collection

From January 2016 to December 2018, 87 patients with NSCLC were enrolled as research subjects at Zhujiang Hospital of Southern Medical University. All patients were diagnosed by pathological examination, and the para-cancer tissues were confirmed to be free of cancer cells. Table 1 shows the demographic and clinicopathological characteristics of the patients. All procedures performed in this study involving human participants were in accordance with the Declaration of Helsinki (as revised in 2013). All patients signed an informed consent form, and the study protocol was approved by the Ethics Committee of Zhujiang Hospital of Southern Medical University.
Table 1

Clinical information of 87 patients with non-small cell lung cancer

Characteristic                          Patients (n=87), n (%)
Age (years)
   Median (min., max.)                  60 (32, 79)
   <65                                  64 (73.6)
   ≥65                                  23 (26.4)
Gender
   Male                                 50 (57.5)
   Female                               37 (42.5)
Smoking status
   Current smoker                       15 (17.2)
   Former smoker                        3 (3.4)
   Never smoker                         69 (79.3)
Histologic subtype
   Adenocarcinoma                       75 (86.2)
   Squamous carcinoma                   4 (4.6)
   Other                                8 (9.2)
Clinical UICC stage before treatment
   I–II                                 12 (13.8)
   III–IV                               75 (86.2)
Residence area
   Rural                                33 (37.9)
   Urban                                54 (62.1)
CEA level (μg/L)
   ≤5.0                                 26 (29.9)
   >5.0                                 61 (70.1)
Response to treatment
   CR                                   5 (5.7)
   PR                                   4 (4.6)
   SD                                   41 (47.1)
   PD                                   37 (42.5)

UICC, Union for International Cancer Control; CEA, carcinoembryonic antigen; CR, complete response; PR, partial response; SD, stable disease; PD, progressive disease.


Whole-exome sequencing

Whole-exome sequencing (WES) was performed on 47 tumor tissue samples and on cfDNA extracted from 36 peripheral blood samples; for 4 patients, both tissue and cfDNA samples were sequenced. Five patients were randomly selected for dynamic (serial) WES to monitor tumor changes. Sequencing was performed on the Illumina NovaSeq 6000 platform, and exonic regions were captured with the Agilent SureSelect Human All Exon V6 kit.

Statistical analysis

Quality control and analysis of sequencing data

First, fastp (21) was used for quality control of the raw sequencing data, removing adapters and low-quality bases. The filtered reads were then aligned to the human reference genome (hg38) with bwa (22). Next, GATK (23) was used to call variants in each sample, and the variants were annotated and filtered, yielding a reliable map of gene mutations in the tumor tissue and cfDNA samples.
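The three steps above can be written out as command lines. This is a minimal sketch, not the authors' pipeline: the paper names only the tools (fastp, bwa, GATK), so paired-end FASTQ input, the `bwa mem` algorithm, GATK4 `Mutect2` in tumor-normal mode, and every file and sample name below are assumptions. The function only builds the argument lists; it does not run anything.

```python
import shlex


def build_commands(sample, normal, ref="hg38.fa", threads=8):
    """Return the command lines (as argument lists) for one tumor sample.

    `sample` and `normal` are hypothetical sample names; all file names are
    placeholders derived from them. `ref` must be a bwa-indexed FASTA.
    """
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    c1, c2 = f"{sample}_R1.clean.fastq.gz", f"{sample}_R2.clean.fastq.gz"
    # bwa expects literal "\t" separators inside the -R read-group string.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # Step 1: fastp trims adapters and low-quality bases (paired-end mode).
        ["fastp", "-i", r1, "-I", r2, "-o", c1, "-O", c2,
         "-h", f"{sample}.fastp.html", "-j", f"{sample}.fastp.json"],
        # Step 2: bwa mem aligns the cleaned reads to hg38. It writes SAM to
        # stdout; sorting and indexing into {sample}.bam (e.g. with samtools)
        # is omitted from this sketch.
        ["bwa", "mem", "-t", str(threads), "-R", read_group, ref, c1, c2],
        # Step 3 (assumed): GATK4 Mutect2 somatic calling against the matched
        # control; -normal takes the control's read-group sample name (SM).
        ["gatk", "Mutect2", "-R", ref,
         "-I", f"{sample}.bam", "-I", f"{normal}.bam",
         "-normal", normal, "-O", f"{sample}.vcf.gz"],
    ]


if __name__ == "__main__":
    for cmd in build_commands("T01", "N01"):
        print(shlex.join(cmd))
```

Keeping each command as an argument list (rather than one string) avoids shell-quoting problems with the tab-separated read-group string; `shlex.join` is used only for display.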

Analysis of purity and ploidy of tumor samples

ABSOLUTE (https://software.broadinstitute.org/cancer/cga/absolute) was run with default parameters to estimate the purity and ploidy of each tumor sample, and samples with low tumor purity were then excluded.
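Why low-purity samples are removed can be seen from the expected variant allele fraction (VAF) of a clonal somatic mutation: admixed normal cells dilute it. The sketch below uses the standard purity/copy-number relation VAF = p·m / (p·C_t + (1 − p)·C_n); it is an illustration, not ABSOLUTE's model, and the purity cutoff is hypothetical because the paper does not report the value used.

```python
MIN_PURITY = 0.2  # hypothetical cutoff: the paper does not report the value used


def expected_vaf(purity, tumor_cn=2, multiplicity=1, normal_cn=2):
    """Expected variant allele fraction of a clonal somatic mutation.

    purity:       fraction of cells in the sample that are tumor cells
    tumor_cn:     total copy number of the locus in tumor cells
    multiplicity: number of tumor copies carrying the mutation
    normal_cn:    copy number of the locus in normal cells (2 for autosomes)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    mutant_copies = purity * multiplicity
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return mutant_copies / total_copies


def passes_purity_filter(purity, min_purity=MIN_PURITY):
    """Keep a sample only if its estimated purity reaches the cutoff."""
    return purity >= min_purity
```

For a diploid, heterozygous clonal mutation the expected VAF is half the purity: `expected_vaf(0.6)` is about 0.3, whereas at purity 0.1 it falls to about 0.05, which is much harder to separate from sequencing noise.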

Driver genes prediction

MutSigCV (24) was applied to the somatic mutations of all tumor samples to predict tumor driver genes.
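A minimal sketch of selecting candidate driver genes from MutSigCV-style output, assuming a tab-delimited gene table with `gene`, `p`, and `q` columns (the same columns reported in Table 2) and a hypothetical q-value cutoff of 0.1, since the paper does not state the threshold. The gene names in the example are placeholders.

```python
import csv
import io

Q_CUTOFF = 0.1  # hypothetical: the paper does not report the cutoff used


def significant_genes(table_text, q_cutoff=Q_CUTOFF):
    """Return gene names with q <= q_cutoff, most significant first."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    hits = [(row["gene"], float(row["q"])) for row in reader
            if float(row["q"]) <= q_cutoff]
    return [gene for gene, q in sorted(hits, key=lambda hit: hit[1])]


example = "gene\tp\tq\nGENE_A\t1e-6\t0.001\nGENE_B\t0.2\t0.6\nGENE_C\t0.001\t0.05\n"
print(significant_genes(example))  # ['GENE_A', 'GENE_C']
```

For a real output file, pass the file contents (for example `open(path).read()`) instead of the inline string.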

Results

Quality of sequencing data

Samples with a high sequencing duplication rate or a tumor sequencing depth below 100X were excluded, leaving a total of 174 WES data sets for analysis. Of these, the 91 tumor samples (both tissue and cfDNA samples were obtained from 4 patients) had a mean sequencing depth of 406X (range, 103X–812X); the 51 tissue samples had a mean depth of 417X (range, 140X–703X), and the 40 cfDNA samples had a mean depth of 392X (range, 103X–821X).
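The sample counts and mean depths above can be cross-checked. Assuming the 4 dually sampled patients contributed one tissue and one cfDNA sample each in addition to the 47 tissue and 36 cfDNA samples listed in the Methods (the reading under which the Methods and Results counts agree), the totals follow by addition, and the overall mean depth is the sample-weighted mean of the two group means. Because it is recomputed from rounded group means, the pooled value is approximate.

```python
# Counts as stated in the Methods, under the reading described above.
tissue_only, cfdna_only, both = 47, 36, 4
n_tissue = tissue_only + both                  # tissue samples
n_cfdna = cfdna_only + both                    # cfDNA samples
n_tumor = n_tissue + n_cfdna                   # all tumor samples
n_patients = tissue_only + cfdna_only + both   # patients


def pooled_mean(groups):
    """Weighted mean of group means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


pooled_depth = pooled_mean([(n_tissue, 417), (n_cfdna, 392)])
print(n_tissue, n_cfdna, n_tumor, n_patients, round(pooled_depth))  # 51 40 91 87 406
```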

Consistency of the sequencing data with samples

Sample identity checks showed that the data from all tumor and control samples matched the corresponding patients, and the sex inferred from the sequencing data was consistent with the recorded sex of each patient. No sample mix-ups occurred.
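The paper does not describe how sample identity and sex were checked. A minimal sketch of two common checks, assuming genotype calls at shared germline SNP sites for each tumor/control pair and per-chromosome read counts; both cutoffs are hypothetical and would need calibration for a given capture kit.

```python
CONCORDANCE_MIN = 0.9      # hypothetical; LOH in tumors lowers concordance below 1
CHRY_FRACTION_MALE = 1e-3  # hypothetical; depends on capture kit and aligner


def genotype_concordance(tumor_gt, control_gt):
    """Fraction of shared SNP sites with identical genotype calls.

    tumor_gt, control_gt: dicts mapping a site (e.g. "chr1:12345") to a
    genotype string such as "0/1".
    """
    shared = tumor_gt.keys() & control_gt.keys()
    if not shared:
        raise ValueError("no shared SNP sites")
    same = sum(tumor_gt[s] == control_gt[s] for s in shared)
    return same / len(shared)


def infer_sex(chry_reads, total_reads, cutoff=CHRY_FRACTION_MALE):
    """Call 'male' if the fraction of reads mapped to chrY reaches the cutoff."""
    return "male" if chry_reads / total_reads >= cutoff else "female"


def sample_matches(tumor_gt, control_gt, chry_reads, total_reads, recorded_sex):
    """True if tumor and control look like the same person of the recorded sex."""
    same_person = genotype_concordance(tumor_gt, control_gt) >= CONCORDANCE_MIN
    return same_person and infer_sex(chry_reads, total_reads) == recorded_sex
```

The concordance cutoff is set below 1 because loss of heterozygosity in the tumor can turn heterozygous germline calls into homozygous ones without indicating a sample swap.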

General consistency of tumor purity and ploidy between tumor tissue and cfDNA

Tumor purity and ploidy were estimated from the WES data of the cfDNA and tumor tissue samples and were found to be broadly similar (Figure 1).
Figure 1

Tumor purity and ploidy analysis of tumor tissue and cfDNA samples. cfDNA, circulating cell-free DNA.


Higher frequency of somatic mutations of TTN, EGFR, and TP53 genes

Among the filtered somatic mutations in the tumor tissue and cfDNA samples, identified by comparison with the para-cancer tissues, missense mutations were the most common variant class, most variants were single nucleotide variants (SNVs), and C>T and C>A substitutions accounted for the highest proportions (Figure 2). The median number of mutations per sample was 79; among all mutated genes, TTN, EGFR, and TP53 had the highest mutation frequencies, each occurring in >30% of samples (Figure 2). We then plotted the distribution of the most frequently mutated genes across individual samples (Figure 3).
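Here a gene's mutation frequency is taken to mean the fraction of samples carrying at least one nonsilent mutation in that gene. A minimal sketch, assuming MAF-style records of (sample, gene, Variant_Classification) and treating the listed classes as silent; the records in the example are hypothetical, and the >30% cutoff mirrors the threshold quoted above.

```python
from collections import defaultdict

# MAF variant classes treated as silent; all other classes count as nonsilent.
SILENT = {"Silent", "Intron", "3'UTR", "5'UTR", "IGR"}


def gene_frequencies(records, n_samples):
    """Fraction of samples with >=1 nonsilent mutation, per gene.

    records: iterable of (sample_id, gene, variant_classification) tuples.
    A sample with several mutations in one gene is counted once.
    """
    mutated = defaultdict(set)
    for sample, gene, vclass in records:
        if vclass not in SILENT:
            mutated[gene].add(sample)
    return {gene: len(samples) / n_samples for gene, samples in mutated.items()}


def frequent_genes(records, n_samples, min_freq=0.30):
    """Genes mutated in more than min_freq of samples, most frequent first."""
    freqs = gene_frequencies(records, n_samples)
    return sorted((g for g, f in freqs.items() if f > min_freq),
                  key=lambda g: -freqs[g])
```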
Figure 2

Mutation map of all samples. (A) Frequency distribution of different variant classifications; (B) frequency distribution of different variant types; (C) frequency distribution of SNV variant classifications; (D) accumulation of different variants in each sample; (E) distribution of different variant classifications in each sample; (F) top 20 genes with high mutation frequency. SNV, single nucleotide variants.

Figure 3

Waterfall diagram of mutations in all samples.


Transitions versus transversions in the mutated samples

Analysis of the SNV spectrum across all samples showed that cytosine-to-thymine substitutions (C>T) were the most common, accounting for about 48% of all SNVs, followed by cytosine-to-adenine (C>A, 23%) and thymine-to-cytosine (T>C, 13%) substitutions (Figure 4A). We also counted transitions (Ti) and transversions (Tv) and found that transitions (~55% of the total) were notably more frequent than transversions (~40% of the total) (Figure 4B). The proportion of each mutation type in each sample is shown in Figure 4C.
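The six mutation types are obtained by reporting every substitution on the pyrimidine (C or T) reference strand, so that, for example, G>A is counted as C>T. Transitions (purine-to-purine or pyrimidine-to-pyrimidine changes) then reduce to C>T and T>C, and the other four classes are transversions. A minimal sketch of that bookkeeping:

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# After collapsing to the pyrimidine strand, C>T and T>C are the only
# transitions; C>A, C>G, T>A, and T>G are transversions.
TRANSITIONS = {"C>T", "T>C"}


def snv_class(ref, alt):
    """Map a single-base substitution to one of C>A, C>G, C>T, T>A, T>C, T>G."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT" or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":  # purine reference: report the change on the opposite strand
        ref, alt = ref.translate(_COMPLEMENT), alt.translate(_COMPLEMENT)
    return f"{ref}>{alt}"


def ti_tv_counts(classes):
    """Return (transitions, transversions) for a list of SNV classes."""
    counts = Counter(classes)
    ti = sum(counts[c] for c in TRANSITIONS)
    return ti, sum(counts.values()) - ti
```

For example, `snv_class("G", "A")` returns `"C>T"` and `snv_class("A", "C")` returns `"T>G"`.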
Figure 4

Statistics of mutation types. (A) Proportion of 6 different mutation types in the samples; (B) proportion of transitions and transversions; (C) proportion of different mutation types in each sample.


Association between a significant mutational signature and smoking

Maftools (25) was used to extract mutational signatures from our data, and the extracted signatures were annotated against the 30 validated mutational signatures in COSMIC. We found 4 distinctly enriched signatures: signatures 2, 4, 5, and 6 (Figure 5). Among them, signature 4 is associated with smoking: it resembles the mutational pattern observed in experimental systems exposed to tobacco carcinogens and is therefore likely caused by tobacco mutagens, suggesting that some tumors in our cohort may have been caused by smoking.
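Annotating an extracted signature amounts to finding the most similar reference signature. A minimal sketch using cosine similarity (to our understanding, the measure maftools reports when comparing signatures with COSMIC), with short toy vectors standing in for the real 96-channel trinucleotide profiles; the reference names and values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-negative profile vectors."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same channels")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("zero-length profile")
    return dot / norm


def best_match(signature, references):
    """Return (name, similarity) of the most similar reference signature."""
    scores = {name: cosine_similarity(signature, ref) for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy 4-channel profiles (real signatures have 96 channels).
toy_refs = {"Ref_A": [0.7, 0.1, 0.1, 0.1], "Ref_B": [0.1, 0.1, 0.1, 0.7]}
print(best_match([0.6, 0.2, 0.1, 0.1], toy_refs))
```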
Figure 5

Mutational signatures.


Prediction of driver genes: TP53, EGFR, FOLR3, LCN10, SPPL2B, STK11, and KRAS

MutSigCV (24) was applied to the somatic mutation data of all samples to predict driver genes, and TP53, EGFR, FOLR3, LCN10, SPPL2B, STK11, and KRAS were identified as tumor driver genes in our samples (Table 2). Of these, TP53, EGFR, and KRAS have previously been reported as driver genes in a variety of tumors, and mutations in these genes can substantially affect tumor progression, treatment response, and prognosis.
Table 2

Prediction results of tumor driver genes of all samples

Gene    expr    reptime    hic    N_nonsilent    N_silent    nnei    x    X    p    q
TP53 206956721334281,09977,62329132,618,79800
EGFR 423489336−13863,681228,04648521,88500
FOLR3 71529326762165,34740,49550104,625,07500
LCN10 189890119039131,76838,85738224,469,28300
SPPL2B 150360626731387,751120,30250165,227,04000.0086
STK11 221458123429286,28678,9882692,936,93400.0264
KRAS 25919351216155,97437,58350205,245,24000.0734

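expr: gene expression level; reptime: DNA replication time; hic: chromatin state (Hi-C); nnei: number of neighboring genes (genes with similar covariates) pooled to estimate the background mutation rate; x: number of silent or noncoding mutated bases in the neighboring genes; X: total number of bases in the neighboring genes; p: P value for significance; q: multiple-testing-corrected P value (q value).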
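The q column corrects the per-gene P values for testing many genes at once. A minimal sketch of the Benjamini-Hochberg false discovery rate procedure, which we assume (rather than confirm from the paper) is the correction behind MutSigCV's q values: each P value is scaled by m/rank, and the results are made monotone from the largest P value downward.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, so q never decreases with P.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In this example the q values are approximately [0.02, 0.04, 0.04, 0.02]: the smallest P value (0.005, rank 1) is scaled by 4/1, and the second-largest (0.03) is lifted to 0.04 to stay monotone.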


High similarity between cfDNA and tissue samples for secondary clones

Studies have shown that genomic instability promotes the emergence of more competitive subclones, a major factor in tumor progression, metastasis, and drug resistance during treatment. Drug-resistant clones may already be present before LC is treated. After tyrosine kinase inhibitor (TKI) treatment, sensitive clones (primary clones) decrease, whereas drug-resistant clones (secondary clones) gradually become dominant, giving rise to tumor heterogeneity (26). Among the 91 samples from 87 patients, 5 patients were sampled at different time points to evaluate the value of WES in monitoring tumor progression (Table 3). We analyzed the subclonal structures of tumor tissue DNA and blood cfDNA at the different time points, as well as tumor heterogeneity. For secondary clones, the changes in tumor subclones detected from cfDNA were highly consistent with the clonal analysis of the tissue samples (Figure 6).
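The paper does not name the tool used to reconstruct subclones. A common ingredient of such analyses is converting each mutation's VAF into a cancer cell fraction (CCF) using purity and copy number, then clustering mutations by CCF. The sketch below shows only the CCF conversion and a labeling rule that reads the n>5 / n≤5 split in the Figure 6 legend as the number of mutations per cluster; both that reading and the formula choice are assumptions, and the clustering step itself is omitted.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn=2, multiplicity=1, normal_cn=2):
    """Fraction of tumor cells carrying a mutation, estimated from its VAF.

    Inverts the clonal relation VAF = p*m / (p*C_t + (1 - p)*C_n) and caps
    the result at 1, since values slightly above 1 arise from sampling noise.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    ccf = vaf * total_copies / (purity * multiplicity)
    return min(ccf, 1.0)


def label_clusters(cluster_sizes, max_secondary=5):
    """Label clusters 'primary' (n > 5) or 'secondary' (n <= 5).

    cluster_sizes maps a cluster ID to n, read here (an assumption) as the
    number of mutations assigned to that cluster.
    """
    return {cid: "primary" if n > max_secondary else "secondary"
            for cid, n in cluster_sizes.items()}
```

With a purity of 0.6 (similar to several samples in Table 3), a diploid heterozygous mutation at VAF 0.3 is clonal (CCF 1.0), whereas one at VAF 0.15 is present in about half of the tumor cells.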
Table 3

Sampling information of 5 patients

Patient No.   Sample ID                  Type     Cellularity   Ploidy estimate
1             ZY1711073842943000         Tissue   0.61          3.7
              ZY1803020787902000         cfDNA    0.65          3.1
2             ZY1711073821432000         cfDNA    0.8           1.9
              ZY1711076467752000         cfDNA    0.43          2.2
              ZY1711070438002000         cfDNA    0.57          2.3
              ZY1803029704273000         Tissue   0.75          4
3             ZY0755443364264940         Tissue   0.6           2.7
              ZY0755443364264940-ct      cfDNA    0.63          2.1
4             ZY1803026037353000-T       Tissue   0.47          2.3
              ZY1711079369192000         cfDNA    0.54          2
5             ZY0755342713044877         cfDNA    0.6           2
              ZY1711079244802000         cfDNA    0.53          2
              ZY1803027967203000         Tissue   0.38          1.7
Figure 6

Analysis of tumor subclonal changes in 5 patients at different time periods.

There were 7 subclones in the tumor of patient 1: 4 primary clones (n>5; cluster0, cluster1, cluster2, cluster3) and 3 secondary clones (n≤5; cluster4, cluster5, cluster6). The cluster1 subclone was unique to the tissue samples, and the cluster2 subclone was unique to the blood samples. The other 2 primary clones and all 3 secondary clones were detected in both blood and tissue samples. The effective clone detection rate (sensitivity: clones shared by blood and tissue as a percentage of the clones detected in tissue) was 83.3%, the effective detection rate of secondary clones was 100%, and the false-positive rate (clones detected only in blood as a percentage of all clones detected in tissue and blood samples) was 14.3%. There were 10 subclones in the tumor of patient 2: 3 primary clones (n>5; cluster0, cluster1, cluster3) and 7 secondary clones (n≤5; cluster2, cluster4–cluster9). The cluster1 and cluster5 subclones were not detected in the tissue. The effective detection rate of all clones was 100%, the effective detection rate of secondary clones was 100%, and the false-positive rate was 20%. There were 6 subclones in patient 3: 4 primary clones (n>5; cluster0, cluster1, cluster2, cluster3) and 2 secondary clones (n≤5; cluster4, cluster5). The cluster1 subclone was unique to the tissue, and the cluster2 subclone was unique to the blood samples. The effective detection rate of all clones was 80%, the effective detection rate of secondary clones was 100%, and the false-positive rate was 17%.
There were 6 subclones in patient 4: 3 primary clones (n>5; cluster0, cluster1, cluster2) and 3 secondary clones (n≤5; cluster3, cluster4, cluster5). The cluster1 subclone was unique to the tissue, and the cluster2 subclone was unique to the blood. The effective detection rate of all clones was 80%, the effective detection rate of secondary clones was 100%, and the false-positive rate was 17%. There were 7 subclones in patient 5: 5 primary clones (n>5; cluster0, cluster2, cluster3, cluster4, cluster5) and 2 secondary clones (n≤5; cluster1, cluster6). The cluster1 subclone was unique to the tissue sample. The effective detection rate of all clones was 85.7%, the effective detection rate of secondary clones was 100%, and the false-positive rate was 0%.
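The detection metrics defined in the legend can be written as set operations on the clones found in each sample type. This sketch reproduces patient 1's figures from the clone assignments given in the legend. The definition of the secondary-clone detection rate (secondary clones found in both blood and tissue as a fraction of secondary clones found in tissue) is our assumption, by analogy with the overall detection rate.

```python
def clone_detection_metrics(tissue, blood, secondary):
    """Sensitivity, false-positive rate, and secondary-clone detection rate.

    tissue, blood: sets of clone IDs detected in each sample type
    secondary:     set of clone IDs classified as secondary clones
    """
    shared = tissue & blood
    sensitivity = len(shared) / len(tissue)
    false_positive_rate = len(blood - tissue) / len(tissue | blood)
    secondary_in_tissue = tissue & secondary
    secondary_rate = (len(secondary_in_tissue & blood) / len(secondary_in_tissue)
                      if secondary_in_tissue else float("nan"))
    return sensitivity, false_positive_rate, secondary_rate


# Patient 1 (cluster numbers as in the legend): cluster1 is tissue-only,
# cluster2 is blood-only, and the remaining clusters are shared.
tissue_p1 = {0, 1, 3, 4, 5, 6}
blood_p1 = {0, 2, 3, 4, 5, 6}
secondary_p1 = {4, 5, 6}
sens, fpr, sec = clone_detection_metrics(tissue_p1, blood_p1, secondary_p1)
print(f"{sens:.1%} {fpr:.1%} {sec:.0%}")  # 83.3% 14.3% 100%
```

The same function reproduces patient 2's 100% sensitivity and 20% false-positive rate when cluster1 and cluster5 are placed in blood only.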

Discussion

In general, tumors arise from mutations in one or more genes (27-29), and NSCLC is a tumor with a high incidence. A mutation map of a patient’s tumor and the identification of driver genes can provide effective guidance for treatment (30). WES is an advanced technique for studying tumor gene mutations: by analyzing WES data from tumor samples, the mutation map of tumor cells can be obtained accurately and tumor driver genes can be discovered. Precise detection of somatic mutations and driver genes is critical for treatment and prognosis (31-33). Several studies have identified the main driver genes of a variety of tumors (34), providing essential information for their clinical treatment. Tumorigenesis is a dynamic process (6), and tumors at the same site in a patient can show considerable heterogeneity at different time points (35). The subclones generated during tumor evolution produce extremely high complexity and genetic diversity, which leads to inconsistent sensitivity to treatment and thus poor treatment efficacy. Real-time tumor monitoring is therefore of great significance for treatment. Because the cfDNA of tumor patients contains DNA released from the tumor and cfDNA sampling is simple and minimally invasive, dynamic monitoring of tumor changes through cfDNA detection has potential for real-time tumor monitoring (36,37). A previous study (38) showed that in patients with NSCLC, cfDNA is more suitable as a monitoring index than carcinoembryonic antigen (CEA) and neuron-specific enolase (NSE), giving it significant advantages as a biomarker for lung cancer diagnosis, assessment of treatment efficacy, and prognosis. In this study, we performed WES on the tumor tissue, para-cancer tissue, and cfDNA of NSCLC patients and systematically analyzed the gene mutation map and the tumor subclones at different time points.
We created a gene mutation map of NSCLC and predicted TP53, EGFR, FOLR3, LCN10, SPPL2B, STK11, and KRAS as driver genes. For tumor subclones analyzed from cfDNA, the false-positive rate was ≤20%, the sensitivity was ≥80%, and the effective detection rate of secondary clones was 100%, results highly similar to those for the tissue samples. Thus, WES of cfDNA can reflect the tumor status and offers a feasible method for real-time tumor monitoring. The main limitation of this study is its small sample size: only 5 patients were used to compare the WES results of cfDNA and tumor tissue samples, and the effective number of samples was only 91, so sampling bias and random deviations in the statistical results are possible. Nevertheless, even with this limited number of samples, we found that cfDNA analysis is a feasible substitute for tumor tissue samples for real-time monitoring of tumor changes in patients with advanced LC (≥ stage III).
References (38 in total; partial list)

Michael H M Chan, Kai Ming Chow, Anthony T C Chan, Chi Bon Leung, Lisa Y S Chan, Katherine C K Chow, Ching Wan Lam, Y M Dennis Lo. Quantitative analysis of pleural fluid cell-free DNA as a tool for the classification of pleural effusions. Clin Chem. 2003-05.

Aaron McKenna, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran Garimella, David Altshuler, Stacey Gabriel, Mark Daly, Mark A DePristo. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010-07-19.

Kent W Hunter, Ruhul Amin, Sarah Deasy, Ngoc-Han Ha, Lalage Wakefield. Genetic insights into the morass of metastatic heterogeneity (review). Nat Rev Cancer. 2018-02-09.

Hariharan Easwaran, Hsing-Chen Tsai, Stephen B Baylin. Cancer epigenetics: tumor heterogeneity, plasticity of stem-like states, and drug resistance (review). Mol Cell. 2014-06-05.

Chikako Kiyohara, Akiko Otsu, Taro Shirakawa, Sanae Fukuda, Julian M Hopkin. Genetic polymorphisms and lung cancer susceptibility: a review. Lung Cancer. 2002-09.

A Dowlati, M B Lipka, K McColl, S Dabir, M Behtaj, A Kresak, A Miron, M Yang, N Sharma, P Fu, G Wildey. Clinical correlation of extensive-stage small-cell lung cancer genomics. Ann Oncol. 2016-01-22.

Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018-09-01.

Jeremy N Rich. Cancer stem cells: understanding tumor hierarchy and heterogeneity (review). Medicine (Baltimore). 2016-09.

Melanie A Krook, Russell Bonneville, Hui-Zi Chen, Julie W Reeser, Michele R Wing, Dorrelyn M Martin, Amy M Smith, Thuy Dao, Eric Samorodnitsky, Anoosha Paruchuri, Jharna Miya, Kaitlin R Baker, Lianbo Yu, Cynthia Timmers, Kristin Dittmar, Aharon G Freud, Patricia Allenby, Sameek Roychowdhury. Tumor heterogeneity and acquired drug resistance in FGFR2-fusion-positive cholangiocarcinoma through rapid research autopsy. Cold Spring Harb Mol Case Stud. 2019-08-01.

David Gordon, John Huddleston, Mark J P Chaisson, Christopher M Hill, Zev N Kronenberg, Katherine M Munson, Maika Malig, Archana Raja, Ian Fiddes, LaDeana W Hillier, Christopher Dunn, Carl Baker, Joel Armstrong, Mark Diekhans, Benedict Paten, Jay Shendure, Richard K Wilson, David Haussler, Chen-Shan Chin, Evan E Eichler. Long-read sequence assembly of the gorilla genome. Science. 2016-04-01.