Copy number variation in triple negative breast cancer samples associated with lymph node metastasis.

Mamta Pariyar1, Andrea Johns1, Rick F Thorne2, Rodney J Scott3, Kelly A Avery-Kiejda4.   

Abstract

Triple negative breast cancer (TNBC) is a highly metastatic and aggressive subtype of breast cancer and cases presenting with lymph node involvement have worse outcomes. This study aimed to determine the regions of copy number variation (CNV) associated with lymph node metastasis in TNBC patients. CNV analyses were performed in a study cohort of 23 invasive ductal carcinomas (IDCs), 12 lymph node metastases (LNmets), and 7 normal adjacent tissues (NATs); as well as in an independent cohort containing 70 TNBC IDCs and the same 7 NATs. CNV-associated genes were analyzed using GO-enrichment and Pathway analysis. The prognostic role for genes showing CNV-based changes in messenger RNA expression was determined using the Kaplan-Meier plotter database. For the IDCs, there were a number of variations that were common in both the study and independent cohorts in the amplified regions of 1q, 8q, 19 (p and q), 2p, 5p and the deleted regions in 8p followed by 5q, and 19p. The most frequently amplified regions in the LNmets of the study cohort were 4q28.3, 2p, 3q24, 1q21.2, 10p, 12p11.1, 8q, 20p11.22-20p11.21, 21q22.13, 6p22.1 and the most frequently deleted regions were in 1p36.23, 4q21.1 and 5q. A total of 686 (441 amplified and 245 deleted) genes were associated with LNmets. The LNmet-associated genes were highly enriched for "regulation of complement activation," "regulation of protein activation cascade," "regulation of humoral immune response," "oxytocin signalling pathway," and "TRAIL binding" pathways. Moreover, 6/686 LNmet-associated genes showed CNV-based changes in their mRNA expression of which, high expression of ASPM and KIF14 was significantly associated with worse relapse-free survival. This study has identified several CNV regions in TNBC that could play a major role in metastasis to the lymph node.
Copyright © 2021. Published by Elsevier Inc.

Keywords:  Copy number variation; Invasive ductal carcinoma; Lymph node metastasis; Triple negative breast cancer

Year:  2021        PMID: 34225099      PMCID: PMC8259224          DOI: 10.1016/j.neo.2021.05.016

Source DB:  PubMed          Journal:  Neoplasia        ISSN: 1476-5586            Impact factor:   5.715


Introduction

Breast cancer is the second most common cancer among women worldwide as well as a major cause of cancer-related deaths [1]. Triple negative breast cancer (TNBC) is characterized by a lack of 2 hormonal receptors, progesterone receptor and estrogen receptor (ER), as well as a lack of overexpression of human epidermal growth factor receptor-2 [2]. It comprises about 15% to 20% of all breast cancer cases [3] and is one of the most aggressive breast cancer subtypes, as it metastasizes rapidly and often recurs within 3 to 5 years after diagnosis, whereas ER-positive breast cancer typically recurs at later stages (beyond 5 years) [4], [5], [6]. There is no biomarker that can discriminate between women who will do well and those who develop recurrent disease, nor are there targeted therapies for this breast cancer subtype. Therefore, chemotherapy and surgery are the only options for the treatment of TNBC. Identification of target molecules is essential to enable better prognostic indicators for this disease or new targets for therapy in order to improve patient survival. Copy number variation (CNV) is a type of structural variation where the DNA sequence, ranging from 1 kb to several megabases in length, is either amplified or deleted compared to the normal copy. It is one of the major sources of human genetic variation [7], and it is associated with the initiation of cancer and other diseases such as cardiovascular and complex neurological illnesses [8], [9], [10]. CNVs contribute 4.8% to 9.5% of the variability in the human genome [11]. One research finding suggests that 62% of highly amplified genes in breast cancer exhibit at least a 2-fold increase in expression [12]. Several studies have shown that TNBCs with lymph node involvement have a higher probability of recurrence and worse survival [13], [14], [15]. 
Additionally, in TNBC, some CNVs can predict poor outcomes and act as prognostic factors [16], [17], [18], and it is likely that some recurrent CNVs in TNBC may be related to lymph node metastasis. The frequent amplification of myeloid cell leukemia sequence 1 was found in ER+ primary tumors with lymph node metastasis but absent in primary tumors without metastasis as determined by single cell sequencing [19]. Gains in CTAGE5 were associated with LN metastases in breast cancer using TCGA and METABRIC data from breast ductal carcinoma [20], whereas in TNBC, there have been no previous studies of CNVs that are associated with lymph node metastasis. Previous studies from our laboratory were performed to explore gene, miRNA, and methylation changes, aimed at identifying potential biomarkers of progression in TNBC [21], [22], [23]. Differentially expressed genes in tumor and lymph node metastasis compared to normal adjacent tissues (NATs) were identified and 39% of the genes were associated with altered methylation levels, whereas a large proportion of the differentially expressed genes (61%) were not associated with altered methylation levels in invasive ductal carcinomas (IDC) versus NAT [21]. Since CNVs are a frequent event in cancer, the differential expression of these genes in TNBC may be due to a variation in copy number. Thus, with the availability of methylation array data from the previous study, this study focuses on performing CNV analysis using methylation data to define regions of copy number gain or loss in TNBC. Moreover, genes overlapping the regions of CNV were compared with their messenger RNA (mRNA) expression levels. By comparing CNV-based genes in lymph node positive TNBCs and lymph node metastases (LNmets), common genes were identified which were associated with the progression of this disease to LNmet.

Methods

Study and independent cohorts

In total, 23 IDCs, 12 LNmets, 3 pooled NAT samples and 1 singular NAT sample from TNBC cases previously used for methylation analysis were used for CNV analysis [21,22]. Of the 3 pooled NAT samples, 2 each contained 3 NAT samples from individual patient tumors and 1 contained 4 NAT samples from individual patient tumors, giving 11 NAT samples in total when the singular NAT sample is included. The pooled NAT sample containing 4 NAT samples was removed from this CNV analysis after it failed quality control, leaving 7 NAT samples in total for the analysis. Of the 23 IDCs in the study cohort, 13 are LN- and 10 are LN+; these were used to determine CNVs associated with LNmet. The LN+ IDCs are primary tumors from cases with positive lymph node status. The clinicopathological characteristics, including grade and tumor size, are similar between LN+ IDC and LN- IDC (Supplementary Table S1). The sample size of the study cohort is low for CNV analysis and this is a limitation of this study. However, TNBCs represent a small proportion of all breast cancer cases, and cases with matched LNmets and NATs are a unique and powerful resource that has been very well characterized in our previous analyses [21], [22], [23]. An independent cohort was used for copy number analysis and contained 70 IDC TNBC samples from the Australian Breast Cancer Tissue Bank, which have been previously described [21]; these were compared to the same 7 NAT samples used in the study cohort. These 70 TNBC samples are homogeneously distributed in terms of grade and tumor staging [21].

Ethics declarations

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Ethical approval was obtained from the Hunter New England Human Research Ethics Committee (Approval number: 09/05/20/5.02). In accordance with the National Statement on Ethical Conduct in Research Involving Humans, a waiver of consent was granted for the study cohort by the same ethics committee that approved the study. Informed consent was obtained from all individual participants included in the independent cohort [21].

DNA extraction

As previously described, the Gentra Puregene Tissue Kit (Qiagen, Venlo, Limburg, the Netherlands) was used to extract DNA from FFPE tissues [21].

Illumina Infinium HD FFPE methylation arrays

Methylation analysis was performed using Human Methylation 450K BeadChip arrays (Illumina) as previously described [21]. The results were deposited in Gene Expression Omnibus (Accession No. GSE78751).

Gene expression microarray analysis

Human Gene 2.0 arrays were used for gene expression analysis as previously described [23]. The gene expression results were deposited in Gene Expression Omnibus (Accession No. GSE61723).

CNV analysis

The Illumina Human Methylation 450K BeadChip arrays previously used to screen for methylation differences in the same cohort were used in this study [21]. These arrays can also be used to detect CNVs as they have a dense SNP array backbone [24]. In the current study, we performed CNV analysis in Partek Genomics Suite 7.0 using signal intensity data generated from our previous methylation analysis. The idat files were imported into Partek Genomics Suite 7.0 and the probe intensity data were parsed from them. Copy number was calculated using the 7 NAT samples as a reference baseline (described in "Study and independent cohorts") and values were converted to log2 ratios. The values were then adjusted for local GC content at a window size of 1 Mb to reduce genomic waviness [25]. A genomic segmentation algorithm was used for copy number detection to identify copy number changes between 2 neighboring regions. Following optimization of the segmentation procedure as suggested by Partek, this algorithm was used with the optimal cut-off values of minimum genomic markers ≥40, a signal to noise ratio of 0.3 for the magnitude of significant region differences relative to the noise level in each sample, and a P value threshold of 0.001 for significance between different regions. A segment was considered amplified if the mean copy number was log2 ratio ≥ 0.2, and deleted if the mean copy number was log2 ratio ≤ -0.3. A false discovery rate of 0.05 was applied to the P value to account for multiple testing. The CNV regions with amplification and deletion across the genome shared by at least five samples were selected for further analysis in IDCs of the study cohort, and the same proportion of cases was used for selection in IDCs of the independent cohort [26,27]. The Ensembl Transcript release 75 database was used to determine the genes located within those CNV regions. 
GO-enrichment and pathway analysis within Partek Genomics Suite was used to identify enriched GO terms and pathways from the list of copy number altered genes. Enrichment scores > 3 were considered significant and a q-value of <0.05 was applied to the P value for multiple test correction. The lists of copy number altered genes in IDCs and LNmets of the study cohort were compared with gene expression data previously generated for the study cohort to determine if there were CNV-based changes in expression of the genes [23]. A detailed copy number workflow is shown in Supplementary Figure 1.
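
The calling thresholds described above can be illustrated with a minimal sketch. It assumes that a mean intensity per segment is already available for a tumor sample and for the NAT reference baseline. The function names (`log2_ratio`, `call_segment`) and the example intensities are hypothetical, and the sketch does not reproduce the segmentation algorithm of the commercial software used in the study.

```python
import math

AMP_THRESHOLD = 0.2    # mean log2 ratio >= 0.2 is called an amplification (Methods)
DEL_THRESHOLD = -0.3   # mean log2 ratio <= -0.3 is called a deletion (Methods)


def log2_ratio(tumor_intensity, reference_intensity):
    """Log2 ratio of a tumor segment mean to the NAT reference baseline."""
    return math.log2(tumor_intensity / reference_intensity)


def call_segment(mean_log2_ratio):
    """Classify one segment from its mean log2 ratio."""
    if mean_log2_ratio >= AMP_THRESHOLD:
        return "amplification"
    if mean_log2_ratio <= DEL_THRESHOLD:
        return "deletion"
    return "unchanged"


# Hypothetical (tumor, reference) segment means in arbitrary units, not study data.
segments = {
    "seg_A": (1250.0, 1000.0),
    "seg_B": (780.0, 1000.0),
    "seg_C": (1020.0, 1000.0),
}
calls = {name: call_segment(log2_ratio(t, r)) for name, (t, r) in segments.items()}
print(calls)
```

Note that the two thresholds are asymmetric, so a segment with a log2 ratio of -0.25 is left uncalled while one at +0.25 is called as amplified.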

Kaplan-Meier survival analysis

The Kaplan-Meier plotter database, which contains 255 TNBC samples, was used for relapse-free survival (RFS) analysis [28]. Genes of interest were divided into high and low expression groups using the “Auto select best cut-off” option, which selects the most statistically significant cut-off among all possible cut-off values computed between the lower and upper quartiles of expression. Hazard ratios with 95% confidence intervals and log-rank P values were calculated within the database. In addition, RFS analysis was performed in the TCGA (n = 117 TNBCs) and METABRIC (n = 258 TNBCs) datasets downloaded from cBioPortal (http://cbioportal.org). Samples were classified based on the presence or absence of an alteration in the genes of interest; alterations included amplification and deletion in this analysis. Log-rank P values were calculated in GraphPad Prism 9.0.
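
For readers unfamiliar with the statistics behind these tools, the following is a minimal sketch of a Kaplan-Meier estimate and a two-group log-rank test written from the textbook definitions. It is not the implementation used by the Kaplan-Meier plotter or GraphPad Prism; the function names and the follow-up times are hypothetical, and events are coded as 1 (relapse) or 0 (censored).

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival probability) at event times."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        leaving = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            leaving += 1
            i += 1
        if relapses:
            survival *= 1 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value), 1 df."""
    pooled = zip(times_a + times_b, events_a + events_b)
    event_times = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    statistic = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Hypothetical RFS times in months for high- and low-expression groups.
high_times, high_events = [5, 8, 12, 14, 20], [1, 1, 1, 0, 1]
low_times, low_events = [10, 18, 24, 30, 36], [0, 1, 0, 1, 0]
print(kaplan_meier(high_times, high_events))
print(logrank_test(high_times, high_events, low_times, low_events))
```

A best cut-off search of the kind described above would repeat the log-rank test for each candidate expression threshold and keep the one with the smallest P value, which is why such P values are optimistic unless corrected for the repeated testing.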

Results

CNVs in IDCs and LNmets compared to NAT samples

First, to identify the total CNV segments in each sample group, a genomic segmentation algorithm was used, which revealed 4709 CNV segments in IDCs when compared to NAT, with 65% associated with gains and 35% with losses (Supplementary Table S2). In LNmets, a total of 1725 CNV segments that were not present in NATs were identified, with 58.9% gains and 41.1% losses (Supplementary Table S3). Similarly, in the independent cohort, 15,597 CNV segments were identified when compared against NATs, with 61.59% being gains and 38.4% losses (Supplementary Table S4).

Gene annotation of CNVs in IDCs of the study and independent cohorts

Next, the CNVs that were frequent in IDCs and the genes associated with those regions were calculated. In IDCs of the study cohort, the most recurrent amplified regions were located on 8q (Supplementary Table S5.a), followed by 1q, 10p, 12p13.1-12p12.3, and 2q33.1. Gene annotation data were mapped to the CNVs revealing 594 genes in 8q, which were amplified in over 50% of the 23 samples (Supplementary Table S5.b). In Figure 1a, the majority of samples were amplified across the 8q region, whereas only 78 genes in 1q, 7 in 2q33.1, 29 in 10p15.1-10p15.3 and 10 in 12p13.1-12p12.3 were frequently amplified in over 50% of the 23 samples. The most recurrent deleted regions were 14q24.1, followed by 14q21.1, 19p13.3, 4q34.1, 5q13.2, and 5q32 (Supplementary Table S6.a). Figure 1a shows the higher distribution of deletions (blue) across the q regions of chromosome 4, 5, 14, and 19p compared to all other chromosomal regions. However, deleted regions were shared less among the samples compared to that of the amplified regions. Four genes in 14q24.1, 2 in 14q21.1, 2 in 19p13.3, 2 in 4q34.1, 2 in 5q13.2, and 5 in 5q32 were recurrent, observed in more than ~40% of the samples (Supplementary Table S6.b).
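
The recurrence criterion used here (a region shared by at least five of the 23 IDCs, about 22%) can be sketched as a simple counting step. The function name `recurrent_regions`, the sample labels, and the region calls below are hypothetical toy values, not study data.

```python
from collections import Counter


def recurrent_regions(calls_by_sample, min_samples):
    """Return {region: fraction of samples} for regions called in >= min_samples samples."""
    counts = Counter()
    for regions in calls_by_sample.values():
        counts.update(set(regions))  # count each region once per sample
    n_samples = len(calls_by_sample)
    return {
        region: count / n_samples
        for region, count in counts.items()
        if count >= min_samples
    }


# Toy amplification calls for four hypothetical samples.
amplified_calls = {
    "S1": ["8q", "1q"],
    "S2": ["8q"],
    "S3": ["8q", "10p"],
    "S4": ["1q"],
}
shared = recurrent_regions(amplified_calls, min_samples=2)
print(shared)

# Minimum recurrence in the study cohort, expressed as a percentage.
print(round(5 / 23 * 100))
```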
Fig. 1

Karyogram view of significantly amplified (red) and deleted (blue) regions across 22 chromosomes of the study cohort containing 23 IDCs (top) and 12 LNmets (bottom). Histogram heights on either side of the chromosome correspond to the number of samples that share either amplification or deletion at the particular region; the higher the histogram, the more samples are amplified or deleted at that region. This karyogram profile was generated using a genomic segmentation algorithm with the number of minimum genomic markers ≥40, P value = 0.001, and signal to noise ratio = 0.3. A false discovery rate of 0.05 was applied to the resulting P values to correct for multiple testing. (A) CNV profile in IDC samples where amplification in 8q, 1q, 2q, 10p, and 12p is shared by >50% of samples. Deletion in 4q, 5q, 14q, and 19p is shared by >39% of samples. (B) CNV profile in LNmet samples where amplification in 1q, 4q28.3, 2p, 3q, 6p, 8q, 12p, 20p, and 21q is shared by >50% of samples, whereas the deletion in 1p, 4q, and 5q is shared by >39% of samples.

To investigate if genes within the CNVs in IDCs of the study cohort were associated with specific functional groups and pathways, GO-enrichment and Pathway analysis was used. Based on the enrichment scores, “production of molecular mediator of immune response,” “antigen binding,” “immunoglobulin production,” and “complement activation” were the most enriched GO-terms (Supplementary Table S7.a) and “ribosome biogenesis in eukaryotes” was the most enriched pathway from the list of genes contained within the amplified regions (Supplementary Table S7.b). The list of genes overlapping the deleted regions was highly enriched in GO-terms such as “neuron differentiation,” “cardiac ventricle morphogenesis,” and “regulation of hormone levels” (Supplementary Table S7.c). In the independent cohort, the most frequent amplified regions were 3q24, 2p15, 6p22.1, 8q, 1q, 10p15.3, and 4q28.3 (Figure 2; Supplementary Table S8.a). 
At the gene level, 47 genes in 8q, 43 in 1q, 15 in 10p15.3, 5 in 4q28.3, 3 in 3q24, 1 in 2p15, and 1 in 6p22.1 were amplified in ≥50% of samples. The most frequent deleted regions were 17p13.1 and 3p21.31. Only 8 genes in 17p13.1 and 7 in 3p21.31 were observed to be frequently deleted in more than 39% of the samples (Supplementary Table S8.b).
Fig. 2

Karyogram view of significantly amplified (red) and deleted (blue) regions across 21 chromosomes of 70 IDCs in the independent cohort. Histogram heights on either side of the chromosome correspond to the number of samples that share either amplification or deletion at the particular region; the higher the histogram, the more samples are amplified or deleted at that region. This karyogram profile was generated using a genomic segmentation algorithm with the number of minimum genomic markers ≥40, P value = 0.001, and a signal to noise ratio = 0.3. A false discovery rate of 0.05 was applied to the resulting P values to correct for multiple testing. CNV profile in IDC samples where amplification in 3q, 2p, 6p, 8q, 1q, 10p, and 4q is shared by >50% of samples. Deletion in 17p and 3p is shared by >39% of samples.

Similar to the study cohort, in IDCs of the independent cohort, the significantly overrepresented GO-terms in the list of amplified genes were “antigen binding,” “immunoglobulin production,” “production of molecular mediator of immune response,” and “complement activation,” whereas “ribosome biogenesis in eukaryotes” was a significantly enriched pathway (Supplementary Table S9.a and S9.b). In the list of deleted genes, “neutrophil mediated cytotoxicity,” “neutrophil mediated killing of symbiont cell,” and “neutrophil mediated immunity” were the most enriched GO-terms (Supplementary Table S9.c). Interestingly, 29% of the CNV-associated genes in the study cohort were detected in the independent cohort. Of the 2601 genes, 2535 were amplified and 66 were deleted in both cohorts. Of the 2535 amplified genes, 1599 (63%) were associated with chromosome 1q, followed by 303 in 8q, 282 in 19(p and q), 114 in 2p, and 68 in 5p, while the remaining genes were distributed across 3q, 4q, 6p, 7(p and q), 10p, 12p, 17q, 18p, and 20q (Supplementary Table S10.a). 
Of the 66 deleted genes in both cohorts, 37 were associated with 8p, followed by 22 in 5q and 3 in 19p13.3, while the rest were associated with 3p21.31, 4q32.3, 14q13.2, and 17p13.1 (Supplementary Table S10.b). In the list of amplified genes, the most enriched GO-terms were “immunoglobulin production” and “antigen binding,” and the most enriched pathway was “ribosome biogenesis in eukaryotes” (Supplementary Table S11.a and S11.b). In the list of deleted genes, “neutrophil mediated cytotoxicity,” “cellular extravasation,” and “regulation of chemokine biosynthetic process” were the most enriched GO-terms (Supplementary Table S11.c).

Gene annotations of CNVs in LNmets

The most frequently amplified regions in the LNmets were 4q28.3, 2p, 3q24, 1q21.2, 10p, 12p11.1, 8q, 20p11.22-20p11.21, 21q22.13, 6p22.1 (Supplementary Table S12.a, Figure 1b). At the gene level, 2p contained the highest number of amplified genes (105) (Supplementary Table S12.b) and accounted for more than 50% of cases; followed by 19 in 8q and 15 in 20p11.22-20p11.21; whereas 3 in 1q21.2, 6 in 10p15.3, 5 in 12p11.1, 2 in 21q22.13, 4 in 3q24, 6 in 4q28.3, 3 in 6p22.1 were frequently observed in over 50% of samples. The most frequently deleted regions were 1p36.23, 4q21.1, and 5q in more than 39% of the samples. Similar to IDCs, the number of deleted regions shared among the multiple samples was less than that of the amplified regions. There were 28 genes in 5q, 7 in 4q21.1, and 5 in 1p36.23 deleted in more than 39% of samples (Supplementary Table S12.c). The most enriched GO-terms were “Production of molecular mediator of immune response,” “Immunoglobulin production,” “antigen binding,” and “regulation of complement activation” and “ribosome biogenesis in eukaryotes” in the list of amplified genes (Supplementary Table S13.a and S13.b). The list of deleted genes showed highest enrichment in GO-terms for “flavonoid glucuronidation,” “flavonoid metabolic process,” and “cellular glucuronidation.” Other highly enriched pathways in the list of deleted genes were mainly involved in metabolism such as “Pentose and glucuronate interconversions,” “steroid hormone biosynthesis,” “drug metabolism,” and the “estrogen signaling pathway” (Supplementary Table S13.c and S13.d). Interestingly, while comparing the CNVs in IDC and LNmets, 41% of CNV associated genes in IDCs were also observed in LNmets. The overlapping genes were mainly associated with amplification in chromosomal regions including 1q, 2(p and q), 6p, 7q, 8q, 10p, 12p, 19p, 21q and deletion in regions including 1p, 3p, 4(p and q), 5q, 8p, 14q, 15q, 17q, and 19p.

Genes within CNV regions associated with the progression from IDC to LN metastasis

We next determined the genes associated with CNVs in LNmets to identify changes related to the progression of primary TNBC to metastasis in the study cohort. For this, first, we identified total CNV regions of lymph node positive IDC (LN+ IDC) and lymph node negative IDC (LN- IDC) and the genes associated with that region. Then we compared the amplified and deleted genes among the 3 groups using Venn diagrams. Group 1: LN+ IDC (n = 10), Group 2: (LN- IDC) (n = 13) and Group 3: LNmets (n = 12). With this comparison, we aimed to identify genes in common with CNVs in LN+ IDC and LNmets, that were not present in LN- IDC, that were potentially associated with metastasis (Figure 3a and b).
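
The three-group comparison described above amounts to a set operation: genes altered in both LN+ IDC and LNmets, minus any gene also altered in LN- IDC. The sketch below uses hypothetical placeholder gene names rather than study data, and the function name `lnmet_associated` is introduced only for illustration.

```python
def lnmet_associated(ln_pos_idc, ln_neg_idc, lnmets):
    """Genes altered in both LN+ IDC and LNmets but not in LN- IDC."""
    return (set(ln_pos_idc) & set(lnmets)) - set(ln_neg_idc)


# Hypothetical placeholder gene lists for one alteration type (e.g. amplification).
amplified = {
    "LN+ IDC": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "LN- IDC": {"GENE_B", "GENE_E"},
    "LNmets": {"GENE_A", "GENE_B", "GENE_D", "GENE_F"},
}
candidates = lnmet_associated(
    amplified["LN+ IDC"], amplified["LN- IDC"], amplified["LNmets"]
)
print(sorted(candidates))
```

The same function would be applied separately to the amplified and the deleted gene lists, since the two alteration types are compared in separate Venn diagrams.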
Fig. 3

Venn diagram showing genes associated with LNmet. The gene lists were identified by overlapping the significant CNV regions that are shared by at least 22% of multiple samples. The number of genes for: lymph node positive IDC (LN+ IDC), lymph node negative IDC (LN- IDC), and LNmets are shown in brackets both for amplification and for deletion and n refers to number of samples in each group. (A) 441 amplified genes highlighted in bold were common in both LN+ IDC and LNmets. (B) 245 deleted genes highlighted in bold were common in both LN+ IDC and LNmets.

We identified 441 amplified genes located in chromosome 1q, 5p, 6(p and q), 7q, 8(p and q), 17q and 20q that were common to LN+ IDC and LNmets. Interestingly, 365 of 441 (83%) genes were associated with the q region of chromosome 1, while 30 of 441 (7%) genes were associated with chromosome 6 and 26 of 441 (6%) genes with the 17q region (Supplementary Table S14.a). Two hundred forty-five deleted genes were located on chromosome 5q, 6p, 8p, 12q, 14q, 17q, and 19p and were common to both LN+ IDC and LNmets. Here, 146 of 245 (60%) deleted genes were located in 8p, followed by 50 in 5q (20.4%) and 32 in 14q (13.06%), with the other regions encompassing fewer than 10 genes (Supplementary Table S14.b). GO-enrichment and Pathway analysis revealed that the amplified genes showed the highest enrichment in GO-terms associated with “regulation of complement activation,” “protein activation cascade,” “regulation of acute inflammatory response,” and “regulation of protein processing and maturation” and the highest enrichment in pathways including “complement and coagulation cascades” and “oxytocin-signalling pathway” (Supplementary Table S15.a and S15.b). For CNV loss, “TRAIL binding” was the most enriched GO-term, with “estrogen signalling pathway” and “cytokine-cytokine receptor interaction” significantly associated with these deleted regions (Supplementary Table S15.c and S15.d).

Integration of CNVs with gene expression analyses

CNV data were integrated with previously published gene expression data (GEO Accession: GSE61723) to determine whether the change in mRNA expression was a result of the CNVs [23]. However, very few differentially expressed genes were linked to the CNVs in the study cohort. In the study cohort, 33 of 185 (18%) differentially expressed genes in IDC vs NAT were copy number altered, where 29 upregulated and 4 downregulated genes were amplified and deleted respectively (Figure 4a and b). In the LNmets, 18 of 165 (10.9%) differentially expressed genes in LNmet vs NAT showed copy number alterations, where 5 upregulated and 13 downregulated genes were amplified and deleted respectively (Figure 4c and d).
Fig. 4

Venn diagram showing genes with CNV and change in gene expression. In total, 33 of 185 (18%) differentially expressed genes in IDC vs NAT were also CNV altered in the IDCs of the study cohort, where (A) 29 upregulated genes were amplified and (B) 4 downregulated genes were deleted. In total, 18 of 165 (10.9%) differentially expressed genes in LNmet vs NAT were also CNV altered in the LNmets of the study cohort, where (C) 5 upregulated genes were amplified and (D) 13 downregulated genes were deleted.

Our previous study identified 28 TNBC-specific genes that were differentially expressed in IDCs vs NAT of the study cohort, but not in non-TNBC IDCs [23]. In the current study, 3 of the 28 TNBC-specific genes whose expression was upregulated were amplified in IDCs of the study cohort (ANKRD36BP1, ANP32E, MYBL1), while TBC1D9 and TMEM144 were deleted in the IDCs. Additionally, we investigated whether the 441 amplified and 245 deleted genes in both LN+ IDC and LNmets in our current study also showed differential expression in LN+ IDC vs NAT. For this, we compared these genes with the 104 genes that were differentially expressed in LN+ IDC vs NAT, which were identified in the previous study [23] and are shown in Supplementary Table S16. Only 3 amplified genes (ASPM, KIF14, and LEMD1) were upregulated and 3 deleted genes (SNORD113-2, SNORD113-3, and SNORD113-4) were downregulated in LN+ IDC vs NAT (Table 1).
Table 1

List of 6 LNmet-associated genes which were associated with CNVs in both LN+ IDC and LNmet and showed CNV-based change in expression at mRNA level in LN+ IDC vs NAT

Genes | Amplification/deletion | Average copy number value in LN+ IDC | Average copy number value in LNmets | Fold change (Description) | Fold change (LN+ IDC vs NAT)
KIF14 | Amplification | 0.325523 | 0.280932 | Upregulation | 1.69073
ASPM | Amplification | 0.325706 | 0.320008 | Upregulation | 1.6279
LEMD1 | Amplification | 0.313497 | 0.335521 | Upregulation | 1.54211
SNORD113-2 | Deletion | -0.406205 | -0.633017 | Downregulation | -2.82983
SNORD113-3 | Deletion | -0.406205 | -0.633017 | Downregulation | -2.15659
SNORD113-4 | Deletion | -0.406205 | -0.633017 | Downregulation | -1.65026
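
The concordance between copy number and expression reported in Table 1 can be checked with a short sketch. It assumes that the average copy number values are log2 ratios subject to the calling thresholds given in the Methods and that a negative fold change denotes downregulation; both are assumptions made for illustration, and the function name `is_concordant` is hypothetical.

```python
AMP_THRESHOLD = 0.2    # log2 ratio threshold for amplification (Methods)
DEL_THRESHOLD = -0.3   # log2 ratio threshold for deletion (Methods)

# (gene, average copy number in LN+ IDC, fold change LN+ IDC vs NAT) from Table 1.
TABLE_1 = [
    ("KIF14", 0.325523, 1.69073),
    ("ASPM", 0.325706, 1.6279),
    ("LEMD1", 0.313497, 1.54211),
    ("SNORD113-2", -0.406205, -2.82983),
    ("SNORD113-3", -0.406205, -2.15659),
    ("SNORD113-4", -0.406205, -1.65026),
]


def is_concordant(copy_number, fold_change):
    """True if a gene is amplified and upregulated, or deleted and downregulated."""
    amplified_and_up = copy_number >= AMP_THRESHOLD and fold_change > 0
    deleted_and_down = copy_number <= DEL_THRESHOLD and fold_change < 0
    return amplified_and_up or deleted_and_down


concordant = [gene for gene, cn, fc in TABLE_1 if is_concordant(cn, fc)]
print(concordant)
```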

The genes associated with CNVs in LNmets that also showed a change in mRNA expression are associated with outcome in TNBC

Using the Kaplan-Meier plotter online database, which contains 255 TNBC samples, we assessed the prognostic value of the 6 CNV-based genes (ASPM, KIF14, LEMD1, SNORD113-2, SNORD113-3, and SNORD113-4) (Table 1) that also showed corresponding changes in their gene expression. The outcome data of the study cohort were not used because of the small sample size. Interestingly, the survival analysis revealed that high expression of ASPM and KIF14 was significantly associated with worse RFS, while high expression of LEMD1 showed a nonsignificant trend toward increased RFS (Figure 5). No survival information was available for the 3 deleted genes (SNORD113-2, SNORD113-3, and SNORD113-4). Furthermore, we evaluated the prognostic value of these genes in TNBC patients of the TCGA and METABRIC datasets using the cBioPortal database. However, there was no significant difference in RFS between altered and nonaltered status of the genes.
Fig. 5

Kaplan-Meier plots for relapse-free survival curves of the patients with high expression (red) and low expression (black) for (A) ASPM, (B) KIF14, and (C) LEMD1. The 255 TNBC samples were divided into high or low expression groups based on best cut-off expression value of each gene and compared by Kaplan-Meier survival analysis. A log-rank P value ≤ 0.05 was considered significant. HR, hazard ratio. The red curve represents high expression, and the black curve represents low expression.

Discussion

TNBC is very heterogeneous and is associated with a higher burden of CNVs than other subtypes [29]. Because no studies have analyzed CNVs associated with LNmets in TNBC, we aimed to identify CNVs that may play a role in metastasis of this subtype. We have identified the most frequent CNVs in IDCs and LNmets, and the CNV-based genes that are LNmet-associated, by comparing the groups of LN+ IDCs, LN- IDCs, and LNmets. In both the study and independent cohorts, common CNVs were found in amplified regions of 1q, 8q, 19 (p and q), 2p, and 5p and in deleted regions of 8p, followed by 5q and 19p. Moreover, CNVs common to the IDCs and LNmets of the study cohort were also observed; these involved amplification in chromosomal regions including 1q, 2 (p and q), 6p, 7q, 8q, 10p, 12p, 19p, and 21q and deletion in regions including 1p, 3p, 4 (p and q), 5q, 8p, 14q, 15q, 17q, and 19p. Similar to our results, other studies have also found frequent amplification in 1q, 3q, 8q, 10p, and 12p and deletion in 5q and 17p in TNBC [30], [31], [32], [33], suggesting that these CNV regions may have a significant role in increasing genomic aberrations in TNBC. Furthermore, we identified a total of 441 amplified and 245 deleted CNV-based genes that were observed in both LN+ IDCs and LNmets but not in LN- IDCs and are thus associated with metastasis to the LN. This implies that certain CNVs are shared between the primary tumor and the metastasis and could be involved in driving metastatic progression. We observed that the majority of the LNmet-associated genes were located in the 1q amplified and 8p deleted regions in our study. The gain in 1q has also been associated with metastasis in breast cancer [19,34].
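The selection rule for LNmet-associated genes (present in both LN+ IDCs and LNmets, absent from LN- IDCs) amounts to a set intersection followed by a set difference. The Python sketch below illustrates only that logic. The function name `lnmet_associated` and the placeholder gene identifiers are hypothetical, and any frequency thresholds applied in the actual analysis are not modeled here.

```python
def lnmet_associated(ln_pos_idc, ln_mets, ln_neg_idc):
    """Genes altered in both LN+ IDCs and LNmets but not in LN- IDCs."""
    return (set(ln_pos_idc) & set(ln_mets)) - set(ln_neg_idc)


# Placeholder gene identifiers, for illustration only (not study data).
ln_pos_idc_genes = {"GENE_A", "GENE_B", "GENE_C"}
ln_mets_genes = {"GENE_A", "GENE_B", "GENE_D"}
ln_neg_idc_genes = {"GENE_B", "GENE_E"}

# GENE_A is shared by LN+ IDCs and LNmets and absent from LN- IDCs.
# GENE_B is excluded because it also occurs in LN- IDCs.
shared = lnmet_associated(ln_pos_idc_genes, ln_mets_genes, ln_neg_idc_genes)
```

In practice the same filter would be run separately on the amplified and deleted gene lists.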
Amplified genes identified in the chromosome 1q region included CD55, CR1, CR2, CD46, and C4BPB, also known as complement regulatory proteins, which are known to be overexpressed in cancer cells and to promote LN metastasis in various cancer types such as nasopharyngeal, gastric, and pancreatic cancer [35], [36], [37], [38]. The highest number of LNmet-associated deleted genes was located in the 8p region. The deletion of 8p was also associated with metastasis in hepatocellular carcinoma [39]. Moreover, loss of 8p was significantly linked to the presence of LN metastasis in breast cancer [40]. This suggests that the gain in 1q and deletion of 8p may have a significant role in metastasis. Furthermore, regulation of “complement activation cascade” and “humoral immune response” were highly enriched among the amplified LNmet-associated genes, indicating that genomic imbalance in complement activation and the humoral immune response may help the tumor escape immune attack, leading to its progression and metastasis. CD55 and CD46, which were amplified in our study, have been found to regulate the immune response [41,42], but alteration of these proteins may dysregulate immune mechanisms, which could enhance metastasis [43]. We observed that “TRAIL binding” was highly enriched within the list of deleted genes. Upon binding to TRAIL receptor 2 (TRAIL-R2), TNF-related apoptosis-inducing ligand (TRAIL) and agonistic mAbs have been shown to act as metastasis suppressors in an orthotopic model of TNBC [44]. Deletion of TNFRSF10C (TRAIL-R3), also identified in our study, was associated with distant metastasis and positive nodal disease in colorectal cancer [45]. Overall, our results suggest that dysregulation of the immune response and the apoptotic pathway may play a significant role in regulating metastasis of the primary tumor. Moreover, we found that 6 of the 686 LNmet-associated genes present in both LN+ IDCs and LNmets also showed CNV-based changes in their expression.
Of the 6 LNmet-associated genes, SNORD113-2, SNORD113-3, and SNORD113-4 were both deleted and downregulated in LN+ IDC vs NAT. These 3 genes belong to a group of noncoding RNA (ncRNA) molecules that play a role in ribosomal RNA biogenesis [46,47]. However, their role has not been studied in TNBC. One study has shown that downregulation of SNORD113-1, which belongs to the same family of snoRNAs, is associated with worse RFS in hepatocellular carcinoma [48]. The remaining 3 LNmet-associated genes (ASPM, KIF14, and LEMD1) were both amplified and upregulated in LN+ IDC vs NAT, and high expression of ASPM and KIF14 was associated with worse RFS in TNBC in the Kaplan-Meier plotter analysis [28]. ASPM is known for its role in spindle microtubule organization in cell division [49]. High expression of ASPM has also been observed in several other cancers such as bladder cancer [50], ovarian cancer [51], and prostate cancer [52], where the increase in expression was mostly associated with tumor grade, early recurrence, tumor metastasis, and worse survival. KIF14 is responsible for mitotic spindle formation and cytokinesis and has been associated with decreased disease-free survival in breast and lung cancer [53,54]. High expression of KIF14 has also been found in other cancers such as prostate cancer [55] and ovarian cancer [56], where it is associated with tumor growth and worse survival. LEMD1 overexpression was associated with nodal metastasis and worse prognosis in oral squamous cell carcinoma and in gastric cancer [57]. Overall, these genes with CNV-associated changes in expression have a significant role in multiple cancers and may also play a role in increasing disease aggressiveness in TNBC. For many of the 686 genes there was no association between CNV and gene expression; this may be due to various factors, including the degree of overlap of CNVs with genes, distance to transcription start sites, and gene type.
One study showed that genes in discontinuously amplified regions are downregulated, suggesting that partial gene amplification may act as a silencer to downregulate gene expression [58]. Overall, a series of CNVs have been identified that appear to be associated with LNmets and that contain genes with a central function in maintaining a number of key pathways, which, if perturbed, result in an increased departure from the normal mechanisms associated with mammary homeostasis. However, this study is limited by the small sample size of LNmets and the lack of a validation cohort. Because of the small sample size of the study cohort, the prognostic significance of key genes was assessed using publicly available databases. Moreover, Infinium arrays have a high resolution to detect alterations in coding loci; however, their design is more gene-centric and thus may omit alterations present outside genes, which could limit the results of CNV analysis using Infinium arrays. Additionally, in future studies, the key LNmet-associated genes that showed a change in mRNA expression need to be validated using alternative techniques such as ddPCR or quantitative PCR. After validation in a larger cohort, sensitivity and specificity testing of the genomic markers should be performed using a receiver operating characteristic curve to investigate their predictive value in TNBC and add to their clinical value.
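
As an illustration of the receiver operating characteristic analysis proposed for future validation, the following Python sketch computes sensitivity and 1 - specificity at each score threshold and the area under the curve by the trapezoidal rule. The function names `roc_points` and `auc` and the toy marker scores are hypothetical and do not come from the study. The sketch assumes that a higher score indicates a node-positive case and that both classes are present.

```python
def roc_points(scores, labels):
    """ROC points as (1 - specificity, sensitivity), strictest threshold first.

    labels: 1 = node-positive, 0 = node-negative.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy marker scores (hypothetical) for 2 node-positive and 2 node-negative cases.
toy_scores = [0.9, 0.8, 0.2, 0.1]
toy_labels = [1, 1, 0, 0]
toy_auc = auc(roc_points(toy_scores, toy_labels))
```

An area near 0.5 would indicate a marker no better than chance, while values approaching 1 would indicate good separation of node-positive from node-negative cases.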

Conclusions

This study has identified several regions of CNV in TNBC that could play a major role in metastasis to the LN. Further validation of these CNVs in a larger cohort and functional studies are necessary to understand their role in the progression of TNBC.