Mengying Tong1, Ziqian Deng1,2, Mengying Yang1, Chang Xu1, Xiaolong Zhang1, Qingzheng Zhang1, Yuwei Liao1, Xiaodi Deng1, Dekang Lv1, Xuehong Zhang1, Yu Zhang1, Peiying Li1, Luyao Song1, Bicheng Wang2,3, Aisha Al-Dherasi1, Zhiguang Li4, Quentin Liu5,6.
Abstract
BACKGROUND: Breast cancer stem cells (BCSCs) are considered responsible for cancer relapse and drug resistance. Understanding the identity of BCSCs may open new avenues in breast cancer therapy. Although several discoveries have been made on BCSC characterization, the factors critical to the origination of BCSCs are largely unclear. This study aimed to determine whether genomic mutations contribute to the acquisition of cancer stem-like phenotype and to investigate the genetic and transcriptional features of BCSCs.
Keywords: Breast cancer; Cancer stem cell; Genomics; Sequencing; Transcriptomics
Year: 2018 PMID: 30231942 PMCID: PMC6146522 DOI: 10.1186/s40880-018-0326-8
Source DB: PubMed Journal: Cancer Commun (Lond) ISSN: 2523-3548
Fig. 1 Identification and investigation of potential breast cancer stem cell (BCSC)-associated mutation hotspots. a Ascending trend of the percentage of the aldehyde dehydrogenase (ALDH)-positive cell population across the samples from the breast cancer cell line MDA-MB-231. b The invasion ability of enriched spheres was analyzed by transwell invasion assay. ***P < 0.001, two-tailed Student’s t test. Error bars represent mean ± standard deviation (SD). c Expression levels of the cancer stem cell markers nanog homeobox (NANOG) and SRY (sex determining region Y)-box 2 (SOX2) were assessed by real-time quantitative PCR in both enriched spheres (SP) and monolayer parental cells (2D). ***P < 0.001, two-tailed Student’s t test. Error bars represent mean ± SD. d 2D histogram plots, generated with the R package “plotly”, compare the variant allele frequency (VAF) between every two samples. The VAF of most single nucleotide variant (SNV) sites across the whole genome is similar between samples. e One hotspot region on chromosome 7, highlighted with a yellow bar, is displayed as an example. First, potential SNV sites were ordered from the first to the last variant on chromosome 7 and colored according to P values. The distance between each mutation and the one prior to it (the inter-SNV distance) is plotted on the vertical axis (rainfall plot). P values were determined by an exponential distribution formula. Additionally, the number of potential SNV sites in each bin was visualized in the University of California Santa Cruz Genome Browser (GB), with the whole chromosome divided into 10,000 equal bins. Next, the hotspots of parental cells (2D) and of derived spheres of the fourth generation (SP4) were displayed in GB using a sliding window approach, in which the window was shifted one base at a time along the chromosome from start to end and the SNV density and VAF level were calculated in each 1000 bp window.
f Target deep DNA sequencing: the comparison of VAF between every two samples revealed no difference from 2D to SP4 (left and middle). R² was determined by regression analysis. Cor denotes the Pearson correlation coefficient. The dotted line represents the diagonal. Sanger sequencing validated part of the target deep DNA sequencing results (right)
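The rainfall-plot quantities described in panel e can be sketched as follows. The caption only states that P values came from "an exponential distribution formula"; the exact formula is not given here, so this sketch assumes a homogeneous background model in which inter-SNV distances are exponentially distributed with rate equal to the number of SNVs divided by the chromosome length, and takes the P value as the probability of a distance at least as short as the one observed. The function names and the `chrom_length` parameter are hypothetical.

```python
import math


def inter_snv_distances(positions):
    """Pair each SNV (except the first) with its distance to the previous SNV."""
    ordered = sorted(positions)
    return [(cur, cur - prev) for prev, cur in zip(ordered, ordered[1:])]


def exponential_p_value(distance, rate):
    """P(D <= distance) for D ~ Exponential(rate); small values suggest clustering.

    Assumption: this lower-tail form is one plausible reading of the caption,
    not a formula confirmed by the source.
    """
    return -math.expm1(-rate * distance)  # equals 1 - exp(-rate * distance)


def rainfall_points(positions, chrom_length):
    """Return (position, inter-SNV distance, P value) tuples for one chromosome."""
    rate = len(positions) / chrom_length  # assumed background SNV rate per base
    return [
        (pos, dist, exponential_p_value(dist, rate))
        for pos, dist in inter_snv_distances(positions)
    ]
```

In a plot, the second element would go on the (usually log-scaled) vertical axis and the third would drive the point color; runs of short distances with small P values mark candidate hotspots.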
Fig. 2 Single-cell target deep DNA sequencing of BCSCs and non-BCSCs. a Schematic depiction of single-cell target deep DNA sequencing analysis. Pearson correlations between every two samples were determined by the base weight, i.e., the fraction of a base among all four possible bases, at each position in hotspot regions. A binomial test was used to assess the probability of background count (PBC) in the 3 BCSCs from a binomial distribution with the position error rate (PER) determined by the 2 non-BCSCs. A PBC lower than the threshold (0.01 here) denotes that the alternative reads cannot all be generated by sequencing errors, i.e., a true SNV is called. b Extremely high Pearson correlations of the genomic program between every two samples (left and middle). The box plot shows no significant difference between the correlations of inter-group samples and those of intra-group samples (right). The P value was determined by a two-tailed t test. c The distribution of genetic distances of each site between every two samples falls in a narrow range (left), showing no difference between the inter-group and intra-group comparisons at all hotspot sites (ordered by the genetic distance, right). d Constant trend of the case-permutation ratio (CPR) of each group following adjustment of the P value threshold. CPR was defined as the ratio of the number of sites with P values less than a threshold in the case group to that in the permutation group
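The binomial test in panel a can be illustrated with a minimal sketch. It assumes that the PER at a position is the pooled fraction of alternative reads in the non-BCSC cells and that the PBC is the upper-tail probability of seeing at least the observed number of alternative reads in a BCSC at its sequencing depth; the caption does not spell out either detail, and all function names are hypothetical.

```python
import math


def position_error_rate(alt_counts, depths):
    """Pooled alternative-read fraction across control (non-BCSC) cells."""
    total_depth = sum(depths)
    if total_depth == 0:
        raise ValueError("no coverage in control cells at this position")
    return sum(alt_counts) / total_depth


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to cope with deep coverage."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def call_snv(alt_reads, depth, per, threshold=0.01):
    """Return (PBC, called); a site is called when PBC falls below the threshold."""
    pbc = binomial_upper_tail(alt_reads, depth, per)
    return pbc, pbc < threshold
```

Note that under these assumptions a PER of exactly zero makes any alternative read significant, so a real pipeline would likely need a floor on the error rate.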
Fig. 3 Single-cell RNA sequencing (scRNA-seq) and gene differential expression analysis. a Schematic depiction of the origin of the sequenced samples. b Gene set enrichment analysis (GSEA) of gene sets enriched in BCSCs compared with those in non-BCSCs. FDR, false discovery rate; NES, normalized enrichment score. c The dot plot (left) shows differentially expressed genes between BCSCs and non-BCSCs. The red dots represent 74 BCSC highly expressed genes with an FDR < 0.05 and a fold change > 2. The heatmap (right) illustrates the hierarchical clustering of BCSCs and non-BCSCs based on the 74 genes, with previously reported BCSC-associated genes highlighted in red. d Validation of BCSC highly expressed genes using bulk-cell RNA-seq. The heatmap (left) shows the relative expression of BCSC highly expressed genes, and the scatter plots (right) illustrate the high correlation between the scRNA-seq and bulk-cell RNA-seq results
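The selection rule in panel c (FDR < 0.05 and fold change > 2) can be sketched as below. The caption does not say how the FDR was computed; this sketch assumes the Benjamini-Hochberg adjustment, which is a common choice but not confirmed by the source. The gene labels and numbers in the usage lines are hypothetical toy values, not results from the study.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so each adjusted value stays monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def highly_expressed(genes, fold_changes, p_values, fdr_cutoff=0.05, fc_cutoff=2.0):
    """Keep genes whose adjusted P value and fold change both pass the cutoffs."""
    fdr = benjamini_hochberg(p_values)
    return [
        gene
        for gene, fc, q in zip(genes, fold_changes, fdr)
        if q < fdr_cutoff and fc > fc_cutoff
    ]


# Hypothetical toy input: labels, BCSC/non-BCSC fold changes, raw P values.
selected = highly_expressed(
    ["gene_a", "gene_b", "gene_c", "gene_d"],
    [3.0, 1.5, 2.5, 4.0],
    [0.001, 0.01, 0.03, 0.5],
)
```

Applying both cutoffs together is what separates the red dots from genes that are statistically significant but only weakly changed, or strongly changed but not significant.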
Fig. 4 Biological and clinical significance of the BCSC highly expressed genes. a Gene Ontology (GO) analysis of the BCSC highly expressed genes in the biological process category. P values (one-tailed Fisher exact P values used for gene enrichment analysis) were calculated in the DAVID database (https://david.ncifcrf.gov/tools.jsp). b Interaction network of BCSC highly expressed genes integrated from the STRING database. Network nodes represent genes, and edges represent gene–gene associations. A detailed legend is available at https://string-db.org. c Investigation of the clinical relevance of BCSC highly expressed genes in 22 cancer types. The expression of each gene in cancer and corresponding normal tissues was compared by a two-tailed t test. The heatmap was horizontally sorted by the number of genes with P < 0.01 in a particular cancer type, shown as red columns on the top. d Kaplan–Meier relapse-free survival curve (left) of patients with low (green) and high (red) risk grouped by BCSC highly expressed genes in SurvExpress (dataset: Breast cancer relapse data). The total number of samples in each group is shown in the top right corner, and censored samples are marked with a “+” symbol. The concordance index (CI) per curve is also included. The P value was determined by a log-rank test. The x axis represents the years of the study. The numbers of samples not presenting the event at the matching time are shown in rows in the corresponding colors. The box plot (right) compares gene expression between the low- and high-risk groups. Genes significantly (P < 0.05) highly expressed in the high-risk group are highlighted in red. P values were calculated using a two-tailed t test
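The one-tailed Fisher exact P value mentioned for the GO analysis in panel a can be sketched as a hypergeometric upper-tail probability. This is the textbook form of the test, written under the assumption that enrichment is measured against a fixed background gene set; DAVID's own scoring may differ in detail, and the function and parameter names are hypothetical.

```python
import math


def fisher_one_tailed(overlap, list_size, term_size, background_size):
    """P(X >= overlap) when list_size genes are drawn without replacement from a
    background of background_size genes, term_size of which carry the GO term."""
    if not (0 <= term_size <= background_size and 0 <= list_size <= background_size):
        raise ValueError("term_size and list_size must lie within the background")
    upper = min(list_size, term_size)
    total = math.comb(background_size, list_size)
    favorable = sum(
        math.comb(term_size, i)
        * math.comb(background_size - term_size, list_size - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return favorable / total
```

For a gene list such as the 74 BCSC highly expressed genes, `list_size` would be 74, while `term_size` and `background_size` depend on the annotation database and chosen background, which are not specified here.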