
Current Progresses of Single Cell DNA Sequencing in Breast Cancer Research.

Jianlin Liu¹, Ragini Adhav¹, Xiaoling Xu¹.

Abstract

Breast cancers display striking genetic and phenotypic diversity. To date, several hypotheses have been raised to explain this heterogeneity, including the cancer stem cell (CSC) and clonal evolution theories. According to the CSC theory, the most tumorigenic cells maintain themselves through symmetric division and divide asymmetrically to generate non-CSCs with less tumorigenic and metastatic potential, although non-CSCs can also dedifferentiate back to CSCs. Clonal evolution theory holds that a tumor initially arises from a single cell, which then undergoes clonal expansion into a population of cancer cells. During tumorigenesis and tumor evolution, cancer cells undergo different degrees of genetic instability and consequently acquire varied genetic aberrations. Yet the heterogeneity in breast cancers is very complex, poorly understood, and requires further investigation. In recent years, single cell sequencing (SCS) technology has developed rapidly, providing a powerful new way to understand this heterogeneity, which may lay the foundation for new strategies in breast cancer therapy. In this review, we summarize the development of SCS technologies and recent advances of SCS in breast cancer.


Keywords: Breast cancer; Cancer stem cells; Intertumor heterogeneity; Intratumor heterogeneity; Single cell sequencing.


Year: 2017 PMID： 28924377 PMCID： PMC5599901 DOI： 10.7150/ijbs.19627

Source DB: PubMed Journal: Int J Biol Sci ISSN： 1449-2288 Impact factor: 6.580

Introduction

Breast cancer is the most common malignancy in women worldwide. Drug resistance, cancer recurrence and metastasis are the main causes of mortality in breast cancer patients 1, 2. According to the cancer stem cell (CSC) hypothesis, CSCs are responsible for phenotypic and functional heterogeneity within the tumor in some cancers 3-6, and for disease recurrence and metastasis 7-11, which make cancer hard to treat. Within the tumor microenvironment, CSCs originate differently in different areas of a tumor, leading to cancer heterogeneity 12. On the other hand, clonal evolution theory suggests that heritable genetic and epigenetic changes contribute to tumor heterogeneity. Like hematologic malignancies and other cancers, breast cancers are highly heterogeneous in pathology and molecular profile 13. It is essential to reveal the initiating molecular events of breast cancer and to understand the nature of its heterogeneity, then to identify putative predictive biomarkers for aberrant oncogenes, and thereby to develop molecularly targeted therapies for breast cancer patients. Emerging technologies, such as massively parallel genomic sequencing and multiplexed somatic mutation genotyping, are commonly used to classify cancers into molecular subsets 14. In the past 5-10 years, single cell sequencing has progressed rapidly, greatly advancing our understanding of heterogeneity in cancers.

Heterogeneity of breast cancer

Breast cancer is a heterogeneous malignancy, with high diversity both among different tumors (intertumor heterogeneity) and within a single tumor (intratumor heterogeneity).

Intertumor heterogeneity

Human breast cancers are categorized into 18 subtypes by the histological features of primary tumors, such as cellular arrangement, cellular features, lesion size, and the presence of necrosis 15. However, classification based on histological criteria is confounded by a number of factors, including regions of different morphology within one tumor and subjectivity in scoring 13. More recently, microarray analysis has provided a way to categorize human breast cancers into at least 6 subtypes by gene expression profiling: luminal A, luminal B, ERBB2, claudin-low, normal-like, and basal-like 16-19. This pattern of gene expression in luminal and basal cells is useful for defining the molecular portraits of breast tumors and for guiding patient care and treatment 16, 20. Based on a sophisticated in vivo lineage tracing approach, a pool of Axin2+ cells is considered to contribute to the origin of basal and luminal cell populations 21. Luminal progenitor cells marked by an Elf5-driven Cre transgene give rise only to the luminal population of cells 22. ERα-/PR- luminal progenitors generate only ERα-/PR- differentiated luminal alveolar cells 23. Subtype-specific tumor cells of origin and transforming events have been proposed to be responsible for intertumor heterogeneity 24. It has been proposed that HER2+ tumors correlate with enrichment in the fetal mammary stem cell signature and that luminal progenitors serve as precursors for basal-like cases 25-28. Other studies provided evidence that luminal progenitors are the cells of origin in breast cancer associated gene-1 (BRCA1) mutation carriers 25, 28. Basal mammary stem cells can develop into both basal and luminal cells at various stages of development 22, 29. In these cases, genetic lineage tracing helps us to understand the mammary epithelial hierarchies and the origin of cancers.

Intratumor heterogeneity

Besides the large differences among tumors, the cancer cells within one tumor are also highly diverse in their genetic alterations 30. A tumor may be a complex mixture of different cell subtypes. Understanding the origin and evolution of the tumor therefore helps to explain both intratumor and intertumor heterogeneity and facilitates cancer therapy, especially precision medicine. Two main theories have been proposed to explain intratumor heterogeneity, the cancer stem cell theory 31 and the clonal evolution model 32, which might be complementary during cancer development.

Cancer stem cells (CSCs)

CSCs, a very small pool of tumorigenic cells with self-renewal, proliferative and differentiation abilities, have been used to explain intratumor heterogeneity. Breast CSCs were first isolated and characterized from primary and metastatic human breast tumors in 2003 33. The identification of breast CSCs depends on recognizing the expression of cell surface markers, such as CD44+CD24-/low, CD55+, CD61+, and ALDH1 33-36. CSCs can arise in several ways, including oncogenic transformation of a normal stem cell and non-CSC-to-CSC plasticity 31. According to the CSC theory, CSCs sit at the top of the cellular hierarchy, drive the growth and progression of the tumor mass, and seed metastases 31. The CSC hierarchy allows bidirectional conversion between CSC and non-CSC states (Figure 1). CSCs undergo symmetric division to produce more CSCs. On the other hand, the most tumorigenic cells undergo asymmetric division to generate non-CSCs with less tumorigenic and metastatic potential 31. In addition, non-CSCs can undergo dedifferentiation and revert to the CSC phenotype under certain conditions 10, 31, 37-40. CSCs may comprise a heterogeneous and functionally varied population of cells that undergo different genetic and epigenetic changes, and non-CSCs can convert to CSCs with varying efficiency depending on stimuli from the microenvironment, probably giving rise to a vast degree of diversity within tumors 31. A growing number of studies have demonstrated that the CSC model is applicable to many solid tumors, including breast tumor 33, glioblastoma 7, 41, colorectal tumor 42, 43, pancreatic tumor 44 and ovarian tumor 45-47. However, CSCs might vary dynamically within tumors under distinct microenvironmental cues 31, 48, and the markers for CSCs might also vary among subtypes or stages of disease 33, 46, 49-54; moreover, some cases might not follow the CSC model 55-58.
The mammary gland is a special organ whose development proceeds through different stages during embryonic and pubertal development and reproductive life. Most recently, evidence has shown that cancer risk is attributable to random mutations arising during normal stem cell divisions 59, 60, which suggests that the mammary gland might be an organ with relatively high cancer risk, whereas another study suggests that mutation accumulation in adult stem cells is tissue specific 61. Although these studies differ in the pattern of mutation occurrence, they support the CSC theory to some extent.

Figure 1

CSC model. Normal stem cells can undergo oncogenic transformation to give rise to cancer stem cells (CSCs). CSCs can generate CSCs and non-CSCs through symmetric and asymmetric division, respectively, driving tumor growth and seeding metastases. On the other hand, non-CSCs can dedifferentiate back to CSCs under stimuli from the microenvironment. Conversion between CSCs and non-CSCs is bidirectional and dynamic, leading to great diversity within tumors.

Clonal evolution

A tumor initially arises from a single cell, which then undergoes clonal expansion into a population of cancer cells. During tumorigenesis and tumor evolution, cancer cells undergo different degrees of genetic instability and consequently acquire varied genetic aberrations. Clonal expansions driven by the acquisition of different mutations are the main feature of clonal evolution 32, 62 (Figure 2), but not all expansions are induced by genetic events. Driver mutations (i.e. mutations that allow cells to gain a growth advantage) are the key mutational events that drive clonal expansions in a given microenvironment 63. Under a given set of selective pressures, clones acquire driver mutations together with passenger alterations, which may become driver aberrations if the selective pressures change 63. During tumor progression the mutation rate also changes, and clones subsequently acquire new mutations, which leads to genetic heterogeneity within the tumor 63, 64. Clonal evolution has been demonstrated in breast cancer. Several somatic coding mutations that differ between primary and metastatic breast tumors have been discovered in patients by next generation sequencing, suggesting that evolution occurs during breast cancer progression 65. In addition, other events, such as epigenetic modification during tumorigenesis, could also contribute substantially to the heterogeneity of cancers 66, 67.

Figure 2

Clonal evolution model. Tumors can evolve through clonal expansion of cancer cells, giving rise to heterogeneity within tumors that is created by genetic changes. Under selection pressure, earlier driver mutations (red) and new driver mutations (yellow) confer an advantage for clonal outgrowth and drive tumor growth.

Single cell DNA sequencing in breast cancer

Because CSCs are very few compared with the total number of cells in a tumor, their genetic alterations are probably masked when tumors are sequenced as a whole. This problem can be overcome by single cell sequencing (SCS), a powerful technology for studying evolution and heterogeneity in tumors and for understanding the role of rare cells in cancer progression 68. Compared with sequencing of mixed cells, SCS can resolve key questions in cancer biology that are difficult to address with bulk tumor sequencing. This method will therefore greatly help our understanding of initiation, progression, invasion, metastasis, resistance and recurrence in cancer, and subsequently guide more efficient early detection and targeted therapy in the clinic.

Technical challenges in single cell DNA sequencing

Single cell DNA research involves the following main procedures in sequential order: 1) Single cell isolation; 2) Amplification (including whole genome amplification and library construction); 3) Sequencing; and 4) Bioinformatics analysis. To date, single cell sequencing remains technologically challenging because of the bias and errors introduced throughout the single cell DNA sequencing workflow.

Single cell isolation

The first step of single cell sequencing is to capture the single cell of interest. Several approaches have been developed for capturing single cells from abundant pools, including serial dilution 69, micropipetting 70, microwell dilution 71, optical tweezers 72, microfluidic platforms 73-75, and fluorescence-activated cell sorting (FACS) 76 (Table 1). Serial dilution repeatedly dilutes a pool of cells by a constant dilution factor to obtain one single cell per microliter 69. Micropipetting uses a special glass micropipette under the microscope to pick the cell of interest into an individual PCR tube for downstream amplification 70, which is laborious and low-throughput. Microwell dilution distributes single cells into wells using a microwell array, followed by simultaneous amplification of the genetic material 71. Most recently, QIAscout (QIAGEN) has been developed for fast and effective single cell isolation using microrafts (https://www.qiagen.com/mo/). Optical tweezers employ a highly focused laser beam in combination with imaging-based cell selection to capture individual cells 72, 77, 78. Microfluidic platforms usually use the reconfigurable flow-routing capabilities of integrated microvalve technology, which can deposit single cells into nanoliter-volume storage chambers of microfluidic chips, followed by amplification in droplets containing reagents and the cell 73, 74. The C1 Single-Cell Auto Prep System is a popular microfluidic device that uses pneumatic components to control the microfluidic integrated fluidic circuit (IFC) and thermal components for preparatory chemistry, with 96 capture sites per IFC 75. Drop-seq 79 and the Chromium™ Controller from 10x Genomics (https://www.10xgenomics.com/single-cell/) can also be used for single cell capture, but their present designs are applicable only to downstream RNA-seq, not DNA-seq.
FACS can isolate thousands of single cells in microdroplets by electric charge at very high pressure. Antibodies against cell markers, or dyes, can be used to label live cells or cell nuclei in order to isolate subgroups of cells or nuclei from frozen or formalin-fixed paraffin-embedded (FFPE) samples 76, 80. However, isolating rare single cells is much more challenging. In this regard, laser capture microdissection (LCM) is used to isolate cells within a defined tissue context, without contamination from surrounding cells, from fresh or archival specimens using an infrared (IR) capture system 81, 82 or an ultraviolet (UV) cutting system 83-86 (Table 2). Circulating tumor cells (CTCs) are rare cells present in the peripheral blood of cancer patients at very low frequency 87. Several platforms have been developed for capturing CTCs (Table 2). The CTC-chip is a microfluidic platform that efficiently separates viable CTCs via microposts coated with anti-EpCAM antibody 88. The FDA (Food and Drug Administration) has approved the CellSearch system, which uses ferrofluid particles conjugated with anti-EpCAM and anti-CD45 antibodies 89, 90, for the capture of CTCs.

Table 1

Methods for capturing single cell from abundant cell population

Methods	Descriptions	Advantages	Disadvantages
Serial dilution	Serial dilution to single cell per microliter	Low cost	Time-consuming; high possibility of capturing multiple cells
Micropipetting	Capture single cell using special micropipette	Low cost; higher possibility for capturing rare population cells	Time-consuming; low throughput
Microwell dilution	Isolate single cell using microwell array	High throughput; low contamination (nanolitre volumes reaction); less reagent cost	Expensive consumables and equipment; time-consuming
Optical tweezers	Trap single cell using focused laser beam	Low contamination (nanolitre volumes reaction); fluorescent cells can be captured	Highly dependent both on the size and shape of the cell; low throughput
Microfluidic devices	Capture single cell into flow chambers using microfluidic chips	Low contamination (nanolitre volumes reaction); less reagent cost	Expensive consumables and equipment; hard to avoid cell doublets or empty well
FACS sorting	Sort single cell by electric charge at high pressure	High throughput; fluorescent cell surface markers can be used for capturing specific population of cells; dye can be used for sorting nuclei from broken cell of frozen or FFPE samples	Expensive equipment
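
The risk of capturing multiple cells by serial dilution, listed in Table 1, can be illustrated with a simple loading model. The sketch below assumes that cells distribute randomly and independently across aliquots, so that the number of cells per aliquot follows a Poisson distribution. That assumption and the function names are ours for illustration; they are not taken from the cited protocols.

```python
import math

def occupancy_probabilities(mean_cells_per_aliquot):
    """Poisson probabilities that an aliquot holds 0, exactly 1, or 2+ cells."""
    lam = mean_cells_per_aliquot
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multiple = 1.0 - p_empty - p_single
    return {"empty": p_empty, "single": p_single, "multiple": p_multiple}

def multiplet_fraction(mean_cells_per_aliquot):
    """Fraction of occupied (non-empty) aliquots that contain more than one cell."""
    p = occupancy_probabilities(mean_cells_per_aliquot)
    return p["multiple"] / (1.0 - p["empty"])

# Compare a few illustrative dilution targets (mean cells per aliquot).
for lam in (1.0, 0.3, 0.1):
    p = occupancy_probabilities(lam)
    print(f"mean={lam}: P(single)={p['single']:.3f}, "
          f"multiplets among occupied={multiplet_fraction(lam):.3f}")
```

Under this model, diluting to an average of one cell per aliquot leaves roughly 37% of aliquots with exactly one cell, and about 42% of the occupied aliquots hold more than one. Diluting further lowers the multiplet fraction at the cost of more empty aliquots.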

Table 2

Methods for capturing single cell from rare cell population

Methods	Descriptions	Advantages	Disadvantages
Laser capture microdissection	Isolate single cell from tissue section using a laser	Context of cell can be identified	Expensive equipment; DNA materials may be damaged by UV
CTC-chip	Separate CTC via antibodies-coated micropost	High-throughput	Expensive consumables
CellSearch	Capture CTC using ferrofluid particles conjugated with EpCAM antibody	High-throughput	CTC with mesenchymal-like phenotype can be difficult to isolate

Amplification of genome

Since a single cell contains only a single copy of its DNA, whole genome amplification (WGA) is required for next generation sequencing. However, artefacts can be introduced during WGA, such as amplification bias (decreased coverage uniformity, allelic imbalance), genome loss, mutations and chimaeras 91. Several conventional approaches are used for WGA, including degenerate oligonucleotide primed PCR (DOP-PCR) 92, multiple displacement amplification (MDA) 93, 94, PicoPLEX 95, and multiple annealing and looping-based amplification cycles (MALBAC) 70 (Table 3). DOP-PCR is a polymerase chain reaction (PCR) variant that employs oligonucleotides of partially degenerate sequence at a low annealing temperature, which enables priming at multiple evenly dispersed sites along a genome and results in rapid and efficient DNA amplification 92. DOP-PCR achieves only 10% physical coverage of a single-cell genome but accurately retains copy number levels 68, 76, 96, which makes it suitable for detecting copy number variants (CNVs) but not single nucleotide variants (SNVs). MDA amplifies randomly primed regions of the genome using random hexamers and Phi29 DNA polymerase under isothermal conditions, and has a low error rate owing to the high proof-reading capability of Phi29 DNA polymerase 93, 94, 97. MDA achieves high physical coverage (>90%) of a single-cell genome, but the coverage is not uniform, making it a good method for detecting SNVs but a poor method for detecting CNVs 70, 98-100. PicoPLEX and MALBAC follow very similar protocols, which use random degenerate primers to add a common sequence in the first step, followed by priming for subsequent PCR amplification 95. MALBAC modifies this protocol by using new common sequences and temperature cycling, which results in more uniform amplification 70.
However, MALBAC has a high false positive rate for SNVs, which suggests that it is more suitable for CNV detection 70. These three methods have been compared. One report finds that the coverage generated by MDA is better than that of MALBAC, leading to higher detection rates of SNVs 101. Another report reveals that the coverage breadth of MDA is better than that of MALBAC and DOP-PCR 102. However, the uniformity of MALBAC and DOP-PCR is greater, which results in better detection of CNVs 101, 102. Several groups have improved the WGA method by performing MDA on single cell DNA in microfluidic emulsions. The microwell displacement amplification system (MIDAS) performs MDA on single cells in nanoliter wells, which both reduces contamination and improves amplification uniformity 71. Droplet MDA minimizes bias and amplification of contaminants by using microfluidic-generated picoliter droplets for WGA reactions 103. Emulsion WGA (eWGA) also divides DNA into a large number of picoliter aqueous droplets in oil for DNA amplification, which markedly improves the uniformity and accuracy of amplification 104. WGA usually introduces considerable bias and errors, and the library construction that follows WGA amplifies this bias. However, a robust, scalable, and high-fidelity method called direct library preparation (DLP) has been developed for library preparation using nanoliter-volume transposition reactions without preamplification, which results in greater coverage uniformity and more reliable detection of copy-number alterations compared with existing methods 105. Most recently, a novel method, linear amplification via transposon insertion (LIANTI), has been developed for single-cell WGA 106.
Unlike the exponential amplification of genomic DNA in conventional WGA methods, LIANTI combines Tn5 transposition and a T7 promoter to linearly amplify the genomic DNA into thousands of RNA copies, followed by reverse transcription and second-strand synthesis into double-stranded amplicons for DNA library construction. It outperforms existing approaches by greatly reducing amplification bias and errors, enabling micro-CNV detection at kilobase resolution by digital-counting analysis of inferred fragment number 106.
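
The contrast between exponential and linear amplification can be made concrete with a toy calculation. The sketch below is not a model of the LIANTI chemistry or of any cited protocol. It assumes, purely for illustration, that each genomic fragment has its own fixed amplification efficiency between 0.8 and 1.0 and that amplification runs for 20 rounds; the function names and parameter values are ours.

```python
import random
import statistics

def simulate_yields(n_fragments=1000, rounds=20, seed=0):
    """Toy yields per fragment under exponential and linear amplification.

    Exponential: copies are multiplied by (1 + efficiency) every round, so
    small efficiency differences between fragments compound.
    Linear: every round adds `efficiency` new copies made from the original
    template only, so differences do not compound.
    """
    rng = random.Random(seed)
    efficiencies = [rng.uniform(0.8, 1.0) for _ in range(n_fragments)]
    exponential = [(1.0 + e) ** rounds for e in efficiencies]
    linear = [1.0 + e * rounds for e in efficiencies]
    return exponential, linear

def coefficient_of_variation(values):
    """Standard deviation divided by the mean; lower means more uniform."""
    return statistics.pstdev(values) / statistics.fmean(values)

exponential, linear = simulate_yields()
print("CV exponential:", round(coefficient_of_variation(exponential), 3))
print("CV linear:     ", round(coefficient_of_variation(linear), 3))
```

In this toy setting the spread in yield across fragments is much larger for the exponential scheme, which is the intuition behind the reduced amplification bias reported for linear amplification.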

Table 3

Methods for whole genome amplification

Methods	Description	Advantages	Disadvantages
Conventional WGA methods
Degenerate oligonucleotide primed PCR (DOP-PCR)	PCR-based amplification using degenerate oligonucleotides and thermostable polymerase	High uniformity (better for calling CNVs)	High error rate; low coverage
Multiple displacement amplification (MDA)	Isothermal amplification of randomly primed regions of genome using random hexamers and Phi29 polymerase	Low error rate; great coverage (better for calling SNVs)	Lack of uniformity
Multiple annealing and looping-based amplification cycles (MALBAC) or PicoPLEX	Limited isothermal amplification using degenerate primers followed by PCR amplification	High uniformity (better for calling CNVs)	Intermediate error rate
Modified MDA methods
Microwell displacement amplification system (MIDAS)	Perform MDA in nanoliter microwells	Low contamination; better uniformity than conventional MDA	Limited efficiency of amplification
Droplet MDA	Perform MDA in microfluidic-generated picoliter droplets	Low contamination; better uniformity than conventional MDA	-
Emulsion WGA (eWGA)	Perform MDA in picoliter aqueous droplets in oil	Higher coverage; higher accuracy and finer resolution in simultaneous detection of SNVs and CNVs	-
Direct library preparation (DLP)	Directly construct single-cell whole-genome library using nanoliter-volume transposition reactions without preamplification	High uniformity; reliable for detection of CNVs	-
Novel WGA methods
Linear amplification via transposon insertion (LIANTI)	Combine Tn5 transposition and T7 promoter in vitro transcription to linearly amplify the genomic DNA	Lowest amplification bias and errors	-

Next generation sequencing

Several high-throughput sequencing platforms are commonly used, including Roche 454 sequencing 107, Ion Torrent 108, 109, Illumina sequencing 110, the Complete Genomics platform 111, and the Real-time Sequencer (RS) of Pacific Biosciences 112, each with a different error rate (Table 4). These approaches may generate some common errors: nucleotide substitutions, insertions and deletions (indels), and coverage biases. It is reported that indel errors on the Illumina platform are much less frequent than substitution errors, and that the overall error rate of Illumina is the lowest among the platforms compared 113. The error profiles of the individual platforms are discussed in that review 113. On the Illumina platform, some errors appear as lower quality scores at specific positions, such as certain sequence motifs, especially indel errors in GC-rich sequence and around inverted repeats. These types of errors can be removed by filtering on the quality score (Q>20) 114-117. However, other errors introduced during library or sample preparation, such as those caused by PCR, are not reflected in the sequence quality score; they can be reduced by using PCR-free library construction 114.
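
The quality threshold mentioned above can be read through the standard Phred definition, in which a score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1% chance that the base call is wrong. The sketch below masks bases that do not exceed the threshold. It assumes Phred+33 ASCII encoding of the quality string, and the function names are ours for illustration.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)

def decode_phred33(quality_string):
    """Decode a Phred+33 ASCII quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]

def mask_low_quality(sequence, quality_string, min_q=20):
    """Replace every base whose quality is not above min_q with 'N'."""
    scores = decode_phred33(quality_string)
    if len(scores) != len(sequence):
        raise ValueError("sequence and quality string must have equal length")
    return "".join(base if q > min_q else "N"
                   for base, q in zip(sequence, scores))

print(phred_to_error_probability(20))
print(mask_low_quality("ACGT", "II#5"))
```

As the text notes, this kind of filter only addresses errors that the sequencer reports as low quality. Errors introduced upstream, for example during PCR, can carry high quality scores and pass through.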

Table 4

Error rate of high-throughput sequencing platforms

Platforms	Substitution	Indels
Roche 454 sequencing	Low	Intermediate
Ion torrent	Low	Intermediate
Illumina sequencing	Low	Low
The Complete Genomics platform	High	Low
The Real-time Sequencer (RS) by Pacific Biosciences	High	High

Bioinformatics analysis

Analysis of high-throughput sequencing data is a very complicated and time-consuming process. In general, analyses of bulk sequencing and single cell sequencing follow a similar pipeline, briefly summarized as follows. 1) Pre-processing of raw sequence data, including alignment to a reference genome and data cleanup. 2) Variant discovery, including identification of genomic variation in each sample and filtering of the data using methods appropriate to the experimental design. 3) Callset refinement, including the use of meta-data to increase genotyping accuracy and to determine the overall quality of the callset. However, because much more bias (amplification bias and chimeric DNA rearrangements) is introduced during WGA, the computational methods for analyzing single cell data should be different. So far, many methods and algorithms have been developed for bulk DNA sequence data, whereas tools designed specifically for single cell data are relatively lacking. Several tools developed for bulk sequence alignment are currently used for single cell genome alignment, including BWA 70, 100, 118, Bowtie 80, 119, and SOAPaligner 98, 99, 120. Single nucleotide variant (SNV) callers for bulk samples have also been used for single cell sequencing, such as SNVdetector 121, SOAPsnp 98, 99, 122, VarScan 123, 124, GATK 70, 100, 125-127, and MuTect 128. Current variant callers generally do not model allelic dropout (ADO), allelic imbalance, coverage non-uniformity and false-positive errors, so they are not fully applicable to single cell DNA sequencing. Monovar is a novel method designed for detecting and genotyping SNVs in single cell data, which outperforms standard methods in identifying driver mutations and delineating clonal substructure 129. Filtering the data using methods or criteria appropriate to the experimental design is important for single cell analysis.
Because of the bias and errors introduced during the previous steps, all of these possibilities must be taken into account. It is crucial to compare the variant alleles found in single cells with those identified in the bulk to avoid selection bias in single cell capture 74. In addition, errors introduced during WGA and sequencing, which result in false positive SNV calls, must be considered. To overcome this problem, first, the mutant allele should be covered by 10 or more reads, which leads to a very low false positive rate 99. Second, a mutation is required to occur in the bulk sample or in at least two cells simultaneously 70, 74. However, this strategy may lose some extremely rare mutations that occur in only one single cell. This can be partially overcome by validating these mutations with PCR-Sanger sequencing of the original single cell DNA, although the outcome still depends on how early the errors were introduced. Loss of genome and non-uniform amplification can cause false positives or false negatives when calling copy number aberrations (CNAs). Improving the uniformity of WGA is therefore essential for calling CNAs. CNA calling also relies mainly on algorithms, including circular binary segmentation, rank segmentation, and hidden Markov models, which can normalize the bias introduced during WGA to identify the regions that are truly biologically over- or under-amplified 70, 96, 130. Ginkgo is a web platform designed for single cell CNA analysis, which can automatically construct copy-number profiles of cells 131. Determining the genetic relationships between single cells, and consequently understanding the mutational heterogeneity within tumors and inferring tumor evolutionary lineage trees, is an important task for single cell DNA sequencing.
OncoNEM is a method for reconstructing tumor clonal lineage trees from somatic SNVs of single cells, which accounts for genotyping errors and tests for unobserved subpopulations 132. Ginkgo can construct phylogenetic trees of cells using CNAs 131. Single Cell Genotyper (SCG) infers clonal genotypes and population structure from a cell-target matrix, and simultaneously addresses the technical noise in single cell data by taking doublets into account and predicting genotypes accordingly 133. SCITE is a tool for identifying the evolutionary history of a tumor by analyzing single cells, which calculates the maximum-likelihood mutation history using a flexible Markov chain Monte Carlo sampling scheme 134. SiFit is another novel method for inferring tumor phylogenies from noisy single cell data using a finite-sites model, which could improve the inference of tumor phylogenies (bioRxiv: http://dx.doi.org/10.1101/091595).
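
The two SNV filtering criteria described above (at least 10 reads supporting the mutant allele, and presence in the bulk sample or in at least two cells) can be sketched as a small filter. This is a minimal illustration under our own assumptions about the input layout; the tuple format, the variant identifiers and the function name are hypothetical and do not correspond to any of the cited tools.

```python
from collections import defaultdict

def filter_single_cell_snvs(calls, bulk_variants, min_alt_reads=10, min_cells=2):
    """Keep variants that pass both filtering criteria.

    calls: iterable of (cell_id, variant_id, alt_read_count) tuples.
    bulk_variants: set of variant ids also detected in the bulk sample.
    Returns a dict mapping each retained variant to its supporting cells.
    """
    support = defaultdict(set)
    for cell_id, variant_id, alt_reads in calls:
        # Criterion 1: enough reads support the mutant allele in this cell.
        if alt_reads >= min_alt_reads:
            support[variant_id].add(cell_id)
    # Criterion 2: seen in bulk, or in at least `min_cells` cells.
    return {variant: sorted(cells)
            for variant, cells in support.items()
            if variant in bulk_variants or len(cells) >= min_cells}

# Hypothetical example data.
calls = [
    ("c1", "chr1:100A>T", 15), ("c2", "chr1:100A>T", 12),
    ("c1", "chr2:200G>C", 30),
    ("c3", "chr3:300C>T", 4),
    ("c2", "chr4:400T>G", 11),
]
print(filter_single_cell_snvs(calls, bulk_variants={"chr4:400T>G"}))
```

In the example, the variant seen in a single cell and absent from bulk is discarded even though it has high read support, which is exactly the loss of rare single-cell mutations that the text warns about.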

Single cell DNA sequencing application in breast cancer

Single cell DNA sequencing is mainly used for calling SNVs, CNAs, and structural variants (SVs) in breast cancer research. So far, studies have mainly focused on CNAs and SNVs, and few studies on SVs have been published.

CNAs

Current single cell genome sequencing in breast cancer mainly focuses on CNAs detected by whole genome sequencing (WGS), revealing the heterogeneity within tumors and their clonal structure and evolution. Calling CNAs does not require high sequencing coverage. The first single cell DNA sequencing study in cancer was published by Navin et al. in 2011 76; it called CNAs using bins of variable length with uniform expected unique counts to correct for the bias in WGA. They sequenced 100 single cells of a polygenomic triple negative (ER-/PR-/Her2-) breast cancer, and 100 single cells of a monogenomic triple negative primary breast cancer and its liver metastasis, using single nucleus sequencing at low coverage (~6%), revealing three different subclones that might represent sequential clonal expansions. The CNA analysis also revealed a clonal expansion leading to the formation of the primary tumor and of the metastasis, respectively. In addition, their findings suggested a new pattern of tumor growth called “punctuated clonal evolution” with few persistent intermediates, which contradicts gradual models. This evolutionary model was supported by another report from the same group three years later 100. To increase coverage and reduce the ADO and false positive rates introduced by WGA, they sorted nuclei from cells in G2/M phase for sequencing. Fifty single nuclei of an oestrogen-receptor positive (ER+/PR+/Her2-) (ERBC) breast tumor were sequenced, which showed that the tumor cells shared highly similar CNAs, suggesting a monoclonal population. Meanwhile, fifty single nuclei from a triple-negative breast cancer were sequenced, and the CNA profiles were highly similar within each of the aneuploid and hypodiploid subpopulations. These findings indicated that tumors evolve through early chromosomal rearrangements followed by stable clonal expansions. This model was further confirmed by sequencing 1,000 single cells of triple negative breast tumors from 12 patients 135.
This model, established from single cell CNAs, has important implications for tumor evolution, diagnostics and therapy in breast cancer, especially in triple-negative breast cancer (TNBC). In addition, single cell CNAs have been used to trace the origin of disseminated tumor cells in breast cancer 136. In this study, the genomes of 63 single cells from 6 non-metastatic breast cancer patients were sequenced. Based on CNAs, fifty-three percent of the cells were defined as disseminated tumor cells (DTCs), and the remaining cells were non-aberrant 'normal' cells or 'aberrant cells of unknown origin'. Further, the prevalence of the aberrant cells of unknown origin was age-dependent, and one subset of them was hematopoietic in origin. The data also revealed that the DTCs originated from the main tumor clone, a primary tumor subpopulation, or a subpopulation in a lymph node metastasis. Unraveling the origin of DTCs facilitates our understanding of breast cancer metastasis and its future prevention. Most recently, single cell CNA analysis of FFPE samples has also been reported in synchronously diagnosed ductal carcinoma in situ (DCIS) and invasive breast cancer 80; the resulting CNA profiles were similar to those of frozen tissue and concordant with the CNA profiles of bulk tissue. In one case, the authors identified six different but highly related subclones, implying either that invasion was unrelated to the CNAs or that invasion occurred at an early stage of disease followed by genome instability, and that multiple diverse DCIS subclones developed in parallel and then progressed to invasive disease. Moreover, they revealed two major subpopulations in another case, suggesting that intratumor genetic heterogeneity arose at an early stage of disease and that progression from DCIS to invasive disease occurred via clonal selection.
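
The bin-counting idea behind low-coverage CNA calling can be sketched in a few lines. The example below is a deliberate simplification and not the procedure used in the studies above. It assumes that read counts have already been tallied in bins with equal expected counts, that the median bin represents the baseline ploidy, and it omits GC correction and the segmentation step (for example circular binary segmentation or a hidden Markov model). The function name and the example counts are hypothetical.

```python
import statistics

def estimate_copy_numbers(bin_counts, ploidy=2):
    """Scale per-bin read counts by the median bin and round to integer copies.

    Assumes the median bin sits at the baseline ploidy, which may not hold
    for highly aneuploid tumor cells.
    """
    median_count = statistics.median(bin_counts)
    if median_count <= 0:
        raise ValueError("median bin count must be positive")
    return [round(ploidy * count / median_count) for count in bin_counts]

# Hypothetical read counts for seven consecutive bins of one cell.
counts = [100, 98, 102, 150, 205, 49, 101]
print(estimate_copy_numbers(counts))
```

Because only relative read depth per bin is needed, an approach of this kind tolerates sparse coverage, which is consistent with the observation that CNA calling does not require high sequencing coverage.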

SNVs

SNV calling usually requires high coverage depth (>10X), which is very costly for WGS given the 3 Gb human genome. Researchers have therefore so far focused SNV calling mainly on the protein coding region (the exome; 30-60 Mb) using single cell whole exome sequencing (WES). Two reports applied single cell WES to a myeloproliferative neoplasm and a kidney tumor 98, 99. These studies established a routine workflow and criteria for WES and SNV calling, which are very important for single cell WES. Twenty-five single cells were considered sufficient to call most of the mutations in the myeloproliferative neoplasm case, and another study claimed that 20-40 single cells were necessary to detect the major subpopulations with 95% power 98, 135. As part of this workflow, they developed a reliable way to verify the called somatic mutations: PCR-Sanger sequencing of 30 randomly chosen somatic mutations in 52 randomly selected cells. Finally, they identified several essential thrombocythemia-related mutant genes, including SESN2 and NTRK1, revealed monoclonal evolution in a JAK2-negative myeloproliferative neoplasm and, in the kidney tumor, delineated the intratumor genetic heterogeneity and identified important genes such as AHNAK. The first single cell WES study in breast cancer was reported by Yong Wang et al. in 2014 100. In this study, a new approach was developed for verifying the called somatic mutations: single-molecule targeted deep sequencing (more than 110,000X) of the bulk tissue. They first sequenced 4 single tumor nuclei of the ERBC in G2/M phase at high coverage breadth (80.79±3.31%) and depth (46.75X±5.06) using WGS, and found 12 clonal non-synonymous mutations (also present in bulk tissue sequencing) and 32 subclonal non-synonymous mutations. 
In addition, they sequenced 59 nuclei of the ERBC in G2/M phase (47 tumor cells and 12 normal cells) at 92.77% coverage breadth and 46.78X coverage depth using WES, identifying 17 clonal mutations, 19 new subclonal mutations, and 26 de novo mutations that were each present in only one tumor cell, in genes such as MARCH11 and CABP2. They also sequenced 16 single tumor nuclei in G2/M phase and 16 single normal nuclei from the TNBC and identified 374 clonal non-synonymous mutations present in bulk tissue, 145 subclonal non-synonymous mutations, and 152 de novo mutations, in genes including AURKA, SYNE2 and TGFB2. These data suggested that point mutations evolved gradually, generating extensive clonal diversity, and that the TNBC had an increased mutation rate (13.3×), whereas the ERBC did not. This work identified a number of mutant genes, including rare novel mutations, that might be involved in breast cancer. It also raised questions: what roles do these mutations play in breast cancer, which genes are true drivers, and which are passengers? More single cell WES studies of breast cancer can be expected in the coming years; they should accelerate our understanding of the origin, progression and metastasis of breast cancer and facilitate prevention and therapy of this disease.
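
The sample-size statements earlier in this section (for example, 20-40 cells to detect the major subpopulations with 95% power) can be made concrete with a simple binomial sketch. This is not the calculation used in the cited studies: it assumes cells are sampled independently and at random, and the subclone frequency, the minimum number of cells required to "detect" a subclone, and the function names `detection_power` and `cells_needed` are hypothetical choices for illustration.

```python
from math import comb


def detection_power(n_cells, freq, min_cells=1):
    """Probability that at least `min_cells` of `n_cells` randomly sampled
    cells belong to a subclone present at frequency `freq` (binomial model)."""
    p_fewer = sum(
        comb(n_cells, k) * freq**k * (1 - freq) ** (n_cells - k)
        for k in range(min_cells)
    )
    return 1 - p_fewer


def cells_needed(freq, power=0.95, min_cells=1, max_cells=10_000):
    """Smallest number of sampled cells that reaches the requested power."""
    for n in range(min_cells, max_cells + 1):
        if detection_power(n, freq, min_cells) >= power:
            return n
    raise ValueError("requested power not reached within max_cells")


# Illustrative inputs only: a subclone making up 10% or 50% of the tumor.
print(cells_needed(0.10))  # 29
print(cells_needed(0.50))  # 5
```

Under these assumptions, seeing at least one cell from a 10% subclone with 95% probability takes 29 cells. Requiring several cells per subclone (a larger `min_cells`), or accounting for allelic dropout, would raise the number.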

Conclusion and Future Aspects

Heterogeneity in the genetics and pathology of breast cancer creates difficulties for cancer treatment and patient care. Recently developed SCS technology makes it possible to better understand this heterogeneity. Although the technology is developing rapidly, with increasing efficiency and accuracy, problems such as low coverage, bias and errors remain throughout the procedure (single cell preparation, whole genome amplification, library construction, sequencing and data analysis), depending on the platform or method used. More efficient approaches for capturing single cells, better WGA methods, better sequencing platforms, and better tools and algorithms for data analysis still need to be developed. The use of single cell sequencing to decipher breast cancer heterogeneity is still at an early stage and does not yet provide a comprehensive understanding of cancer initiation and progression, although enormous efforts have been made to sequence cancer cell lines and primary and metastatic cancers 76, 100, 135-137. Moreover, the present studies only showed how tumors evolved by CNAs and how many genes were mutated in some ER-positive and TNBC cases, and many basic biological questions remain unanswered. These include: 1) which genes are the true drivers of cancer initiation and progression; 2) whether such drivers change their roles during tumorigenesis; 3) what the differences are among different CSCs within a single cancer; and 4) which cell or group of cells is the origin of a breast tumor. Further development of SCS technology should keep up with the demand for precise answers. If costs can be reduced, sequencing more cells by WGS will yield a more complete and detailed mutation spectrum along the whole genome, including SVs and especially mutations in intronic regions, which have important roles in breast cancer 138. 
Until recently, single-cell genomic approaches were limited to freshly isolated or recently frozen cancer samples because of the chemical alterations caused by formalin fixation. This technical barrier has recently been overcome: Martelotto et al. (2017) developed a powerful methodology for profiling the whole-genome copy number of single nuclei isolated from FFPE breast cancer samples 80. We foresee that future efforts will also emphasize applying SCS technology to a broad spectrum of breast cancer research, such as metastasis, recurrence, drug resistance and phylogenetics, which will lead to a better understanding of breast cancer origin, therapeutic treatment and patient care.
