Xian F Mallory1,2, Mohammadamin Edrisi1, Nicholas Navin3, Luay Nakhleh4.
Abstract
Copy number aberrations (CNAs), which are pathogenic copy number variations (CNVs), play an important role in the initiation and progression of cancer. Single-cell DNA-sequencing (scDNAseq) technologies produce data that is ideal for inferring CNAs. In this review, we survey eight methods that have been developed for detecting CNAs in scDNAseq data and categorize them according to which steps of a seven-step pipeline they employ. We also review models and methods for evolutionary analyses of CNAs from scDNAseq data, and we highlight advances and future research directions for computational methods for CNA detection from scDNAseq data.
Keywords: Copy number aberrations; Intra-tumor heterogeneity; Single-cell DNA sequencing; Tumor evolution
Year: 2020 PMID: 32807205 PMCID: PMC7433197 DOI: 10.1186/s13059-020-02119-8
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Peculiarities of scDNAseq technologies
| Method | Uniformity | Coverage | Throughput | Suitability for CNA |
|---|---|---|---|---|
| MDA | Low | High | Low | N |
| DOP-PCR | High | Low | High | Y |
| MALBAC | High | High | Low | Y |
| C-PCR-L | High | Low | High | Y |
| SCI-seq (xSDS) | Medium | Low | High | Y¹ |
| DLP | High | Low | High | Y |
| LIANTI | High | High | Low² | Y |
| TnBC | High | Low | High | Y |
| Flow cytometry | High | Low | High | Y |
| 10x | High | Low | High | Y |
| DLP+ | High | Low | High | Y |
Eleven scDNAseq technologies are listed. Uniformity, coverage, throughput, and suitability for CNA refer, respectively, to the uniformity of sequencing coverage, the sequencing coverage over the whole genome, the number of cells that can be sequenced at one time, and whether the technology is suitable for CNA calling.
¹ With cell filtering
² High-throughput sequencing can be achieved by adding combinatorial cellular barcodes
Fig. 1 The seven steps in CNA detection in single-cell sequencing. a Binning. The number of reads within each bin (bottom) is computed from the pileup of the reads according to where they align (top). b GC correction. Scatter plot of read count per bin with respect to the GC content of the bin. The red curve represents the corresponding regression. c Mappability correction. Scatter plot of read count per bin with respect to the mappability of the bin. The red curve represents the corresponding regression. d Removal of outlier bins. Scatter plot of read count per bin with respect to the genomic position. Outlier bins are shown in red, in contrast with the bins in the rest of the genome, which are shown in green. e Removal of outlier cells. A Lorenz curve for the read count at all bins is shown. The Gini coefficient is twice the highlighted area between the Lorenz curve and the diagonal line. The higher the Gini coefficient, the more likely the cell is an outlier. f Segmentation. Scatter plot of read count per bin with respect to the genomic position. Dotted vertical lines correspond to the segments’ boundaries. g Calling the absolute copy numbers. The copy number, a non-negative integer, is determined for each segment.
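The outlier-cell criterion in panel e can be illustrated with a minimal sketch. It assumes the per-bin read counts of one cell are available as a plain list. The function names `lorenz_curve` and `gini_coefficient` are hypothetical and are not taken from any of the reviewed tools. The Gini coefficient is computed exactly as the caption describes, as twice the area between the diagonal and the Lorenz curve.

```python
def lorenz_curve(read_counts):
    """Return Lorenz curve points (xs, ys) for the per-bin read counts of one cell.

    xs[i] is the fraction of bins considered (sorted from lowest to highest count),
    ys[i] is the fraction of all reads that fall in those bins.
    """
    counts = sorted(read_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one bin with a non-zero read count")
    n = len(counts)
    xs, ys = [0.0], [0.0]
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        xs.append(i / n)
        ys.append(running / total)
    return xs, ys


def gini_coefficient(read_counts):
    """Gini coefficient: twice the area between the diagonal and the Lorenz curve."""
    xs, ys = lorenz_curve(read_counts)
    # Area under the Lorenz curve by the trapezoidal rule.
    area_under = sum(
        (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
        for i in range(1, len(xs))
    )
    # The area under the diagonal is 0.5, so the area between the two is 0.5 - area_under.
    return 2.0 * (0.5 - area_under)


# Perfectly uniform coverage versus all reads concentrated in one of four bins.
uniform_cell = [10, 10, 10, 10]
skewed_cell = [0, 0, 0, 100]
print(gini_coefficient(uniform_cell))
print(gini_coefficient(skewed_cell))
```

A cell with perfectly even coverage scores 0, and the score grows as reads concentrate in fewer bins. Any cutoff used to discard cells would be a method-specific choice that this sketch does not prescribe.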
Fig. 2 The three approaches for segmentation. In all three panels, a scatter plot of the read count per bin with respect to the genomic position of the bin is shown. a The sliding-window approach. A window is passed across the genome, and a genomic region within a window that is significantly different in terms of read count from the rest of the genome (e.g., the window defined by the two dotted vertical lines) is declared as a segment. b The objective function-based approach. Three piecewise constant functions are shown (two in red and one in green) and represent segmentation candidates. Each piece in the function corresponds to a segment, and the value of the piece corresponds to the copy number at that segment. The function in green is the optimal one with respect to the fidelity to the data and the constraint on the number of breakpoints, whereas the two in red are either over-segmented (top) or under-segmented (bottom). c The HMM-based approach. States of the HMM correspond to the different copy numbers, and a transition between two different states indicates a change in the segment. In the read-count panel, colors of the dots represent the absolute copy number of the various genomic bins (red for 1, yellow for 2, and green for 4) as obtained by parsing the data with respect to the HMM (bottom). The actual path of the state transitions is shown in the middle and highlighted with blue arrows on the HMM as well. The arrows are numbered to indicate the order of the transitions.
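The HMM-based approach of panel c can be sketched with a small Viterbi decoder. This is an illustrative toy and not the model of any reviewed method. The function name `viterbi_copy_numbers`, the Gaussian emission whose mean is proportional to the copy number, and the parameters `reads_per_copy`, `sd`, and `stay_prob` are all assumptions made for the example. The three states 1, 2, and 4 simply mirror the copy numbers colored in the figure.

```python
import math


def viterbi_copy_numbers(read_counts, states=(1, 2, 4),
                         reads_per_copy=50.0, sd=10.0, stay_prob=0.95):
    """Most likely copy number per bin under a toy HMM (all parameters are assumptions)."""
    if not read_counts:
        return []
    n_states = len(states)
    # Probability of switching is split evenly among the other states.
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))

    def log_emission(count, copy_number):
        # Gaussian log-density up to a constant; the constant is the same for
        # every state because sd is shared, so it does not affect the best path.
        mean = reads_per_copy * copy_number
        return -0.5 * ((count - mean) / sd) ** 2

    def log_transition(i, j):
        return log_stay if i == j else log_switch

    # Uniform prior over the initial state.
    score = [log_emission(read_counts[0], s) for s in states]
    backpointers = []
    for count in read_counts[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_transition(i, j))
            new_score.append(score[best_i] + log_transition(best_i, j)
                             + log_emission(count, s))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path back from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[i] for i in path]


counts = [52, 48, 50, 101, 99, 103, 198, 205, 200, 98, 102]
print(viterbi_copy_numbers(counts))
# prints [1, 1, 1, 2, 2, 2, 4, 4, 4, 2, 2]
```

Each change of value along the decoded path marks a segment boundary, so segmentation and copy number calling fall out of the same pass. A high `stay_prob` penalizes breakpoints, which plays a role comparable to the constraint on the number of breakpoints in the objective function-based approach.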
Eight methods for calling CNVs or CNAs
| Method | Reference | S | D | N | P | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HMMcopy | Shah et al. | S | N | N | Y | Y | Y | Y | Y | N | 3 | Y |
| Ginkgo | Garvin et al. | S | N | N | N | Y | Y | N | Y | Y | 1 | Y |
| AneuFinder | Bakker et al. | S | N | N | Y | Y | Y | N | Y | Y | 3 | Y |
| SCNV | Wang et al. | S | N | Y | Y¹ | N | Y | N | N | Y | 1 | Y |
| CopyNumber | Nilsen et al. | M | N | N | N | Y | N | N | Y | N | 2 | N |
| SCOPE | Wang et al. | M | N | Y | N | Y | Y | Y | Y | Y | 1 | Y |
| CHISEL | Zaccaria et al. | M | Y | N | Y | Y | Y | N | Y | N | 2 | Y |
| SCICoNE | Kuipers et al. | M | N | N | N | Y | Y | Y | N | N | 2 | Y |
Four features of the methods are highlighted. S, whether the method applies to a single sample (S) or multiple samples simultaneously (M); D, whether the method assumes diploidy of the sample (Y) or not (N); N, whether the method requires the genome of a normal cell (Y) or not (N); P, whether the method is parametric (Y) or not (N). Each method is also classified according to which of the seven steps of Fig. 1 it employs. 1, the method applies binning (Y) or not (N); 2, the method applies GC correction (Y) or not (N); 3, the method applies mappability correction (Y) or not (N); 4, the method removes outlier bins (Y) or not (N); 5, the method removes outlier cells (Y) or not (N); 6, the method applies a sliding window (1), objective function fitting (2), or an HMM (3) for segmentation; and 7, the method calls the absolute copy number (Y) or not (N). “NA” denotes that the step is not applicable to the method.
¹ The model is automatically calibrated given the identified normal cells
Fig. 3 Three modes of evolution of multigene families. a Concerted evolution. b Divergent evolution. c Evolution by birth and death process. (Reproduced from [95])