Tyler Garvin, Robert Aboukhalil, Jude Kendall, Timour Baslan, Gurinder S Atwal, James Hicks, Michael Wigler, Michael C Schatz.
Abstract
We present Ginkgo (http://qb.cshl.edu/ginkgo), a user-friendly, open-source web platform for the analysis of single-cell copy-number variations (CNVs). Ginkgo automatically constructs copy-number profiles of cells from mapped reads and constructs phylogenetic trees of related cells. We validated Ginkgo by reproducing the results of five major studies. After comparing three commonly used single-cell amplification techniques, we concluded that degenerate oligonucleotide-primed PCR is the most consistent for CNV analysis.
Year: 2015 PMID: 26344043 PMCID: PMC4775251 DOI: 10.1038/nmeth.3578
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. The Ginkgo flowchart for performing single-cell copy-number analysis. Usage and parameters are described in the online methods and on the website.
Figure 2. (a) Lowess fit of GC content with respect to log normalized bin counts for all samples in each of the 9 datasets analyzed: 3 for MDA (top left, green), 3 for MALBAC (center left, orange), and 3 for DOP-PCR (bottom left, blue). Each colored line within a plot corresponds to the lowess fit of a single sample. The dashed lines show a twofold increase or decrease in average observed coverage. Note that the three MDA datasets (top left) have a different y-axis scale because of their large GC biases. (b) The median absolute deviation (MAD) of neighboring bins: a single pairwise MAD value is generated for each sample in a given dataset and represented by a box-and-whisker plot. The bold center line represents the mean, the box boundaries represent the quartiles, and the whiskers represent the remaining data points. The high biases present in the MDA datasets make it difficult to compare the DOP-PCR and MALBAC samples. Figure 3 of the Online Methods shows this comparison more clearly.
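The GC-bias curves in panel (a) can be illustrated with a small sketch. The code below is a simplified, single-pass lowess (local linear regression with tricube weights, no robustness iterations) written only for illustration; it is not Ginkgo's implementation, and the function names `lowess_fit` and `gc_correct`, the `frac` parameter, and the toy data are all assumptions.

```python
import math


def lowess_fit(x, y, frac=0.5):
    """Fit y on x by local linear regression with tricube weights.

    Simplified lowess: one pass, no robustness iterations.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    # Number of nearest neighbours used in each local fit.
    k = min(n, max(3, math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        u = [xi - x0 for xi in x]               # centre on the query point
        h = sorted(abs(ui) for ui in u)[k - 1]  # bandwidth: k-th smallest distance
        if h == 0:
            h = 1.0                             # degenerate case: identical x values
        w = [(1 - min(abs(ui) / h, 1.0) ** 3) ** 3 for ui in u]
        sw = sum(w)
        su = sum(wi * ui for wi, ui in zip(w, u))
        suu = sum(wi * ui * ui for wi, ui in zip(w, u))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        suy = sum(wi * ui * yi for wi, ui, yi in zip(w, u, y))
        denom = sw * suu - su * su
        if abs(denom) < 1e-12:
            fitted.append(sy / sw)              # fall back to a weighted mean
        else:
            # Intercept of the weighted line, i.e. its value at x0.
            fitted.append((sy * suu - su * suy) / denom)
    return fitted


def gc_correct(gc, log_counts, frac=0.5):
    """Subtract the fitted GC trend from log normalized bin counts."""
    trend = lowess_fit(gc, log_counts, frac)
    return [yi - ti for yi, ti in zip(log_counts, trend)]


# Toy data (hypothetical): log counts that depend linearly on GC content.
gc = [0.30 + 0.03 * i for i in range(10)]
log_counts = [2.0 * g - 1.0 for g in gc]
residuals = gc_correct(gc, log_counts)
```

On this toy input the trend is purely linear, so the residuals after correction are zero up to floating-point error. A sample with little GC bias would give a nearly flat fitted curve, as described for the lines that stay within the dashed twofold bounds.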
List of the 9 datasets analyzed across 8 different studies.
| Study | WGA Method | Tissue Type | # of cells | Accession |
|---|---|---|---|---|
| Kirkness | MDA | Sperm | 11 | SRP017516 |
| Wang | MDA | Sperm | 31 | SRA053375 |
| Evrony | MDA | Neuron | 8 | SRA056303 |
| Lu | MALBAC | Sperm | 99 | SRA060945 |
| Ni | MALBAC | Lung | 29 | SRP029757 |
| Hou | MALBAC | Oocyte | 181 | SRA091188 |
| Navin | DOP-PCR | Breast (T10) | 100 | SRX021401 |
| Navin | DOP-PCR | Breast (T16) | 100 | SRX037035/SRX037132 |
| McConnell | DOP-PCR | Neuron | 109 | SRP030642 |
Note that there are two distinct datasets from the same Navin et al. study.
We validate Ginkgo by reproducing major findings of several single-cell sequencing studies that employ three different WGA techniques: MALBAC, DOP-PCR/WGA4, and MDA. Taken together, we analyze the data characteristics of nine datasets across five tissue types (Table 2). The Ginkgo parameters for these datasets are described in the main text, and additional parameters are noted below.
Figure 3. A comparison of MAD between the Navin et al. (T10) dataset, shown in blue, and the Ni et al. (CTC) dataset, shown in orange. As the bin offset increases, the separation between the mean T10 MAD and the mean CTC MAD grows.
Figure 4. Histograms of normalized bin counts across the CTC and T10 datasets, for a high-, typical-, and poor-quality cell. The rightmost column contains histograms of high-quality cell population averages. Distinct peaks are representative of clean data from which accurate copy-number calls can be made.
Figure 5. (a) Model representation of the 9 distinct subclones generated by simulation of 100 copy-number events with respect to the reference. (b) Hierarchical clustering of the 90 samples by Ginkgo. Ginkgo perfectly recovers the underlying subclonal population structure.
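The idea of recovering subclones by hierarchical clustering of copy-number profiles can be sketched on a toy scale. The example below assumes Euclidean distance and average linkage, which are illustrative choices rather than a statement of Ginkgo's settings; the function names and the six toy cells are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length copy-number profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_profiles(profiles, n_clusters):
    """Agglomerative clustering with average linkage.

    `profiles` maps a cell name to its list of per-bin copy numbers.
    Returns a sorted list of clusters, each a sorted list of cell names.
    """
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of cells")
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy data (hypothetical): three subclones, two cells each.
toy_profiles = {
    "a1": [2, 2, 3, 3, 2], "a2": [2, 2, 3, 3, 2],
    "b1": [2, 1, 1, 2, 2], "b2": [2, 1, 1, 2, 2],
    "c1": [4, 4, 2, 2, 2], "c2": [4, 4, 2, 2, 2],
}
subclones = cluster_profiles(toy_profiles, n_clusters=3)
```

With noise-free toy profiles the three groups are recovered exactly. The simulated data described in the caption are far larger (90 samples, 9 subclones), so this sketch only shows the mechanics of the approach.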