Yong Hou1, Kui Wu1, Xulian Shi2, Fuqiang Li1, Luting Song1, Hanjie Wu1, Michael Dean3, Guibo Li1, Shirley Tsang4, Runze Jiang1, Xiaolong Zhang5, Bo Li1, Geng Liu1, Niharika Bedekar6, Na Lu2, Guoyun Xie1, Han Liang1, Liao Chang1, Ting Wang7, Jianghao Chen7, Yingrui Li1, Xiuqing Zhang8, Huanming Yang9, Xun Xu1, Ling Wang7, Jun Wang10.
Abstract
BACKGROUND: Single-cell resequencing (SCRS) has enabled many biomedical advances through variations detection at the single-cell level, but it currently relies on whole genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate-oligonucleotide-primed PCR (DOP-PCR) and multiple annealing and looping-based amplification cycles (MALBAC). However, a comprehensive comparison of variations detection performance among these WGA methods has not yet been performed.
Keywords: DOP-PCR; MALBAC; MDA; Next-generation sequencing; Single-cell resequencing; Variations detection; Whole genome amplification
Year: 2015 PMID: 26251698 PMCID: PMC4527218 DOI: 10.1186/s13742-015-0068-3
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Fig. 1 A narrowing-down strategy used to compare WGA methods cost-effectively. The strategy is described in three panels (a, b, c). In panel a, we perform the LWGS comparison, including genome coverage and uniformity, using YH single cells amplified by seven WGA kits based on the DOP, MDA and MALBAC methods. We additionally compare CNVs detection using simulated data of YH single cells in panel a. In panel b, we perform the deep WGS comparison of biases and SNVs detection using deep-sequenced YH or SW480 single cells amplified by DOP, MDA or MALBAC, respectively. Corresponding bulk data are used as the unamplified control. In panel c, we further compare CNVs detection between MDA-2 and MALBAC amplified data using real data of BGC823 single cells. *Ion Proton sequencing data; #Illumina and Ion Proton sequencing data. LWGS, low-coverage whole-genome sequencing; WGS, whole genome sequencing. DOP-1, GenomePlex® Single Cell WGA Kit; DOP-2, Silicon Biosystem Ampli™ WGA Kit; DOP-3, NEB Single Cell WGA Kit; MDA-1, Qiagen REPLI-g Mini Kit; MDA-2, Qiagen REPLI-g Single Cell Kit; MDA-3, GE Healthcare illustra GenomiPhi V2 DNA Amplification Kit; MALBAC, Yikon Genomics Single Cell Whole Genome Amplification Kit. Data marked in purple is downloaded
Fig. 2 LWGS comparison of recovery sensitivity and amplification uniformity between WGA methods. a The recovery sensitivity comparison of three different WGA methods using 0.1X randomly extracted LWGS data. The histogram and line graph show the mean mapping ratio and mean duplication ratio of different methods, respectively. b The mean normalized depth distribution of the seven WGA kits using the 0.1X sequencing data. The normalized read depth is defined as the ratio of the mean depth of all reads in each window to the mean depth of the whole genome. The binning window is 100 kb. The dashed curve is plotted using the simulated data (1000 dots) that followed the Poisson distribution (λ = 30) and normalized by dividing by 30. c A comparison of mean normalized depth distribution in chr15:q11.1-q26.3 between different WGA kits. The binning window is 100 kb. YH-mix is used as the unamplified control
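The normalized read depth defined in panel b can be illustrated with a minimal sketch. It assumes depth is available as a per-base list, which is only one possible input format; the function name `normalized_window_depths` and the toy data are hypothetical and are not taken from the study's pipeline.

```python
from statistics import mean

WINDOW_SIZE = 100_000  # 100 kb binning window, as stated in the caption


def normalized_window_depths(per_base_depth, window_size=WINDOW_SIZE):
    """Divide the mean depth of each window by the genome-wide mean depth.

    `per_base_depth` is an assumed input: one depth value per position.
    """
    if not per_base_depth:
        return []
    genome_mean = mean(per_base_depth)
    if genome_mean == 0:
        raise ValueError("genome-wide mean depth is zero; nothing to normalize")
    windows = (
        per_base_depth[start:start + window_size]
        for start in range(0, len(per_base_depth), window_size)
    )
    return [mean(window) / genome_mean for window in windows]


# Toy example with a 4-base window instead of 100 kb:
toy_depth = [2, 2, 2, 2, 4, 4, 4, 4, 0, 0, 0, 0]
print(normalized_window_depths(toy_depth, window_size=4))  # [1.0, 2.0, 0.0]
```

A perfectly uniform amplification would give values close to 1.0 in every window, so the spread of these values around 1.0 is what the depth distribution in panel b visualizes.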
Deep-sequencing statistics of single cells amplified by different kits
| Sample index | Number of mapped bases (bp) | Read mapping ratio (%) | Read duplication ratio (%) | GC content (%) | Mean depth (X) | Genome coverage (%) |
|---|---|---|---|---|---|---|
| MDA-2_46 | 109,113,164,019 | 98.42 | 2.44 | 43.63 | 38.20 | 94.30 |
| MDA-2_47 | 82,746,143,862 | 98.49 | 1.73 | 42.74 | 28.95 | 99.63 |
| MDA-2_66 | 102,165,179,471 | 98.54 | 6.52 | 40.66 | 35.84 | 99.24 |
| MDA-3_45 | 52,911,771,602 | 99.09 | 6.17 | 39.40 | 18.52 | 94.35 |
| DOP-1_97 | 8,294,107,956 | 86.18 | 39.24 | 40.65 | 3.00 | 23.23 |
| SW480-1 | 55,385,452,648 | 94.34 | 7.50 | 42.95 | 19.45 | 91.33 |
| SW480-2 | 57,344,758,117 | 94.69 | 7.51 | 42.86 | 20.15 | 91.63 |
| SW480-3 | 66,569,935,382 | 93.54 | 19.64 | 40.40 | 23.42 | 83.33 |
| SW480-4 | 78,746,822,579 | 92.56 | 21.83 | 39.91 | 27.76 | 70.88 |
| SW480-5 | 40,966,360,470 | 89.53 | 7.05 | 40.36 | 14.50 | 74.87 |
| SW480-HEC | 104,576,495,349 | 96.49 | 3.82 | 42.84 | 36.59 | 99.01 |
| SW480-SCD | 88,079,534,311 | 91.39 | 3.43 | 39.42 | 30.99 | 99.13 |
| YH-mix | 109,269,489,080 | 95.97 | 10.77 | 41.39 | 38.30 | 99.68 |
Fig. 3 Bias and chimeras comparison using WGS data. a The cumulative distribution of sequencing fold depth of deep WGS data amplified by DOP-1, MDA-2, MDA-3, and MALBAC, respectively. The standard Poisson cumulative distribution (λ = 30) is plotted (dashed), and YH-mix and SW480 bulk data are presented as controls. See Additional file 11: Table S6. b Normalized read depth distribution in repeat regions (Alu and L1 regions) and the entire genome of deep-sequenced data amplified by different WGA kits. The normalized read depth is calculated for each Alu/L1 region and for each 100 kb window of the entire genome. c Normalized read depth distribution in regions with different GC content of deep-sequenced data amplified by different WGA kits. The 100 kb windows with GC content >50 % are defined as ‘HighGC’ windows, <35 % as ‘LowGC’ windows, and others as ‘MiddleGC’ windows. d Histogram of effective consensus genotype efficiency, and line graph of the concordant ratio of all deep-sequenced cells amplified by different WGA kits compared to the golden control. e The percentage of different types of chimeras detected in MDA-2-amplified YH single cells. CTX, inter-chromosomal translocation; ITX, intra-chromosomal translocation; DEL, deletion; INS, insertion; INV, inversion. f Boxplot of the length distribution of ITXs shared between MDA-2-amplified cells and YH-mix versus the chimera ITXs that are unique to single cells. p < 0.01, Mann–Whitney-Wilcoxon test
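The GC binning used in panel c can be sketched as follows. The thresholds come from the caption; the helper names and the choice to exclude ambiguous bases (such as N) from the GC denominator are assumptions made for illustration only.

```python
def gc_percent(sequence):
    """GC content of a window in percent, or None if no A/C/G/T bases are present.

    Assumption: ambiguous bases such as N are excluded from the denominator.
    """
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def classify_gc_window(percent):
    """Apply the caption's thresholds: >50 % HighGC, <35 % LowGC, otherwise MiddleGC."""
    if percent > 50:
        return "HighGC"
    if percent < 35:
        return "LowGC"
    return "MiddleGC"


# Toy windows (real windows would be 100 kb long):
for window in ("GGCCGCAT", "ATATATGC", "ACGTACGA"):
    print(window, classify_gc_window(gc_percent(window)))
```

Windows at exactly 35 % or 50 % fall into ‘MiddleGC’ here, which follows from reading the caption's inequalities as strict.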
Comparison of consensus genotypes and SNVs detection accuracy of deep-sequenced data amplified by MDA and MALBAC
| Sample | Allele type (single cell) | Consistent alleles | HOM ref. (control) | HOM mut. (control) | HET ref. (control) | Total | Consistency (%) |
|---|---|---|---|---|---|---|---|
| Golden control for SW480 cells | | | 1,762,437.00 | 403,431.00 | 173,098.00 | 2,338,966.00 | |
| MALBAC mean | HOM ref. | 2 | 849,057.40 | - | - | | |
| | | 1 | - | - | 10,352.40 | 859,455.20 | 98.79 |
| | | 0 | - | 45.40 | - | | |
| | HOM mut. | 2 | - | 266,889.00 | - | | |
| | | 1 | - | - | 18,507.40 | 285,625.60 | 93.44 |
| | | 0 | 213.40 | 11.20 | 4.60 | | |
| | HET ref. | 2 | - | - | 58,948.80 | | |
| | | 1 | 2,287.20 | 6,860.40 | 8.80 | 68,105.20 | 86.56 |
| | | 0 | - | 0.00 | - | | |
| | Total | | 851,558.00 | 273,806.00 | 87,822.00 | 1,213,186.00 | 96.84 |
| | Coverage (%) | | 48.32 | 67.87 | 50.74 | 51.87 | - |
| Golden control for YH cells | | | 1,584,649.00 | 270,225.00 | 351,490.00 | 2,206,364.00 | |
| MDA mean | HOM ref. | 2 | 1,373,228.00 | - | - | | |
| | | 1 | - | - | 21,871.00 | 1,395,113.33 | 98.43 |
| | | 0 | - | 14.33 | - | | |
| | HOM mut. | 2 | - | 256,682.67 | - | | |
| | | 1 | - | - | 27,674.00 | 284,365.67 | 90.26 |
| | | 0 | 7.33 | 1.67 | 0.00 | | |
| | HET ref. | 2 | - | - | 256,185.67 | | |
| | | 1 | 212.67 | 326.33 | 2.33 | 256,727.00 | 99.79 |
| | | 0 | - | 0.00 | - | | |
| | Total | | 1,373,448.00 | 257,025.00 | 305,733.00 | 1,936,206.00 | 97.41 |
| | Coverage (%) | | 86.67 | 95.12 | 86.98 | 87.76 | - |
Mean coverage and consistency are calculated using the data amplified by the same WGA method according to Additional file 13: Table S7. HOMref, homozygotes where both alleles are identical to the reference; HOMmut, homozygotes where both alleles are different from the reference; HETref, heterozygotes where only one allele is identical to the reference. We formulate the mean counts of genotyped alleles of single cell sequencing sites that are consistent with ‘golden control’ at both alleles, at one allele, or that are inconsistent at both alleles as 2, 1, and 0, respectively
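The 2/1/0 scoring and the summary percentages can be illustrated with a short sketch. The allele-matching function is a hypothetical helper, not the authors' code. The consistency and coverage formulas are inferred from the table: they reproduce the reported MALBAC totals (96.84 % and 51.87 %), but the source does not state them explicitly.

```python
from collections import Counter


def consistent_alleles(cell_genotype, control_genotype):
    """Count alleles (0, 1 or 2) shared between a single-cell call and the golden control.

    Genotypes are unordered pairs of alleles, e.g. ("A", "G").
    """
    shared = Counter(cell_genotype) & Counter(control_genotype)
    return sum(shared.values())


print(consistent_alleles(("A", "G"), ("G", "A")))  # 2: same heterozygote
print(consistent_alleles(("A", "A"), ("A", "G")))  # 1: one allele matches
print(consistent_alleles(("T", "T"), ("G", "G")))  # 0: no allele matches

# MALBAC mean values copied from the table above.
both_alleles_consistent = 849_057.40 + 266_889.00 + 58_948.80
genotyped_sites = 1_213_186.00
golden_control_sites = 2_338_966.00

# Assumed formulas (they match the reported values):
consistency = 100 * both_alleles_consistent / genotyped_sites
coverage = 100 * genotyped_sites / golden_control_sites
print(round(consistency, 2), round(coverage, 2))  # 96.84 51.87
```

Under this reading, consistency measures how often a genotyped site agrees with the golden control at both alleles, while coverage measures how many golden-control sites were genotyped at all.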
Comparison of consensus genotypes and SNVs detection accuracy of deep-sequenced data amplified by MDA and MALBAC
| Control/sample | Heterozygous (FP/ADO/Efficiency) | Homozygous (FP/Efficiency) | Total (FP/Efficiency) | FP ratio | ADO Ratio |
|---|---|---|---|---|---|
| YH-mix (Unamplified control) | 2,051,282 | 1,598,291 | 3,649,573 | - | - |
| MDA-2_46 | 777,908 (5563/390,038/37 %) | 1,747,004 (390,107/84 %) | 2,524,912 (395,670/58 %) | 1.32E-04 | 0.3340 |
| MDA-2_47 | 1,807,282 (6517/14,124/87 %) | 1,562,036 (14,177/96 %) | 3,369,318 (20,694/91 %) | 6.90E-06 | 0.0078 |
| MDA-2_66 | 1,651,733 (6347/55,158/80 %) | 1,587,456 (55,195/95 %) | 3,239,189 (61,542/87 %) | 2.05E-05 | 0.0323 |
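The ADO ratio column can be reproduced from the heterozygous counts in the table under one assumed definition: allele dropout events divided by the sum of detected heterozygous sites and dropout events. This definition is inferred because it matches the three reported ratios; the source does not spell it out, and the variable names below are illustrative.

```python
# (detected heterozygous sites, ADO events) copied from the table above.
heterozygous_calls = {
    "MDA-2_46": (777_908, 390_038),
    "MDA-2_47": (1_807_282, 14_124),
    "MDA-2_66": (1_651_733, 55_158),
}


def ado_ratio(detected_het, ado_events):
    """Assumed definition: ADO events / (detected heterozygous sites + ADO events)."""
    return ado_events / (detected_het + ado_events)


ado_ratios = {
    sample: round(ado_ratio(detected, ado), 4)
    for sample, (detected, ado) in heterozygous_calls.items()
}
print(ado_ratios)
# {'MDA-2_46': 0.334, 'MDA-2_47': 0.0078, 'MDA-2_66': 0.0323}
```

The computed values agree with the ADO Ratio column (0.3340, 0.0078, 0.0323), which supports this reading but does not confirm it as the authors' formula.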
Fig. 4 Read-data CNVs detection comparison between MALBAC and MDA-2 amplified data. a Taking the simulated YH-mix data as control, sensitivity and specificity of CNVs (≥1 Mb) in simulated single YH cells amplified by different WGA methods are shown as bar plots. b CNVs of BGC823 single cells amplified by MALBAC or MDA-2. BGC823 single cells are sequenced on the Ion Proton sequencer (~0.5X) as control. Bulk BGC823 data (bottom row) were sequenced by PE-100 on an Illumina Hiseq 2000 (~50X), and ~1X data were extracted randomly to detect CNVs. Green, red, and blue represent normal, amplification, and deletion, respectively