Chongwei Bi, Lin Wang, Baolei Yuan, Xuan Zhou, Yu Li, Sheng Wang, Yuhong Pang, Xin Gao, Yanyi Huang, Mo Li.
Abstract
Quantifying the genetic heterogeneity of a cell population is essential to understanding biological systems. We develop a universal method to label individual DNA molecules for single-base-resolution, haplotype-resolved quantitative characterization of diverse types of rare variants, with frequencies as low as 4 × 10⁻⁵, using either short- or long-read sequencing platforms. It provides the first quantitative evidence of persistent nonrandom large structural variants and an increase in single-nucleotide variants at the on-target locus following repair of double-strand breaks induced by CRISPR-Cas9 in human embryonic stem cells.
Keywords: CRISPR-Cas9; Genome editing; Human embryonic stem cell; Long-read sequencing; Nanopore sequencing; Next-generation sequencing; Somatic mutation; Structural variant
Year: 2020 PMID: 32831134 PMCID: PMC7444080 DOI: 10.1186/s13059-020-02143-8
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Summary of individual sequencing runs
| Gene | Mutant allele frequency (%) / Type | Amplicon size | Sequencing platform | Read count | Reads with UMI | UMI groups for variant calling | Median read number per UMI group | UMI groups with introduced mutation | Somatic SNV count | Somatic SNV load per megabase | SV groups |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EPOR | 1:100 (1%) | 168 bp | Nanopore | 17,634 | 6444 | 284 | 7 | 2 (0.7%) | 0 | N/A | N/A |
| EPOR | 1:1000 (0.1%) | 6789 bp | PacBio | 227,206 | 136,399 | 3184 | 6 | 4 (0.126%) | 192 | 9.0 | 3 |
| EPOR | 1:10,000 (0.01%) | 168 bp | Nanopore | 1,093,683 | 494,009 | 15,598 | 8 | 1 (0.006%) | 10 | 7.1 | N/A |
| EPOR | 1:10,000 (0.01%) | 168 bp | Illumina | 7,488,257 | 7,236,007 | 132,341 | 7 | 5 (0.004%) | 85 | 7.1 | N/A |
| PANX1 (Pan1) | WT | 7077 bp | Nanopore | 576,583 | 165,628 | 2810 | 6 | N/A | 73 | 3.8 | 0 |
| PANX1 (Pan3) | WT | 6595 bp | Nanopore | 389,726 | 133,215 | 3867 | 7 | N/A | 103 | 4.1 | 0 |
| PANX1 (Pan1) | Cas9 editing | 7077 bp | Nanopore | 2,761,805 | 613,147 | 3479 | 7 | N/A | 275 | 11.3 | 189 (5.4%) |
| PANX1 (Pan3) | Cas9 editing | 6595 bp | Nanopore | 3,078,165 | 1,042,582 | 7281 | 10 | N/A | 624 | 13.1 | 204 (2.8%) |
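
The percentages in the "UMI groups with introduced mutation" and "SV groups" columns can be reproduced by dividing each count by the number of UMI groups used for variant calling. The following is a minimal Python sketch of that arithmetic, using values copied from the table. The helper name `percent_of_groups` is illustrative and not part of IDMseq or VAULT, and reading each percentage as "count over UMI groups" is an inference from the table values rather than a stated definition.

```python
def percent_of_groups(count, umi_groups):
    """Return count as a percentage of the UMI groups used for variant calling."""
    return 100.0 * count / umi_groups

# (run label, UMI groups for variant calling, UMI groups with introduced mutation)
spike_in_runs = [
    ("EPOR 1:100 Nanopore", 284, 2),
    ("EPOR 1:1000 PacBio", 3184, 4),
    ("EPOR 1:10,000 Nanopore", 15598, 1),
    ("EPOR 1:10,000 Illumina", 132341, 5),
]

# (run label, UMI groups for variant calling, SV groups)
sv_runs = [
    ("PANX1 Pan1 Cas9 editing", 3479, 189),
    ("PANX1 Pan3 Cas9 editing", 7281, 204),
]

for label, groups, hits in spike_in_runs:
    print(f"{label}: {percent_of_groups(hits, groups):.3f}% of UMI groups carry the spike-in")

for label, groups, svs in sv_runs:
    print(f"{label}: {percent_of_groups(svs, groups):.1f}% of UMI groups carry an SV")
```

Under this reading, the Illumina 1:10,000 run gives 5 / 132,341, or roughly 4 × 10⁻⁵, which appears to correspond to the lowest variant frequency quoted in the abstract.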
Fig. 1 IDMseq for detection of subclonal variants. a Schematic representation of IDMseq. Individual DNA molecules are labeled with unique UMIs and amplified for sequencing on appropriate platforms (e.g., Illumina, PacBio, and Nanopore). During data analysis, reads are binned by UMIs to correct errors introduced during amplification and sequencing. Both SNV and SV calling are included in the analysis pipeline. b Examples of Integrative Genomics Viewer (IGV) tracks of UMI groups in which the spike-in SNV in the 1:10,000 population was identified by IDMseq and VAULT. The knock-in SNV is indicated by the red triangle in the diagram of the EPOR gene on top, and is also shown as a red “T” base in the alignment map. The gray bars show read coverage. The ten colored bars on the left side of the coverage plot represent the UMI sequence for the UMI group. Individual Nanopore (top) and Illumina (bottom) reads within the group are shown under the coverage plot. c Large SVs detected by IDMseq in the 1:1000 population on the PacBio platform. Three UMI groups are shown with the same 2375-bp deletion. Group 1 represents one haplotype, and groups 2 and 3 represent a different haplotype. Colored lines represent the SNPs detected in each group. Thick blue boxes: exons; thin blue boxes: UTRs. Thin vertical red lines in the gene diagram represent PCR primer locations. d Distribution of SNVs detected by PacBio sequencing in conjunction with IDMseq and VAULT. One of the SNVs was also found in the Nanopore dataset. The spike-in SNV (1:1000) is indicated by the red triangle. The table on the right summarizes the frequency of SNV-associated records in different annotation categories. The numbers in the table represent annotation records from all transcript isoforms, so the same SNV may be recorded more than once. e Frequency distribution of the variant allele fraction (VAF) of SNVs detected by IDMseq in PacBio sequencing of the EPOR locus. f The spectrum of base changes among somatic SNVs. The majority of base changes are G to A and C to T. g Comparison between observed VAF and expected VAF in different experiments and sequencing platforms.
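
The UMI binning step described in panel a can be illustrated with a toy example. The sketch below is not the VAULT pipeline: it assumes exact UMI matches, reads that are already trimmed to the same length and orientation, and a simple per-position majority vote, whereas a real pipeline would align reads and use a proper variant caller. The function names and the `min_reads` threshold are hypothetical.

```python
from collections import Counter, defaultdict

def group_by_umi(reads):
    """Bin (umi, sequence) pairs into a dict mapping each UMI to its sequences."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return dict(groups)

def majority_consensus(seqs):
    """Per-position majority vote over equal-length sequences (toy assumption)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

def consensus_per_umi(reads, min_reads=3):
    """Build one consensus per UMI group, skipping groups with too few reads."""
    return {
        umi: majority_consensus(seqs)
        for umi, seqs in group_by_umi(reads).items()
        if len(seqs) >= min_reads  # hypothetical cutoff, not taken from the paper
    }

# Toy reads: the third read of group AAGT has a sequencing error at position 2.
reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACTTAC"),
    ("CCTA", "ACGTTC"),
    ("CCTA", "ACGTTC"),
    ("CCTA", "ACGTTC"),
    ("GGGA", "ACGTAC"),  # single-read group, dropped by min_reads
]

consensus = consensus_per_umi(reads)
print(consensus)
```

In this toy input the isolated error in group `AAGT` is voted out, while the base that differs consistently across all reads of group `CCTA` survives into its consensus. That is the intuition behind separating amplification and sequencing errors from variants present in the original molecule.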
Fig. 2 Quantitative analysis of DNA repair outcome of Cas9-induced DNA double-strand break in hESCs. a Schematic representation of the experimental design. Cas9 RNPs designed to cleave the first exon of PANX1 were electroporated into H1 hESCs. IDMseq was used to analyze the locus in edited hESCs 48 h later. b Large SVs detected by IDMseq and VAULT in edited hESCs. Five SV groups are shown, with deletion lengths ranging from 419 to 5494 bp. The red dotted line represents the Cas9 cutting site. The coverage of Nanopore reads is shown on top of each track in gray. The colored lines on the left side of the coverage plot represent the UMI for the group. Individual Nanopore reads within the group are shown under the coverage plot. c The frequency of deletions or insertions of different sizes detected in Pan1-edited hESCs. Certain deletions and insertions occur at disproportionately high frequencies. For example, a 5494-bp deletion was found in 56 UMI groups, which indicates a possible hotspot of Cas9-induced large deletion. d Distribution of SNVs detected by IDMseq and VAULT in Pan1-edited hESCs. Somatic SNVs are shown in green, while the cell-line-specific SNVs are shown in red (bin size of 40 bp in the figure). Somatic SNVs cannot be detected if variant calling is done en masse without UMI analysis (see the coverage track). Cell-line-specific SNVs are detected in ensemble analysis (see colored lines in the coverage track), and most of them have been reported as common SNPs in the dbSNP-141 database (common SNP track). The Cas9 cut site is indicated by the red triangle. e The number of presumed somatic SNVs per Mb (y-axis) in PANX1 WT and Cas9-edited cells. f Analysis of somatic mutations detected in Pan1-edited hESCs based on functional annotation and base change. The majority of base changes are G to A and C to T.