Alexei Slesarev1, Lakshmi Viswanathan2, Yitao Tang2, Trissa Borgschulte3, Katherine Achtien3, David Razafsky3, David Onions2, Audrey Chang2, Colette Cote2.
Abstract
The robust detection of structural variants in mammalian genomes remains a challenge. It is particularly difficult in the case of genetically unstable Chinese hamster ovary (CHO) cell lines, for which only draft genome assemblies are available. We explore the potential of the CRISPR/Cas9 system for the targeted capture of genomic loci containing integrated vectors in CHO-K1-based cell lines followed by next-generation sequencing (NGS), and compare it to popular target-enrichment sequencing methods and to whole genome sequencing (WGS). Three different CRISPR/Cas9-based techniques were evaluated; all of them allow for amplification-free enrichment of target genomic regions in the range of 5- to 60-fold, and for recovery of ~15 kb-long sequences with no sequencing artifacts introduced. The utility of these protocols was demonstrated by the identification of transgene integration sites and flanking sequences in three CHO cell lines. The long enriched fragments helped to identify Escherichia coli genome sequences co-integrated with vectors, which were further characterized by WGS. Other advantages of CRISPR/Cas9-based methods are the ease of bioinformatics analysis, the potential for multiplexing, and the production of long target templates for real-time sequencing.
Year: 2019 PMID: 30837529 PMCID: PMC6401131 DOI: 10.1038/s41598-019-39667-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic representation of main steps in the RGEN-R and RGEN-TdT protocols.
Figure 2. Schematic representation of main steps in the RGEN-D protocol.
Relative quantification (RQ) of target locus after enrichment using comparative ΔΔCT method.
| Sample | AD49ZG | AD49ZH | ||||||
|---|---|---|---|---|---|---|---|---|
| Protocol | RGEN-TdT | RGEN-TdT | RGEN-D (A) | RGEN-D (A) | RGEN-R | RGEN-R | RGEN-TdT | RGEN-TdT |
| DNA sample type | Calibrator | Enriched | Calibrator | Enriched | Calibrator | Enriched | Calibrator | Enriched |
| Endogenous control Spag5 average CT | 37.2 ± 0.15 | ND (d) | 24.5 ± 0.15 | 29.8 ± 0.16 | 25.21 ± 0.15 | 35.85 ± 0.19 | 30.1 ± 0.12 | ND |
| Target IGHM average CT | 36.1 ± 0.17 | 37.5 ± 0.17 | 25.0 ± 0.10 | 25.2 ± 0.09 | 17.55 ± 0.08 | 25.05 ± 0.10 | 30.0 ± 0.10 | 37.2 ± 0.15 |
| ∆CT, Spag5 − IGHM (a) | 1.1 ± 0.22 | ND | −0.5 ± 0.18 | 4.6 ± 0.18 | 7.66 ± 0.17 | 10.80 ± 0.21 | 0.1 ± 0.16 | ND |
| −∆∆CT = ∆CT − ∆CT,calibrator (b) | 0.0 ± 0.22 | ND | 0.00 ± 0.18 | 4.1 ± 0.18 | 0.00 ± 0.17 | 3.14 ± 0.21 | 0.0 ± 0.16 | ND |
| RQ IGHM relative to control (c) | 1 (0.6–1.2) | 56.8* | 1 (0.6–1.1) | 17.1 (15.1–19.4) | 1 (0.6–1.1) | 8.8 (7.6–10.2) | 1 (0.6–1.1) | 29.2* |
Cgr Spag5 and Hsa IGHM are TaqMan assays for the Cricetulus griseus Spag5 gene, located outside the vector integration locus (endogenous control), and for the human immunoglobulin transgene in the integrated vectors (target), respectively. (a) The average ∆CT was determined by subtracting the average IGHM CT value from the average Spag5 CT value. (b) −∆∆CT is calculated by subtracting the calibrator ∆CT value from the target ∆CT value. (c) The RQ for enriched IGHM is calculated using the equation RQ = 2^(−∆∆CT). (d) ND, not determined. *Determined by NGS.
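The comparative ΔΔCT arithmetic behind the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the standard comparative-CT relation RQ = 2^(−ΔΔCT) is assumed, and the range is taken as 2 raised to (−ΔΔCT ± SD), which is an assumption about how the bracketed intervals were derived. The input values are the RGEN-R columns of the table.

```python
def neg_ddct(ct_control_cal, ct_target_cal, ct_control_enr, ct_target_enr):
    """Return -ddCT using the table's sign convention (dCT = control CT - target CT)."""
    dct_calibrator = ct_control_cal - ct_target_cal   # Spag5 - IGHM, calibrator DNA
    dct_enriched = ct_control_enr - ct_target_enr     # Spag5 - IGHM, enriched DNA
    return dct_enriched - dct_calibrator


def relative_quantity(exponent, sd=0.0):
    """Return (RQ, low, high), with RQ = 2**exponent and bounds 2**(exponent -/+ sd)."""
    return 2.0 ** exponent, 2.0 ** (exponent - sd), 2.0 ** (exponent + sd)


# RGEN-R columns: Spag5 and IGHM average CT values for calibrator and enriched DNA.
exponent = neg_ddct(25.21, 17.55, 35.85, 25.05)
rq, rq_low, rq_high = relative_quantity(exponent, sd=0.21)
print(f"-ddCT = {exponent:.2f}, RQ = {rq:.1f} ({rq_low:.1f}-{rq_high:.1f})")
```

Because ΔCT is defined here as control minus target, a larger ΔCT in the enriched sample means more target relative to the endogenous control, so the exponent is positive when enrichment succeeded.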
Figure 5. Map of the integration vector pCLD1 with breakpoints in the AD49ZG cell line. (A) Different vector features are shown color-coded. Sizes of guide RNAs (gRNAs) are not to scale. All gRNAs used in the RGEN-D protocol are shown; only gRNA1 was used for the RGEN-TdT protocol. (B) Identified fusion breakpoints are shown. Blue boxes indicate direct repeats. Fusion breakpoints with E. coli sequences are noted (see text).
Summary of the NGS analysis of targeted capture experiments.
| Clone | Target-enrichment method | Library Insert Size (bp) | Total Reads (M) | % Mapped | Mean Coverage | Vector Coverage | Vector Copy No | Fold Enrichment |
|---|---|---|---|---|---|---|---|---|
| AE54SL | RGEN-D (B) | 630 | 22.6 | 98.4 | 2 | 20.1 | ND | 5 |
| AE54SL | xGen | 372 | 32.3 | 99.8 | 3.2 | 14,078* (354,049) | ND | 2,301 |
| AE54SL | WGS | 615/3,985 | 344.7 | 92 | 27 | 46 | 4 | 1 |
| AD49ZG | RGEN-TdT | 641 | 17.3 | 95.2 | 0.5 | 56.8 | ND | 56.8 |
| AD49ZG | RGEN-D (A) | 655 | 18.2 | 94.9 | 0.26 | 16.9 | ND | 16.3 |
| AD49ZG | xGen | 378 | 42.2 | 99.4 | 5 | 8,110* (540,685) | ND | 821 |
| AD49ZG | TLA | 193 | 1.24 | 99.8 | 0.05 | 905* (8,625) | ND | 9,106 |
| AD49ZG | WGS (Illumina) | 450 | 754.9 | 97.9 | 68.3 | 133.1 | 4 | 1 |
| AD49ZG | WGS (PacBio) | 13,200 (N50) | 6.9 | 84.9 | 14.5 | 26.4 | 4 | 1 |
| AD49ZH | RGEN-TdT | 400 | 10 | 94.2 | 0.28 | 82 | ND | 29.2 |
| AD49ZH | xGen | 403 | 44.2 | 99.3 | 5.3 | 98,006* (970,360) | ND | 1,849 |
| AD49ZH | WGS (Illumina) | 454 | 777.4 | 98 | 70.1 | 728.9 | 20 | 1 |
*Vector coverage after removal of PCR duplicates. PCR duplication rates were calculated according to Bansal[40]. The integrated vector copy numbers were calculated using the WGS Illumina® and PacBio data, as ratios of the vector coverage to the mean chromosome coverage. The fold enrichments were normalized by the corresponding vector copy numbers. Abbreviations used: ND, not determined; WGS, whole genome sequencing; TLA, targeted locus amplification[6]; xGen, hybridization capture of DNA libraries for NGS target enrichment[1]; RGEN-D, RGEN-TdT, methods developed in this paper (Figs 1 and 2).
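The ratio calculations described in the footnote can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, and dividing the coverage ratio by the copy number is only one plausible reading of "normalized by the corresponding vector copy numbers".

```python
def coverage_ratio(vector_coverage, mean_coverage):
    """Ratio of vector coverage to mean chromosome coverage."""
    if mean_coverage <= 0:
        raise ValueError("mean coverage must be positive")
    return vector_coverage / mean_coverage


def normalized_fold_enrichment(vector_coverage, mean_coverage, copy_number):
    """Coverage ratio divided by vector copy number (assumed normalization)."""
    if copy_number <= 0:
        raise ValueError("copy number must be positive")
    return coverage_ratio(vector_coverage, mean_coverage) / copy_number


# AD49ZH, WGS (Illumina): vector coverage 728.9, mean coverage 70.1.
wgs_ratio = coverage_ratio(728.9, 70.1)
# AD49ZG, RGEN-D (A): vector coverage 16.9, mean coverage 0.26, copy number 4.
rgen_d_fold = normalized_fold_enrichment(16.9, 0.26, 4)
print(f"WGS ratio = {wgs_ratio:.2f}, RGEN-D (A) fold enrichment = {rgen_d_fold:.2f}")
```

With these inputs the RGEN-D (A) value comes out near the tabulated 16.3, but several other rows differ from this simple formula by roughly a factor of two, so the exact normalization (for example, any ploidy scaling) cannot be recovered from the footnote alone.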
Figure 3. Map of the integration vector in the AE54SL cell line. Different vector features are shown color-coded. Sizes of guide RNAs used in the RGEN-D protocol are not to scale. Two identified fusion breakpoints are shown: the chr10|vector junction and the head-to-head vector|vector junction.
Figure 4. Reconstruction of the integration site rearrangements in chromosome 10 of the AE54SL cell line. (A) Copy number and structural variations for chromosome 10 visualized by SplitThreader[51]. Copy number segmentation is based on coverage averages in 10 kbp bins. (B) A parsimonious model that partially explains the copy number and rearrangements found in the AE54SL chromosome 10. Assuming that at some point there was chromosome 10 aneuploidy (see Supplementary Fig. 4), the observed depth of coverage (DOC) indicates the presence of one intact chromosome, while two or even four more copies formed isochromosomes. Such event(s) would create a partial monosomy of the 6 Mb region at the 5′ end of chromosome 10 and a trisomy of the remaining ~8.7 Mb portion. At least one putative isochromosome has the 354 bp unique loop (colored blue) and a ~300 kb deletion (mR) in one arm.
Figure 6. (A) Structure of the integration site in the AD49ZG genome. The integrated sequence was assembled using Illumina® and PacBio WGS datasets (see Supplemental Information for details). (B) Coverage of transcription reads aligned to the assembled AD49ZG insert with flanking chromosome 7 sequences (visualized by VING[56] and Gviz[57]). The insert sequence was annotated by Rapid Annotations using Subsystems Technology (RAST)[58] (Supplementary Table 3 and Supplementary Fig. 11).