Joanna C Dawes1,2, Philip Webster1,2,3, Barbara Iadarola1,2, Claudia Garcia-Diaz4, Marian Dore1,2, Bruce J Bolt1,2, Hamlata Dewchand1,2, Gopuraja Dharmalingam1,2, Alex P McLatchie5, Jakub Kaczor1,2, Juan J Caceres6, Alberto Paccanaro6, Laurence Game1,2, Simona Parrinello4, Anthony G Uren1,2.
Abstract
BACKGROUND: Ligation-mediated PCR protocols have diverse uses, including the identification of integration sites of insertional mutagens, integrating vectors and naturally occurring mobile genetic elements. For approaches that employ NGS, the relative abundance of integrations within a complex mixture is typically determined through the use of read counts or unique fragment lengths from a ligation of sheared DNA; however, these estimates may be skewed by PCR amplification biases and saturation of sequencing coverage.
Keywords: Genetics; Genomics; Integration site cloning; Mutagenesis; Retroviruses
Year: 2020 PMID: 32042315 PMCID: PMC7001329 DOI: 10.1186/s13100-020-0201-4
Source DB: PubMed Journal: Mob DNA
Fig. 1 Comparison of LUMI-PCR with regular Illumina dual index library prep and with regular splinkerette PCR library prep. a) The steps of a traditional ligation-mediated PCR strategy using adapters with non-complementary segments and two rounds of nested PCR (e.g. splinkerette). The adapter strands are partially non-complementary and the lower strand (dark green) has no complementary primer. The adapter primer (blue) cannot bind to a template until the first strand has been synthesised from the virus primer (red). Subsequent steps will amplify virus-flanked genomic regions but not other regions. b) Standard Illumina library preparation protocols for single index libraries. Using ligation of adapters, an index (black) is included in the adapter for each library, with one copy per fragment being present in the final product. Both strands are amplified yielding different termini at each end for flow cell binding (blue & purple). c) Illumina Nextera library prep using tagmentation. Adapters are added via Tn5 transposase. Both strands are amplified simultaneously using primer pairs that add an index at each end. d) LUMI-PCR is a hybrid protocol for ligation-mediated PCR that uses one index in the adapter and one in the secondary PCR step. A unique molecular identifier (UMI, orange) is included adjacent to the adapter index (black) for quantitation of library fragments. The placement of the index is switched from the strand normally used in Illumina adapters such that it will be retained after the first strand synthesis from the virus primer. The flow cell binding sequence normally present in the Illumina adapter (purple) is included in the LTR primer of the secondary PCR amplification. e) A modified dual index Nextera sequencing protocol is used with custom primers and modified numbers of bases read from each index depending on the length of the custom index and the UMI (our protocol uses 10 bp indexes and an 8–10 bp UMI). The custom virus primer can be nested back from the virus genome junction to allow the junction to be sequenced
Fig. 2 Quantitation of integration abundance and number is a function of sequencing coverage. a) The total number of sheared fragment length counts (blue) is substantially lower than the number of UMI counts (red) in each of four replicate libraries. b) A single library (#1179) was reanalysed using subsets of read pairs (1000, 3000, 10,000, 100,000 and 300,000 read pairs). Quantitation of the ten most clonal integrations for each of these subsets is shown using unique sheared fragment lengths identified per integration (blue) and UMI counts per integration (red). These values are similar when sampling lower numbers of reads, but as the sample size increases the sheared fragment length counts become saturated. c & d) The clonality and normalized clonality of the ten most clonal integrations are calculated for all read subsets using fragment length counts (c) and UMI counts (d). For the lowest samplings (1000 & 3000 read pairs) the clonality and normalized clonality based on fragments (Fig. 2c) and UMIs (Fig. 2d) are very similar, whereas a larger number of reads leads to underestimation of fragment length clonality for the most abundant inserts and, conversely, an overestimation of fragment normalized clonality for less abundant inserts
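
The saturation described in panels b to d can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' analysis pipeline: the function name `simulate_counts`, the 400 possible shear lengths and the uniform random sampling are all hypothetical choices made for illustration. Only the 8-base UMI length is taken from the range quoted in Fig. 1e.

```python
import random

def simulate_counts(n_fragments, n_lengths=400, umi_len=8, seed=0):
    """Return (unique shear lengths, unique UMIs) observed for one integration.

    Assumptions (hypothetical): every ligated fragment draws its shear length
    uniformly from n_lengths possible values and its UMI uniformly from the
    4**umi_len possible sequences.
    """
    rng = random.Random(seed)
    lengths_seen = set()
    umis_seen = set()
    for _ in range(n_fragments):
        lengths_seen.add(rng.randrange(n_lengths))   # shear length bin
        umis_seen.add(rng.randrange(4 ** umi_len))   # UMI encoded as an integer
    return len(lengths_seen), len(umis_seen)

# Compare both estimators as the number of sampled fragments grows.
for n in (100, 1_000, 10_000, 100_000):
    print(n, simulate_counts(n))
```

In this toy model the fragment length count can never exceed the number of possible shear lengths, so it flattens out for abundant integrations, while the UMI count keeps growing over a much wider range. UMI counts also saturate eventually, once the number of fragments approaches the number of possible UMI sequences.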
Fig. 3 The most-clonal integrations are found reproducibly in all replicate libraries. a) A four-way Venn diagram illustrates the number of integrations that are found reproducibly in 1, 2, 3 and 4 replicate libraries. The majority of single fragment/subclonal integrations are only found in one library whereas the most-clonal integrations are found in all four libraries. The clonality values (b) and normalized clonality values (c) of all integrations were compared for integrations that were found in 1, 2, 3 and 4 replicate libraries. The set of mutations present in only one of the four libraries had substantially lower median clonality/normalized clonality values than those inserts found in more than one library. Although the vast majority of subclonal mutations were found in only one library, a fraction are also found in more than one library. All integrations with clonality > 0.01 and normalized clonality > 0.1 were found present in all four libraries
Fig. 4 Quantitation of the 10 most-clonal integrations is highly reproducible between libraries. a) Spearman correlation coefficients were calculated for pairwise comparisons between all 4 replicates using normalized clonality (NC) values for the 10 most-clonal integrations. Rho values range between 0.9601 and 0.9934. b) Normalized clonality profiles of the top 50 most-clonal integrations from each sample are highly similar, with a narrow range of entropy values between 2.535 and 2.785
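
The two summary statistics named in this caption can be sketched in plain Python. The caption does not state how entropy was computed, so the version below is an assumption: Shannon entropy with the natural logarithm, applied to values rescaled to sum to 1. The function names are hypothetical, and the Spearman coefficient is computed as the Pearson correlation of average ranks.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))

def shannon_entropy(values):
    """Shannon entropy (natural log, an assumption) of values rescaled to sum to 1."""
    total = sum(values)
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)
```

For a pairwise comparison, `spearman(nc_replicate_a, nc_replicate_b)` would take the normalized clonality values of the same integrations in two replicates, listed in the same order.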
Fig. 5 Quantitation of MuLV integrations over a range of concentrations. a) Triplicate libraries were analysed from two MuLV-infected spleen DNA samples, identifying nine clonal integrations in sample #5036 and two clonal integrations in sample #5238. Integration 9 from sample #5036 and integration 1 from sample #5238 both map to the same base pair in the 3′ UTR of Mycn (chr12:12936986), which is a highly selected hotspot for integrations in MuLV-infected lymphoma samples. Triplicate libraries of uninfected DNA did not contain any mappable reads. b) These two DNAs were mixed with each other at ratios of 1:49, 1:4, 1:1, 4:1 and 49:1 and triplicate libraries were constructed. The clonality of each of the integrations is plotted against the percentage of its source DNA present in each mixture. Plots 1–8 are inserts 1–8 from sample #5036. Plot 9 is insert 2 from sample #5238. Plot 10 simultaneously represents insert 9 from sample #5036 and insert 1 from sample #5238
Fig. 6 Quantitation of piggyBac integrations over a range of concentrations. a) Triplicate libraries were analysed from three cell lines derived from mouse neuronal precursors transfected with piggyBac and cloned by single cell sorting. These DNAs have 1, 5 and 9 integrations each. Triplicate libraries of uninfected DNA did not contain any mappable reads. b) These three DNAs were mixed with each other at ratios of 1:2:4, 4:2:1, 1:5:25 and 25:5:1 and triplicate libraries were constructed. The clonality of each of the integrations is plotted against the percentage of its source DNA present in each mixture. Plots 1–9 are inserts from cell line AltH2B_1 C1 (G2). Plot 10 is the insert from cell line Orig C1 (G8). Plots 11–15 are the inserts of the sample AltH2B_2 C1 (G14)
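
The x-axis values in Figs. 5b and 6b are the percentage of each source DNA in a mixture, which follows directly from the mixing ratio. A minimal sketch of that arithmetic is given below; the helper name `mixture_percentages` is hypothetical and the ratios are the ones quoted in the captions.

```python
from fractions import Fraction

def mixture_percentages(ratio):
    """Convert a mixing ratio such as (1, 49) into percentages of each source DNA."""
    total = sum(ratio)
    # Fractions keep the division exact before converting to float.
    return [float(Fraction(part * 100, total)) for part in ratio]

# Two-way mixtures (Fig. 5b) and three-way mixtures (Fig. 6b).
for ratio in [(1, 49), (1, 4), (1, 1), (4, 1), (49, 1),
              (1, 2, 4), (4, 2, 1), (1, 5, 25), (25, 5, 1)]:
    print(ratio, [round(p, 1) for p in mixture_percentages(ratio)])
```

For the two-way mixtures this gives 2%, 20%, 50%, 80% and 98% for the first DNA, so each integration is measured across a roughly 50-fold range of input.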
| | μl per sample | μl for 96 well master mix (× 110) |
|---|---|---|
| DNA | 52.5 | * |
| 10x reaction buffer | 7.7 | 847 |
| End Repair Enzyme Mix | 4 | 440 |
| H2O | 12.8 | 1408 |
| Total | 77 | 2695 |
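
The master mix column in these reagent tables is the per-sample volume multiplied by 110, i.e. 96 wells plus 14 reactions of excess. The sketch below reproduces that scaling for the end repair table above; the function name `master_mix` is hypothetical, and components marked `*` (added per well, such as the DNA) are simply left out of the input.

```python
def master_mix(per_sample_ul, factor=110):
    """Scale per-sample volumes (in μl) to a plate master mix.

    factor=110 matches the tables: 96 wells plus 14 reactions of excess.
    """
    return {name: round(volume * factor, 2) for name, volume in per_sample_ul.items()}

# Per-sample volumes of the shared components of the end repair reaction.
end_repair = {
    "10x reaction buffer": 7.7,
    "End Repair Enzyme Mix": 4,
    "H2O": 12.8,
}

mix = master_mix(end_repair)
for name, volume in mix.items():
    print(f"{name}: {volume} μl")
print("Total:", sum(mix.values()), "μl")
```

The same helper can be used to check the other reagent tables or to rescale a mix for a different number of samples by changing `factor`.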
| | μl per sample | μl for 96 well master mix (× 110) |
|---|---|---|
| End-repaired, blunt DNA | 42.0 | * |
| NEBNext dA-Tailing Reaction Buffer | 5.0 | 550 |
| Klenow Fragment (3′→5′ exo−) | 3.0 | 330 |
| Total | 50.0 | 880 |
| | μl per sample | μl for 96 well master mix (× 110) |
|---|---|---|
| Upper Strand Adaptor, 10 pmoles/μl (40 pmoles) | 8 | – |
| Universal Lower Adaptor, 10 pmoles/μl (40 pmoles) | 8 | 880 |
| NEB buffer 2.1 | 4 | 440 |
| H2O | 20 | 2200 |
| Total | 40 | 3520 |
| | μl per sample | μl for 96 well master mix (× 110) |
|---|---|---|
| DNA (~3.8 pmoles) | 36 | * |
| Buffer | 5 | 550 |
| T4 Ligase (400,000 units/ml) | 2 | 220 |
| Unique adaptor (~40 pmoles) | 8 | * |
| Total | 51 | 770 |
| | μl per sample | μl for 96 well master mix (× 110) |
|---|---|---|
| DNA | 51 | * |
| CutSmart Buffer | 6 | 660 |
| EcoRV-HF | 1 | 110 |
| H2O | 2 | 220 |
| Total | 60 | 990 |
| | μl per sample | μl for 96 well master mix (× 110) |
|---|---|---|
| DNA | 28.5 | * |
| HF Buffer (5x) | 10 | 1100 |
| 10 mM dNTPs | 1 | 110 |
| LTR primary PCR primer (10 μM) | 2.5 | 275 |
| Adapter PCR primer (10 μM) | 2.5 | 275 |
| Phusion Hot Start II (F549S) | 0.5 | 55 |
| SYBR® Green I (0.1x) | 5 | 550 |
| Total | 50 | 2365 |
| Cycle# | Denaturation | Annealing | Extension |
|---|---|---|---|
| 1 | 98°C for 30 sec | - | - |
| 2-17 | 98°C for 10 sec | 66°C for 30 sec | 72°C for 30 sec |
| 18 | - | - | 72°C for 5 min |
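
The cycling conditions above can also be written as a small data structure, which makes the number of amplification cycles explicit (cycles 2 to 17 are 16 cycles) and allows a quick estimate of the programmed hold time. This is an illustrative sketch only: the layout of `program` and the function name are hypothetical, and ramping time between temperatures is ignored.

```python
# Each stage: number of repeats and a list of (temperature in °C, seconds) steps.
program = [
    {"cycles": 1, "steps": [(98, 30)]},                       # initial denaturation
    {"cycles": 16, "steps": [(98, 10), (66, 30), (72, 30)]},  # cycles 2-17
    {"cycles": 1, "steps": [(72, 300)]},                      # final extension
]

def total_hold_seconds(program):
    """Sum of all programmed hold times, excluding temperature ramping."""
    return sum(
        stage["cycles"] * sum(seconds for _, seconds in stage["steps"])
        for stage in program
    )

seconds = total_hold_seconds(program)
print(f"{seconds} s (about {seconds / 60:.1f} min) of programmed holds")
```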
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | LTR 2° #1 | LTR 2° #2 | LTR 2° #3 | LTR 2° #4 | LTR 2° #5 | LTR 2° #6 | | | | | | |
| B | LTR 2° #1 | LTR 2° #2 | LTR 2° #3 | LTR 2° #4 | LTR 2° #5 | LTR 2° #6 | | | | | | |
| | μl per sample | μl for 96 well master mix (× 110) |
|---|---|---|
| DNA (50 ng) | variable | * |
| H2O | variable | * |
| HF Buffer (5x) | 10 | 1100 |
| 10 mM dNTPs | 1 | 110 |
| Adapter primer (10 μM) | 2.5 | 275 |
| LTR secondary indexed primer | 2.5 | * |
| Phusion Hot Start II | 0.5 | 55 |
| SYBR® Green I (0.1x) | 5 | 550 |
| Total | 50 | 2090 |
| Cycle# | Denaturation | Annealing | Extension |
|---|---|---|---|
| 1 | 98°C for 30 sec | - | - |
| 2-17 | 98°C for 10 sec | 66°C for 30 sec | 72°C for 30 sec |
| 18 | - | - | 72°C for 5 min |