Daryl M Gohl, Alessandro Magli, John Garbe, Aaron Becker, Darrell M Johnson, Shea Anderson, Benjamin Auch, Bradley Billstein, Elyse Froehling, Shana L McDevitt, Kenneth B Beckman.
Abstract
Quantification of DNA sequence tags from engineered constructs such as plasmids, transposons, or other transgenes underlies many functional genomics measurements. Typically, such measurements rely on PCR followed by next-generation sequencing. However, PCR amplification can introduce significant quantitative error. We describe REcount, a novel PCR-free direct counting method. Comparing measurements of defined plasmid pools to droplet digital PCR data demonstrates that REcount is highly accurate and reproducible. We use REcount to provide new insights into clustering biases due to molecule length across different Illumina sequencers and illustrate the impacts on interpretation of next-generation sequencing data and the economics of data generation.
Keywords: ATAC-Seq; DNA library preparation; Genotyping by sequencing; Illumina; Next-generation sequencing; PCR-free; RAD-Seq; RNA-Seq; Size bias
Year: 2019; PMID: 31036053; PMCID: PMC6489363; DOI: 10.1186/s13059-019-1691-6
Source DB: PubMed; Journal: Genome Biol; ISSN: 1474-7596; Impact factor: 13.583
Fig. 1 REcount enables accurate and precise measurements of plasmid pools. a Design of REcount constructs. A barcode-containing, Illumina adapter-flanked construct is liberated with a restriction enzyme (MlyI) digest and directly sequenced. b Accuracy and reproducibility of REcount. c Analogous measurements of the same plasmid pool shown in panel b using varying PCR cycle numbers. d Root mean squared deviation from expected values (5% per construct) when the plasmid pool is measured using REcount, and varying cycles of PCR amplification of either the barcode construct (BC) or another variable sequence in these plasmids (V4). e Pearson correlation heatmap comparing REcount measurements with droplet digital PCR data and with conventional PCR amplification of either the BC or V4 amplicons.
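The deviation metric named in panel d can be illustrated with a minimal Python sketch. It assumes an evenly mixed pool, so that the expected share of each construct is 1/N (5% per construct corresponds to 20 constructs). The function name `rmsd_from_expected` and the read counts are hypothetical and are not taken from the paper.

```python
import math


def rmsd_from_expected(counts, expected_fraction=None):
    """Root mean squared deviation, in percentage points, of the observed
    per-construct read fractions from an expected fraction.

    Assumption: if no expected fraction is given, the pool is treated as
    evenly mixed, so each of N constructs is expected at 1/N.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads counted")
    n = len(counts)
    if expected_fraction is None:
        expected_fraction = 1.0 / n
    squared_errors = [(c / total - expected_fraction) ** 2 for c in counts]
    return 100.0 * math.sqrt(sum(squared_errors) / n)


# Hypothetical barcode read counts for a 20-construct pool (made-up numbers).
even_pool = [1000] * 20
skewed_pool = [1500] * 10 + [500] * 10

print(rmsd_from_expected(even_pool))
print(rmsd_from_expected(skewed_pool))
```

In this toy example the even pool shows no deviation, while the skewed pool (half the constructs at 7.5%, half at 2.5%) deviates by about 2.5 percentage points.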
Fig. 2 Multiplexing of REcount measurements using orthogonal restriction enzymes. a Plasmids containing REcount constructs flanked by orthogonal restriction enzyme cut sites. b–f Total mapped reads identified for each construct type when the plasmid pool is digested with the indicated enzyme. g–k Mapped reads identified for each construct when the plasmid pool is digested with the indicated enzyme.
Fig. 3 Illumina size standards allow measurement of sequencer-specific size biases. a Design of REcount-based Illumina size standard constructs. Each standard construct contains a normalization barcode, as well as a barcode associated with a variable size standard that can be liberated by MlyI digestion and directly sequenced. b Raw abundance data for all 30 size standards and normalization barcodes from a MiSeq run. c Run-to-run variability of multiple MiSeq runs (n = 6 flow cells). d Size bias profiles of the iSeq (n = 1 flow cell), MiSeq (n = 6 flow cells), NextSeq (n = 4 flow cells), and NovaSeq (n = 4 flow cells, 4 lanes) sequencers. Note: Size bias data for other Illumina instruments are shown in Additional file 1: Figure S5. e Size bias profiles of the same library either clustered on the MiSeq immediately after denaturation or clustered after freezing and thawing the denatured library. Error bars are ± s.e.m.
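One way to turn paired size-standard and normalization-barcode counts into a size bias profile is sketched below in Python. The caption does not give the exact normalization, so two steps here are assumptions: each size standard's reads are divided by the reads of its own normalization barcode, and the ratios are then rescaled so that the best-represented length equals 1.0. The function name and all counts are hypothetical.

```python
def size_bias_profile(size_counts, norm_counts):
    """Return {fragment_length_bp: relative abundance}, ordered by length.

    size_counts: reads for each variable-length size standard, keyed by length.
    norm_counts: reads for the normalization barcode on the same construct.
    Assumption: bias is the size/normalization ratio, scaled to a peak of 1.0.
    """
    ratios = {}
    for length, n_size in size_counts.items():
        n_norm = norm_counts[length]
        if n_norm == 0:
            raise ValueError(f"no normalization reads for the {length} bp standard")
        ratios[length] = n_size / n_norm
    peak = max(ratios.values())
    if peak == 0:
        raise ValueError("no size-standard reads counted")
    return {length: ratios[length] / peak for length in sorted(ratios)}


# Hypothetical counts for three standards (made-up numbers, not MiSeq data).
size_counts = {1200: 250, 150: 800, 500: 1000}
norm_counts = {1200: 1000, 150: 1000, 500: 1000}

for length, rel in size_bias_profile(size_counts, norm_counts).items():
    print(length, round(rel, 3))
```

Dividing by the normalization barcode is meant to cancel differences in how much of each construct was present in the pool, so that the remaining variation reflects fragment length.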
Fig. 4 Instrument-specific size biases have minimal effect on RNA-sequencing data. a Fragment size distributions for an RNA-Seq library sequenced on the NovaSeq and the NextSeq. b Correlation of expression values (FPKM) for this library across the two instruments.
Fig. 5 Instrument size biases affect genotyping marker observations in RAD-Seq data. a Average read counts for 11 RAD-Seq samples sequenced on the HiSeq or NextSeq. b Number of markers observed in filtered VCF file for the 11 RAD-Seq libraries. c Number of loci observed in filtered VCF file for the 11 RAD-Seq libraries. d Fraction of missing genotype calls for each sample in the unfiltered VCF file. e PCA plot generated using the unfiltered VCF file. f PCA plot using the filtered VCF file. HiSeq data points overlap with NextSeq data points in this plot.
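The per-sample missing-call fraction in panel d can be computed from VCF text along the following lines. This is a minimal Python sketch, not the authors' pipeline: it reads uncompressed VCF lines, locates the GT field through the FORMAT column, and counts a call as missing only when every allele is ".". The function name and the two-site toy VCF are hypothetical.

```python
def missing_fraction_per_sample(vcf_lines):
    """Return {sample_name: fraction of sites with a fully missing GT call}."""
    samples, missing, n_sites = [], [], 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed columns
            missing = [0] * len(samples)
            continue
        gt_index = fields[8].split(":").index("GT")
        n_sites += 1
        for i, call in enumerate(fields[9:]):
            parts = call.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            # Assumption: partially missing calls such as "./1" count as observed.
            if all(allele == "." for allele in alleles):
                missing[i] += 1
    if n_sites == 0:
        return {}
    return {s: m / n_sites for s, m in zip(samples, missing)}


# Toy two-site VCF with two hypothetical samples.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "s1", "s2"]
toy_vcf = [
    "##fileformat=VCFv4.2",
    "\t".join(header),
    "\t".join(["chr1", "100", ".", "A", "G", ".", "PASS", ".", "GT", "0/1", "./."]),
    "\t".join(["chr1", "200", ".", "C", "T", ".", "PASS", ".", "GT", "1/1", "0/0"]),
]
print(missing_fraction_per_sample(toy_vcf))
```

For real data, a dedicated VCF library would be a more robust choice than hand-parsing; the sketch only makes the definition of the metric concrete.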
Fig. 6 Effect of instrument size bias on ATAC-Seq data. a Average insert size for 6 ATAC-Seq libraries sequenced on the HiSeq or NextSeq. b Percentage of reads at a subsampled depth of 20 million reads per sample classified as non-, mono-, di-, and tri-nucleosomal. n = 6 libraries. *** denotes p < 0.01 using a t-test. n.s. denotes no significant difference. c Distribution of mapped reads at the Fgfr4 locus. IGV plots of mapped reads for each sample, subsampled to a depth of 20 million reads, and either directly mapped (“All reads”) or split into the non-nucleosomal (“Non-nucl.”) subset and mapped. MACS peak calls for PAX3-responsive sites for HiSeq (top) and NextSeq (bottom) are below each set of mapped reads.
| Stock name | Source | Notes |
| --- | --- | --- |
| Berlin-K | Bloomington Drosophila Stock Center | RRID:BDSC_8522 |
| Canton-S | Bloomington Drosophila Stock Center | RRID:BDSC_64349 |
| DGRP-21 | Bloomington Drosophila Stock Center | RRID:BDSC_28122 |
| DGRP-26 | Bloomington Drosophila Stock Center | RRID:BDSC_28123 |
| DGRP-48 | Bloomington Drosophila Stock Center | RRID:BDSC_55016 |
| DGRP-100 | Bloomington Drosophila Stock Center | RRID:BDSC_55017 |
| Genome Strain | Bloomington Drosophila Stock Center | RRID:BDSC_2057 |
| IsoD1 | Clandinin Lab, Stanford University | [ |
| Ore-R-C | Bloomington Drosophila Stock Center | RRID:BDSC_5 |
| Ore-R-modENCODE | Bloomington Drosophila Stock Center | RRID:BDSC_25211 |