Christopher J Troll, Joshua Kapp, Varsha Rao, Kelly M Harkins, Charles Cole, Colin Naughton, Jessica M Morgan, Beth Shapiro, Richard E Green.
Abstract
BACKGROUND: Cell-free DNA (cfDNA), present in circulating blood plasma, contains information about prenatal health, organ transplant reception, and cancer presence and progression. Originally developed for the genomic analysis of highly degraded ancient DNA, single-stranded DNA (ssDNA) library preparation methods are gaining popularity in the field of cfDNA analysis due to their efficiency and ability to convert short, fragmented DNA into sequencing libraries without altering DNA ends. However, current ssDNA methods are costly and time-consuming.
Keywords: Cell-free DNA; Next-generation sequencing; Nucleosome positioning; Oligos; SRSLY; Single-stranded library
Year: 2019 PMID: 31881841 PMCID: PMC6935139 DOI: 10.1186/s12864-019-6355-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Schematic overview of SRSLY. A DNA input pool of diverse template molecules is denatured with heat and maintained as single-stranded molecules through a cold-snap and the use of a thermostable single-stranded DNA binding protein (SSB). Template DNA is phosphorylated and SRSLY splint adapters are ligated in a combined phosphorylation/ligation reaction. Adapters contain a random single-stranded splint overhang and ligation-blocking modifications on all termini except those that facilitate correctly oriented library molecules. After cleanup, molecules are ready for index PCR.
Fig. 2 Standard NGS metrics for merged reads from SRSLY and NEBNext Ultra II libraries from healthy human cfDNA extracts H-69 and H-81. Unless otherwise stated, all libraries for each method were combined by cfDNA extract prior to analysis and filtered for PCR duplicates and a quality score equal to or greater than q20. (a) Insert distribution plots for cfDNA extracts H-69 and H-81, respectively. (b) Fold coverage by base percent across the human genome (hg19) for SRSLY and NEBNext by cfDNA extract. Combined libraries were subsampled to similar read depth prior to fold coverage calculations. Subsampled depth was set at 295 M reads, the limit of sequenced reads for SRSLY-H-81. (c) Preseq complexity estimate for SRSLY and NEBNext by cfDNA extract. Three libraries of equivalent sequencing depth per method were combined to estimate complexity, since more libraries were made via SRSLY than NEBNext. Files containing the PCR duplicate reads were used to facilitate complexity estimates. (d) Normalized coverage as a function of GC content over a 100 bp sliding window across the human genome for SRSLY and NEBNext by cfDNA extract. The green histogram represents the human genome GC content across the 100 bp sliding window. (e) Normalized, log-transformed base composition at each position of read termini, starting 2 bp upstream and extending to 34 bp downstream of the read start site, for combined cfDNA extracts for SRSLY and NEBNext. All reads were considered regardless of insert length.
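The GC-content analysis in panel (d) can be illustrated with a short sketch. This is not the authors' pipeline. It is a minimal example that assumes a plain sequence string, a window that advances one base at a time, and normalization by the mean coverage over all windows. The function names, the `step` parameter, and the binning by whole GC percent are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import accumulate


def gc_fraction_windows(seq, window=100, step=1):
    """GC fraction of every `window`-bp window, advancing by `step` bases."""
    seq = seq.upper()
    # Prefix sums of G/C counts so each window is computed in constant time.
    prefix = [0] + list(accumulate(1 if base in "GC" else 0 for base in seq))
    return [
        (prefix[start + window] - prefix[start]) / window
        for start in range(0, len(seq) - window + 1, step)
    ]


def normalized_coverage_by_gc(gc_fractions, coverages):
    """Mean coverage per GC-percent bin, divided by the overall mean coverage.

    Assumption: "normalized" means relative to the mean over all windows,
    and that this overall mean is non-zero.
    """
    overall_mean = sum(coverages) / len(coverages)
    bins = defaultdict(list)
    for gc, cov in zip(gc_fractions, coverages):
        bins[round(gc * 100)].append(cov)
    return {
        pct: (sum(vals) / len(vals)) / overall_mean
        for pct, vals in sorted(bins.items())
    }
```

A value near 1.0 in a given GC bin would indicate coverage close to the average; values above or below 1.0 would indicate over- or under-representation at that GC content.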
Fig. 3 Coverage of duplexed oligos containing single-stranded overhangs for SRSLY and NEBNext. (a) Cartoon schematic of duplexed synthetic oligos: one blunt end, an identifiable 50 nt complementary region, and an overhang of specific length and type. (b) Average coverage per base across the length of all duplexed oligos for three technical replicates, in 0-based coordinates, for both SRSLY and NEBNext methods. Technical replicates were not statistically different from each other (Student's t-test: SRSLY p = 0.714, NEBNext p = 0.985); error bars are not shown for aesthetics. Each oligo was sequenced to > 5000 reads.
Fig. 4 Single-stranded oligo analyses by the SRSLY method. Red and black lines and dots represent technical replicates. (a) Insert distribution of equimolar pooled single-stranded oligo libraries. Oligos from 20 to 120 nt, synthesized at 10 nt intervals, were purified by standard desalting. Raw unfiltered sequencing data. (b) Mapped sequencing data for technical replicates separated by oligo, represented as a function of oligo length. The black vertical bar and associated black and red numbers indicate the percent of full-length product per oligo length present in the library pool. Each library was sequenced to a depth of ~ 100,000 read pairs (10,000 read pairs per oligo, excluding 20 and 30 nt lengths). (c) Effects of various purification methods on oligo purity as a function of oligo length for a 60 nt synthesized oligo. Associated black and red numbers indicate the percent of full-length product per oligo. Data for the standard desalted 60 nt synthetic oligo are taken from (b).
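The "percent of full-length product" figure reported in panels (b) and (c) is simple arithmetic, sketched below. The caption does not state how the authors computed it, so this example assumes it is the share of reads mapped to a given oligo whose insert length equals the synthesized length. The function name and inputs are hypothetical.

```python
def percent_full_length(insert_lengths, expected_length):
    """Percent of observed inserts whose length equals the synthesized length.

    Assumption: `insert_lengths` holds the insert lengths of all reads that
    mapped to one oligo; truncated synthesis products appear as shorter inserts.
    """
    if not insert_lengths:
        return 0.0
    full = sum(1 for length in insert_lengths if length == expected_length)
    return 100.0 * full / len(insert_lengths)
```

For example, if two of four reads mapped to a 60 nt oligo have 60 nt inserts, the function returns 50.0.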
Fig. 5 cfDNA analysis. (a) Normalized genomic dinucleotide frequencies as a function of read length for SRSLY data for three discrete fragment lengths, including 100 bp on either side of the read mapped coordinates. The read midpoint is centered at 0. Negative numbers denote genomic regions upstream (5-prime) of the midpoint and positive numbers denote genomic regions downstream (3-prime) of the midpoint. Input data are from the combined H-69 and H-81 SRSLY datasets. (b) Same as (a) except for NEBNext data. (c) Normalized genomic dinucleotide frequency as a function of read length for SRSLY data for the termini of three discrete fragment lengths, including a 9 bp region into the read (positive numbers) and 10 bp outside the read (negative numbers). Read start and end coordinates are centered on 0. Input data are from the combined H-69 and H-81 SRSLY datasets. (d) Same as (c) except for NEBNext data. (e) Normalized WPS values (120 bp window; 120–180 bp fragments) for SRSLY data compared to sample CH01 [16] at the same pericentromeric locus on chromosome 12 used to initially showcase WPS. (f) Average normalized WPS score within ±1 kb of annotated CTCF binding sites for long fragment length binned data (120 bp window, 120–180 bp fragments) and short fragment length binned data (16 bp window, 35–80 bp fragments) for SRSLY data compared to sample CH01 [16].
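The WPS values in panels (e) and (f) are not defined in this caption. The sketch below assumes the usual windowed protection score definition associated with the cited work [16]: the number of fragments that fully span a window centered on a position, minus the number of fragments with an endpoint inside that window. The function name, the inclusive coordinate convention, and the treatment of fragments that touch the window edge are illustrative assumptions, and no normalization is applied.

```python
def windowed_protection_score(fragments, position, window=120,
                              min_len=120, max_len=180):
    """Raw (unnormalized) WPS at `position`.

    fragments: iterable of (start, end) pairs, inclusive coordinates (assumed).
    Only fragments with min_len <= length <= max_len are counted, mirroring
    the "120 bp window; 120-180 bp fragments" setting in the caption.
    """
    w_start = position - window // 2
    w_end = w_start + window - 1  # window covers exactly `window` positions
    score = 0
    for start, end in fragments:
        length = end - start + 1
        if not (min_len <= length <= max_len):
            continue
        if start <= w_start and end >= w_end:
            score += 1  # fragment spans the whole window
        elif w_start <= start <= w_end or w_start <= end <= w_end:
            score -= 1  # fragment has an endpoint inside the window
    return score
```

Under the same assumptions, the short-fragment setting in panel (f) would correspond to `window=16, min_len=35, max_len=80`.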