Filip Van Nieuwerburgh, Ryan C Thompson, Jessica Ledesma, Dieter Deforce, Terry Gaasterland, Phillip Ordoukhanian, Steven R Head.
Abstract
Standard Illumina mate-paired libraries are constructed from 3- to 5-kb DNA fragments by blunt-end circularization. Sequencing reads that pass through the junction of the two joined ends of a 3- to 5-kb DNA fragment are not easy to identify and pose problems during mapping and de novo assembly. Longer read lengths increase the possibility that a read will cross the junction. To solve this problem, we developed a mate-paired protocol for use with Illumina sequencing technology that uses Cre-Lox recombination instead of blunt-end circularization. In this method, a LoxP sequence is incorporated at the junction site. This sequence allows screening reads for junctions without using a reference genome. Junction reads can be trimmed or split at the junction. Moreover, the location of the LoxP sequence in the reads distinguishes mate-paired reads from spurious paired-end reads. We tested this new method by preparing and sequencing a mate-paired library with an insert size of 3 kb from Saccharomyces cerevisiae. We present an analysis of the library quality statistics and a new bioinformatics tool called DeLoxer that can be used to analyze an Illumina Cre-Lox mate-paired data set. We also demonstrate how the resulting data significantly improve a de novo assembly of the S. cerevisiae genome.
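The idea of screening reads for the junction without a reference genome can be illustrated with a minimal sketch. It assumes the junction marker is the canonical 34-bp loxP site and that it appears in the read without sequencing errors; the exact adapter-derived sequence and the tolerant alignment that DeLoxer actually uses are not given in this record, so `LOXP`, `revcomp` and `split_at_junction` are hypothetical names used only for illustration.

```python
from typing import Optional, Tuple

# ASSUMPTION: canonical 34-bp loxP site (13-bp repeat, 8-bp spacer, 13-bp repeat).
# The junction sequence produced by the protocol's adapters may differ.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_at_junction(read: str, site: str = LOXP) -> Optional[Tuple[str, str]]:
    """Split a read at an exact copy of the site (either strand).

    Returns (left_flank, right_flank), or None if the site is absent.
    Exact matching is a simplification; a real tool has to tolerate
    mismatches and sites that only partly overlap a read end.
    """
    for candidate in (site, revcomp(site)):
        pos = read.find(candidate)
        if pos != -1:
            return read[:pos], read[pos + len(candidate):]
    return None


if __name__ == "__main__":
    left, right = "ACGT" * 5, "TTGCA" * 4
    print(split_at_junction(left + LOXP + right))
```

Because the 8-bp spacer is not palindromic, the site and its reverse complement are different strings, which is why both orientations are searched.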
Year: 2011 PMID: 22127871 PMCID: PMC3273786 DOI: 10.1093/nar/gkr1000
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. LoxP adapter oligos. Both double-stranded oligos have a 3-bp overhang to allow for directional ligation.
Figure 2. Schematic of the Cre recombination-related library preparation steps. NNNNN denotes 2- to 5-kb DNA fragments taken into the LoxP adapter ligation. LoxP sequences are in red and the 8-bp spacer between the two palindromic elements is in green. Orientation of the spacer region determines direction of recombination. Marked in yellow are the biotinylated thymidines.
Figure 3. Schematic of the classification scheme used by DeLoxer. The dark blue line represents the original circularized DNA fragment obtained by Cre recombination and polymerase fill-in, while the block represents a single linear piece of that fragment obtained during the mate-pair sequencing prep. The ends are sequenced (light blue), yielding a read pair that must be classified, while the center of the fragment may be unsequenced (light gray). The possible positions of the LoxP adapter relative to the sequenced fragment are shown as green boxes. Case I: The LoxP site aligns to the start of one read, and the overlap is trimmed off. This LoxP site is outside the two reads, so the read pair is not a mate pair and is labeled as ‘paired-end’. Case II: The LoxP site aligns near the center of one read. The read is discarded as it does not contain at least 36 contiguous bp of genomic DNA, and the other read in the pair is retained as an unpaired read. Case III: The LoxP site aligns to the end of one (or both) reads, and the overlap is trimmed off. This site lies between the two reads, making this pair a mate-pair. Case IV: The LoxP site lies entirely in the unsequenced center portion of the DNA fragment, and does not overlap either read, so the pair is LoxP-negative. Case V: The LoxP site does not occur anywhere within the sequenced fragment, so the pair is LoxP-negative. Note that although Case IV represents a mate-pair, it is indistinguishable from Case V.
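The five cases in the Figure 3 caption can be sketched as a small decision function. This is an illustrative reading of the caption, not DeLoxer's code: it assumes an upstream step has already located the LoxP match in each read and reports it as a half-open `(start, end)` span, or `None` when there is no match. The names `read_state` and `classify_pair`, and the `"ambiguous"` label for a pair with LoxP at the start of one read and the end of the other, are assumptions that do not come from the source.

```python
from typing import Optional, Tuple

MIN_GENOMIC = 36  # minimum contiguous genomic bp for a read to be kept (Figure 3 caption)

Hit = Optional[Tuple[int, int]]  # half-open [start, end) span of the LoxP match, or None


def read_state(read_len: int, hit: Hit, min_len: int = MIN_GENOMIC) -> str:
    """Describe where LoxP sits in one read: 'none', 'start', 'end' or 'center'."""
    if hit is None:
        return "none"
    start, end = hit
    left = start             # genomic bases before the LoxP match
    right = read_len - end   # genomic bases after the LoxP match
    if max(left, right) < min_len:
        return "center"      # Case II: neither flank is long enough to keep
    # Genomic DNA before the match means LoxP is at the read's end, and vice versa.
    return "end" if left >= right else "start"


def classify_pair(len1: int, hit1: Hit, len2: int, hit2: Hit) -> str:
    """Classify a read pair following Cases I to V of the caption."""
    states = {read_state(len1, hit1), read_state(len2, hit2)}
    if states == {"none"}:
        return "loxp-negative"    # Cases IV and V cannot be told apart
    if states == {"center"}:
        return "both-too-short"   # neither read keeps enough genomic DNA
    if "center" in states:
        return "single"           # Case II: the other read is kept unpaired
    if "start" in states and "end" in states:
        return "ambiguous"        # ASSUMPTION: not covered by the caption
    if "end" in states:
        return "mate-pair"        # Case III: LoxP lies between the reads
    return "paired-end"           # Case I: LoxP lies outside the reads


if __name__ == "__main__":
    print(classify_pair(100, (66, 100), 100, None))  # LoxP at the end of read 1
    print(classify_pair(100, (0, 34), 100, None))    # LoxP at the start of read 1
```

With 100-bp reads and a 34-bp site fully contained in a read, at most one flank can reach 36 bp, so the choice between `"start"` and `"end"` is unambiguous in that setting.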
Figure 4. Fragment size distribution of (a) mate-paired reads, (b) paired-end reads and (c) LoxP-negative reads.
DeLoxer output quality statistics
| | Number of reads | Percentage of reads (%) | Fragment size (bp, mean ± SD) | Size after trimming (bp, mean ± SD) |
|---|---|---|---|---|
| Total 2 × 100 reads | 78 607 373 | 100.00 | | |
| Mate-paired reads (LoxP positive) | 22 494 162 | 28.62 | | 79 ± 22 |
| Uniquely aligned pairs | 18 970 394 | 100.00 | | |
| True mate-paired reads | 18 951 404 | 99.90 | 2313 ± 812 | |
| Paired-end reads | 7216 | 0.04 | 400 ± 78 | |
| Short fragments | 1359 | 0.01 | 95 ± 66 | |
| Non-unique alignment | 2 488 148 | | | |
| Unaligned | 1 035 620 | | | |
| Duplicate reads | 5 518 899 | 29.09 | | |
| Paired-end reads (LoxP positive) | 22 424 011 | 28.53 | | 78 ± 22 |
| Uniquely aligned pairs | 17 963 827 | 100.00 | | |
| True paired-end reads | 12 677 878 | 70.57 | 256 ± 40 | |
| Mate-paired reads | 2663 | 0.01 | 3299 ± 2261 | |
| Short fragments | 5 283 278 | 29.41 | 168 ± 32 | |
| Non-unique alignment | 3 478 415 | | | |
| Unaligned | 981 769 | | | |
| Duplicate reads | 5 008 216 | 27.88 | | |
| LoxP negative, low quality | 5 517 705 | 7.02 | | |
| LoxP negative, quality filtered | 22 288 114 | 28.35 | | 83 ± 18 |
| Uniquely aligned pairs | 12 409 334 | 100.00 | | |
| Mate-paired reads | 11 567 200 | 93.21 | 2279 ± 813 | |
| Paired-end reads | 46 637 | 0.38 | 290 ± 64 | |
| Short fragments | 777 291 | 6.26 | 61 ± 17 | |
| Non-unique alignment | 1 613 963 | | | |
| Unaligned | 8 264 817 | | | |
| Duplicate reads | 3 243 252 | 26.14 | | |
| Single reads (LoxP positive) | 5 820 905 | 7.41 | | |
| Both reads too short (LoxP positive) | 62 476 | 0.08 | | |
Data generated by sequencing a 3-kb mate-paired S. cerevisiae DNA library (prepared using 14 cycles of PCR), sequenced in one Illumina HiSeq flowcell lane.
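The percentages in the DeLoxer statistics table use two different denominators, which is easy to miss in the flattened layout. The short check below, using only counts copied from the table, suggests that the six top-level categories are fractions of the total read pairs, while the sub-rows (for example true mate-paired reads and duplicate reads) are fractions of the uniquely aligned pairs in their group. This reading is inferred from the arithmetic rather than stated in the source, and the helper name `pct` is illustrative.

```python
TOTAL = 78_607_373  # "Total 2 x 100 reads"

# Top-level DeLoxer categories, copied from the table.
top_level = {
    "Mate-paired reads (LoxP positive)": 22_494_162,
    "Paired-end reads (LoxP positive)": 22_424_011,
    "LoxP negative, low quality": 5_517_705,
    "LoxP negative, quality filtered": 22_288_114,
    "Single reads (LoxP positive)": 5_820_905,
    "Both reads too short (LoxP positive)": 62_476,
}


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# The six top-level categories partition the total exactly.
assert sum(top_level.values()) == TOTAL

for name, count in top_level.items():
    print(f"{name}: {pct(count, TOTAL):.2f}% of all read pairs")

# Sub-rows of the LoxP-positive mate-paired group use the uniquely
# aligned pairs of that group as the denominator.
UNIQUE_MATE_PAIRED = 18_970_394
print("True mate-paired:", pct(18_951_404, UNIQUE_MATE_PAIRED))
print("Duplicate reads:", pct(5_518_899, UNIQUE_MATE_PAIRED))
```

The same pattern holds within each group: uniquely aligned, non-uniquely aligned and unaligned pairs add up to the group's total (for the LoxP-positive mate-paired group, 18 970 394 + 2 488 148 + 1 035 620 = 22 494 162).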
Library yields
| Input into Cre-Lox recombination reaction (ng) | Number of PCR cycles | Library yield (ng) |
|---|---|---|
| 400 | 18 | 1.08 |
| 600 | 16 | 2.49 |
| 1000 | 14 | 2.52 |