Sebastiaan van Heesch, Wigard P Kloosterman, Nico Lansu, Frans-Paul Ruzius, Elizabeth Levandowsky, Clarence C Lee, Shiguo Zhou, Steve Goldstein, David C Schwartz, Timothy T Harkins, Victor Guryev, Edwin Cuppen.
Abstract
BACKGROUND: Paired-tag sequencing approaches are commonly used for the analysis of genome structure. However, mammalian genomes have a complex organization with a variety of repetitive elements that complicate comprehensive genome-wide analyses.
Year: 2013 PMID: 23590730 PMCID: PMC3648348 DOI: 10.1186/1471-2164-14-257
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Sequencing and coverage statistics for all paired-read libraries
| Library* | Pairs | | | Insert size (bp) | Physical coverage (×) | | PCR cycles** | Complexity*** | | |
|---|---|---|---|---|---|---|---|---|---|---|
| PE | 160 M | 151 M (95%) | 131 M (87%) | 166 | 8.5 | 0.34 | 5 | >131 M | 9.5% | 2.0% |
| 3 kb | 17.7 M | 16.3 M (92%) | 15.2 M (93%) | 3,208 | 50 | 6 | 14 | 4.6 M | 24.0% | 3.2% |
| 5 kb_a | 11.9 M | 6.0 M (51%) | 5.5 M (92%) | 5,696 | 11 | 10.1 | 18 | 4.7 M | 46.5% | 3.7% |
| 5 kb_b | 16.7 M | 14.7 M (88%) | 14.0 M (95%) | 5,811 | 28 | 10.1 | 13 | >16.7 M | 47.5% | 4.6% |
| 8 kb_a | 20.8 M | 8.6 M (41%) | 7.8 M (91%) | 8,293 | 25 | 16.2 | 14 | 5.7 M | 58.4% | 6.9% |
| 8 kb_b | 11.8 M | 11.2 M (95%) | 10.6 M (95%) | 8,160 | 34 | 16.2 | 13 | >11.8 M | 50.6% | 5.5% |
| 15 kb_a | 31.7 M | 1.7 M (5%) | 1.0 M (60%) | 14,561 | 6 | 30.3 | 21 | 0.6 M | 60.8% | 7.6% |
| 15 kb_b | 11.6 M | 1.6 M (14%) | 1.2 M (73%) | 13,556 | 7 | 30.3 | 21 | 0.7 M | 23.2% | 3.2% |
| 20 kb | 13.3 M | 6.7 M (51%) | 5.9 M (87%) | 19,375 | 48 | 40.5 | 14 | 4.9 M | 41.8% | 4.7% |
| 25 kb | 56.9 M | 2.3 M (4%) | 1.1 M (49%) | 25,871 | 11 | 50.6 | 17 | 0.7 M | 51.9% | 5.4% |
| TOTAL | 352.4 M | 220.1 M (62%) | 193.3 M (88%) | | 228.5 | | | | | |
* The _b samples are retrieved from a replicate experiment using an independent DNA isolate from the same animal.
** Number of PCR cycles required to retrieve sufficient library molecules in the final adapter-mediated PCR.
*** Complexity is defined as minimal sequencing depth (in million clones) at which over half of the pairs are clonal.
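The complexity definition in the last footnote can be illustrated with a minimal Python sketch. It assumes that each sequenced pair is reduced to a hashable key (for example, the mapped coordinates of both tags) and that a pair counts as clonal when its key has already been seen; both the function name and that notion of "clonal" are assumptions made for illustration, not the authors' implementation.

```python
from typing import Hashable, Iterable, Optional


def complexity_depth(pairs: Iterable[Hashable]) -> Optional[int]:
    """Return the smallest depth at which over half of the pairs are clonal.

    `pairs` yields one key per sequenced pair, in sequencing order.
    Assumption: a pair is clonal if an identical key was seen earlier.
    Returns None if that point is never reached within the data.
    """
    seen = set()
    clonal = 0
    for depth, key in enumerate(pairs, start=1):
        if key in seen:
            clonal += 1
        else:
            seen.add(key)
        # "Over half" is read as a strict majority of the pairs so far.
        if 2 * clonal > depth:
            return depth
    return None
```

Under this reading, a result of `None` would correspond to a library that was not sequenced deeply enough to reach the half-clonal point.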
Figure 1. MP insert size distribution and library complexity. (a) Insert size distribution of all mate-paired libraries and biological duplicates. Data have been filtered for non-clonal pairs. (b) Complexity of each library is depicted by the number of unique read-pairs versus the number of properly mapped read-pairs. The x-axis shows increasing sequencing depth, based on actual sequencing data; the y-axis shows the amount of unique information obtained. A plateau indicates that a library has been sequenced to saturation.
Figure 2. Bridging of repeat elements by paired read libraries. (a) The percentage of each repeat type per window of 1000 repeats (y-axis) is shown, relative to the size of each repeat on the x-axis. A higher density of dots indicates the presence of more repeats in the indicated size bin. (b) Pie chart of the largest classes of repetitive elements based on their total length (Mb) in the rat genome. Satellite repeats, RNA repeats, and low-complexity repeats are listed as “Other.” (c + d) Bridging by paired-tag libraries of all annotated LINEs (c) and LTRs (d) within contigs of RGSC 3.4. The size of LINE elements or LTRs (x-axis) is plotted against the percentage of elements of that specific size that were bridged by one or more read-pairs from each of the libraries. All single library datasets were normalized to 8.5× physical genome coverage.
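The bridging measure in panels (c) and (d) can be sketched as follows. This is a minimal illustration that assumes an element is "bridged" when at least one read pair on the same contig starts before the element and ends after it; the interval representation and function names are hypothetical, and the brute-force search is chosen for clarity rather than speed.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on a single contig, start < end


def is_bridged(element: Interval, pairs: List[Interval]) -> bool:
    """True if any pair's outer coordinates strictly span the element."""
    start, end = element
    return any(p_start < start and p_end > end for p_start, p_end in pairs)


def percent_bridged(elements: List[Interval], pairs: List[Interval]) -> float:
    """Percentage of elements bridged by one or more read pairs."""
    if not elements:
        return 0.0
    bridged = sum(is_bridged(element, pairs) for element in elements)
    return 100.0 * bridged / len(elements)
```

Applying `percent_bridged` separately to the elements in each size bin would give one point per bin and library, which is the kind of curve the caption describes.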
Figure 3. Combinations of libraries with different insert sizes improve contig scaffolding. (a) All library data sets were normalized to 8.5× non-clonal physical genome coverage, resulting in the use of approximately 130 million pairs for the PE library to several million pairs for the MPs. The scaffold N50 (y-axis) as determined by SSPACE is plotted against the total number of scaffolds (x-axis) for each individual library and for all combinations of libraries. Scaffolding results for the current genome reference (RGSC 3.4) are displayed as well. (b) Representative examples of genomic loci on rat chromosome 18 that show major discordance between the optical map and the RGSC 3.4 reference genome. MP-assisted scaffolding restored concordance between sequence scaffolds and optical maps. The top panel (black) represents the reference genome assembly, with the vertical lines indicating predicted SwaI sites; the middle panel (red) represents optical map data obtained using SwaI digests; the lower panel represents the rescaffolded genome using the MP data. The indicated positions on chromosome 18 are according to the current RGSC 3.4 assembly. In the first example (top panel), a large region of approximately 75 kb (0.065 Mb–0.14 Mb) shows low concordance with the predicted path of the optical map; concordance increased significantly after MP scaffolding. The bottom panel shows another example of increased resemblance to optical mapping data (3.85 Mb–3.90 Mb). The order and placement of contigs were shifted in the new scaffold, resulting in SwaI sites identical to the optical map.
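For readers unfamiliar with the N50 statistic plotted in panel (a), the sketch below uses the common definition: the length L such that scaffolds of length L or longer together contain at least half of the total assembled length. It illustrates the statistic only and makes no claim about how SSPACE computes or reports it.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold that brings the running sum to half.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

Because long scaffolds dominate the running sum, joining contigs into fewer and longer scaffolds raises the N50, which is why the figure plots it against the total number of scaffolds.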
Scaffolding value of different paired-read library combinations
| Number of libraries | Library combination | Scaffold N50 (bp) | Library combination including PE | Scaffold N50 (bp) |
|---|---|---|---|---|
| 1 | 15 kb | 163,475 | PE | 37,694 |
| 2 | 5 kb + 25 kb | 522,027 | PE + 3 kb | 46,699 |
| 2 | 5 kb + 20 kb | 474,308 | PE + 5 kb | 141,403 |
| 2 | 8 kb + 25 kb | 470,890 | PE + 25 kb | 142,007 |
| 3 | 5 kb + 20 kb + 25 kb | 834,964 | PE + 3 kb + 5 kb | 158,525 |
| 3 | 5 kb + 15 kb + 25 kb | 789,954 | PE + 3 kb + 8 kb | 171,253 |
| 3 | 8 kb + 20 kb + 25 kb | 726,289 | PE + 3 kb + 25 kb | 198,696 |
| 7 | ALL | 1,287,609 | N/A | N/A |
* All libraries were normalized to 8.5× physical genome coverage, limited by the amount of available data for the paired-end (PE) library.
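The normalization to a fixed physical coverage can be made concrete with a short sketch. It assumes the usual definition of physical coverage (number of pairs multiplied by insert size, divided by genome size); the function names are hypothetical, and the genome size must be supplied by the caller because the text above does not state the value used.

```python
import math


def physical_coverage(n_pairs: int, insert_size_bp: float, genome_size_bp: float) -> float:
    """Physical coverage = pairs x insert size / genome size (assumed definition)."""
    return n_pairs * insert_size_bp / genome_size_bp


def pairs_for_coverage(target: float, insert_size_bp: float, genome_size_bp: float) -> int:
    """Number of pairs needed to reach `target` physical coverage."""
    return math.ceil(target * genome_size_bp / insert_size_bp)


if __name__ == "__main__":
    # Hypothetical round genome size, used here for illustration only.
    genome = 2_500_000_000
    for name, insert in [("PE", 166), ("5 kb", 5_696), ("25 kb", 25_871)]:
        print(name, pairs_for_coverage(8.5, insert, genome))
```

Since the required number of pairs scales inversely with insert size, a short-insert PE library needs far more pairs than a large-insert MP library to reach the same physical coverage, consistent with the PE library being the limiting data set.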