Romain Philippe, Frédéric Choulet, Etienne Paux, Jan van Oeveren, Jifeng Tang, Alexander H J Wittenberg, Antoine Janssen, Michiel J T van Eijk, Keith Stormo, Adriana Alberti, Patrick Wincker, Eduard Akhunov, Edwin van der Vossen, Catherine Feuillet.
Abstract
BACKGROUND: Sequencing projects using a clone-by-clone approach require a robust physical map. The SNaPshot technology, based on pair-wise comparisons of restriction fragment sizes, was recently used to build the first physical map of a wheat chromosome and to complete the maize physical map. However, restriction fragment sizes shared by chance between two non-overlapping BACs often lead to chimerical contigs and mis-assembled BACs in such large and repetitive genomes. Whole Genome Profiling (WGP™) was recently developed as a new sequence-based physical mapping technology and has the potential to limit this problem.
Year: 2012 PMID: 22289472 PMCID: PMC3311077 DOI: 10.1186/1471-2164-13-47
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Characteristics of the sequence tags produced by WGP
| | Expected data | Raw data | Tag filtering | BAC filtering |
|---|---|---|---|---|
| Total number of tags | 5,681,912 | 327,282 | 228,263 | 194,716 |
| Total number of unique tags | 97,293 (a) | 111,678 | 47,900 | 47,220 |
| Average tag length (bp) | 30.0 | 30.0 | 29.9 | 29.9 |
| Number of BACs with tags | 16,128 | 14,199 | 13,888 | 11,238 |
| Average number of tags per BAC | 58.4 (b) | 23.0 | 16.4 | 17.3 |
| Average number of BACs sharing the same tag | 9.6 | 2.9 | 4.8 | 4.1 |
a Based on an average EcoRI site density of one site per 4,728 bp and a coverage of the BAC subset of 230 Mb.
b Based on an average EcoRI site density of one site per 4,728 bp and an average BAC size of 138 Kb.
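The expected values in footnotes a and b can be reproduced with simple arithmetic. The sketch below assumes that each EcoRI site yields two tags (one per flank); this factor is not stated in the footnotes, but it is the value that recovers the expected figures in the table. The function and constant names are illustrative only.

```python
# Values quoted in the table footnotes.
ECORI_SPACING_BP = 4_728           # average distance between EcoRI sites (bp)
SUBSET_COVERAGE_BP = 230_000_000   # coverage of the BAC subset (230 Mb)
AVERAGE_BAC_SIZE_BP = 138_000      # average BAC size (138 Kb)

# Assumption (not stated in the footnotes): two tags per EcoRI site.
TAGS_PER_SITE = 2


def expected_tags(length_bp, spacing_bp=ECORI_SPACING_BP, tags_per_site=TAGS_PER_SITE):
    """Expected number of tags in a stretch of DNA of the given length."""
    return tags_per_site * length_bp / spacing_bp


expected_unique_tags = expected_tags(SUBSET_COVERAGE_BP)
expected_tags_per_bac = expected_tags(AVERAGE_BAC_SIZE_BP)

print(round(expected_unique_tags))      # expected total number of unique tags
print(round(expected_tags_per_bac, 1))  # expected average number of tags per BAC
```

Under these assumptions the two results round to the 97,293 unique tags and 58.4 tags per BAC given in the "Expected data" column.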
Figure 1. Tag number, […]. A window of 50 kb sliding every 10 kb was used to calculate the number of tags, the number of EcoRI sites, and the percentage of transposable elements (TE). Each position on the graph corresponds to the middle of a window.
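The windowing scheme described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, positions are assumed to be 0-based, and windows are treated as half-open intervals that must fit entirely within the sequence. `GAATTC` is the EcoRI recognition sequence.

```python
import re
from bisect import bisect_left

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def find_ecori_sites(sequence):
    """Return the 0-based start position of every EcoRI site in the sequence."""
    return [m.start() for m in re.finditer(ECORI_SITE, sequence.upper())]


def sliding_window_counts(positions, seq_length, window=50_000, step=10_000):
    """Count features per window and report each count at the window midpoint.

    Windows are half-open intervals [start, start + window) and only windows
    that fit entirely inside the sequence are reported (an assumption).
    """
    positions = sorted(positions)
    results = []
    start = 0
    while start + window <= seq_length:
        end = start + window
        # Number of positions p with start <= p < end.
        count = bisect_left(positions, end) - bisect_left(positions, start)
        results.append((start + window // 2, count))
        start += step
    return results


# Toy example: four feature positions on a 70 kb sequence.
example = sliding_window_counts([5_000, 12_000, 15_000, 61_000], 70_000)
print(example)
```

The same function works for tag positions or EcoRI site positions; the TE percentage would need interval overlaps rather than point counts and is not covered here.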
Features of the WGP tag distribution along 18 Mbp of sequence corresponding to 12 reference contigs of chromosome 3B.
| Feature | Value |
|---|---|
| Total number of unique tags | 3,396 |
| Maximum distance (a) between two unique tags (bp) | 84,687 |
| Minimum distance (a) between two unique tags (bp) | 0 |
| Average distance (a) between two unique tags (bp) | 5,251 |
| Standard deviation of the distance (a) between two unique tags (bp) | 7,541 |
| Median distance (a) between two unique tags (bp) | 2,329 |
| Percentage of | 35.7% |
| Percentage of sites with two unique tags | 38.9% |
a Distances are calculated between the end of one tag and the beginning of the next.
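The distance definition in footnote a can be illustrated with a short sketch. It assumes that tags are given as half-open `(start, end)` intervals on the reference sequence, so two directly adjacent tags have a distance of 0, and it uses the sample standard deviation; the source does not say which estimator was used. The function names are hypothetical.

```python
from statistics import mean, median, stdev


def inter_tag_distances(tags):
    """Distance from the end of each tag to the start of the next one.

    `tags` is an iterable of (start, end) pairs with `end` exclusive
    (an assumption about the coordinate convention).
    """
    ordered = sorted(tags)
    return [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def summarize(distances):
    """Summary statistics matching the rows of the table above."""
    return {
        "max": max(distances),
        "min": min(distances),
        "mean": mean(distances),
        "stdev": stdev(distances),  # sample standard deviation (assumption)
        "median": median(distances),
    }


# Toy example: four 30 bp tags, the first two directly adjacent.
toy_tags = [(100, 130), (130, 160), (2_500, 2_530), (10_000, 10_030)]
gaps = inter_tag_distances(toy_tags)
stats = summarize(gaps)
print(gaps, stats)
```

A mean well above the median, as in the table (5,251 bp against 2,329 bp), indicates a skewed distribution with a few long tag-free stretches.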
Figure 2. Analysis of SNaPshot and WGP assemblies built between cut-offs of 1e. A) Total number of contigs at each cut-off. B) Estimated number of chimerical contigs per 10 Mb and percentage of mis-assembled BACs. C) Coverage in length of the WGP and SNaPshot physical assemblies at different cut-offs. The coverage and contig sizes were estimated on the basis of an average BAC size of 138 kb and an average band size of 1.1 kb for SNaPshot and 6.1 kb for WGP.
Comparison of physical map assemblies obtained with the SNaPshot and WGP technologies at their optimum final cut-offs (1e-25 for SNaPshot; 1e-11 for WGP).
| | SNaPshot | WGP |
|---|---|---|
| Total number of contigs | 631 | 434 |
| Average contig size (Kb) | 374 (a) | 469 (b) |
| Median contig size (Kb) | 295 (a) | 374 (b) |
| Number of singletons | 2112 (18.8%) | 4145 (36.9%) |
| Coverage in length | 236 Mb ± 65 (a) | 199 Mb ± 42 (b) |
| Number of chimerical contigs per 10 Mb | 0.6 | 0.6 |
| Percentage of mis-assembled BACs | 9.5% | 2.7% |
a Based on an average band size of 1.1 kb ± 0.3.
b Based on an average band size of 6.1 kb ± 1.3.
Features of the 4 pools sequenced with the 454 GS-FLX technology.
| | Number of BACs | Length of the reference sequence (a) | Number of reads (a) | Average read size | Sequencing coverage |
|---|---|---|---|---|---|
| Pool1 | 9 | 1,135,279 | 241,916 | 323.0 | 69X |
| Pool2 | 6 | 665,389 | 157,653 | 334.7 | 79X |
| Pool3 | 4 | 622,598 | 151,278 | 328.2 | 80X |
| Pool4 | 5 | 676,686 | 173,666 | 328.6 | 84X |
a The lengths of the sequences and reads are in bp.
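The sequencing coverage column is consistent with the usual estimate of number of reads × average read size ÷ reference length. The sketch below applies that formula to the table values; the formula is an assumption about how the column was derived, and the variable names are illustrative.

```python
# (reference length in bp, number of reads, average read size in bp)
pools = {
    "Pool1": (1_135_279, 241_916, 323.0),
    "Pool2": (665_389, 157_653, 334.7),
    "Pool3": (622_598, 151_278, 328.2),
    "Pool4": (676_686, 173_666, 328.6),
}


def sequencing_coverage(reference_length, n_reads, average_read_size):
    """Average depth: total sequenced bases divided by the reference length."""
    return n_reads * average_read_size / reference_length


coverage = {name: sequencing_coverage(*values) for name, values in pools.items()}
total_reference_length = sum(values[0] for values in pools.values())

for name, depth in coverage.items():
    print(f"{name}: {depth:.1f}X")
print(f"Total reference length: {total_reference_length:,} bp")
```

Rounded to whole numbers, the computed depths match the 69X, 79X, 80X and 84X reported in the table, and the four reference lengths sum to the 3,099,952 bp quoted in the Figure 3 caption.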
Figure 3. Features of different sequence assemblies performed for four wheat BAC contigs representing 3,099,952 bp. A) N90 and L90 values for different assemblies obtained with 454 sequencing with no paired-end reads as well as with (black line) and without (dotted line) the integration of WGP data between 15X and 50X sequencing coverage. B) N90 and L90 values for different assemblies obtained with 454 sequencing of paired-end reads and with and without integration of WGP data between 15X and 50X sequencing coverage. C) Percentage of gaps in the scaffolds of different assemblies obtained with 454 paired-end reads between 15X and 50X sequencing coverage.
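The caption does not define N90 and L90. A minimal sketch under one common convention follows: contigs are sorted from longest to shortest, N90 is the length of the contig at which the cumulative length first reaches 90% of the assembly, and L90 is the number of contigs needed to get there. Some authors swap the two labels, so this convention is an assumption rather than the paper's stated definition.

```python
def n_and_l(contig_lengths, fraction=0.90):
    """Return (N, L) for the given fraction of the total assembly length.

    N is the length of the shortest contig in the smallest set of longest
    contigs covering `fraction` of the assembly; L is the size of that set.
    """
    lengths = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Toy assembly of five contigs (lengths in bp) totalling 1,000 bp.
n90, l90 = n_and_l([500, 300, 120, 50, 30])
print(n90, l90)
```

With this convention, a better assembly shows a higher N90 and a lower L90, which is the direction to look for when comparing the curves with and without WGP data.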
Figure 4. Sequence alignment between a superscaffold obtained after WGP tag integration and a reference sequence (ctg0079). The superscaffold was obtained by assembling 454 unpaired reads of ctg0079 with WGP tag integration at 40X coverage. Each line of the superscaffold corresponds to a contig and each rectangle corresponds to an ambiguous contig order. Contigs within a bin are not ordered. Red areas highlight errors in contig and bin mergers in the superscaffold compared to the reference sequence. In this example, there are 12 mergers, of which 3 are erroneous.