Paul Visendi1, Paul J Berkman2, Satomi Hayashi3, Agnieszka A Golicz3, Philipp E Bayer4, Pradeep Ruperao4, Bhavna Hurgobin4, Juan Montenegro3, Chon-Kit Kenneth Chan4, Helena Staňková5, Jacqueline Batley4, Hana Šimková5, Jaroslav Doležel5, David Edwards4.
Abstract
BACKGROUND: The number of genome sequencing projects has grown exponentially since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data, which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, with variable success.
Keywords: 7DS; Assembly; BAC; Next-generation sequencing; SASSY; Saccharum spp; Triticum aestivum
Year: 2016 PMID: 26793268 PMCID: PMC4719536 DOI: 10.1186/s13007-016-0107-9
Source DB: PubMed Journal: Plant Methods ISSN: 1746-4811 Impact factor: 4.993
Fig. 1 Optimal coverage for assembly. Assembly size vs coverage for each of the 11 sugarcane BACs. Assembly sizes peak at 450x and level off despite increases in coverage beyond 1500x
Assembly statistics of seven single bread wheat BACs and simulated BAC pool assemblies
| BAC samples | Pre-processing statistics | Assembly statistics | |||||||
|---|---|---|---|---|---|---|---|---|---|
| Name | Coverage (x)ᵃ | Vector % | | Clonal % | Coverage (x)ᵇ | Contigs | N50 (Kb) | Longest (Kb) | Length (Kb) |
| A | 811 | 4 | 13 | 0.6 | 643 | 4 | 99 | 99 | 113 |
| B | 1041 | 5 | 10 | 0.8 | 844 | 1 | 118 | 118 | 118 |
| C | 709 | 5 | 10 | 0.6 | 572 | 7 | 23 | 50 | 115 |
| E | 833 | 4 | 14 | 0.5 | 656 | 4 | 81 | 81 | 128 |
| F | 748 | 4 | 11 | 0.6 | 599 | 5 | 32 | 46 | 111 |
| G | 943 | 5 | 10 | 0.8 | 773 | 1 | 102 | 102 | 102 |
| H | 829 | 4 | 28 | 0.4 | 519 | 4 | 90 | 90 | 113 |
| ABCE | 849 | 4 | 12 | 0.6 | 679 | 23 | 43 | 97 | 452 |
| BCEF | 833 | 5 | 11 | 0.6 | 668 | 21 | 43 | 97 | 443 |
| CEFG | 808 | 4 | 11 | 0.6 | 650 | 22 | 32 | 81 | 433 |
| EFGH | 838 | 4 | 16 | 0.6 | 637 | 20 | 46 | 81 | 430 |
BACs A, B, C, E, F, G and H assembled individually and in simulated pools ABCE, BCEF, CEFG and EFGH
ᵃ Raw coverage estimated at 120 Kb prior to assembly
ᵇ Final coverage estimated at 120 Kb
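The coverage and N50 columns in the table can be illustrated with a minimal sketch. It assumes that coverage is total sequenced bases divided by a nominal 120 Kb BAC insert, as the footnotes suggest, and that N50 follows the standard definition (the length of the contig at which the running total of contigs, sorted longest first, reaches half the assembly length). The function names and the example numbers are hypothetical; the source does not list individual contig lengths or the tools used to compute these statistics.

```python
from typing import Iterable


def estimate_coverage(total_bases: int, insert_size_bp: int = 120_000) -> float:
    """Fold coverage, assuming a nominal BAC insert size (120 Kb by default)."""
    if insert_size_bp <= 0:
        raise ValueError("insert_size_bp must be positive")
    return total_bases / insert_size_bp


def n50(contig_lengths: Iterable[int]) -> int:
    """Standard N50: walk contigs from longest to shortest and return the
    length of the contig at which the running sum reaches half the total."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for a non-empty list")


if __name__ == "__main__":
    # Hypothetical read total chosen to give 811x at 120 Kb.
    print(estimate_coverage(97_320_000))
    # Hypothetical contig lengths in Kb: four contigs summing to 113.
    print(n50([99, 8, 4, 2]))
```

With a single dominant contig, N50 equals the longest contig, which is the pattern seen for most of the single-BAC rows in the table.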
Fig. 2 MUMmer plot of the assemblies of single BACs A, B, C and E against the assembly of the simulated pool ABCE
Fig. 3 BES mappings on contigs of the simulated pool (ABCE). Clones A, C and E have their forward (M13_For) and reverse (SP6_Rev) BES (A01_M13_For, A01_SP6_Rev, C01_M13_For, C01_SP6_Rev, E01_M13_For, E01_SP6_Rev) correctly mapped. Clone B had no BES available, so 120 bp sequences from the cloning vector ends (FOR and REV) were used to identify the contig ends of clone B
Mate pair mapping orientations on E. coli, contigs and scaffolds
| Orientation | Reference | % of pairs | Median insert size (Kb) |
|---|---|---|---|
| RF | E. coli | 99 | 6 |
| RF | Contigs | 99 | 6 |
| RF | Scaffolds | 97 | 6 |
| FR | E. coli | 0.3 | 4 |
| FR | Contigs | 0.6 | 8 |
| FR | Scaffolds | 2 | 98 |
| FF/RR | E. coli | 0.7 | 3 |
| FF/RR | Contigs | 0.8 | 3 |
| FF/RR | Scaffolds | 0.9 | 4 |
Fig. 4 Distribution of the number of contigs and scaffolds per BAC for 96 BAC pools
Fig. 5 Distribution and orientation of MP insert sizes on E. coli (a), contigs (b) and scaffolds (c) of 96 wheat BAC pools. Y axis: MP read counts (log scale); X axis: insert sizes. Correctly orientated MP reads with orientation RF (<-, ->) are shown in green, shadow library MP reads mapping with orientation FR (->, <-) are shown in orange, and chimeric MP reads mapping with orientation FF (->, ->) or RR (<-, <-) are shown in blue
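The orientation classes in the caption can be sketched as a small classifier. This is an illustration only, assuming each read of a pair is described by its leftmost mapping position and a reverse-strand flag on the same reference; the function name `classify_pair` and its parameters are hypothetical and are not taken from the authors' pipeline.

```python
def classify_pair(pos1: int, is_reverse1: bool, pos2: int, is_reverse2: bool) -> str:
    """Classify a mapped read pair as RF, FR, FF or RR.

    The two reads are ordered by leftmost mapping position, then the
    strands are read from left to right:
      reverse, forward -> "RF" (<-, ->), outward-facing mate pair
      forward, reverse -> "FR" (->, <-), inward-facing pair
      forward, forward -> "FF" (->, ->)
      reverse, reverse -> "RR" (<-, <-)
    """
    reads = sorted([(pos1, is_reverse1), (pos2, is_reverse2)], key=lambda r: r[0])
    left_reverse, right_reverse = reads[0][1], reads[1][1]
    if left_reverse and not right_reverse:
        return "RF"
    if not left_reverse and right_reverse:
        return "FR"
    if not left_reverse and not right_reverse:
        return "FF"
    return "RR"


if __name__ == "__main__":
    # Hypothetical coordinates: reads about 6 Kb apart, pointing away from each other.
    print(classify_pair(100, True, 6_100, False))
```

Ordering by position before reading the strands makes the result independent of which read of the pair is passed first.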