Tatiana Belova, Bujie Zhan, Jonathan Wright, Mario Caccamo, Torben Asp, Hana Simková, Matthew Kent, Christian Bendixen, Frank Panitz, Sigbjørn Lien, Jaroslav Doležel, Odd-Arne Olsen, Simen R Sandve.
Abstract
BACKGROUND: The assembly of the bread wheat genome sequence is challenging due to allohexaploidy and extreme repeat content (>80%). Isolation of single chromosome arms by flow sorting can be used to overcome the polyploidy problem, but the repeat content causes extreme assembly fragmentation even at the single-chromosome level. Long jump paired sequencing data (mate pairs) can help reduce assembly fragmentation by joining multiple contigs into single scaffolds. The aim of this work was to assess how mate pair data generated from multiple displacement amplified DNA of flow-sorted chromosomes affect assembly fragmentation of shotgun assemblies of the wheat chromosomes.
Year: 2013 PMID: 23557231 PMCID: PMC3622640 DOI: 10.1186/1471-2164-14-222
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Contig assembly summary statistics
| 7BS | 1,349,563 | 842 | 239 | 50,938 | 323 |
| | 178,789* | 2,428 | 1,152 | 50,938 | 206 |
| 7BL | 4,527,901 | 145 | 144 | 30,964 | 652 |
| | 328,725* | 1,556 | 789 | 30,964 | 260 |
* Contigs > 200 bp.
Summary table of mate pair sequence data
| | | | | | | | |
| 7BS | 2.60 × 10⁷ | 2.23 × 10⁸ | 1.97 × 10⁸ | 4.46 × 10⁸ | 71.8% | 22.4% | 5.9% |
| 7BL | 3.13 × 10⁷ | 2.32 × 10⁸ | 2.16 × 10⁸ | 4.79 × 10⁸ | 71.2% | 23.7% | 5.1% |
Total numbers of read pairs are given for each MP library. The read pair classification is based on mapping of MP data back to assembled contigs from PE data.
°Roche library, †Illumina library.
Figure 1. MP read classes and the estimated insert size distributions for the 3 kb library of 7BS. A) Distribution of insert sizes for the FF reads. B) Distribution of insert sizes for the FR reads. C) Distribution of insert sizes for the RF reads.
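The FF, FR and RF classes named in Figure 1 describe the relative orientation of the two reads of a pair after mapping back to a contig. The Python sketch below shows one common way to assign such classes and to estimate an insert size from the outer mapping coordinates. The `Alignment` record, the function names, and the convention that FF also covers pairs in which both reads map to the reverse strand are assumptions made for illustration; this section does not describe the authors' actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alignment:
    """Hypothetical minimal record for one mapped read."""
    contig: str
    start: int      # leftmost mapped position, 0-based
    end: int        # rightmost mapped position, exclusive
    reverse: bool   # True if the read maps to the reverse strand


def classify_pair(a: Alignment, b: Alignment) -> Optional[str]:
    """Return 'FF', 'FR' or 'RF', or None if the reads map to different contigs."""
    if a.contig != b.contig:
        return None
    # Same strand for both reads (assumed to include reverse/reverse pairs).
    if a.reverse == b.reverse:
        return "FF"
    # Opposite strands: look at the strand of the leftmost read.
    left = min(a, b, key=lambda aln: aln.start)
    # Leftmost read on the forward strand: reads point towards each other (FR).
    # Leftmost read on the reverse strand: reads point away from each other (RF).
    return "RF" if left.reverse else "FR"


def outer_distance(a: Alignment, b: Alignment) -> int:
    """Insert size estimated as the span between the outermost mapped bases."""
    return max(a.end, b.end) - min(a.start, b.start)
```

For example, a forward read at positions 100-200 paired with a reverse read at positions 3000-3100 on the same contig is classified as FR with an outer distance of 3000 bp.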
Scaffold assembly summary statistics
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 7BS | 0.96 | 20,654 | 4.51 | 38 | 52 | 11.2 (1.7-143.1) | 168 | 14.49 |
| | 7BL | 1.56 | 31,582 | 4.33 | 43 | 42 | 9.6 (2.0-117.6) | 192 | 11.15 |
| 5 | 7BS | 1.06 | 17,481 | 3.81 | 27 | 37 | 10.7 (1.7-129.4) | 148 | 11.03 |
| | 7BL | 1.41 | 23,365 | 3.91 | 32 | 28 | 9.5 (2.2-122.1) | 166 | 8.31 |
| 7 | 7BS | 1.14 | 15,230 | 3.4 | 20 | 29 | 10.5 (1.72-109.6) | 133 | 9 |
| | 7BL | 1.48 | 19,610 | 3.56 | 25 | 21 | 9.3 (2.3-81.9) | 148 | 6.33 |
| 10 | 7BS | 1.24 | 12,750 | 3.04 | 15 | 22 | 10.5 (1.8-108.9) | 115 | 7.04 |
| | 7BL | 1.58 | 15,896 | 3.22 | 20 | 16 | 9.3 (2.3-77.7) | 128 | 4.49 |
| 15 | 7BS | 1.35 | 9,733 | 2.73 | 12 | 15 | 10.7 (2.0-102.4) | 92 | 5.2 |
| | 7BL | 1.7 | 12,052 | 2.89 | 17 | 11 | 9.4 (2.54-67.4) | 103 | 2.84 |
| 20 | 7BS | 1.42 | 7,618 | 2.55 | 10 | 11 | 10.9 (2.1-73.3) | 76 | 4.2 |
| | 7BL | 1.79 | 9,458 | 2.68 | 14 | 8 | 9.6 (2.8-69.2) | 84 | 1.97 |
* Including all sequences (contigs + scaffolds).
Gene content in ABySS and SSPACE assemblies
| SSPACE k3 | 7BS | 1029 | 0.49 | 0.09 | 449 | 193 |
| | 7BL | 1539 | 0.44 | 0.10 | 551 | 224 |
| SSPACE k5 | 7BS | 1038 | 0.49 | 0.09 | 445 | 193 |
| | 7BL | 1545 | 0.43 | 0.12 | 547 | 227 |
| SSPACE k7 | 7BS | 1032 | 0.49 | 0.09 | 449 | 196 |
| | 7BL | 1551 | 0.43 | 0.12 | 535 | 221 |
| SSPACE k10 | 7BS | 1038 | 0.49 | 0.09 | 447 | 195 |
| | 7BL | 1555 | 0.43 | 0.12 | 533 | 215 |
| SSPACE k15 | 7BS | 1040 | 0.48 | 0.12 | 436 | 186 |
| | 7BL | 1576 | 0.42 | 0.14 | 529 | 217 |
| SSPACE k20 | 7BS | 1048 | 0.47 | 0.13 | 433 | 183 |
| | 7BL | 1574 | 0.42 | 0.14 | 516 | 205 |
| Contigs | 7BS | 1071 | 0.45 | 0.17 | 403 | 160 |
| | 7BL | 1621 | 0.39 | 0.21 | 457 | 162 |
1 Mean coverage per sequence (contig/scaffold) of Brachypodium homologs based on BLAST analyses (see methods).
2 Gene Fragmentation Index (GFI) is defined in the methods section.
* TBLASTN hits covering ≥70% of a homologous Brachypodium protein in a single contig/scaffold.
† TBLASTN hits covering an entire Brachypodium protein (±10 aa) in a single contig/scaffold.
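The two footnote thresholds (at least 70% of a protein, and the entire protein within 10 aa) can be illustrated with a small coverage calculation. The sketch below merges the protein-coordinate intervals of the hits that one contig or scaffold produces against one protein and compares the covered length with both thresholds. The function names, the half-open 0-based interval convention, and the reading of "±10 aa" as "covered length at least protein length minus 10" are assumptions for illustration; how the authors combined hits is not stated in this section.

```python
from typing import Dict, Iterable, Tuple, Union

Interval = Tuple[int, int]  # (start, end) on the protein, 0-based, end exclusive


def covered_length(intervals: Iterable[Interval]) -> int:
    """Number of protein residues covered by the union of the intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close the run and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage_classes(
    intervals: Iterable[Interval],
    protein_length: int,
    min_fraction: float = 0.70,   # threshold from the * footnote
    tolerance_aa: int = 10,       # tolerance from the † footnote (assumed reading)
) -> Dict[str, Union[float, bool]]:
    """Report the covered fraction and whether each threshold is met."""
    covered = covered_length(intervals)
    return {
        "fraction": covered / protein_length,
        "at_least_70_percent": covered >= min_fraction * protein_length,
        "full_length": covered >= protein_length - tolerance_aa,
    }
```

With hits at residues 0-100, 50-200 and 250-300 of a 300 aa protein, 250 residues are covered, so the sequence passes the 70% threshold but does not count as full length under this reading.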
Figure 2. MP effect on sequences containing multiple full-length (FL) genes.
Figure 3. Evaluation of scaffold reliability. A) Scaffold reliability evaluated by the proportion of conserved syntenic relationships to the Brachypodium genome. B) Scaffold reliability evaluated by comparisons of sequence content in BACs and scaffolds. SRI = scaffold reliability index (defined in the methods section).
Figure 4. Effect of MP sequencing depth on assembly N50. Only 7BS was used to estimate the MP coverage effect.
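Because Figure 4 is expressed in terms of assembly N50, a minimal sketch of that statistic may be useful. It uses the standard definition: the largest length L such that sequences of length at least L together contain at least half of all assembled bases. The function name and the toy lengths are hypothetical, and gap bases inside scaffolds are ignored for simplicity.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for a non-empty input")


# Toy illustration: joining two contigs into one scaffold raises N50.
contigs = [3, 3, 2, 2]
scaffolds = [3 + 3, 2, 2]
```

Here `n50(contigs)` is 3 and `n50(scaffolds)` is 6, which is the kind of reduction in fragmentation that mate pair scaffolding aims for.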