X Argout, G Martin, G Droc, O Fouet, K Labadie, E Rivals, J M Aury, C Lanaud.
Abstract
BACKGROUND: Theobroma cacao L., native to the Amazonian basin of South America, is an economically important fruit tree crop for tropical countries as a source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although a useful resource, some improvements are possible, including identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring un-anchored sequences to the 10 chromosomes.
Keywords: Criollo B97–61/B2 genome; GBS; Genome Assembly; Mate Paired sequences; Theobroma cacao
Year: 2017 PMID: 28915793 PMCID: PMC5603072 DOI: 10.1186/s12864-017-4120-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 CIRCOS graphical representation of paired reads mapping on a misassembled contig. The blue circle represents the contig sequence. In the inner circle, grey lines represent concordant links (orientation and insert size) between read pairs. The black arrow points to the misassembled region
Changes in statistics during scaffold assembly
| Stage | Library used for assembly | Sequence number | Assembly length (Mb) | N50 (kb) | Unknown (Ns) (Mb) |
|---|---|---|---|---|---|
| Contigs | – | 25,527 | 290.5 | 19.8 | – |
| Scaffolds | 3-5 kb | 4383 | 303.9 | 189.1 | 13.4 |
| Scaffolds | 5-8 kb | 1906 | 312.3 | 439.4 | 21.8 |
| Scaffolds | 8-11 kb | 1271 | 315.9 | 709.4 | 25.4 |
| Scaffolds | 11-15 kb | 980 | 318.2 | 906.5 | 27.7 |
| Scaffolds | BAC ends | 554 | 325.2 | 5324.1 | 34.6 |
| Scaffolds | GapClosure | 554 | 324.7 | 6465.7 | 18.5 |
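The N50 values in the table above follow the usual definition: the length of the shortest sequence in the set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation, using hypothetical toy lengths rather than the actual scaffold lengths (which are not listed here), and a function name `n50` chosen only for illustration:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths in kb, not taken from the assembly.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

With these toy values the two longest sequences (80 and 70) cover exactly half of the 300 kb total, so the N50 is 70. Merging sequences into fewer, longer scaffolds raises this statistic, which is the trend the table shows from contigs to the final scaffolds.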
Distribution of SNP markers and Scaffolds among T. cacao version 2 chromosome assembly
| Chromosome | N° of SNP markers | N° of scaffolds | Length (Mb) |
|---|---|---|---|
| Chr1 | 694 | 13 | 37.3 |
| Chr2 | 581 | 11 | 41.2 |
| Chr3 | 573 | 15 | 36.4 |
| Chr4 | 530 | 7 | 31.9 |
| Chr5 | 597 | 19 | 39.4 |
| Chr6 | 369 | 13 | 26.3 |
| Chr7 | 293 | 20 | 21.6 |
| Chr8 | 321 | 15 | 19.6 |
| Chr9 | 601 | 13 | 38.6 |
| Chr10 | 298 | 8 | 21.8 |
| Total | 4857 | 134 | 314.2 |
Fig. 2 Chromosome reconstruction. Linkage dot plots between markers along non-ordered scaffolds (a) and ordered scaffolds (b) on chromosome 1. Each dot represents the recombination frequency between two markers. The intensity of the linkage is color coded: warm colors indicate strong linkage and cold colors indicate weak linkage. Grey bars in the dot plots separate markers belonging to the same scaffold
Fig. 3 Scaffolds anchored to the 10 Theobroma cacao chromosomes. Black boxes represent scaffolds with orientation. Gene and SNP marker densities are in blue and orange, respectively, and were computed with a window size of 400 kb
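A windowed density of the kind described in the Fig. 3 caption can be sketched as a count of features per fixed-size window along a chromosome. The sketch below assumes non-overlapping windows and 0-based positions; the caption states only the 400 kb window size, so both are assumptions, and the function name `window_counts` and the toy positions are hypothetical.

```python
from collections import Counter

WINDOW_SIZE = 400_000  # 400 kb, the window size given in the caption


def window_counts(positions, chrom_length, window_size=WINDOW_SIZE):
    """Count features (genes or SNP markers) per non-overlapping window."""
    n_windows = -(-chrom_length // window_size)  # ceiling division
    counts = Counter(
        pos // window_size for pos in positions if 0 <= pos < chrom_length
    )
    # Return one count per window, including windows with no features.
    return [counts.get(i, 0) for i in range(n_windows)]


# Hypothetical marker positions on a 1 Mb toy chromosome.
toy_positions = [10_000, 399_999, 400_000, 950_000]
print(window_counts(toy_positions, 1_000_000))
```

For the toy input this yields three windows with counts 2, 1 and 1; the last window is shorter than 400 kb because the chromosome length is not a multiple of the window size.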
Fig. 4 Comparison of Theobroma cacao Criollo assembly version 1 and version 2. (a) Graphical representation of insertions and reduction of the unknown chromosome version 1 (Tc00) in chromosomes version 2 (chr1–10). (b) Graphical representation of regions previously anchored to a different chromosome in the first version of the assemblies. “Tc” chromosomes refer to assembly version 1 and “chr” chromosomes to assembly version 2
Statistics for T. cacao chromosome assemblies
| | B97–61/B2 Version 1 | B97–61/B2 Version 2 | Matina 1–6 |
|---|---|---|---|
| Number of scaffolds | 4792 | 554 | 814 |
| Cumulative size (Mb) | 326.9 | 324.7 | 346 |
| N50 (Mb) | 0.47 | 6.5 | 4.3 |
| Anchored on chromosomes (Mb) | 218.4 (66.8%) | 314.2 (96.7%) | 330 (95.5%) |
| Unknown sites (Mb) | 35.4 (10.8%) | 18.5 (5.7%) | 15.2 (4.4%) |
Fig. 5 Dot plot comparing Criollo B97–61/B2 version 2 and Amelonado Matina 1–6 genomes computed with Last [29]. Red and blue dots indicate forward and reverse alignments, respectively