Jessica B Lyons, Jessen V Bredeson, Ben N Mansfeld, Guillaume Jean Bauchet, Jeffrey Berry, Adam Boyher, Lukas A Mueller, Daniel S Rokhsar, Rebecca S Bart.
Abstract
KEY MESSAGE: We demystify recent advances in genome assemblies for the heterozygous staple crop cassava (Manihot esculenta), and highlight key cassava genomic resources.
Cassava, Manihot esculenta Crantz, is a crop of societal and agricultural importance in tropical regions around the world. Genomics provides a platform for accelerated improvement of cassava's nutritional and agronomic traits, as well as for illuminating aspects of cassava's history, including its path towards domestication. The highly heterozygous nature of the cassava genome is widely recognized. However, the full extent and context of this heterozygosity have been difficult to reveal because of technological limitations in genome sequencing. Only recently, with several new long-read sequencing technologies coming online, has the genomics community been able to tackle similarly difficult genomes. In light of these advances, we provide this review to document the current status of the cassava genome and genomic resources and to offer a perspective on what to look forward to in the coming years.
Keywords: Cassava; Crop improvement; Genomics; Heterozygous genomes; Phased genomes
Year: 2021 PMID: 33604743 PMCID: PMC9162999 DOI: 10.1007/s11103-020-01104-w
Source DB: PubMed Journal: Plant Mol Biol ISSN: 0167-4412 Impact factor: 4.335
Fig. 1 Global cassava yields show potential for improvement. a Global median cassava yield trends (1961–2017). Natural cubic spline smoothed trendline (blue) and standard errors (shaded ribbon). Cassava yields have risen significantly since the 1980s, possibly due to improvements in germplasm and breeding. More recently, however, yields have plateaued. b The large disparity in yields among cassava-producing regions also suggests there is still potential for large-scale gains. Yields in each cassava-producing country are plotted relative to the maximum achieved in 2017 (32 tons/ha, Lao People’s Democratic Republic) (Source: FAOSTAT, December 2019)
Cassava AM560-2 reference genome assemblies
| | v4.1 | v5.1 | v6.1 | v7.1 |
|---|---|---|---|---|
| Release | 2009 | 2014 | 2016 | 2019 |
| Primary sequence technology | 454 | 454 | Illumina | PacBio |
| Primary scaffolding data | 454 mate pair | Composite genetic map | Illumina mate pair, fosmid, genetic maps | Illumina mate pair, fosmid, genetic maps |
| Total scaffold length | 533 Mb | 534 Mb | 582 Mb | 669 Mb |
| Total contig length | 419 Mb | 419 Mb | 496 Mb | 667 Mb |
| Number of chromosomal scaffolds | 0 | 18 | 18 | 18 |
| Bases in chromosomes | 0 | 305 Mb | 444 Mb | 606 Mb |
| Contig N50 length (contiguity) | 11 kb | 11 kb | 27 kb | 693 kb |
| Annotated genes | 30,666 | 30,666 | 33,033 | 33,849 |
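
The contig N50 row in the table above summarizes contiguity: it is the length such that contigs of at least that length together contain at least half of the assembled bases. A minimal Python sketch of that calculation follows. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from any cassava assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths in bp (not real cassava data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

Because the statistic depends only on the sorted lengths, the same function could be applied to scaffold lengths to obtain a scaffold N50.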
Fig. 2 Reference genome assembly strategies. a Generating a haploid representation (reference assembly) of a diploid inbred genome is relatively straightforward. Due to homozygosity, sequence reads from the two haplotypes assemble together. b The heterozygosity present in a diploid outbred genome means that sequences from maternal and paternal haplotypes (blue and gold) will tend to assemble separately. In this case, to generate a haploid reference assembly, researchers can either combine maternal and paternal contigs into a haploid representation for each chromosome (haplotype-mosaic reference assembly), or they can try to fully assemble the maternal and paternal chromosomes, choosing one or the other to represent each chromosome in the reference (haplotype-phased reference assembly). Gray, assembly gaps
Fig. 3 Repeats, genes, and recombination frequency in the AM560-2 v7 cassava genome. Repeat density (light blue lines), gene count (blue lines), and recombination rate (gold lines) are plotted. Genic regions are anticorrelated with repetitive regions (Y-axis). Regions with low recombination frequency tend to co-occur with areas of high repeat density; thus, these hard-to-assemble regions also tend not to benefit from scaffolding information provided by a genetic map. Repeat density is measured as the fraction of bases that are annotated as repetitive in 1 Mb sliding windows sampled every 100 kb along the AM560-2 v7 chromosomes. The gene count was also taken with 1 Mb sliding windows every 100 kb. Recombination rate is measured as the number of recombinations per 1 Mb sliding window (100 kb step) using the first derivative of a natural cubic spline-smoothed fit line to the ICGMC 2014 framework map anchored to the v7 genome sequence. The marker positions of the framework map are plotted with vertical black ticks below the X-axis
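
The windowed statistics described in the Fig. 3 caption (1 Mb windows stepped every 100 kb) can be illustrated with a short Python sketch. This is not the authors' code. The function names, the half-open coordinate convention, the choice to emit only full-length windows, the use of gene start positions for counting, and the toy coordinates are all assumptions made for illustration. Repeat intervals are assumed to be already merged so that they do not overlap.

```python
def sliding_windows(chrom_length, window=1_000_000, step=100_000):
    """Yield half-open (start, end) windows; only full-length windows are emitted."""
    start = 0
    while start + window <= chrom_length:
        yield start, start + window
        start += step


def repeat_fraction(repeat_intervals, start, end):
    """Fraction of bases in [start, end) covered by repeat intervals.

    Assumes half-open, non-overlapping (pre-merged) intervals.
    """
    covered = 0
    for r_start, r_end in repeat_intervals:
        overlap = min(end, r_end) - max(start, r_start)
        if overlap > 0:
            covered += overlap
    return covered / (end - start)


def gene_count(gene_starts, start, end):
    """Number of genes whose start position falls in [start, end)."""
    return sum(1 for pos in gene_starts if start <= pos < end)


# Hypothetical toy chromosome (30 bp, 10 bp windows, 5 bp step), not real data.
toy_repeats = [(0, 5), (12, 18)]
toy_genes = [2, 7, 21, 25]
for w_start, w_end in sliding_windows(30, window=10, step=5):
    print(w_start, w_end,
          repeat_fraction(toy_repeats, w_start, w_end),
          gene_count(toy_genes, w_start, w_end))
```

Because consecutive windows overlap by 90% at the caption's settings, neighboring values are strongly correlated, which is what gives the plotted tracks their smooth appearance.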
Accessing key cassava genomic resources
| Resource | Portal | URL |
|---|---|---|
| AM560-2 v6.1 | Phytozome | |
| AM560-2 v7.1 | Phytozome | |
| W14 assembly (unresolved Manihot) | NCBI | |
| KU50 assembly | NCBI | |
| TME3 assembly | NCBI | |
| 60444 assembly | NCBI | |
| HapMapI SNV calls (download) | Phytozome | |
| Genomic and phenotypic data and breeding metadata | Cassavabase | |
| ICGMC genetic map viewer | Cassavabase | |
| HapMapII sequence and SNP data | Cassavabase | |
| HapMapII SNPs (browser) | Cassavabase | |
| Bart Lab Cassava Atlas | none | |