Simon Prochnik, Pradeep Reddy Marri, Brian Desany, Pablo D Rabinowicz, Chinnappa Kodira, Mohammed Mohiuddin, Fausto Rodriguez, Claude Fauquet, Joseph Tohme, Timothy Harkins, Daniel S Rokhsar, Steve Rounsley.
Abstract
The starchy swollen roots of cassava provide an essential food source for nearly a billion people, as well as possibilities for bioenergy, yet improvements to nutritional content and resistance to threatening diseases are currently impeded. A 454-based whole genome shotgun sequence has been assembled, which covers 69% of the predicted genome size and 96% of protein-coding gene space, with genome finishing underway. The predicted 30,666 genes and 3,485 alternate splice forms are supported by 1.4 M expressed sequence tags (ESTs). Maps based on simple sequence repeats (SSRs) and EST-derived single nucleotide polymorphisms (SNPs) already exist. Thanks to the genome sequence, a high-density linkage map is currently being developed from a cross between two diverse cassava cultivars: one susceptible to cassava brown streak disease, the other resistant. An efficient genotyping-by-sequencing (GBS) approach is being developed to catalog SNPs both within the mapping population and among diverse African farmer-preferred varieties of cassava. These resources will accelerate marker-assisted breeding programs, allowing improvements in disease resistance and nutrition, and will help us understand the genetic basis for disease resistance.
Year: 2012 PMID: 22523606 PMCID: PMC3322327 DOI: 10.1007/s12042-011-9088-z
Source DB: PubMed Journal: Trop Plant Biol ISSN: 1935-9756 Impact factor: 1.512
Fig. 1 (a) Overview of whole genome shotgun sequencing and assembly. Starting with plant material, many genomes' worth of DNA is extracted, purified, fragmented, pooled by length and sequenced to a high level of redundancy, with the aim of sequencing every region of the genome so that the chromosomal sequence can be generated (assembled) by overlapping fragments that have (near-)identical sequences. Longer-range, paired-end sequence information is used to bridge sections of the genome that are not unique (repeats) and are impossible to resolve by overlap alone. (b) The Phytozome genome browser (http://www.phytozome.net/cassava) provides a portal for accessing, browsing, searching and downloading all available cassava sequence and annotation data, and for comparative plant genomic analysis
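The overlap step described in the Fig. 1 caption can be illustrated with a toy example. This is a minimal sketch only: the function names `overlap` and `merge` and the `min_len` parameter are hypothetical, it uses exact suffix-prefix matching (real assemblers tolerate near-identical overlaps and sequencing errors), and it is not the assembler used for the cassava genome.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str, min_len: int = 3):
    """Join two reads on their exact overlap, or return None if they do not overlap."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


# Two made-up reads sharing the 4-base overlap "TGCA".
contig = merge("ACGTTGCA", "TGCAGGT")
```

When the overlapping sequence occurs in more than one place in the genome (a repeat), several merges are equally valid, which is why the caption notes that paired-end information is needed to bridge such regions.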
Cassava genomic and mRNA sequence data
| Sequence type | Source | Technology | Sequences generated | Notes |
|---|---|---|---|---|
| Genome shotgun | Roche | 454 Titanium | 39,259,112 | |
| Genome shotgun | Roche | 454 FLX Plus (experimental) | 10,785,244 | |
| Genome shotgun | JGI | 454 Titanium | 21,581,680 | |
| Genome shotgun | JGI | Sanger | 723,958 | |
| Genome shotgun | U. Maryland & UC Davis | Sanger | 75,748 | BAC-end seq. |
| Expressed sequence tags (ESTs) | Roche | 454 Titanium | 1.51 M reads (leaf) | 0.30 M after removing chloroplast and rDNA sequences |
| Expressed sequence tags (ESTs) | Roche | 454 Titanium | 1.19 M reads (root) | |
| Expressed sequence tags (ESTs) | various | Sanger | 80,459 | |
M. esculenta v. 4 assembly and repeat statistics
| Statistic | Value |
|---|---|
| Total assembled scaffold length | 532.5 Mb |
| Total assembled contig sequence length | 419.5 Mb (21% gaps) |
| Total number of scaffolds | 12,977 |
| Half of the assembly is in scaffolds longer than | 258.1 kbp (487 scaffolds) |
| Annotated repetitive portion of assembly | 37.5% |
| Gypsy-related sequence | 140 Mb (10 M reads) |
| rDNA sequence | 54 Mb (3.9 M reads, 6 k copies) |
| polygalacturonase gene sequence | 12 Mb (850 k reads, 3 k copies) |
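Two of the statistics above are simple to reproduce from a list of scaffold lengths: the "half of the assembly is in scaffolds longer than" figure (commonly called N50, with the scaffold count called L50) and the gap percentage. The sketch below shows one way these could be computed; the function names are hypothetical and the scaffold lengths in the usage lines are made up for illustration, since the per-scaffold lengths are not given here.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total, taken from
    longest to shortest, first reaches half of the total assembly length.
    L50 is the number of scaffolds needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


def gap_fraction(scaffold_total, contig_total):
    """Fraction of the scaffold length that is gap rather than contig sequence."""
    return (scaffold_total - contig_total) / scaffold_total


# Illustrative lengths only (not cassava data).
toy_n50, toy_l50 = n50_l50([80, 70, 50, 40, 30, 20, 10])

# Totals from the table, in Mb: 532.5 scaffold, 419.5 contig.
gaps_percent = round(gap_fraction(532.5, 419.5) * 100, 1)
```

With the table's totals, the gap fraction works out to about 21.2%, in line with the "21% gaps" entry.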
Protein-coding annotation comparisons to cassava
| | cassava v. 4.1 | castor bean (TIGR v. 0.1) | Arabidopsis | soybean (JGI v. 1) | rice (MSU v. 6.0) |
|---|---|---|---|---|---|
| Protein-coding gene loci | 30,666 | 31,221 | 27,416 | 46,367 | 40,838 |
| Alternate transcripts | 3,485 | 0 | 7,970 | 9,420 | 10,420 |
| Genes supported by one or more ESTs over >75% of length | 11,526 | 3,262 | 16,931 | 9,431 | 18,364 |
| Avg. number of exons/transcript | 5.1 | 4.2 | 5.3 | 6.0 | 4.3 |
| Median exon length (bp) | 148 | 142 | 155 | 142 | 163 |
| Median intron length (bp) | 166 | 169 | 99 | 185 | 176 |
| Transcripts with Pfam domain annotation (KOG orthology) | 20,641 (12,307) | 16,720 (9,321) | 19,419 (12,184) | 34,065 (20,601) | 20,766 (9,933) |
The protein-coding gene annotations in cassava are compared to castor bean, Arabidopsis, soybean and rice
Fig. 2 (a) Circos plot showing cassava-castor bean colinearity. A sample 2.7 Mb region of the cassava genome was aligned against castor bean scaffolds with the PROmer tool from MUMmer v3.2, and visualized with Circos. Colored segments of the circle represent cassava, while greyscale represents the corresponding segments of the castor bean genome. Green blocks in the outer ring are contigs within the 2.7 Mb scaffold. The second ring represents repeat content in grey blocks, and the inner ring represents genes in red blocks. Grey lines linking cassava segments to castor bean segments represent homology at the protein level, and the fact that these lines do not cross over each other indicates colinearity. The central portion of the cassava scaffold, highlighted in yellow, is zoomed in to show more detail. (b) The same 2.7 Mb cassava scaffold as in (a) (top), aligned against other regions of the cassava genome that are highly similar (bottom). This demonstrates the presence of a duplication in the cassava genome