Agnes P Chan, Jonathan Crabtree, Qi Zhao, Hernan Lorenzi, Joshua Orvis, Daniela Puiu, Admasu Melake-Berhan, Kristine M Jones, Julia Redman, Grace Chen, Edgar B Cahoon, Melaku Gedil, Mario Stanke, Brian J Haas, Jennifer R Wortman, Claire M Fraser-Liggett, Jacques Ravel, Pablo D Rabinowicz.
Abstract
Castor bean (Ricinus communis) is an oilseed crop that belongs to the spurge (Euphorbiaceae) family, which comprises approximately 6,300 species that include cassava (Manihot esculenta), rubber tree (Hevea brasiliensis) and physic nut (Jatropha curcas). It is primarily of economic interest as a source of castor oil, used for the production of high-quality lubricants because of its high proportion of the unusual fatty acid ricinoleic acid. However, castor bean genomics is also relevant to biosecurity as the seeds contain high levels of ricin, a highly toxic, ribosome-inactivating protein. Here we report the draft genome sequence of castor bean (4.6-fold coverage), the first for a member of the Euphorbiaceae. Whereas most of the key genes involved in oil synthesis and turnover are single copy, the number of members of the ricin gene family is larger than previously thought. Comparative genomics analysis suggests the presence of an ancient hexaploidization event that is conserved across the dicotyledonous lineage.
Year: 2010 PMID: 20729833 PMCID: PMC2945230 DOI: 10.1038/nbt.1674
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908
Genome assembly and annotation statistics
| | All scaffolds | Scaffolds longer |
|---|---|---|
| Fold genome coverage | 4.59 | 4.59 |
| Number of scaffolds | 25,828 | 3,500 |
| Total span | 350.6 Mb | 325.5 Mb |
| N50 (scaffolds) | 496.5 kb | 561.4 kb |
| Largest scaffold | 4.7 Mb | 4.7 Mb |
| Average scaffold length | 14 kb | 93 kb |
| Number of contigs | 54,000 | 24,500 |
| Largest contig | 190 kb | 190 kb |
| Average contig length | 6 kb | 13 kb |
| N50 (contigs) | 21.1 kb | |
| GC content | 32.5% | |
| Gene models | 31,237 | |
| Gene density | 11,220 bp/gene | |
| Mean gene length | 2,258.6 bp | |
| Mean coding sequence length | 1,004.2 bp | |
| Longest gene | 15,849 bp | |
| Mean number of exons per gene | 4.2 | |
| Mean exon length | 251 bp | |
| Longest exon | 6,590 bp | |
| GC content in exons | 44.5% | |
| Mean intron length | 381 bp | |
| Longest intron | 33,291 bp | |
| GC content in introns | 31.8% | |
| Mean intergenic region length | 6,846 bp | |
| Longest intergenic region | 691,597 bp | |
| GC content in intergenic regions | 30.7% | |
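The N50 values in the table summarize assembly contiguity: N50 is the length such that sequences at least that long together cover half of the total span. The following is a minimal sketch of that calculation, assuming a plain list of sequence lengths; the function name `n50` and the toy lengths are illustrative and are not taken from the castor bean assembly or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly span.
    """
    ordered = sorted(lengths, reverse=True)
    half_span = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_span:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy lengths (hypothetical, not from the castor bean assembly).
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))  # prints 80
```

With the toy input, the total span is 300, so the running sum first reaches half of it (150) at the second-longest sequence.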
Classification of repetitive sequences
| | Length occupied (bp) | % of total repeats | % of genome |
|---|---|---|---|
| Retrotransposons | 61,199,930 | 36.07 | 18.16 |
| Gypsy | 38,595,566 | 22.75 | 11.45 |
| Copia | 16,078,721 | 9.48 | 4.77 |
| LINE | 465,220 | 0.27 | 0.14 |
| SINE | 1,867 | 0.00 | 0.00 |
| Other | 6,058,556 | 3.57 | 1.80 |
| Unclassified elements | 105,387,872 | 62.12 | 31.26 |
| DNA transposons | 3,065,391 | 1.81 | 0.91 |
| Total transposable elements | 169,653,193 | 100.00 | 50.33 |
| Low complexity sequences | 6,348,051 | 0.95 | 1.88 |
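The three top-level repeat classes can be checked against the transposable-element total with simple arithmetic. The sketch below recomputes the total and each class's share from the "Length occupied (bp)" column only; the variable names are illustrative, and this is a consistency check on the reported lengths rather than a reproduction of the authors' repeat annotation.

```python
# Lengths in bp, copied from the "Length occupied (bp)" column above.
classes = {
    "Retrotransposons": 61_199_930,
    "Unclassified elements": 105_387_872,
    "DNA transposons": 3_065_391,
}

# Sum of the three top-level classes.
total_te = sum(classes.values())

# Share of each class within all transposable elements, in percent.
shares = {name: round(100 * bp / total_te, 2) for name, bp in classes.items()}

print(total_te)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The recomputed total matches the 169,653,193 bp reported for total transposable elements, and the three shares round to 36.07, 62.12 and 1.81.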
Figure 1. Reciprocal best BLAST matches between castor bean genes
Strings of paralogous genes that correspond to triplicated regions are highlighted in the same color. The 30 pairs of scaffolds that contained the highest numbers of paralogous gene pairs are shown.
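As an illustration of what "reciprocal best matches" means in a self-versus-self comparison, here is a minimal sketch that pairs genes whose top-scoring non-self hits point at each other. The input format (query, subject, score tuples), the function names, and the toy gene names are assumptions for illustration; the search parameters and filtering actually used for the figure are not stated here.

```python
def best_hits(hits):
    """Map each query to its highest-scoring non-self subject."""
    best = {}
    for query, subject, score in hits:
        if query == subject:
            continue  # ignore a gene matching itself
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_pairs(hits):
    """Return sorted gene pairs that are each other's best hit."""
    best = best_hits(hits)
    pairs = set()
    for a, b in best.items():
        if best.get(b) == a:
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


# Toy hit table (hypothetical gene names and scores).
toy_hits = [
    ("geneA", "geneA", 500.0),
    ("geneA", "geneB", 300.0),
    ("geneA", "geneC", 120.0),
    ("geneB", "geneA", 295.0),
    ("geneB", "geneC", 110.0),
    ("geneC", "geneA", 118.0),
    ("geneC", "geneB", 90.0),
]
print(reciprocal_best_pairs(toy_hits))  # prints [('geneA', 'geneB')]
```

In the toy data, geneC's best hit is geneA, but geneA's best hit is geneB, so geneC is not part of any reciprocal pair.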
Figure 2. Collinearity between three paralogous castor bean genomic regions and their putative orthologues in other dicot genomes
a) An example of a conserved paralogous triplication in the castor bean genome. b–e) Putative orthologous gene pairs are shown as colored lines connecting the castor bean scaffolds (noted as Rc:scaffold number) to chromosomes or scaffolds in the other dicot genome. In most cases, one copy of the paralogous castor bean genes corresponds to two genes in poplar (b), one gene in grapevine (c), and four genes in Arabidopsis (d). The castor bean-papaya relationship (e) is inconclusive. Numbers around the circles correspond to linkage group numbers (b), chromosome numbers (c and d), or scaffold numbers (e). Grapevine scaffolds that were mapped to chromosomes but whose exact location is unknown are noted with an "r" (random). The castor bean genomic regions are drawn to the same scale in all circles. Additional castor bean paralogous regions and their corresponding orthologues from other dicots are shown in Supplementary Figure 3.
Figure 3. Schematic representation of the members of the ricin/RCA lectin gene family in castor bean
Ricin protein domains are represented at the top by blue boxes, and gray boxes represent protein sequences from this gene family aligned to the ricin precursor protein sequence used as reference. The ruler indicates the amino acid coordinates. The ricin and RCA genes are indicated, and the amino acid sequence length for each gene model is shown in parentheses. Pairs of adjacent gene models that could belong to a single pseudogene are shown in gray.