Haiyang Zhang, Hongmei Miao, Lei Wang, Lingbo Qu, Hongyan Liu, Qiang Wang, Meiwang Yue.
Abstract
The Sesame Genome Working Group (SGWG) has been formed to sequence and assemble the sesame (Sesamum indicum L.) genome. The status of this project and our planned analyses are described.
Year: 2013 PMID: 23369264 PMCID: PMC3663098 DOI: 10.1186/gb-2013-14-1-401
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Phylogenetic positions of sesame and the 36 land plants with available genome sequences. ^a Refers to sesame (S. indicum L.), a member of the Pedaliaceae family, only 34 genera of which have been entered in the NCBI taxonomy database.
Summary of Illumina data for the S. indicum genome
| Sequencing platform | Library type (n) | Insert size (bp) | Usable bases (Gb) |
|---|---|---|---|
| Illumina genome analyzer (Solexa) | Paired-end (12) | 300 | 28.12 |
| | | 500 | 44.51 |
| | Mate-pair (5) | 2,000 | 7.23 |
| | | 3,000 | 7.74 |
| | | 5,000 | 10.65 |
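As a rough sanity check on the table above, the usable bases can be summed and divided by the 354 Mb estimated genome size reported for the draft assembly, which gives an approximate raw sequencing depth. The sketch below is illustrative only; the names `usable_gb` and `sequencing_depth` are assumptions and not part of the project's pipeline.

```python
# Usable bases (Gb) keyed by library insert size (bp), copied from the table above.
usable_gb = {300: 28.12, 500: 44.51, 2000: 7.23, 3000: 7.74, 5000: 10.65}


def sequencing_depth(total_gb, genome_mb):
    """Approximate fold depth: total sequenced bases divided by genome size."""
    return (total_gb * 1000.0) / genome_mb  # convert Gb to Mb before dividing


total_gb = sum(usable_gb.values())
depth = sequencing_depth(total_gb, 354.0)  # 354 Mb is the reported size estimate
print(f"{total_gb:.2f} Gb usable, roughly {depth:.0f}x raw depth")
```

Under these figures the libraries sum to about 98 Gb, or a few hundred-fold raw depth, before any filtering the assembler may apply.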
Overview of the current draft assembly of S. indicum
| Estimated genome size (Mb) | Genome assembly length (Mb) | Estimated coverage (%) | Contig N50 (kb) | Contig N90 (kb) | GC (%) | Scaffold N50 (kb) | Scaffold N90 (kb) |
|---|---|---|---|---|---|---|---|
| 354 | 293.7 | 82.9 | 19.0 | 3.9 | 34.6 | 22.6 | 4.3 |
Note: these statistics assume a genome size of 354 Mb. GC, guanine (G) + cytosine (C).
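For readers unfamiliar with the N50 and N90 statistics in this table, the following minimal sketch shows the usual definition: sort sequence lengths from longest to shortest and report the length at which the running total first reaches the given fraction of the assembly. The function name `nx` and the toy lengths are hypothetical; the actual tool used to compute the reported values is not stated here.

```python
def nx(lengths, fraction):
    """Return the Nx length for a list of contig or scaffold lengths.

    Nx is the length L such that sequences of length >= L together
    cover at least `fraction` of the total assembled length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    target = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length


# Toy contig lengths (hypothetical, not sesame data).
toy_lengths = [10, 8, 6, 4, 2]
n50 = nx(toy_lengths, 0.5)
n90 = nx(toy_lengths, 0.9)
print(n50, n90)
```

Because N90 requires covering more of the assembly, it is never larger than N50, which matches the pattern in the reported contig and scaffold values.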
Figure 2. K-mer (17-mer) frequency analysis of the S. indicum genome. Data were produced from the 500 bp insert libraries. The peak k-mer frequency is 39 and the minimum point is at 10. Genome size was estimated with the formula: Estimated genome size (bp) = (total number of k-mers with a frequency > 10) / (peak k-mer frequency).
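The formula in the caption can be sketched in a few lines. This example assumes the k-mer counts are available as a histogram mapping each frequency to the number of distinct k-mers seen at that frequency, and that "total number of k-mers" means total occurrences (frequency times count). Both are assumptions, as are the function name and the toy values.

```python
def estimate_genome_size(histogram, peak_freq, min_freq):
    """Estimate genome size (bp) from a k-mer frequency histogram.

    histogram: dict mapping k-mer frequency -> number of distinct k-mers
               observed at that frequency (assumed input format).
    peak_freq: frequency at the main peak of the distribution.
    min_freq:  k-mers at or below this frequency are excluded as
               likely sequencing errors.
    """
    total_kmers = sum(
        freq * count for freq, count in histogram.items() if freq > min_freq
    )
    return total_kmers / peak_freq


# Toy histogram (hypothetical values, not the sesame data).
toy_hist = {1: 5000, 10: 40, 38: 90, 39: 100, 40: 95}
size = estimate_genome_size(toy_hist, peak_freq=39, min_freq=10)
print(round(size, 1))
```

The strict `freq > min_freq` comparison mirrors the "> 10" cutoff in the caption, so k-mers at exactly the minimum point are excluded.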
Repeats derived from de novo and homology-based predictions in S. indicum
| Repeat type | Repeat number | Length occupied (bp) | Percentage of sequences |
|---|---|---|---|
| Retroelements | 18,322 | 5,811,328 | 1.98 |
| SINEs | 8 | 328 | 0.00 |
| LINEs | 2,266 | 374,709 | 0.13 |
| LTR elements | 16,048 | 5,436,291 | 1.85 |
| DNA transposons | 3,349 | 571,933 | 0.19 |
| hobo-Activator | 305 | 43,075 | 0.01 |
| Tc1-IS630-Pogo | 1,232 | 155,117 | 0.05 |
| En-Spm | 96 | 55,227 | 0.02 |
| MuDR-IS905 | 2 | 347 | 0.00 |
| Total bases masked | | 16,852,950 | 5.74 |
| Unclassified repeats^a | 835,752 | 92,380,494 | 31.65 |
| Total interspersed repeats | | 92,380,494 | 31.65 |
^a Unclassified repeats refer to predicted repeats (sequences in the de novo repeats library) that cannot be classified by RepeatMasker.
Predicted genes in S. indicum
| Gene number | Average gene length (kb) | Average number of introns per gene | CDS GC (%) | Average length of introns (bp) | Average length of exons (bp) | Average length of CDS (kb) |
|---|---|---|---|---|---|---|
| 23,713 | 2.9 | 4.3 | 45 | 399.4 | 227.4 | 1.2 |
CDS, coding sequence; GC, guanine (G) + cytosine (C).
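The averages in this table can be cross-checked against one another. The sketch below assumes that exon length is given in bp and CDS length in kb, that a gene with n introns has n + 1 exons, and that exons are entirely coding (untranslated regions are ignored). These are simplifying assumptions for illustration, not statements about how the annotation was produced.

```python
# Averages copied from the table above.
introns_per_gene = 4.3
intron_bp = 399.4
exon_bp = 227.4  # unit assumed to be bp

# A gene with n introns has n + 1 exons.
exons_per_gene = introns_per_gene + 1

# Approximate CDS length, assuming every exon is fully coding.
cds_bp = exons_per_gene * exon_bp

# Approximate gene length: coding exons plus introns.
gene_bp = cds_bp + introns_per_gene * intron_bp

print(f"CDS ~{cds_bp / 1000:.1f} kb, gene ~{gene_bp / 1000:.1f} kb")
```

Under these assumptions the derived values come out close to the reported 1.2 kb average CDS length and 2.9 kb average gene length.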
Figure 3. Functional catalogues of sesame genes in the preliminary assembly. Results are summarized in three main categories: biological processes, cellular components and molecular functions. A total of 10,656 genes have been assigned Gene Ontology terms.