Hideki Hirakawa¹, Kenta Shirasawa¹, Koji Miyatake², Tsukasa Nunome², Satomi Negoro², Akio Ohyama², Hirotaka Yamaguchi², Shusei Sato¹, Sachiko Isobe¹, Satoshi Tabata¹, Hiroyuki Fukuoka³.
Abstract
Unlike other important Solanaceae crops such as tomato, potato, chili pepper, and tobacco, all of which originated in South America and are cultivated worldwide, eggplant (Solanum melongena L.) is indigenous to the Old World and in this respect it is phylogenetically unique. To broaden our knowledge of the genomic nature of solanaceous plants further, we dissected the eggplant genome and built a draft genome dataset with 33,873 scaffolds termed SME_r2.5.1 that covers 833.1 Mb, ca. 74% of the eggplant genome. Approximately 90% of the gene space was estimated to be covered by SME_r2.5.1 and 85,446 genes were predicted in the genome. Clustering analysis of the predicted genes of eggplant along with the genes of three other solanaceous plants as well as Arabidopsis thaliana revealed that, of the 35,000 clusters generated, 4,018 were exclusively composed of eggplant genes that would perhaps confer eggplant-specific traits. Between eggplant and tomato, 16,573 pairs of genes were deduced to be orthologous, and 9,489 eggplant scaffolds could be mapped onto the tomato genome. Furthermore, 56 conserved synteny blocks were identified between the two species. The detailed comparative analysis of the eggplant and tomato genomes will facilitate our understanding of the genomic architecture of solanaceous plants, which will contribute to cultivation and further utilization of these crops.
Keywords: Solanum melongena L.; comparative analysis; eggplant; gene prediction; genome sequencing
Year: 2014 PMID: 25233906 PMCID: PMC4263298 DOI: 10.1093/dnares/dsu027
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Strategy for genome sequencing and hybrid assembly.
Statistics of the eggplant genome assembly SME_r2.5.1
| | | SME_r2.5.1 |
|---|---|---|
| Total | Number of sequences | 33,873 |
| | Total length (bp) | 833,108,131 |
| | Average length (bp) | 24,595 |
| | Maximum length (bp) | 629,958 |
| | Minimum length (bp) | 473 |
| | N50 length (bp) | 64,536 |
| | A | 255,484,950 |
| | T | 254,643,398 |
| | G | 141,325,886 |
| | C | 142,070,567 |
| | N | 39,583,330 |
| | G+C% | 35.7 |
| ≥500 bp | Number of sequences | 33,872 |
| | Total length (bp) | 833,107,658 |
| | Average length (bp) | 24,596 |
| ≥1 kb | Number of sequences | 30,983 |
| | Total length (bp) | 831,088,565 |
| | Average length (bp) | 26,824 |
| ≥5 kb | Number of sequences | 21,443 |
| | Total length (bp) | 804,313,164 |
| | Average length (bp) | 37,509 |
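The summary statistics in the table can be cross-checked from the base counts alone. The following is a minimal sketch, not the authors' pipeline: the names `BASE_COUNTS`, `gc_percent`, and `n50` are hypothetical, and it assumes that G+C% is computed over called bases only (excluding N), which is the reading that reproduces the reported 35.7. The `n50` helper uses the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length) and is shown on toy lengths only, since the per-scaffold lengths are not given here.

```python
# Base counts and sequence number copied from the SME_r2.5.1 table.
BASE_COUNTS = {
    "A": 255_484_950,
    "T": 254_643_398,
    "G": 141_325_886,
    "C": 142_070_567,
    "N": 39_583_330,
}
NUM_SEQUENCES = 33_873


def total_length(counts):
    """Total assembly length: sum over all base categories, including N."""
    return sum(counts.values())


def gc_percent(counts):
    """G+C percentage over called bases (assumption: N is excluded)."""
    called = sum(counts[base] for base in "ATGC")
    return 100.0 * (counts["G"] + counts["C"]) / called


def n50(lengths):
    """Standard N50: walk sequences from longest to shortest and return
    the length at which the running sum reaches half the total."""
    if not lengths:
        raise ValueError("n50 requires at least one sequence length")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


total = total_length(BASE_COUNTS)
print(total)                                 # 833108131
print(f"{gc_percent(BASE_COUNTS):.1f}")      # 35.7
print(round(total / NUM_SEQUENCES))          # 24595
print(n50([2, 3, 4, 5, 6]))                  # toy lengths, prints 5
```

The base counts sum exactly to the reported total length, and the mean length agrees with the reported average, so the "Total" rows of the table are internally consistent.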
Figure 2. Cluster analysis of the predicted gene sequences. The numerals inside and outside the brackets in each compartment represent the number of genes and number of clusters, respectively.