Hansheng Zhao1, Songbo Wang2,3, Jiongliang Wang1, Chunhai Chen2, Shijie Hao4, Lianfu Chen1, Benhua Fei1, Kai Han4, Rongsheng Li5, Chengcheng Shi4, Huayu Sun1, Sining Wang1, Hao Xu1, Kebin Yang1, Xiurong Xu1, Xuemeng Shan1, Jingjing Shi1, Aiqin Feng2, Guangyi Fan4, Xin Liu4,6, Shancen Zhao2,3, Chi Zhang2,3, Qiang Gao2, Zhimin Gao1, Zehui Jiang1.
Abstract
Background: Calamus simplicifolius and Daemonorops jenkinsiana are two representative rattans, the most significant material sources for the rattan industry. However, the lack of reference genome sequences is a major obstacle for basic and applied biology on rattan.

Findings: We produced two chromosome-level genome assemblies of C. simplicifolius and D. jenkinsiana using Illumina, Pacific Biosciences, and Hi-C sequencing data. A total of ∼730 Gb and ∼682 Gb of raw data covered the predicted genome lengths (∼1.98 Gb of C. simplicifolius and ∼1.61 Gb of D. jenkinsiana) to ∼372× and ∼426× read depths, respectively. The two de novo genome assemblies, ∼1.94 Gb and ∼1.58 Gb, were generated with scaffold N50s of ∼160 Mb and ∼119 Mb in C. simplicifolius and D. jenkinsiana, respectively. The C. simplicifolius and D. jenkinsiana genomes were predicted to harbor 51,235 and 53,342 intact protein-coding gene models, respectively. Benchmarking Universal Single-Copy Orthologs evaluation demonstrated that genome completeness reached 96.4% and 91.3% in the C. simplicifolius and D. jenkinsiana genomes, respectively. Genome evolution showed that four Arecaceae plants clustered together, and the divergence time between the two rattans was ∼19.3 million years ago. Additionally, we identified 193 and 172 genes involved in the lignin biosynthesis pathway in the C. simplicifolius and D. jenkinsiana genomes, respectively.

Conclusions: We present the first de novo assemblies of two rattan genomes (C. simplicifolius and D. jenkinsiana). These data will not only provide a fundamental resource for functional genomics, particularly in promoting germplasm utilization for breeding, but also serve as reference genomes for comparative studies between and among different species.
Year: 2018 PMID: 30101322 PMCID: PMC6117794 DOI: 10.1093/gigascience/giy097
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Morphological characteristics of C. simplicifolius and D. jenkinsiana. The pictures in series A and B display the different morphological characteristics of C. simplicifolius and D. jenkinsiana, respectively. (a1) A young C. simplicifolius; (a2) a developing C. simplicifolius; (a3) a climbing C. simplicifolius; (a4) a mature C. simplicifolius; (a5) a nursery of C. simplicifolius; (b1) a young D. jenkinsiana; (b2) a young forest of D. jenkinsiana; (b3) a nursery of D. jenkinsiana; (b4) leaves of D. jenkinsiana; (b5) inflorescences of D. jenkinsiana; (b6) young fruits of D. jenkinsiana. All photos were taken by Prof. Rongsheng Li.
Statistics of the clean data of the C. simplicifolius and D. jenkinsiana genomes

| Sequencing platform | Insert size | C. simplicifolius: read length (bp) | C. simplicifolius: total data (Gb) | C. simplicifolius: sequence depth (×)ᵃ | D. jenkinsiana: read length (bp) | D. jenkinsiana: total data (Gb) | D. jenkinsiana: sequence depth (×)ᵃ |
|---|---|---|---|---|---|---|---|
| Illumina | 270 bp | 150 | 160.9 | 82.09 | 150 | 98.21 | 61.38 |
| Illumina | 500 bp | 125 | 60.2 | 30.71 | 125 | 56.9 | 35.56 |
| Illumina | 800 bp | 125 | 101.2 | 51.63 | 125 | 89.47 | 55.91 |
| Illumina | 2 Kb | 49 | 22.8 | 11.63 | 49 | 33.08 | 20.67 |
| Illumina | 5 Kb | 49 | 16.4 | 8.37 | 49 | 22.1 | 13.81 |
| Illumina | 10 Kb | 49 | 26.8 | 13.67 | 49 | 32.63 | 20.39 |
| Illumina | 20 Kb | 49 | 27.4 | 13.98 | 49 | 15.4 | 9.6 |
| PacBio | 20 Kb | 9,079ᵇ | 78.38 | 39.99 | 9,131ᵇ | 78.38 | 48.75 |
| Hi-C | N.A. | 100 | 6.7 | 3.42 | 100 | 13.1 | 8.19 |
| Total | | | 500.78 | 255.5 | | 439.27 | 274.26 |

ᵃ Sequencing depth was calculated based on a 1.98 Gb C. simplicifolius genome and a 1.61 Gb D. jenkinsiana genome.
ᵇ Read length for PacBio is the average read length.
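
The depth calculation described in the table footnote (total data divided by estimated genome size) can be sketched as follows. This is only an illustration: the function name `sequencing_depth` and the dictionaries are hypothetical, and the genome sizes are the rounded 1.98 Gb and 1.61 Gb figures quoted in the footnote, so the results may differ slightly from the tabulated depths.

```python
# Minimal sketch: sequencing depth = total sequenced bases / genome size.
# The genome sizes below are the rounded estimates quoted in the footnote;
# they are an assumption about the exact values used for the table.

def sequencing_depth(total_data_gb: float, genome_size_gb: float) -> float:
    """Return fold coverage (x) for a given data volume and genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_data_gb / genome_size_gb


GENOME_SIZE_GB = {"C. simplicifolius": 1.98, "D. jenkinsiana": 1.61}
TOTAL_CLEAN_DATA_GB = {"C. simplicifolius": 500.78, "D. jenkinsiana": 439.27}

if __name__ == "__main__":
    for species, size_gb in GENOME_SIZE_GB.items():
        depth = sequencing_depth(TOTAL_CLEAN_DATA_GB[species], size_gb)
        print(f"{species}: ~{depth:.1f}x clean-data depth")
```

The same function applies to any single row of the table, for example one Illumina library, by passing that row's total data value.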
Metrics of the final assemblies of the C. simplicifolius and D. jenkinsiana genomes

| Category | Item | C. simplicifolius: hybrid assemblyᵃ | C. simplicifolius: Hi-C assembly | D. jenkinsiana: hybrid assemblyᵃ | D. jenkinsiana: Hi-C assembly |
|---|---|---|---|---|---|
| Contig | Number | 29,973 | 29,973 | 27,631 | 27,631 |
| Contig | Size (bp) | 1,923,260,127 | 1,923,260,127 | 1,570,849,893 | 1,570,849,893 |
| Contig | N50 (bp) | 99,304 | 99,304 | 89,562 | 89,562 |
| Contig | N90 (bp) | 28,872 | 28,872 | 25,720 | 25,720 |
| Scaffold | Number | 29,775 | 5,283 | 27,146 | 5,126 |
| Scaffold | Size (bp) | 1,923,287,712 | 1,935,533,712 | 1,570,878,714 | 1,581,888,714 |
| Scaffold | N50 (bp) | 99,590 | 160,072,219 | 89,705 | 119,093,744 |
| Scaffold | N90 (bp) | 28,922 | 93,668,489 | 25,828 | 61,330,142 |
| Total number | >3 kb | 29,767 | 5,275 | 27,137 | 5,117 |
| Total number | >5 kb | 29,727 | 5,235 | 27,081 | 5,061 |
| Longest sequence (bp) | | 877,470 | 219,145,773 | 1,422,351 | 162,635,149 |
| Shortest sequence (bp) | | 1,286 | 1,286 | 719 | 719 |
| Ratio of ambiguous bases (%) | | 0.0 | 0.6 | 0.0 | 0.7 |
| GC ratio (%) | | 41.07 | 41.07 | 41.78 | 41.78 |

ᵃ Hybrid assembly means de novo assembly using Illumina and PacBio data in our study.
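
For readers unfamiliar with the N50 and N90 metrics reported in the table, the sketch below shows the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least 50% (or 90%) of the assembly. The function name `nxx` and the toy lengths are illustrative assumptions, not values or tooling from the study.

```python
# Minimal sketch of the usual N50/N90 definition for an assembly.
# `nxx` is a hypothetical helper name; the study does not state which
# tool was used to compute these metrics.
from typing import Iterable


def nxx(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nxx value (N50 when fraction=0.5, N90 when fraction=0.9)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    threshold = fraction * sum(ordered)
    running_total = 0
    # Walk from the longest sequence down until the cumulative length
    # reaches the requested fraction of the total assembly size.
    for length in ordered:
        running_total += length
        if running_total >= threshold:
            return length
    return ordered[-1]  # fallback for floating-point edge cases


if __name__ == "__main__":
    toy_scaffolds = [100, 60, 25, 10, 5]  # toy lengths, not real data
    print("N50:", nxx(toy_scaffolds, 0.5))
    print("N90:", nxx(toy_scaffolds, 0.9))
```

Under this definition, anchoring contigs into a few chromosome-scale scaffolds raises the scaffold N50 sharply even though the contig N50 stays the same, which is the pattern seen between the hybrid and Hi-C assembly columns.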
Figure 2: Hi-C contact maps of the C. simplicifolius (a) and D. jenkinsiana (b) genomes. (c) and (d) The Hi-C links on hic_scaffold_4 of C. simplicifolius and hic_scaffold_10 of D. jenkinsiana before (top) and after (bottom) conflict resolution. (e) and (f) The distribution of Hi-C link decay along the genomic distance.
Figure 3: Distribution of the sequence divergence rates of different TE types in the C. simplicifolius (a) and D. jenkinsiana (b) genomes.
Figure 4: The phylogenetic tree, orthologous gene families, and divergence times among C. simplicifolius, D. jenkinsiana, and eight other plants. (a) The phylogenetic tree was constructed with RAxML using all single-copy genes in the 10 species, and the divergence times were estimated using the MCMCTree program in the PAML software package. (b) Clusters of orthologous and paralogous gene families in C. simplicifolius, D. jenkinsiana, and eight other fully sequenced plants, identified using OrthoMCL. (c) The numbers on the nodes are divergence times, and the red nodes indicate the calibration times.
Numbers of genes in gene families of the lignin biosynthesis pathway

| Family | C. simplicifolius | D. jenkinsiana | | | | | | | Total |
|---|---|---|---|---|---|---|---|---|---|
| 4-coumarate CoA ligase | 9 | 13 | 12 | 13 | 12 | 15 | 13 | 16 | 90 |
| Coumarate 3-hydroxylase | 3 | 2 | 3 | 1 | 1 | 3 | 3 | 2 | 15 |
| Cinnamate 4-hydroxylase | 3 | 2 | 1 | 2 | 3 | 6 | 2 | 2 | 19 |
| Cinnamyl alcohol dehydrogenase | 29 | 22 | 9 | 7 | 10 | 14 | 17 | 11 | 102 |
| Caffeoyl-CoA 3-O-methyltransferase | 16 | 5 | 4 | 7 | 6 | 9 | 5 | 5 | 52 |
| Cinnamoyl-CoA reductase | 6 | 6 | 3 | 9 | 12 | 17 | 10 | 11 | 64 |
| Caffeic acid 3-O-methyltransferase | 13 | 16 | 11 | 4 | 6 | 4 | 11 | 5 | 59 |
| Ferulate 5-hydroxylase | 7 | 6 | 1 | 4 | 5 | 16 | 17 | 11 | 50 |
| Hydroxycinnamoyl-CoA | 5 | 4 | 3 | 12 | 6 | 16 | 7 | 13 | 59 |
| Laccase | 29 | 29 | 16 | 22 | 20 | 41 | 47 | 21 | 178 |
| Phenylalanine ammonia-lyase | 2 | 7 | 4 | 9 | 8 | 12 | 5 | 10 | 52 |
| Chalcone synthase | 31 | 17 | 4 | 7 | 17 | 12 | 13 | 27 | 115 |
| Peroxidase | 40 | 43 | 45 | 44 | 37 | 77 | 56 | 42 | 328 |
| Total | 193 | 172 | 116 | 141 | 143 | 242 | 206 | 176 | – |