Shuangshuang Qin1,2, Lingqing Wu3, Kunhua Wei2, Ying Liang2, Zhijun Song2, Xiaolei Zhou2, Shuo Wang2, Mingjie Li1, Qinghua Wu2, Kaijian Zhang3, Yuanyuan Hui3, Shuying Wang3, Jianhua Miao4, Zhongyi Zhang5,6.
Abstract
Spatholobus suberectus Dunn (S. suberectus), which belongs to the Leguminosae, is an important medicinal plant in China. Owing to its long growth cycle and increased use in human medicine, wild resources of S. suberectus have decreased rapidly and may be on the verge of extinction. De novo assembly of the whole S. suberectus genome provides a potentially critical resource for studying the biosynthesis of the plant's main bioactive components and the mechanisms that regulate its seed development. Using several sequencing technologies (Illumina HiSeq X Ten, single-molecule real-time sequencing, and 10x Genomics) together with assembly techniques such as FALCON and chromatin interaction mapping (Hi-C), we assembled a chromosome-scale genome of about 798 Mb. In total, 748 Mb (93.73%) of the contig sequences were anchored onto nine chromosomes, with the longest scaffold being 103.57 Mb. Further annotation analyses predicted 31,634 protein-coding genes, of which 93.9% have been functionally annotated. All data generated in this study are available in public databases.
Year: 2019 PMID: 31273216 PMCID: PMC6609623 DOI: 10.1038/s41597-019-0110-x
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Morphological characteristics of S. suberectus. (a) A picture of an S. suberectus plant. (b) The vine stem of S. suberectus is called “chicken blood vine”. (c) The pod of S. suberectus has only one seed.
The sizes of sequencing data using various sequencing platforms.
| Library | Platform | Insert size | Total data (Gb) | Read length (bp) | Sequence coverage (×) |
|---|---|---|---|---|---|
| Illumina paired-end | Illumina HiSeq | 250 bp | 41.89 | 150 | 52.82 |
| Illumina paired-end | Illumina HiSeq | 450 bp | 35.84 | 150 | 45.20 |
| PacBio reads | PacBio Sequel | 20 kb | 63.27 | - | 79.79 |
| 10x Genomics | Illumina HiSeq | 20 kb | 123.09 | 150 | 155.22 |
| Hi-C | Illumina HiSeq | 350 bp | 233.19 | 150 | 293.92 |
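The coverage column in the table above is consistent with dividing each library's total data by a single genome size. The sketch below illustrates that arithmetic. The genome size of 0.793 Gb is an assumption back-calculated from the table's own ratios; the source does not state which estimate was used, and the names `ASSUMED_GENOME_SIZE_GB` and `coverage` are illustrative only.

```python
# Total data (Gb) and reported coverage (x) per library, copied from the table.
data_gb = {
    "Illumina 250 bp": 41.89,
    "Illumina 450 bp": 35.84,
    "PacBio Sequel": 63.27,
    "10x Genomics": 123.09,
    "Hi-C": 233.19,
}
reported = {
    "Illumina 250 bp": 52.82,
    "Illumina 450 bp": 45.20,
    "PacBio Sequel": 79.79,
    "10x Genomics": 155.22,
    "Hi-C": 293.92,
}

# Assumption: genome size implied by the table's ratios, not given in the source.
ASSUMED_GENOME_SIZE_GB = 0.793


def coverage(total_gb, genome_gb=ASSUMED_GENOME_SIZE_GB):
    """Sequencing depth = total sequenced bases / genome size."""
    return total_gb / genome_gb


coverages = {name: round(coverage(gb), 2) for name, gb in data_gb.items()}

for name, cov in coverages.items():
    # Show the recomputed depth next to the value reported in the table.
    print(f"{name}: computed {cov}x, reported {reported[name]}x")
```

Under this assumed genome size, the recomputed depths stay within about 0.2× of the reported values; the small residual differences are presumably due to rounding in the published table.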
Statistics of PacBio long reads.
| Read type | Total bases (bp) | Number of reads | Max read length (bp) | Mean read length (bp) | Read length N50 (bp) |
|---|---|---|---|---|---|
| Subreads | 63,270,110,556 | 6,710,707 | 122,873 | 9,428 | 14,288 |
Statistics of Hi-C sequencing and mapping.
| Statistics of mapping | Read1 | Read2 |
|---|---|---|
| Total Reads | 10,000,000 | 10,000,000 |
| Unique Alignments | 7,869,514 | 7,702,126 |
| Multiple Alignments | 859,867 | 832,203 |
| Failed To Align | 866,895 | 1,073,869 |
| Unique Mapped Paired-end Reads | 6,056,459 | 6,056,459 |

| Paired-end read category | Count |
|---|---|
| Unique Mapped Paired-end Reads | 6,056,459 |
| Invalid Paired-end Pairs | 1,699,845 |
| Valid Paired-end Reads | 4,356,614 |
| Valid Rate (%) | 43.56 |
| Cis-close (<10 Kbp) | 478,994 |
| Cis-far (>10 Kbp) | 2,313,017 |
| Trans | 1,564,603 |
Cis-close (<10 Kbp): interactions between intrachromosomal read pairs less than 10 kb apart.
Cis-far (>10 Kbp): interactions between intrachromosomal read pairs more than 10 kb apart.
Trans: interactions between interchromosomal read pairs.
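The relationships among the Hi-C counts above can be checked with a few lines of arithmetic. The sketch below uses only numbers from the table; the interpretation that the valid rate is the number of valid pairs divided by the total reads is an assumption that happens to reproduce the reported figure, and the variable names are illustrative.

```python
# Counts copied from the Hi-C mapping table.
total_reads = 10_000_000
unique_pairs = 6_056_459
invalid_pairs = 1_699_845
cis_close = 478_994      # intrachromosomal, < 10 kb apart
cis_far = 2_313_017      # intrachromosomal, > 10 kb apart
trans = 1_564_603        # interchromosomal

# Valid pairs are the uniquely mapped pairs that were not flagged invalid.
valid_pairs = unique_pairs - invalid_pairs

# The three interaction classes partition the valid pairs.
assert valid_pairs == cis_close + cis_far + trans

# Assumption: valid rate = valid pairs / total reads, as a percentage.
valid_rate = 100 * valid_pairs / total_reads

# Share of each interaction class among the valid pairs.
fractions = {
    "cis_close": cis_close / valid_pairs,
    "cis_far": cis_far / valid_pairs,
    "trans": trans / valid_pairs,
}

print(f"valid pairs: {valid_pairs:,}")
print(f"valid rate: {valid_rate:.3f}%")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The computed valid rate agrees with the tabulated 43.56% to within 0.01 percentage points, which suggests the published value was truncated rather than rounded.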
Fig. 2 Estimation of S. suberectus genome size by K-mer analysis.
Summary of S. suberectus genome assembly using PacBio long reads.
| Statistic | Contig length (bp) | Number of contigs |
|---|---|---|
| Total | 794,088,373 | 1,954 |
| Max | 8,229,915 | - |
| Number >= 2,000 bp | - | 1,928 |
| N50 | 2,057,658 | 114 |
| N60 | 1,446,732 | 161 |
| N70 | 1,036,389 | 226 |
| N80 | 673,988 | 322 |
Summary of S. suberectus genome assembly using PacBio long reads and 10x Genomics data.
| Statistic | Contig length (bp) | Scaffold length (bp) | Number of contigs | Number of scaffolds |
|---|---|---|---|---|
| Total | 794,088,373 | 798,435,360 | 1,954 | 1,146 |
| Max | 8,229,915 | 27,701,983 | - | - |
| Number >= 2,000 bp | - | - | 1,928 | 1,120 |
| N50 | 2,057,658 | 6,903,381 | 114 | 34 |
| N60 | 1,446,732 | 5,179,305 | 161 | 47 |
| N70 | 1,036,389 | 3,931,704 | 226 | 64 |
| N80 | 673,988 | 2,630,391 | 322 | 89 |
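For readers unfamiliar with the N50 to N80 rows in the assembly summaries above, the following minimal sketch shows how such statistics are conventionally computed from a list of sequence lengths. The function name `nx` and the toy lengths are hypothetical; this is not the authors' pipeline, and it assumes the "Number" columns report the count of sequences needed to reach each threshold (often called L50, L60, and so on).

```python
def nx(lengths, x=50):
    """Return (Nx, Lx) for a collection of contig or scaffold lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least x% of the assembly.
    Lx is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must contain at least one sequence")


# Toy example with made-up lengths (total = 25).
toy_lengths = [8, 5, 4, 3, 2, 2, 1]
n50, l50 = nx(toy_lengths, 50)   # cumulative 8, 13 reaches 12.5
n80, l80 = nx(toy_lengths, 80)   # cumulative 8, 13, 17, 20 reaches 20
print(n50, l50)
print(n80, l80)
```

Applied to the real contig or scaffold lengths of an assembly (for example, parsed from its FASTA file), the same function would yield rows like those in the tables; a larger Nx with a smaller Lx indicates a more contiguous assembly, which is the pattern seen when moving from contigs to scaffolds.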
Fig. 3 Diagrammatic sketch of the annotation pipeline.
Fig. 4 Circos plot showing the genomic features of S. suberectus. Concentric circles, from outermost to innermost, show (a) gene density (blue), (b) tandem repeats density (green), (c) transposon element density (purple), (d) LTR-Copia density (yellow), (e) LTR-Gypsy density (red) and intra-genome collinear blocks connected by curved lines. All distributions are drawn in a window size of 300 kb, chromosomes_ scale = 5,000,000 bp.
| Field | Value |
|---|---|
| Design Type(s) | sequence assembly objective • sequence annotation objective |
| Measurement Type(s) | whole genome sequencing assay |
| Technology Type(s) | DNA sequencing |
| Factor Type(s) | growth condition |
| Sample Characteristic(s) | Spatholobus suberectus • leaf |