Shou-Jun Nie1, Yu-Qiang Liu1, Chun-Chao Wang2, Shi-Wei Gao1, Tian-Tian Xu2, Qing Liu1, Hui-Lin Chang1, Yu-Bao Chen3, Peng-Cheng Yan3, Wei Peng3, Tian-Qing Zheng2,4, Jian-Long Xu2,4, Zhi-Kang Li2,4.
Abstract
The early-maturing japonica (Geng) rice variety Suijing18 (SJ18) carries multiple elite traits, including durable blast resistance, good grain quality, and high yield. Using PacBio SMRT technology, we produced over 25 Gb of raw long-read sequencing data from SJ18, corresponding to 62× coverage. Using Illumina paired-end whole-genome shotgun sequencing, we generated 59 Gb of short-read data from SJ18 (23.6 Gb from a 200 bp library at 59× coverage and 35.4 Gb from an 800 bp library at 88× coverage). With these data, we assembled a single SJ18 genome and generated a set of annotation data. These data sets can be used to test new programs for deep mining of variation, will provide new insights into the genome structure, function, and evolution of SJ18, and will provide essential support for biological research in general.
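The read totals and coverage figures quoted in the abstract can be cross-checked with simple arithmetic: dividing the sequenced bases by the fold coverage gives the genome size implied by each data set. The following is a minimal sketch, not part of the original record. The function name `implied_genome_size_mb` is hypothetical, and the PacBio total is taken as exactly 25 Gb although the abstract says "over 25 Gb", so that estimate is a lower bound.

```python
def implied_genome_size_mb(total_gb: float, coverage: float) -> float:
    """Genome size (Mb) implied by total sequenced bases (Gb) and fold coverage."""
    return total_gb * 1000.0 / coverage


# Figures as reported in the abstract: (total Gb, fold coverage).
datasets = {
    "PacBio SMRT (lower bound)": (25.0, 62),
    "Illumina 200 bp library": (23.6, 59),
    "Illumina 800 bp library": (35.4, 88),
}

implied = {
    name: implied_genome_size_mb(gb, cov) for name, (gb, cov) in datasets.items()
}

# The two Illumina libraries should add up to the stated short-read total.
illumina_total_gb = 23.6 + 35.4

for name, size_mb in implied.items():
    print(f"{name}: ~{size_mb:.0f} Mb")
print(f"Illumina total: {illumina_total_gb:.1f} Gb")
```

All three data sets imply a genome of roughly 400 Mb, and the two Illumina libraries add up to the stated 59 Gb.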
Year: 2017 PMID: 29257136 PMCID: PMC5735919 DOI: 10.1038/sdata.2017.195
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Outline of the workflow used to generate and analyze the genome data for Suijing18 (SJ18).
Analyzed data resources for Suijing18 (SJ18) deposited at figshare or the Rice Functional Genomics and Breeding (RFGB) database.
| KEGG, Kyoto Encyclopedia of genes and genomes; ncRNA, non-coding RNA. | ||
|---|---|---|
| | Suijing18 Denovo assembly version 1 | Data Citation 5 or |
| Repeat-masked | Repeat masked data based on Suijing18 Denovo assembly version 1 | Data Citation 5 or |
| Gene annotation results | Annotated genes based on Suijing18 Denovo assembly version 1 | Data Citation 5 or |
| ncRNA annotated | Annotated ncRNAs based on Suijing18 Denovo assembly version 1 | Data Citation 5 or |
| Repeats annotated | Annotated Repeats based on Suijing18 Denovo assembly version 1 | Data Citation 5 or |
| Functional annotation results based on the alignments from KEGG | Annotated proteins based on Suijing18 Denovo assembly version 1 and KEGG database | Data Citation 5 or |
| Functional annotation results based on the alignments from UniProt | Annotated proteins based on Suijing18 Denovo assembly version 1 and UniProt database | Data Citation 5 or |
| Pathway analysis results | Gene ontology analysis results based on Suijing18 Denovo assembly version 1 | Data Citation 5 or |
Figure 2. Distribution of high-quality reads from PacBio long-read sequencing (LRS) for Suijing18 (SJ18).
Comparisons between Suijing18 (SJ18) and the other datasets for representative assembled contigs publicly available and the annotated ncRNAs.
| tRNA, transfer RNA; snoRNA, small nucleolar RNA; snRNA, small nuclear RNA; rRNA, ribosomal RNA; miRNA, microRNA. | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Total nucleotides (Mb) | 418.9 | 373.2 | 390.3–423.2 | 346.9 | 359.9 | 389.8 | 374.6–466.0 | 382.0 | 316.3 | 321.2 |
| N50 contig length (bp) | 2,467,626 | 7,711,345 | 1,185,206 | 2,339,070 | 3,097,358 | 28,500 | 6,690 | 17,000 | 22,200 | 25,500 |
| Total genes | 38,456 | 39,045 | 38,714 | 34,610 | 37,324 | 56,284 | 40,745 | 37,162 | 37,768 | 37,812 |
| tRNA | 434 | 244 | NA | 592 | 589 | NA | 734–993 | NA | NA | NA |
| snoRNA | 681 | NA | NA | 449 | 457 | NA | NA | NA | NA | NA |
| snRNA | 108 | NA | NA | 92 | 97 | NA | 3,374 | NA | NA | NA |
| rRNA | 88 | 724 | NA | 40 | 60 | NA | 752 | NA | NA | NA |
| miRNA | 173 | 146 | NA | 341 | 363 | NA | 3,806 | 1,155 | NA | NA |
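The N50 contig length reported in the table above is a standard contiguity statistic: the length L such that contigs of length L or longer together cover at least half of the assembled bases. The record does not say which tool computed it, so the following is only a sketch of the usual definition. The function name `n50` and the toy contig lengths are hypothetical and are not SJ18 data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly (32 bases in total), not taken from the source.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

In the toy assembly the two longest contigs already cover 16 of 32 bases, so the N50 is 8.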
Different types of repeat sequences found in the Suijing18 (SJ18) assembly (version 1).
| LTR, Long Terminal Repeat; SINE, Short Interspersed Nuclear Element; LINE, Long Interspersed Nuclear Element; EnSpm, Enhancer/Suppressor mutator; hAT, hobo-Ac-Tam3; MuDR, a generic notation for a Mu transposon containing a sequence necessary to permit Mu transposition and related behaviors (the ‘DR’ is in honor of Dr Donald S. Robertson, who discovered and characterized the original Mutator lines). | ||||||
|---|---|---|---|---|---|---|
| Class I: Retrotransposon | 94,087,436 | 22.3 | 105,098,791 | 28.2 | 117,509,061 | 30.1 |
| LTR-Retrotransposon | 88,392,161 | 21.0 | 98,903,987 | 26.5 | 111,173,484 | 28.4 |
| LTR/Gypsy | 72,477,718 | 17.2 | 65,915,787 | 17.7 | 80,330,011 | 20.6 |
| LTR/Copia | 14,161,220 | 3.4 | 17,931,866 | 4.8 | 15,079,454 | 3.9 |
| LTR/Other | 1,753,223 | 0.4 | 15,056,334 | 4.0 | 15,764,019 | 4.0 |
| Non-LTR Retrotransposon | 5,695,275 | 1.4 | 6,194,804 | 1.7 | 6,335,577 | 1.6 |
| SINE | 362,005 | 0.1 | 796,311 | 0.2 | 848,061 | 0.2 |
| LINE | 5,333,270 | 1.3 | 5,398,493 | 1.5 | 5,487,516 | 1.4 |
| Class II: DNA Transposon | 69,249,868 | 16.4 | 40,716,340 | 10.9 | 40,743,536 | 10.4 |
| EnSpm/CACTA | 10,690,200 | 2.5 | 14,117,095 | 3.8 | 13,264,041 | 3.4 |
| hAT | 5,494,725 | 1.3 | 1,641,580 | 0.4 | 1,897,505 | 0.5 |
| Harbinger | 9,746,844 | 2.3 | 3,729,352 | 1.0 | 3,882,866 | 1.0 |
| Tc1/Mariner | 6,525,750 | 1.6 | 462,697 | 0.1 | 607,903 | 0.2 |
| MuDR | 16,081,157 | 3.8 | 5,872,349 | 1.6 | 5,993,554 | 1.5 |
| Helitron | 12,538,290 | 3.0 | 1,850,232 | 0.5 | 1,702,664 | 0.4 |
| Other | 8,172,902 | 1.9 | 13,043,035 | 3.5 | 13,395,003 | 3.4 |
| Other tandem repeat | 18,811,934 | 4.5 | 3,935,022 | 1.1 | 4,548,484 | 1.2 |
| Low Complexity | 313,050 | 0.1 | 22,478 | 0.0 | 17,672 | 0.0 |
| Unclassified | 13,830,076 | 3.3 | 1,117,819 | 0.3 | 1,607,036 | 0.4 |
| Total | 196,292,364 | 46.5 | 150,890,450 | 40.4 | 164,425,789 | 42.1 |
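The repeat table is hierarchical, so its subtotals can be checked against the rows beneath them. The sketch below does this for the first pair of numeric columns only (base pairs and percentage). The column headers did not survive in this record, so which genome each column pair belongs to is not asserted here, and the variable names are hypothetical.

```python
# Base-pair counts copied from the first numeric column of the repeat table.
ltr = {"Gypsy": 72_477_718, "Copia": 14_161_220, "Other": 1_753_223}
non_ltr = {"SINE": 362_005, "LINE": 5_333_270}
dna_transposons = {
    "EnSpm/CACTA": 10_690_200,
    "hAT": 5_494_725,
    "Harbinger": 9_746_844,
    "Tc1/Mariner": 6_525_750,
    "MuDR": 16_081_157,
    "Helitron": 12_538_290,
    "Other": 8_172_902,
}
remaining = {
    "Other tandem repeat": 18_811_934,
    "Low Complexity": 313_050,
    "Unclassified": 13_830_076,
}

ltr_total = sum(ltr.values())                # LTR-Retrotransposon row
class_i = ltr_total + sum(non_ltr.values())  # Class I: Retrotransposon row
class_ii = sum(dna_transposons.values())     # Class II: DNA Transposon row
total = class_i + class_ii + sum(remaining.values())  # Total row

# The Total row reports 46.5% for this column; dividing recovers the
# sequence length that the percentage was computed against.
implied_length_mb = total / 0.465 / 1e6

print(ltr_total, class_i, class_ii, total)
print(f"implied length: ~{implied_length_mb:.0f} Mb")
```

The computed subtotals match the LTR-Retrotransposon, Class I, Class II, and Total rows of that column, and the reported 46.5% implies a sequence length of roughly 422 Mb.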