Tsuyoshi Tanaka1, Ryo Nishijima2, Shota Teramoto1, Yuka Kitomi1, Takeshi Hayashi1, Yusaku Uga1, Taiji Kawakatsu3.
Abstract
IR64 is a high-yielding rice variety that has been widely cultivated around the world. Although IR64 has been replaced by modern varieties in most growing areas, those varieties are mostly progenies or relatives of IR64, so genetic analysis of IR64 remains valuable for rice functional genomics. However, a chromosome-level genome sequence of IR64 has not previously been available. Here, we sequenced the IR64 genome using synthetic long reads obtained by linked-read sequencing and ultra-long reads obtained by nanopore sequencing. We integrated these data to generate a de novo assembly of the IR64 genome of 367 Mb, equivalent to 99% of the estimated size. The continuity of this assembly is improved compared with that of a publicly available IR64 genome assembly generated from short reads only. We annotated 41,458 protein-coding genes, including 657 IR64-specific genes that are missing from other high-quality rice genome assemblies (IRGSP-1.0 of the japonica cultivar Nipponbare and R498 of the indica cultivar Shuhui498). The IR64 genome assembly will serve as a genome resource for rice functional genomics as well as for genomics-driven and/or molecular breeding.
Keywords: De novo genome assembly; IR64; indica rice; linked-read sequencing; nanopore sequencing
Year: 2020 PMID: 32184372 PMCID: PMC7202035 DOI: 10.1534/g3.119.400871
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Schematic illustration of the de novo assembly of the IR64 genome. Software used for the analysis is indicated in italics.
Summary statistics for the linked-reads and MinION data
| | Linked-Reads | MinION |
|---|---|---|
| Number of reads | 910,295,956 | 1,449,788 |
| Total data size (bp) | 137,454,689,356 | 9,276,893,086 |
| Read depth (x) | 368 | 24 |
Genome size was 373 Mb based on IRGSP-1.0.
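The read depths in the table follow from dividing the total data size by the genome size given in the footnote. The snippet below is a minimal sketch of that arithmetic, not the authors' script. It assumes a genome size of exactly 373,000,000 bp (the rounded footnote value) and truncates the result to an integer, which is an assumption that happens to reproduce the tabulated 368x and 24x.

```python
# Total data sizes (bp) copied from the table above.
TOTAL_BASES = {
    "Linked-Reads": 137_454_689_356,
    "MinION": 9_276_893_086,
}

# Assumption: the rounded genome size from the footnote (373 Mb, IRGSP-1.0).
GENOME_SIZE_BP = 373_000_000


def read_depth(total_bp: int, genome_size_bp: int) -> float:
    """Average read depth: total sequenced bases divided by genome size."""
    return total_bp / genome_size_bp


depths = {name: read_depth(bp, GENOME_SIZE_BP) for name, bp in TOTAL_BASES.items()}

for name, depth in depths.items():
    # Truncation (rather than rounding) is an assumption chosen to match the table.
    print(f"{name}: {depth:.2f}x (truncated: {int(depth)}x)")
```

Using a more precise genome size would change the decimals slightly, so the integer values should be read as approximate.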
Figure 2. Distribution of assembled sequence length. The x-axis represents the sequence length of scaffolds/contigs and the y-axis represents the covered length of the genome. Linked-read scaffolds (blue), MinION contigs (orange), and merged scaffolds (yellow) are shown.
Summary statistics for the genome assemblies
| Platforms | Linked-Read | MinION | IR64 v.0 | IR64 v.1.0 |
|---|---|---|---|---|
| Number of sequences | 10,153 | 3,258 | 1,770 | 13 |
| Total length (bp) | 384,086,199 | 323,606,076 | 367,012,357 | 367,109,233 |
| N50 (bp) | 1,187,152 | 224,507 | 1,646,684 | 27,827,038 |
| Minimum length (bp) | 1,000 | 1,011 | 1,011 | |
| Maximum length (bp) | 6,875,104 | 1,431,035 | 9,584,587 | |
| Number of Ns (bp) | 22,492,160 | 0 | 19,621,636 | 19,672,336 |
| GC content without Ns (%) | 42.8 |
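N50 in the table above is the usual continuity metric: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The source does not state which tool computed it, so the following is only a sketch of the standard definition, and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total length; the length
    of the sequence that crosses that threshold is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths (bp), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, anchoring scaffolds into a small number of chromosome-scale sequences raises it sharply even when the total length barely changes, as seen between IR64 v.0 and IR64 v.1.0.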
Figure 3. Genome alignment between the IR64 v.1.0 assembly and other rice reference genomes. Genome alignments were constructed using MUMmer4 A) between IRGSP-1.0 (japonica) and IR64 (indica), and B) between R498 (indica) and IR64 (indica). Numerals indicate chromosome numbers. Un indicates concatenated unanchored sequences. Red and blue dots represent forward and complement alignments, respectively. Arrows indicate alignment gaps significantly larger than the resolution of IR64 v.1.0.
Figure 4. Distribution of sequence similarity between IR64 and R498 genes (black) and between IR64 and IRGSP-1.0 representative genes (white). Alignment of transcripts (IRGSP-1.0 or R498) to the IR64 genome and calculation of identity were performed using GMAP.
Number of IR64 v.1.0 genes that are missing in IRGSP-1.0 and R498 for individual chromosomes
| | BOTH | IRGSP-1.0 | R498 |
|---|---|---|---|
| chr01 | 249 | 422 | 148 |
| chr02 | 215 | 398 | 154 |
| chr03 | 206 | 318 | 113 |
| chr04 | 203 | 439 | 106 |
| chr05 | 162 | 265 | 112 |
| chr06 | 182 | 356 | 125 |
| chr07 | 145 | 364 | 87 |
| chr08 | 165 | 373 | 102 |
| chr09 | 156 | 353 | 82 |
| chr10 | 164 | 290 | 93 |
| chr11 | 180 | 365 | 96 |
| chr12 | 144 | 342 | 102 |
| chrUn | 478 | 490 | 95 |
| Total | 2,649 | 4,775 | 1,415 |
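The Total row can be checked by summing the per-chromosome counts. The sketch below transcribes the table values and assumes the three value columns appear in the order BOTH, IRGSP-1.0, R498. It only verifies the arithmetic and does not reproduce how the counts were derived.

```python
# Per-chromosome counts transcribed from the table above.
# Assumed column order: (BOTH, IRGSP-1.0, R498).
COLUMNS = ("BOTH", "IRGSP-1.0", "R498")
COUNTS = {
    "chr01": (249, 422, 148),
    "chr02": (215, 398, 154),
    "chr03": (206, 318, 113),
    "chr04": (203, 439, 106),
    "chr05": (162, 265, 112),
    "chr06": (182, 356, 125),
    "chr07": (145, 364, 87),
    "chr08": (165, 373, 102),
    "chr09": (156, 353, 82),
    "chr10": (164, 290, 93),
    "chr11": (180, 365, 96),
    "chr12": (144, 342, 102),
    "chrUn": (478, 490, 95),
}

# Sum each column across all chromosomes, including unanchored sequences (chrUn).
totals = {
    column: sum(row[i] for row in COUNTS.values())
    for i, column in enumerate(COLUMNS)
}

for column, total in totals.items():
    print(f"{column}: {total:,}")
```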
Figure 5. Distribution of repeat elements in IR64 (blue), IRGSP-1.0 (orange), and R498 (gray). Repeat elements were calculated from the results of RepeatMasker.