Aaron L Phillips, Scott Ferguson, Nathan S Watson-Haigh, Ashley W Jones, Justin O Borevitz, Rachel A Burton, Brian J Atwell.
Abstract
Oryza australiensis is a wild rice native to monsoonal northern Australia. The International Oryza Map Alignment Project emphasises its significance as the sole representative of the EE genome clade. Assembly of the O. australiensis genome has previously been challenging due to its high Long Terminal Repeat (LTR) retrotransposon (RT) content. Oxford Nanopore long reads were combined with Illumina short reads to generate a high-quality ~858 Mbp genome assembly within 850 contigs with 46× long read coverage. Reference-guided scaffolding increased genome contiguity, placing 88.2% of contigs into 12 pseudomolecules. After alignment to the Oryza sativa cv. Nipponbare genome, we observed several structural variations. PacBio Iso-Seq data were generated for five distinct tissues to improve the functional annotation of 34,587 protein-coding genes and 42,329 transcripts. We also report SNV numbers for three additional O. australiensis genotypes based on Illumina re-sequencing. Although genetic similarity reflected geographical separation, the density of SNVs also correlated with our previous report on variations in salinity tolerance. This genome re-confirms the genetic remoteness of the O. australiensis lineage within the O. officinalis genome complex. Assembly of a high-quality genome for O. australiensis provides an important resource for the discovery of critical genes involved in development and stress tolerance.
Year: 2022 PMID: 35752642 PMCID: PMC9233661 DOI: 10.1038/s41598-022-14893-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Locations of O. australiensis seed collection sites for this study. The wild rice accessions used in this study (obtained from the Australian Grain Genebank) were reported by Yichie et al.[19] to show varying degrees of salt tolerance. Dots on the map show occurrences of O. australiensis (retrieved from the Atlas of Living Australia). Inset table: accessions are ordered by salt tolerance, with Oa-KR being the most sensitive to salt and Oa-VR the most tolerant[19].
Summary of reads used for each step of genome assembly, polishing, variant detection and annotation.
| Purpose | Sequencing platform | Sample | No. reads (millions) | Mean read length (kbp) | Read N50 (kbp) | No. bases (Gbp) | Coverage (×) |
|---|---|---|---|---|---|---|---|
| Genome assembly and polishing | MinION FLO-MIN106 R9.4.1 revD | KR 1 | 0.45 | 26.24 | 40.31 | 12 | 12 |
| | | KR 2 | 0.94 | 23.22 | 35.14 | 22 | 23 |
| | | KR 3 | 0.62 | 24.95 | 37.38 | 15 | 16 |
| Genome polishing | Illumina NovaSeq | KR | 138.81 | NA | NA | 19 | 21 |
| Genetic similarity | Illumina NovaSeq | | 143.27 | NA | NA | 20 | 22 |
| | | | 87.27 | NA | NA | 12 | 13 |
| | | | 139.67 | NA | NA | 19 | 21 |
| Genome annotation | PacBio Sequel II | Leaf | 0.24 | 2.69 | NA | 0.65 | NA |
| | | Coleoptile | 0.18 | 2.29 | NA | 0.42 | NA |
| | | Root tip | 0.3 | 2.16 | NA | 0.64 | NA |
| | | Growth zone | 0.3 | 2.75 | NA | 0.84 | NA |
| | | Developing grain | 0.24 | 2.46 | NA | 0.59 | NA |
See Fig. 1 for details on the accessions. KR 1, KR 2, and KR 3 are the reads obtained from a single genomic DNA preparation sequenced on three different flowcells. These reads were derived from the same O. australiensis KR plant and were used for the assembly of the reference O. australiensis KR genome. The same DNA preparation was used for the KR Illumina NovaSeq library preparation. CH, D, and VR (accession numbers appear in parentheses) refer to different accessions of O. australiensis that have been shown to vary in their tolerance to salt[20]. Reads from these accessions were used to estimate genetic similarity between the genotypes. Multiple O. australiensis KR plants were used for RNA extraction for Iso-Seq analysis.
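The coverage column for the MinION reads can be approximated from the other columns. The sketch below assumes coverage is the number of sequenced bases divided by the expected genome size of 965 Mbp given in the assembly statistics; the source does not state the denominator, and the function and variable names are hypothetical. Because the table reports bases rounded to whole Gbp, the results are approximate.

```python
def coverage(bases_gbp: float, genome_size_mbp: float) -> float:
    """Return fold coverage: sequenced bases divided by genome size."""
    return bases_gbp * 1000.0 / genome_size_mbp


# Assumption: the expected genome size (965 Mbp) is the denominator.
EXPECTED_GENOME_MBP = 965.0

# Bases (Gbp) for the three MinION flowcell rows, in table order.
MINION_GBP = [12.0, 22.0, 15.0]

per_flowcell = [round(coverage(gbp, EXPECTED_GENOME_MBP)) for gbp in MINION_GBP]
# Post-QC long-read bases (49.2 Gbp) from the assembly statistics table.
total = round(coverage(49.2, EXPECTED_GENOME_MBP))

print(per_flowcell)  # [12, 23, 16]
print(total)         # 51
```

Under this assumption the MinION values match the table (12, 23 and 16×), and the post-QC total matches the estimated 51× coverage. The Illumina rows are not reproduced here because their rounded base counts are too coarse to recover the reported coverage.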
Genome assembly statistics for O. australiensis Keep River.
| Stage | Feature (unit) | Value |
|---|---|---|
| Long reads | Post-QC bases (Gbp) | 49.2 |
| | Estimated coverage (×) | 51 |
| | Expected genome size (Mbp) | 965 |
| Genome assembly (contigs) | Assembled genome size (Mbp) | 996 |
| | Long read coverage (×) | 38 |
| | No. contigs | 1956 |
| | Contig N50 (Mbp) | 1.9 |
| | Contig L50 | 114 |
| | Contig N90 (kbp) | 186.7 |
| | Contig L90 | 799 |
| | BUSCO score (100% = 4896) | 91.9 |
| | Whole-genome LAI | 15.2 |
| Genome scaffolds | Scaffolded genome size (Mbp) | 860.9 |
| | Long read coverage (×) | 46 |
| | Placed contigs | 693 |
| | Unplaced contigs | 157 |
| | Total length of placed contigs (including Ns; Mbp) | 812 |
| | Total length of unplaced contigs (Mbp) | 46.9 |
| | Gaps (Mbp) | 2.1 |
| | BUSCO score (100% = 4896) | 97.5 |
| | Whole-genome LAI score | 17.6 |
| | LTR-RT content (Mbp) | 518.2 |
| | LTR-RT content (%) | 60.2 |
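The contig N50, L50, N90 and L90 values above follow standard definitions: contigs are sorted from longest to shortest, and Nx is the length of the contig at which the running total first reaches x% of the assembly, while Lx is the number of contigs needed to get there. A minimal sketch of that calculation follows; the function name `nx_lx` and the toy contig lengths are hypothetical and are not data from the study.

```python
def nx_lx(lengths, percent=50):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the contig at which the cumulative length,
    counted from the longest contig down, first reaches `percent`
    of the total assembly length. Lx is how many contigs that took.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point threshold issues.
        if running * 100 >= total * percent:
            return length, count
    raise ValueError("no contig lengths supplied")


# Toy assembly (arbitrary units), total length 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]

print(nx_lx(toy_contigs, 50))  # (70, 2)
print(nx_lx(toy_contigs, 90))  # (30, 5)
```

Read this way, the reported contig N50 of 1.9 Mbp with an L50 of 114 means that the 114 longest contigs, each at least 1.9 Mbp long, together hold half of the assembled sequence.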
Figure 2. Alignment of scaffolded O. australiensis KR contigs (named Chr1–12) to the O. sativa Nipponbare reference genome. The wild rice genome was aligned to the domestic rice genome using minimap2 and visualised using dotPlotly. The 21 SVs that were investigated further are circled in red (some circles contain multiple SVs). Note: chromosomes do not appear in numerical order because dotPlotly orders the target sequence (here, O. sativa) by chromosome size. ChrUn is not included as it did not contain any large alignments.
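To illustrate the ordering mentioned in the caption, the sketch below reads alignment records in PAF, the tab-separated format minimap2 writes by default, and lists target sequences by length. It assumes the 12 mandatory PAF columns and a largest-first order; the caption does not state the direction, so that is an assumption. The helper names and the two toy records are hypothetical, and this is not dotPlotly's own code.

```python
def parse_paf_line(line):
    """Parse the 12 mandatory columns of one PAF alignment record."""
    f = line.rstrip("\n").split("\t")
    return {
        "query": f[0], "query_len": int(f[1]),
        "query_start": int(f[2]), "query_end": int(f[3]),
        "strand": f[4],
        "target": f[5], "target_len": int(f[6]),
        "target_start": int(f[7]), "target_end": int(f[8]),
        "matches": int(f[9]), "block_len": int(f[10]), "mapq": int(f[11]),
    }


def targets_by_size(records):
    """Return target sequence names ordered by length, largest first."""
    sizes = {r["target"]: r["target_len"] for r in records}
    return sorted(sizes, key=lambda name: sizes[name], reverse=True)


# Two made-up records; real input would come from a minimap2 PAF file.
toy_lines = [
    "Chr1\t100\t0\t50\t+\tchrB\t300\t10\t60\t45\t50\t60",
    "Chr2\t80\t0\t40\t-\tchrA\t500\t0\t40\t38\t40\t60",
]
records = [parse_paf_line(line) for line in toy_lines]
print(targets_by_size(records))  # ['chrA', 'chrB']
```

Sorting by length rather than by name is why the axis of a dot plot built this way need not follow chromosome numbering.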
Summary statistics of the PacBio Iso-Seq data.
| Tissue | No. CCS reads | Mean length (kbp) | Polished isoforms (high quality) | Polished isoforms (low quality) |
|---|---|---|---|---|
| Leaf | 242,841 | 2.69 | 25,805 | 6 |
| Coleoptile | 182,785 | 2.29 | 21,013 | 2 |
| Root tip | 297,806 | 2.16 | 33,619 | 3 |
| Growth zone | 304,934 | 2.75 | 35,048 | 12 |
| Developing grain | 239,213 | 2.46 | 23,176 | 6 |
Repeat elements in the O. australiensis KR scaffolds[51].
| Class | Superfamily | Count | Masked (Mbp) | Masked (%) |
|---|---|---|---|---|
| LTR-RT | | 168,433 | 97.5 | 11.4 |
| | | 299,854 | 319.2 | 37.2 |
| | Unknown | 63,847 | 39.1 | 4.6 |
| Non-LTR-RT | – | 365,794 | 167.2 | 19.5 |
| Total | – | 897,928 | 623 | 72.7 |
Figure 3. Genetic distance between Oryza species/accessions derived from Illumina short read libraries by kWIP. Samples are coloured by the Oryza genome clade that they occupy (see inset legend). kWIP clustered samples into the canonical O. sativa (AA genome) and O. officinalis (BB, CC, BBCC, CCDD, EE genomes) complexes. The O. australiensis lineage is divergent from the rest of the O. officinalis genome complex, suggesting that it harbours lineage-specific adaptations that could be explored for stress tolerance. Within the O. australiensis lineage, the four genotypes re-sequenced in the present study (KR, CH, VR, and D) show genetic distances that are correlated with geographic distance. The divergence between KR, CH, VR, and D, as well as the other O. australiensis accessions shown here, suggests that there may be within-species genomic variation that can also be explored for tolerance to stresses (e.g., genotype-specific tolerance to salt stress). Samples with red borders are suspect because they did not cluster with their corresponding genome clade. This may be due to errors in the sequencing files that were not corrected prior to running kWIP, or may reflect human error during sample preparation for sequencing.