Kanae Nishii, Michelle Hart, Nathan Kelso, Sadie Barber, Yun-Yu Chen, Marian Thomson, Urmi Trivedi, Alex D Twyford, Michael Möller.
Abstract
Cape Primroses (Streptocarpus, Gesneriaceae) are an ideal study system for investigating the genetics underlying species diversity in angiosperms. Streptocarpus rexii has served as a model species for plant developmental research for over five decades because of the unusual, extended meristem activity in its leaves. In this study, we sequenced and assembled the complete nuclear, chloroplast, and mitochondrial genomes of S. rexii using Oxford Nanopore Technologies long read sequencing. Two flow cells of PromethION sequencing yielded 32 million reads, which were sufficient to generate a draft assembly spanning 776 Mbp that included the chloroplast, mitochondrial, and nuclear genomes. The final nuclear genome assembly contained 5,855 contigs, spanning 766 Mbp of the 929-Mbp haploid genome, with an N50 of 3.7 Mbp and an L50 of 57 contigs. Over 70% of the draft genome was identified as repeats. A genome repeat library of Gesneriaceae was generated and used for genome annotation, with a total of 45,045 genes annotated in the S. rexii genome. Ks plots of the paranomes suggested a recent whole genome duplication event, shared between S. rexii and Primulina huaijiensis. A new chloroplast and mitochondrial genome assembly method, based on contig coverage and identification, was developed and successfully used to assemble both organellar genomes of S. rexii. This method was developed into a pipeline and proved widely applicable. The nuclear genome of S. rexii and the other datasets generated and reported here will be invaluable resources for further research, aiding the identification of genes involved in the morphological variation that underpins plant diversification.
Keywords: Gesneriaceae; Oxford Nanopore Technologies; PLCL pipeline; Streptocarpus rexii; genome assembly; high‐molecular weight DNA
Year: 2022 PMID: 35388373 PMCID: PMC8977575 DOI: 10.1002/pld3.388
Source DB: PubMed Journal: Plant Direct ISSN: 2475-4455
FIGURE 1. (a) Flowering plant. (b) Top view of the irregular “false rosette” of this rosulate species. (c) An excised individual leaf with inflorescences at its base, similar to the structure of a unifoliate Streptocarpus species. (d) Flower in front view. (e) Flower in side view. (f) Root tip mitotic late pro‐metaphase showing 32 chromosomes in a cell. (g) Chromosomes, above as schematic diagram showing the 16 unique chromosomes of one genome complement (n) aligned along the centromere by decreasing length with the NOR chromosome at the end (NOR in gray), and below as karyotype showing all 32 chromosomes arranged in pairs
Statistics of genome assembly and annotation
| Parameter | Values |
|---|---|
| Estimated genome size (Mb) | 929 |
| Assembled genome size (Mb) | 766 |
| Num. of total contigs | 5,855 |
| Num. of contigs ≥ 50 Kbp | 1,811 |
| Longest contig (bp) | 15,643,668 |
| Statistics (≥3,000 bp) | |
| GC (%) | 38.89 |
| N50 (bp) | 3,726,467 |
| N75 (bp) | 1,476,021 |
| L50 | 57 |
| L75 | 135 |
| BUSCO completeness (%) | 99.0 |
| Genome repeats (%) | 70.97 |
| Num. genes annotated | 45,045 |
| Average gene length (bp) | 2,609 |
| Num. of exons | 213,819 |
| Average num. exons per gene | 4.7 |
| Average exon length (bp) | 249 |
| BUSCO completeness of annotated gene set (%) | 89.6 |
| Chloroplast genome length (bp) | 152,571 |
| Mitochondrial genome length (bp) | 599,262 |
Based on Möller (2018).
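The N50 and L50 values in the table follow the usual contiguity definitions: with contigs sorted from longest to shortest, L50 is the smallest number of contigs whose combined length reaches half of the total assembly length, and N50 is the length of the last contig in that set (N75 and L75 use three quarters instead). A minimal Python sketch of that calculation is given below. The function name `nx_lx`, the `min_length` filter, and the toy contig lengths are illustrative assumptions and are not taken from the study's own scripts.

```python
def nx_lx(lengths, fraction=0.5, min_length=0):
    """Return (Nx, Lx) for a list of contig lengths in bp.

    fraction=0.5 gives N50/L50; fraction=0.75 gives N75/L75.
    min_length is an assumed pre-filter, analogous to the >=3,000 bp
    cut-off named in the table; it is applied before totals are computed.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs left after filtering")
    target = fraction * sum(kept)
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        if running >= target:
            # 'length' is Nx, 'count' is Lx
            return length, count


# Toy example with hypothetical contig lengths (bp).
toy = [100, 80, 60, 40, 20]
print(nx_lx(toy))                  # N50, L50
print(nx_lx(toy, fraction=0.75))   # N75, L75
```

Applied to the real contig lengths of an assembly, the same function would be expected to reproduce figures of the kind reported in the table, provided the same length filter is used.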
Statistics of assembled genomes available for Gesneriaceae
| Parameter | S. rexii | P. huaijiensis | D. hygrometricum |
|---|---|---|---|
| Estimated genome size (Mb) | 929 | 511 | 1,691 (240 |
| Assembled genome size (Mb) | 766 | 478 | 1,548 |
| Num. of haploid chromosomes | | | |
| N50 (bp) | 3,726,469 | 23,479,473 | 110,988 |
| L50 | 57 | 9 | 3,003 |
| Genome repeats (%) | 70.95 | 54.10 | 75.16 |
| Num. of nuclear scaffolds | 5,855 | 18 | 520,969 |
| Num. of annotated genes (this study) | 45,045 | 42,685 | 24,585 |
| Num. of annotated genes (references) | — | 31,328 | 23,250 (49,374 |
| Total candidate gene pairs (whole paranome gene pairs) | 84,617 | 49,048 | 56,877 |
| Anchor gene pairs | 20,197 | 9,919 | 760 |
| Orthologous gene pairs 1 | 12,160 | 12,160 | |
| Orthologous gene pairs 2 | 12,756 | | 12,756 |
| Orthologous gene pairs 3 | | 10,923 | 10,923 |
Note: Statistics of genome assembly and annotation are based on Feng et al. (2020) for P. huaijiensis and Xiao et al. (2015) for D. hygrometricum unless noted. Orthologous gene pairs 1, S. rexii versus P. huaijiensis; 2, S. rexii versus D. hygrometricum; 3, P. huaijiensis versus D. hygrometricum.
Based on Zhao et al. (2014).
From Kiehn et al. (1998).
Num. of annotated gene models from genome in Xiao et al. (2015).
Num. of predicted gene models from transcriptome in Xiao et al. (2015).
Gesneriaceae annotation resources used in the MAKER annotation pipeline
| Type of resource | Taxon | Reference |
|---|---|---|
| Genome repeat library | | This study |
| | | This study |
| | | This study |
| Transcriptome | | Xiao et al. ( |
| | | This study |
| | | Ai et al. ( |
| | | This study |
| | | Chiara et al. ( |
Genome sequences obtained from NCBI.
Reads were obtained from SRA archive.
FIGURE 2. Whole genome duplication (WGD) analyses in selected Gesneriaceae. (a–c) Distributions of synonymous substitutions (Ks) for the whole paranome and anchor gene pairs for each species. (a) S. rexii. (b) Primulina huaijiensis. (c) D. hygrometricum. (d–f) Distributions of Ks values of one‐to‐one orthologous gene pairs between S. rexii and P. huaijiensis alone (d), superimposed with Ks values for S. rexii anchor gene pairs (e), and superimposed with Ks values for P. huaijiensis anchor gene pairs (f)
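A Ks distribution of the kind plotted in these panels is a histogram of per-gene-pair Ks values, in which a whole genome duplication typically shows up as a peak. The Python sketch below bins a list of Ks values and reports the fullest bin. It assumes the Ks values have already been estimated by another tool; the bin width, the upper Ks bound, and the example values are hypothetical and are not data from the study.

```python
def ks_histogram(ks_values, bin_width=0.1, ks_max=3.0):
    """Count Ks values per bin; values outside (0, ks_max] are dropped."""
    n_bins = round(ks_max / bin_width)
    counts = [0] * n_bins
    for ks in ks_values:
        if 0 < ks <= ks_max:
            # Clamp so that ks == ks_max falls into the last bin.
            index = min(int(ks / bin_width), n_bins - 1)
            counts[index] += 1
    return counts


def modal_bin_index(counts):
    """Return the index of the bin holding the most gene pairs."""
    return max(range(len(counts)), key=counts.__getitem__)


# Hypothetical Ks values for a handful of paralogous gene pairs.
example_ks = [0.05, 0.42, 0.45, 0.47, 0.48, 1.23, 1.27, 2.15]
counts = ks_histogram(example_ks)
peak = modal_bin_index(counts)
print(f"fullest bin: index {peak}, {counts[peak]} gene pairs")
```

A real analysis would usually smooth or model the distribution rather than take the raw fullest bin, so this is only meant to show what the plotted quantity is.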
FIGURE 3. S. rexii chloroplast and mitochondrial genome assemblies. (a and b) Plots of blastn bitscores with chloroplast or mitochondrion queries to the assembled genome contigs of S. rexii (x‐axis), and the contig coverage (y‐axis). Contig IDs are shown for contigs selected as chloroplast or mitochondrial contigs. Green and red shaded markers indicate the selected chloroplast and mitochondrial contigs, respectively, found among the genome contigs. (c) Circularized chloroplast genome assembly. (d) Mitochondrial genome assembly
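The selection shown in panels (a) and (b) can be pictured as a two-criterion filter: a contig becomes a candidate organellar contig when it has a strong blastn hit to an organellar query and a read coverage far above that of typical nuclear contigs. The Python sketch below illustrates that idea only. The `Contig` record, the thresholds `min_bitscore` and `coverage_fold`, the rule for assigning a contig to one organelle, and the example values are all hypothetical and are not taken from the PLCL pipeline.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Contig:
    name: str
    coverage: float      # mean read depth of the contig
    cp_bitscore: float   # best blastn bitscore vs. a chloroplast query (0 if no hit)
    mt_bitscore: float   # best blastn bitscore vs. a mitochondrial query (0 if no hit)


def select_organellar(contigs, min_bitscore=1000.0, coverage_fold=5.0):
    """Split contigs into chloroplast and mitochondrial candidates.

    Assumed rule: keep a contig when its coverage is at least
    coverage_fold times the median coverage of all contigs and its
    best bitscore is at least min_bitscore.
    """
    baseline = median(c.coverage for c in contigs)
    cp, mt = [], []
    for c in contigs:
        if c.coverage < coverage_fold * baseline:
            continue
        if max(c.cp_bitscore, c.mt_bitscore) < min_bitscore:
            continue
        # Assign the contig to whichever organelle gives the stronger hit.
        (cp if c.cp_bitscore >= c.mt_bitscore else mt).append(c.name)
    return cp, mt


# Hypothetical example values.
contigs = [
    Contig("ctg1", 40, 0, 0),
    Contig("ctg2", 42, 150, 0),
    Contig("ctg3", 38, 0, 0),
    Contig("ctg4", 2700, 90000, 1200),
    Contig("ctg5", 220, 800, 45000),
]
cp_names, mt_names = select_organellar(contigs)
print(cp_names, mt_names)
```

Using the median coverage as the nuclear baseline is a design choice of this sketch; it relies on organellar contigs being a small minority of all contigs, as in the benchmark table below.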
Benchmark statistics of the plant contig clustering‐based genome assembly (PLCL) pipeline
| Taxon | | | | | | |
|---|---|---|---|---|---|---|
| Num. total reads | 32,242,708 | | 300,071 | | 1,449,788 | |
| Num. total bases (bp) | 158,689,714,464 | | 3,421,779,258 | | 9,275,443,298 | |
| Num. total assembled contigs | 5,964 | | 353 | | 1,341 | |
| | cp | mt | cp | mt | cp | mt |
| Num. contigs | 6 | 3 | 4 | 2 | 2 | 5 |
| Average coverage | 2,694 | 213 | 2,255 | 98 | 3,382 | 192 |
| Num. reads | 24,438 | 2,462 | 12,621 | 8,239 | 79,822 | 8,081 |
| Read N50 (bp) | 21,036 | 20,894 | 19,778 | 12,157 | 13,457 | 12,569 |
| Assembly result | | | | | | |
| Whole cp/mt genome | Yes | Unknown | Yes | No | Yes | No |
| Circular genome | Yes | No | Yes | No | Yes | No |
Note: Statistics for chloroplast (cp) and mitochondrial (mt) contigs and reads from the genome assemblies.
FIGURE 4. Overview of the plant contig clustering‐based genome assembly (PLCL) pipeline developed and applied in the present study, with steps, workflow, and programs embedded in the pipeline. cp, chloroplast; mt, mitochondrial