Mian Xia, Xue Han, Hang He, Renbo Yu, Gang Zhen, Xiping Jia, Beijiu Cheng, Xing Wang Deng.
Abstract
Background: Luo-han-guo (Siraitia grosvenorii), also called monk fruit, is a member of the Cucurbitaceae family. Monk fruit has become an important area for research because of the pharmacological and economic potential of its noncaloric, extremely sweet components (mogrosides). It is also commonly used in traditional Chinese medicine for the treatment of lung congestion, sore throat, and constipation. Recently, a single reference genome became available for monk fruit, assembled from 36.9x genome coverage reads via Illumina sequencing platforms. This genome assembly has a relatively short (34.2 kb) contig N50 length and lacks integrated annotations. These drawbacks make it difficult to use as a reference in assembling transcriptomes and discovering novel functional genes.
Findings: Here, we offer a new high-quality draft of the S. grosvenorii genome assembled using 31 Gb (∼73.8x) long single molecule real time sequencing reads and polished with ∼50 Gb Illumina paired-end reads. The final genome assembly is approximately 469.5 Mb, with a contig N50 length of 432,384 bp, representing a 12.6-fold improvement. We further annotated 237.3 Mb of repetitive sequence and 30,565 consensus protein coding genes with combined evidence. Phylogenetic analysis showed that S. grosvenorii diverged from members of the Cucurbitaceae family approximately 40.9 million years ago. With comprehensive transcriptomic analysis and differential expression testing, we identified 4,606 up-regulated genes in the early fruit compared to the leaf, a number of which were linked to metabolic pathways regulating fruit development and ripening.
Conclusions: The availability of this new monk fruit genome assembly, as well as the annotations, will facilitate the discovery of new functional genes and the genetic improvement of monk fruit.
Year: 2018 PMID: 29893829 PMCID: PMC6007378 DOI: 10.1093/gigascience/giy067
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Morphological characteristics of the fruit of S. grosvenorii (A), vertical section of fruit of S. grosvenorii (B), horizontal section of fruit of S. grosvenorii (C) and seeds (D). Size bar, 1 cm.
Figure 2: Candidate genes involved in the mogrosides biosynthesis pathway. Candidate functional genes were annotated as SQEs, EPHs, CDSs, CYP450s, and UGTs and assigned to the pathway.
SMRT reads used for genome assembly
| Statistics | Length (bp) |
|---|---|
| Total raw data | 31 G |
| Mean length of raw reads | 11 k |
| N50 of raw reads | 15,754 |
| Mean length of subreads | 7.7 k |
| N50 of subreads | 11,898 |
Subreads: reads without adapters and low-quality bases.
Metrics of de novo S. grosvenorii genome assembly
| Statistics | Contig | Contig (polished) |
|---|---|---|
| Total number | 4128 | 4128 |
| Total length (bp) | 467,072,951 | 469,518,713 |
| N50 length (bp) | 433,684 | 432,384 |
| N90 length (bp) | 36,820 | 36,953 |
| Max length (bp) | 7,657,852 | 7,683,850 |
| GC content (%) | 33.57 | 33.49 |
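The N50 and N90 lengths in the table follow the usual definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches 50% (or 90%) of the assembly. The sketch below illustrates that definition on a made-up list of contig lengths; the function name `nx_length` and the toy data are assumptions for illustration, not the authors' pipeline. It also reproduces the fold improvement quoted in the abstract from the two N50 values given there (432,384 bp against 34.2 kb).

```python
def nx_length(contig_lengths, percent=50):
    """Return the Nx length: the length of the contig at which the cumulative
    sum of contigs (sorted longest first) reaches `percent` % of the total."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


# Hypothetical toy assembly (not data from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print("toy N50:", nx_length(toy_contigs, 50))
print("toy N90:", nx_length(toy_contigs, 90))

# Fold improvement of contig N50, using the values stated in the abstract.
new_n50_bp = 432_384        # polished assembly
previous_n50_bp = 34_200    # 34.2 kb, previously published assembly
fold_improvement = new_n50_bp / previous_n50_bp
print(f"fold improvement: {fold_improvement:.1f}")
```

Applied to the real contig lengths of an assembly FASTA, the same function would be expected to return the N50 and N90 values reported above.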
Summarized benchmarks of the BUSCO assessment
| Category | Monk fruit (%) |
|---|---|
| Complete BUSCOs | 89.2 |
| Complete and single-copy | 59.0 |
| Complete and duplicated | 30.2 |
| Partial | 2.7 |
| Missing | 8.1 |
Genome base accuracy estimated using resequencing short reads
| Sample | Mean depth | Coverage | Variants 0/1 | Variants 1/1 | Variants 1/2 | Total variants | Error rate |
|---|---|---|---|---|---|---|---|
| Paired-end | 65.3x | 92.99% | 1,342,849 | 37,987 | 14,704 | 1,395,540 | 1.21E-4 |
| Published | 80.0x | 90.79% | 2,569,592 | 172,906 | 16,777 | 2,759,276 | 4.45E-4 |
High-quality genome criteria: 1E-4.
0: genotype identical to the reference; 1, 2: genotypes that differ from the reference.
Error rate = (Number of 1/1 + Number of 1/2)/(Genome size * Coverage).
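As a worked check of this formula, the sketch below recomputes the error rates from the variant counts in the table. It assumes that "Genome size" is the polished assembly length (469,518,713 bp) and that "Coverage" is the covered fraction from the table; using the same genome size for the "Published" row is also an assumption, chosen because it reproduces the tabulated value. The function and variable names are illustrative.

```python
def error_rate(n_hom_alt, n_alt_alt, genome_size, coverage):
    """Error rate = (number of 1/1 + number of 1/2) / (genome size * coverage)."""
    return (n_hom_alt + n_alt_alt) / (genome_size * coverage)


# Assumption: genome size is the polished assembly length from the metrics table.
GENOME_SIZE_BP = 469_518_713

paired_end_rate = error_rate(37_987, 14_704, GENOME_SIZE_BP, 0.9299)
# Assumption: the same genome size is used for the published-reads row.
published_rate = error_rate(172_906, 16_777, GENOME_SIZE_BP, 0.9079)

print(f"Paired-end: {paired_end_rate:.2E}")
print(f"Published:  {published_rate:.2E}")
```

Under these assumptions both results round to the values in the table (1.21E-4 and 4.45E-4), slightly above the 1E-4 criterion stated for a high-quality genome.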
Quality evaluation of the draft genome with the overall alignment rate
| Sample | Overall alignment rate |
|---|---|
| FL-1 | 89.93% |
| FL-2 | 87.75% |
| FL-3 | 85.83% |
| ML-1 | 89.70% |
| ML-2 | 89.73% |
| ML-3 | 85.07% |
| L-1 | 85.95% |
| L-2 | 87.39% |
| R-1 | 81.50% |
| R-2 | 84.36% |
| R-3 | 84.57% |
| F1-1 | 84.35% |
| F1-2 | 91.58% |
| F2-1 | 86.83% |
| F2-2 | 87.37% |
FL: female leaf, ML: male leaf, L: leaf, R: root, F1: fruit stage 1, F2: fruit stage 2.
Repeat annotation of the S. grosvenorii genome
| Repeat classification | Type | Length (bp) | Content | Length (bp) | Content | Length (bp) | Content |
|---|---|---|---|---|---|---|---|
| Interspersed repeats | SINEs | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| | LINEs | 9,629,949 | 2.05% | 5,183,926 | 1.82% | 2,397,830 | 1.22% |
| | LTR | 67,499,840 | 14.38% | 34,217,647 | 11.98% | 8,253,090 | 4.18% |
| | DNA elements | 9,372,444 | 2.00% | 3,460,431 | 1.21% | 2,777,943 | 1.41% |
| | Unclassified | 147,311,542 | 31.38% | 75,056,338 | 26.28% | 37,539,553 | 19.03% |
| | Total | 233,813,775 | 49.80% | 117,918,342 | 41.29% | 50,967,966 | 25.84% |
| Simple repeats | | 5,401,880 | 1.15% | 3,451,508 | 1.21% | 3,547,474 | 1.80% |
| Low complexity | | 1,570,875 | 0.33% | 958,289 | 0.34% | 1,095,406 | 0.56% |
| Total | | 240,122,745 | 51.14% | 122,111,538 | 42.75% | 55,540,243 | 28.15% |
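The "Content" percentages appear to be each repeat length divided by the total assembly length. The sketch below checks this for the first pair of Length/Content columns, assuming (this is an inference, since the column group labels are not preserved here) that those columns describe the polished S. grosvenorii assembly of 469,518,713 bp. The dictionary and variable names are illustrative.

```python
# Assumption: the first Length/Content column pair refers to the polished
# S. grosvenorii assembly, whose total length is taken from the metrics table.
ASSEMBLY_LENGTH_BP = 469_518_713

# Repeat lengths (bp) from the first Length column of the table.
repeat_lengths_bp = {
    "LINEs": 9_629_949,
    "LTR": 67_499_840,
    "DNA elements": 9_372_444,
    "Unclassified": 147_311_542,
    "Interspersed total": 233_813_775,
    "Simple repeats": 5_401_880,
    "Low complexity": 1_570_875,
    "Total": 240_122_745,
}

# Content (%) = repeat length / assembly length * 100
content_percent = {
    name: 100 * length / ASSEMBLY_LENGTH_BP
    for name, length in repeat_lengths_bp.items()
}

for name, pct in content_percent.items():
    print(f"{name:<20}{pct:6.2f}%")
```

Under this assumption the computed values agree with the tabulated percentages to two decimal places, for example about 14.38% for LTR elements and 51.14% for all repeats combined.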
Gene prediction and annotation
| | RNA-Seq data-based | Ab initio | Homology-based | Integration | Annotation (nr) | Annotation (IPR) | Annotation (GO) |
|---|---|---|---|---|---|---|---|
| Weight | 10 | 0.1 | 5 | - | - | - | - |
| Number of predicted genes | 27,304 | 60,818 | 130,686 | 30,565 | 23,936 | 19,684 | 14,966 |
| Tools | HISAT2, StringTie, TransDecoder | RepeatMasker, AUGUSTUS | BLAST, GeneWise | EVM | BLAST | InterProScan | |
Figure 3: Number of best-matching proteins for each predicted S. grosvenorii gene by species.
Abundance analysis of the mogrosides synthesis related gene families
| Gene family | | | | |
|---|---|---|---|---|
| SQE | 5 (5) | 1 | 2 | 1 |
| EPH | 30 (8) | 23 | 29 | 22 |
| CYP450 | 276 (191) | 213 | 289 | 234 |
| UGT | 156 (131) | 124 | 137 | 121 |
| CDS | 1 (1) | 1 | 2 | 3 |
The numbers in parentheses are the numbers of genes belonging to each gene family annotated in monk fruit genome version 1.
Figure 4: Comparative genome analysis of the S. grosvenorii genome. (A) Orthologue clustering analysis of the protein-coding genes in the S. grosvenorii genome. (B) Venn diagram showing shared and unique gene families among four cucurbit plant species. Numbers represent the number of gene families in unique or shared regions. (C) Phylogenetic tree and divergence time of S. grosvenorii and seven other plant species. The phylogenetic tree was generated from 834 single-copy orthologues using the maximum-likelihood method. The divergence time range is shown in blue blocks. The numbers beside the branching nodes are the predicted divergence time.
Figure 5: KEGG pathway enrichment analysis of candidate functional genes.