The complete genome sequence of Stevia rebaudiana, the Sweetleaf.

Abstract

The Sweetleaf (Stevia rebaudiana: Asteraceae) is widely grown for use as a sweetener. We present the whole genome sequence and annotation of this species. A total of 146,838,888 paired-end reads comprising 22.2 Gb were obtained by sequencing one leaf from a commercially grown seedling. The reads were assembled de novo, and the resulting contigs were then joined by alignment to the genomes of related species. Annotation was performed with GeneMark-ES. The raw and assembled data are publicly available via GenBank: Sequence Read Archive (SRR6792730) and Assembly (GCA_009936405).

Keywords: Stevia rebaudiana; Sweetleaf; annotation; assembly; genome

Year: 2020 PMID: 33299551 PMCID: PMC7676390 DOI: 10.12688/f1000research.24396.1

Source DB: PubMed Journal: F1000Res ISSN: 2046-1402

Introduction

The Sweetleaf (Stevia rebaudiana: Asteraceae) is cultivated commercially for use as a sweetener. Its sweetness is due to various steviol glycosides, primarily stevioside and rebaudioside. These compounds are 200-300 times as sweet as sugar (Abdullateef & Osman, 2012) but contain no calories. The market for raw Stevia and derived products is expected to exceed USD 1 billion by 2021 (International Stevia Council, 2017). Stevia rebaudiana has been used as a sweetener for centuries in Brazil and Paraguay (Misra et al., 2011). The botanist Moisés Santiago Bertoni first described the plant as growing in eastern Paraguay and noted its use as a sweetener (Bertoni, 1899). The chemists Bridel and Lavielle isolated the glycosides stevioside and rebaudioside, which give the leaves their sweet taste (Bridel & Lavielle, 1931). The chemical structures of the aglycone steviol and its glycoside have since been solved (Mosettig & Nes, 1955). A complete genome sequence for this species will assist in discovering markers for crop yield, disease resistance, and drought resistance, and in determining the biochemical pathways for the relevant metabolites.

Methods

A single commercially grown Stevia rebaudiana plant was used for this study (Behnke Nurseries, Beltsville, MD, USA). DNA was extracted from the tissue of a single leaf using the Qiagen DNeasy genomic extraction kit for plants, following the standard process. A paired-end sequencing library was constructed using the Illumina TruSeq kit according to the manufacturer’s instructions. The library was sequenced on an Illumina HiSeq platform in paired-end, 2 × 150 bp format. The resulting FASTQ files were trimmed of adapter/primer sequence and low-quality regions with Trimmomatic v0.33 (Bolger et al., 2014). The trimmed reads were assembled with SPAdes v2.5 (Bankevich et al., 2012), followed by a finishing step using RagTag v1.0.0 (Alonge, 2020) to make additional contig joins based on conserved regions in related plant species: Erigeron canadensis (GCA_010389155), Mikania micrantha (GCA_009363875), and Helianthus annuus (GCA_002127325). Default parameters were used for all assembly steps. Annotation was performed using GeneMark-ES v2.0 (Lomsadze et al., 2005), fully de novo, without a curated training set, and with default parameters.
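The steps described above can be laid out as an ordered list of command lines. The following Python sketch only builds the argument lists and does not execute anything. Several details are assumptions rather than the authors' settings: the file names, the `build_pipeline` helper, the Trimmomatic trimming steps (borrowed from the example in the Trimmomatic manual, since the tool has no single default), the use of one reference genome per RagTag call, and the RagTag output file name, which differs between releases. The flags follow the tools' documented usage in recent releases and may differ in the versions cited here.

```python
from pathlib import Path


def build_pipeline(r1, r2, reference, workdir="stevia_asm",
                   ragtag_fasta="ragtag.scaffold.fasta"):
    """Return one argv list per step: trim, assemble, scaffold, annotate.

    Nothing is run here. All paths and the trimming settings are
    illustrative assumptions, not values reported in the paper.
    """
    work = Path(workdir)
    # Trimmomatic PE writes paired (P) and unpaired (U) reads for each mate.
    trimmed = {tag: str(work / f"trimmed_{tag}.fastq.gz")
               for tag in ("1P", "1U", "2P", "2U")}

    trim = [
        "java", "-jar", "trimmomatic-0.33.jar", "PE", r1, r2,
        trimmed["1P"], trimmed["1U"], trimmed["2P"], trimmed["2U"],
        # Assumed trimming steps (the manual's paired-end example).
        "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10", "SLIDINGWINDOW:4:15", "MINLEN:36",
    ]
    spades_out = work / "spades"
    assemble = ["spades.py", "-1", trimmed["1P"], "-2", trimmed["2P"],
                "-o", str(spades_out)]
    ragtag_out = work / "ragtag"
    # Reference-guided joining against ONE related genome (assumption:
    # the paper does not say how the three references were combined).
    scaffold = ["ragtag.py", "scaffold", reference,
                str(spades_out / "scaffolds.fasta"), "-o", str(ragtag_out)]
    # Self-training gene prediction, no curated training set.
    annotate = ["gmes_petap.pl", "--ES", "--sequence",
                str(ragtag_out / ragtag_fasta)]
    return [trim, assemble, scaffold, annotate]


steps = build_pipeline("reads_R1.fastq.gz", "reads_R2.fastq.gz",
                       "Helianthus_annuus.fa")
for argv in steps:
    print(" ".join(argv))
```

To run the steps for real, each list could be passed to `subprocess.run(argv, check=True)` in order, once the tools are installed and the paths point at actual files.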

Results

The genome assembly yielded a total sequence length of 411,383,069 bp across 55,557 scaffolds, with an N50 of 37,276,437 bp. The GeneMark-ES annotation predicted 24,994 genes.
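N50 is the length L such that the scaffolds of length at least L together contain at least half of the assembled bases. A minimal sketch of that calculation follows. It uses a made-up list of lengths, not the scaffold lengths of this assembly, and the function name `n50` is illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical): total 32, so the half-way point is 16.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))  # 8
```

In practice the lengths would be read from the assembly FASTA file, one value per scaffold.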

Data availability

Underlying data

Raw and assembled data are publicly available via GenBank:

Raw genome of Stevia rebaudiana, Accession number SRR6792730: https://www.ncbi.nlm.nih.gov/sra/?term=SRR6792730

Assembly of Stevia rebaudiana, Accession number GCA_009936405.1 (assembly name ASM993640v1): https://www.ncbi.nlm.nih.gov/assembly/GCA_009936405.1/

Stevia is an economically important plant with no prior genomic resource for researchers. Standard methods were used for laboratory preparation, sequencing, assembly, and annotation of the genome. The resulting assembly and gene annotations are good based on the provided statistics. I would like to see that a voucher specimen of the sequenced plant (or a similar individual) at Behnke Nurseries was deposited in a herbarium. This is important for reproducible science, and its omission almost causes me to mark the methods as only "partly" sufficient.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes

Reviewer Expertise: Plant systematics.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors present the whole genome sequence of Sweetleaf, the plant from which the sweetener Stevia is derived. I am surprised someone has not already sequenced this genome; it is certainly timely. The data are well presented and publicly available through NCBI. It is well written and worthy of indexing. I only have minor comments: Should the keywords repeat words found in the title? Can more be said in the Results about the genome?

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes

Reviewer Expertise: Sanger sequencing; mycology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors sequenced and assembled the genome of Sweetleaf from Illumina short reads and annotated the genes using de novo prediction. The data could be useful to the field, but some points need to be clarified:

1. Commonly, the first step of genome assembly is to estimate the genome size using the k-mer frequency distribution. The authors should add such an analysis in the revision.
2. Based on the results in (1), the genome size of Stevia rebaudiana is ca. 1.3 Gb. I don't think SPAdes is a good choice, as it only works well on small genomes.
3. Gene prediction is too simple and may lose a considerable number of genes. The prediction should draw on multiple sources of evidence, such as de novo prediction, homology-based gene prediction, etc.
4. Add a citation for the "standard process" of DNA extraction and state the model of the sequencing platform, e.g. HiSeq 2500.
5. It is better to include a table showing the statistics, such as N50, genome size, and the number of genes.
6. Both assemblies, from short reads alone and from short reads plus RagTag, should be provided, because RagTag may introduce a certain bias.

Are sufficient details of methods and materials provided to allow replication by others? Partly
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Partly

Reviewer Expertise: Computational genomics, bioinformatics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
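The genome-size point in the last report above can be made concrete with two small calculations. The sketch below is illustrative only: it computes the average sequencing depth implied by the 22.2 Gb of reads for both the 411,383,069 bp assembly and the reviewer's ca. 1.3 Gb figure, and it shows one common way of estimating genome size from a k-mer multiplicity histogram (total k-mers divided by the depth at the main peak). The function names, the error cutoff `min_multiplicity`, and the toy histogram are assumptions; no k-mer analysis is reported in the paper.

```python
def sequencing_depth(total_bases, genome_size):
    """Average depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def genome_size_from_kmers(histogram, min_multiplicity=5):
    """Estimate genome size from a k-mer histogram.

    histogram maps multiplicity -> number of distinct k-mers seen that
    many times. K-mers below min_multiplicity are treated as sequencing
    errors and ignored (the cutoff is an assumption of this sketch).
    """
    solid = {m: c for m, c in histogram.items() if m >= min_multiplicity}
    if not solid:
        raise ValueError("no k-mers at or above the cutoff")
    peak_depth = max(solid, key=solid.get)  # multiplicity of the main peak
    total_kmers = sum(m * c for m, c in solid.items())
    return total_kmers / peak_depth


TOTAL_BASES = 22_200_000_000   # 22.2 Gb of reads (from the abstract)
ASSEMBLY_SIZE = 411_383_069    # assembled length (from the Results)
REVIEWER_SIZE = 1_300_000_000  # ca. 1.3 Gb (from the reviewer report)

print(f"{sequencing_depth(TOTAL_BASES, ASSEMBLY_SIZE):.1f}x")
print(f"{sequencing_depth(TOTAL_BASES, REVIEWER_SIZE):.1f}x")

# Toy histogram (hypothetical values, not real Stevia data).
toy_histogram = {1: 1000, 2: 200, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50, 40: 10}
print(genome_size_from_kmers(toy_histogram))
```

The same reads correspond to about 54× depth if the genome is the size of the assembly, but only about 17× if it is closer to 1.3 Gb, which is why the size estimate matters when judging the assembly.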

4 in total

1. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.

Authors: Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey A Gurevich; Mikhail Dvorkin; Alexander S Kulikov; Valery M Lesin; Sergey I Nikolenko; Son Pham; Andrey D Prjibelski; Alexey V Pyshkin; Alexander V Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A Alekseyev; Pavel A Pevzner
Journal: J Comput Biol Date: 2012-04-16 Impact factor: 1.479

2. Antidiabetic activity of medium-polar extract from the leaves of Stevia rebaudiana Bert. (Bertoni) on alloxan-induced diabetic rats.

Authors: Himanshu Misra; Manish Soni; Narendra Silawat; Darshana Mehta; B K Mehta; D C Jain
Journal: J Pharm Bioallied Sci Date: 2011-04

3. Gene identification in novel eukaryotic genomes by self-training algorithm.

Authors: Alexandre Lomsadze; Vardges Ter-Hovhannisyan; Yury O Chernoff; Mark Borodovsky
Journal: Nucleic Acids Res Date: 2005-11-28 Impact factor: 16.971

4. Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors: Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal: Bioinformatics Date: 2014-04-01 Impact factor: 6.937
