A Chromosome-Level Genome Assembly of Dendrobium Huoshanense Using Long Reads and Hi-C Data.

Bangxing Han1, Yi Jing2, Jun Dai1, Tao Zheng2,3, Fangli Gu1, Qun Zhao1, Fucheng Zhu1, Xiangwen Song1, Hui Deng1, Peipei Wei1, Cheng Song1, Dong Liu1, Xueping Jiang1, Fang Wang1, Yanjun Chen1, Chuanbo Sun1, Houjun Yao1, Li Zhang1, Naidong Chen1, Shaotong Chen1, Xiaoli Li1, Yuan Wei4, Zhen Ouyang4, Hui Yan5, Jiangjie Lu6, Huizhong Wang6, Lanping Guo7, Lingdong Kong8, Jing Zhao9, Shaoping Li9, Lifen Luo10, Karsten Kristiansen3, Zhan Feng2, Silong Sun2, Cunwu Chen1, Zhen Yue2, Naifu Chen1.   

Abstract

Dendrobium huoshanense is used to treat various diseases in traditional Chinese medicine. Recent studies have identified active components. However, the lack of genomic data limits research on the biosynthesis and application of these therapeutic ingredients. To address this issue, we generated the first chromosome-level genome assembly and annotation of D. huoshanense. We integrated PacBio sequencing data, Illumina paired-end sequencing data, and Hi-C sequencing data to assemble a 1.285 Gb genome, with contig and scaffold N50 lengths of 598 kb and 71.79 Mb, respectively. We annotated 21,070 protein-coding genes and 0.96 Gb transposable elements, constituting 74.92% of the whole assembly. In addition, we identified 252 genes responsible for polysaccharide biosynthesis by Kyoto Encyclopedia of Genes and Genomes functional annotation. Our data provide a basis for further functional studies, particularly those focused on genes related to glycan biosynthesis and metabolism, and have implications for both conservation and medicine.
© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  Hi-C assembly; annotation; de novo assembly; genome; orchid

Year:  2020        PMID: 33045048      PMCID: PMC7846097          DOI: 10.1093/gbe/evaa215

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Significance

Polysaccharides and alkaloids have been identified as the active ingredients responsible for the therapeutic effects of Dendrobium huoshanense, but the biosynthetic pathways by which they are generated remain poorly understood. In this article, we report a chromosome-level genome and a set of high-quality genes of D. huoshanense. Furthermore, we identified 252 genes responsible for polysaccharide biosynthesis by Kyoto Encyclopedia of Genes and Genomes (KEGG) functional annotation. The construction of the genomic architecture of a medically important orchid will accelerate genomic and medical studies of this species and of Orchidaceae.

Introduction

Dendrobium is the second largest genus in Orchidaceae (Chaudhary et al. 2012). The genus includes over 1,000 species widely distributed across tropical and subtropical regions of Asia and Oceania. In China, 74 species and two varieties of Dendrobium have been described (Xu et al. 2006). Among them, Dendrobium huoshanense, Dendrobium officinale, and Dendrobium nobile have established therapeutic value (Ng et al. 2012). Dendrobium huoshanense is one of the most valuable traditional Chinese herbal medicines. This species grows exclusively in Huoshan County in western Anhui Province. The plant has been listed as threatened and endangered in the Convention on International Trade in Endangered Species of Wild Fauna and Flora (Jin et al. 2016). Dendrobium huoshanense harbors a variety of characteristic compounds (polysaccharides, terpenoids, stilbenoids, alkaloids, flavonoids, etc.) with functions in oxidation resistance, immunity, liver protection, and tumor suppression (Xu et al. 2013; Zha et al. 2014; Xie et al. 2016). In traditional Chinese medicine, D. huoshanense has been used to treat various diseases and is considered to nourish the stomach and Yin (Ng et al. 2012; Veronika et al. 2017). Recent studies have identified the active ingredients responsible for the therapeutic effects. Despite extensive evaluations of the chemical and pharmacological properties of D. huoshanense alkaloids (Li Juan et al. 2011), the biosynthetic pathways by which they are generated remain poorly understood. Analyses of the genome sequence and expressed sequence tags could improve our understanding of the biological mechanisms of action of natural active ingredients (Li Ying et al. 2010). The genomes of D. officinale and D. catenatum have recently been sequenced (Yan et al. 2015; Zhang et al. 2016). The assembled genomes are 1.35 Gb and 1.01 Gb, respectively, and provide a basis for genome data mining aimed at elucidating biosynthetic pathways for medicinal polysaccharides and alkaloids.
Although D. huoshanense is regarded as the species with the highest therapeutic value within the genus, the genome has not yet been published. In this article, we describe the plant material and full data sets used to assemble, annotate, and validate the D. huoshanense reference genome. First, PacBio sequencing data and Illumina whole-genome shotgun sequencing data were used for genome assembly. Second, we assessed the quality of the genome assembly by using data sets from the most recent version of BUSCO (Simao et al. 2015). Third, we annotated the assembled D. huoshanense genome. The assembled genome and annotation for D. huoshanense will accelerate research on this valuable plant.

Materials and Methods

Plant Material, DNA Extraction, and DNA Sequencing Library

Seeds of D. huoshanense collected from the D. huoshanense Conservation Center in Huoshan County were cultivated in sterile culture medium for 200 days. The leaves and root of a mature healthy plant were collected and stored at −80°C prior to DNA sequencing. The cetyltrimethyl ammonium bromide method was employed to isolate genomic DNA. The extracted DNA was used to construct a 20 kb PacBio library, a 500 bp Illumina paired-end library, and a 350 bp Hi-C library.

Whole-Genome Shotgun, Single-Molecule Real-Time, and Hi-C Sequencing

A 500 bp Illumina paired-end library was sequenced using the Illumina HiSeq X-Ten DNA Sequencer. The raw data were filtered using SOAPnuke v1.5.6 (https://github.com/BGI-flexlab/SOAPnuke), with the following parameter settings to remove adaptor sequences and low-quality reads: -n 0.01 -l 20 -q 0.1 -i -Q 2 -G -M 2 -A 0.5 -d. A 20 kb PacBio library was constructed for sequencing using eight single-molecule real-time (SMRT) cells on the PacBio Sequel platform. Raw reads were processed using the SMRT pipeline with a minimum read quality of 0.8. A 350 bp Hi-C sequencing library was constructed and sequenced using the MGISEQ-2000 DNA Sequencer. The clean data, after the removal of duplicates using Juicer (Durand et al. 2016), were processed using 3D-DNA (Dudchenko et al. 2017) for integration into the D. huoshanense genome assembly.

Genome Size Estimation and Genome Assembly

The genome size of D. huoshanense was estimated by a k-mer analysis of shotgun sequencing data using Jellyfish and KmerFreq v5.0 (Marçais and Kingsford 2011). The estimated genome size was 1.29 Gb (supplementary fig. 1, Supplementary Material online). The D. huoshanense genome was assembled using PacBio long-read sequencing data, followed by the integration of Illumina paired-end data and Hi-C sequencing data. First, the PacBio sequencing data were de novo assembled into contigs using smartdenovo (Liu et al. 2020) after a correction process with Canu (Koren et al. 2017) with the following parameter settings: minReadLength > 3,000 and minOverlapLength > 500. Next, Pilon (Walker et al. 2014) was used to improve the accuracy of the genome assembly by integrating Illumina sequencing data. Then, purge_haplotigs (Roach et al. 2018) was used with the parameters contigcov -l 5 -m 80 -h 190 and purge -a 65 to identify and delete duplicate contigs resulting from heterozygosity in the plant material. Finally, Hi-C (Lieberman-Aiden et al. 2009; Durand et al. 2016; Dudchenko et al. 2017) technology was used to anchor primary contigs to pseudo-molecules and remove redundancy.
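The k-mer estimate described above rests on a simple relation: genome size is approximately the total number of k-mers divided by the depth of the main peak in the k-mer depth histogram. The sketch below illustrates that relation only. It is not the procedure implemented in Jellyfish or KmerFreq, the function name `estimate_genome_size` is hypothetical, and the toy histogram is invented for illustration (it is not data from this study).

```python
def estimate_genome_size(histogram):
    """Estimate genome size from a k-mer depth histogram.

    histogram: dict mapping k-mer depth -> number of distinct k-mers
    observed at that depth (an assumed input format).
    Returns (estimated_size, peak_depth).
    """
    depths = sorted(histogram)

    # Low-depth k-mers are mostly sequencing errors. Take the valley as the
    # first depth whose count is lower than the count at the next depth.
    valley = depths[0]
    for lower, higher in zip(depths, depths[1:]):
        if histogram[higher] > histogram[lower]:
            valley = lower
            break

    # Keep depths at or beyond the valley and find the main peak.
    solid = [d for d in depths if d >= valley]
    peak = max(solid, key=lambda d: histogram[d])

    # Total k-mer occurrences = sum of depth * number of distinct k-mers.
    total_kmers = sum(d * histogram[d] for d in solid)
    return total_kmers / peak, peak


# Invented toy histogram: an error spike at depth 1-3 and a peak at depth 10.
toy_histogram = {1: 500, 2: 100, 3: 20, 8: 50, 9: 200, 10: 500, 11: 200, 12: 50}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} k-mer positions")
```

Real histograms from heterozygous or repeat-rich genomes often show more than one peak, so the single-peak assumption here is a simplification.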

Gene and Transposable Element Annotation

A combination of de novo prediction and homology-based methods was used to predict protein-coding genes. De novo prediction was performed using Semi-HMM-based Nucleic Acid Parser (Johnson et al. 2008) and AUGUSTUS (Stanke et al. 2006) trained with de novo assembled transcripts collected from four organs of D. huoshanense (the root, stem, leaf, and flower). Homology-based methods were based on the detection of homologous gene sets of five species, Arabidopsis thaliana, D. catenatum, Apostasia shenzhenica, Phalaenopsis equestris, and Oryza sativa in the D. huoshanense genome. After masking repetitive elements using RepBase in the genome assembly, MAKER (Holt and Yandell 2011) was used to integrate the de novo and homology-based prediction results. Functional annotation of the gene set was performed using Blast v2.2.31 (Altschul et al. 1990) to compare the genes with eight protein databases, including SwissProt, TrEMBL, Kyoto Encyclopedia of Genes and Genomes (KEGG), InterPro, NR, KOG, and GO. Both de novo prediction and homology-based prediction methods were used to detect transposable elements (TEs). For de novo prediction, a repetitive sequence data set was constructed using RepeatModeler (Tarailo-Graovac and Chen 2009). Then, RepeatMasker was used to search this data set for TEs. In the homology-based method, RepeatMasker v4.0.7 and RepeatProteinMask v4.0.7 were used to identify TEs by aligning the genome assembly to RepBase v21.12 (Bao et al. 2015).

Assessment of Genome and Gene Quality

Benchmarking Universal Single-Copy Orthologs (BUSCO v3; Simao et al. 2015), with a total of 1,375 ortholog groups from the Embryophyta data set, was used to assess the completeness of the genome assembly and of the predicted gene set.

Results and Discussion

Genome Assembly and Gene Annotation

We generated 135.63 Gb of whole-genome shotgun sequencing data, 139.15 Gb of single-molecule real-time sequencing data, and 179.51 Gb of Hi-C sequencing data, corresponding to 105-fold, 108-fold, and 139-fold coverage, respectively (supplementary table 1, Supplementary Material online). The final genome assembly was 1.285 Gb in length, with a contig N50 of 598 kb and a scaffold N50 of 71.79 Mb (table 1). We aligned the filtered reads to the assembled genome sequence using the Burrows-Wheeler Aligner (Li and Durbin 2009) and calculated the number and percentage of bases at different sequencing depths in the genome. The guanine and cytosine (GC) content and average sequencing depth were approximately 38% and 60×, respectively (supplementary fig. 2, Supplementary Material online). We also generated a sequencing depth plot and found that the percentage of sequences with a depth of less than 10× was lower than 5% (supplementary fig. 3, Supplementary Material online).
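The coverage and N50 figures in this paragraph follow from two short calculations, sketched here. Fold coverage is data volume divided by genome size, and N50 is the length of the sequence at which the cumulative length, taken from longest to shortest, first reaches half of the total. Using the 1.29 Gb k-mer estimate as the denominator is an assumption; it reproduces the reported fold values after rounding. The function names and the toy contig lengths are hypothetical.

```python
def fold_coverage(data_gb, genome_gb):
    """Sequencing depth as data volume divided by genome size (both in Gb)."""
    return data_gb / genome_gb


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one sequence length")


# Data volumes reported in the text; 1.29 Gb denominator is an assumption.
GENOME_GB = 1.29
for label, data_gb in [("shotgun", 135.63), ("SMRT", 139.15), ("Hi-C", 179.51)]:
    print(f"{label}: {round(fold_coverage(data_gb, GENOME_GB))}-fold")

# Toy contig lengths (invented) to show the N50 definition.
print("toy N50 =", n50([2, 3, 4, 5, 6]))
```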
Table 1

Summary of the genome assembly and annotation tables

Genome assembly
    Estimated genome size                   1.29 Gb
    Guanine and Cytosine content            38%
    N50 length (contig)                     598 kb
    Longest contig                          6.11 Mb
    Total length of contigs                 1.28 Gb
    N50 length (scaffold)                   71.79 Mb
    Longest scaffold                        100.20 Mb
    Total length of scaffolds               1.29 Gb

Transposable elements
    Annotation      Percent (%)     Total length
    DNA             5.56            71.48 Mb
    LINE            12.04           154.75 Mb
    SINE            0.01            131.45 kb
    LTR             65.53           842.36 Mb
    Other           0.00            9.35 kb
    Unknown         4.38            56.29 Mb
    Total           74.92           0.96 Gb

Protein-coding genes
    Predicted genes                         21,070
    Average transcript length (bp)          9,877.52
    Average coding sequence length (bp)     1,202.62
    Average exon length (bp)                270.66
    Average intron length (bp)              2,206.22
    Functionally annotated                  20,904

We predicted 21,070 genes, with an average mRNA length of 9,877 bp, an average coding sequence length of 1,202 bp, and an average intron length of 2,206 bp. A total of 20,904 genes were functionally annotated, accounting for 99.21% of the predicted genes. We also identified 1,495 non-coding RNA genes from the assembly. Furthermore, we functionally classified 14,552 (69.07%) D. huoshanense genes using KEGG. In particular, we identified 252 genes related to glycan biosynthesis and metabolism (fig. 1). These findings are consistent with previous experimental studies indicating the importance of polysaccharides in D. huoshanense (Xu et al. 2013; Zha et al. 2014).
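The two percentages in this paragraph can be checked directly from the gene counts given in the text. The short sketch below does only that arithmetic; the helper name `percent_of` is hypothetical.

```python
def percent_of(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


PREDICTED_GENES = 21_070  # protein-coding genes reported in the text

print("functionally annotated:", percent_of(20_904, PREDICTED_GENES), "%")
print("classified with KEGG:  ", percent_of(14_552, PREDICTED_GENES), "%")
```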
Fig. 1.

(A) Functional classification of Dendrobium huoshanense genes using the KEGG database. (B) Copia and Gypsy element distributions along the chromosomes of D. huoshanense. Copia and Gypsy densities represent the proportions of Copia and Gypsy elements within 1 Mb intervals.

We assessed the D. huoshanense genome assembly and the quality of the annotated gene models using Benchmarking Universal Single-Copy Orthologs version 3 (BUSCO; Simao et al. 2015) with a total of 1,375 ortholog groups from the Embryophyta data set. The assembled genome and annotated genes had greater than 86.6% and 78.1% completeness, respectively (supplementary table 2, Supplementary Material online), indicating a relatively complete genome assembly and gene prediction.

Transposable Element and Long Terminal Repeat Distributions

We identified 1.02 Gb repetitive elements and 0.96 Gb TEs in the assembled D. huoshanense genome. The TEs are summarized in supplementary table 3, Supplementary Material online. In brief, 65.53% of the assembled genome was long terminal repeats, 85% of which belonged to subtypes Copia and Gypsy. Gypsy elements were distributed evenly along the chromosomes, while Copia showed a biased distribution. On seven chromosomes, Copia elements were concentrated at one end, whereas on the remaining 12 chromosomes, they were concentrated near the chromosome center (fig. 1).
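The Copia and Gypsy densities plotted along the chromosomes are proportions of each element class within 1 Mb intervals. One way such a density track could be computed from annotated element coordinates is sketched below. This is an illustrative sketch, not the authors' script: the function name `window_density` is hypothetical, coordinates are assumed to be 0-based half-open (start, end) pairs lying within the chromosome, and the toy coordinates are invented.

```python
def window_density(intervals, chrom_length, window=1_000_000):
    """Fraction of each fixed-size window covered by the given intervals.

    intervals: iterable of (start, end) pairs, 0-based, half-open,
    all within [0, chrom_length). The last window may be shorter.
    """
    # Merge overlapping or adjacent intervals so no base is counted twice.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    n_windows = -(-chrom_length // window)  # ceiling division
    covered = [0] * n_windows

    # Distribute each merged interval over the windows it overlaps.
    for start, end in merged:
        for w in range(start // window, (end - 1) // window + 1):
            w_start = w * window
            w_end = min(w_start + window, chrom_length)
            covered[w] += min(end, w_end) - max(start, w_start)

    densities = []
    for w in range(n_windows):
        w_len = min((w + 1) * window, chrom_length) - w * window
        densities.append(covered[w] / w_len)
    return densities


# Toy example with a 10 bp window on a 25 bp "chromosome" (invented values).
print(window_density([(0, 5), (3, 12), (20, 25)], chrom_length=25, window=10))
```

Running the function separately on Copia and Gypsy coordinates would give one density track per class, which could then be compared along each chromosome.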

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.
References (10 of 24 shown)

1.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

2.  A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.

Authors:  Guillaume Marçais; Carl Kingsford
Journal:  Bioinformatics       Date:  2011-01-07       Impact factor: 6.937

3.  SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap.

Authors:  Andrew D Johnson; Robert E Handsaker; Sara L Pulit; Marcia M Nizzari; Christopher J O'Donnell; Paul I W de Bakker
Journal:  Bioinformatics       Date:  2008-10-30       Impact factor: 6.937

4.  The Genome of Dendrobium officinale Illuminates the Biology of the Important Traditional Chinese Orchid Herb.

Authors:  Liang Yan; Xiao Wang; Hui Liu; Yang Tian; Jinmin Lian; Ruijuan Yang; Shumei Hao; Xuanjun Wang; Shengchao Yang; Qiye Li; Shuai Qi; Ling Kui; Moses Okpekum; Xiao Ma; Jiajin Zhang; Zhaoli Ding; Guojie Zhang; Wen Wang; Yang Dong; Jun Sheng
Journal:  Mol Plant       Date:  2014-12-24       Impact factor: 13.164

5.  EST analysis reveals putative genes involved in glycyrrhizin biosynthesis.

Authors:  Ying Li; Hong-Mei Luo; Chao Sun; Jing-Yuan Song; Yong-Zhen Sun; Qiong Wu; Ning Wang; Hui Yao; André Steinmetz; Shi-Lin Chen
Journal:  BMC Genomics       Date:  2010-04-28       Impact factor: 3.969

6.  Immunoregulatory activities of Dendrobium huoshanense polysaccharides in mouse intestine, spleen and liver.

Authors:  Xue-Qiang Zha; Hong-Wei Zhao; Vibha Bansal; Li-Hua Pan; Zheng-Ming Wang; Jian-Ping Luo
Journal:  Int J Biol Macromol       Date:  2013-12-24       Impact factor: 6.953

7.  MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

Authors:  Carson Holt; Mark Yandell
Journal:  BMC Bioinformatics       Date:  2011-12-22       Impact factor: 3.307

8.  Metabolic Analysis of Medicinal Dendrobium officinale and Dendrobium huoshanense during Different Growth Years.

Authors:  Qing Jin; Chunyan Jiao; Shiwei Sun; Cheng Song; Yongping Cai; Yi Lin; Honghong Fan; Yanfang Zhu
Journal:  PLoS One       Date:  2016-01-11       Impact factor: 3.240

9.  Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.

Authors:  Sergey Koren; Brian P Walenz; Konstantin Berlin; Jason R Miller; Nicholas H Bergman; Adam M Phillippy
Journal:  Genome Res       Date:  2017-03-15       Impact factor: 9.043

10.  Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies.

Authors:  Michael J Roach; Simon A Schmidt; Anthony R Borneman
Journal:  BMC Bioinformatics       Date:  2018-11-29       Impact factor: 3.169
