
Draft genome sequences of two oriental melons, Cucumis melo L. var. makuwa.

Ah-Young Shin, Namjin Koo, Seungill Kim, Young Mi Sim, Doil Choi, Yong-Min Kim, Suk-Yoon Kwon.

Abstract

Oriental melon (Cucumis melo L. var. makuwa) is one of the most important cultivated cucurbits, and is grown widely in Northeast Asian countries. With increasing interest in its biological properties and economic importance, oriental melon has become an attractive model crop for studying various horticultural traits. A previous genome sequence of the melon was constructed from a homozygous double-haploid line. Thus, individual reference genomes are required to perform functional studies and further breeding applications. Here, we report draft genome sequences of two oriental melons, Chang Bougi and SW3. The assembled 344 Mb genome of Chang Bougi was obtained with scaffold N50 1.0 Mb, and 36,235 genes were annotated. The 354 Mb genome of SW3 was assembled with scaffold N50 1.6 Mb, and has 38,173 genes. These newly constructed genomes will enable studies of fruit development, disease resistance, and breeding applications in the oriental melon.

Year:  2019        PMID: 31641135      PMCID: PMC6805853          DOI: 10.1038/s41597-019-0244-x

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

The oriental melon (Cucumis melo L. var. makuwa), one of the most important annual diploid crops within the Cucurbitaceae family, is grown largely in Northeast Asian countries, including Korea, China, and Japan. It is cultivated primarily for its fruit, which generally has a sweet aromatic flavor and contains soluble sugars, organic acids, minerals, and vitamins[1-3]. Traits of the fruit, such as shape, skin color, flesh color, and sugar content, are highly variable. As its economic importance and interest in its biological properties have grown, oriental melon has become an attractive model crop for the study of various traits. Reference genomes from genetically diverse individuals provide insights into genome structure, genome evolution, and diversification within a genus or species. For instance, the completion of multiple reference genomes in the genus Capsicum made possible precise comparisons of genome structure and analyses of lineage-specific evolution of gene families[4]. In the case of melon, a previous reference genome was constructed from the homozygous DHL92 double-haploid line[5], and subsequent improvements to the genome assembly and annotation were reported[6]. Multiple reference genomes will be required to carry out functional studies and evolutionary studies of gene families, to link genetic markers to desirable traits, and to support further breeding applications in the oriental melon. Here, we report the construction of draft genomes of two oriental melon types, Chang Bougi and SW3. Chang Bougi, a Korean landrace, is a new source for breeding resistance to Cucumber Green Mottle Mosaic Virus (CGMMV), which causes mosaicism in leaves and deterioration of fruits, leading to severe yield and quality losses in cucurbit crops worldwide[7]. The high-quality breeding line SW3, from NongWoo Bio Company, bears deep-yellow, oval fruits with high sugar content. Figure 1 presents an overview of the study.
A combination of paired-end (PE) and mate-pair (MP) libraries was sequenced to generate 231× and 345× of genomic sequencing data[8] for Chang Bougi and SW3, respectively (Table 1). Genome assembly and annotation were then performed (Fig. 1). The assembled genome of Chang Bougi[9] comprised 11,309 scaffolds totaling 344 Mb in length, with a scaffold N50 of 1.0 Mb. For SW3, 7,202 scaffolds totaling 354 Mb in length were assembled[10], with a scaffold N50 of 1.6 Mb (Table 2). Repeat annotation was then carried out (Table 3). K-mer frequencies were calculated to provide information on low-frequency k-mers, sequencing depth, level of heterozygosity, and genome size (Fig. 2)[11]. The estimated genome sizes of Chang Bougi and SW3 were 355 Mb and 373 Mb, respectively, similar to previously reported genome sizes[5]. A total of 36,235 and 38,173 genes were retained as final gene models in Chang Bougi and SW3, respectively (Table 2 and Fig. 3). Functional annotation of the final gene models was then performed (Table 4 and Fig. 4). Together, these data provide new reference genomes of oriental melon for further analysis and breeding programs.
Fig. 1

Overview of the pipeline of the study.

Table 1

Metrics of raw Illumina datasets.

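| Library Type | Samples | Insert Size (bp) | Read Length (bp) | Coverage (×) |
|---|---|---|---|---|
| Paired-end | Chang Bougi | 400 | 150 | 57.5 |
| Paired-end | Chang Bougi | 800 | 150 | 43.7 |
| Mate-pair | Chang Bougi | 2,000 | 150 | 54.4 |
| Mate-pair | Chang Bougi | 5,000 | 150 | 34.6 |
| Mate-pair | Chang Bougi | 10,000 | 150 | 41.7 |
| Paired-end | SW3 | 400 | 150 | 166.7 |
| Paired-end | SW3 | 800 | 150 | 50.3 |
| Mate-pair | SW3 | 2,000 | 150 | 55.2 |
| Mate-pair | SW3 | 5,000 | 150 | 39.8 |
| Mate-pair | SW3 | 10,000 | 150 | 33.5 |
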
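The total sequencing depths quoted in the text can be cross-checked against the per-library values in Table 1. The following is a minimal sketch; the dictionary keys are illustrative labels, not identifiers from the deposited data. The sums come to 231.9× and 345.5×, which match the reported 231× and 345× if those figures were truncated rather than rounded (an assumption).

```python
# Per-library sequencing depth (x) transcribed from Table 1.
# Keys such as "PE 400" are illustrative labels (library type + insert size).
COVERAGE = {
    "Chang Bougi": {"PE 400": 57.5, "PE 800": 43.7,
                    "MP 2000": 54.4, "MP 5000": 34.6, "MP 10000": 41.7},
    "SW3": {"PE 400": 166.7, "PE 800": 50.3,
            "MP 2000": 55.2, "MP 5000": 39.8, "MP 10000": 33.5},
}


def total_coverage(libraries):
    """Sum per-library depths into a total depth for one accession."""
    return round(sum(libraries.values()), 1)


totals = {sample: total_coverage(libs) for sample, libs in COVERAGE.items()}
for sample, depth in totals.items():
    print(f"{sample}: {depth:.1f}x total")
```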
Table 2

Statistics of genome assembly and annotation.

| | Chang Bougi | SW3 |
|---|---|---|
| Number of scaffolds | 11,309 | 7,202 |
| Total length of scaffolds (Mbp) | 344 | 354 |
| N50 of scaffolds (Mbp) | 1.0 | 1.6 |
| Longest scaffold length (Mbp) | 6.8 | 5.6 |
| Number of contigs | 43,251 | 29,154 |
| Total length of contigs (Mbp) | 325 | 346 |
| N50 of contigs (kbp) | 15 | 25 |
| Longest contig length (kbp) | 160 | 214 |
| Number of genes | 36,235 | 38,173 |
| Average/total CDS lengths | 1,083/39,426,107 | 1,107/42,780,742 |
| Average exon/intron lengths | 243/346 | 248/356 |

Table 3

Statistics of repeat annotation.

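| Type | Chang Bougi Length (Mb) | Chang Bougi Ratio (%) | SW3 Length (Mb) | SW3 Ratio (%) |
|---|---|---|---|---|
| DNA elements | 39 | 11 | 37 | 10 |
| LINE elements | 4 | 1 | 5 | 1 |
| SINE elements | 0 | 0 | 0 | 0 |
| LTR/Gypsy | 29 | 8 | 36 | 10 |
| LTR/Copia | 31 | 9 | 34 | 10 |
| LTR/Caulimoviridae | 4 | 1 | 5 | 1 |
| rDNA | 0 | 0 | 0 | 0 |
| Simple repeat | 6 | 2 | 6 | 2 |
| Others | 2 | 1 | 3 | 1 |
| Unclassified | 64 | 19 | 68 | 19 |
| Total | 179 | 52 | 194 | 54 |
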
Fig. 2

Distribution of 19-mers in raw sequence data from two oriental melon genomes. Distributions for Chang Bougi (blue) and SW3 (orange) are depicted. The x- and y-axes indicate the frequency and volume of 19-mers, respectively.
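A k-mer histogram like the one in Fig. 2 is commonly turned into a genome size estimate by dividing the total number of trusted k-mers by the depth of the main peak. The sketch below illustrates that general idea only; the source does not state which estimator was used, and the `min_depth` error cutoff and the toy histogram are assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth (a hypothetical cutoff) are treated as
    sequencing errors and ignored.
    """
    trusted = {d: n for d, n in histogram.items() if d >= min_depth}
    if not trusted:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers, taken as the main peak.
    peak_depth = max(trusted, key=trusted.get)
    # Total k-mer occurrences among the trusted depths.
    total_kmers = sum(d * n for d, n in trusted.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (not real data): an error tail at depths 1-3, a peak at 20.
toy = {1: 900, 2: 300, 3: 50, 18: 40, 19: 80, 20: 120, 21: 80, 22: 40}
size, peak = estimate_genome_size(toy)
print(f"peak depth = {peak}, estimated size = {size:.0f} positions")
```

In practice the histogram would come from a k-mer counter, and heterozygous or repeat peaks may need separate handling.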

Fig. 3

Comparisons of gene models for two oriental melon genomes and other genomes. (a) Gene length distribution (b) CDS length distribution (c) Exon number distribution (d) Intron length distribution (e) Intron number distribution. The x-axis shows the length (bp) of genes (a), CDSs (b), and introns (d), or the number of exons (c) and introns (e). The y-axis shows the proportion of genes.

Table 4

Functional annotation of genes.

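| Database | Chang Bougi Annotated Number | Chang Bougi Annotated Percent (%) | SW3 Annotated Number | SW3 Annotated Percent (%) |
|---|---|---|---|---|
| NR | 39,521 | 98.86 | 41,997 | 98.74 |
| InterPro | 27,198 | 68.03 | 28,996 | 68.17 |
| GO | 18,777 | 46.97 | 19,849 | 46.67 |
| KEGG | 1,884 | 4.71 | 1,563 | 3.67 |
| Annotated | 39,524 | 98.87 | 42,007 | 98.76 |
| Total | 39,977 | | 42,535 | |
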
Fig. 4

Venn diagram of the number of genes having functional annotation in Chang Bougi and SW3 genomes using multiple public databases. Functional annotation of Chang Bougi (a) and SW3 (b) was primarily performed using Blast2GO. For genes that remained unassigned by Blast2GO, we used NR, GO, KEGG, and InterPro to assign gene function.

Methods

DNA extraction and sequencing

Leaves of two oriental melons were harvested and frozen immediately in liquid nitrogen. Genomic DNA was extracted, and paired-end and mate-pair libraries for next-generation sequencing were constructed according to the manufacturer’s instructions (Illumina, San Diego, CA, USA). The quality of each library was validated using the KAPA SYBR FAST Universal 2× qPCR Master Mix (Kapa Biosystems, Boston, MA, USA). Each library was sequenced with the Illumina HiSeq 2500 platform.

Genome assembly

Pre-processing of raw sequences with an in-house pipeline and genome assembly were performed as described in previous studies[4,12]. After pre-processing to remove erroneous sequences from the raw data, the remaining sequences in the paired-end libraries were assembled using Platanus[13], with parameters for Chang Bougi (-k 63 -c 5 -d 0.3 -t 40 -m 220) and for SW3 (-k 91 -c 5 -d 0.3 -t 44 -m 200). Scaffolding was performed with Platanus, using paired-end and mate-pair sequences, with parameters for Chang Bougi (-l 3 -s 61 -u 0.2 -t 40) and for SW3 (-l 3 -u 0.2 -t 15). Remaining gaps were filled with Platanus and GapCloser version 1.10 (http://soap.genomics.org.cn/down/GapCloser_release_2011.tar.gz), using reads from the paired-end and mate-pair libraries. Finally, 344 Mb of Chang Bougi genomic sequence (96.9% of 355 Mb) and 354 Mb of SW3 genomic sequence (94.9% of 373 Mb) were assembled (Table 2).
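For readers unfamiliar with the N50 statistic reported for scaffolds and contigs, the following is a minimal sketch of how it is usually computed, together with a check of the assembled fractions quoted above. The scaffold lengths are toy values, not taken from the assemblies.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty list of lengths")


def assembled_percent(assembled_mb, estimated_mb):
    """Assembled length as a percentage of the estimated genome size."""
    return round(100 * assembled_mb / estimated_mb, 1)


toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]  # toy lengths, total 300
print("toy N50:", n50(toy_scaffolds))
print("Chang Bougi:", assembled_percent(344, 355), "%")
print("SW3:", assembled_percent(354, 373), "%")
```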

Repeat annotation

After construction of repeat libraries using the assembled Chang Bougi and SW3 genomes, repeat annotation was implemented using RepeatModeler and RepeatMasker (http://www.repeatmasker.org). A total of 179 Mb (52% of 355 Mb) and 194 Mb (54% of 373 Mb) of repeat sequences were detected in Chang Bougi and SW3, respectively (Table 3).

Genome annotation

Annotation of the two genomes was performed using the KOBIC annotation pipeline (a modified PGA pipeline[14]), consisting of repeat masking, mapping of different protein sequence sets, and ab initio prediction performed by AUGUSTUS v3.2.2[15]. Transcript assembly was performed against the assembled genome by a reference-based algorithm using HISAT2[16] and StringTie[17]. To generate protein-based gene models for consensus modeling, the protein sequences of Arabidopsis thaliana (TAIR10, http://www.arabidopsis.org), Citrullus lanatus[18], Cucumis melo[5], and Cucumis sativus[19] were mapped using GeneWise v2.1[20]. AUGUSTUS was used for gene prediction in the two oriental melon genomes. To validate the predicted gene models, protein sequences from the genomes of C. lanatus, C. melo, C. sativus, and A. thaliana were used as queries in BLASTp, and erratic gene models were filtered with a BLASTp cut-off of query coverage ≥0.3. The assembled transcripts were also validated against the same four sets of protein sequences using tBLASTn, and filtered with cut-off values of query coverage ≥0.5 and subject coverage ≥0.3. The GeneWise gene models that remained were converted from GeneWise format to GFF3, and used to determine the consensus gene model via EVM[21], which combines ab initio gene predictions with protein and transcript alignments into weighted consensus gene structures (weights: ab initio predictions = 1, protein alignments = 5, transcript alignment assemblies = 7). Ultimately, the final gene models included a total of 36,235 consensus genes for Chang Bougi and 38,173 consensus genes for SW3 (Table 2 and Fig. 3). Further functional annotation was performed using the program Blast2GO[22], drawing on InterPro[23], the NCBI NR database, and the Kyoto Encyclopedia of Genes and Genomes (KEGG)[24]. Functional annotation of the final gene models (Table 4 and Fig. 4) predicted 2,093, 3,703, and 493 genes as hypothetical protein, uncharacterized protein, and unknown function, respectively, in the Chang Bougi genome. In the SW3 genome, 2,245, 3,827, and 570 genes were predicted as hypothetical protein, uncharacterized protein, and unknown function, respectively.
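The coverage cut-offs used to filter gene models and transcripts can be illustrated with a short sketch. It assumes each BLAST hit is a single aligned segment with known query and subject lengths; the `Hit` record and its field names are hypothetical, and the actual pipeline may merge multiple segments or compute coverage differently.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment segment (hypothetical record; coordinates are 1-based, inclusive)."""
    query_id: str
    query_len: int
    query_start: int
    query_end: int
    subject_len: int
    subject_start: int
    subject_end: int


def coverage(start, end, length):
    """Fraction of a sequence covered by one aligned segment."""
    return (abs(end - start) + 1) / length


def passes(hit, min_query_cov, min_subject_cov=0.0):
    """True if the hit meets both coverage thresholds."""
    q_cov = coverage(hit.query_start, hit.query_end, hit.query_len)
    s_cov = coverage(hit.subject_start, hit.subject_end, hit.subject_len)
    return q_cov >= min_query_cov and s_cov >= min_subject_cov


hits = [
    Hit("g1", 300, 1, 150, 400, 10, 160),  # query cov 0.50, subject cov ~0.38
    Hit("g2", 300, 1, 60, 400, 10, 70),    # query cov 0.20, subject cov ~0.15
]

# Gene-model filter: query coverage >= 0.3.
gene_model_pass = [h.query_id for h in hits if passes(h, 0.3)]
# Transcript filter: query coverage >= 0.5 and subject coverage >= 0.3.
transcript_pass = [h.query_id for h in hits if passes(h, 0.5, 0.3)]
print(gene_model_pass, transcript_pass)
```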

Data Records

All of the raw sequence reads produced by Illumina HiSeq 2500 have been deposited at the NCBI Sequence Read Archive (SRA) under BioProject number PRJNA531526 (accession SRP191487)[8] and BioSample numbers SAMN11368505 to SAMN11368524 (SAMN11368505 to SAMN11368515 for Chang Bougi; SAMN11368516 to SAMN11368524 for SW3). The Whole Genome Shotgun project of Chang Bougi has been deposited at DDBJ/ENA/GenBank under the accession number SSTD00000000[9] under BioProject number PRJNA531576 and BioSample SAMN11370205. The Whole Genome Shotgun project of SW3 has been deposited at DDBJ/ENA/GenBank under the accession number SSTE00000000[10] under BioProject number PRJNA531478 and BioSample SAMN11381272.

Technical Validation

Detection and filtration of misannotated genes

EvidenceModeler predicted 39,977 and 42,535 consensus genes for Chang Bougi and SW3, respectively. We investigated these to detect misannotated genes, as recommended by NCBI GenBank, including genes containing internal stop codons, genes lacking a stop codon, frame-shifted genes, or erroneous start codons. A total of 3,742 and 4,362 misannotated genes were detected and masked in Chang Bougi and in SW3, respectively. Thus, 36,235 genes remained in the Chang Bougi genome, and 38,173 genes remained in SW3.
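The kinds of checks listed above can be sketched for a single coding sequence as follows. This is a simplified illustration under stated assumptions, not the procedure used by the authors or by NCBI: it assumes the standard stop codons and an ATG start, and it flags a possible frameshift only when the CDS length is not a multiple of three. The final lines confirm the gene counts reported in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def cds_problems(cds):
    """Return a list of problems found in a coding sequence (DNA, 5'->3')."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("missing ATG start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


print(cds_problems("ATGGCCTAA"))     # clean toy CDS
print(cds_problems("ATGTAAGCCTAA"))  # toy CDS with a premature stop

# Gene counts from the text: EVM consensus genes minus masked genes.
remaining = {"Chang Bougi": 39977 - 3742, "SW3": 42535 - 4362}
print(remaining)
```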

Evaluation of genome annotation using BUSCO

BUSCO v3.0.2[25] assesses the completeness of an assembled genome by searching for single-copy orthologous groups from OrthoDB (http://www.orthodb.org), using hidden Markov model profiles built from amino acid alignments. For genome annotation assessments, we used the set of 1,440 orthologs conserved in embryophyta (Table 5). The results showed that most of these core orthologs were present as complete copies in the genomes of Chang Bougi (85.28%) and SW3 (86.81%).
Table 5

The presence and completeness of universally conserved single-copy genes in Chang Bougi and SW3 (BUSCO) genomes.

| | Chang Bougi | SW3 |
|---|---|---|
| Complete BUSCOs (C) | 1,228 | 1,250 |
| Complete and single-copy BUSCOs (S) | 1,199 | 1,220 |
| Complete and duplicated BUSCOs (D) | 29 | 30 |
| Fragmented BUSCOs (F) | 103 | 87 |
| Missing BUSCOs (M) | 109 | 103 |
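The completeness percentages quoted in the text follow directly from the counts in Table 5. A minimal sketch of the arithmetic is given below; the category letters follow the table, while the function name is illustrative.

```python
# Counts transcribed from Table 5; the embryophyta set has 1,440 orthologs.
BUSCO = {
    "Chang Bougi": {"S": 1199, "D": 29, "F": 103, "M": 109},
    "SW3": {"S": 1220, "D": 30, "F": 87, "M": 103},
}
TOTAL = 1440


def complete_percent(counts, total=TOTAL):
    """Percentage of complete BUSCOs (single-copy plus duplicated)."""
    complete = counts["S"] + counts["D"]
    # Complete, fragmented, and missing categories should partition the set.
    if complete + counts["F"] + counts["M"] != total:
        raise ValueError("BUSCO categories do not sum to the total")
    return round(100 * complete / total, 2)


percents = {name: complete_percent(c) for name, c in BUSCO.items()}
print(percents)
```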

Comparison of gene sets in the genomes of oriental melons Chang Bougi and SW3 with those in the genomes of melon (DHL92 v3.6.1) and cucumber

To compare gene sets between the oriental melons and previously reported cucurbit genomes, orthologous and paralogous genes were detected in the melon genome (DHL92 v3.6.1), Chang Bougi, SW3, and cucumber (C. sativus) using the program OrthoFinder[26]. A total of 113,006 sequences were clustered into 30,738 groups, with 3,475 and 4,469 singleton genes detected in Chang Bougi and in SW3, respectively (Fig. 5). Fewer singleton genes might be expected in the two oriental melons than in the melon genome, which was constructed from a homozygous DHL92 double-haploid line, derived from a cross between a Korean landrace of oriental melon (Songwhan Chamoe, PI 161375) and melon (Piel de Sapo). In addition, 2,213 genes were found to be common to melon and the two oriental melons, and 12,983 genes were detected in all four genomes. Functional investigation of the singleton genes of Chang Bougi and SW3 indicated that 869 and 1,112 of them, respectively, were of unknown function.
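The tallies behind a comparison like this (groups shared by all genomes, groups shared by a subset, and singleton genes) can be sketched from a table of orthogroup memberships. The example below uses toy data and a hypothetical input structure; it does not parse real OrthoFinder output, and it treats a "singleton" as a gene that is the only member of its group, which is an assumption about how the term is used here.

```python
from collections import Counter


def summarize_orthogroups(groups):
    """Tally species-presence patterns and singleton genes.

    groups maps group_id -> {species: [gene_ids]} (hypothetical structure).
    """
    pattern_counts = Counter()  # frozenset of species -> number of groups
    singletons = Counter()      # species -> number of single-gene groups
    for members in groups.values():
        present = {sp: genes for sp, genes in members.items() if genes}
        n_genes = sum(len(genes) for genes in present.values())
        if n_genes == 1:
            (species,) = present
            singletons[species] += 1
        elif n_genes > 1:
            pattern_counts[frozenset(present)] += 1
    return pattern_counts, singletons


# Toy memberships (gene IDs are made up).
toy_groups = {
    "OG1": {"DHL92": ["m1"], "cucumber": ["c1"], "Chang Bougi": ["cb1"], "SW3": ["s1"]},
    "OG2": {"DHL92": ["m2"], "Chang Bougi": ["cb2"], "SW3": ["s2", "s3"]},
    "OG3": {"Chang Bougi": ["cb3"]},
    "OG4": {"SW3": ["s4"]},
    "OG5": {"SW3": ["s5"]},
}
patterns, singles = summarize_orthogroups(toy_groups)
all_four = frozenset({"DHL92", "cucumber", "Chang Bougi", "SW3"})
print(patterns[all_four], dict(singles))
```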
Fig. 5

Distribution of orthologous gene families of Cucumis melo (DHL92 v3.6.1), Cucumis sativus, Chang Bougi, and SW3 genomes. A total of 113,006 sequences were clustered into 30,738 groups. Each panel shows the number of clustered genes for that genome.

Measurement(s): DNA
Technology Type(s): DNA sequencing
Factor Type(s): reference genome
Sample Characteristic - Organism: Cucumis melo

1.  Using native and syntenically mapped cDNA alignments to improve de novo gene finding.

Authors:  Mario Stanke; Mark Diekhans; Robert Baertsch; David Haussler
Journal:  Bioinformatics       Date:  2008-01-24       Impact factor: 6.937

2.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

3.  The genome of the cucumber, Cucumis sativus L.

Authors:  Sanwen Huang; Ruiqiang Li; Zhonghua Zhang; Li Li; Xingfang Gu; Wei Fan; William J Lucas; Xiaowu Wang; Bingyan Xie; Peixiang Ni; Yuanyuan Ren; Hongmei Zhu; Jun Li; Kui Lin; Weiwei Jin; Zhangjun Fei; Guangcun Li; Jack Staub; Andrzej Kilian; Edwin A G van der Vossen; Yang Wu; Jie Guo; Jun He; Zhiqi Jia; Yi Ren; Geng Tian; Yao Lu; Jue Ruan; Wubin Qian; Mingwei Wang; Quanfei Huang; Bo Li; Zhaoling Xuan; Jianjun Cao; Zhigang Wu; Juanbin Zhang; Qingle Cai; Yinqi Bai; Bowen Zhao; Yonghua Han; Ying Li; Xuefeng Li; Shenhao Wang; Qiuxiang Shi; Shiqiang Liu; Won Kyong Cho; Jae-Yean Kim; Yong Xu; Katarzyna Heller-Uszynska; Han Miao; Zhouchao Cheng; Shengping Zhang; Jian Wu; Yuhong Yang; Houxiang Kang; Man Li; Huiqing Liang; Xiaoli Ren; Zhongbin Shi; Ming Wen; Min Jian; Hailong Yang; Guojie Zhang; Zhentao Yang; Rui Chen; Shifang Liu; Jianwen Li; Lijia Ma; Hui Liu; Yan Zhou; Jing Zhao; Xiaodong Fang; Guoqing Li; Lin Fang; Yingrui Li; Dongyuan Liu; Hongkun Zheng; Yong Zhang; Nan Qin; Zhuo Li; Guohua Yang; Shuang Yang; Lars Bolund; Karsten Kristiansen; Hancheng Zheng; Shaochuan Li; Xiuqing Zhang; Huanming Yang; Jian Wang; Rifei Sun; Baoxi Zhang; Shuzhi Jiang; Jun Wang; Yongchen Du; Songgang Li
Journal:  Nat Genet       Date:  2009-11-01       Impact factor: 38.330

4.  HISAT: a fast spliced aligner with low memory requirements.

Authors:  Daehwan Kim; Ben Langmead; Steven L Salzberg
Journal:  Nat Methods       Date:  2015-03-09       Impact factor: 28.547

5.  StringTie enables improved reconstruction of a transcriptome from RNA-seq reads.

Authors:  Mihaela Pertea; Geo M Pertea; Corina M Antonescu; Tsung-Cheng Chang; Joshua T Mendell; Steven L Salzberg
Journal:  Nat Biotechnol       Date:  2015-02-18       Impact factor: 54.908

6.  The tomato genome sequence provides insights into fleshy fruit evolution.

Authors: 
Journal:  Nature       Date:  2012-05-30       Impact factor: 49.962

7.  New reference genome sequences of hot pepper reveal the massive evolution of plant disease-resistance genes by retroduplication.

Authors:  Seungill Kim; Jieun Park; Seon-In Yeom; Yong-Min Kim; Eunyoung Seo; Ki-Tae Kim; Myung-Shin Kim; Je Min Lee; Kyeongchae Cheong; Ho-Sub Shin; Saet-Byul Kim; Koeun Han; Jundae Lee; Minkyu Park; Hyun-Ah Lee; Hye-Young Lee; Youngsill Lee; Soohyun Oh; Joo Hyun Lee; Eunhye Choi; Eunbi Choi; So Eui Lee; Jongbum Jeon; Hyunbin Kim; Gobong Choi; Hyeunjeong Song; JunKi Lee; Sang-Choon Lee; Jin-Kyung Kwon; Hea-Young Lee; Namjin Koo; Yunji Hong; Ryan W Kim; Won-Hee Kang; Jin Hoe Huh; Byoung-Cheorl Kang; Tae-Jin Yang; Yong-Hwan Lee; Jeffrey L Bennetzen; Doil Choi
Journal:  Genome Biol       Date:  2017-11-01       Impact factor: 13.583

8.  InterPro in 2019: improving coverage, classification and access to protein sequence annotations.

Authors:  Alex L Mitchell; Teresa K Attwood; Patricia C Babbitt; Matthias Blum; Peer Bork; Alan Bridge; Shoshana D Brown; Hsin-Yu Chang; Sara El-Gebali; Matthew I Fraser; Julian Gough; David R Haft; Hongzhan Huang; Ivica Letunic; Rodrigo Lopez; Aurélien Luciani; Fabio Madeira; Aron Marchler-Bauer; Huaiyu Mi; Darren A Natale; Marco Necci; Gift Nuka; Christine Orengo; Arun P Pandurangan; Typhaine Paysan-Lafosse; Sebastien Pesseat; Simon C Potter; Matloob A Qureshi; Neil D Rawlings; Nicole Redaschi; Lorna J Richardson; Catherine Rivoire; Gustavo A Salazar; Amaia Sangrador-Vegas; Christian J A Sigrist; Ian Sillitoe; Granger G Sutton; Narmada Thanki; Paul D Thomas; Silvio C E Tosatto; Siew-Yit Yong; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

9.  Blast2GO: A comprehensive suite for functional analysis in plant genomics.

Authors:  Ana Conesa; Stefan Götz
Journal:  Int J Plant Genomics       Date:  2008

10.  Data, information, knowledge and principle: back to metabolism in KEGG.

Authors:  Minoru Kanehisa; Susumu Goto; Yoko Sato; Masayuki Kawashima; Miho Furumichi; Mao Tanabe
Journal:  Nucleic Acids Res       Date:  2013-11-07       Impact factor: 16.971
