Hong Zhang1, Xuming Li2, Haiyan Yu2, Yongbing Zhang1, Meihua Li1, Haojie Wang1, Dengming Wang1, Huaisong Wang3, Qiushi Fu3, Min Liu2, Changmian Ji2, Liming Ma2, Juan Tang2, Song Li2, Jianshun Miao2, Hongkun Zheng4, Hongping Yi5. 1. Hami Melon Research Center, Xinjiang Academy of Agricultural Sciences, Urumqi, Xinjiang 830091, China. 2. Biomarker Technologies Corporation, Beijing 101200, China. 3. The Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, China. 4. Biomarker Technologies Corporation, Beijing 101200, China. Electronic address: zhenghk@biomarker.com.cn. 5. Hami Melon Research Center, Xinjiang Academy of Agricultural Sciences, Urumqi, Xinjiang 830091, China. Electronic address: hpyi1223@163.com.
Abstract
Accurate reference genomes have become indispensable tools for characterization of genetic and functional variations. Here we generated a high-quality assembly of the melon Payzawat using a combination of short-read sequencing, single-molecule real-time sequencing, Hi-C, and a high-density genetic map. The final 12 chromosome-level scaffolds cover ∼94.13% of the estimated genome (398.57 Mb). Compared with the published DHL92 genome, our assembly exhibits a 157-fold increase in contig length and remarkable improvements in the assembly of centromeres and telomeres. Six genes within STHQF12.4 on pseudochromosome 12, identified from whole-genome comparison between Payzawat and DHL92, may explain a considerable proportion of the skin thickness. In addition, our population study showed that melon domesticated at multiple times from whole-genome perspective and melons in China are introduced from different routes. Selective sweeps underlying the genes related to desirable traits, haplotypes of alleles associated with agronomic traits, and the variants from resequencing data enable efficient breeding.
Accurate reference genomes have become indispensable tools for characterization of genetic and functional variations. Here we generated a high-quality assembly of the melon Payzawat using a combination of short-read sequencing, single-molecule real-time sequencing, Hi-C, and a high-density genetic map. The final 12 chromosome-level scaffolds cover ∼94.13% of the estimated genome (398.57 Mb). Compared with the published DHL92 genome, our assembly exhibits a 157-fold increase in contig length and remarkable improvements in the assembly of centromeres and telomeres. Six genes within STHQF12.4 on pseudochromosome 12, identified from whole-genome comparison between Payzawat and DHL92, may explain a considerable proportion of the skin thickness. In addition, our population study showed that melon domesticated at multiple times from whole-genome perspective and melons in China are introduced from different routes. Selective sweeps underlying the genes related to desirable traits, haplotypes of alleles associated with agronomic traits, and the variants from resequencing data enable efficient breeding.
Melon (Cucumis melo L.) is an economically important crop. The annual yield is estimated to be about 30 million tons worldwide, with China contributing to more than half of global melon production (http://www.fao.org). Melon is a eudicot diploid plant species (2n = 2x = 24), belonging to the Cucurbitaceae family. Melon is cultivated worldwide and mainly grows in temperate, subtropical, and tropical regions.Cucumis melo L. is classified into two sub-species: melo and agrestis. The origin of C. melo remains controversial as wild relatives can be found in Africa and Asia (Diaz et al., 2017). The sub-species agrestis is found mainly in Asia from India to the Far-East and in Africa and Central America. The sub-species melo is found in India, Central and Western Asia, Africa, Europe, and America (Pitrat, 2013). Wild melons are observed in these two sub-species. Until recently, independent domestication of wild melons in Africa and Asia resolved the geographic origin of melon (Endl et al., 2018). The traits in fruit morphology and quality are diverse in different horticultural groups. The wild and cultivated melons show contrasting phenotypic traits. The fruit, flower, and seed of wild melons are small, and the plant is monoecious. The fruit of wild melons is bitter, differing from the sweetness of cultivar melons. These are domestication traits and evidently different in wild and cultivar accessions.The draft genome of melon was released in 2012 (Garcia-Mas et al., 2012); however, it was sequenced using the short reads, resulting in a highly fragmented genome assembly. Although efforts had been made to anchor the genome into pseudochromosomes using genetic maps (Argyris et al., 2015), their marker coverage was not sufficient to anchor many gene-rich scaffolds or correct mistakes made in the genome assembly. Moreover, short-read sequencing has difficulty in traversing complex repeat structures, leading to incomplete gene models, less accurate representation of repeats, and biases in our understanding of genome biology (Gordon et al., 2016). The lack of an accurate genome assembly also hinders many studies that rely on genome sequences, such as resequencing, transcriptome, and genetic mapping. Until now, the current version of melon genome (v3.6.1) for DHL92 has more than 40,000 sequence gaps (Ruggieri et al., 2018), making it highly fragmented.The single-molecule real-time (SMRT) sequencing technology allows researchers to obtain long reads with tens of kilobases in size that can span repeat-rich genomic regions, thus significantly reducing sequencing gaps. The completeness of a variety of genome assemblies have been improved by using the SMRT technology, such as quinoa (Jarvis et al., 2017), goat (Bickhart et al., 2017), gorilla (Gordon et al., 2016), and citrus (Wang et al., 2017). In this study, we combined the SMRT sequencing and high-throughput chromosome interaction mapping (Hi-C) technologies to obtain a high-quality genome assembly of a cultivated melon species named Payzawat, a most typical representative of Cucumis melo sp. melo in China. Furthermore, resequencing a number of accessions allows a better understanding of the evolution of melon and provides candidate genes controlling phenotypic traits involved in domestication or diversification.
Results
De Novo Sequencing and Assembly
The SMRT sequencing technologies were used to assemble a high-quality reference genome for melon. A total of 27.80 Gb error-corrected PacBio sub-reads were individually assembled into the contigs with Canu (v1.5) (Koren et al., 2017) assembler and DBG2LOC (Ye et al., 2016), respectively (see Table S2). The resulting two draft assemblies (365 and 389 Mb, see Table S3) then were merged into a consensus assembly. The consensus was polished with PacBio sub-reads and Illumina reads. The estimated genome size for Payzawat was 398.57 Mb using kmer analysis (see Figure S1). As a result, the final genome assembly was 386 Mb with contig N50 and N90 of 2.86 Mb and 511.5 Kb, respectively (see Table S3). This assembly showed substantial improvements compared with the latest assembly of CM3.6.1 (Garcia-Mas et al., 2012), including in terms of assembly completeness, contiguity (gap filling), TE content, etc. With the aid of Hi-C interaction data, 95.53% (363.76 Mb) of the assembly was anchored onto 12 pseudochromosomes (see Table S4 and Figure 1). The Hi-C heatmap of the genome using Hi-C plotter (Akdemir and Chin, 2015) at 200-kb resolution showed the high accuracy of melon assembly (see Figure S2A).
Figure 1
An Overview of the Genomic Features of Payzawat Melon in 500-kb Intervals
The outer layer shows the 12 chromosomes. (a) Gene density. (b) Repeat coverage. (c) SNP density of 50 resequencing accessions. (d) GC content. (e) Duplications of genomic paralogous sequences in melon.
An Overview of the Genomic Features of Payzawat Melon in 500-kb IntervalsThe outer layer shows the 12 chromosomes. (a) Gene density. (b) Repeat coverage. (c) SNP density of 50 resequencing accessions. (d) GC content. (e) Duplications of genomic paralogous sequences in melon.
Assembly Assessment
To assess the accuracy of the anchored pseudochromosomes, the genetic markers yielded from an RIL population containing 119 individuals were mapped onto the genome assembly. The results showed that the orders of these markers were highly consistent with the pseudochromosomes (see Figure S2B).The conserved plant gene sets such as CEG and BUSCO and the specific repeats in melon were also used to evaluate the quality of the genome assembly. A total of 448 (97.82% of 458) CEGs, including 246 (99.19% of 248) highly conserved CEGs, could be found in the melon genome (see Table S7). BUSCO (Simao et al., 2015) assessment showed that 1,336 (92.78%) of gene models were complete, 30 (4.03%) gene models were fragmented, and 76 (5.14%) gene models were missing (Table S8). Melon-specific CentM centromeric repeats (GenBank accession no. 3929695) were found in all 12 chromosomes with a maximum length of 26,811 bp (see Figure S3 and Table S10). In CM3.6.1 genome, centromeric repeats were found only in ten chromosomes with a maximum length of 8,839 bp (see Figure S3 and Table S11). The maximum length of telomeres in Payzawat was 33,998 bp other than 118 bp in CM3.6.1 (see Figure S3 and Tables S12 and S13). In addition, 99.03% of the short reads and 71.82% of the Pacbio sub-reads could be correctly remapped to the assembly with 90.67% and 70% coverage, respectively. All these data verified the high-quality assembly of the Payzawat genome.
Genome Annotation
A total of 192.4 Mb repetitive sequences were identified in Payzawat genome, of which 83.2% (160 Mb) were retrotransposons and 10.02% (38.7 Mb) were DNA transposons (see b in Figure 1 and Table S14).Protein-coding genes were predicted by integrating evidence from RNA sequencing (RNA-seq)-based prediction, protein-homology-based prediction, and ab initio prediction. A total of high-quality 22,924 gene models were predicted in the Payzawat genome (see Table S15 and Figure S4), of which 22,298 (97.27%) genes were supported by RNA-seq or homolog evidence (see Figure S4 and Table S17) and 22,506 (98.18%) could be annotated by at least one of the public databases (see Table S20). The length distributions of mRNA, CDS, and intron were similar to species within the Cucurbitaceae family (see Figures S5A–S5C). The average length of gene model is 4,491 bp (see Tables 1 and S16), and the median length of mRNA is 3,166 bp in Payzawat longer than 2,326 bp in CM3.6.1. The median number of exons is four in Payzawat but three in CM3.6.1 (see Table 1). In total, 22,037 (96%) of protein-coding genes were allocated to 12 pseudochromosomes. The density of genes tended to be higher and the density of repeats tended to be lower at both ends of the chromosome (see a and b in Figure 1).
Table 1
The Statistics of Protein-Coding Genes in Payzawat and DHL92
Payzawat (C. melo L. cv. inodorus)
DHL92 (C. melo L. cv. DHL92, CM3.6.1)
Number of genes
22,924
29,980
Number of mRNAs
22,924
29,980
Number of 5′UTR
17,419
23,703
Number of 3′UTR
16,078
22,546
Median mRNA length
3,166
2,326
Median CDS length
452
321
Median intron length
2,227
3,046
Median exon number
4
3
Percentage of genome covered by genes
26.64
25.85
The Statistics of Protein-Coding Genes in Payzawat and DHL92
Whole-Genome Comparison between Payzawat and CM3.6.1 Genome
Genomic collinearity analysis between Payzawat and CM3.6.1 revealed that 76.7% (296.59 Mb) of Payzawat genomes have one-to-one syntenic blocks with 71.1% (296.29 Mb) of CM3.6.1, and these syntenic blocks accommodated 83.36% (19,110) of genes in Payzawat and 89.63% (26,869) in CM3.6.1, respectively (see Figure 2, and Table S23 and Figure S6).
Figure 2
Whole Genome Alignment between Payzawat and CM3.6.1
The red bar represents the chromosome of Payzawat, and the deep navy-blue bar represents the chromosome of CM3.6.1. The sky blue line indicates synteny between Payzawat and CM3.6.1.
Whole Genome Alignment between Payzawat and CM3.6.1The red bar represents the chromosome of Payzawat, and the deep navy-blue bar represents the chromosome of CM3.6.1. The sky blue line indicates synteny between Payzawat and CM3.6.1.A total of 1,761,822 SNPs and 735,486 indels were identified in syntenic blocks between the two genomes, respectively. Among them, 421,364 SNPs were located in the genic region (9,795 genes) and 33,636 in coding regions (6,971 genes) and 172,760 indels were located in the genic region (9,777 genes) and 4,548 in the coding region (1,842 genes) (Table S24, Figure 3A). For the SNPs within the coding regions, 3,128 SNPs (affecting796 genes) were non-synonymous or lead to changes at the start or stop codon. These genes were enriched in regulating fruit flavor (diterpenoid biosynthesis and the sulfur relay system) and cell wall biosynthesis (glycosylphosphatidylinositol [GPI]-anchor biosynthesis). The ratio of triple indels on the coding regions is higher than that on the non-coding regions (Figure 3B). The counts of SNPs and indels are higher on chr04, chr07, chr08, chr09, and chr12 (see Figure 3A). Genes with indels causing frameshift were mainly enriched in pathways relating to the cell wall (xylem development, glucuronoxylan metabolic process, xylan biosynthetic process, and D-xylose transport), flavor (organic substance transport and vacuolar transport), sugar metabolism (galactose transport, mannitol transport, sorbitol transport, and galactose transmembrane transporter activity), and fruit color (naringenin 3-dioxygenase activity). Some 49,394 structural variations, including GAP, JMP, INV, and SEQ, were detected on 5,852 genes (see Tables S25 and S26). In addition, 6,955 structural variations were detected in one-to-one syntenic blocks and 1,186 genes were affected. Among the genes located in syntenic blocks, approximately 5,181genes (23.51%, 22,037 in pseudochromosomes) of Payzawat showed variations in coding regions.
Figure 3
Genome-Wide Comparison between Payzawat and DHL92
(A) Distribution of SNP and indel count on the genome and coding regions at 100-Kb interval and the distribution of QTLs.
(B)The bar plot of triple and non-triple indels on genomic and coding regions.
(C) QTL STHQF12.4 regions on chromosome 12. Each blue line indicates one gene.
(D)The mutation of gene on DHL92 compared with Payzawat.
Genome-Wide Comparison between Payzawat and DHL92(A) Distribution of SNP and indel count on the genome and coding regions at 100-Kb interval and the distribution of QTLs.(B)The bar plot of triple and non-triple indels on genomic and coding regions.(C) QTL STHQF12.4 regions on chromosome 12. Each blue line indicates one gene.(D)The mutation of gene on DHL92 compared with Payzawat.The skin (epidermis) of DHL92 (genome version CM3.6.1) is thin, whereas that of Payzawat is thick (see Figures S7A and S7B). To reveal the candidate genes involved in skin thickness, we analyzed the epidermis thickness-related quantitative trait loci (QTLs) (Diaz et al., 2011) and genes harboring SNPs, indels, and structural variations within these regions. QTL STHQF12.4, controlling the skin thickness, is located at 23,803,761–26,907,220 of chr12, consisting of 455 genes. Six genes (EVM0008156: pectinesterase, EVM0009214: gibberellin 20 oxidase 1-B-like, EVM0016862: pectate lyase-like, EVM0014824: pectinesterase, EVM0014071: pectinesterase, and EVM0004080: gibberellin 20 oxidase 1-B-like) with structural variants (SVs) were implicated in cell wall biosynthesis (see Figure 3C). The insertion of 76 bp on exon1 of EVM0014842 on DHL92 leads to the truncated protein, and the variation on a region encompassing the start codon of EVM0014071 and EVM0016862 disables the function of these genes on DHL92 (Figure 3D). The sequence alignment showed the difference between the two genomes (see Figure S9). The other three genes in these two genomes were highly divergent. The variation status of six genes was checked in 49 resequencing samples, and the group information is shown in Figure 4B. The ratios of genes with variation in the TC (thick epidermis) group (group three) were higher compared with the TN (thin epidermis) group, such as EVM0008156, EVM0009214, and EVM0014824, and some genes showed the contrary, such as EVM0004080 and EVM0014071. Variation of EVM0016862 only can be detected in R13 (African horned melon [Cucumis metuliferus]) (see Figure S10). The variants for EVM0016862 between Payzawat and DHL92 may be not real due to the poor assembly quality of the gene. 15 samples downloaded from NCBI (see Table S29), consisting of thick- and thin-skinned melons, were used to test the expression of six genes. Except for one gene, which did not express at any sample, other five genes expressed more or less in the thick-epidermis melon (see Figure S11). All the above-mentioned evidence indicates that the six genes may function in determining the thickness of the epidermis.
Figure 4
Population Structure Analysis
(A) The geographical distribution of resequencing samples.
(B) Neighbor-joining phylogenetic tree of the 50 melon accessions.
(C) K value estimation.
(D) Population structure of 49 resequencing accessions estimated by ADMISTURE. Each color indicates one ancestral population. Each bar indicates one accession.
(E) Decay of linkage disequilibrium for wild melon, subclade two and subclade four.
Population Structure Analysis(A) The geographical distribution of resequencing samples.(B) Neighbor-joining phylogenetic tree of the 50 melon accessions.(C) K value estimation.(D) Population structure of 49 resequencing accessions estimated by ADMISTURE. Each color indicates one ancestral population. Each bar indicates one accession.(E) Decay of linkage disequilibrium for wild melon, subclade two and subclade four.The chromosome-level genomes for Payzawat and CM3.6.1 enable the identification of large structural variations and complex genome rearrangement. Large structural variations between Payzawat and CM3.6.1 are intra-chromosomal translocation and inversion. To exclude the assembly error leading to these differences, PacBio sub-reads were mapped to the inversion breakpoints of five large inversions on chromosome 6 (see Tables S27 and S28 and Figure 2). All breakpoints were covered by PacBio sub-reads (see Figure S8), validating the quality of Payzawat assembly. The structure of Payzawat chromosome 6 is also supported by Hi-C contact frequency heatmapping. This discrepancy may be the true difference between the two lines or the erroneous assembly of CM3.6.1 in this region. Some QTLs, EayQL6.1 for early yield, FDQN6.1 for fruit diameter, PHYQN6.2 for phytoene content, β-carQN6.1 for α-carotene content, and fwi6.1 for fruit width (Diaz et al., 2011), were found to be co-located within the inversion regions on chromosome 6.Payzawat is characterized as large fruit size, sweetness, and orange fleshy pulp, whereas DHL92 is medium fruit size and white fleshy pulp (see Figures S7A and S7B). Comparative genomic analysis uncovers numerous line-specific variations, which may serve as the important resource for such trait differences.
Genome Resequencing and Population Analysis
Resequencing of the 50 melon accessions (Figure 4A) generated a total of clean 90,443,068 paired-end (PE) reads, with an average of 18X depth per accession (Tables S30 and S31), and the mapping rate ranged from 86.74% to 95.63% when R13 is excluded owing to its distantly phylogenetic relationship with species melo (see Tables S31 and S32).A total of 240,775 (4.65%; 5,174,489 in total) and 12,794 indels (0.97%; 1,316,073 in total) were located in coding sequences (CDS), among which 98,701 SNPs were non-synonymous and 6,453 indels led to frameshift mutations.Neighbor-joining phylogenetic tree recovered from 50 melon accessions showed two clades (see Figure 4B). The sub-species agrestis did not cluster into one clade. Cultivars/landraces and the wild accessions of northern and southern China in sub-species agrestis formed a subclade and then grouped with the Indian wild subclade, indicating China may be one Asian diversifying center, supporting the idea that some oriental Asian melons originated from India, then spread into northern and southern China before domestication (Akashi et al., 2002, Yashiro et al., 2005). The cultivars/landraces on sub-species agrestis was designated as Group two, with a characteristic thin epidermis (TN group) (see Figure S7D).The cultivars/landraces on sub-species melo, designated as Group three, have the characteristic of a thick epidermis (TC group) (see Figure S7B), with different geographical origins first grouped together and then grouped with the Indian wild subclade, which supported the opinion that cultivars/landraces were introduced to western China via the Silk Road (Kitamura, 1951) after the domestication occurred in India. The wild melons on these two clades were designated as Group one. The fruit of wild-type (WT) is small compared with that of cultivar (CV) (see Figure S7C).The evolutionary history of melons in China was further explored with Delta K (individual ancestry coefficients) values. Results revealed that four populations (K = 4) represent the best model (see Figure 4C) and new sub-populations arose from each clade with K from 2 to 4 (see Figure 4D). When K was set to 4, new sub-populations arose from the wild species, indicating their high diversity and distinct phylogenetic relationships with domesticated melons. The cultivars/landraces were segregated into two sub-populations revealing the geographical distributions thereof in China.Together, these findings led us to propose the melon evolutionary history in China, revealing that China may be one of the diversifying centers and illustrating the idea that melons in different regions of China originated from distinct Indian regions and spread into China via multiple routes.Population genetic analysis among groups was performed to test whether, or not, China was a diversification center. Nucleotide diversity (π) and Tajima's D were highest in the WT group (π = 0.0020285 and Tajima's D = 1.59854), followed by Group three (TC group, π = 0.0011642 and Tajima's D = 0.87328), and lowest in Group two (TN group, π = 0.0002681and Tajima's D = −0.50861), indicating the higher diversification of melons in northern and southern China (Group two).LD analysis further confirmed the nucleotide diversity results. Linkage disequilibrium (LD) analysis for Group two (mainly containing melons from northern and southern China) has a much slower decay of pairwise correlation (r2) values than for Group three (melons from western China) (see Figure 4E). Population analysis indicates that northern and southern China may be one of the diversification centers.
Selective Sweep Analysis
The fixation index (Fst) and π were used to detect the sweep signals between the wild (Group1) and cultivated (Groups 2 and 3) populations (see Figure 5A). Some 42,311,973-bp segments were recognized as sweep regions. Since the contrasting genotypes between wild and cultivated populations may contribute to the different phenotypes, e.g., fruit size, leaf size, seed size, and bitterness of the fruit, we focus on those genes with different genotypes between the wild and cultivated populations within the sweep regions. Among 26,919 small mutations (21,010 SNPs and 5,909 indels), 275 mutations (194 genes) change across the coding regions (non-synonymous mutation, start gained, stop gained, coding region change, insertion, deletion, and stop loss), providing the candidate pool for fruit trait improvement (Table S34). These genes were implicated in processes relating to fruit size (expansin-A12, cell division, and cell cycle), fruit development (brassinosteroid biosynthesis, the ethylene-activated signaling pathway, indole-3-acetic acid-induced protein ARG7, etc.), synthesis and transport of sugar (beta-glucosidase and alpha-L-fucosidase 2), fruit flavor (organic substance metabolism, carotenoid cleavage dioxygenase, water transport, solute carrier family, etc.), and biotic or abiotic response (disease resistance protein, etc.) (see Table S34).
Figure 5
Candidate Alleles for Domestication Traits
(A) The upper panel is Fst for 100-kb windows compared between group one and group two. The bottom panel is π for 100-kb window compared between group one and group two.
(B) The structures of candidate genes implicated in domestication traits and the positions of the mutations on the gene. The black rectangles represent coding sequences (CDS) and the black lines between rectangles are introns. The red line on the CDS indicates the position of causative SNP/InDel.
(C) Haplotype distributions for these genes according to different groups.
Candidate Alleles for Domestication Traits(A) The upper panel is Fst for 100-kb windows compared between group one and group two. The bottom panel is π for 100-kb window compared between group one and group two.(B) The structures of candidate genes implicated in domestication traits and the positions of the mutations on the gene. The black rectangles represent coding sequences (CDS) and the black lines between rectangles are introns. The red line on the CDS indicates the position of causative SNP/InDel.(C) Haplotype distributions for these genes according to different groups.CmACS7 is involved in sex determination, and a causal SNP was reported to be associated with andromonoecy. CmACS7 affected fruit size through the SNP (C_small fruit-T_large fruit) regulating andromonoecy (Boualem et al., 2008). The gene was co-located with the selective sweep regions. The SNP at A57V (chr01:35853579:C-T) is linked with andromonoecy (see Figure 5B). Interestingly, most wild accessions had the same haplotype as the V57 mutant (andromonoecious), whereas the allele of cultivars/landraces equals CmACS7 (monoecious) (see Figure 5C and Table S34). The gene CmACS7 was also implicated in fruit weight, total soluble solids, and ethylene contents (Galpaz et al., 2018).Except for CmACS7, the other six genes, including EVM0005618 (3-epi-6-deoxocathasterone 23-monooxygenase), EVM0005054 (mitotic B-type cyclin), EVM0010492 (VAL2), EVM0018337 (cyclin-B3-1), EVM0010058 (CCD1, carotenoid 9,10 (9′,10′)-cleavage dioxygenase 1), and EVM0009845 (early flowering 3, ELF3) were involved in fruit development, leaf and seed development, and regulation of flower development and flowering. The SNP leading to non-synonymous mutations on these genes may contribute to the difference in fruit size, leaf size, and seed size between WT and CV populations (Figure 5B). The differentiated haplotypes of these SNPs for WT (group one) and CV (group two and group three) could be detected (see Figure 5C). Heterogeneity of the genotype in the wild or CV populations may be due to the complicated taxonomic history of melon, leading to a high number of misclassified germplasm collections. The variations on these genes may be involved in the fruit, seed, or leaf size, and more dense sampling, especially for wild populations, is necessary.
Identification of Candidate Haplotypes for Reported Genes
The resequencing data can also be used to identify and validate the candidate causal mutation for previously reported genes. We focus on small mutation on the coding regions (non-synonymous mutation, start gained, stop gained, coding region change, insertion, deletion, and stop loss).The pH gene has a significant effect on fruit acidity via a 12-base duplication on coding region, and most sweet melons have this duplication (Cohen et al., 2014). pH in Payzawat matched the allele of sweet melon, with 12-bp (TTAATTGTTGCA) duplication in the coding region, whereas the allele in DHL92 matched the allele of sour melon. Wild accessions matched the allele of sour melon, whereas cultivars/landraces accessions matched both alleles (see Figure S12 and Table S34), indicating that acidity is the diversifying trait (see Figure S12).CmOr gene was the causal gene for difference of orange and non-orange colors (Tzuri et al., 2015), and a single SNP (G_green flesh->A_orange flesh) is responsible for the different flesh color pigmentation. The allele in Payzawat matched the allele of orange melon (“Dulce”), whereas the allele in CM3.6.1 matched the allele of non-orange melon (“Tam Dew”). Sub-species agrestis (Group one and Group two) accessions matched the allele of non-orange melon, whereas sub-species melo (Group three) accessions matched the allele of orange melon (see Table S34 and Figure S12), indicating that the flesh color is the diversifying, or selection, trait.ETHQB6.3 controls fruit ripening type (climacteric and non-climacteric), and MELO3C016540 was recognized as the causal gene (Ríos et al., 2017). We detected one SNP on EVM0015173, homologous to MELO3C016540. CmKFB (EVM0012228) is implicated in the accumulation of naringenin chalcone, which determined the yellow color of rind (Feder et al., 2015).Candidate genes EVM0015625 and EVM0019658 were on the QTL SUCQSC5.1, which is responsible for sugar accumulation (Argyris et al., 2017). CmTHAT1 (acetyl-CoA acetyltransferase, EVM0016460) affects fruit flavor. CmPPR1 (pentatricopeptide repeat-containing protein, EVM0014144) genes may affect carotenoid accumulation and flesh color, which may correspond to the wf locus (Galpaz et al., 2018). RGH10 (EVM0014144) confers resistance to Papaya ring-spot virus (PRSV) and RGH9 (EVM0004621) confers resistance to races 0 and 2 (Brotman et al., 2013). ACS11 (EVM0021716) is involved in sex determination (Boualem et al., 2015), and some contrasting haplotypes were detected in TC and TN groups, such as SUCQSC5.1, CmTHAT1, and RGH10; some were detected in sub-species agrestis and melo, such as RGH9, PPR1, and ETHQB6.3. These traits are diversifying and showed taxonomic specificity.The population analysis provides the basis of genetic control of many traits, e.g., flowering, fruit morphology, ripening behavior, rind characteristics, flesh color, and sugar content. The results will expand our knowledge of candidate genes controlling desirable traits.
Discussion
In summary, by combing the SMRT sequencing technology and high-throughput chromosome interaction mapping technology, we obtained a high-quality melon genome with contig N50 up to 2.8 Mb and more than 98.53% sequences were anchored to 12 chromosomes, which provided a fine genome for future melon studies. Compared with use of NGS technology, SMRT sequencing exhibited superiority in genome assembly. Compared with traditional genetic maps, Hi-C technology exhibits several advantages in chromosome construction, such as encompassing up to one million markers, being convenient when preparing materials and offering high accuracy at lower cost. Both technologies help us obtain more intact genome information, which should be helpful in understanding the genetic changes in long-term natural evolution and artificial domestication of melon.A high-quality genome assembly with higher contiguity and completeness shows its values in studying evolution, domestication, and breeding, as well as gaining insights into genome-wide structural variants (Sun et al., 2018, Zhang et al., 2019, Xie et al., 2019). The whole-genome sequence comparison between Payzawat and DHL92 enables the identification of the large inversions, e.g., five inversions on the tail of chromosome 6, and translocations, e.g., translocation on chromosome 5. These differences between two genomes may be line-specific or mis-assembly, and further validation of assembly quality is necessary. The comparative genomic analysis between Payzawat and DHL92 indicates that chromosome regions (e.g., Chr3, Chr4, Chr7, Chr8, Chr9, and Chr12) enriched with small variations (SNP and InDels) tend to colocalize with QTLs. The QTL STHQF12.4 (skin thickness) was located on chr12 (23,803,761–26,907,220), where small variations were enriched. Six genes annotated with pectinesterase, gibberellin 20 oxidase 1-B-like, and pectate lyase-like, and these six genes have variations in resequencing data. The homologs of pectinesterase, gibberellin 20 oxidase 1-B-like, and pectate lyase-like in plants are involved in cell wall biosynthesis (Micheli, 2001, Alqsous et al., 2004, Yang et al., 2017, Jeon et al., 2016, Bai et al., 2014). The correlation between expression profiles of six genes and epidermis thickness must be tested in more samples. There is a shortage in the number of samples in our present analysis, and whether the six genes have variants in samples should be checked. Owing to the unavailable data (transcriptome data and resequencing data for each sample), the conclusion is less strong at present. More samples should be included in future to test the correlation among expression profiles, structural variants, and skin thickness. The identification of variants (large structural variants and small variants) will allow the exploration of genotypes and phenotypes. However, the mechanism controlling the skin thickness in melon remains to be explored.Previously, evidence backing up multiple domestications in melon was mostly derived from short DNA sequences (Endl et al., 2018), and whole-genome level evidence is more persuasive. We here present evidence supporting multiple domestications in melon from the whole-genome perspective. Because most samples in our study were collected from China, more samples from wider geographical locations are needed to strengthen the evidence. The higher diversification in northern and southern melons (TC group) compared with the western melons (TN group) supported with LD, π, and Tajima's D explains the diverse shape, color, etc. in the TC group. The haplotype analysis indicates that some traits in modern melons, like fruit size through evaluating the haplotype of causative SNP in CmACS7, are domesticated from wild melon and some traits are lineage specific. The domestication and diversification processes contribute to the breeding through providing the molecular foundation.
Limitations of the Study
We reported a high-quality assembly of melon. We identified six genes potentially related to skin thickness. More samples are required to test the correlation among the variation, expression profile of six genes, and skin thickness. Furthermore, the samples for population analysis are inadequate and samples from wider geographical locations are needed.
Methods
All methods can be found in the accompanying Transparent Methods supplemental file.
Authors: Derek M Bickhart; Benjamin D Rosen; Sergey Koren; Brian L Sayre; Alex R Hastie; Saki Chan; Joyce Lee; Ernest T Lam; Ivan Liachko; Shawn T Sullivan; Joshua N Burton; Heather J Huson; John C Nystrom; Christy M Kelley; Jana L Hutchison; Yang Zhou; Jiajie Sun; Alessandra Crisà; F Abel Ponce de León; John C Schwartz; John A Hammond; Geoffrey C Waldbieser; Steven G Schroeder; George E Liu; Maitreya J Dunham; Jay Shendure; Tad S Sonstegard; Adam M Phillippy; Curtis P Van Tassell; Timothy P L Smith Journal: Nat Genet Date: 2017-03-06 Impact factor: 38.330
Authors: Silong Sun; Yingsi Zhou; Jian Chen; Junpeng Shi; Haiming Zhao; Hainan Zhao; Weibin Song; Mei Zhang; Yang Cui; Xiaomei Dong; Han Liu; Xuxu Ma; Yinping Jiao; Bo Wang; Xuehong Wei; Joshua C Stein; Jeff C Glaubitz; Fei Lu; Guoliang Yu; Chengzhi Liang; Kevin Fengler; Bailin Li; Antoni Rafalski; Patrick S Schnable; Doreen H Ware; Edward S Buckler; Jinsheng Lai Journal: Nat Genet Date: 2018-07-30 Impact factor: 38.330
Authors: Josef Endl; Enoch G Achigan-Dako; Arun K Pandey; Antonio J Monforte; Belén Pico; Hanno Schaefer Journal: Am J Bot Date: 2018-10-09 Impact factor: 3.844
Authors: Aurora Díaz; Ana Montserrat Martín-Hernández; Ramón Dolcet-Sanjuan; Ana Garcés-Claver; José María Álvarez; Jordi Garcia-Mas; Belén Picó; Antonio José Monforte Journal: Theor Appl Genet Date: 2017-06-05 Impact factor: 5.699
Authors: Sergey Koren; Brian P Walenz; Konstantin Berlin; Jason R Miller; Nicholas H Bergman; Adam M Phillippy Journal: Genome Res Date: 2017-03-15 Impact factor: 9.043
Authors: Elad Oren; Galil Tzuri; Asaf Dafna; Evan R Rees; Baoxing Song; Shiri Freilich; Yonatan Elkind; Tal Isaacson; Arthur A Schaffer; Yaakov Tadmor; Joseph Burger; Edward S Buckler; Amit Gur Journal: Hortic Res Date: 2022-01-19 Impact factor: 6.793