Javier Montero-Pau¹, José Blanca¹, Aureliano Bombarely², Peio Ziarsolo¹, Cristina Esteras¹, Carlos Martí-Gómez¹, María Ferriol³, Pedro Gómez⁴, Manuel Jamilena⁵, Lukas Mueller⁶, Belén Picó¹, Joaquín Cañizares¹.
Abstract
The Cucurbita genus (squashes, pumpkins and gourds) includes important domesticated species such as C. pepo, C. maxima and C. moschata. In this study, we present a high-quality draft of the zucchini (C. pepo) genome. The assembly has a size of 263 Mb, a scaffold N50 of 1.8 Mb and 34 240 gene models. It includes 92% of the conserved BUSCO core gene set and is estimated to cover 93.0% of the genome. The genome is organized into 20 pseudomolecules that represent 81.4% of the assembly, and it is integrated with a genetic map of 7718 SNPs. Despite the small genome size, three independent lines of evidence support the conclusion that the C. pepo genome is the result of a whole-genome duplication: the topology of the gene family phylogenies, the karyotype organization and the distribution of 4DTv distances. Additionally, 40 transcriptomes from 12 species of the genus were assembled and analysed together with all other published genomes of the Cucurbitaceae family. The duplication was detected in all the Cucurbita species analysed, including C. maxima and C. moschata, but not in the more distant cucurbits belonging to the Cucumis and Citrullus genera; it most likely occurred 30 ± 4 Mya in the ancestral species that gave rise to the genus.
Keywords: Cucurbitaceae; crop; genome; transcriptome; whole-genome duplication; zucchini
Year: 2017 PMID: 29112324 PMCID: PMC5978595 DOI: 10.1111/pbi.12860
Source DB: PubMed Journal: Plant Biotechnol J ISSN: 1467-7644 Impact factor: 9.803
Assembly statistics of C. pepo genome version 4.1
| Parameter | Value |
|---|---|
| GC content (%) | 36.52 |
| No. of contigs (≥0 bp) | 32 754 |
| No. of contigs (≥500 bp) | 13 896 |
| No. of contigs (≥1000 bp) | 8217 |
| Bases in contigs (≥0 bp) | 247 816 249 |
| Bases in contigs (≥1000 bp) | 238 245 128 |
| Largest contig (bp) | 639 487 |
| N50 contig size (bp) | 110 136 |
| N75 contig size (bp) | 49 377 |
| L50 contig number | 606 |
| L75 contig number | 1407 |
| No. of scaffolds (≥0 bp) | 26 025 |
| No. of scaffolds (≥500 bp) | 7994 |
| No. of scaffolds (≥1000 bp) | 3709 |
| Bases in scaffolds (≥0 bp) | 263 500 453 |
| Bases in scaffolds (≥500 bp) | 258 108 973 |
| Bases in scaffolds (≥1000 bp) | 255 237 628 |
| Largest scaffold (bp) | 6 123 784 |
| N50 scaffold size (bp) | 1 749 822 |
| N75 scaffold size (bp) | 453 344 |
| L50 scaffold number | 42 |
| L75 scaffold number | 112 |
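The N50/L50 and N75/L75 rows above follow the usual definitions: sort the sequences from longest to shortest and walk down the list until the running total reaches 50% (or 75%) of all bases; Nx is the length of the sequence at which that happens, and Lx is how many sequences were needed to get there. The Python sketch below implements that definition. The function name `nx_lx` is hypothetical, and the table does not state which minimum length cutoff, if any, was applied before these values were computed.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover `fraction` of all bases;
    Lx is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)  # longest first
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable for 0 < fraction <= 1")


print(nx_lx([100, 80, 60, 40, 20]))        # N50, L50
print(nx_lx([100, 80, 60, 40, 20], 0.75))  # N75, L75
```

For lengths of 100, 80, 60, 40 and 20 bp (300 bp in total), the two longest sequences already reach 150 bp, so N50 = 80 and L50 = 2; reaching 225 bp takes three sequences, so N75 = 60 and L75 = 3.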
Pseudochromosome summary: number of scaffolds anchored to each pseudochromosome, total length, length without the 1000-bp N spacers that join adjacent scaffolds, and number of genes
| Molecule | #Scaffolds | Length (bp) | Length without N spacers (bp) | Number of genes |
|---|---|---|---|---|
| Cp4.1LG01 | 19 | 21 320 769 | 21 302 769 | 3258 |
| Cp4.1LG02 | 16 | 14 376 414 | 14 361 414 | 1981 |
| Cp4.1LG03 | 12 | 13 772 414 | 13 761 414 | 2178 |
| Cp4.1LG04 | 5 | 12 709 140 | 12 705 140 | 1870 |
| Cp4.1LG05 | 8 | 10 865 678 | 10 858 678 | 1753 |
| Cp4.1LG06 | 11 | 10 677 745 | 10 667 745 | 1407 |
| Cp4.1LG07 | 14 | 10 147 556 | 10 134 556 | 1404 |
| Cp4.1LG08 | 4 | 10 059 303 | 10 056 303 | 1503 |
| Cp4.1LG09 | 10 | 9 920 322 | 9 911 322 | 1608 |
| Cp4.1LG10 | 8 | 9 835 092 | 9 828 092 | 1432 |
| Cp4.1LG11 | 11 | 9 833 969 | 9 823 969 | 1319 |
| Cp4.1LG12 | 5 | 9 824 194 | 9 820 194 | 1388 |
| Cp4.1LG13 | 8 | 9 354 089 | 9 347 089 | 1400 |
| Cp4.1LG14 | 5 | 8 955 933 | 8 951 933 | 1263 |
| Cp4.1LG15 | 4 | 8 816 444 | 8 813 444 | 1136 |
| Cp4.1LG16 | 10 | 8 691 934 | 8 682 934 | 1114 |
| Cp4.1LG17 | 9 | 8 680 504 | 8 672 504 | 1281 |
| Cp4.1LG18 | 7 | 8 333 454 | 8 327 454 | 1186 |
| Cp4.1LG19 | 8 | 8 246 682 | 8 239 682 | 1386 |
| Cp4.1LG20 | 7 | 8 120 804 | 8 114 804 | 1000 |
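Every row of the pseudochromosome table is consistent with one 1000-bp run of N between each pair of adjacent scaffolds, so a pseudomolecule built from n scaffolds carries n − 1 spacers. That reading is inferred from the numbers rather than stated in the caption. The sketch below checks it for three rows copied from the table; `SPACER_LEN` and `expected_ungapped` are hypothetical names introduced for illustration.

```python
SPACER_LEN = 1000  # assumed: one run of 1000 N between adjacent scaffolds

# (molecule, scaffolds, length with spacers, length without spacers),
# copied from the pseudochromosome summary table
ROWS = [
    ("Cp4.1LG01", 19, 21_320_769, 21_302_769),
    ("Cp4.1LG04", 5, 12_709_140, 12_705_140),
    ("Cp4.1LG20", 7, 8_120_804, 8_114_804),
]


def expected_ungapped(total_len, n_scaffolds, spacer=SPACER_LEN):
    """Length left after removing the n - 1 spacers that join n scaffolds."""
    return total_len - (n_scaffolds - 1) * spacer


for name, n, total, ungapped in ROWS:
    assert expected_ungapped(total, n) == ungapped, name
    print(f"{name}: {n - 1} spacers of {SPACER_LEN} bp, consistent")
```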
Genome annotation summary
| Parameter | Value |
|---|---|
| Number of genes | 34 240 |
| Number of protein‐coding genes | 27 870 |
| Number of exons | 184 243 |
| Number of CDSs | 166 271 |
| Number of introns | 150 003 |
| Number of 5′ UTRs | 21 701 |
| Number of 3′ UTRs | 22 296 |
| Number of tRNAs | 6370 |
| % of genes with introns | 70.8 |
| Mean number of exons per gene | 5.4 |
| Mean gene length (bp) | 34503.4 |
| Mean exon length (bp) | 274.9 |
| Mean intron length (bp) | 450.0 |
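Several of the annotation counts can be cross-checked with simple arithmetic. A gene model with k exons has k − 1 introns, so if each gene is counted once with a single representative transcript, the totals should satisfy introns = exons − genes; the reported figures satisfy this exactly. The gene total also equals the protein-coding genes plus the tRNA genes, and exons divided by genes reproduces the reported mean of 5.4 exons per gene. The sketch below runs these checks; the single-transcript assumption is ours, not stated in the table.

```python
# Counts copied from the genome annotation summary table.
genes = 34_240
protein_coding = 27_870
trnas = 6_370
exons = 184_243
introns = 150_003

# A model with k exons has k - 1 introns; summing over all models gives
# introns = exons - genes (assumes one representative transcript per gene).
assert exons - genes == introns

# The gene total equals protein-coding genes plus tRNA genes.
assert protein_coding + trnas == genes

# Mean exons per gene, rounded as in the table.
print(f"mean exons per gene: {exons / genes:.1f}")  # prints 5.4
```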
Figure 1. Genome organization. (a) Circos plot showing paralogous gene pairs in Cucurbita pepo (red lines). The outer tracks show the proportion of repetitive (blue) or gene-encoding (green) DNA in 200-kb windows. (b) Genomic synteny between Cucurbita pepo and Cucumis melo, Cucumis sativus and Citrullus lanatus; lines join single-copy orthologs. (c) Summary diagram of the species phylogeny showing the WGD event.
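The outer tracks of a plot like Figure 1a summarize an annotation as the fraction of each fixed-size window covered by a feature class. The caption does not describe how those fractions were computed, so the Python sketch below is only a generic version of the idea: it merges overlapping features so that shared bases are not counted twice, then sums the covered bases in each window. The function name `window_coverage` and the half-open (start, end) interval input, such as repeat coordinates from a repeat annotation, are assumptions.

```python
def window_coverage(intervals, chrom_len, window=200_000):
    """Fraction of each window covered by the union of half-open intervals."""
    # Merge overlapping or touching intervals so bases are counted once.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    n_windows = (chrom_len + window - 1) // window  # last window may be short
    covered = [0] * n_windows
    for start, end in merged:
        w = start // window
        # Spread each merged interval over every window it overlaps.
        while w < n_windows and w * window < end:
            w_start = w * window
            w_end = min(w_start + window, chrom_len)
            covered[w] += max(0, min(end, w_end) - max(start, w_start))
            w += 1

    # Divide by the real window length (the final window can be shorter).
    return [bases / min(window, chrom_len - w * window)
            for w, bases in enumerate(covered)]


print(window_coverage([(0, 5), (3, 8), (12, 22)], chrom_len=25, window=10))
```

With 10-bp windows on a 25-bp sequence, the merged features (0–8 and 12–22) cover 8 of 10 bases in the first two windows and 2 of the 5 bases in the short final window.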
Figure 2. Genome duplication. (a) Distribution of the number of gene families by number of gene copies for Cucurbita pepo, Cucumis melo, Cucumis sativus and Citrullus lanatus; (b) Venn diagram showing the number of gene families and (c) the number of duplicated gene families shared among the cucurbit genomes; (d) distribution of the rate of transversions at fourfold degenerate synonymous sites (4DTv) among paralogs for the five studied genomes. The inset shows boxplots of the 4DTv distribution between orthologous copies of C. pepo and the other cucurbit species. The red dashed line marks the duplication event in C. pepo.
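The 4DTv statistic in panel (d) is the proportion of fourfold degenerate third-codon positions at which two aligned coding sequences differ by a transversion (purine to pyrimidine or vice versa). The Python sketch below computes the uncorrected proportion for one pair of in-frame, gap-free aligned CDSs. It is not the authors' pipeline: published analyses often apply a substitution-model correction (for example HKY) to this raw value, and the helper names here are hypothetical. A site is counted only when both codons share one of the eight two-base prefixes whose third position is fourfold degenerate in the standard genetic code.

```python
# Two-base codon prefixes whose third position is fourfold degenerate in the
# standard genetic code: Ala, Gly, Pro, Thr, Val, Ser, Arg, Leu.
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "TC", "CG", "CT"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(a, b):
    """True if one base is a purine and the other a pyrimidine."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def four_dtv(seq1, seq2):
    """Uncorrected 4DTv for two in-frame, aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    seq1, seq2 = seq1.upper(), seq2.upper()
    sites = transversions = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Count the third position only if both codons share a fourfold
        # degenerate prefix (same amino acid family in both sequences).
        if c1[:2] == c2[:2] and c1[:2] in FOURFOLD_PREFIXES:
            b1, b2 = c1[2], c2[2]
            if b1 in "ACGT" and b2 in "ACGT":
                sites += 1
                transversions += is_transversion(b1, b2)
    if sites == 0:
        raise ValueError("no shared fourfold degenerate sites")
    return transversions / sites


print(four_dtv("GCAGGTCCCATG", "GCCGGCCCCATG"))
```

In the example, GCA/GCC (Ala) carries an A to C transversion, GGT/GGC (Gly) only a transition, CCC/CCC (Pro) no change, and ATG (Met) is not fourfold degenerate, so one transversion over three sites gives 1/3.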
Figure 3. Phylogeny of the Cucurbita genus based on a concatenated method (left tree) or a joint estimation of gene and species trees (right tree). Left tree: branch lengths represent genetic distance, and only bootstrap values lower than 100 are shown. Right tree: branch lengths represent the proportion of duplicated genes per branch; only proportions higher than 0.1 are shown.