Jinping Wang, Shoule Tian, Xiaoli Sun, Xinchao Cheng, Naibin Duan, Jihan Tao, Guangning Shen.
Abstract
The Chinese chestnut (Castanea mollissima Bl.) is a woody nut crop with high ecological value. Although many cultivars have been selected from natural seedlings, elite lines with comprehensive agronomic traits remain rare. Exploring genetic resources with the aid of a whole-genome sequence will play an important role in modern chestnut breeding programs. In this study, we generated a high-quality C. mollissima genome assembly by combining 90× Pacific Biosciences long-read data and 170× high-throughput chromosome conformation capture (Hi-C) data. The assembly was 688.93 Mb in total, with a contig N50 of 2.83 Mb. Most of the assembled sequences (99.75%) were anchored onto 12 chromosomes, and 97.07% of the assemblies were accurately anchored and oriented. A total of 33,638 protein-coding genes were predicted in the C. mollissima genome. Comparative genomic and transcriptomic analyses provided insights into the genes expressed in specific tissues, as well as those associated with burr development in the Chinese chestnut. This highly contiguous assembly of the C. mollissima genome provides a valuable resource for studies aiming to identify and characterize agronomically important traits, and will aid the design of breeding strategies to develop more focused, faster, and more predictable improvement programs.
Keywords: Castanea mollissima Blume; Chinese chestnut; genome assembly; high-throughput chromosome conformation capture; single molecular sequencing
Year: 2020 PMID: 32847817 PMCID: PMC7534444 DOI: 10.1534/g3.120.401532
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Tree, flowers and nuts of Castanea mollissima.
Properties of the Castanea mollissima assembly
| Contig number | Total contig length (bp) | Contig N50 (bp) | Contig N90 (bp) | Longest contig (bp) | GC content (%) |
|---|---|---|---|---|---|
| | | 2,828,629 | 455,877 | 19,325,860 | 35.11 |
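The contig N50 and N90 values in this table follow the usual definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches 50% (or 90%) of the assembly. The sketch below illustrates that definition only; the function name `nx` and the toy contig lengths are hypothetical and are not taken from the study.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic for a list of contig lengths.

    Nx is the length L such that contigs of length >= L together
    cover at least `fraction` of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Hypothetical toy assembly (not data from the paper).
toy_contigs = [10, 8, 6, 4, 2]
toy_n50 = nx(toy_contigs, 0.5)  # total 30, half is 15, reached at the 8-long contig
toy_n90 = nx(toy_contigs, 0.9)  # 90% is 27, reached at the 4-long contig
print(toy_n50, toy_n90)
```

Applied to the real contig lengths of an assembly, the same function would be expected to reproduce the N50 and N90 columns, provided the assembler used this definition.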
Pseudochromosomes in the Castanea mollissima genome
| Group | Number of contigs | Sequence length (bp) |
|---|---|---|
| Lachesis Group 1 | 52 | 90,647,674 |
| Lachesis Group 2 | 84 | 74,427,361 |
| Lachesis Group 3 | 97 | 70,533,118 |
| Lachesis Group 4 | 72 | 62,883,648 |
| Lachesis Group 5 | 60 | 60,035,541 |
| Lachesis Group 6 | 38 | 53,752,763 |
| Lachesis Group 7 | 55 | 51,743,333 |
| Lachesis Group 8 | 35 | 50,879,406 |
| Lachesis Group 9 | 41 | 45,391,892 |
| Lachesis Group 10 | 51 | 46,531,048 |
| Lachesis Group 11 | 38 | 44,129,845 |
| Lachesis Group 12 | 29 | 36,323,263 |
| Total Sequences Clustered | | |
| Total Sequences Ordered and Oriented | | |
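The total rows of this table are empty in this copy, but the per-group rows can be summed directly. The sketch below does that and compares the result with the 688.93 Mb assembly size quoted in the abstract. Because that figure is rounded, the resulting percentage is only approximate and is a derived value, not one reported by the authors; the variable names are illustrative.

```python
# (contig count, sequence length in bp) for Lachesis Groups 1-12,
# copied from the rows of the table above.
groups = [
    (52, 90_647_674), (84, 74_427_361), (97, 70_533_118),
    (72, 62_883_648), (60, 60_035_541), (38, 53_752_763),
    (55, 51_743_333), (35, 50_879_406), (41, 45_391_892),
    (51, 46_531_048), (38, 44_129_845), (29, 36_323_263),
]

total_contigs = sum(n for n, _ in groups)
total_anchored_bp = sum(bp for _, bp in groups)

# Assumption: the rounded assembly size from the abstract (688.93 Mb).
ASSEMBLY_BP = 688.93e6
anchored_fraction = total_anchored_bp / ASSEMBLY_BP

print(f"contigs in groups: {total_contigs}")
print(f"anchored length:   {total_anchored_bp:,} bp")
print(f"approx. anchored:  {anchored_fraction:.2%}")
```

The computed share lands close to the 99.75% stated in the abstract; the small difference is expected given the rounded denominator.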
Figure 2. Distribution of the interaction frequencies among chromosomes. The distribution is based on the 170× high-throughput chromosome conformation capture (Hi-C) links.
Comparison of assembly quality in three genomes of C. mollissima
| Parameter | I | II | III |
|---|---|---|---|
| Total sequence length (Mb) | |||
| Total contig No. | |||
| Contig N50 (kb) | 2,828.6 | 944.5 | 96.7 |
| Contig N90 (kb) | 455.88 | 133.7 | |
| Longest contig length (Mb) | 19.3 | 6.58 | |
| Total scaffold No. | |||
| Total scaffold length (Mb) | 689.98 | ||
| Scaffold N50 (kb) | 57,343.43 | ||
| Scaffold N90 (kb) | 4,301.26 | ||
| Longest scaffold length (Mb) | 90.2 | ||
| Anchored onto chromosome (Mb/%) | |||
| Anchored with order and orientation (Mb/%) | |||
Footnotes: I: Present assembly; II: Assembly published by Xing et al.; III: Assembly submitted by Clemson University Genomics Institute. Assembly II has no scaffold or chromosome-level information, so the detailed comparisons were made mainly at the contig level.
Gene families in the genomes of Castanea mollissima and eight other plant species
| Species | Total genes | Genes clustered | Gene families | Unique gene families |
|---|---|---|---|---|
| | 29,827 | 21,278 | 14,176 | 707 |
| | 36,263 | 31,861 | 14,833 | 738 |
| | 39,027 | 31,534 | 13,407 | 1,421 |
| | 38,852 | 25,505 | 12,626 | 2,016 |
| | 30,612 | 28,991 | 16,042 | 88 |
| | 27,369 | 23,120 | 12,960 | 743 |
| | 25,808 | 20,813 | 12,245 | 469 |
| | 41,335 | 32,920 | 17,194 | 617 |
| | | 30,192 | 15,367 | 508 |
Figure 3. Species phylogenetic tree and gene family evolution. Numbers on the branches indicate counts of gene families under either expansion (red) or contraction (blue). The tree was built with 1,000 bootstrap replicates. The bottom scale bar shows divergence time in million years ago (Mya).
Figure 4. Genome synteny between the Chinese chestnut and the European oak. European oak chromosomes are labeled “Qrob_chr”; Chinese chestnut chromosomes are labeled “LG”.
Figure 5. Tissue-specific gene analysis. (A) Venn diagram showing unique and shared genes among 5 tissues. Numbers represent the number of genes that were unique or shared. (B–D) KEGG enrichment of tissue-specific genes in leaf, flower and fruit, respectively. The node size represents the number of genes enriched in each KEGG pathway. The color bar indicates the P-value, from red (low) to blue (high).