Yu Chen1,2, Yongbai Zhang1,2, Hongjie Wang1,2, Juan Sun1,2, Lichao Ma1,2, Fuhong Miao1,2, Zixin Zhang2, Yang Cheng3, Jianwei Huang4, Guofeng Yang1,2, Zengyu Wang1,2.
Abstract
Sweet sorghum (Sorghum dochna) is a high-quality bioenergy crop that also serves as food for humans and feed for animals. However, little is known about the genomic characteristics of S. dochna. In this study, we present a high-quality genome assembly of S. dochna built from PacBio long reads, Illumina short reads, and high-throughput chromosome conformation capture (Hi-C) sequencing data, together with gene annotation and a comparative genomic analysis. The assembled genome is approximately 778 Mb, with a contig N50 of 55.35 Mb and a scaffold N50 of 72.77 Mb. Gene annotation predicted 37,971 genes and 39,937 transcripts in the S. dochna genome. A Venn analysis integrating five annotation databases identified 7,988 genes annotated in all five. A CAFE analysis showed that 191 gene families were significantly expanded and 3,794 were significantly contracted in S. dochna. A GO enrichment analysis showed that the expanded gene families were enriched mainly in terms such as metabolic process, DNA reconstruction, and DNA binding. The high-quality genome assembly constructed in this study provides a basis for future analysis of the biological characteristics of S. dochna, which is crucial for its breeding.
Keywords: Hi-C; Sorghum dochna; assembly; comparative genome analysis; genome
Year: 2022 PMID: 36035157 PMCID: PMC9412107 DOI: 10.3389/fgene.2022.844385
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
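The abstract does not say which statistical test underlies the GO enrichment of the expanded gene families. A common choice is a one-sided hypergeometric (Fisher's exact) test for each GO term; the sketch below assumes that test, and the function and argument names are hypothetical. A real analysis would also correct for multiple testing across terms, which is omitted here.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    # Sum the probability of drawing exactly i term-annotated genes, for i = k..upper.
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def go_term_pvalue(study: set[str], background: set[str], term_genes: set[str]) -> float:
    """One-sided over-representation p-value of one GO term in a study gene set.

    study      -- genes from expanded families (hypothetical input)
    background -- all GO-annotated genes in the genome
    term_genes -- background genes annotated with the GO term
    """
    # Restrict everything to the annotated background before counting.
    study = study & background
    term_genes = term_genes & background
    k = len(study & term_genes)
    return hypergeom_sf(k, len(background), len(term_genes), len(study))
```

For example, if all 4 study genes carry a term that only 5 of 20 background genes have, `go_term_pvalue` returns 5/4845 (about 0.001).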
FIGURE 1 Morphological characteristics of Sorghum dochna, shown in photographs of a whole plant, a leaf, and the root.
Genome assembly and Hi-C results.
| Parameter | Contig | Scaffold |
|---|---|---|
| Total number | 144 | 82 |
| Total length (bp) | 777,990,620 | 778,026,804 |
| N50 length (bp) | 55,347,497 | 43,657,906 |
| GC (%) | 43.90 | 43.90 |
| Contig N90 length (bp) | 11,660,912 | — |
| Scaffold N50 length (bp) | — | 72,771,365 |
| Chromosome length (%) | — | 94.09 |
Hi-C, high-throughput chromosome conformation capture.
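The N50 and N90 values in the table follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the assembly length. A minimal Python sketch of that definition follows; `nx` is a hypothetical helper name, not a tool used by the authors.

```python
def nx(lengths: list[int], x: float = 50.0) -> int:
    """Nx of an assembly: the length L such that sequences of length >= L
    together cover at least x% of the total length (x = 50 gives N50)."""
    if not lengths or not 0 < x <= 100:
        raise ValueError("need a non-empty length list and 0 < x <= 100")
    threshold = sum(lengths) * x / 100
    running = 0
    # Walk from the longest sequence down until the threshold is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # fallback; the loop normally returns first
```

Because the sequences at or below the N50 length hold more than half of the assembly, the N50 always exceeds total length / (2 × number of sequences). For the 144 contigs totalling 777,990,620 bp listed in the table, that lower bound is about 2.7 Mb.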
Alignment statistics of the DNA library reads.
| Sample name | Number of reads | Mapped | Properly paired | Mapped to different chromosomes | Mapped to different chromosomes (MapQ ≥ 5) | Secondary reads |
|---|---|---|---|---|---|---|
|  | 294,034,737 | 292,796,203 | 279,197,414 | 9,547,568 | 4,598,204 | 1,848,927 |
|  |  | 99.58% | 95.55% | 3.2% | 1.6% | 0.6% |
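The percentages in the alignment table are read counts divided by the total number of reads. The short Python check below recomputes the mapped, different-chromosome, and secondary-read percentages from the counts in the table; `percent` and the dictionary labels are hypothetical names chosen for illustration.

```python
TOTAL_READS = 294_034_737

# Read counts copied from the alignment table.
COUNTS = {
    "Mapped": 292_796_203,
    "Mapped to different chromosomes": 9_547_568,
    "Mapped to different chromosomes (MapQ >= 5)": 4_598_204,
    "Secondary reads": 1_848_927,
}


def percent(part: int, whole: int = TOTAL_READS, digits: int = 2) -> float:
    """Share of `whole` represented by `part`, in percent, rounded to `digits` places."""
    return round(100 * part / whole, digits)


for label, count in COUNTS.items():
    print(f"{label}: {percent(count)}%")
```

With the default two decimal places the mapped rate comes out at 99.58%; with `digits=1` the last three categories round to the 3.2%, 1.6%, and 0.6% reported in the table.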
FIGURE 2 Genome-wide interaction heatmap of the Hi-C-assisted assembly. Interactions within chromosomes are stronger than those between chromosomes, and within a chromosome, interactions between physically close regions are stronger than those between distant regions. Hi-C, high-throughput chromosome conformation capture.
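The two patterns described in the Figure 2 caption can be quantified from aligned Hi-C read pairs: the share of contacts that stay within one chromosome, and how intra-chromosomal contacts fall off with genomic distance. The sketch below assumes read pairs have already been reduced to (chromosome, position) tuples; `Contact`, `summarize_contacts`, and the 1 Mb bin size are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Contact(NamedTuple):
    """One aligned Hi-C read pair (hypothetical, already-parsed representation)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def summarize_contacts(pairs: Iterable[Contact], bin_size: int = 1_000_000):
    """Count inter- vs intra-chromosomal contacts and bin the intra-chromosomal
    ones by genomic distance (distance // bin_size)."""
    inter = 0
    by_distance = Counter()
    for c in pairs:
        if c.chrom1 != c.chrom2:
            inter += 1  # the two ends fall on different chromosomes
        else:
            by_distance[abs(c.pos1 - c.pos2) // bin_size] += 1
    intra = sum(by_distance.values())
    return intra, inter, by_distance
```

In terms of this sketch, the caption's observations correspond to `intra` exceeding `inter` and the counts in `by_distance` declining as the distance bin grows.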
FIGURE 3 Venn analysis of gene functional annotation.
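The central region of a Venn diagram like the one in Figure 3 is the set of genes annotated in every database, which in set terms is a plain intersection. The Python sketch below assumes each database's results have been reduced to a set of gene IDs; the database keys and gene IDs are placeholders, since the caption does not name the five databases.

```python
def venn_core_and_union(annotations: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (genes annotated in every database, genes annotated in at least one)."""
    if not annotations:
        return set(), set()
    sets = list(annotations.values())
    core = set.intersection(*sets)  # centre of the Venn diagram
    union = set().union(*sets)      # all annotated genes
    return core, union
```

For example, with five placeholder databases that all contain `g1` and `g2` but differ otherwise, the core is `{"g1", "g2"}`.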
FIGURE 4 (A) Circos plot of the main features of the assembled Sorghum dochna genome. Tracks from outside to inside: (A) chromosomes, (B) repeat sequence distribution, (C) gene distribution, (D) GC content distribution, and (E) collinearity between S. dochna and S. bicolor. (B) Venn diagram of protein families. tgl: S. dochna (S. bicolor dochna), sbi: S. bicolor (S. bicolor bicolor), osa: rice (Oryza sativa), zma: maize (Zea mays), ssp: sugarcane (Saccharum spontaneum), and bol: kale (Brassica oleracea).
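The GC content track in the Figure 4 Circos plot shows GC content along each chromosome. Such a track is typically computed in fixed windows; the sketch below assumes non-overlapping 100 kb windows and excludes ambiguous bases (such as N) from the denominator. The window size and the helper name `gc_windows` are assumptions, since the caption does not state how the track was computed.

```python
def gc_windows(seq: str, window: int = 100_000, step: int = 100_000) -> list[tuple[int, float]]:
    """Return (window_start, GC fraction) for consecutive windows along a sequence.

    Only A/C/G/T count toward the denominator; a window with no called bases
    gets NaN.
    """
    seq = seq.upper()
    out = []
    for start in range(0, len(seq), step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        called = gc + at
        out.append((start, gc / called if called else float("nan")))
    return out
```

Setting `step` smaller than `window` gives overlapping (sliding) windows instead of adjacent ones.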
FIGURE 5 Phylogenetic tree of the species. Before divergence-time estimation, branch lengths represent nucleotide substitution rates; after divergence-time estimation, branch lengths represent time in millions of years. O. sativa: Oryza sativa. Z. mays: Zea mays. S. bicolor1: Sorghum dochna. S. bicolor: Sorghum bicolor. S. spontaneum: Saccharum spontaneum. B. oleracea: Brassica oleracea.
FIGURE 6 (A) Ks distribution of Oryza sativa; the peaks indicate whole-genome duplication (WGD) events during the species' evolution. (B) Ks distributions reflecting whole-genome duplication in Sorghum dochna and related species. Tgl: S. dochna. WGD, whole-genome duplication.
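A Ks peak is commonly converted to an approximate age with T = Ks / (2r), where r is the synonymous substitution rate per site per year. The caption does not state which rate was used; the sketch below assumes r = 6.5 × 10⁻⁹, a value often used for grasses, and takes the peak as the mode of a simple histogram rather than a fitted mixture model. Both choices, and the function names, are illustrative assumptions.

```python
from collections import Counter


def ks_peak(ks_values, bin_width: float = 0.02, max_ks: float = 3.0) -> float:
    """Centre of the most populated histogram bin for Ks in (0, max_ks].

    Values outside that range (e.g. saturated Ks) are ignored.
    """
    counts = Counter(int(ks // bin_width) for ks in ks_values if 0 < ks <= max_ks)
    if not counts:
        raise ValueError("no Ks values in range")
    best_bin, _ = counts.most_common(1)[0]
    return (best_bin + 0.5) * bin_width


def ks_to_mya(ks: float, rate: float = 6.5e-9) -> float:
    """Approximate age in million years via T = Ks / (2 * rate).

    The default rate is an assumption, not taken from the source.
    """
    return ks / (2 * rate) / 1e6
```

Under these assumptions, a peak at Ks = 0.13 corresponds to roughly 10 million years.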