Chaoyang Liu, Chao Feng, Weizhuo Peng, Jingjing Hao, Juntao Wang, Jianjun Pan, Yehua He.
Abstract
BACKGROUND: Plums are one of the most economically important Rosaceae fruit crops and comprise dozens of species distributed across the world. Until now, only limited genomic information has been available for the genetic studies and breeding programs of plums. Prunus salicina, an important diploid plum species, plays a predominant role in modern commercial plum production. Here we selected P. salicina for whole-genome sequencing and present a chromosome-level genome assembly through the combination of Pacific Biosciences sequencing, Illumina sequencing, and Hi-C technology.
Keywords: Chromosome-level; Genome; Plum; Prunus
Year: 2020 PMID: 33300949 PMCID: PMC7727024 DOI: 10.1093/gigascience/giaa130
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: The genome and photograph of P. salicina. (A) Landscape of the P. salicina genome, comprising 8 pseudochromosomes that cover ∼96.56% of the assembly. Concentric circles, from outermost to innermost, show (B) TE percentage (red), (C) gene density (green), and (D) density of duplicates resulting from tandem duplications (blue). (E) Photograph of P. salicina.
Summary of genome assembly and annotation for P. salicina
| Parameter | Value |
|---|---|
| Scaffolds | |
| Total length (bp) | 284,209,110 |
| No. | 75 |
| N50 (bp) | 32,324,625 |
| Contigs | |
| Total length (bp) | 284,189,410 |
| No. | 272 |
| N50 (bp) | 1,777,944 |
| Mapping rate by reads from short-insert libraries (%) | 96.93 |
| CEGs (%) | |
| Assembled | 94.35 |
| Completely assembled | 92.34 |
| BUSCOs (%) | |
| Complete | 95.7 |
| Complete and single-copy | 86.5 |
| Complete and duplicated | 9.2 |
| Fragmented | 1.3 |
| Missing | 3.0 |
| RNA-Seq evaluation | 92.44–95.25 |
| TEs (%) | 48.28 |
| LTR retrotransposons (%) | 42.10 |
| No. of predicted protein-coding genes | 24,448 |
| No. (%) of genes | |
| Assigned to pseudochromosomes | 24,209 (99.0) |
| Annotated to public database | 23,931 (97.9) |
| Annotated to GO database | 13,484 (55.2) |
| Duplicated by tandem duplications | 2,384 (9.8) |
CEG: core eukaryotic gene; LTR: long terminal repeat; TE: transposable element.
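Several entries in the summary table are derived quantities, so they can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' pipeline: `n50` implements the usual definition of N50 (the length at which the longest sequences cumulatively reach half of the total assembly length), and `toy_lengths` is a hypothetical example rather than data from the paper. The gene counts and BUSCO percentages are copied from the table.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy contig lengths (NOT from the paper), total = 300.
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)

# Values copied from the summary table.
total_genes = 24_448
gene_counts = {
    "Assigned to pseudochromosomes": 24_209,
    "Annotated to public database": 23_931,
    "Annotated to GO database": 13_484,
    "Duplicated by tandem duplications": 2_384,
}
# Percentage of the predicted gene set, rounded to one decimal as in the table.
gene_percentages = {
    name: round(100 * count / total_genes, 1)
    for name, count in gene_counts.items()
}

# BUSCO categories: complete = single-copy + duplicated,
# and complete + fragmented + missing should account for all BUSCOs.
busco_complete = round(86.5 + 9.2, 1)
busco_total = round(95.7 + 1.3 + 3.0, 1)

print(toy_n50, gene_percentages, busco_complete, busco_total)
```

Under these definitions, the percentages in parentheses in the table are consistent with the 24,448 predicted genes, and the BUSCO categories sum to 100%.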
Statistics of predicted protein-coding genes
| Gene set | | No. | Mean transcript length (bp) | Mean CDS length (bp) | Mean exons per gene | Mean exon length (bp) | Mean intron length (bp) |
|---|---|---|---|---|---|---|---|
| | Augustus | 23,592 | 2,627.71 | 1,167.83 | 4.80 | 243.43 | 384.45 |
| | GlimmerHMM | 39,985 | 5,450.51 | 747.07 | 3.14 | 238.12 | 2,200.59 |
| | SNAP | 24,882 | 2,876.50 | 728.45 | 4.22 | 172.73 | 667.66 |
| | Geneid | 33,780 | 3,829.40 | 899.99 | 4.44 | 202.74 | 851.78 |
| | Genscan | 21,882 | 8,251.09 | 1,355.87 | 6.34 | 213.98 | 1,292.13 |
| Homolog prediction | | 20,265 | 3,119.83 | 1,356.17 | 4.74 | 286.35 | 472.06 |
| | | 20,010 | 2,920.17 | 1,361.30 | 4.65 | 292.56 | 426.72 |
| | | 23,064 | 3,038.66 | 1,346.19 | 4.78 | 281.67 | 447.84 |
| | | 28,915 | 2,296.51 | 1,099.56 | 4.06 | 270.55 | 390.64 |
| | | 28,284 | 2,071.73 | 973.28 | 3.67 | 265.51 | 412.07 |
| | | 22,927 | 2,994.24 | 1,380.61 | 4.59 | 300.66 | 449.24 |
| | | 22,715 | 3,077.20 | 1,351.28 | 4.74 | 284.86 | 461.03 |
| RNA-seq | PASA | 196,264 | 3,913.86 | 1,008.68 | 5.16 | 195.60 | 698.88 |
| | Transcripts | 42,450 | 11,076.28 | 2,360.92 | 6.85 | 344.83 | 1,490.64 |
| EVM | | 27,981 | 2,736.70 | 1,061.73 | 4.57 | 232.52 | 469.68 |
| PASA-update | | 27,594 | 2,784.15 | 1,092.82 | 4.64 | 235.59 | 464.83 |
| Final set | | 24,448 | 2,988.45 | 1,157.42 | 4.97 | 233.09 | 461.72 |
Includes untranslated regions. CDS: coding sequence.
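The columns of the gene-structure table are not independent, which allows a quick plausibility check. The sketch below assumes, without confirmation from the source, that mean CDS length is roughly the mean exon count times the mean exon length, and that mean transcript length is roughly the CDS length plus one intron between each pair of adjacent exons. Both relations are approximate, because a product of means is not a mean of products. The `rows` dictionary and the `relative_gaps` helper are illustrative names. The numbers are copied from three rows of the table.

```python
# Rows copied from the table, in this order:
# (mean transcript length, mean CDS length, mean exons per gene,
#  mean exon length, mean intron length); all lengths in bp.
rows = {
    "Augustus": (2627.71, 1167.83, 4.80, 243.43, 384.45),
    "EVM": (2736.70, 1061.73, 4.57, 232.52, 469.68),
    "Final set": (2988.45, 1157.42, 4.97, 233.09, 461.72),
}


def relative_gaps(transcript, cds, exons, exon_len, intron_len):
    """Compare reported means with values estimated from the other columns.

    Assumed relations (approximate, not stated in the source):
      CDS length        ~ exons * exon_len
      transcript length ~ cds + (exons - 1) * intron_len
    Returns the two relative differences as fractions.
    """
    est_cds = exons * exon_len
    est_transcript = cds + (exons - 1) * intron_len
    return (
        abs(est_cds - cds) / cds,
        abs(est_transcript - transcript) / transcript,
    )


gaps = {name: relative_gaps(*values) for name, values in rows.items()}
for name, (cds_gap, transcript_gap) in gaps.items():
    print(f"{name}: CDS gap {cds_gap:.2%}, transcript gap {transcript_gap:.2%}")
```

For these three rows the estimates differ from the reported means by well under 1%, which suggests the columns are internally consistent under the stated assumptions.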
Figure 2: Evolution of the P. salicina genome and orthogroups. (A) Phylogeny, divergence times, and orthogroup expansions/contractions for 17 rosid species. The tree was constructed by the maximum likelihood method using 341 single-copy orthogroups. All nodes have 100% bootstrap support. Divergence times were estimated on the basis of 3 calibration points (blue circles). Blue bars indicate the 95% highest posterior density (HPD) for each node. The numbers in red and green indicate the numbers of orthogroups that have expanded and contracted along particular branches, respectively. (B) Comparison of genes among the 17 rosids. The grey bars indicate the genes belonging to the 9,616 rosid-shared orthogroups in each of the 17 rosids. The grey + green bars indicate the genes belonging to the 10,447 Rosales-shared orthogroups in each of the 16 Rosales. The grey + green + pink bars indicate the genes belonging to the 11,098 Rosaceae-shared orthogroups in each of the 15 Rosaceae. The grey + green + pink + yellow bars indicate the genes belonging to the 13,963 Amygdaloideae-shared orthogroups in each of the 10 Amygdaloideae. The grey + green + pink + yellow + blue bars indicate the genes belonging to the 15,512 Prunus-shared orthogroups in each of the 7 Prunus species. The red and striped bars indicate the genes in species-specific orthogroups and unassigned genes, respectively. The white bars indicate the remaining genes for each genome.
Figure 3: Chromosome-level collinearity patterns (A) between P. salicina, P. mume, and P. armeniaca and (B) between P. salicina, P. avium, and P. dulcis. The numbers indicate the pseudochromosome order generated from the original genome sequence. Pseudochromosomes 2 and 6 in P. armeniaca and P. mume are reversed. Each grey line represents 1 block. The inverted regions are highlighted in brown.
Figure 4: The significant expansion of DUF579 family members in P. salicina. (A) Phylogenetic tree of the DUF579 proteins from P. salicina (red circle), P. persica (hollow inverted triangle), P. mume (solid triangle), P. armeniaca (hollow diamond), P. dulcis (solid diamond), and A. thaliana (solid square). (B) Summary of the number of clade members in the DUF579 family.