Kenta Shirasawa, Akihiro Itai, Sachiko Isobe.
Abstract
To gain genetic insights into the early-flowering phenotype of ornamental cherry, also known as sakura, we determined the genome sequences of two early-flowering cherry (Cerasus × kanzakura) varieties, 'Kawazu-zakura' and 'Atami-zakura'. Because the two varieties are interspecific hybrids, likely derived from crosses between Cerasus campanulata (an early-flowering species) and Cerasus speciosa, we employed a haplotype-resolved sequence assembly strategy. Genome sequence reads obtained from each variety by single-molecule real-time (SMRT) sequencing were split into two subsets based on the genome sequence information of the two probable ancestors, and each subset was assembled to obtain haplotype-phased genome sequences. The resultant genome assembly of 'Kawazu-zakura' spanned 519.8 Mb with 1,544 contigs and an N50 value of 1,220.5 kb, while that of 'Atami-zakura' totalled 509.6 Mb with 2,180 contigs and an N50 value of 709.1 kb. A total of 72,702 and 69,528 potential protein-coding genes were predicted in the genome assemblies of 'Kawazu-zakura' and 'Atami-zakura', respectively. Gene clustering analysis identified 2,634 clusters present only in the C. campanulata haplotype sequences, which might contribute to its early-flowering phenotype. The genome sequences determined in this study provide fundamental information for elucidating the molecular and genetic mechanisms underlying the early-flowering phenotype of ornamental cherry tree varieties and their relatives.
Keywords: early-flowering; genome assembly; haplotype-phased genome sequence; long-read sequencing; sakura
Year: 2021 PMID: 34792565 PMCID: PMC8643691 DOI: 10.1093/dnares/dsab026
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.477
Figure 1. Estimation of the genome size of two flowering cherry (Cerasus × kanzakura) varieties, ‘Kawazu-zakura’ and ‘Atami-zakura’, based on k-mer analysis (k = 17), with the given multiplicity values.
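As a rough illustration of how a k-mer histogram and a chosen multiplicity value lead to a genome size estimate, the sketch below divides the total number of counted k-mers by the multiplicity of the selected peak. This is a generic textbook approximation, not necessarily the procedure the authors used. The function name, the `min_multiplicity` cutoff for discarding likely sequencing errors, and the toy histogram are all assumptions made for illustration.

```python
def estimate_genome_size(histogram, peak_multiplicity, min_multiplicity=2):
    """Estimate genome size (in bases) from a k-mer frequency histogram.

    histogram: dict mapping multiplicity -> number of distinct k-mers
               observed at that multiplicity (hypothetical input format).
    peak_multiplicity: multiplicity of the peak taken to represent
               single-copy sequence.
    min_multiplicity: k-mers seen fewer times than this are treated as
               sequencing errors and ignored (an assumed cutoff).
    """
    if peak_multiplicity <= 0:
        raise ValueError("peak_multiplicity must be positive")
    # Total k-mer occurrences, excluding the low-multiplicity error tail.
    total_kmers = sum(
        mult * count
        for mult, count in histogram.items()
        if mult >= min_multiplicity
    )
    return total_kmers / peak_multiplicity


# Toy histogram (made-up numbers, not data from the study).
toy_histogram = {1: 1000, 19: 10, 20: 30, 21: 10}
print(estimate_genome_size(toy_histogram, peak_multiplicity=20))  # 50.0
```

In a highly heterozygous hybrid the histogram typically shows more than one peak, so the estimate depends strongly on which peak multiplicity is chosen.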
Statistics of the contig sequences of two flowering cherry (Cerasus × kanzakura) cultivars, ‘Kawazu-zakura’ and ‘Atami-zakura’
| | KWZ_r1.0 | KWZcam_r1.0 | KWZspe_r1.0 | ATM_r1.0 | ATMcam_r1.0 | ATMspe_r1.0 |
|---|---|---|---|---|---|---|
| Total contig size (bases) | 519,843,677 | 262,196,010 | 257,647,667 | 509,633,549 | 267,393,285 | 242,240,264 |
| Number of contigs | 1,544 | 783 | 761 | 2,180 | 1,124 | 1,056 |
| Contig N50 length (bases) | 1,220,495 | 1,445,144 | 1,108,133 | 709,113 | 853,547 | 569,444 |
| Longest contig size (bases) | 8,019,066 | 5,955,677 | 8,019,066 | 5,799,312 | 5,799,312 | 3,381,444 |
| Gap (bases) | 0 | 0 | 0 | 0 | 0 | 0 |
| Complete BUSCOs | 98.2% | 93.1% | 96.7% | 98.0% | 93.4% | 93.5% |
| Single-copy BUSCOs | 7.5% | 86.7% | 89.0% | 16.0% | 86.5% | 87.8% |
| Duplicated BUSCOs | 90.7% | 6.4% | 7.7% | 82.0% | 6.9% | 5.7% |
| Fragmented BUSCOs | 0.3% | 0.7% | 0.4% | 0.4% | 0.7% | 1.6% |
| Missing BUSCOs | 1.5% | 6.2% | 2.9% | 1.6% | 5.9% | 4.9% |
| Number of genes | 72,702 | 36,281 | 36,421 | 69,528 | 36,264 | 33,264 |
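The contig N50 values in the table follow the standard definition: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. A minimal sketch of that calculation is given below. The function name and the toy contig lengths are illustrative assumptions, not part of the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    The N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (made-up numbers): total 300, half 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70
```

With the real assemblies, the input would be the lengths of all contigs in one FASTA file, for example the 1,544 contigs of KWZ_r1.0.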
Statistics of the pseudomolecule sequences of flowering cherry (C. × kanzakura) cultivars, ‘Kawazu-zakura’ and ‘Atami-zakura’
| Chrom. | ‘Kawazu-zakura’ total length (bases) | % | ‘Kawazu-zakura’ number of contigs | % | ‘Kawazu-zakura’ number of genes | % | ‘Atami-zakura’ total length (bases) | % | ‘Atami-zakura’ number of contigs | % | ‘Atami-zakura’ number of genes | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C. campanulata haplotype | | | | | | | | | | | | |
| 1 | 38,834,322 | 14.8 | 86 | 11.0 | 5,173 | 14.3 | 37,943,013 | 14.2 | 123 | 10.9 | 5,180 | 14.3 |
| 2 | 45,216,402 | 17.2 | 216 | 27.6 | 6,881 | 19.0 | 49,279,926 | 18.4 | 289 | 25.7 | 6,710 | 18.5 |
| 3 | 28,286,294 | 10.8 | 74 | 9.5 | 3,837 | 10.6 | 29,043,603 | 10.9 | 108 | 9.6 | 3,885 | 10.7 |
| 4 | 31,150,796 | 11.9 | 110 | 14.0 | 3,747 | 10.3 | 33,258,136 | 12.4 | 157 | 14.0 | 4,442 | 12.2 |
| 5 | 28,296,805 | 10.8 | 82 | 10.5 | 3,852 | 10.6 | 26,412,617 | 9.9 | 83 | 7.4 | 3,625 | 10.0 |
| 6 | 33,630,106 | 12.8 | 83 | 10.6 | 4,908 | 13.5 | 34,947,855 | 13.1 | 132 | 11.7 | 4,871 | 13.4 |
| 7 | 20,603,721 | 7.9 | 61 | 7.8 | 2,702 | 7.4 | 21,052,878 | 7.9 | 92 | 8.2 | 3,003 | 8.3 |
| 8 | 30,708,646 | 11.7 | 65 | 8.3 | 4,559 | 12.6 | 29,538,870 | 11.0 | 126 | 11.2 | 3,745 | 10.3 |
| Unassigned | 5,546,318 | 2.1 | 6 | 0.8 | 622 | 1.7 | 6,027,887 | 2.3 | 14 | 1.2 | 803 | 2.2 |
| Total | 262,273,410 | 100.0 | 783 | 100.0 | 36,281 | 100.0 | 267,504,785 | 100.0 | 1,124 | 100.0 | 36,264 | 100.0 |
| C. speciosa haplotype | | | | | | | | | | | | |
| 1 | 42,661,824 | 16.6 | 82 | 10.8 | 5,912 | 16.2 | 38,473,692 | 15.9 | 136 | 12.9 | 5,644 | 17.0 |
| 2 | 33,947,397 | 13.2 | 125 | 16.4 | 4,804 | 13.2 | 32,131,565 | 13.3 | 179 | 17.0 | 4,495 | 13.5 |
| 3 | 29,126,079 | 11.3 | 81 | 10.6 | 4,367 | 12.0 | 30,781,260 | 12.7 | 111 | 10.5 | 4,269 | 12.8 |
| 4 | 34,989,196 | 13.6 | 162 | 21.3 | 4,668 | 12.8 | 35,466,104 | 14.6 | 214 | 20.3 | 4,532 | 13.6 |
| 5 | 25,691,272 | 10.0 | 99 | 13.0 | 3,729 | 10.2 | 23,383,776 | 9.6 | 100 | 9.5 | 3,062 | 9.2 |
| 6 | 29,111,560 | 11.3 | 56 | 7.4 | 4,240 | 11.6 | 32,639,480 | 13.5 | 120 | 11.4 | 4,532 | 13.6 |
| 7 | 16,872,426 | 6.5 | 20 | 2.6 | 2,328 | 6.4 | 16,033,168 | 6.6 | 40 | 3.8 | 2,174 | 6.5 |
| 8 | 34,141,902 | 13.2 | 121 | 15.9 | 5,125 | 14.1 | 30,022,323 | 12.4 | 141 | 13.4 | 4,075 | 12.3 |
| Unassigned | 11,181,211 | 4.3 | 15 | 2.0 | 1,248 | 3.4 | 3,413,596 | 1.4 | 15 | 1.4 | 481 | 1.4 |
| Total | 257,722,867 | 100.0 | 761 | 100.0 | 36,421 | 100.0 | 242,344,964 | 100.0 | 1,056 | 100.0 | 33,264 | 100.0 |
Figure 2. Comparative analysis of the genome sequence and structure of flowering cherry varieties, ‘Atami-zakura’, ‘Kawazu-zakura’ and ‘Somei-Yoshino’. Chromosome numbers are indicated above the x-axis and on the right side of the y-axis. Genome sizes (Mb) are below the x-axis and on the left side of the y-axis.
Repetitive sequences in two flowering cherry (C. × kanzakura) cultivars, ‘Kawazu-zakura’ and ‘Atami-zakura’
| Repeat type | ‘Kawazu-zakura’ (C. campanulata haplotype): number of elements | Length occupied (bp) | % | ‘Kawazu-zakura’ (C. speciosa haplotype): number of elements | Length occupied (bp) | % | ‘Atami-zakura’ (C. campanulata haplotype): number of elements | Length occupied (bp) | % | ‘Atami-zakura’ (C. speciosa haplotype): number of elements | Length occupied (bp) | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SINEs | 5,278 | 495,207 | 0.2 | 7,013 | 665,451 | 0.3 | 8,832 | 896,223 | 0.3 | 6,537 | 608,541 | 0.3 |
| LINEs | 9,358 | 3,548,357 | 1.4 | 9,980 | 3,635,040 | 1.4 | 9,242 | 3,175,048 | 1.2 | 9,285 | 3,460,432 | 1.4 |
| LTR elements | 63,025 | 45,423,275 | 17.3 | 57,175 | 42,749,594 | 16.6 | 61,503 | 47,443,444 | 17.7 | 52,221 | 36,517,551 | 15.1 |
| DNA transposons | 85,647 | 33,936,563 | 12.9 | 84,151 | 30,984,015 | 12.0 | 88,999 | 35,176,601 | 13.2 | 77,636 | 26,829,236 | 11.1 |
| Unclassified | 131,199 | 36,041,455 | 13.7 | 116,201 | 32,825,941 | 12.7 | 130,209 | 34,407,194 | 12.9 | 112,663 | 31,370,494 | 12.9 |
| Small RNA | 5,384 | 657,326 | 0.3 | 7,211 | 1,536,541 | 0.6 | 6,949 | 828,201 | 0.3 | 2,598 | 503,911 | 0.2 |
| Satellites | 1,072 | 277,222 | 0.1 | 297 | 53,425 | 0.0 | 1,083 | 399,307 | 0.2 | 342 | 75,860 | 0.0 |
| Simple repeats | 75,567 | 3,104,558 | 1.2 | 77,082 | 3,144,046 | 1.2 | 77,750 | 3,266,196 | 1.2 | 74,414 | 3,048,442 | 1.3 |
| Low complexity | 14,265 | 706,271 | 0.3 | 14,754 | 717,629 | 0.3 | 14,352 | 695,481 | 0.3 | 14,137 | 693,339 | 0.3 |
Figure 3. Number of gene clusters identified in the haplotype sequences of the three sakura genomes. Gene clusters present only in the C. campanulata haplotype sequences are shown in red.