Jahn Davik, Dag Røen, Erik Lysøe, Matteo Buti, Simeon Rossman, Muath Alsheikh, Erez Lieberman Aiden, Olga Dudchenko, Daniel James Sargent.
Abstract
Rubus idaeus L. (red raspberry) is a perennial woody plant species of the Rosaceae family that is widely cultivated in the temperate regions of the world, making it an economically important soft fruit species. It is prized for its flavour and aroma, as well as for its high content of healthful compounds such as vitamins and antioxidants. Breeding programs exist globally for red raspberry, but variety development is a long and challenging process, so genomic and molecular tools are valuable resources for breeding. Here, a chromosome-length genome sequence assembly and related gene predictions for the red raspberry cultivar 'Anitra' are presented, built from PacBio long-read sequencing scaffolded using Hi-C sequence data. The assembled genome sequence totalled 291.7 Mbp, with 247.5 Mbp (84.8%) incorporated into seven sequencing scaffolds with an average length of 35.4 Mbp. A total of 39,448 protein-coding genes were predicted, 75% of which were functionally annotated. The seven chromosome scaffolds were anchored to a previously published genetic linkage map with a high degree of synteny, and comparisons to genomes of closely related species within the Rosoideae revealed chromosome-scale rearrangements that have occurred over relatively short evolutionary periods. A chromosome-level genomic sequence of R. idaeus will be a valuable resource for understanding genome structure and function in red raspberry and will be useful to researchers and plant breeders.
Year: 2022 PMID: 35294470 PMCID: PMC8926247 DOI: 10.1371/journal.pone.0265096
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary statistics for the genome sequence of R. idaeus cv. ‘Anitra’.
| | PacBio assembly | Hi-C assembly: all (including embedded gaps) | Hi-C assembly: seven pseudo-chromosomes (including embedded gaps) |
|---|---|---|---|
| Estimated genome size (flow cytometry)(Mbp) | 293.4 | 293.4 | 293.4 |
| Assembled genome size (bp) | 291,080,678 | 291,685,178 | 247,480,545 |
| Assembly rate | 99.2% | 99.2% | 84.3% |
| Number of contigs | 2,350 | 2,370 | 1,175 |
| Largest contig (bp) | 4,726,662 | - | - |
| Number of scaffolds | 2,350 | 1,161 | 7 |
| Largest scaffold (bp) | - | 42,972,798 | 42,972,798 |
| Contig N50 (bp) | 241,822 | - | - |
| Contig NG50 (bp) | 340,977 | - | - |
| Scaffold N50 (bp) | - | 34,491,998 | 34,491,998 |
| Scaffold NG50 (bp) | - | 35,735,390 | 35,735,390 |
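
The N50, NG50 and assembly-rate rows above follow standard definitions, which the minimal Python sketch below illustrates. It assumes the usual conventions: N50 is the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembled size, and NG50 uses half of the estimated genome size instead. The function names and the toy lengths are hypothetical and are not taken from the paper; only the assembled sizes and the flow cytometry estimate come from the table.

```python
def n50(lengths, target_total=None):
    """Return N50, or NG50 when target_total is the estimated genome size."""
    total = sum(lengths) if target_total is None else target_total
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    return None  # the sequences cover less than half of target_total


# Toy sequence lengths (hypothetical, NOT the 'Anitra' contigs).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)        # half of 150 is 75; 50 + 40 = 90 reaches it
toy_ng50 = n50(toy_lengths, 200)  # half of 200 is 100; 50 + 40 + 30 = 120 reaches it

# Assembly rate, assumed here to be assembled size / flow cytometry estimate.
ESTIMATED_GENOME_BP = 293_400_000  # 293.4 Mbp, from the table


def assembly_rate(assembled_bp, estimated_bp=ESTIMATED_GENOME_BP):
    """Percentage of the estimated genome size covered by the assembly."""
    return round(100 * assembled_bp / estimated_bp, 1)


pacbio_rate = assembly_rate(291_080_678)  # PacBio assembly column
chrom_rate = assembly_rate(247_480_545)   # seven pseudo-chromosomes column
```

Under that assumed definition, the PacBio and pseudo-chromosome columns reproduce the tabulated 99.2% and 84.3%. The 84.8% quoted in the abstract instead appears to divide the 247.5 Mbp in pseudo-chromosomes by the 291.7 Mbp assembled total.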
Fig 1. The seven chromosomes of the genome of R. idaeus ‘Anitra’.
Genetic and physical positions of the 4,948 molecular markers on the reconstructed chromosomes of the ‘Anitra’ genome and the ‘Heritage’ genetic linkage map of [7] (left panels). Scatter plots showing the physical positions of the same molecular markers on each ‘Anitra’ chromosome (Chr) (x axis) in relation to the ‘Heritage’ map positions (y axis) where Rho (ρ) is the Pearson correlation coefficient for the associations (central panels). Hi-C intrachromosomal contact maps for each ‘Anitra’ chromosome are given in the right panels. The intensity of pixels represents the proportion of Hi-C links within 400-kb windows along each chromosome plotted on a logarithmic scale.
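
The correlation reported in the central panels can be sketched as follows. This is a minimal illustration of a Pearson correlation between physical and genetic positions, not the authors' analysis code; the marker coordinates are invented for the example, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical markers on one chromosome:
# (physical position in Mbp, genetic map position in cM).
markers = [(1.2, 0.0), (5.8, 9.5), (12.4, 21.0), (20.1, 30.2), (33.7, 48.9)]
physical = [m[0] for m in markers]
genetic = [m[1] for m in markers]

rho = pearson(physical, genetic)  # close to 1 when marker order is collinear
```

A coefficient near 1 indicates that marker order along the assembled chromosome agrees with the linkage map, while an inverted or rearranged segment would pull the value down.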
Fig 2. Visualisation of gene distribution and densities across the seven chromosomes of the R. idaeus ‘Anitra’ genome.
The seven chromosomes are represented by the vertical strands, and gene density is represented by the horizontal bands, calculated in windows of 10,000 bp. The darker the colour, the greater the density of genes.
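
Windowed gene density of this kind can be computed as in the sketch below. It assumes that each gene is assigned to a single 10,000 bp window by its 0-based start coordinate, which is a simplification chosen for illustration; the source does not state how genes spanning a window boundary were counted. The gene coordinates and chromosome length are hypothetical.

```python
from collections import Counter

WINDOW_BP = 10_000  # window size given in the figure legend


def gene_density(gene_starts, chrom_length, window=WINDOW_BP):
    """Count genes per fixed-size window along one chromosome."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter(start // window for start in gene_starts)
    return [counts.get(i, 0) for i in range(n_windows)]


# Hypothetical gene start coordinates on a 35 kb toy chromosome.
toy_starts = [500, 2_300, 9_999, 10_000, 27_450]
density = gene_density(toy_starts, chrom_length=35_000)
```

Each entry of `density` corresponds to one horizontal band, with larger counts drawn in darker colours.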
Comparison of assembly statistics for R. idaeus cv. ‘Anitra’ and three closely related Rosoideae genome assemblies.
| | ‘Anitra’ | | | |
|---|---|---|---|---|
| Estimated genome size (flow cytometry) | 293.4 Mb | - | 293 Mb | 254.28 Mb [ |
| Assembled genome size | 291.1 Mb | 231.21 Mb | 290.8 Mb | 220.8 Mb |
| Number of contigs | 2,370 | - | 235 | 61 |
| Number of scaffolds | 1,161 | 155 | 7 | 31 |
| Number of contigs in pseudomolecules | 1,175 | - | 235 | - |
| Contig N50 | 242.2 Kb | - | 5.1 Mb | 7.9 Mb |
| Scaffold N50 | 34.5 Mb | 8.2 Mb | 41.1 Mb | 36.1 Mb |
| Assembly rate of genome (%) | 99.2% | - | 99.3% | 99.8% |
| Number of chromosomes | 7 | 7 | 7 | 7 |
| Size of sequence anchored on chromosomes | 247.5 Mb | 220.05 Mb | 290.8 Mb | 220.8 Mb |
| Anchoring rate on chromosomes (%) | 85% | 95.17% | 100% | 100% |
| Average chromosome length | 35.4 Mb | 31.4 Mb | 41.5 Mb | |
| Number of predicted protein-coding genes | 39,448 | 33,130 | 34,545 | 28,588 |
| Average coding sequence length | 1,189 bp | 2,803 bp | 3,220 bp | 1,475 bp |
| BUSCO score | 98.0% | 97.1% | 94% | 95% |
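
The anchoring rate and average chromosome length rows can be checked with simple arithmetic, sketched below. The sketch assumes that the anchoring rate is the anchored sequence divided by the assembled genome size and that the average chromosome length is the anchored sequence divided by the seven chromosomes. These definitions are inferred from the tabulated values, not stated in the source, and the function names are hypothetical.

```python
N_CHROMOSOMES = 7  # from the table


def anchoring_rate(anchored_mb, assembled_mb):
    """Percentage of the assembled sequence placed on chromosomes."""
    return 100 * anchored_mb / assembled_mb


def average_chromosome_length(anchored_mb, n_chromosomes=N_CHROMOSOMES):
    """Mean pseudo-chromosome length in Mb."""
    return anchored_mb / n_chromosomes


# 'Anitra' column: 247.5 Mb anchored of 291.1 Mb assembled.
anitra_rate = round(anchoring_rate(247.5, 291.1))          # tabulated as 85%
anitra_mean = round(average_chromosome_length(247.5), 1)   # tabulated as 35.4 Mb

# Second assembly column: 220.05 Mb anchored of 231.21 Mb assembled.
second_rate = round(anchoring_rate(220.05, 231.21), 2)     # tabulated as 95.17%
second_mean = round(average_chromosome_length(220.05), 1)  # tabulated as 31.4 Mb
```

Under these assumed definitions, both columns reproduce the values given in the table.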
Fig 3. MUMmer plots of the macro-synteny between the Rubus idaeus ‘Anitra’ genome and (a) R. chingii; (b) R. occidentalis; (c) Fragaria vesca ‘Hawaii 4’; and (d) Rosa chinensis ‘Old Blush’.