Joeri S Strijk, Damien D Hinsinger, Mareike M Roeder, Lars W Chatrou, Thomas L P Couvreur, Roy H J Erkens, Hervé Sauquet, Michael D Pirie, Daniel C Thomas, Kunfang Cao.
Abstract
The flowering plant family Annonaceae includes important commercially grown tropical crops, but development of promising species is hindered by a lack of genomic resources to build breeding programs. Annonaceae are part of the magnoliids, an ancient lineage of angiosperms for which evolutionary relationships with other major clades remain unclear. To provide resources to breeders and evolutionary researchers, we report a chromosome-level genome assembly of the soursop (Annona muricata). We assembled the genome using 444.32 Gb of DNA sequences (676× sequencing depth) from PacBio and Illumina short-reads, in combination with 10× Genomics and Bionano data (v1). A total of 949 scaffolds were assembled to a final size of 656.77 Mb, with a scaffold N50 of 3.43 Mb (v1), and then further improved to seven pseudo-chromosomes using Hi-C sequencing data (v2; scaffold N50: 93.2 Mb, total size in chromosomes: 639.6 Mb). Heterozygosity was very low (0.06%), while repeat sequences accounted for 54.87% of the genome, and 23,375 protein-coding genes with an average of 4.79 exons per gene were annotated using de novo, RNA-seq and homology-based approaches. Reconstruction of the historical population size showed a slow continuous contraction, probably related to Cenozoic climate changes. The soursop is the first genome assembled in Annonaceae, supporting further studies of floral evolution in magnoliids, providing an essential resource for delineating relationships of ancient angiosperm lineages. Both genome-assisted improvement and conservation efforts will be strengthened by the availability of the soursop genome. As a community resource, this assembly will further strengthen the role of Annonaceae as model species for research on the ecology, evolution and domestication potential of tropical species in pomology and agroforestry.
Keywords: Annonaceae; basal angiosperms; crop improvement; high quality draft genome; magnoliids; pomology
Year: 2021 PMID: 33569882 PMCID: PMC8251617 DOI: 10.1111/1755-0998.13353
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090
FIGURE 1. Annona muricata description and genomic landscape. Top: (a) leaves, (b) mature flower, (c) mature fruit. Bottom: Circular view of the chromosome organization of Annona muricata, with genomic features indicated from outer to inner layers in sequence windows of 200 kb, (d) structural organisation of the chromosomes arranged by size, indicated in Mb, (e) loci density from Couvreur et al., 2019, (f) GC deviation, (g) GC content (percentage), (h) gene breadth (i.e., the percentage of the sequence window occupied by coding regions) heatmap, (i) gene density (i.e., the number of genes found in one sequence window) histogram, (j) TE protein breadth heatmap, (k) TE protein density histogram, (l) transposon breadth heatmap, (m) transposon density histogram. In (i), (k) and (m), values above and below the mean are indicated in green and red, respectively.
Sequencing strategy and statistics used for the A. muricata genome assembly and annotation
| Step | Technology | Tissue | Insert size | Bases generated (Gb) | Sequence coverage (×) |
|---|---|---|---|---|---|
| Genome assembly | Illumina reads | Leaves | 250 bp | 65.96 | 82.54 |
| | Illumina reads | Leaves | 350 bp | 65.47 | 81.93 |
| | PacBio reads | Leaves | 20 kb | 36.95 | 46.24 |
| | 10× Genomics | Leaves | | 180.04 | 225.3 |
| | Bionano | Leaves | | 95.9 | 120.01 |
| | Total | | | 444.32 | 556.02 |
| Chromosome scaffolding | Hi-C | Leaves | N.A. | 66.17 | N.A. |
| Genome annotation | Illumina reads | Flowers (several developmental stages) | 350 bp | 5.52 | N.A. |
| | Illumina reads | Young fruit | 350 bp | 9.93 | N.A. |
| | Illumina reads | Ripening fruit | 350 bp | 5.73 | N.A. |
| | Illumina reads | Bark | 350 bp | 4.80 | N.A. |
| | Illumina reads | Leaves | 350 bp | 5.04 | N.A. |
| | Total | | | 25.51 | |
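The coverage column for the genome assembly rows appears to be the bases generated divided by a single genome size estimate. The short Python sketch below checks the reported totals and derives the size each row implies. The relationship coverage = bases / genome size is an assumption here; the table does not state which genome size was used, and the value of roughly 0.8 Gb is only what the rows imply.

```python
# Genome assembly rows: (technology, bases generated in Gb, coverage in x).
rows = [
    ("Illumina 250 bp", 65.96, 82.54),
    ("Illumina 350 bp", 65.47, 81.93),
    ("PacBio 20 kb", 36.95, 46.24),
    ("10x Genomics", 180.04, 225.3),
    ("Bionano", 95.9, 120.01),
]

# Column totals, rounded to the two decimals used in the table.
total_bases = round(sum(bases for _, bases, _ in rows), 2)
total_cov = round(sum(cov for _, _, cov in rows), 2)

# Assumption: coverage = bases / genome size, so each row implies a size (Gb).
implied_sizes = [bases / cov for _, bases, cov in rows]

if __name__ == "__main__":
    print(f"total bases: {total_bases} Gb, total coverage: {total_cov}x")
    for (tech, _, _), size in zip(rows, implied_sizes):
        print(f"{tech}: implied genome size {size:.4f} Gb")
```

All five rows imply nearly the same size, which supports reading the column as depth relative to one fixed estimate rather than to the final assembly length.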
Assembly properties
| | Contig length (bp) | Scaffold length (bp) | Contig number | Scaffold number |
|---|---|---|---|---|
| **Assembly v1 (Illumina + PacBio + 10X + BioNano)** | | | | |
| Total | 652,885,881 | 656,774,640 | 2066 | 949 |
| Max | 4,254,538 | 20,459,086 | - | - |
| Number ≥ 2000 | - | - | 1990 | 873 |
| N50 | 784,561 | 3,429,555 | 250 | 52 |
| N60 | 632,116 | 2,673,626 | 342 | 73 |
| N70 | 483,912 | 2,112,119 | 459 | 101 |
| N80 | 346,983 | 1,573,287 | 618 | 137 |
| N90 | 207,456 | 964,101 | 856 | 189 |
| **Assembly v2 (Assembly v1 + Hi-C)** | | | | |
| Total | 652,885,881 | 656,813,740 | 2262 | 755 |
| Max | 4,254,538 | 122,620,176 | - | - |
| Number ≥ 2000 | - | - | 2186 | 679 |
| N50 | 743,350 | 93,205,713 | 264 | 3 |
| N60 | 578,736 | 89,409,058 | 364 | 4 |
| N70 | 451,341 | 85,026,703 | 492 | 5 |
| N80 | 320,782 | 69,840,041 | 665 | 6 |
| N90 | 184,498 | 60,483,854 | 929 | 7 |
Contig after scaffolding.
Chromosome properties of the v2 assembly
| Hi-C assembly ID | Chromosome name | Cluster number | Sequence length (bp) |
|---|---|---|---|
| Hic_asm_0 | Amur4 | 49 | 89,409,058 |
| Hic_asm_1 | Amur1 | 68 | 122,620,176 |
| Hic_asm_2 | Amur3 | 57 | 93,205,713 |
| Hic_asm_3 | Amur2 | 75 | 118,991,926 |
| Hic_asm_4 | Amur7 | 34 | 60,483,854 |
| Hic_asm_5 | Amur5 | 62 | 85,026,703 |
| Hic_asm_6 | Amur6 | 53 | 69,840,041 |
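The scaffold N50 to N90 values reported for assembly v2 can be reproduced from the seven pseudo-chromosome lengths alone. The Python sketch below assumes the usual Nx definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least x% of the total. It uses the v2 scaffold total of 656,813,740 bp from the assembly properties table as the denominator. The function name `nx` is illustrative and not taken from the paper.

```python
def nx(lengths, total, x):
    """Return (Nx length, number of sequences needed) for a threshold of x percent."""
    cumulative = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        cumulative += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if cumulative * 100 >= total * x:
            return length, count
    return None  # threshold not reached by the sequences supplied


# Pseudo-chromosome lengths (bp) from the chromosome properties table.
chromosomes = {
    "Amur1": 122_620_176,
    "Amur2": 118_991_926,
    "Amur3": 93_205_713,
    "Amur4": 89_409_058,
    "Amur5": 85_026_703,
    "Amur6": 69_840_041,
    "Amur7": 60_483_854,
}
lengths = list(chromosomes.values())

# Total v2 scaffold length (bp) from the assembly properties table.
V2_SCAFFOLD_TOTAL = 656_813_740

if __name__ == "__main__":
    print("bases placed in chromosomes:", sum(lengths))
    for x in (50, 60, 70, 80, 90):
        print(f"N{x}:", nx(lengths, V2_SCAFFOLD_TOTAL, x))
```

The seven pseudo-chromosomes sum to 639,577,471 bp, the 639.6 Mb quoted in the abstract. The remaining unplaced scaffolds therefore total about 17.2 Mb, so each is shorter than the smallest chromosome and cannot change any of these values.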
FIGURE 2. TE characteristics in the soursop genome. (a) Distribution of repeat classes in the soursop genome, (b) divergence distribution of transposable elements in the genome of Annona muricata. Both Kimura substitution level (CpG adjusted) and absolute time are given.
FIGURE 3. Population size variation in soursop. Effective population size history inferred by the PSMC method (black line), with 100 bootstraps shown (red lines).