Alberto Acquadro¹, Ezio Portis¹, Danila Valentino¹, Lorenzo Barchi², Sergio Lanteri¹
Abstract
Globe artichoke (Cynara cardunculus var. scolymus; 2n = 2x = 34) is grown mainly in the Mediterranean region, with Italy the world's leading producer; over time, however, its cultivation has spread to the Americas and China. In 2016, we released the first (v1.0) globe artichoke genome sequence (http://www.artichokegenome.unito.it/). Its assembly was generated from ∼133-fold Illumina sequencing data and covered 725 Mb of the 1,084 Mb genome, of which 526 Mb (73%) were anchored to 17 chromosomal pseudomolecules. Building on the v1.0 sequencing data, we generated a new genome assembly (v2.0) from a Hi-C (Dovetail) genomic library, which improves the scaffold N50 from 126 kb to 44.8 Mb (∼356-fold increase) and the N90 from 29 kb to 17.8 Mb (∼685-fold increase). Whereas the L90 of the v1.0 sequence comprised 6,123 scaffolds, that of v2.0 comprises just 15 super-scaffolds, a number close to the haploid chromosome number of the species. The new super-scaffolds were assigned to pseudomolecules using reciprocal BLAST procedures. In v2.0 the cumulative size of unplaced scaffolds was reduced by 165 Mb, raising the anchored fraction of the genome sequence to 94%. This marked improvement is mainly attributable to the ability of the proximity ligation-based approach to handle both heterochromatic (e.g., pericentromeric) and euchromatic regions during assembly, which made it possible to physically locate regions of low recombination. The new high-quality reference genome broadens the taxonomic range of data available for comparative plant genomics and has yielded a new, accurate gene prediction (28,632 genes), which will support the map-based cloning of economically important genes.
Keywords: Cynara cardunculus; Genomics; Hi-C libraries; NGS
Year: 2020 PMID: 32817122 PMCID: PMC7534446 DOI: 10.1534/g3.120.401446
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
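The abstract states that the new super-scaffolds were assigned to pseudomolecules by reciprocal BLAST. The Python code below is a minimal sketch of a generic reciprocal-best-hit rule, not the authors' pipeline. It assumes that BLAST results in both directions (v2.0 super-scaffolds against v1.0 pseudomolecules, and the reverse) have already been parsed into `(query, subject, score)` tuples; that input format, the summing of scores over all HSPs of a pair, and identifiers such as `HiC_1` and `Chr01` are hypothetical.

```python
from collections import defaultdict


def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples (assumed format);
    scores of multiple HSPs for the same query/subject pair are summed.
    """
    totals = defaultdict(float)
    for query, subject, score in hits:
        totals[(query, subject)] += score
    best = {}
    for (query, subject), score in totals.items():
        # On ties, the first pair encountered is kept.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_v2_to_v1, hits_v1_to_v2):
    """Keep super-scaffold -> pseudomolecule pairs that are best hits in both directions."""
    forward = best_hits(hits_v2_to_v1)  # super-scaffold -> pseudomolecule
    reverse = best_hits(hits_v1_to_v2)  # pseudomolecule -> super-scaffold
    return {
        scaffold: chrom
        for scaffold, chrom in forward.items()
        if reverse.get(chrom) == scaffold
    }
```

For example, with forward hits `[("HiC_1", "Chr01", 900.0), ("HiC_1", "Chr02", 50.0), ("HiC_2", "Chr02", 700.0), ("HiC_3", "Chr02", 100.0)]` and reverse hits `[("Chr01", "HiC_1", 880.0), ("Chr02", "HiC_2", 690.0)]`, the function returns `{"HiC_1": "Chr01", "HiC_2": "Chr02"}`; `HiC_3` stays unassigned because the best hit of `Chr02` is `HiC_2`.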
– Metrics for the v1.0 (reference) scaffolds, the v1.0 (reference) pseudomolecules, and v2.0 (Hi-C) super-scaffolds
| Metric | v2.0 (Hi-C) | v1.0 (pseudomolecules) | v1.0 (scaffolds) |
|---|---|---|---|
| Total assembly size (bp) | 726,213,971 | 725,337,666 | 725,334,175 |
| Number of contigs/scaffolds | 5,023 | 8,344 | 13,662 |
| Average size (bp) | 144,578 | 86,929 | 53,091 |
| N50 (bp) | 44,809,927 | 25,947,084 | 125,836 |
| L50 | 7 | 9 | 1,411 |
| N75 (bp) | 31,669,976 | 166,465 | 59,381 |
| L75 | 11 | 98 | 3,545 |
| N90 (bp) | 23,740,492 | 45,160 | 31,081 |
| L90 | 15 | 1,384 | 5,853 |
| BUSCO, complete genes (%) | 89.65 | 89.44 | 89.44 |
| BUSCO, partial genes (%) | 3.06 | 1.98 | 1.98 |
| BUSCO, overall (%) | 92.71 | 91.42 | 91.42 |
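The Nx and Lx rows in the table above can be recomputed from a list of sequence lengths. The sketch below uses the standard definitions and assumes the lengths (in bp) are already available, for example parsed from a FASTA index; the function name `nx_lx` is illustrative.

```python
def nx_lx(lengths, x=50):
    """Return (Nx, Lx) for a collection of sequence lengths in bp.

    Sequences are sorted from longest to shortest and summed until the
    running total reaches x% of the assembly size. Nx is the length of the
    sequence that crosses that threshold; Lx is how many sequences it took.
    """
    if not lengths or not 0 < x <= 100:
        raise ValueError("lengths must be non-empty and 0 < x <= 100")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Defensive fallback; not expected to be reached for integer lengths.
    return ordered[-1], len(ordered)
```

For example, `nx_lx([2, 3, 4, 5, 6, 8, 10], 50)` returns `(6, 3)`. Dividing the v2.0 N50 by the v1.0 scaffold N50 in the table (44,809,927 / 125,836) gives ≈356, the fold increase quoted in the abstract.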
Figure 1 - Contiguity improvement from the v1.0 genome (scaffolds) and the v1.0 reference genome (pseudomolecules plus unplaced scaffolds) to the v2.0 genome (Hi-C super-scaffolds). Top: Nx statistics for x from 1 to 100. Bottom: cumulative genome length as scaffolds/contigs are added.
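Both panels of Figure 1 can be derived from a single sorted list of lengths. This is a minimal sketch that assumes the Nx curve is sampled at integer x from 1 to 100 and that the cumulative curve adds sequences from longest to shortest (the caption does not state the order); `contiguity_curves` is a hypothetical name.

```python
from itertools import accumulate


def contiguity_curves(lengths):
    """Return (nx_curve, cumulative) for a non-empty list of sequence lengths.

    nx_curve[x - 1] is Nx for x = 1..100 (top panel);
    cumulative[i] is the total length of the i + 1 longest sequences (bottom panel).
    """
    ordered = sorted(lengths, reverse=True)
    cumulative = list(accumulate(ordered))
    total = cumulative[-1]
    nx_curve = []
    i = 0
    for x in range(1, 101):
        threshold = total * x / 100
        # Thresholds grow with x, so the index only ever moves forward.
        while cumulative[i] < threshold:
            i += 1
        nx_curve.append(ordered[i])
    return nx_curve, cumulative
```

Computing all 100 Nx values in one pass over the cumulative sums avoids re-sorting the assembly for each x, which matters for assemblies with thousands of scaffolds.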
- Length comparison between the v1.0 (reference) pseudomolecules and the v2.0 (Hi-C) super-scaffolds, together with the numbers of genes predicted on v1.0 and v2.0. The numbers of genes reported in Acquadro (annotation v1.1), predicted on the v1.0 assembly, are also shown.
| Chromosome | Size v2.0 (bp) | Size v1.0 (bp) | Δ size (bp) | Size change vs v1.0 | Genes v2.0 | Genes v1.1 | Genes v1.0 | Gene change vs v1.0 |
|---|---|---|---|---|---|---|---|---|
| 1 | 53,988,940 | 49,754,839 | 4,234,101 | 9% | 2,881 | 2,692 | 2,630 | 10% |
| 2 | 75,886,343 | 70,441,430 | 5,444,913 | 8% | 2,696 | 2,502 | 2,351 | 15% |
| 3 | 69,604,505 | 40,297,365 | 29,307,140 | 73% | 2,261 | 1,942 | 1,868 | 21% |
| 4 | 23,740,492 | 20,164,318 | 3,576,174 | 18% | 1,104 | 991 | 962 | 15% |
| 5 | 63,544,927 | 37,196,517 | 26,348,410 | 71% | 1,967 | 1,723 | 1,640 | 20% |
| 6 | 24,383,717 | 20,634,051 | 3,749,666 | 18% | 1,084 | 956 | 903 | 20% |
| 7 | 18,502,611 | 15,568,887 | 2,933,724 | 19% | 1,003 | 933 | 907 | 11% |
| 8 | 44,609,785 | 25,947,084 | 18,662,701 | 72% | 1,529 | 1,250 | 1,196 | 28% |
| 9 | 17,815,532 | 18,344,014 | −528,482 | −3% | 1,061 | 1,047 | 1,006 | 5% |
| 10 | 31,669,976 | 29,133,143 | 2,536,833 | 9% | 1,609 | 1,516 | 1,436 | 12% |
| 11 | 34,212,861 | 22,016,825 | 12,196,036 | 55% | 1,611 | 1,459 | 1,453 | 11% |
| 12 | 44,809,927 | 39,693,055 | 5,116,872 | 13% | 1,590 | 1,473 | 1,404 | 13% |
| 13 | 44,877,405 | 41,551,399 | 3,326,006 | 8% | 2,077 | 1,873 | 1,801 | 15% |
| 14 | 28,499,371 | 14,487,748 | 14,011,623 | 97% | 1,003 | 669 | 646 | 55% |
| 15 | 38,772,909 | 21,275,025 | 17,497,884 | 82% | 1,751 | 1,501 | 1,466 | 19% |
| 16 | 30,156,653 | 21,933,510 | 8,223,143 | 37% | 1,193 | 964 | 949 | 26% |
| 17 | 47,245,614 | 37,737,787 | 9,507,827 | 25% | 1,655 | 1,349 | 1,277 | 30% |
| Unplaced | 33,892,403 | 199,160,669 | −165,268,266 | −83% | 557 | 3,470 | 2,994 | −81% |
| Total (anchored) | 692,321,568 | 526,176,997 | 166,144,571 | 32% | 28,075 | 24,840 | 23,895 | 17% |
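The percentage columns in the table above are consistent with the change of v2.0 relative to v1.0, i.e. (v2.0 − v1.0) / v1.0 × 100 rounded to an integer; the gene percentages compare v2.0 with v1.0, not with v1.1. A small Python check, using the hypothetical helper `size_and_gene_change`, reproduces the rows:

```python
def size_and_gene_change(size_v2, size_v1, genes_v2, genes_v1):
    """Return (delta_bp, size_change_pct, gene_change_pct).

    Both percentages are relative to v1.0 and rounded to whole numbers,
    matching how the table appears to report them.
    """
    delta = size_v2 - size_v1
    size_pct = round(100 * delta / size_v1)
    gene_pct = round(100 * (genes_v2 - genes_v1) / genes_v1)
    return delta, size_pct, gene_pct
```

For the first row, `size_and_gene_change(53_988_940, 49_754_839, 2_881, 2_630)` returns `(4234101, 9, 10)`, matching the 9% and 10% reported; the row with a Δ of −165,268,266 bp gives `(-165268266, -83, -81)`.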
Figure 2 - Circos plot depicting the syntenic relationships between the chromosomes of the globe artichoke genome (v1.0 pseudomolecules, in red) and the new assembly (v2.0 Hi-C super-scaffolds, in blue): (A) chromosomes 1 to 4; (B) chromosomes 5 to 8; (C) chromosomes 9 to 12; (D) chromosomes 13 to 17. Blue dots highlight regions extended in the v2.0 assembly at pericentromeric positions in metacentric/sub-metacentric chromosomes; red dots highlight regions extended in the v2.0 assembly at pericentromeric positions in acrocentric/telocentric chromosomes.
- Top 20 superfamilies in the v2.0 annotation, identified by InterProScan 5 analysis and compared with the v1.0 and v1.1 annotations
| Superfamily ID | Description | v2.0 | v1.1 | v1.0 |
|---|---|---|---|---|
| SSF52540 | P-loop containing nucleoside triphosphate hydrolases | 1,346 | 1,347 | 1,311 |
| SSF56112 | Protein kinase-like (PK-like) | 1,310 | 1,309 | 1,303 |
| SSF52058 | L domain-like | 757 | 806 | 772 |
| SSF57850 | RING/U-box | 530 | 530 | 529 |
| SSF48371 | ARM repeat | 491 | 493 | 481 |
| SSF51735 | NAD(P)-binding Rossmann-fold domains | 441 | 443 | 427 |
| SSF48452 | TPR-like | 404 | 402 | 408 |
| SSF54928 | RNA-binding domain, RBD | 431 | 417 | 401 |
| SSF53474 | alpha/beta-Hydrolases | 390 | 397 | 391 |
| SSF48264 | Cytochrome P450 | 370 | 380 | 373 |
| SSF46689 | Homeodomain-like | 372 | 366 | 372 |
| SSF52047 | RNI-like | 292 | 295 | 296 |
| SSF53335 | S-adenosyl-L-methionine-dependent methyltransferases | 288 | 288 | 289 |
| SSF50978 | WD40 repeat-like | 278 | 281 | 281 |
| SSF52833 | Thioredoxin-like | 271 | 272 | 275 |
| SSF53756 | UDP-Glycosyltransferase/glycogen phosphorylase | 250 | 251 | 241 |
| SSF81383 | F-box domain | 240 | 238 | 241 |
| SSF49503 | Cupredoxins | 226 | 230 | 241 |
| SSF51445 | (Trans)glycosidases | 235 | 238 | 241 |
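Superfamily counts like these can be obtained by tallying SUPERFAMILY matches in InterProScan 5 output. This is a minimal sketch, assuming the tab-separated (TSV) output format with the protein accession in column 1, the analysis name in column 4 and the signature accession and description in columns 5 and 6; it counts each protein once per superfamily, so isoforms would need to be collapsed to genes beforehand. The function name is hypothetical.

```python
import csv
from collections import defaultdict


def count_superfamilies(tsv_lines, top=20):
    """Count distinct proteins per SUPERFAMILY signature in InterProScan 5 TSV lines.

    Returns (accession, description, n_proteins) tuples, largest count first.
    """
    proteins = defaultdict(set)
    descriptions = {}
    for row in csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 6 or row[3] != "SUPERFAMILY":
            continue  # skip other member databases and malformed lines
        accession = row[4]
        proteins[accession].add(row[0])  # a protein with several hits counts once
        descriptions.setdefault(accession, row[5])
    ranked = sorted(
        ((acc, descriptions[acc], len(ids)) for acc, ids in proteins.items()),
        key=lambda item: item[2],
        reverse=True,
    )
    return ranked[:top]
```

For example, `count_superfamilies(open("annotation_v2.tsv", newline=""))` (hypothetical file name) returns up to 20 `(accession, description, count)` tuples in the layout of the table above.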
- miRNA families in the v2.0 annotation compared with the v1.1 annotation
| miRNA family | Annotation v2.0 | Annotation v1.1 |
|---|---|---|
| 156 | 14 | 15 |
| 7699 | 13 | 14 |
| 166 | 18 | 13 |
| 172 | 7 | 9 |
| 399 | 10 | 8 |
| 396 | 8 | 7 |
| 169 | 10 | 6 |
| 393 | 3 | 6 |
| 160 | 4 | 5 |
| 164 | 3 | 5 |
| 171 | 8 | 5 |
| 167 | 3 | 3 |
| 168 | 4 | 3 |
| 319 | 9 | 3 |
| 394 | 3 | 3 |
| 159 | 3 | 2 |
| 390 | 1 | 2 |
| 403 | 2 | 2 |
| 444 | 1 | 2 |
| 479 | 0 | 2 |
| 1030 | 0 | 2 |
| 1446 | 1 | 2 |
| 2630 | 3 | 2 |
| 157 | 1 | 1 |
| 397 | 1 | 1 |
| 398 | 1 | 1 |
| 408 | 0 | 1 |
| 530 | 1 | 1 |
| 824 | 0 | 1 |
| 837 | 1 | 1 |
| 902 | 0 | 1 |
| 1155 | 1 | 1 |
| 2079 | 0 | 1 |
| 2651 | 1 | 1 |
| 2657 | 0 | 1 |
| 2658 | 1 | 1 |
| 2673 | 0 | 1 |
| 2680 | 0 | 1 |
| 3633 | 0 | 1 |
| 4414 | 1 | 1 |
| 5254 | 1 | 1 |
| 5258 | 1 | 1 |
| 5559 | 0 | 1 |
| 5751 | 0 | 1 |
| 7696 | 1 | 1 |
| 1040 | 1 | 0 |
| 1044 | 1 | 0 |
| 5237 | 1 | 0 |
| 6463 | 1 | 0 |
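Families gained or lost between the two annotations can be read off the table above by checking which count is zero. This is a minimal sketch, assuming the counts are held in a dictionary keyed by family number (a hypothetical layout, with a hypothetical function name):

```python
def compare_families(counts):
    """Split miRNA families by presence in the v2.0 and v1.1 annotations.

    `counts` maps family number -> (count_v2, count_v11).
    Returns (only_in_v2, only_in_v11, shared), each sorted by family number.
    """
    only_v2 = sorted(f for f, (v2, v11) in counts.items() if v2 > 0 and v11 == 0)
    only_v11 = sorted(f for f, (v2, v11) in counts.items() if v2 == 0 and v11 > 0)
    shared = sorted(f for f, (v2, v11) in counts.items() if v2 > 0 and v11 > 0)
    return only_v2, only_v11, shared
```

With a subset of the rows, `compare_families({156: (14, 15), 479: (0, 2), 1040: (1, 0), 319: (9, 3)})` returns `([1040], [479], [156, 319])`.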
Figure 3 - Gene density (number of genes per Mb) calculated at chromosome level for the v1.0 genome (light blue bars), the v2.0 genome (white bars) and the newly extended regions. Blue arrows show newly extended regions in the v2.0 assembly at pericentromeric positions in metacentric/sub-metacentric-like chromosomes; red arrows highlight newly extended regions in the v2.0 assembly at pericentromeric positions in acrocentric/telocentric-like chromosomes.
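Gene density at chromosome level is the number of genes divided by the sequence length in Mb. This is a minimal sketch with an illustrative helper name, applied to the first chromosome row of the length/gene comparison table:

```python
def gene_density(n_genes, length_bp):
    """Return genes per megabase (1 Mb = 1,000,000 bp)."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return n_genes / (length_bp / 1_000_000)
```

`gene_density(2_881, 53_988_940)` is ≈53.36 genes/Mb for v2.0 and `gene_density(2_630, 49_754_839)` is ≈52.86 genes/Mb for v1.0.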