Mathieu Rousseau-Gueutin, Caroline Belser, Corinne Da Silva, Gautier Richard, Benjamin Istace, Corinne Cruaud, Cyril Falentin, Franz Boideau, Julien Boutte, Regine Delourme, Gwenaëlle Deniot, Stefan Engelen, Julie Ferreira de Carvalho, Arnaud Lemainque, Loeiz Maillet, Jérôme Morice, Patrick Wincker, France Denoeud, Anne-Marie Chèvre, Jean-Marc Aury.
Abstract
BACKGROUND: The combination of long reads and long-range information to produce genome assemblies is now accepted as a common standard. This strategy not only allows access to the gene catalogue of a given species but also reveals the architecture and organization of chromosomes, including complex regions such as telomeres and centromeres. The Brassica genus is not exempt, and many assemblies based on long reads are now available. The reference genome for Brassica napus, Darmor-bzh, which was published in 2014, was produced using short reads and its contiguity was extremely low compared with current assemblies of the Brassica genus.
Keywords: Brassica; Darmor-bzh; assembly; chromosome-scale; direct RNA; nanopore sequencing; oilseed rape; optical mapping
Year: 2020 PMID: 33319912 PMCID: PMC7736779 DOI: 10.1093/gigascience/giaa137
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Statistics of the Darmor-bzh B. napus assemblies
| Darmor-bzh version | Technology | Contig N50 (L50) | No. of contigs | Cumulative size (Mb) | No. of genes | Number of anchored bases in Mb (% of the assembly) | Number of anchored [ACGT] bases in Mb | BUSCO scores |
|---|---|---|---|---|---|---|---|---|
| 10 | ONT | 11,486,274 (24) | 505 | 924 | 108,190 | 867 (93.84%) | 849 | C: 98.6% S: 7.0% D: 91.6% F: 0.1% M: 1.3% |
| 8 | 454 | 37,644 (5,517) | 44,818 | 850 | 80,382 | 799 (93.96%) | 690 | C: 89.6% S: 18.0% D: 71.6% F: 3.4% M: 7.0% |
| 5 | 454 | 37,644 (5,517) | 44,837 | 850 | 101,040 | 645 (75.90%) | 553 | C: 97.7% S: 9.9% D: 87.8% F: 0.5% M: 1.8% |
BUSCO scores are calculated for n = 4,596. C: complete; S: complete and single-copy; D: complete and duplicated; F: fragmented; M: missing.
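The legend above implies two simple relations: complete (C) is the sum of single-copy (S) and duplicated (D), and C, F, and M together account for all n = 4,596 genes searched. The following is a minimal Python sketch of that check, assuming the score strings are parsed exactly as printed in the table (this is not BUSCO's native summary format). The helper names `parse_busco` and `is_consistent` and the rounding tolerance are illustrative assumptions.

```python
import re


def parse_busco(summary):
    """Parse a string such as 'C: 98.6% S: 7.0% D: 91.6% F: 0.1% M: 1.3%'."""
    pairs = re.findall(r"([CSDFM]):\s*([\d.]+)%", summary)
    return {key: float(value) for key, value in pairs}


def is_consistent(scores, tol=0.15):
    """Check C = S + D and C + F + M = 100, allowing for rounding to 0.1%."""
    complete_ok = abs(scores["C"] - (scores["S"] + scores["D"])) <= tol
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    return complete_ok and total_ok


# Score strings copied from the table above.
table_rows = {
    "v10": "C: 98.6% S: 7.0% D: 91.6% F: 0.1% M: 1.3%",
    "v8": "C: 89.6% S: 18.0% D: 71.6% F: 3.4% M: 7.0%",
    "v5": "C: 97.7% S: 9.9% D: 87.8% F: 0.5% M: 1.8%",
}

for version, text in table_rows.items():
    print(version, is_consistent(parse_busco(text)))
```

All three rows pass both checks. The high duplicated fraction is expected for an allotetraploid such as B. napus, where most genes are present in both subgenomes.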
General information about the available Brassica long reads and older Darmor-bzh assemblies
| Species and subgenome | Genotype | Principal sequencing technology | Release year | Reference |
|---|---|---|---|---|
| B. rapa (A) | Z1 | ONT | 2018 | Belser et al. |
| B. rapa (A) | Chiifu | PacBio | 2018 | Zhang et al. |
| B. nigra (B) | NI100 | ONT | 2020 | Perumal et al. |
| B. oleracea (C) | HDEM | ONT | 2018 | Belser et al. |
| B. oleracea (C) | D134 | PacBio | 2020 | Honghao et al. |
| B. napus (AC) | Darmor-bzh v5 | 454 | 2014 | Chalhoub et al. |
| B. napus (AC) | Darmor-bzh v8 | 454 | 2017 | Bayer et al. |
| B. napus (AC) | Darmor-bzh v10 | ONT | 2020 | This study |
| B. napus (AC) | Westar | PacBio | 2020 | Song et al. |
| B. napus (AC) | Express617 | PacBio | 2020 | Lee et al. |
| B. napus (AC) | Tapidor3 | PacBio | 2020 | Song et al. |
| B. napus (AC) | Shengli3 | PacBio | 2020 | Song et al. |
| B. napus (AC) | QuintaA | PacBio | 2020 | Song et al. |
| B. napus (AC) | No2127 | PacBio | 2020 | Song et al. |
| B. napus (AC) | GanganF73 | PacBio | 2020 | Song et al. |
| B. napus (AC) | Zheyou73 | PacBio | 2020 | Song et al. |
| B. napus (AC) | Zs11 | PacBio | 2020 | Song et al. |
Statistics of the Brassica long-read assemblies, ordered by contig N50 value
| Species | Genotype | Technology | Contig N50 (L50) | No. contigs | Maximum contig size | No. scaffolds | Cumulative size (Mb) | No. of genes | Complete BUSCO (%) |
|---|---|---|---|---|---|---|---|---|---|
| B. nigra | NI100 | ONT | 11,504,526 (13) | 187 | 32,581,047 | 58 | 506 | 59,851 | 97.2 |
| B. napus | Darmor-bzh | ONT | 11,486,274 (24) | 505 | 46,624,681 | 237 | 924 | 108,190 | 98.6 |
| B. oleracea | HDEM | ONT | 9,491,203 (19) | 264 | 26,712,175 | 129 | 555 | 61,279 | 96.2 |
| B. rapa | Z1 | ONT | 5,519,976 (17) | 627 | 22,127,468 | 304 | 402 | 46,721 | 96.7 |
| B. oleracea | D134 | PacBio | 3,591,417 (43) | 229 | 15,061,548 | 9 | 530 | 43,868 | 95.1 |
| B. napus | Westar | PacBio | 3,130,520 (93) | 5,602 | 15,044,602 | 3,458 | 1,008 | 100,194 | 98.3 |
| B. napus | Express617 | PacBio | 3,002,211 (78) | 2,092 | 17,406,538 | 1,431 | 925 | 99,481 | 98.8 |
| B. napus | Tapidor3 | PacBio | 2,855,025 (104) | 5,318 | 11,486,821 | 3,566 | 1,014 | 105,409 | 98.4 |
| B. napus | Shengli3 | PacBio | 2,825,656 (104) | 6,024 | 14,590,814 | 3,802 | 1,002 | 103,920 | 98.1 |
| B. napus | QuintaA | PacBio | 2,801,289 (104) | 5,835 | 14,779,276 | 3,722 | 1,004 | 98,755 | 98.4 |
| B. napus | No2127 | PacBio | 2,704,645 (98) | 5,787 | 12,981,515 | 3,733 | 1,012 | 105,894 | 98.0 |
| B. napus | GanganF73 | PacBio | 2,696,026 (98) | 6,887 | 18,404,683 | 4,930 | 1,034 | 106,567 | 98.4 |
| B. napus | Zheyou73 | PacBio | 2,103,870 (126) | 7,399 | 11,351,519 | 4,990 | 1,016 | 105,842 | 97.9 |
| B. napus | Zs11 | PacBio | 1,506,624 (186) | 8,773 | 10,640,036 | 3,332 | 1,010 | 107,233 | 98.5 |
| B. rapa | Chiifu | PacBio | 1,386,548 (71) | 1,507 | 9,417,752 | 1,099 | 353 | 60,609 | 99.6 |
BUSCO completeness is calculated for n = 4,596. Full BUSCO scores are available in Supplementary Table S9.
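The ordering metric used in this table can be made concrete. Contig N50 is the length of the shortest contig in the smallest set of largest contigs that together cover at least half of the assembly, and L50 is the number of contigs in that set. Below is a minimal Python sketch, assuming only a plain list of contig lengths; the function name `n50_l50` and the toy lengths are illustrative and are not taken from the study.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    ordered = sorted(lengths, reverse=True)  # largest contigs first
    half = sum(ordered) / 2                  # half of the assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50; 'count' contigs were needed to reach it (L50)
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example: total = 30, so the two largest contigs (10 + 8) pass the halfway point.
print(n50_l50([10, 8, 6, 4, 2]))
```

Read this way, the Darmor-bzh v10 entry of 11,486,274 (24) means that the 24 largest contigs, each at least about 11.5 Mb long, contain half of the assembled bases.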
Figure 1. Genome overview of the 19 chromosomes of B. napus Darmor-bzh v10 (the 10 A chromosomes are in blue and the 9 C chromosomes in orange). A, Location of centromeres inferred from the flanking markers identified by Mason et al. [47]. B, Density of repeated pericentromere-specific sequences of Brassica, allowing more precise localization of centromeres. C, Density of Gypsy elements. D, Density of Copia elements. E, Density of DNA transposon elements. F, Black boxes represent gaps in the Darmor-bzh assembly. G, Density of methylated CpG. H, Gene density. I, Scatter plots of RGA density. All densities are calculated in 100-kb windows; blue and red colors in density plots indicate lower and higher values, respectively.
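The 100-kb windowing mentioned in the caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that each feature is represented by a 0-based start coordinate and is counted in the window containing that start. The caption does not say whether densities are feature counts or covered bases, so counting is an assumption, and the function name `window_density` is hypothetical.

```python
from collections import Counter

WINDOW = 100_000  # 100-kb windows, as stated in the caption


def window_density(starts, chrom_length, window=WINDOW):
    """Count features per fixed-size window, binned by 0-based start position."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter(
        pos // window for pos in starts if 0 <= pos < chrom_length
    )
    # Return one value per window, including windows with no feature.
    return [counts.get(i, 0) for i in range(n_windows)]


# Toy chromosome of 300 kb with four features.
print(window_density([5, 99_999, 100_000, 250_000], 300_000))
```

Binning by start position keeps each feature in exactly one window; a coverage-based density would instead split long features across the windows they overlap.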
Figure 2. Genome-wide alignments of B. napus Darmor-bzh v10 (Y axis) with the Darmor-bzh v5 and Zs11 genome assemblies (X axis). Each dot corresponds to a syntenic region that aligns with high confidence. Blue dots correspond to regions aligned in the same orientation (forward strand), while red dots represent regions aligned in an inverted orientation (reverse strand). (Peri)centromeric regions are represented by black boxes on the X and Y axes. Some (peri)centromere flanking markers were not found in Zs11 owing to polymorphisms. The new assembly of a centromeric region in Darmor-bzh v10 compared to Darmor-bzh v5 is highlighted in orange for chromosome A06. An example of a large misassembled inverted region whose orientation has been corrected in the new Darmor-bzh v10 genome assembly is highlighted in green for chromosome C07.
Repetitive element content of the Brassica long-read assemblies (10 polyploid B. napus, 2 diploid B. rapa, 1 diploid B. nigra, and 2 diploid B. oleracea genomes)
| Genotype | Total length (Mb) | Masked proportion (%) | LTR Copia (%) | LTR Gypsy (%) | DNA CMC-EnSpm (%) | LINE (%) | RC Helitron (%) | DNA TcMar (%) | DNA MULE-MuDR (%) |
|---|---|---|---|---|---|---|---|---|---|
| Darmor-bzh | 924 | 53.86 | 15.55 | 13.05 | 6.66 | 6.37 | 5.26 | 2.05 | 1.93 |
| Westar | 1,008 | 58.91 | 16.88 | 13.38 | 6.26 | 6.92 | 4.87 | 1.92 | 1.77 |
| Express617 | 925 | 56.32 | 15.92 | 12.46 | 6.54 | 6.63 | 5.22 | 2.05 | 1.88 |
| Tapidor3 | 1,014 | 59.01 | 16.49 | 13.53 | 6.17 | 7.05 | 4.86 | 1.91 | 1.76 |
| Shengli3 | 1,002 | 58.50 | 16.15 | 13.16 | 6.16 | 6.97 | 4.86 | 1.92 | 1.78 |
| QuintaA | 1,004 | 58.60 | 16.40 | 13.23 | 6.29 | 7.03 | 4.93 | 1.93 | 1.80 |
| No2127 | 1,012 | 59.85 | 17.08 | 13.74 | 6.23 | 7.02 | 4.77 | 1.91 | 1.72 |
| GanganF73 | 1,034 | 58.76 | 16.60 | 13.48 | 6.23 | 6.86 | 4.82 | 1.91 | 1.75 |
| Zheyou73 | 1,016 | 59.06 | 16.49 | 13.59 | 6.28 | 7.04 | 4.87 | 1.91 | 1.79 |
| Zs11 | 1,010 | 58.62 | 16.59 | 13.37 | 6.36 | 6.92 | 4.88 | 1.95 | 1.78 |
| Z1 | 402 | 46.95 | 13.22 | 13.23 | 2.88 | 5.16 | 4.37 | 1.39 | 1.53 |
| Chiifu | 353 | 48.46 | 12.89 | 13.18 | 3.29 | 5.60 | 5.15 | 1.63 | 1.82 |
| NI100 | 506 | 53.70 | 16.51 | 22.59 | 4.08 | 4.93 | 3.17 | 1.42 | 1.79 |
| D134 | 530 | 57.88 | 15.61 | 13.23 | 9.16 | 6.24 | 5.62 | 2.56 | 2.08 |
| HDEM | 555 | 57.14 | 16.06 | 12.68 | 8.74 | 6.31 | 5.42 | 2.46 | 1.99 |
Figure 3. Improvements of the B. napus Darmor-bzh v10 genome assembly and annotation compared to Darmor-bzh v5 through the use of long-read sequencing. A, Genome-wide pericentromere size (in Mb) and gene content using centromere flanking markers (Mason et al. [47], refer to methods). The Darmor-bzh v10 genome assembly now harbors larger pericentromeres with more annotated genes than the previous assembly. B, Distribution of genes between chromosomes and unplaced scaffolds.
Figure 4. Distribution of the number of gaps per chromosome in Brassica genomes. Each dot represents the number of gaps in a given chromosome and genome assembly. PacBio assemblies are in blue and ONT assemblies in orange.
Figure 5. Example of a genomic region of the B. napus Darmor-bzh assembly that corresponds to a region containing a gap in the Zs11 genome assembly. The first track shows the GC content. ONT reads (longer than 100 kb) are in the second track (blue boxes), and a 115-kb read that spans the whole region is surrounded by a black box. Transposable elements are shown as black boxes, and the region absent from Zs11 contains a large LINE element (80,707 bp). The alignment of the Zs11 sequence (20 kb around the 500-bp gap) is represented by red boxes, with thin red lines representing sequences that are missing from the Zs11 genome assembly but present in Darmor-bzh v10 (gaps). Predicted genes are in the last track, in purple.
Figure 6. Example of a splicing event detected using long reads. The exon in the third intron of the gene prediction (C03p53740.1_Bna_DAR, Gene prediction track) is mutually exclusive with the third exon. This alternative splice form is detected by a single nanopore read (blue track) and maintained by the TALC error correction (red track).
Resistance gene analogs categories and repartition between chromosomes and unplaced scaffolds in B. napus Darmor-bzh v5 and v10 genome assemblies
| RGA location | NBS | CNL | TNL | CN | TN | NL | TX | Others | RLP | RLK | TM-CC | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Darmor-bzh v5 RGA | | | | | | | | | | | | |
| Genome-wide | 37 | 53 | 164 | 33 | 60 | 80 | 143 | 24 | 274 | 1,502 | 418 | 2,788 |
| Pseudomolecules | 27 | 45 | 144 | 27 | 54 | 67 | 112 | 22 | 231 | 1,259 | 340 | 2,328 |
| Scaffolds | 10 | 8 | 20 | 6 | 6 | 13 | 31 | 2 | 43 | 243 | 78 | 460 |
| Darmor-bzh v10 RGA | | | | | | | | | | | | |
| Genome-wide | 45 | 71 | 189 | 22 | 58 | 85 | 128 | 29 | 285 | 1,569 | 471 | 2,952 |
| Pseudomolecules | 45 | 71 | 184 | 22 | 57 | 84 | 128 | 29 | 282 | 1,561 | 471 | 2,934 |
| Scaffolds | 0 | 0 | 5 | 0 | 1 | 1 | 0 | 0 | 3 | 8 | 0 | 18 |
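The shift of RGAs from unplaced scaffolds to pseudomolecules can be quantified directly from the Total column. The short Python sketch below uses only the counts printed in the table; the dictionary layout and the function name `unplaced_percent` are illustrative choices.

```python
# Totals copied from the table above (Pseudomolecules and Scaffolds rows).
RGA_TOTALS = {
    "Darmor-bzh v5": {"pseudomolecules": 2328, "scaffolds": 460},
    "Darmor-bzh v10": {"pseudomolecules": 2934, "scaffolds": 18},
}


def unplaced_percent(counts):
    """Percentage of annotated RGAs located on unplaced scaffolds."""
    total = counts["pseudomolecules"] + counts["scaffolds"]
    return 100 * counts["scaffolds"] / total


for version, counts in RGA_TOTALS.items():
    print(f"{version}: {unplaced_percent(counts):.1f}% of RGAs on unplaced scaffolds")
```

With these counts, about 16.5% of RGAs sit on unplaced scaffolds in v5 (460 of 2,788), against about 0.6% in v10 (18 of 2,952).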
Figure 7. Resistance gene analog (RGA) annotation improvement in B. napus Darmor-bzh v10 using long-read direct RNA sequencing. A, Conservation of the RGA categories between the Darmor-bzh v5 and v10 genome annotations. Shifting RGAs are genes whose sequences match but that RGAugury assigned to a different RGA class in the v5 and v10 genome annotations. Such shifting RGAs represent only a small fraction of the annotated RGAs. B, Total number of RGAs plotted against the ratio of shifting RGAs over total RGAs, by RGA class. CN and NL are the categories whose annotation shifted the most between Darmor-bzh v5 and v10. The raw data used to compute the ratio are detailed on the right part of the plot. C, Detail of the RGA class shifts between Darmor-bzh v5 and v10, made using ggalluvial [56]. The size of the arcs is proportional to the number of RGAs shifting from 1 RGA class to another between the 2 annotations. E, Genome browser snapshot using pyGenomeTracks [57] of the phoma canker resistance QTL (grey) in Darmor-bzh v5 (top) and Darmor-bzh v10 (bottom). RGA genes (blue) and pericentromeric regions (orange) are displayed.