Raúl Castanera, Valentino Ruggieri, Marta Pujol, Jordi Garcia-Mas, Josep M Casacuberta.
Abstract
The published melon (Cucumis melo L.) reference genome assembly (v3.6.1) still has 41.6 Mb (megabases) of sequence unassigned to pseudo-chromosomes and about 57 Mb of gaps. Although different approaches have been undertaken to improve the melon genome assembly in recent years, the high percentage of repeats (~40%) and limitations due to read length have made it difficult to resolve gaps and scaffold misassignments to pseudomolecules, especially in the heterochromatic regions. Taking advantage of PacBio single-molecule real-time (SMRT) sequencing technology, we achieved an improved melon genome assembly. About 90% of the gaps were filled and the unassigned sequences were drastically reduced. A lift-over of the latest annotation (v4.0) allowed us to relocate protein-coding genes from the unassigned sequences to the pseudomolecules. The improved annotation of the transposable element fraction provides direct evidence of the gains achieved in the new assembly. By screening the new assembly, we discovered many young (inserted less than 2 Mya), polymorphic LTR-retrotransposons that were not captured in the previous reference genome. These elements sit mostly in the pericentromeric regions, but some are inserted in the upstream region of genes, suggesting that they may have regulatory potential. This improved reference genome will provide an invaluable tool for identifying new gene or transposon variants associated with important phenotypes.
Keywords: assembly; long-reads; melon; reference genome; transposable elements
Year: 2020 PMID: 32076428 PMCID: PMC7006604 DOI: 10.3389/fpls.2019.01815
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. Comparison of ungapped chromosome lengths between the v3.6.1 and v4.0 genome assemblies.
Comparison of the transposable element (TE) annotation based on the v3.6.1 and v4.0 assemblies.
| TE order | Acronym | v3.6.1 copies | v3.6.1 genome fraction (%) | v4.0 copies | v4.0 genome fraction (%) |
|---|---|---|---|---|---|
| LTR | RLX | 74,161 | 23.44 | 136,761 | 23.81 |
| LINE | RIX | 11,913 | 2.64 | 15,067 | 1.80 |
| SINE | RSX | 391 | 0.04 | 746 | 0.04 |
| DIRS | RYX | 4,212 | 1.65 | 17,890 | 4.39 |
| TIR | DTX | 21,383 | 7.11 | 92,819 | 14.97 |
| Helitron | DHX | 1,699 | 0.3 | 5,637 | 0.45 |
| Others | | 912 | 0.48 | 823 | 0.07 |
| TOTAL | | | 35.66 | | 45.53 |
The TE orders are abbreviated as follows: LTR-retrotransposons (LTR), long interspersed nuclear elements (LINE), short interspersed nuclear elements (SINE), DIRS retrotransposons (DIRS), terminal inverted repeat TEs (TIR), and Helitrons.
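As a quick consistency check on the TE annotation table, the following minimal sketch sums the per-order genome fractions and computes the v4.0/v3.6.1 ratio of copy numbers for each order. The values are transcribed from the table; the variable and function names (`te_table`, `total_fraction`, `copy_fold_change`) are illustrative assumptions and not part of the original analysis.

```python
# Copies and genome fraction (%) per TE order, transcribed from the table.
# Tuple layout: (copies v3.6.1, fraction v3.6.1, copies v4.0, fraction v4.0)
te_table = {
    "LTR": (74161, 23.44, 136761, 23.81),
    "LINE": (11913, 2.64, 15067, 1.80),
    "SINE": (391, 0.04, 746, 0.04),
    "DIRS": (4212, 1.65, 17890, 4.39),
    "TIR": (21383, 7.11, 92819, 14.97),
    "Helitron": (1699, 0.30, 5637, 0.45),
    "Others": (912, 0.48, 823, 0.07),
}


def total_fraction(table, column):
    """Sum one genome-fraction column, rounded to the table's two decimals."""
    return round(sum(row[column] for row in table.values()), 2)


def copy_fold_change(table):
    """Ratio of v4.0 copies to v3.6.1 copies for each TE order."""
    return {order: round(row[2] / row[0], 2) for order, row in table.items()}


total_v361 = total_fraction(te_table, 1)  # should agree with the TOTAL row
total_v40 = total_fraction(te_table, 3)   # should agree with the TOTAL row
fold = copy_fold_change(te_table)

print(f"v3.6.1 TE fraction: {total_v361}%")
print(f"v4.0 TE fraction:   {total_v40}%")
for order, ratio in fold.items():
    print(f"{order}: {ratio}x copies in v4.0 relative to v3.6.1")
```

The column sums reproduce the TOTAL row (35.66% and 45.53%), and the copy-number ratios make it easy to see which orders changed most between the two annotations.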
Annotation of full-length LTR-retrotransposons. Number of full-length retrotransposon copies belonging to Gypsy, Copia, and unclassified superfamilies in the published v3.6.1 and the v4.0 genome assemblies.
| Superfamily | v3.6.1 | v4.0 |
|---|---|---|
| Gypsy | 815 | 1,526 |
| Copia | 1,067 | 1,427 |
| Unclassified | 1,358 | 1,607 |
| TOTAL | 3,240 | 4,560 |
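The change in full-length LTR-retrotransposon counts can be summarized per superfamily. The sketch below uses only the counts in the table; the names `ltr_counts` and `gains` are hypothetical and chosen for illustration.

```python
# Full-length LTR-retrotransposon copies per superfamily: (v3.6.1, v4.0),
# transcribed from the table.
ltr_counts = {
    "Gypsy": (815, 1526),
    "Copia": (1067, 1427),
    "Unclassified": (1358, 1607),
}


def gains(counts):
    """Return (absolute gain, percent gain) in v4.0 for each superfamily."""
    result = {}
    for name, (old, new) in counts.items():
        result[name] = (new - old, round(100 * (new - old) / old, 1))
    return result


per_superfamily = gains(ltr_counts)
total_old = sum(old for old, _ in ltr_counts.values())
total_new = sum(new for _, new in ltr_counts.values())

for name, (extra, pct) in per_superfamily.items():
    print(f"{name}: +{extra} copies ({pct}% more than in v3.6.1)")
print(f"Total: {total_old} -> {total_new}")
```

The totals match the TOTAL row of the table (3,240 and 4,560), and the Gypsy superfamily shows the largest absolute and relative gain.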
Figure 2. Distribution of insertion age of Gypsy, Copia, and unclassified LTR-retrotransposons annotated on the genome assemblies v3.6.1 and v4.0.
Figure 3. Distribution of transposable elements (TEs) and genes across v4.0 pseudomolecules. In green, density of REPET features per window (6,000 windows in total). In red, density of full-length long terminal repeat (LTR)-retrotransposons annotated in the v4.0 assembly that were absent from the v3.6.1 assembly. In orange, density of polymorphic LTR-retrotransposons with insertion time below 2 Mya. In purple, gene density.
Figure 4. Example of LTR-retrotransposon insertions (red boxes) in the proximal upstream region of genes (blue boxes) annotated in the v4.0 assembly, at positions that corresponded to a gap in the v3.6.1 assembly.