Qing Li, Hongbo Li, Wu Huang, Yuanchao Xu, Qian Zhou, Shenhao Wang, Jue Ruan, Sanwen Huang, Zhonghua Zhang.
Abstract
BACKGROUND: Accurate and complete reference genome assemblies are fundamental for biological research. Cucumber is an important vegetable crop and model system for sex determination and vascular biology. Low-coverage Sanger sequences and high-coverage short Illumina sequences have been used to assemble draft cucumber genomes, but the incompleteness and low quality of these genomes limit their use in comparative genomics and genetic research. A high-quality and complete cucumber genome assembly is therefore essential.
Keywords: Hi-C; PacBio; chromosome-scale assembly; cucumber; genomics
Year: 2019 PMID: 31216035 PMCID: PMC6582320 DOI: 10.1093/gigascience/giz072
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Landscape of the 7 pseudo-chromosome (chr) sequences. All included contigs are shown. Cytogenetic map [22] is integrated with the sequences. Arrows mark positions of the centromeres (Cen). The distribution of satellite and repetitive sequences along the contigs is illustrated below. Fosmid clones are marked in green and red on the 7 chromosomes, and the imaginary lines connect the physical locations and approximate locations of assembled chromosomes.
Figure 2: Correlation of genome assembly with genetic maps and Hi-C data. A, Integrated genetic and physical maps of the cucumber genome assembly. Super-scaffolds of the genome assembly (middle) were anchored to the 4 linkage groups (left and right): map.1 (green) [3], map.2 (orange) [21], map.3 (light blue) [20], map.4 (pink) [19]. B, Heat map of Hi-C contact information. Pixel colors represent different normalized counts of Hi-C links between 30-kb non-overlapping windows for all 7 chromosomes (chr) on a logarithmic scale.
Figure 3: Novel repetitive sequences and genes in assembly v3.0. A, Sizes of various types of repetitive sequences in the v2.0 and v3.0 assemblies. DNA, DNA transposons; LINE, Long interspersed nuclear elements; SINE, Short interspersed nuclear elements; LTRc, Copia long terminal repeat retrotransposons; LTRg, Gypsy long terminal repeat retrotransposons; LTRo, Other LTR categories; Unknown, unknown type. B, The number of full-length long terminal repeat retrotransposons (FL-LTRs) in v2.0 and v3.0. C, A newly predicted FL-LTR in v3.0. TSR, Target site repeat; PBS, Primer binding site; PPT, Primer polypurine tract; IN, Integrase; RT, Reverse transcriptase. D, An example showing the newly assembled multiple tyrosylprotein sulfotransferase (TPST) genes in v3.0. b'-e' are all TPST genes, corresponding to CsaV3_1G013960, CsaV3_1G013970, CsaV3_1G013980 and CsaV3_1G013990, respectively.
Figure 4: Distribution of GC content for the whole genome and novel sequences in v3.0.