Tao Wan, Yanbing Gong, Zhiming Liu, YaDong Zhou, Can Dai, Qingfeng Wang.
Abstract
Gymnosperms represent an ancient lineage that diverged from early spermatophytes during the Devonian. Their long fossil record and the low diversity of living species point to a complex evolutionary history that included ancient radiations and massive extinctions. Because of their ultra-large genome sizes, whole-genome assemblies of gymnosperms have only been generated in the past 10 years and are now being extended to more taxonomic representatives. Here, we provide an overview of the publicly available gymnosperm genome resources and discuss their assembly quality and recent findings on large genome architectures. In particular, we describe the genomic features most closely related to changes affecting the whole genome. We also highlight new insights into repetitive sequence dynamics, paleopolyploidy, and long introns. Based on the results of relevant genomic studies of gymnosperms, we suggest that additional efforts should be made to explore the genomes of medium-sized (5-15 gigabases) species. Lastly, more comparative analyses among high-quality assemblies are needed to understand the genomic shifts and the early species diversification of seed plants.
Keywords: diversification; genome architecture; genomic shift; gymnosperms
Year: 2022 PMID: 35946987 PMCID: PMC9364684 DOI: 10.1093/gigascience/giac078
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 7.658
List of currently available whole-genome assemblies of gymnosperms
| Species (common name) | Size of assembly (bp) | Family | Sequencing platform | Online year and relative publication | Link to the assembly data |
|---|---|---|---|---|---|
| | 23 G | Pinaceae | Sanger + Illumina HiSeq 2000 | 2013 [ | |
| | 12.3 G | Pinaceae | Sanger whole-genome shotgun | 2013 [ | |
| | 23.6 G | Pinaceae | Illumina HiSeq 2000, MiSeq | 2013 [ | |
| | 23.2 G | Pinaceae | Illumina GA II, HiSeq 2000, MiSeq | 2014 [ | |
| | 22.4 G | Pinaceae | Illumina HiSeq 2500, MiSeq | 2015 [ | |
| | 27.6 G | Pinaceae | Illumina GA II, HiSeq 2000/2500, MiSeq | 2016 [ | |
| | 10.6 G | Ginkgoaceae | Illumina HiSeq 2000/4000 | 2016 [ | |
| | 15.7 G | Pinaceae | Illumina HiSeq | 2017 [ | |
| | 4.0 G | Gnetaceae | Illumina HiSeq 2000/2500 | 2018 [ | |
| | 18.2 G | Pinaceae | Illumina HiSeq | 2019 [ | |
| | 12.3 G | Pinaceae | Illumina HiSeq | 2019 [ | |
| | 8.1 G | Cupressaceae | Illumina HiSeq + Oxford Nanopore | 2020 [ | |
| | 9.8 G | Ginkgoaceae | Illumina HiSeq + PacBio RSII | 2021 [ | |
| | 6.8 G | Welwitschiaceae | Illumina HiSeq + Oxford Nanopore | 2021 [ | |
| | 10.2 G | Taxaceae | Illumina HiSeq + PacBio RSII | 2021 [ | |
| | 10.9 G | Taxaceae | Illumina HiSeq + Oxford Nanopore | 2021 [ | |
| | 10.7 G | Taxaceae | Illumina HiSeq + Oxford Nanopore | 2021 [ | |
| | 25.4 G | Pinaceae | Illumina HiSeq + PacBio RSII | 2022 [ | |
| | 26.5 G | Cupressaceae | Illumina HiSeq + Oxford Nanopore | 2022 [ | |
| | 10.5 G | Cycadaceae | Illumina HiSeq, MiSeq + Oxford Nanopore | 2022 [ | |
The prepublication release of the assembly was made in 2012 [10]. It contained 18.5 Gbp of sequence with a contig N50 size of 800 bp.
Figure 1: A contemporary overview of the deciphered gymnosperm genomes and the genomic features underpinning their complicated evolutionary history. (A) The geographical distribution of the extant gymnosperms, depicted based on data from the Global Biodiversity Information Facility. The images show representative gymnosperm species that have been sequenced. (B) Current status of the accumulation of high-quality gymnosperm assemblies since the advent of long-read sequencing technologies. Abbreviations of the taxa listed from top to bottom: Pab, Picea abies; Pgl, Picea glauca; Pta, Pinus taeda; Pla, Pinus lambertiana; Gbi, Ginkgo biloba; Pme, Pseudotsuga menziesii; Gmo, Gnetum montanum; Aal, Abies alba; Sgi, Sequoiadendron giganteum; Wmi, Welwitschia mirabilis; Tyu, Taxus yunnanensis; Sse, Sequoia sempervirens; Ptab, Pinus tabuliformis; Cpa, Cycas panzhihuaensis. (C) The prediction and placement of ancient whole-genome duplications (WGDs) in seed plants and the highly contested inference of paleopolyploidy in the most recent common ancestor of all extant gymnosperms. The dashed line indicates the conflicts in the phylogenetic position of gnetophytes. The dashed arrows refer to the controversy over the shared polyploidy event of gymnosperms. The Cupressaceae-WGD is highlighted by a “*” since only Taxus and Sequoiadendron were included (excluding Araucariaceae) as representatives of the cupressophytes (left). The available records of the solo-/intact-long terminal repeat (LTR) ratios and the relevance of intron lengths are mapped to each species (right). The data for estimating the solo-/intact-LTR ratios were derived from Nystedt et al. [11], Cossu et al. [52], Wan et al. [24], Cheng et al. [39], Wan et al. [27], and Niu et al. [15]. The data on gene structure were derived from Niu et al. [15]. (D) Genome size distribution across the gymnosperm lineages with medium and ultra-large genome sizes. The 1C-DNA contents were obtained from Niu et al. [15] and the data sources of Kew. (E) The genomic signatures of gymnosperms and the potential genome evolutionary patterns, summarized together with the recent discoveries on recombination and repeat dynamics. TEs, transposable elements; UR, unequal recombination; GCE, gene conversion event.