W John Kress, Douglas E Soltis, Paul J Kersey, Jill L Wegrzyn, James H Leebens-Mack, Morgan R Gostel, Xin Liu, Pamela S Soltis.
Abstract
Green plants play a fundamental role in ecosystems, human health, and agriculture. As de novo genomes are being generated for all known eukaryotic species as advocated by the Earth BioGenome Project, increasing genomic information on green land plants is essential. However, setting standards for the generation and storage of the complex set of genomes that characterize the green lineage of life is a major challenge for plant scientists. Such standards will need to accommodate the immense variation in green plant genome size, transposable element content, and structural complexity while enabling research into the molecular and evolutionary processes that have resulted in this enormous genomic variation. Here we provide an overview and assessment of the current state of knowledge of green plant genomes. To date fewer than 300 complete chromosome-scale genome assemblies representing fewer than 900 species have been generated across the estimated 450,000 to 500,000 species in the green plant clade. These genomes range in size from 12 Mb to 27.6 Gb and are biased toward agricultural crops with large branches of the green tree of life untouched by genomic-scale sequencing. Locating suitable tissue samples of most species of plants, especially those taxa from extreme environments, remains one of the biggest hurdles to increasing our genomic inventory. Furthermore, the annotation of plant genomes is at present undergoing intensive improvement. It is our hope that this fresh overview will help in the development of genomic quality standards for a cohesive and meaningful synthesis of green plant genomes as we scale up for the future.
Keywords: Viridiplantae; annotation; reference genome; transcriptomes; whole-genome duplication (WGD)
Year: 2022 PMID: 35042803 PMCID: PMC8795535 DOI: 10.1073/pnas.2115640118
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Light purple, green algae; black, liverworts; dark green, bryophytes; brown, hornworts; light blue, ferns and lycophytes; light green, gymnosperms; and orange, angiosperms. The inner circle shows the current state of genome sequencing with complete genome assemblies shown as red bars, chromosome-level assemblies as blue, scaffold assemblies as dark gray, and contig assemblies as light gray. The outer circle (filled in gray) shows taxa indicated with yellow bars for which transcriptome data are available. Genome data were surveyed from GenBank, European Molecular Biology Laboratory (EMBL), and DNA Data Bank of Japan (DDBJ) in October 2020. Lines radiating out from the circle show genome sizes as C values (genomesize.com). The phylogenetic framework for all plants to genus level was extracted from the Open Tree of Life (opentreeoflife.org) in October 2020. Full supporting data are available in Dataset S1. Thanks to Keith Crandall and David Stern for assistance with this figure, with help from M.R.G.
Fig. 2. A generalized plant genome workflow from sample collection through assembly and annotation to public data submission. The workflow proceeds from left to right. 1) Sample Collection and Assessment (yellow) to ethically and legally collect, identify, and voucher the reference specimen; obtain, store, ship, and extract DNA/RNA from the samples; as well as assess its biological qualities. 2) Genome and Transcriptome Sequencing (green) conducted using short- and long-read sequencing technologies. 3) Assembly, Error Correction, and Assessment (dark green) to determine sequence contiguity, completeness, and accuracy. 4) Scaffolding and Chromosome Anchoring (blue) to evaluate and elongate the scaffolds and to anchor the chromosomes. 5) Annotation and Assessment (dark blue) to integrate transcriptomic resources and targeted sequencing in order to identify protein-coding, repeat, and regulatory regions and to provide biological context for the identified elements. 6) Public Data Submission (purple) to ensure open access to the sequence data and the derived assemblies and annotations.
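
Step 3 of the workflow assesses sequence contiguity. The record does not say which metric is used; one widely used contiguity statistic is N50, the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The following is a minimal Python sketch of that statistic, assuming N50 as the metric. The function name `n50` and the example lengths are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk the sequences from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half
        # of the total assembly length is the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in base pairs (illustrative, not data from the paper).
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))  # prints 400
```

Here the total is 1,500 bp, and the two longest contigs (500 + 400 = 900 bp) are the first to reach the 750 bp halfway point, so the N50 is 400. N50 captures contiguity only; completeness and accuracy, the other two properties named in step 3, need separate measures.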