Adam Nunn, Isaac Rodríguez-Arévalo, Zenith Tandukar, Katherine Frels, Adrián Contreras-Garrido, Pablo Carbonell-Bejerano, Panpan Zhang, Daniela Ramos Cruz, Katharina Jandrasits, Christa Lanz, Anthony Brusa, Marie Mirouze, Kevin Dorn, David W Galbraith, Brice A Jarvis, John C Sedbrook, Donald L Wyse, Christian Otto, David Langenberger, Peter F Stadler, Detlef Weigel, M David Marks, James A Anderson, Claude Becker, Ratan Chopra.
Abstract
Thlaspi arvense (field pennycress) is being domesticated as a winter annual oilseed crop capable of improving ecosystems and intensifying agricultural productivity without increasing land use. It is a selfing diploid with a short life cycle and is amenable to genetic manipulations, making it an accessible field-based model species for genetics and epigenetics. The availability of a high-quality reference genome is vital for understanding pennycress physiology and for clarifying its evolutionary history within the Brassicaceae. Here, we present a chromosome-level genome assembly of var. MN106-Ref with improved gene annotation and use it to investigate gene structure differences between two accessions (MN108 and Spring32-10) that are highly amenable to genetic transformation. We describe non-coding RNAs, pseudogenes and transposable elements, and highlight tissue-specific expression and methylation patterns. Resequencing of forty wild accessions provided insights into genome-wide genetic variation, and QTL regions were identified for a seedling colour phenotype. Altogether, these data will serve as a tool for pennycress improvement in general and for translational research across the Brassicaceae.
Keywords: comparative genomics; genetic mapping; genome annotations; genome assembly; pennycress
Year: 2022 PMID: 34990041 PMCID: PMC9055812 DOI: 10.1111/pbi.13775
Source DB: PubMed Journal: Plant Biotechnol J ISSN: 1467-7644 Impact factor: 13.263
Full descriptive statistics comparing the previously published T_arvense_v1 assembly with the present version T_arvense_v2
| Assembly category | T_arvense_v1 | T_arvense_v2 |
|---|---|---|
| No. of contigs | 44 109 | 4714 |
| Largest contig | – | 41.6 Mbp |
| contig N50 | 0.02 Mbp | 13.3 Mbp |
| No. of scaffolds | 6768 | 964 |
| No. of scaffolds (≥50 000 bp) | 1807 | 607 |
| Largest scaffold | 2.4 Mbp | 70.0 Mbp |
| Total length | 343 Mbp | 526 Mbp |
| Total length (≥50 000 bp) | 276 Mbp | 514 Mbp |
| GC (%) | 37.99 | 38.39 |
| N50 | 0.14 Mbp | 64.9 Mbp |
| NG50 | 0.05 Mbp | 64.9 Mbp |
| N75 | 0.06 Mbp | 61.0 Mbp |
| NG75 | – | 55.2 Mbp |
| L50 | 561 | 4 |
| LG50 | 1678 | 4 |
| L75 | 1469 | 6 |
| LG75 | – | 7 |
| No. of Ns per 100 kbp | 5165.00 | 0.51 |
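The contiguity rows in the table above (N50, NG50, N75, L50 and so on) follow standard definitions: after sorting sequences from longest to shortest, Nx is the length of the sequence at which the running total first reaches x% of the assembly length, and Lx is the number of sequences needed to get there. The NG variants use an estimated genome size in place of the assembly length. The following is a minimal sketch of those definitions, not the tool used to produce the table; the function name `nx_lx` and the toy lengths are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def nx_lx(
    lengths: Sequence[int],
    fraction: float = 0.5,
    genome_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given sequence lengths.

    With genome_size=None the target is a fraction of the assembly
    length (N50/L50 for fraction=0.5). With genome_size set, the target
    is a fraction of that estimate instead (NG50/LG50).
    """
    reference = genome_size if genome_size is not None else sum(lengths)
    target = fraction * reference
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    # Reached only when the assembly is shorter than the target,
    # which is why some NG values can be undefined.
    raise ValueError("assembly does not cover the requested fraction")


# Toy example with made-up lengths (not pennycress data).
toy = [80, 70, 50, 40, 30, 20]
n50, l50 = nx_lx(toy)                       # assembly-based
ng50, lg50 = nx_lx(toy, genome_size=400)    # genome-size-based
```

An undefined NG value, reported as a dash in the table, corresponds to the `ValueError` branch: the assembly is too short to reach the requested fraction of the assumed genome size.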
Figure 1. Overview of the seven largest scaffolds representing chromosomes in T. arvense var. MN106‐Ref. The tracks denote (a) DNA methylation level in shoot tissue (CG: grey; CHG: black; CHH: pink; 200 kbp window size), and density distributions (1 Mbp window size) of (b) protein‐coding loci, (c) sRNA loci, (d) Gypsy retrotransposons, (e) Copia retrotransposons, (f) LTR retrotransposons and (g) pseudogenes.
Figure 2. Distribution of ancestral genomic blocks (top panel) along the seven largest scaffolds of T. arvense MN106‐Ref (T_arvense_v2), and a comparison of these genomic blocks with Eutrema salsugineum, Schrenkiella parvula, Arabidopsis thaliana and Arabidopsis lyrata.
Figure 3. Feature annotations within T. arvense var. MN106‐Ref. (a) Rooted species tree inferred from all genes, denoting node support and branch length in substitutions per site, and horizontal stacked bar chart comparing the genetic fraction in pennycress with other Brassicaceae sp. (ns = nonspecific orthologs, ss = species‐specific orthologs, un = unclassified genes, nc = non‐coding/intergenic fraction). (b) Comparison of gene macrosynteny between v1 and v2 of the genome, and a microsynteny example of genes MYB29 and MYB76, which are resolved in the v2 annotation. (c) Small RNA biogenesis locus length and expression values in each of four tissues. (d) Overall repetitive content in the genome as discovered by RepeatMasker2, and relative abundance of TEs within the fraction of repetitive elements.
Summary of feature annotations in comparison with the original version T_arvense_v1
| Type | T_arvense_v1 | T_arvense_v2 | diff. |
|---|---|---|---|
| (A) Protein‐coding genes | | | |
| Total number of loci | 27 390 | 27 128 | −262 |
| Total number of unique loci | 4780 | 5034 | +254 |
| Total number of transcript isoforms | – | 30 650 | +30 650 |
| Number of matching loci with changes in CDS | – | – | +14 102 |
| Number of matching loci with changes in UTR(s) | – | – | +22 559 |
| Loci containing one or more PFAM domain | – | 21 171 | +21 171 |
| Loci annotated with one or more GO term | – | 13 074 | +13 074 |
| (B) Non‐coding genes | | | |
| tRNA | – | 1148 | +1148 |
| rRNA clusters (<25 kbp) | – | 63 | +63 |
| snoRNA | – | 243 | +243 |
| Small interfering RNA (siRNA) | – | 19 373 | +19 373 |
| MicroRNA (miRNA) | – | 72 | +72 |
| (C) Other gene types | | | |
| Pseudogenes (set II Ψs) | – | 44 490 | +44 490 |
| Transposable element genes | – | 423 251 | +423 251 |
Figure 4. Regulatory dynamics in pennycress. (a) Relative fraction of genes in each tissue for low (0–0.2), intermediate (0.2–0.8) and high/absolute specificity (0.8–1.0) subsets. (b) Log2(TMM) expression values of the top 30 most highly expressed genes in each tissue, relative to the mean across all tissues, from the subset of genes with a high/absolute tau specificity score. (c) Distribution of average DNA methylation for different genomic features, by cytosine sequence context. (d) DNA methylation along genes (top) and TEs (bottom), including a 2‐kb flanking sequence upstream and downstream. DNA methylation was averaged in non‐overlapping 25‐bp windows.
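The tau specificity score referred to in Figure 4 is commonly computed as the sum of (1 − x_i / x_max) over the n tissues, divided by (n − 1), where x_i is a gene's expression in tissue i. A value of 0 indicates uniform expression and a value of 1 indicates expression confined to one tissue. The sketch below illustrates that common formula together with the three bins named in the caption. It assumes non-negative expression values on whatever scale was used for scoring, which this record does not specify; the function names and the handling of values lying exactly on a bin boundary are assumptions made for illustration.

```python
from typing import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tissue-specificity index for one gene across n >= 2 tissues."""
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    peak = max(expression)
    if peak == 0:
        # Assumption: a gene silent everywhere is scored as non-specific.
        return 0.0
    return sum(1 - x / peak for x in expression) / (n - 1)


def specificity_class(score: float) -> str:
    """Bin a tau score into the caption's three subsets.

    Assumption: boundaries are treated as half-open intervals,
    [0, 0.2), [0.2, 0.8) and [0.8, 1.0].
    """
    if score < 0.2:
        return "low"
    if score < 0.8:
        return "intermediate"
    return "high/absolute"


# Made-up expression profiles across four tissues.
uniform = tau([5.0, 5.0, 5.0, 5.0])      # same level everywhere
exclusive = tau([0.0, 0.0, 0.0, 10.0])   # expressed in one tissue only
```

Because tau depends only on ratios to the maximum, rescaling all values for a gene by a constant leaves its score unchanged.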
Figure 5. (a) Dendrogram representing the forty wild accessions in our study showing three distinct subpopulations, inferred from STRUCTURE analysis (Figure S13). (b,c) Variation of transcript isoforms for MN108 (b) and Spring32‐10 (c) accessions based on SQANTI3 analysis. (d) A pale phenotype segregating in an improved pennycress line (fae‐1‐1/rod1‐1) was analysed with a modified bulked‐segregant analysis, and the QTL region associated with this phenotype was mapped using the MutMap approach.