Tatiana García Navarrete, Cintia Arias, Eric Mukundi, Ana Paula Alonso, Erich Grotewold.
Abstract
The Brassicaceae family comprises more than 3,700 species with diverse phenotypic characteristics, including seed oil content and composition. Global interest in Thlaspi arvense L. (pennycress) has grown recently because its seed oil composition makes it a suitable source for biodiesel and aviation fuel production. However, many wild traits of this species need to be domesticated to make pennycress ideal for cultivation. Molecular breeding and engineering efforts require an accurate genome sequence of the species. Here, we describe improvements to the pennycress genome annotation, using a combination of long- and short-read transcriptome data obtained from RNA derived from embryos of 22 accessions, in addition to public genome and gene expression information. Our analysis identified 27,213 protein-coding genes, as well as, on average, 6,188 biallelic SNPs. In addition, we used the identified SNPs to evaluate the population structure of our accessions. The data from this analysis indicate that accession Ames 32872, originally from Armenia, is highly divergent from the other accessions, while the accessions originating from Canada and the United States cluster together. When we evaluated the likely signatures of natural selection from alternative SNPs, we found 7 candidate genes under likely recent positive selection. These genes are enriched in functions related to amino acid metabolism and lipid biosynthesis and highlight possible future targets for crop improvement efforts in pennycress.
Keywords: Thlaspi arvense (pennycress); RNA-seq; SNPs; accessions; genome annotation
Year: 2022 PMID: 35416986 PMCID: PMC9157065 DOI: 10.1093/g3journal/jkac084
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
List of pennycress accessions used in this study.
| Reference name | Accession name | Origin | Illumina high-quality reads, 10 DAP | Illumina high-quality reads, 16 DAP | Nanopore reads, 10 DAP | Illumina alignment rate, 10 DAP (%) | Illumina alignment rate, 16 DAP (%) | Nanopore alignment rate, 10 DAP (%) |
|---|---|---|---|---|---|---|---|---|
| PC1 | Ames 32908 | Illinois, USA | 166,159,412 | 142,642,262 | | 98.3 | 98.4 | |
| PC2 | Ames 32872 | Armenia | 147,309,695 | 147,941,054 | | 96.3 | 96.1 | |
| PC3 | Ames 31499 | British Columbia, Canada | 158,493,892 | 148,009,513 | | 98.3 | 98.5 | |
| PC4 | Ames 31497 | Saskatchewan, Canada | 161,944,688 | 147,716,867 | | 98.3 | 98.5 | |
| PC5 | Ames 29512 | Ohio, USA | 145,456,696 | 148,280,855 | | 98.2 | 98.3 | |
| PC6 | Ames 31026 | Colorado, USA | 158,043,152 | 150,666,428 | | 98.1 | 98.2 | |
| PC7 | Ames 31501 | Manitoba, Canada | 157,497,428 | 152,667,158 | | 98.4 | 98.4 | |
| PC8 | Ames 31500 | Alberta, Canada | 161,851,429 | 131,646,039 | | 98.4 | 98.2 | |
| PC9 | Ames 31488 | Ontario, Canada | 139,200,591 | 149,532,332 | | 98.3 | 98.2 | |
| PC10 | Ames 30933 | Magallanes, Chile | 128,441,416 | 134,602,498 | | 97.6 | 97.5 | |
| PC11 | Ames 30985 | South Dakota, USA | 127,281,910 | 131,307,530 | | 97.6 | 97.5 | |
| PC12 | Ames 24499 | Former Serbia and Montenegro | 128,470,022 | 135,033,559 | | 97.6 | 97.5 | |
| PC13 | Ames 29531 | North Dakota, USA | 115,343,123 | 151,205,534 | | 98.1 | 98.4 | |
| PC14 | Ames 22461 | Poland | 128,876,158 | 99,471,991 | | 97.7 | 97.3 | |
| PC15 | PI 650287 | Bas-Rhin, France | 131,295,933 | 132,096,941 | | 97.6 | 97.0 | |
| PC16 | PI 633415 | Saxony, Germany | 131,262,050 | 132,955,028 | | 97.6 | 96.5 | |
| PC17 | PI 650284 | Thuringia, Germany | 131,349,471 (time point unclear) | | | 97.4 (time point unclear) | | |
| PC18 | Ames 30982 | Iowa, USA | 147,434,032 | 278,054,333 | | 97.6 | 97.4 | |
| PC19 | Ames 31012 | Colorado, USA | 131,144,003 | 133,595,159 | | 97.5 | 96. | |
| PC20 | Ames 31498 | Alberta, Canada | 128,261,227 | 133,677,160 | | 98.0 | 96.2 | |
| PC21 | PI 650285 | Saxony, Germany | 150,084,778 | 151,364,594 | | 98.2 | 98.2 | |
| PC22 | MN106 | Minnesota, USA | 147,772,792 | 148,588,444 | 10,272,041 | 98.0 | 98.0 | 84.7 |
Fig. 1. Workflow for the re-annotation of the pennycress genome. Regular oval shapes indicate the tools used for data processing, and the multi-document symbols represent the input data from the Illumina and ONT sequencing platforms. (a) The dotted line represents the workflow for the genome-guided transcriptome assembly, and the double dashed line represents the de novo transcriptome assembly. (b) The line with a square arrow indicates the workflow incorporating the long reads, and the double dash-dot line with a circle arrow corresponds to the integration of short reads in the correction phase of the long reads.
Comparative summary of the 2 versions of the pennycress annotation and Arabidopsis thaliana.
| Feature | PC_v.1.0 | PC_v1.1 | Arabidopsis thaliana |
|---|---|---|---|
| Total genes | 27,390 | 27,213 | 27,416 |
| Average gene length | 2,195.26 | 2,454.17 | 2,206.02 |
| Average exon number | 5.54 | 5.63 | 5.86 |
| Average transcripts per gene | 1 | 1.2 | 1.29 |
| Average CDS length | 1,238.99 | 1,254.74 | 1,230.62 |
| Genes with both 5′ and 3′ UTR | 1,466 | 10,426 | 27,416 |
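The UTR row is easier to compare as a share of each annotation's gene count. The short sketch below does that arithmetic with the values copied from the table; the dictionary layout and the `utr_fraction` helper are illustrative names, not part of the study, and the third column is assumed to be Arabidopsis thaliana based on the table caption.

```python
# Values copied from the comparison table above.
# Assumption: the third data column is Arabidopsis thaliana (per the caption).
annotations = {
    "PC_v.1.0": {"total_genes": 27390, "with_both_utrs": 1466},
    "PC_v1.1": {"total_genes": 27213, "with_both_utrs": 10426},
    "Arabidopsis thaliana": {"total_genes": 27416, "with_both_utrs": 27416},
}


def utr_fraction(entry):
    """Share of genes annotated with both a 5' and a 3' UTR (hypothetical helper)."""
    return entry["with_both_utrs"] / entry["total_genes"]


for name, entry in annotations.items():
    print(f"{name}: {utr_fraction(entry):.1%}")
```

Under these numbers, the share of pennycress gene models carrying both UTRs rises from roughly 5% in the earlier annotation to roughly 38% in the updated one.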
Fig. 2. Examples of improved genome annotation. Genome viewer images showing the reads obtained from the two sequencing platforms and comparing gene models from the current genome annotation (v.1.0) with those provided in this study (v.1.1). The first track shows read distributions corresponding to previously available RNA-seq data, the second track shows the reads obtained from Nanopore, and the last track shows the reads obtained from Illumina embryo RNA-seq. (a) Example of a gene model in which the new sequencing data resulted in the addition of 5′ UTR and 3′ UTR regions. (b) Example of a gene model in which the new sequencing data identified three additional exons. (c) Example of two adjacent gene models that the new sequencing data showed to correspond to a single gene.
Fig. 3. Analysis of SNP distributions among 22 pennycress accessions. In the representation, accessions are grouped according to geographical origin.
Fig. 4. (a) Relationship between the accessions based on the presence of transcribed SNPs. The tree was constructed using the neighbor-joining (NJ) method, and branch support was assessed with 1,000 bootstrap replicates. (b) Top panel: PCA of the pennycress accessions, in which the accession from Armenia (PC2) shows high divergence from the other pennycress accessions. Bottom panel: PCA without the PC2 accession from Armenia. (c) Clusters used to study pennycress population structure through DAPC, with the posterior membership probability of each sample for each of the predetermined populations.
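The distance-based idea behind panel (a) can be illustrated with a small sketch. The genotype matrix below is synthetic toy data, not data from the study; the sample labels only mimic the naming used in the accession table, and the simple proportion-of-differences distance is an assumed stand-in for whatever distance the authors' software computed. It shows how a divergent sample stands apart in the pairwise distances a neighbor-joining method would consume, and how one bootstrap replicate resamples SNP columns with replacement.

```python
import random

# Synthetic genotypes (0 = reference allele, 1 = alternative allele) at 8 SNPs.
# These values are invented for illustration; they are NOT data from the study.
genotypes = {
    "PC1":  [0, 0, 1, 0, 1, 0, 0, 1],
    "PC3":  [0, 0, 1, 0, 1, 0, 1, 1],
    "PC22": [0, 0, 1, 0, 1, 0, 0, 1],
    "PC2":  [1, 1, 0, 1, 1, 1, 0, 0],  # toy "divergent" sample
}


def p_distance(a, b):
    """Proportion of SNP sites at which two samples carry different alleles."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(geno):
    """Pairwise distances keyed by (sample, sample); the input to a NJ tree."""
    names = sorted(geno)
    return {(m, n): p_distance(geno[m], geno[n]) for m in names for n in names}


def bootstrap_columns(geno, rng):
    """One bootstrap replicate: resample SNP columns with replacement."""
    n_sites = len(next(iter(geno.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: [calls[c] for c in cols] for name, calls in geno.items()}


dist = distance_matrix(genotypes)
rng = random.Random(0)  # fixed seed so the replicate is reproducible
replicate = distance_matrix(bootstrap_columns(genotypes, rng))

print(dist[("PC1", "PC3")], dist[("PC1", "PC2")])
```

In a full analysis, a tree would be built from each of many such replicates, and branch support would be the fraction of replicate trees that recover a given grouping.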