Mohammed Bakkali, Rubén Martín-Blázquez, Mercedes Ruiz-Estévez, Manuel A Garrido-Ramos.
Abstract
We sequenced the sporophyte transcriptome of the Killarney fern (Vandenboschia speciosa (Willd.) G. Kunkel). In addition to being a rare, endangered Macaronesian-European endemic, this species has a huge genome (10.52 Gb) as well as particular biological features and extreme ecological requirements. These characteristics, together with the systematic position of ferns among vascular plants, make it of high interest for evolutionary, conservation and functional genomics studies. The transcriptome was assembled de novo and contained 36,430 transcripts, of which 17,706 had valid BLAST hits. A total of 19,539 transcripts were annotated with at least one of the 7362 GO terms assigned to the transcriptome, whereas 6547 transcripts were annotated with at least one of the 1359 assigned KEGG terms. A prospective analysis of the functional annotation results provided relevant insights into genes involved in important functions such as growth and development as well as physiological adaptations. In this context, we provide a catalogue of genes involved in the genetic control of plant development, in the vegetative-to-reproductive transition and in stress response, as well as genes coding for transcription factors. Altogether, this study provides a first step towards understanding gene expression in a significant fern species, and the in silico functional and comparative analyses reported here provide important data and insights for further comparative evolutionary studies in ferns and in land plants in general.
Keywords: Vandenboschia speciosa; ferns; functional annotation; transcriptome
Year: 2021 PMID: 34208974 PMCID: PMC8304985 DOI: 10.3390/genes12071017
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Sequencing statistics.
| | Raw Data | After Quality Trimming |
|---|---|---|
| Number of paired-end reads | 66.3 million | 65.2 million |
| Number of bases | 6700 million | 6590 million |
Assembly statistics.
| | Before Filtering | After Filtering |
|---|---|---|
| Total transcripts | 84,759 | 36,430 |
| Percent GC | 45.18 | 45.18 |
| Contig N50 (bp) | 1955 | 2085 |
| Contig N70 (bp) | 1332 | 1511 |
| Contig N90 (bp) | 479 | 729 |
| Ex90N50 (bp) | 2039 | 2299 |
| Number transcripts corresponding to the Ex90 peak | 14,645 | 21,543 |
| Size of the smallest contig (bp) | 201 | 201 |
| Size of the largest contig (bp) | 13,225 | 13,224 |
| Number of contigs greater than 1 Kb long | 35,801 | 20,532 |
| Number of contigs greater than 10 Kb long | 18 | 12 |
| Median contig length (bp) | 722 | 1197 |
| Average contig length (bp) | 1,144.86 | 1,437.37 |
| Total number of assembled bases | 97,037,551 | 52,363,571 |
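
The contig N50, N70 and N90 values above follow the usual Nx definition: the length L such that contigs of length L or longer contain at least x% of all assembled bases. The following is a minimal sketch of that calculation, not the assembly pipeline used in the study; the function name `nx` and the toy length list are hypothetical. The last lines recompute the average contig length from the totals reported in the table.

```python
def nx(lengths, x=50):
    """Return the Nx length: contigs at least this long hold >= x% of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = sum(lengths) * x / 100
    running = 0
    # Walk contigs from longest to shortest until the threshold is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical toy contig lengths (not data from the study).
toy_lengths = [100, 200, 300, 400]
print(nx(toy_lengths, 50))  # 300
print(nx(toy_lengths, 90))  # 200

# Average contig length recomputed from the table:
# total assembled bases divided by total transcripts.
print(round(97_037_551 / 84_759, 2))  # before filtering: 1144.86
print(round(52_363_571 / 36_430, 2))  # after filtering: 1437.37
```

Ex90N50 is a different statistic: it applies the same N50 idea only to the most highly expressed transcripts that account for 90% of the expression data, so it cannot be derived from contig lengths alone.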
Number of proteins from the Swiss-Prot database to which V. speciosa transcripts align, binned by the percentage of the protein length covered by the alignment.
| Percentage Interval | Before Filtering: Number of Proteins * | Before Filtering: Accumulated Number of Proteins ** | After Filtering: Number of Proteins * | After Filtering: Accumulated Number of Proteins ** |
|---|---|---|---|---|
| 91–100 | 3301 | 3301 (>90%) | 3196 | 3196 (>90%) |
| 81–90 | 1330 | 4631 (>80%) | 1352 | 4548 (>80%) |
| 71–80 | 916 | 5547 (>70%) | 878 | 5426 (>70%) |
| 61–70 | 646 | 6193 (>60%) | 569 | 5995 (>60%) |
| 51–60 | 551 | 6744 (>50%) | 449 | 6444 (>50%) |
| 41–50 | 586 | 7330 (>40%) | 448 | 6892 (>40%) |
| 31–40 | 538 | 7868 (>30%) | 381 | 7273 (>30%) |
| 21–30 | 506 | 8374 (>20%) | 302 | 7575 (>20%) |
| 11–20 | 411 | 8785 (>10%) | 225 | 7800 (>10%) |
| 1–10 | 90 | 8875 (>1%) | 51 | 7851 (>1%) |
| TOTAL | 8875 | 8875 | 7851 | 7851 |
\* Number of proteins matched by a V. speciosa transcript over a percentage of their length that falls within the indicated interval; \*\* Number of proteins matched by a V. speciosa transcript over a percentage of their length above the value indicated in brackets.
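
The accumulated columns are running totals of the per-interval counts, taken from the highest coverage interval downwards. As a quick consistency check, the sketch below rebuilds both accumulated columns from the per-interval values in the table; the variable names are illustrative only.

```python
from itertools import accumulate

# Per-interval protein counts, ordered from 91-100% down to 1-10% coverage,
# copied from the table above.
before_filtering = [3301, 1330, 916, 646, 551, 586, 538, 506, 411, 90]
after_filtering = [3196, 1352, 878, 569, 449, 448, 381, 302, 225, 51]

# Running totals reproduce the "accumulated" columns.
before_accumulated = list(accumulate(before_filtering))
after_accumulated = list(accumulate(after_filtering))

print(before_accumulated[-1])  # 8875
print(after_accumulated[-1])   # 7851
```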
Figure 1. BUSCO completeness assessments of the V. speciosa filtered transcriptome with the Eukaryote (n = 255 conserved genes), Viridiplantae (n = 425 conserved genes) and Embryophyta (n = 1614 conserved genes) datasets. Blue, yellow, and red bars, respectively, represent the proportion of complete (C), fragmented (F), and missing (M) BUSCO genes. Light blue bars represent complete and single-copy BUSCO genes (S). Dark blue bars represent complete and duplicated BUSCO genes (D). Numbers within bars are absolute numbers of recovered (complete or fragmented) and missing genes. For example, for Embryophyta, the total dataset is composed of n = 1614 genes, of which 1286 (79.7%) were complete, 99 (6.1%) were fragmented, and 229 (14.2%) were not recovered. Among the complete genes, 319 represented duplicated genes and 967 represented single-copy genes.
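
The Embryophyta percentages quoted in the caption can be recomputed from the absolute counts. The sketch below does only that arithmetic; it is not part of BUSCO, and the function name `busco_percentages` is hypothetical.

```python
def busco_percentages(single, duplicated, fragmented, missing):
    """Return the dataset size and the percentage of each BUSCO category."""
    total = single + duplicated + fragmented + missing
    counts = {
        "C": single + duplicated,  # complete = single-copy + duplicated
        "S": single,
        "D": duplicated,
        "F": fragmented,
        "M": missing,
    }
    return total, {key: round(100 * value / total, 1) for key, value in counts.items()}


# Embryophyta counts from the caption above.
total, pct = busco_percentages(single=967, duplicated=319, fragmented=99, missing=229)
print(total)                        # 1614
print(pct["C"], pct["F"], pct["M"])  # 79.7 6.1 14.2
```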
Figure 2. Distribution of the GO terms in the second GO hierarchical level for the biological process category.
Figure 3. Distribution of the GO terms in the second GO hierarchical level for the molecular function category.
Per-species statistics of the OrthoFinder analysis.
| Statistics | Cp (moss) | Pp (moss) | Sm (lycophyte) | At (seed plant) | Vs (leptosporangiate fern) | Cr (leptosporangiate fern) | Af (leptosporangiate fern) | Sc (leptosporangiate fern) |
|---|---|---|---|---|---|---|---|---|
| Number of proteins | 40,806 | 38,354 | 22,285 | 48,359 | 29,220 | 44,668 | 20,203 | 19,779 |
| Number of proteins in orthogroups | 31,747 | 26,769 | 20,136 | 44,174 | 23,964 | 35,963 | 17,948 | 16,818 |
| Number of unassigned proteins | 9059 | 11,585 | 2149 | 4185 | 5256 | 8705 | 2255 | 2961 |
| Percentage of proteins in orthogroups | 77.8 | 69.8 | 90.4 | 91.3 | 82.0 | 80.5 | 88.8 | 85.0 |
| Percentage of unassigned proteins | 22.2 | 30.2 | 9.6 | 8.7 | 18.0 | 19.5 | 11.2 | 15.0 |
| Number of orthogroups containing species | 12,351 | 11,499 | 9586 | 12,003 | 11,195 | 12,715 | 9915 | 9882 |
| Percentage of orthogroups containing species | 46.0 | 42.8 | 35.7 | 44.7 | 41.7 | 47.3 | 36.9 | 36.8 |
| Number of species-specific orthogroups | 1654 | 813 | 1461 | 4018 | 1384 | 2207 | 344 | 282 |
| Number of proteins in species-specific orthogroups | 6849 | 2540 | 6545 | 18,005 | 3696 | 8205 | 1142 | 921 |
| Percentage of proteins in species-specific orthogroups | 16.8 | 6.6 | 29.4 | 37.2 | 12.6 | 18.4 | 5.7 | 4.7 |
Cp: C. purpureus; Pp: P. patens; Sm: S. moellendorffii; At: A. thaliana; Vs: V. speciosa; Cr: C. richardii; Af: A. filiculoides; Sc: S. cucullata.
Figure 4. Phylogenetic tree based on single-copy orthologues. The tree was rooted with the non-vascular species (C. purpureus and P. patens). Numbers are bootstrap values for individual nodes.
Number of shared orthogroups between V. speciosa and the rest of species.
| | | | | | | | |
|---|---|---|---|---|---|---|---|
| | 6905 | 6921 | 7275 | 6954 | 6882 | 6753 | 7243 |
| | 7169 | 8675 | 7188 | 8448 | 6901 | 8028 | |
| | 7576 | 10,339 | 7145 | 7072 | 7433 | | |
| | 7576 | 8665 | 7309 | 8916 | | | |
| | 7167 | 7070 | 7443 | | | | |
| | 6901 | 8052 | | | | | |
| | 7198 | | | | | | |