Elena Mosca (1), Fernando Cruz (2), Jèssica Gómez-Garrido (2), Luca Bianco (3), Christian Rellstab (4), Sabine Brodbeck (4), Katalin Csilléry (4, 5), Bruno Fady (6), Matthias Fladung (7), Barbara Fussi (8), Dušan Gömöry (9), Santiago C González-Martínez (10), Delphine Grivet (11), Marta Gut (2, 12), Ole Kim Hansen (13), Katrin Heer (14), Zeki Kaya (15), Konstantin V Krutovsky (16, 17, 18), Birgit Kersten (7), Sascha Liepelt (14), Lars Opgenoorth (14), Christoph Sperisen (4), Kristian K Ullrich (15), Giovanni G Vendramin (19), Marjana Westergren (20), Birgit Ziegenhagen (14), Tyler Alioto (2, 12), Felix Gugerli (4), Berthold Heinze (21), Maria Höhn (22), Michela Troggio (3), David B Neale (23).
Abstract
Silver fir (Abies alba Mill.) is a keystone conifer of European montane forest ecosystems that has experienced large fluctuations in population size during the Quaternary and, more recently, due to land-use change. To forecast the species' future distribution and survival, it is important to investigate the genetic basis of adaptation to environmental change, notably to extreme events. For this purpose, we provide a first draft assembly and annotation of the silver fir genome, established through a community-based initiative. DNA from haploid megagametophyte and diploid needle tissue was used to construct Illumina paired-end and mate-pair libraries, respectively, which were sequenced to high depth. The assembled A. alba genome sequence comprises over 37 million scaffolds totaling 18.16 Gb, with a scaffold N50 of 14,051 bp. Despite the fragmented nature of the assembly, 50,757 full-length genes were functionally annotated in the nuclear genome. The chloroplast genome was also assembled into a single scaffold (120,908 bp) that shows high collinearity with the complete chloroplast genomes of both A. koreana and A. sibirica. This first genome assembly of silver fir is an important genomic resource that is now publicly available. By genome-enabling this important conifer, it will support a new generation of research and more precise genetic monitoring of European silver fir forests.
Keywords: Abies alba; annotation; chloroplast genome; conifer genome; genome assembly
Year: 2019 PMID: 31217262 PMCID: PMC6643874 DOI: 10.1534/g3.119.400083
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Summary of the raw data for Illumina paired-end (PE) and mate-pair (MP) libraries for whole-genome sequencing of Abies alba
| Library | Read length (bp) | Insert size (kb) | Mean fragment size (bp) | Read pairs (million) | Yield (Mb) | Coverage (×) | Avg. PhiX error R1 (%) | Avg. PhiX error R2 (%) |
|---|---|---|---|---|---|---|---|---|
| PE300-1 | 2 × 151 | — | 304 | 3,274 | 989,029 | 57.103 | 0.646 | 0.908 |
| PE300-2 | 2 × 151 | — | 307 | 1,886 | 569,617 | 32.888 | 0.883 | 1.126 |
| PE300-3 | 2 × 151 | — | 312 | 1,066 | 322,181 | 18.602 | 0.768 | 1.081 |
| MP1500 | 2 × 101 | 1.5 | — | 1,255 | 253,529 | 14.638 | 0.214 | 0.32 |
| MP3000 | 2 × 101 | 3 | — | 1,277 | 257,985 | 14.895 | 0.214 | 0.32 |
| MP8000 | 2 × 101 | 8 | — | 1,255 | 253,590 | 14.641 | 0.214 | 0.32 |
| Total PE | | | | 6,226 | 1,880,827 | 108.593 | | |
| Total MP | | | | 3,787 | 765,104 | 44.175 | | |
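The Coverage column appears to be each library's yield divided by one fixed genome-size figure. Dividing yield by coverage gives roughly 17,320 Mb (about 17.3 Gb) for every row; that divisor is back-calculated from the table, not stated in this excerpt, so treat it as an assumption. The minimal sketch below recomputes per-library and total coverage under that assumption (function names are illustrative).

```python
# Per-library yields (Mb), copied from the table above.
YIELD_MB = {
    "PE300-1": 989_029,
    "PE300-2": 569_617,
    "PE300-3": 322_181,
    "MP1500": 253_529,
    "MP3000": 257_985,
    "MP8000": 253_590,
}

# ASSUMPTION: genome size back-calculated from the Coverage column
# (yield / coverage is ~17,320 Mb for every row); not stated in the source.
GENOME_SIZE_MB = 17_320


def coverage(yield_mb: float, genome_size_mb: float = GENOME_SIZE_MB) -> float:
    """Fold coverage = bases sequenced / genome size (both in Mb)."""
    return yield_mb / genome_size_mb


def total_coverage(prefix: str) -> float:
    """Coverage from the summed yields of all libraries whose name starts with `prefix`."""
    return coverage(sum(y for lib, y in YIELD_MB.items() if lib.startswith(prefix)))


if __name__ == "__main__":
    for lib, y in YIELD_MB.items():
        print(f"{lib}: {coverage(y):.3f}x")
    print(f"Total PE: {total_coverage('PE'):.3f}x")
    print(f"Total MP: {total_coverage('MP'):.3f}x")
```

With this divisor the sketch reproduces the table's coverage values to three decimals, which is what suggests a single genome-size figure was used for all libraries.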
Figure 1. Distribution of 17-mers in the whole-genome sequence of Abies alba, estimated with Jellyfish 2.2.0 (Marçais and Kingsford 2011) from raw 2 × 151 bp paired-end (PE) reads of the PE300 library (300-bp fragment inserts). The high peak at very low depth is caused by sequencing errors.
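A histogram like Figure 1 is a common basis for a k-mer genome size estimate: after discarding the low-depth error peak, the total number of k-mers divided by the depth of the main peak approximates the haploid genome length. The sketch below illustrates counting and that estimate in plain Python. It is not Jellyfish; counting canonical k-mers (a k-mer and its reverse complement treated as one), the `min_depth` cut-off and all function names are assumptions, and this excerpt does not say whether the authors estimated genome size this way.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(reads: Iterable[str], k: int = 17) -> Counter:
    """Count canonical k-mers, skipping any window that contains a non-ACGT base."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _ACGT:
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> Dict[int, int]:
    """Map depth (times a k-mer was seen) -> number of distinct k-mers at that depth."""
    return dict(Counter(counts.values()))


def estimate_genome_size(hist: Dict[int, int], min_depth: int) -> float:
    """Total k-mers at depth >= min_depth divided by the depth of the main peak.

    `min_depth` removes the low-depth error peak; choosing it is a judgment call.
    """
    kept = {d: n for d, n in hist.items() if d >= min_depth}
    peak_depth = max(kept, key=kept.get)
    total_kmers = sum(d * n for d, n in kept.items())
    return total_kmers / peak_depth
```

On real data the histogram would come from a dedicated counter such as Jellyfish; the Python counter here only shows what is being tallied.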
Summary statistics for the Abies alba whole-genome assembly version 1.1 (ABAL 1.1) and chloroplast assembly
| Feature | Value |
|---|---|
| Nuclear genome (ABAL 1.1) | |
| Number of contigs | 45,280,944 |
| Number of scaffolds | 37,192,295 |
| Mean GC% | 39.34 |
| Total length (Mb) | 18,167 |
| Minimum scaffold length (bp) | 106 |
| Maximum scaffold length (bp) | 297,427 |
| Mean scaffold length (bp) | 488.50 |
| Median scaffold length (bp) | 115 |
| Contig N50 (bp) | 2,477 |
| Scaffold N50 (bp) | 14,051 |
| Chloroplast genome | |
| Total length (bp) | 120,908 |
| Number of contigs | 11 |
| Number of scaffolds | 1 |
| Contig N50 (bp) | 15,758 |
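Statistics such as contig N50, scaffold N50 and GC% can be computed from the assembled sequences. The sketch below shows one conventional way to do it: N50 is the length L such that sequences of length ≥ L cover at least half of the total, contigs are recovered by breaking scaffolds at runs of N, and GC% is taken over unambiguous bases only. How the authors split contigs or treated ambiguous bases is not stated in this excerpt, so those choices, and all function names, are assumptions.

```python
import re
from typing import Iterable, List

_GAP = re.compile(r"[Nn]+")


def n50(lengths: Iterable[int], min_length: int = 0) -> int:
    """Length L such that sequences >= L cover at least half the total length.

    `min_length` drops short sequences first (cf. the 200 bp cut-off in the
    note to the conifer comparison table).
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half:
            return n
    return 0  # no sequences left after filtering


def split_into_contigs(scaffold: str) -> List[str]:
    """Break a scaffold at runs of N (gaps) to recover its contigs."""
    return [piece for piece in _GAP.split(scaffold) if piece]


def gc_percent(seqs: Iterable[str]) -> float:
    """GC% over unambiguous bases (A/C/G/T), ignoring N and other symbols."""
    gc = at = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100 * gc / (gc + at) if gc + at else 0.0
```

Applying `n50` to scaffold lengths gives a scaffold N50; applying it to the lengths of the pieces returned by `split_into_contigs` gives a contig N50.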
Figure 2. Spectra copy number plot for the Abies alba genome assembly ABAL 1.1. The k-mer (k = 27) spectrum of the 2 × 151 bp paired-end (PE) reads from the PE300 library (300-bp fragment inserts) is compared with the ABAL 1.1 assembly. This stacked histogram, produced with KAT (Mapleson et al. 2016), shows how k-mers at each read frequency are distributed across copy-number classes in the assembly.
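The idea behind a spectra copy number plot can be sketched as a two-way count: for each distinct k-mer, how often it occurs in the reads and how many times it occurs in the assembly. Stacking the per-copy-number counts along the read-frequency axis gives a plot like Figure 2. This toy sketch illustrates the concept only and is not KAT's implementation; canonical k-mers and the function names are assumptions.

```python
from collections import Counter
from typing import Iterable, Iterator

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")


def canonical_kmers(seq: str, k: int) -> Iterator[str]:
    """Yield canonical k-mers of `seq`, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ACGT:
            yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def spectra_cn(reads: Iterable[str], assembly: Iterable[str], k: int = 27) -> Counter:
    """Count distinct k-mers per (read frequency, assembly copy number) cell.

    Copy number 0 marks k-mers seen only in the reads (errors or sequence
    missing from the assembly); read frequency 0 marks k-mers seen only
    in the assembly.
    """
    read_freq = Counter(km for r in reads for km in canonical_kmers(r, k))
    asm_copies = Counter(km for s in assembly for km in canonical_kmers(s, k))
    cells: Counter = Counter()
    for km, freq in read_freq.items():
        cells[(freq, asm_copies.get(km, 0))] += 1
    for km, copies in asm_copies.items():
        if km not in read_freq:
            cells[(0, copies)] += 1
    return cells
```

Read this way, k-mers near the main read-depth peak would ideally fall in copy number 1, while a large copy-number-0 band at that depth would point to content absent from the assembly.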
Comparison of genome summary metrics from Abies alba and other sequenced conifer genomes (version numbers in parentheses)
| Genome summary metric | Abies alba (ABAL 1.1) | | | | | | |
|---|---|---|---|---|---|---|---|
| Total length (Mb) | 18,167 | 15,700 | 20,613 | 31,000 | 32,795 | 19,600 | 12,340 |
| N50 scaffold (Kb) | 14.05 | 372.39 | 2,108.3 | 2,509.9 | 110.56 / 34.40 | 5.21 | 6.44 |
| N of genes | 94,205 | 54,830 | 47,602 | 71,117 | 102,915 | 70,968 | 49,521 |
| N of full-length genes | 50,757 | 20,616 | NA | 13,936 | 16,386 | 28,354 | 32,482 |
| N of exons | 181,168 | 181,475 | 166,465 | 153,111 | 232,182 | 178,049 | 151,838 |
| N of introns | 64,728 | 145,595 | 108,809 | 121,858 | 124,951 | 107,313 | 101,675 |
| Mean gene length (bp) | 1,190 | 10,510 | 9,066 | 40,820 | 1,330 | 2,427 | 982 |
| Mean exon length (bp) | 352 | 231 | 320 | 241 | 320 | 312 | 324 |
| Mean intron length (bp) | 311 | 2,301 | 3,004 | 10,164 | 511 | 1,017 | 353 |
| Maximum exon length (bp) | 6,300 | 8,037 | 4,946 | 8,003 | 9,568 | 6,068 | 10,268 |
| Maximum intron length (bp) | 36,015 | 182,831 | 408,800 | 805,500 | 44,116 | 68,269 | 10,154 |
| Exons per gene | 1.92 | 8.80 | 3.50 | 5.25 | 2.26 | 3.78 | 3.03 |
| Total exonic length (bp) | 6.4 × 10⁶ | 4.2 × 10⁶ | 5.3 × 10⁶ | 1.8 × 10⁶ | 7.4 × 10⁶ | 5.6 × 10⁶ | 4.9 × 10⁶ |
Different approaches were used across species for gene annotation and for defining “full-length genes”. The scaffold N50 (scfN50) was calculated on the unshuffled assemblies, discarding scaffolds shorter than 200 bp.
Kuzmin et al.
High-confidence set (Warren et al.; PG29 v3); scaffold N50 calculated using sequences ≥ 500 bp. The N50 is 71.5 Kb if both clones are considered (WS77111).
Low-quality and high-quality gene models from Pinus lambertiana v1 (Stevens et al.); the others were calculated on Pinus lambertiana v1.5 (Crepeau et al.).
High-confidence set (Nystedt et al.).
Genome annotation statistics for Abies alba for two types of gene models (protein-coding genes and full-length genes). All statistics are given in Table S3.
| Features | Protein-coding genes | Full-length genes |
|---|---|---|
| Number of genes | 94,205 | 50,757 |
| Median gene length (bp) | 558 | 804 |
| Number of transcripts | 98,227 | 53,487 |
| Median transcript length (bp) | 445 | 597 |
| Number of exons | 187,740 | 181,168 |
| Coding GC content | 46.4% | 45.15% |
| Median exon length (bp) | 224 | 237 |
| Number of introns | 89,618 | 64,728 |
| Median intron length (bp) | 146 | 145 |
| Exons/transcript | 2.00 | 2.32 |
| Transcripts/gene | 1.04 | 1.05 |
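Counts and medians like those in the table above can be derived from a GFF3 annotation file. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it treats `mRNA` features as transcripts, defines transcript length as the sum of its exon lengths, derives introns as the gaps between consecutive exons of the same transcript, and uses hypothetical function names. Coding GC content would additionally need the genome sequence and CDS features, so it is left out.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_gff3(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, Dict[str, str]]]:
    """Yield (feature type, start, end, attributes) for each GFF3 feature line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        yield cols[2], int(cols[3]), int(cols[4]), attrs


def _median(values: List[int]) -> float:
    return statistics.median(values) if values else 0


def annotation_stats(lines: Iterable[str]) -> Dict[str, float]:
    gene_len: Dict[str, int] = {}
    tx_parent: Dict[str, str] = {}               # transcript ID -> gene ID
    tx_exons = defaultdict(list)                 # transcript ID -> [(start, end)]
    exon_len: List[int] = []
    for ftype, start, end, attrs in parse_gff3(lines):
        length = end - start + 1                 # GFF3 coordinates: 1-based, inclusive
        if ftype == "gene":
            gene_len[attrs["ID"]] = length
        elif ftype == "mRNA":
            tx_parent[attrs["ID"]] = attrs.get("Parent", "")
        elif ftype == "exon":
            exon_len.append(length)
            for parent in attrs["Parent"].split(","):  # Parent may list several IDs
                tx_exons[parent].append((start, end))

    # ASSUMPTION: transcript length = spliced length (sum of its exons).
    tx_len = [sum(e - s + 1 for s, e in tx_exons[t]) for t in tx_parent]
    # Introns = gaps between consecutive exons (sorted by start) of one transcript.
    intron_len: List[int] = []
    for exons in tx_exons.values():
        exons = sorted(exons)
        intron_len += [s2 - e1 - 1 for (_, e1), (s2, _) in zip(exons, exons[1:])]

    n_genes, n_tx = len(gene_len), len(tx_parent)
    return {
        "genes": n_genes,
        "median_gene_length": _median(list(gene_len.values())),
        "transcripts": n_tx,
        "median_transcript_length": _median(tx_len),
        "exons": len(exon_len),
        "median_exon_length": _median(exon_len),
        "introns": len(intron_len),
        "median_intron_length": _median(intron_len),
        "exons_per_transcript": sum(len(tx_exons[t]) for t in tx_parent) / n_tx if n_tx else 0,
        "transcripts_per_gene": n_tx / n_genes if n_genes else 0,
    }
```

Real annotation files often use other transcript feature types (for example `transcript`) or shared exons, so a production version would need to handle those cases explicitly.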
Figure 3. Violin plots of length distributions. (A) Lengths of genes, transcripts, exons and introns for the Abies alba (Abies_al) high-quality genes and full-length genes (labeled “full”); lengths are log10-transformed. (B–D) Lengths of genes (B), exons (C) and introns (D) for the Abies alba (A_alba) high-quality and full-length genes, Pseudotsuga menziesii (Ps_menz), Picea abies (P_abies), Picea glauca (P_glauca), Pinus taeda (P_taeda) and Pinus lambertiana (P_lamb).