Inger Greve Alsos, Sebastien Lavergne, Marie Kristine Føreid Merkel, Marti Boleda, Youri Lammers, Adriana Alberti, Charles Pouchon, France Denoeud, Iva Pitelkova, Mihai Pușcaș, Cristina Roquet, Bogdan-Iuliu Hurdu, Wilfried Thuiller, Niklaus E Zimmermann, Peter M Hollingsworth, Eric Coissac
Abstract
Genome skimming has the potential for generating large data sets for DNA barcoding and wider biodiversity genomic studies, particularly via the assembly and annotation of full chloroplast (cpDNA) and nuclear ribosomal DNA (nrDNA) sequences. We compare the success of genome skims of 2051 herbarium specimens from Norway/Polar regions with 4604 freshly collected, silica gel dried specimens mainly from the European Alps and the Carpathians. Overall, we were able to assemble the full chloroplast genome for 67% of the samples and the full nrDNA cluster for 86%. Average insert length, cover and full cpDNA and rDNA assembly were considerably higher for silica gel dried than herbarium-preserved material. However, complete plastid genomes were still assembled for 54% of herbarium samples compared to 70% of silica dried samples. Moreover, there was comparable recovery of coding genes from both tissue sources (121 for silica gel dried and 118 for herbarium material) and only minor differences in assembly success of standard barcodes between silica dried (89% ITS2, 96% matK and rbcL) and herbarium material (87% ITS2, 98% matK and rbcL). The success rate was > 90% for all three markers in 1034 of 1036 genera in 160 families, and only Boraginaceae worked poorly, with 7 genera failing. Our study shows that large-scale genome skims are feasible and work well across most of the land plant families and genera we tested, independently of material type. It is therefore an efficient method for increasing the availability of plant biodiversity genomic data to support a multitude of downstream applications.
Keywords: ITS; alpine; chloroplast DNA; environmental DNA; matK; nuclear ribosomal DNA; phylogenomic; plant DNA barcode; polar; rbcL
Year: 2020; PMID: 32244605; PMCID: PMC7238428; DOI: 10.3390/plants9040432
Source DB: PubMed; Journal: Plants (Basel); ISSN: 2223-7747
Data analyzed for the low-coverage genome skims. The table lists the number of specimens sampled and analyzed overall (All) and for the two projects, PhyloAlps (including PhyloCarpates) and PhyloNorway. The last three columns give the same counts restricted to the species-rich families that contain at least 20 taxa across the studied region. The number of complete genomes, the average sequencing depth (cover), and the average library insert size (base pairs) are also given.
| | All | PhyloAlps + Carp | PhyloNorway | All (families ≥ 20 taxa) | PhyloAlps + Carp (families ≥ 20 taxa) | PhyloNorway (families ≥ 20 taxa) |
|---|---|---|---|---|---|---|
| Specimens | 6655 | 4604 | 2051 | 5893 | 4057 | 1836 |
| Libraries | 6817 | 4726 | 2091 | 6018 | 4147 | 1871 |
| Families | 161 | 158 | 112 | 43 | 43 | 43 |
| Genera | 1037 | 922 | 576 | 804 | 705 | 461 |
| Taxa | 5575 | 4437 | 1899 | 4957 | 3914 | 1689 |
| Complete genome cpDNA | 4439 | 3303 | 1136 | 3944 | 2922 | 1022 |
| Average sequencing depth cpDNA | 278 | 318 | 187 | 265 | 300 | 188 |
| Complete nrDNA cluster | 5748 | 4021 | 1727 | 5092 | 3543 | 1549 |
| Average sequencing depth nrDNA | 603 | 674 | 444 | 579 | 638 | 450 |
| Average library insert size (bp) | 316 | 346 | 249 | 318 | 350 | 249 |
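
The assembly success percentages can be recomputed from the counts in the table. The following is a minimal sketch in Python, not the authors' code: the dictionary keys and function names are hypothetical, and because this record does not say whether the authors divided by specimens or by libraries, both denominators are computed.

```python
# Counts copied from the first three data columns of the table above.
# The dictionary layout and key names are illustrative assumptions.
COUNTS = {
    "All": {"specimens": 6655, "libraries": 6817, "cpDNA": 4439, "nrDNA": 5748},
    "PhyloAlps + Carp": {"specimens": 4604, "libraries": 4726, "cpDNA": 3303, "nrDNA": 4021},
    "PhyloNorway": {"specimens": 2051, "libraries": 2091, "cpDNA": 1136, "nrDNA": 1727},
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def success_rates(counts: dict) -> dict:
    """Compute complete-assembly rates per dataset, marker and denominator."""
    rates = {}
    for dataset, c in counts.items():
        rates[dataset] = {
            (marker, denom): percent(c[marker], c[denom])
            for marker in ("cpDNA", "nrDNA")
            for denom in ("specimens", "libraries")
        }
    return rates


RATES = success_rates(COUNTS)

# Print one line per dataset, marker and denominator.
for dataset, values in RATES.items():
    for (marker, denom), pct in values.items():
        print(f"{dataset:18s} {marker} per {denom:9s}: {pct:5.1f}%")
```

Under these assumptions, the per-specimen rates for the full dataset come out at about 66.7% (cpDNA) and 86.4% (nrDNA), in line with the 67% and 86% quoted in the abstract. The per-project figures of 54% and 70% appear closer to the per-library rates (about 54.3% and 69.9%) than to the per-specimen rates (about 55.4% and 71.7%), although the denominator the authors used is not stated here.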
Figure 1. Collection sites for the projects PhyloAlps (blue), PhyloCarpates (yellow) and PhyloNorway (purple).
Figure 2. Chloroplast and nuclear ribosomal sequencing depth (the average number of reads covering a given nucleotide) for the total dataset of 4604 freshly collected, silica gel dried specimens from the Alps and the Carpathians (“Silica gel”) and 2051 herbarium specimens from Norway and polar regions (“Herbarium”). (a) Effect of preservation method on sequencing depth. (b) Sequencing depth in relation to complete assembly success for herbarium and silica gel dried material combined. Note that the y-axis is on a logarithmic scale.
Figure 3. Library insert size (the length of the sequenced DNA fragment) (a) for herbarium and silica gel dried material and (b) in relation to the success of complete assembly of chloroplast and nrDNA for herbarium and silica gel dried material combined.
Figure 4. Sequencing success of the complete chloroplast and nrDNA clusters, the standard chloroplast barcodes rbcL and matK, the optional nuclear ribosomal barcode ITS2, and all three barcodes combined, for freshly collected silica gel dried (n = 4604) and herbarium material (n = 2051).
Figure 5. Sequencing success for the 43 families with a minimum of 20 taxa available across the studied specimens, based on freshly collected silica gel dried (Alps and Carpathians) and herbarium (Norway) material. (a) matK, (b) rbcL and (c) ITS2.
Figure 6. Sequencing success for herbarium material in relation to specimen age.