Grace E Brewer1, James J Clarkson1, Olivier Maurin1, Alexandre R Zuntini1, Vanessa Barber1, Sidonie Bellot1, Nicola Biggs1, Robyn S Cowan1, Nina M J Davies1, Steven Dodsworth2, Sara L Edwards1, Wolf L Eiserhardt1,3, Niroshini Epitawalage1, Sue Frisby1, Aurélie Grall1, Paul J Kersey1, Lisa Pokorny1,4, Ilia J Leitch1, Félix Forest1, William J Baker1.
Abstract
The world's herbaria collectively house millions of diverse plant specimens, including endangered or extinct species and type specimens. Unlocking genetic data from the typically highly degraded DNA obtained from herbarium specimens was difficult until the arrival of high-throughput sequencing approaches, which can be applied to low quantities of severely fragmented DNA. Target enrichment involves using short molecular probes that hybridise and capture genomic regions of interest for high-throughput sequencing. In this study on herbariomics, we used this targeted sequencing approach and the Angiosperms353 universal probe set to recover up to 351 nuclear genes from 435 herbarium specimens that are up to 204 years old and span the breadth of angiosperm diversity. We show that on average 207 genes were successfully retrieved from herbarium specimens, although the mean number of genes retrieved and the target enrichment efficiency are significantly higher for silica gel-dried specimens. Forty-seven target nuclear genes were recovered from a herbarium specimen of the critically endangered St Helena boxwood, Mellissia begoniifolia, collected in 1815. Herbarium specimens yield significantly less high-molecular-weight DNA than silica gel-dried specimens, and genomic DNA quality declines with sample age, which is negatively correlated with target enrichment efficiency. Climate, taxon-specific traits, and collection strategies additionally impact target sequence recovery. We also detected taxonomic bias in targeted sequencing outcomes for the 10 most numerous angiosperm families that were investigated in depth.
We recommend that (1) for species distributed in wet tropical climates, silica gel-dried specimens should be used preferentially; (2) for species distributed in seasonally dry tropical climates, herbarium and silica gel-dried specimens yield similar results, and either collection can be used; (3) taxon-specific traits should be explored and established for effective optimisation of taxon-specific studies using herbarium specimens; (4) all herbarium sheets should, in future, be annotated with details of the preservation method used; (5) long-term storage of herbarium specimens should be in stable, low-humidity, and low-temperature environments; and (6) targeted sequencing with universal probes, such as Angiosperms353, should be investigated closely as a new approach for DNA barcoding that will ensure better exploitation of herbarium specimens than traditional Sanger sequencing approaches.
Keywords: DNA barcoding; angiosperms; degraded DNA; genomics; herbariomics; herbarium specimens; high-throughput sequencing; target enrichment
Year: 2019 PMID: 31620145 PMCID: PMC6759688 DOI: 10.3389/fpls.2019.01102
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. Source material variables impact genomic DNA quality and yield, which feeds into library preparation and quality, pooling, target enrichment efficiency, sequencing, bioinformatics analysis, and targeted sequencing outcomes. In this study, we investigate the relationships shown by the black-filled arrows.
Sampling information.
| | Herbarium specimens | Silica gel-dried specimens | All specimens |
|---|---|---|---|
| Specimens | 435 | 94 | 529 |
| Orders | 37 | 17 | 40 |
| Families | 75 | 27 | 86 |
| Genera | 383 | 86 | 459 |
| Species | 426 | 91 | 515 |
| Collection date range | 1815–2017 | 1992–2017 | 1815–2017 |
Figure 2. Genomic DNA quality according to sample age, material source, climate (according to species distributions), and material source and climate combined, in relative (proportion) and absolute (count) values. Quality is defined as very low (severely fragmented DNA <500 bp), low (DNA smear on agarose gel), or high (high-molecular-weight DNA >5 kbp).
Figure 3. Frequency of samples per age and genomic DNA concentration (ng/µl), target enrichment efficiency (mapped/total reads), genes retrieved above 50% of target length, and mean exon and intron coverage (X) by sample age and material source. Inside each violin plot is a boxplot summarising the interquartile range and median. The diamond symbol denotes the mean while circles represent outliers. The horizontal width of the plot shows the density of the data along the y-axis.
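The two per-sample metrics named in this caption, enrichment efficiency (mapped/total reads) and genes retrieved above 50% of target length, can be illustrated with a minimal Python sketch. The function names, the input dictionaries, and the example numbers are hypothetical and are not taken from the study's pipeline; treating exactly 50% as "retrieved" is also an assumption.

```python
def enrichment_efficiency(mapped_reads, total_reads):
    """Fraction of all sequenced reads that map to the target genes."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def genes_retrieved(recovered_bp, target_bp, min_fraction=0.5):
    """Count genes whose recovered length reaches min_fraction of the target.

    Assumption: a gene at exactly min_fraction counts as retrieved.
    """
    return sum(
        1
        for gene, target_len in target_bp.items()
        if target_len > 0 and recovered_bp.get(gene, 0) / target_len >= min_fraction
    )


# Hypothetical values for illustration only, not data from the study.
target_bp = {"gene_a": 1000, "gene_b": 600, "gene_c": 900}
recovered_bp = {"gene_a": 800, "gene_b": 200, "gene_c": 450}

print(enrichment_efficiency(25_000, 100_000))
print(genes_retrieved(recovered_bp, target_bp))
```

In this toy input, `gene_b` falls below the threshold (200 of 600 bp), so it is not counted.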
Correlations between sample age and genomic DNA concentration (ng/µl), enrichment efficiency (mapped/total reads), genes retrieved above 50% of target length, and mean exon and intron coverage (X), grouped by material source.
| | All samples corr | All samples p | Herbarium (old) corr | Herbarium (old) p | Herbarium (recent) corr | Herbarium (recent) p | Herbarium (combined) corr | Herbarium (combined) p | Silica gel dried corr | Silica gel dried p |
|---|---|---|---|---|---|---|---|---|---|---|
| Genomic DNA concentration | −0.0514 | 0.2505 | −0.0506 | 0.5387 | 0.0619 | 0.3200 | −0.0859 | 0.0823 | 0.0540 | 0.6095 |
| Enrichment efficiency | … | <0.0001 | −0.0588 | 0.4614 | … | 0.0014 | … | 0.0001 | −0.1619 | 0.1190 |
| Genes retrieved at 50% | … | <0.0001 | −0.0757 | 0.3387 | … | 0.0051 | … | 0.0001 | −0.1813 | 0.0803 |
| Exon coverage | … | 0.0009 | −0.0792 | 0.3227 | −0.1168 | 0.0549 | … | 0.0008 | 0.0757 | 0.4684 |
| Intron coverage | … | 0.0005 | −0.0715 | 0.3797 | −0.1455 | 0.0172 | … | 0.0006 | 0.0601 | 0.5673 |
Significant correlation values (p < 0.05) are marked in bold.
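A correlation between sample age and a sequencing metric, together with a p-value, can be sketched as below. This record does not state which correlation coefficient or significance test the authors used, so the sketch assumes Pearson's r and substitutes a simple permutation test. All names and numbers are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for Pearson's r (a swapped-in test)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical values for illustration only, not data from the study.
sample_age_years = [2, 10, 25, 40, 60, 85, 120, 150, 180, 204]
efficiency = [0.30, 0.28, 0.22, 0.25, 0.15, 0.12, 0.10, 0.06, 0.05, 0.02]

r = pearson_r(sample_age_years, efficiency)
p = permutation_p_value(sample_age_years, efficiency)
print(f"corr = {r:.4f}, p = {p:.4f}")
```

Grouping by material source, as in the table, would amount to running the same calculation separately on each subset of samples.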
Figure 4. Number of specimens per climate and distribution of genomic DNA concentration (ng/µl), enrichment efficiency (mapped/total reads), genes retrieved above 50% of target length, and mean exon and intron coverage (X) in each climate, grouped by material source. Each boxplot summarises the interquartile range and median.
Figure 5. Number of specimens per family and distribution of genomic DNA concentration (ng/µl), enrichment efficiency (mapped/total reads), genes retrieved above 50% of target length, and mean exon and intron coverage (X) in each family, grouped by material source. Each boxplot summarises the interquartile range and median.