Genome skimming herbarium specimens for DNA barcoding and phylogenomics.

Chun-Xia Zeng1, Peter M Hollingsworth2, Jing Yang1, Zheng-Shan He1, Zhi-Rong Zhang1, De-Zhu Li1, Jun-Bo Yang1.   

Abstract

BACKGROUND: The world's herbaria contain millions of specimens, collected and named by thousands of researchers over hundreds of years. However, this treasure has remained largely inaccessible to genetic studies, because of both the generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are designed to take short, fragmented DNA molecules as templates.
RESULTS: As a practical test of routine recovery of rDNA and plastid genome sequences from herbarium specimens, we sequenced 25 herbarium specimens up to 80 years old from 16 different Angiosperm families. Paired-end reads were generated, yielding successful plastid genome assemblies for 23 species and nuclear rDNAs for 24 species, respectively. These data showed that genome skimming can be used to generate genomic information from herbarium specimens as old as 80 years and using as little as 500 pg of degraded starting DNA.
CONCLUSIONS: Routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome-enrichment approaches), and can be performed with limited sample destruction.

Keywords:  DNA barcoding; Degraded DNA; Genome skimming; Herbarium specimens; Plastid genome; rDNA

Year:  2018        PMID: 29928291      PMCID: PMC5987614          DOI: 10.1186/s13007-018-0300-0

Source DB:  PubMed          Journal:  Plant Methods        ISSN: 1746-4811            Impact factor:   4.993


Background

Herbaria are collections of preserved plant specimens stored for scientific study. There are approximately 3400 herbaria in the world, containing around 350 million specimens collected over the past 400 years (http://sciweb.nybg.org/science2/indexHerbariorum.asp). These collections cover most of the world’s plant species, including many rare and endangered local endemics, and species collected from places that are currently expensive or difficult to access [1]. The recovery of DNA from this vast resource of already collected, expertly verified herbarium specimens represents a highly efficient way of building a DNA-based identification resource for the world’s plant species (DNA barcoding) and of increasing knowledge of phylogenetic relationships. The ‘unlocking’ of preserved natural history specimens for DNA barcoding and species discrimination is of particular relevance. In the first decade of DNA barcoding, it became clear that obtaining material from expertly verified specimens is a key rate-limiting step in the construction of a global DNA reference library [2]. The millions of samples required for this endeavor, each needing a corresponding voucher specimen and metadata, create a strong impetus for making the best use of previously collected material. DNA degradation in herbarium samples, followed by diffusion from the sample, creates challenges for DNA recovery [3]. In addition, different preservation methods can negatively affect the ability to extract, amplify and sequence DNA [4-6]. PCR amplification of historical DNA is therefore generally restricted to short amplicons (< 200 bp) and is further vulnerable to contamination by recent DNA and PCR products from the study species. The cumulative damage to the DNA can also cause incorrect bases to be inserted during enzymatic amplification, mainly as single-nucleotide misincorporations [7, 8].
Taken together, these factors make PCR-based Sanger sequencing of herbarium samples for standard DNA barcodes challenging. A recent large-scale study by Kuzmina et al. [9] examined 20,816 specimens representing 5076 of the 5190 vascular plant species in Canada and found that specimen age and method of preservation had significant effects on sequence recovery for all barcode markers. However, massively parallel short-read next-generation sequencing (NGS) protocols have the potential to greatly increase the success of herbarium sequencing projects, as many new sequencing approaches do not rely on large, intact DNA templates and instead are well suited to sequencing low concentrations of short (100-400 bp) fragmented molecules [3, 10]. Straub et al. [11] described how “genome skimming”, a shallow-pass genome sequence using NGS, can recover highly repetitive genome regions such as rDNA or organelle genomes and yield highly useful sequence data at relatively low sequencing depth; these regions include the usual suite of DNA barcoding markers [12, 13]. Genome skimming has been used to recover plastid DNA and rDNA sequences from 146 herbarium specimens [14]; to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana herbarium specimen [15]; to recover the complete plastome, the mitogenome, nuclear ribosomal DNA clusters, and partial sequences of low-copy genes from a herbarium specimen of an extinct species of Hesperelaea [16, 17]; and to recover the complete plastome, nuclear ribosomal DNA clusters, and partial sequences of low-copy genes from three grass herbarium specimens [18]. However, sequencing small, historical specimens may be especially challenging if a specimen is unique, or nearly so, with no alternative specimens available for study should the first one fail.
Methods used to extract and prepare DNA for sequencing must both be more or less guaranteed to work and, in many cases, allow for preservation of DNA for future study [19]. Recent studies report successful sequencing of historical specimens from 1 ng to 1 μg of input DNA (for example, up to 1 μg in Bakker et al. [14]; ~600 ng in Staats et al. [15]; 33 ng in Zadane et al. [17]; 8.25–537 ng in Kanda et al. [20]; 5.8–200 ng in Blaimer et al. [21]; less than 10 ng in Besnard et al. [18]; 1–10 ng in Sproul and Maddison [19]). However, a number of studies also report abandoning a subset of specimens for which too little input DNA was available (i.e. below 10 ng in Kanda et al. [20]; below 5 ng in Blaimer et al. [21]). To better understand suitable approaches to sample preparation for specimens with minimal DNA, we intentionally limited DNA input to 500 pg per specimen. In this paper we provide a further practical test of the genome skimming methodology applied to herbarium specimens. As part of the China Barcode of Life project, and our wider phylogenomic studies, our aim was to assess whether the success reported in these early genome skimming studies could be repeated in other laboratories. We evaluated the success and failure rates of rDNA and plastid genome sequencing from genome skims of 25 different species from herbarium specimens, and explored the impacts of parameters such as the amount of input DNA and the number of PCR cycles.

Methods

Specimen sampling

Twenty-five herbarium specimens were selected from 16 angiosperm families covering 22 genera, with specimen ages up to 80 years. All 25 specimens are housed in the Herbarium of the Kunming Institute of Botany, Chinese Academy of Sciences (KUN). The samples were selected to represent the major clades of the APG III system (Table 1).
Table 1

List of the specimen materials and DNA yields used in our study

| Sample ID | Species | Family | Collection date | Age (years) | DNA concentration (ng/µl) | Volume (µl) | DNA yield (ng) |
|---|---|---|---|---|---|---|---|
| 01 | Manglietia fordiana | Magnoliaceae | 1978-04-02 | 39 | 0.894 | 36 | 32.184 |
| 02 | Manglietia fordiana | Magnoliaceae | 1954-10-27 | 63 | 2.35 | 37 | 86.95 |
| 03 | Schisandra henryi | Schisandraceae | 1982-11-08 | 35 | 1.87 | 33 | 61.71 |
| 04 | Schisandra henryi | Schisandraceae | 1984-05-28 | 33 | 0.909 | 33 | 29.997 |
| 05 | Phoebe neurantha | Lauraceae | 1938 | 79 | 0.507 | 36 | 18.252 |
| 06 | Cinnamomum bodinieri | Lauraceae | 1960 | 57 | 2.26 | 36 | 81.36 |
| 08 | Holboellia latifolia | Lardizabalaceae | 1982 | 35 | 1.29 | 34 | 43.86 |
| 09 | Chloranthus erectus | Chloranthaceae | 1973 | 44 | 4.18 | 36 | 150.48 |
| 10 | Sarcandra glabra | Chloranthaceae | 1988 | 29 | 4.35 | 31.5 | 137.025 |
| 11 | Meconopsis racemosa | Papaveraceae | 1976 | 41 | 4.35 | 22 | 95.7 |
| 12 | Macleaya microcarpa | Papaveraceae | 1986 | 31 | 1.97 | 35.5 | 69.935 |
| 13 | Hodgsonia macrocarpa | Cucurbitaceae | 1982 | 35 | 2.18 | 34 | 74.12 |
| 14 | Malus yunnanensis | Rosaceae | 1939 | 78 | 0.834 | 35 | 29.19 |
| 15 | Elaeagnus loureirii | Elaeagnaceae | 1993 | 24 | 9.75 | 34 | 331.5 |
| 16 | Rhododendron rex subsp. fictolacteum | Ericaceae | 1979 | 38 | 8.15 | 20.5 | 167.075 |
| 17 | Swertia bimaculata | Gentianaceae | 1984-08-23 | 33 | 1.67 | 35 | 58.45 |
| 18 | Primula sinopurpurea | Primulaceae | 1940-09-07 | 77 | 0.974 | 32 | 31.168 |
| 19 | Paederia scandens | Araceae | 1955-03-31 | 62 | 0.344 | 34 | 11.696 |
| 20 | Colocasia esculenta | Araceae | 1974-10-01 | 43 | 1.46 | 36 | 52.56 |
| 21 | Pholidota chinensis | Orchidaceae | 1959 | 58 | 0.107 | 34 | 3.638 |
| 22 | Otochilus porrectus | Orchidaceae | 1990 | 27 | 0.344 | 35 | 12.04 |
| 23 | Indosasa sinica | Poaceae | 2007 | 10 | 1.65 | 35 | 57.75 |
| 24 | Camellia gymnogyna | Theaceae | 1934-06-17 | 83 | 0.417 | 36 | 15.012 |
| 25 | Camellia sinensis var. assamica | Theaceae | 2002 | 15 | 4.03 | 23 | 92.69 |
| 26 | Panicum incomtum | Poaceae | 2000-10-17 | 17 | 1.63 | 36 | 58.68 |

All vouchers are deposited in the herbarium of the Kunming Institute of Botany (KUN)

DNA extraction

Approximately 1 cm² sections of leaf, or 20 mg of leaf tissue, were used for each DNA extraction. Genomic DNA was extracted using the Tiangen DNAsecure Plant Kit (DP320). The yield of genomic DNA extracts was quantified by fluorometry on the Qubit (Invitrogen, Carlsbad, California, USA) using the dsDNA HS kit, and integrity (size distribution) was assessed visually on a 1% agarose gel.

Library preparation

All samples were subsequently built into blunt-end DNA libraries using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs), which has been optimized for as little as 5 ng of starting DNA, and Illumina-specific adapters [22]. The library protocol followed the manufacturer’s instructions with four modifications: (i) 500 pg of input DNA was used to accommodate low starting DNA quantities; (ii) DNA was not fragmented by sonication because it was already highly degraded; (iii) the NEBNext library was generated without any size selection; and (iv) DNA libraries were amplified in an indexing PCR, which barcoded each library so that samples could be distinguished. The manufacturer’s instructions suggest five PCR cycles for 5 ng of input DNA. As only 500 pg of starting DNA was used, we tested increasing numbers of PCR cycles (× 6, × 8, × 10, × 12 and × 14). Concentration and size profiles of the final indexed libraries (125 libraries, representing 25 specimens at 5 different numbers of PCR cycles) were assessed on a Bioanalyzer 2100 using a high sensitivity DNA chip.
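As a rough illustration of modification (i), the volume of extract that supplies 500 pg follows directly from the Qubit concentration. The sketch below is not part of the published protocol: the function name `input_volume_ul` is hypothetical, and the three concentrations are simply taken from Table 1 as examples.

```python
def input_volume_ul(conc_ng_per_ul: float, target_pg: float = 500.0) -> float:
    """Volume of extract (µl) that contains `target_pg` picograms of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    target_ng = target_pg / 1000.0  # 1 ng = 1000 pg
    return target_ng / conc_ng_per_ul


# Qubit concentrations (ng/µl) of three specimens listed in Table 1
for sample_id, conc in [("15", 9.75), ("01", 0.894), ("21", 0.107)]:
    print(sample_id, round(input_volume_ul(conc), 3), "µl")
```

For the more concentrated extracts the required volume falls well below 1 µl, so in practice a dilution step would presumably precede pipetting; the text does not say how this was handled.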

Library pooling

The final indexed libraries were then pooled (33 or 34 samples per lane) in equimolar ratios and sequenced on three lanes of an Illumina XTen sequencing system (Illumina Inc.) using paired-end chemistry at the Cloud health Medical Group Ltd.
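Equimolar pooling can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, and the function names and the two example libraries are hypothetical.

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Approximate molarity (nM) of a dsDNA library.

    Assumes an average mass of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def pooling_volumes_ul(libraries: dict, fmol_per_library: float) -> dict:
    """Volume (µl) of each library that contributes the same number of femtomoles.

    `libraries` maps a library name to (concentration in ng/µl, mean fragment size in bp).
    1 nM equals 1 fmol/µl, so volume = fmol / nM.
    """
    return {
        name: fmol_per_library / library_molarity_nm(conc, size_bp)
        for name, (conc, size_bp) in libraries.items()
    }


# Hypothetical libraries: same fragment size, twofold difference in concentration
example = {"lib_A": (2.0, 300), "lib_B": (4.0, 300)}
print(pooling_volumes_ul(example, fmol_per_library=10.0))
```

With equal fragment sizes, a library at half the concentration contributes twice the volume to the pool.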

Analyses

Successfully sequenced samples were assembled into chloroplast genomes and nuclear rDNAs. Here the rDNAs comprise the complete sequences of 26S, 18S, and 5.8S and the internal transcribed spacers (ITS1 and ITS2). We did not assemble the intergenic spacer (IGS) because of the complexity of this region, which is rich in duplications and inversions. The raw sequence reads were filtered for primer/adaptor sequences and low-quality reads with the NGS QC Toolkit [23]. The cut-off value for percentage of read length was 80, and that for PHRED quality score was 30. The filtered high-quality paired-end reads were then assembled into contigs with SPAdes 3.0 [24]. Next, we identified highly similar genome sequences using the Basic Local Alignment Search Tool (BLAST: http://blast.ncbi.nlm.gov/). The procedures and parameters for sequence quality control, de novo assembly, and BLAST search followed Yang et al. [25]. We then determined the proper order of the aligned contigs using the highly similar genome sequences identified in the BLAST search as references. At this point, the target contigs were assembled into complete plastid genomes and nuclear rDNAs. Annotation of the plastomes was performed using the plastid genome annotation package DOGMA [26] (http://dogma.ccbb.utexas.edu/). Start and stop codons of protein-coding genes, as well as intron/exon positions, were manually adjusted. The online tRNAscan-SE service [27] was used to further determine tRNA genes. The final complete plastomes and rDNAs were deposited in GenBank (Accession numbers: MH394344-MH394431; MH270450-MH270494). Fungi or other plants may be co-isolated during the DNA extraction process, resulting in DNA contamination [1]. This is particularly important where starting DNA concentrations are extremely low. We thus sub-sampled our data to check for contamination.
To check for contamination in the plastid DNA sequences, for each species we extracted its rbcL sequence and ran a BLAST search against GenBank to check that it grouped with related species. BLAST1 (implemented in the BLAST program, version 2.2.17) was used to search the reference database for each query sequence with an E value < 1 × 10⁻⁵. Likewise, to check for plant and fungal contamination in the rDNA sequences, we took the final assembled ITS sequences (or partial ITS sequences where complete ITS was not recovered) and ran a BLAST search against the NCBI database to check that they grouped with related species.
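The read-filtering criterion described above (cut-offs of 80 for percentage of read length and 30 for PHRED score) can be sketched in a few lines. This is an interpretation, not the NGS QC Toolkit's code: it assumes a read is kept when at least 80% of its bases reach Q30, that qualities are Phred+33 encoded, and that a pair is retained only if both mates pass. The function names are hypothetical.

```python
def fraction_high_quality(quality_string: str, min_phred: int = 30, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is at least `min_phred` (Phred+33 assumed)."""
    if not quality_string:
        return 0.0
    good = sum(1 for ch in quality_string if ord(ch) - offset >= min_phred)
    return good / len(quality_string)


def pair_passes(qual_r1: str, qual_r2: str, min_phred: int = 30, min_fraction: float = 0.80) -> bool:
    """Keep a read pair only if both mates meet the quality criterion (assumption)."""
    return (
        fraction_high_quality(qual_r1, min_phred) >= min_fraction
        and fraction_high_quality(qual_r2, min_phred) >= min_fraction
    )


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33
good_mate = "I" * 8 + "#" * 2   # 80% of bases at Q40
bad_mate = "I" * 7 + "#" * 3    # 70% of bases at Q40
print(pair_passes(good_mate, good_mate), pair_passes(good_mate, bad_mate))
```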
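The E-value threshold used in these searches can be applied to tabular BLAST output as in the sketch below. It assumes the standard 12-column tab-separated layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E value, bit score); the function name and the two example hit lines are hypothetical and are not results from this study.

```python
def filter_hits(lines, max_evalue: float = 1e-5):
    """Return hits with an E value below `max_evalue` from 12-column tabular BLAST output."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 holds the E value
        if evalue < max_evalue:
            hits.append({
                "query": fields[0],
                "subject": fields[1],
                "identity": float(fields[2]),
                "evalue": evalue,
            })
    return hits


# Hypothetical hit lines for illustration only
example = [
    "query_rbcL\thit_A\t99.5\t1428\t7\t0\t1\t1428\t1\t1428\t0.0\t2600",
    "query_rbcL\thit_B\t85.0\t40\t6\t0\t1\t40\t5\t44\t0.002\t38.2",
]
for hit in filter_hits(example):
    print(hit["subject"], hit["identity"], hit["evalue"])
```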

Results

All 25 species yielded amounts of DNA suitable for library preparation and further processing. Total yields varied between 3 ng and 400 ng from an average of 20 mg of dried leaf tissue, usually the equivalent of 1 cm² of leaf tissue (Table 1). We found a negative correlation between specimen age and DNA yield (Fig. 1).
Fig. 1

DNA yield against specimen age
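The relationship between specimen age and DNA yield can be checked directly from the values in Table 1. The sketch below computes a Pearson correlation coefficient; the choice of Pearson's r is an assumption, since the text does not name the statistic used, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Age (years) and DNA yield (ng) for the 25 specimens, in Table 1 order
ages = [39, 63, 35, 33, 79, 57, 35, 44, 29, 41, 31, 35, 78,
        24, 38, 33, 77, 62, 43, 58, 27, 10, 83, 15, 17]
yields = [32.184, 86.95, 61.71, 29.997, 18.252, 81.36, 43.86, 150.48,
          137.025, 95.7, 69.935, 74.12, 29.19, 331.5, 167.075, 58.45,
          31.168, 11.696, 52.56, 3.638, 12.04, 57.75, 15.012, 92.69, 58.68]

print(round(pearson_r(ages, yields), 3))
```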

We successfully enriched and sequenced DNA libraries constructed from herbarium material. Despite only 500 pg of input DNA, good-quality libraries were produced from 100 of 125 samples (25 species, each with × 8, × 10, × 12 and × 14 PCR cycles). For every species, the concentration of the final indexed library based on six PCR cycles was too low for sequencing. Between 15,877,478 and 44,724,436 high-quality paired-end reads were produced, with the total number of bases ranging from 2,381,621,700 bp (2.38 giga base pairs, Gbp) to 6,708,665,400 bp (6.71 Gbp) (Table 2). These reads were then assembled into contigs and, using a BLAST search, into plastid genomes and rDNA arrays.
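The base totals quoted above are consistent with each read contributing 150 bp. That read length is an inference from the reported numbers, not a value stated in the text, and is marked as an assumption in the sketch below.

```python
READ_LENGTH_BP = 150  # assumption: inferred from the reported read and base totals


def total_bases(n_reads: int, read_length_bp: int = READ_LENGTH_BP) -> int:
    """Total sequenced bases for a given number of reads."""
    return n_reads * read_length_bp


def to_gbp(n_bases: int) -> float:
    """Convert a base count to giga base pairs, rounded to two decimals."""
    return round(n_bases / 1e9, 2)


for n_reads in (15_877_478, 44_724_436):
    bases = total_bases(n_reads)
    print(n_reads, bases, to_gbp(bases))
```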
Table 2

Assembly statistics of plastid genome for all specimens used in this study

| Sample ID | PCR cycles | Species | Family | Total sequences | Raw data (Gb) | #contigs | Total assembly length (bp) | Completed | GenBank accession number |
|---|---|---|---|---|---|---|---|---|---|
| 01D | ×8 | Manglietia fordiana | Magnoliaceae | 22404632 | 3.36 | 9 | 158993 | 1059 bp gap | MH394393 |
| 01E | ×10 | Manglietia fordiana | Magnoliaceae | 25869654 | 3.88 | 32 | 159759 | 349 bp gap | MH394394 |
| 01A | ×12 | Manglietia fordiana | Magnoliaceae | 35201972 | 5.28 | 14 | 158241 | 1840 bp gap | MH394391 |
| 01B | ×14 | Manglietia fordiana | Magnoliaceae | 30007234 | 4.5 | 14 | 158221 | 1840 bp gap | MH394392 |
| 02D | ×8 | Manglietia fordiana | Magnoliaceae | 22829038 | 3.42 | 8 | 161497 | 1040 bp gap | MH394397 |
| 02E | ×10 | Manglietia fordiana | Magnoliaceae | 32497068 | 4.87 | 21 | 160113 | Y | MH394398 |
| 02A | ×12 | Manglietia fordiana | Magnoliaceae | 29637182 | 4.45 | 12 | 158315 | 1802 bp gap | MH394395 |
| 02B | ×14 | Manglietia fordiana | Magnoliaceae | 31089730 | 4.66 | 22 | 160113 | Y | MH394396 |
| 03D | ×8 | Schisandra henryi | Schisandraceae | 29691984 | 4.45 | 5 | 145963 | 94 bp gap | MH394365 |
| 03E | ×10 | Schisandra henryi | Schisandraceae | 25141160 | 3.77 | 4 | 145616 | 54 bp gap | MH394366 |
| 03A | ×12 | Schisandra henryi | Schisandraceae | 32511344 | 4.88 | 11 | 146031 | 18 bp gap | MH394363 |
| 03B | ×14 | Schisandra henryi | Schisandraceae | 29856636 | 4.48 | 9 | 145993 | 63 bp gap | MH394364 |
| 04D | ×8 | Schisandra henryi | Schisandraceae | 24039822 | 3.61 | 4 | 146212 | 53 bp gap | MH394369 |
| 04E | ×10 | Schisandra henryi | Schisandraceae | 23870902 | 3.58 | 4 | 146243 | 53 bp gap | MH394370 |
| 04A | ×12 | Schisandra henryi | Schisandraceae | 33190158 | 4.98 | 15 | 146218 | 63 bp gap | MH394367 |
| 04B | ×14 | Schisandra henryi | Schisandraceae | 30498044 | 4.57 | 6 | 145893 | 45 bp gap | MH394368 |
| 05D | ×8 | Phoebe neurantha | Lauraceae | 29040850 | 4.36 | 11 | 152782 | Y | MH394354 |
| 05E | ×10 | Phoebe neurantha | Lauraceae | 27831254 | 4.17 | 15 | 152782 | Y | MH394355 |
| 05A | ×12 | Phoebe neurantha | Lauraceae | 44724436 | 6.71 | 17 | 152781 | 1 bp gap | MH394352 |
| 05B | ×14 | Phoebe neurantha | Lauraceae | 35264634 | 5.29 | 13 | 152781 | 1 bp gap | MH394353 |
| 06D | ×8 | Cinnamomum bodinieri | Lauraceae | 30188820 | 4.53 | 9 | 152778 | Y | MH394417 |
| 06E | ×10 | Cinnamomum bodinieri | Lauraceae | 32065328 | 4.81 | 13 | 152719 | Y | MH394418 |
| 06A | ×12 | Cinnamomum bodinieri | Lauraceae | 24488292 | 3.67 | 7 | 152719 | Y | MH394415 |
| 06B | ×14 | Cinnamomum bodinieri | Lauraceae | 35035602 | 5.26 | 11 | 152719 | Y | MH394416 |
| 08D | ×8 | Holboellia latifolia | Lardizabalaceae | 26229946 | 3.93 | 5 | 157817 | Y | MH394377 |
| 08E | ×10 | Holboellia latifolia | Lardizabalaceae | 28273022 | 4.24 | 9 | 157818 | Y | MH394378 |
| 08A | ×12 | Holboellia latifolia | Lardizabalaceae | 33873136 | 5.08 | 13 | 157614 | 204 bp gap | MH394375 |
| 08B | ×14 | Holboellia latifolia | Lardizabalaceae | 34021360 | 5.1 | 10 | 157818 | Y | MH394376 |
| 09D | ×8 | Chloranthus erectus | Chloranthaceae | 21843512 | 3.28 | 4 | 157812 | 43 bp gap | MH394413 |
| 09E | ×10 | Chloranthus erectus | Chloranthaceae | 18044364 | 2.71 | 5 | 157812 | 47 bp gap | MH394414 |
| 09A | ×12 | Chloranthus erectus | Chloranthaceae | 30022162 | 4.5 | 13 | 157852 | Y | MH394411 |
| 09B | ×14 | Chloranthus erectus | Chloranthaceae | 28656686 | 4.3 | 11 | 157852 | Y | MH394412 |
| 10D | ×8 | Sarcandra glabra | Chloranthaceae | 18893508 | 2.83 | 5 | 158733 | 119 bp gap | MH394361 |
| 10E | ×10 | Sarcandra glabra | Chloranthaceae | 20662770 | 3.1 | 7 | 159007 | 22 bp gap | MH394362 |
| 10A | ×12 | Sarcandra glabra | Chloranthaceae | 27510166 | 4.13 | 9 | 158900 | Y | MH394360 |
| 10B | ×14 | Sarcandra glabra | Chloranthaceae | 29545206 | 4.43 | 9 | 158900 | Y | MH394431 |
| 11D | ×8 | Meconopsis racemosa | Papaveraceae | 24351884 | 3.65 | 5 | 153762 | Y | MH394401 |
| 11E | ×10 | Meconopsis racemosa | Papaveraceae | 29160582 | 4.37 | 5 | 153762 | Y | MH394402 |
| 11A | ×12 | Meconopsis racemosa | Papaveraceae | 33763340 | 5.06 | 6 | 153763 | Y | MH394399 |
| 11B | ×14 | Meconopsis racemosa | Papaveraceae | 35990358 | 5.4 | 4 | 153728 | 1 bp gap | MH394400 |
| 12D | ×8 | Macleaya microcarpa | Papaveraceae | 26265548 | 3.94 | 11 | 161064 | 48 bp gap | MH394385 |
| 12E | ×10 | Macleaya microcarpa | Papaveraceae | 25100372 | 3.77 | 11 | 161064 | 48 bp gap | MH394386 |
| 12A | ×12 | Macleaya microcarpa | Papaveraceae | 29491952 | 4.42 | 13 | 161118 | Y | MH394383 |
| 12B | ×14 | Macleaya microcarpa | Papaveraceae | 28462338 | 4.27 | 12 | 161110 | 2 bp gap | MH394384 |
| 13D | ×8 | Hodgsonia macrocarpa | Cucurbitaceae | 26886870 | 4.03 | 26 | 155027 | 1300 bp gap | MH394428 |
| 13E | ×10 | Hodgsonia macrocarpa | Cucurbitaceae | 34179418 | 5.13 | 16 | 154855 | 1298 bp gap | MH394429 |
| 13A | ×12 | Hodgsonia macrocarpa | Cucurbitaceae | 37182144 | 5.58 | 18 | 156015 | 20 bp gap | MH394426 |
| 13B | ×14 | Hodgsonia macrocarpa | Cucurbitaceae | 36782268 | 5.52 | 17 | 156146 | Y | MH394427 |
| 14D | ×8 | Malus yunnanensis | Rosaceae | 22107718 | 3.32 | 16 | 158955 | 820 bp gap | MH394389 |
| 14E | ×10 | Malus yunnanensis | Rosaceae | 25720160 | 3.86 | 5 | 160071 | Y | MH394390 |
| 14A | ×12 | Malus yunnanensis | Rosaceae | 37501036 | 5.63 | 5 | 160067 | Y | MH394387 |
| 14B | ×14 | Malus yunnanensis | Rosaceae | 33776058 | 5.07 | 5 | 160068 | Y | MH394388 |
| 15D | ×8 | Elaeagnus loureirii | Elaeagnaceae | 15195822 | 2.28 | 5 | 152196 | 8 bp gap | MH394424 |
| 15E | ×10 | Elaeagnus loureirii | Elaeagnaceae | 16862680 | 2.53 | 5 | 152196 | 8 bp gap | MH394425 |
| 15A | ×12 | Elaeagnus loureirii | Elaeagnaceae | 21511050 | 3.23 | 4 | 152199 | 5 bp gap | MH394422 |
| 15B | ×14 | Elaeagnus loureirii | Elaeagnaceae | 20556860 | 3.08 | 6 | 152199 | 5 bp gap | MH394423 |
| 16D | ×8 | Rhododendron rex subsp. fictolacteum | Ericaceae | 23623070 | 3.54 | | | | |
| 16E | ×10 | Rhododendron rex subsp. fictolacteum | Ericaceae | 28092596 | 4.21 | | | | |
| 16A | ×12 | Rhododendron rex subsp. fictolacteum | Ericaceae | 31352560 | 4.7 | | | | |
| 16B | ×14 | Rhododendron rex subsp. fictolacteum | Ericaceae | 30525730 | 4.58 | | | | |
| 17D | ×8 | Swertia bimaculata | Gentianaceae | 18303136 | 2.77 | 53 | 152808 | 266 bp gap | MH394373 |
| 17E | ×10 | Swertia bimaculata | Gentianaceae | 16559554 | 2.48 | 41 | 153443 | 406 bp gap | MH394374 |
| 17A | ×12 | Swertia bimaculata | Gentianaceae | 15877478 | 2.38 | 30 | 143977 | 9947 bp gap | MH394371 |
| 17B | ×14 | Swertia bimaculata | Gentianaceae | 18448302 | 2.77 | 48 | 153602 | 341 bp gap | MH394372 |
| 18D | ×8 | Primula sinopurpurea | Primulaceae | 22890598 | 3.43 | 5 | 151945 | 50 bp gap | MH394358 |
| 18E | ×10 | Primula sinopurpurea | Primulaceae | 26618684 | 3.99 | 5 | 151945 | 50 bp gap | MH394359 |
| 18A | ×12 | Primula sinopurpurea | Primulaceae | 24107472 | 3.62 | 3 | 151945 | 50 bp gap | MH394356 |
| 18B | ×14 | Primula sinopurpurea | Primulaceae | 25834066 | 3.88 | 3 | 151945 | 50 bp gap | MH394357 |
| 19D | ×8 | Paederia scandens | Araceae | 25307356 | 3.8 | 15 | 162267 | 247 bp gap | MH394346 |
| 19E | ×10 | Paederia scandens | Araceae | 24658068 | 3.7 | 7 | 162268 | 247 bp gap | MH394347 |
| 19A | ×12 | Paederia scandens | Araceae | 23850180 | 3.58 | 8 | 162282 | 253 bp gap | MH394344 |
| 19B | ×14 | Paederia scandens | Araceae | 24064764 | 3.61 | 10 | 162139 | 253 bp gap | MH394345 |
| 20D | ×8 | Colocasia esculenta | Araceae | 29284270 | 4.39 | 4 | 162350 | 155 bp gap | MH394430 |
| 20E | ×10 | Colocasia esculenta | Araceae | 25045978 | 3.77 | 5 | 162350 | 155 bp gap | MH394421 |
| 20A | ×12 | Colocasia esculenta | Araceae | 23560322 | 3.53 | 6 | 162414 | 155 bp gap | MH394419 |
| 20B | ×14 | Colocasia esculenta | Araceae | 24533656 | 3.68 | 4 | 162414 | 155 bp gap | MH394420 |
| 21D | ×8 | Pholidota chinensis | Orchidaceae | 21688990 | 3.25 | | | | |
| 21E | ×10 | Pholidota chinensis | Orchidaceae | 20880950 | 3.13 | | | | |
| 21A | ×12 | Pholidota chinensis | Orchidaceae | 23548018 | 3.53 | | | | |
| 21B | ×14 | Pholidota chinensis | Orchidaceae | 27148284 | 4.07 | | | | |
| 22D | ×8 | Otochilus porrectus | Orchidaceae | 15550512 | 2.33 | | | | |
| 22E | ×10 | Otochilus porrectus | Orchidaceae | 22638772 | 3.4 | | | | |
| 22A | ×12 | Otochilus porrectus | Orchidaceae | 21572196 | 3.23 | | | | |
| 22B | ×14 | Otochilus porrectus | Orchidaceae | 28960858 | 4.34 | | | | |
| 23D | ×8 | Indosasa sinica | Gramineae | 18793020 | 2.82 | 6 | 139848 | 18 bp gap | MH394381 |
| 23E | ×10 | Indosasa sinica | Gramineae | 17903432 | 2.69 | 10 | 139740 | Y | MH394382 |
| 23A | ×12 | Indosasa sinica | Gramineae | 19106404 | 2.87 | 9 | 139740 | Y | MH394379 |
| 23B | ×14 | Indosasa sinica | Gramineae | 19668682 | 2.95 | 8 | 139740 | Y | MH394380 |
| 24D | ×8 | Camellia gymnogyna | Theaceae | 17176632 | 2.58 | 4 | 156402 | Y | MH394405 |
| 24E | ×10 | Camellia gymnogyna | Theaceae | 24532196 | 3.68 | 7 | 156590 | Y | MH394406 |
| 24A | ×12 | Camellia gymnogyna | Theaceae | 26478224 | 3.97 | 4 | 156590 | Y | MH394403 |
| 24B | ×14 | Camellia gymnogyna | Theaceae | 29768770 | 4.47 | 4 | 156590 | Y | MH394404 |
| 25D | ×8 | Camellia sinensis var. assamica | Theaceae | 23291572 | 3.49 | 4 | 157028 | Y | MH394409 |
| 25E | ×10 | Camellia sinensis var. assamica | Theaceae | 18698814 | 2.8 | 5 | 157028 | Y | MH394410 |
| 25A | ×12 | Camellia sinensis var. assamica | Theaceae | 21788776 | 3.27 | 4 | 157029 | Y | MH394407 |
| 25B | ×14 | Camellia sinensis var. assamica | Theaceae | 26155342 | 3.92 | 8 | 157028 | Y | MH394408 |
| 26D | ×8 | Panicum incomtum | Gramineae | 16865102 | 2.53 | 61 | 139986 | Y | MH394350 |
| 26E | ×10 | Panicum incomtum | Gramineae | 20465942 | 3.07 | 21 | 139999 | Y | MH394351 |
| 26A | ×12 | Panicum incomtum | Gramineae | 20004364 | 3 | 18 | 139999 | Y | MH394348 |
| 26B | ×14 | Panicum incomtum | Gramineae | 20672642 | 3.1 | 17 | 139999 | Y | MH394349 |
After de novo assembly, two species (Otochilus porrectus and Pholidota chinensis) generated poor plastid assemblies, with the longest contigs being 6705 bp with 2 × coverage and 1325 bp with 3 × coverage, respectively. The other 23 species yielded useful plastid assemblies drawn from 3 to 61 contigs, assembled into plastid genomes with depths ranging from 459 × to 2176 ×. Of these 23 species, 14 were assembled into complete plastid genomes. Eight species were assembled into nearly complete plastid genomes, with gaps ranging from 5 to 349 bp (Table 2). However, although Rhododendron rex subsp. fictolacteum yielded useful plastid assemblies, many gaps were detected among contigs when Vaccinium macrocarpon was used as the reference. For the nuclear rDNAs, 21 species gave ribosomal DNA sequence assemblies > 4.3 kb drawn from 1 to 2 contigs, with sequencing depths ranging from 3 × to 567 × (no nrDNA sequences could be assembled for Pholidota chinensis, Paederia scandens, Otochilus porrectus, and Camellia gymnogyna) (Table 3). Of these 21 species, 18 resulted in assembled nrDNAs consisting of partial sequences of 18S and 26S, along with the complete sequence of 5.8S and the internal transcribed spacers ITS1 and ITS2. However, three samples (two of Manglietia fordiana, Sample IDs 01 and 02, and one of Phoebe neurantha, Sample ID 05) were difficult to assemble, resulting in only partial recovery of 5.8S and the internal transcribed spacers ITS1 and ITS2.
Table 3

Assembly statistics of rDNAs for all specimens used in this study

| Sample ID | PCR cycles | Species | Family | #contigs | Total assembly length (bp) | Mean coverage (×) | Reference genome | GenBank accession number |
|---|---|---|---|---|---|---|---|---|
| 01A | ×12 | Manglietia fordiana | Magnoliaceae | 2 | 10343 | 406 | KJ414477_Chrysobalanus icaco | MH270473 |
| 02A | ×12 | Manglietia fordiana | Magnoliaceae | 2 | 8637 | 67 | | MH270474 |
| 03A | ×12 | Schisandra henryi | Schisandraceae | 1 | 15487 | 47 | | MH270475 |
| 04A | ×12 | Schisandra henryi | Schisandraceae | 1 | 10747 | 78 | | MH270476 |
| 05A | ×12 | Phoebe neurantha | Lauraceae | 2 | 7516 | 19 | | MH270477 |
| 06A | ×12 | Cinnamomum bodinieri | Lauraceae | 1 | 10926 | 32 | | MH270478 |
| 08A | ×12 | Holboellia latifolia | Lardizabalaceae | 1 | 9298 | 160 | | MH270479 |
| 09A | ×12 | Chloranthus erectus | Chloranthaceae | 1 | 9094 | 54 | | MH270480 |
| 10A | ×12 | Sarcandra glabra | Chloranthaceae | 1 | 9062 | 51 | | MH270481 |
| 11A | ×12 | Meconopsis racemosa | Papaveraceae | 1 | 7577 | 60 | | MH270482 |
| 12A | ×12 | Macleaya microcarpa | Papaveraceae | 1 | 12587 | 458 | | MH270483 |
| 13A | ×12 | Hodgsonia macrocarpa | Cucurbitaceae | 1 | 10172 | 567 | | MH270484 |
| 14A | ×12 | Malus yunnanensis | Rosaceae | 1 | 5953 | 249 | | MH270485 |
| 15A | ×12 | Elaeagnus loureirii | Elaeagnaceae | 1 | 7901 | 428 | | MH270486 |
| 16A | ×12 | Rhododendron rex subsp. fictolacteum | Ericaceae | 1 | 6825 | 380 | | MH270487 |
| 17A | ×12 | Swertia bimaculata | Gentianaceae | 1 | 9644 | 48 | | MH270488 |
| 18A | ×12 | Primula sinopurpurea | Primulaceae | 1 | 5539 | 15 | | MH270489 |
| 19A | ×12 | Paederia scandens | Araceae | | | | | |
| 20A | ×12 | Colocasia esculenta | Araceae | 1 | 4399 | 5 | | MH270490 |
| 21A | ×12 | Pholidota chinensis | Orchidaceae | | | | | |
| 22A | ×12 | Otochilus porrectus | Orchidaceae | | | | | |
| 23A | ×12 | Indosasa sinica | Gramineae | 1 | 17306 | 93 | | MH270491 |
| 24A | ×12 | Camellia gymnogyna | Theaceae | | | | | |
| 25A | ×12 | Camellia sinensis var. assamica | Theaceae | 1 | 11212 | 46 | | MH270493 |
| 26A | ×12 | Panicum incomtum | Gramineae | 1 | 8446 | 74 | | MH270494 |
To check the quality of the plastid sequences, all gene regions were translated. No stop codons that would be indicative of sequencing errors were detected within the assembled contigs. We then extracted about 1400 bp of rbcL sequence from 23 of the samples to check for contamination (for Rhododendron rex subsp. fictolacteum (Sample ID 16), the plastid genome was not assembled successfully, but we could nevertheless extract the rbcL sequence from the plastid contigs). These rbcL sequences were subjected to a BLAST search against the NCBI database. The rbcL sequences contained no insertions or deletions and matched the correct genus or family in each case (Table 4). Likewise, we ran BLAST searches of the final assembled rDNA ITS sequences (or partial ITS sequences) from 24 samples against the NCBI database. In all cases, the closest match to the sequence was from the family of the sequenced sample. No matches with fungi were detected (Table 5).
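The translation check described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it uses the standard codon assignments (assumed here to be appropriate for plastid protein-coding genes), ignores RNA editing, and the function names and the toy sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard codon assignments in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def internal_stops(cds: str) -> list:
    """Zero-based codon indices of stop codons that occur before the final codon."""
    protein = translate(cds)
    return [i for i, aa in enumerate(protein[:-1]) if aa == "*"]


# Toy sequences for illustration only
print(translate("ATGGCTTAA"), internal_stops("ATGGCTTAA"))
print(translate("ATGTAAGCTTGA"), internal_stops("ATGTAAGCTTGA"))
```

An empty list from `internal_stops` corresponds to the situation reported here, in which no premature stop codons were found.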
Table 4

BLAST results with extracted rbcL sequence against GenBank

| Query sample ID | Query species (family) | PCR cycles | Gene name | Length (bp) | Reference species_accession number (family) | Query coverage (%) | Identities (%) | Identification level |
|---|---|---|---|---|---|---|---|---|
| 01A | Manglietia fordiana (Magnoliaceae) | 12 | rbcL | 1428 | Magnolia cathcartii_JX280392.1 (Magnoliaceae) | 100 | 99 | Family |
| | | | | | Magnolia biondii_KY085894.1 (Magnoliaceae) | 100 | 99 | |
| | | | | | Michelia odora_JX280398.1 (Magnoliaceae) | 100 | 99 | |
| | | | | | Manglietia fordiana_L12658.1 (Magnoliaceae) | 98 | 100 | |
| 02A | Manglietia fordiana (Magnoliaceae) | 12 | rbcL | 1428 | Magnolia cathcartii_JX280392.1 (Magnoliaceae) | 100 | 99 | Family |
| | | | | | Magnolia biondii_KY085894.1 (Magnoliaceae) | 100 | 99 | |
| | | | | | Michelia odora_JX280398.1 (Magnoliaceae) | 100 | 99 | |
| | | | | | Manglietia fordiana_L12658.1 (Magnoliaceae) | 98 | 100 | |
| 03A | Schisandra henryi (Schisandraceae) | 12 | rbcL | 1428 | Schisandra chinensis_KY111264.1 (Schisandraceae) | 100 | 99 | Genus |
| | | | | | Schisandra chinensis_KU362793.1 (Schisandraceae) | 100 | 99 | |
| | | | | | Schisandra sphenanthera_L12665.2 (Schisandraceae) | 98 | 99 | |
| 04A | Schisandra henryi (Schisandraceae) | 12 | rbcL | 1428 | Schisandra chinensis_KY111264.1 (Schisandraceae) | 100 | 99 | Genus |
| | | | | | Schisandra chinensis_KU362793.1 (Schisandraceae) | 100 | 99 | |
| | | | | | Schisandra sphenanthera_L12665.2 (Schisandraceae) | 98 | 99 | |
| 05A | Phoebe neurantha (Lauraceae) | 12 | rbcL | 1428 | Phoebe omeiensis_KX437772.1 (Lauraceae) | 100 | 99 | Family |
| | | | | | Persea americana_KX437771.1 (Lauraceae) | 100 | 99 | |
| | | | | | Persea sp._JF966606.1 (Lauraceae) | 100 | 99 | |
| 06A | Cinnamomum bodinieri (Lauraceae) | 12 | rbcL | 1428 | Phoebe bournei_KY346512.1 (Lauraceae) | 100 | 99 | Family |
| | | | | | Phoebe chekiangensis_KY346511.1 (Lauraceae) | 100 | 99 | |
| | | | | | Phoebe sheareri_KX437773.1 (Lauraceae) | 100 | 99 | |
| | | | | | Cinnamomum verum_KY635878.1 (Lauraceae) | 100 | 99 | |
| 08A | Holboellia latifolia (Lardizabalaceae) | 12 | rbcL | 1428 | Akebia quinata_KX611091.1 (Lardizabalaceae) | 100 | 99 | Family |
| | | | | | Stauntonia hexaphylla_L37922.2 (Lardizabalaceae) | 99 | 99 | |
| | | | | | Akebia trifoliata_KU204898.1 (Lardizabalaceae) | 100 | 99 | |
| | | | | | Holboellia latifolia_L37918.2 (Lardizabalaceae) | 99 | 99 | |
| 09A | Chloranthus erectus (Chloranthaceae) | 12 | rbcL | 1428 | Chloranthus spicatus_EF380352.1 (Chloranthaceae) | 100 | 100 | Genus |
| | | | | | Chloranthus japonicus_KP256024.1 (Chloranthaceae) | 100 | 99 | |
| | | | | | Chloranthus spicatus_AY236835.1 (Chloranthaceae) | 98 | 99 | |
| | | | | | Chloranthus erectus_AY236834.1 (Chloranthaceae) | 98 | 99 | |
| 10A | Sarcandra glabra (Chloranthaceae) | 12 | rbcL | 1428 | Chloranthus spicatus_EF380352.1 (Chloranthaceae) | 100 | 99 | Family |
| | | | | | Chloranthus japonicus_KP256024.1 (Chloranthaceae) | 100 | 98 | |
| | | | | | Chloranthus nervosus_AY236841.1 (Chloranthaceae) | 97 | 98 | |
| | | | | | Sarcandra glabra_HQ336522.1 (Chloranthaceae) | 89 | 100 | |
| 11A | Meconopsis racemosa (Papaveraceae) | 12 | rbcL | 1428 | Meconopsis horridula_JX087717.1 (Papaveraceae) | 97 | 100 | Genus |
| | | | | | Meconopsis horridula_JX087712.1 (Papaveraceae) | 97 | 99 | |
| | | | | | Meconopsis delavayi_JX087688.1 (Papaveraceae) | 97 | 99 | |
| 12A | Macleaya microcarpa (Papaveraceae) | 12 | rbcL | 1428 | Macleaya microcarpa_FJ626612.1 (Papaveraceae) | 97 | 99 | Family |
| | | | | | Macleaya cordata_U86629.1 (Papaveraceae) | 97 | 99 | |
| | | | | | Coreanomecon hylomeconoides_KT274030.1 (Papaveraceae) | 100 | 98 | |
| 13A | Hodgsonia macrocarpa (Cucurbitaceae) | 12 | rbcL | 1449 | Cucumis sativus var. hardwickii_KT852702.1 (Cucurbitaceae) | 100 | 98 | Family |
| | | | | | Cucumis sativus_KX231330.1 (Cucurbitaceae) | 100 | 98 | |
| | | | | | Cucumis sativus_KX231329.1 (Cucurbitaceae) | 100 | 98 | |
| 14A | Malus yunnanensis (Rosaceae) | 12 | rbcL | 1428 | Cotoneaster franchetii_KY419994.1 (Rosaceae) | 100 | 99 | Family |
| | | | | | Vauquelinia californica_KY419925.1 (Rosaceae) | 100 | 99 | |
| | | | | | Cotoneaster horizontalis_KY419917.1 (Rosaceae) | 100 | 99 | |
| | | | | | Malus doumeri_KX499861.1 (Rosaceae) | 100 | 99 | |
| 15A | Elaeagnus loureirii (Elaeagnaceae) | 12 | rbcL | 1428 | Elaeagnus macrophylla_KP211788.1 (Elaeagnaceae) | 100 | 99 | Order |
| | | | | | Elaeagnus sp._KY420020.1 (Elaeagnaceae) | 100 | 99 | |
| | | | | | Toricellia angulata_KX648359.1 (Cornaceae) | 99 | 99 | |
| 16A | Rhododendron rex subsp. fictolacteum (Ericaceae) | 12 | rbcL | 1428 | Rhododendron simsii_GQ997829.1 (Ericaceae) | 100 | 99 | Family |
| | | | | | Rhododendron ponticum_KM360957.1 (Ericaceae) | 98 | 99 | |
| | | | | | Epacris sp._L01915.2 (Ericaceae) | 97 | 99 | |
| 17A | Swertia bimaculata (Gentianaceae) | 12 | rbcL | 1443 | Swertia mussotii_KU641021.1 (Gentianaceae) | 98 | 99 | Family |
| | | | | | Gentianopsis ciliata_KM360802.1 (Gentianaceae) | 97 | 98 | |
| | | | | | Gentianella rapunculoides_Y11862.1 (Gentianaceae) | 97 | 99 | |
| 18A | Primula sinopurpurea (Primulaceae) | 12 | rbcL | 1428 | Primula poissonii_KX668176.1 (Primulaceae) | 100 | 99 | Genus |
| | | | | | Primula chrysochlora_KX668178.1 (Primulaceae) | 100 | 99 | |
| | | | | | Primula poissonii_KF753634.1 (Primulaceae) | 100 | 99 | |
| 19A | Paederia scandens (Araceae) | 12 | rbcL | 1443 | Pothos scandens_AM905732.1 (Araceae) | 96 | 99 | Family |
| | | | | | Pedicellarum paiei_AM905733.1 (Araceae) | 96 | 99 | |
| | | | | | Pothoidium lobbianum_AM905734.1 (Araceae) | 96 | 99 | |
| 20A | Colocasia esculenta (Araceae) | 12 | rbcL | 1443 | Colocasia esculenta_JN105690.1 (Araceae) | 100 | 100 | Species |
| | | | | | Colocasia esculenta_JN105689.1 (Araceae) | 100 | 99 | |
| | | | | | Pinellia pedatisecta_KT025709.1 (Araceae) | 100 | 99 | |
| 21A | Pholidota chinensis (Orchidaceae) | 12 | rbcL | | | | | |
| 22A | Otochilus porrectus (Orchidaceae) | 12 | rbcL | | | | | |
| 23A | Indosasa sinica (Poaceae) | 12 | rbcL | 1434 | Pleioblastus maculatus_JX513424.1 (Poaceae) | 100 | 100 | Family |
| | | | | | Oligostachyum shiuyingianum_JX513423.1 (Poaceae) | 100 | 100 | |
| | | | | | Indosasa sinica_JX513422.1 (Poaceae) | 100 | 100 | |
| 24A | Camellia gymnogyna (Theaceae) | 12 | rbcL | 1428 | Camellia szechuanensis_KY406778.1 (Theaceae) | 100 | 100 | Family |
| | | | | | Pyrenaria menglaensis_KY406747.1 (Theaceae) | | | |
| | | | | | Camellia luteoflora_KY626042.1 (Theaceae) | | | |
| 25A | Camellia sinensis var. assamica (Theaceae) | 12 | rbcL | 1428 | Camellia szechuanensis_KY406778.1 (Theaceae) | 100 | 100 | Family |
| | | | | | Pyrenaria menglaensis_KY406747.1 (Theaceae) | 100 | 100 | |
| | | | | | Camellia luteoflora_KY626042.1 (Theaceae) | 100 | 100 | |
| | | | | | Camellia sinensis var. assamica_JQ975030.1 (Theaceae) | 100 | 100 | |
| 26A | Panicum incomtum (Poaceae) | 12 | rbcL | 1434 | Lecomtella madagascariensis_HF543599.2 (Poaceae) | 99 | 99 | Family |
| | | | | | Chasechloa madagascariensis_KX663838.1 (Poaceae) | 99 | 99 | |
| | | | | | Amphicarpum muhlenbergianum_KU291489.1 (Poaceae) | 99 | 99 | |
| | | | | | Panicum virgatum_HQ731441.1 (Poaceae) | 100 | 99 | |
Table 5

BLAST results with extracted ITS sequence against GenBank

| Query sample ID | Query species (family) | PCR cycles | Gene name | Length (bp) | Reference species_accession number (family) | Query coverage | Identities |
|---|---|---|---|---|---|---|---|
| 01A | Manglietia fordiana (Magnoliaceae) | 12 | ITS | 369 | Magnolia virginiana_DQ499097.1 (Magnoliaceae) | 100% | 95% |
| 02A | Manglietia fordiana (Magnoliaceae) | 12 | ITS | 349 | Magnolia virginiana_DQ499097.1 (Magnoliaceae) | 100% | 95% |
| 03A | Schisandra henryi (Schisandraceae) | 12 | ITS | 676 | Schisandra pubescens_AF263436.1 (Schisandraceae) | 99% | 100% |
| 04A | Schisandra henryi (Schisandraceae) | 12 | ITS | 676 | Schisandra pubescens_JF978533.1 (Schisandraceae) | 99% | 99% |
| 05A | Phoebe neurantha (Lauraceae) | 12 | ITS | 518 | Phoebe neurantha_FM957847.1 (Lauraceae) | 100% | 99% |
| 06A | Cinnamomum bodinieri (Lauraceae) | 12 | ITS | 603 | Cinnamomum micranthum f. kanehirae_KP218515.1 (Lauraceae) | 100% | 99% |
| 08A | Holboellia latifolia (Lardizabalaceae) | 12 | ITS | 677 | Holboellia angustifolia subsp. angustifolia_AY029790.1 (Lardizabalaceae) | 100% | 99% |
| 09A | Chloranthus erectus (Chloranthaceae) | 12 | ITS | 663 | Chloranthus erectus_AF280410.1 (Chloranthaceae) | 99% | 99% |
| 10A | Sarcandra glabra (Chloranthaceae) | 12 | ITS | 667 | Sarcandra glabra_KWNU91871 (Chloranthaceae) | 100% | 100% |
| 11A | Meconopsis racemosa (Papaveraceae) | 12 | ITS | 671 | Meconopsis racemosa_JF411034.1 (Papaveraceae) | 100% | 99% |
| 12A | Macleaya microcarpa (Papaveraceae) | 12 | ITS | 612 | Macleaya cordata_AY328307.1 (Papaveraceae) | 99% | 89% |
| 13A | Hodgsonia macrocarpa (Cucurbitaceae) | 12 | ITS | 614 | Hodgsonia heteroclita_HE661302.1 (Cucurbitaceae) | 100% | 98% |
| 14A | Malus yunnanensis (Rosaceae) | 12 | ITS | 596 | Malus prattii_JQ392445.1 (Rosaceae) | 99% | 99% |
| 15A | Elaeagnus loureirii (Elaeagnaceae) | 12 | ITS | 649 | Elaeagnus macrophylla_JQ062495.1 (Elaeagnaceae) | 99% | 99% |
| 16A | Rhododendron rex subsp. fictolacteum (Ericaceae) | 12 | ITS | 646 | Rhododendron rex subsp. fictolacteum_KM605995.1 (Ericaceae) | 100% | 100% |
| 17A | Swertia bimaculata (Gentianaceae) | 12 | ITS | 626 | Swertia bimaculata_JF978819.2 (Gentianaceae) | 100% | 99% |
| 18A | Primula sinopurpurea (Primulaceae) | 12 | ITS | 631 | Primula melanops_JF978004.1 (Primulaceae) | 100% | 99% |
| 19A | Paederia scandens (Araceae) | 12 | ITS | | | | |
| 20A | Colocasia esculenta (Araceae) | 12 | ITS | 552 | Colocasia esculenta_AY081000.1 (Araceae) | 99% | 99% |
| 21A | Pholidota chinensis (Orchidaceae) | 12 | ITS | | | | |
| 22A | Otochilus porrectus (Orchidaceae) | 12 | ITS | | | | |
| 23A | Indosasa sinica (Poaceae) | 12 | ITS | 604 | Oligostachyum sulcatum_EU847131.1 (Poaceae) | 98% | 99% |
| 24A | Camellia gymnogyna (Theaceae) | 12 | ITS | | | | |
| 25A | Camellia sinensis var. assamica (Theaceae) | 12 | ITS | 645 | Camellia sinensis var. sinensis_FJ004871.1 (Theaceae) | 99% | 99% |
| 26A | Panicum incomtum (Poaceae) | 12 | ITS | 795 | Chasechloa egregia_LT593967.1 (Poaceae) | 100% | 98% |
One-way analyses of variance (ANOVA) were performed to test the total reads against PCR cycles, PCR cycles against plastid contig numbers, PCR cycles against plastid genome assembly length, PCR cycles against plastid mean depth, and PCR cycles against plastid coverage. We found no significant correlation between PCR cycles and plastid contig numbers, plastid genome assembly length, or plastid coverage. There was, however, a significant positive correlation between the number of PCR cycles and both the total number of reads and the plastid mean depth (Fig. 2).
Fig. 2

PCR cycles with raw data, contigs, and assembly length
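
As a rough illustration of the one-way ANOVA described above, the following Python sketch computes the F statistic for a response variable (for example, total reads) grouped by the number of PCR cycles. The cycle numbers and read counts are hypothetical placeholders, not values from this study, and the function name `one_way_anova_f` is purely illustrative; this is not necessarily the software or procedure the authors used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per factor level.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# HYPOTHETICAL data: total reads (millions) per library, keyed by PCR cycles.
# Neither the cycle levels nor the read counts come from the paper.
reads_by_cycles = {
    12: [1.0, 2.0, 3.0],
    14: [2.0, 3.0, 4.0],
    16: [3.0, 4.0, 5.0],
}

f_stat, df_b, df_w = one_way_anova_f(list(reads_by_cycles.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F value would then be compared against the F distribution with the returned degrees of freedom to obtain a p-value; where SciPy is available, `scipy.stats.f_oneway` performs the same test and reports the p-value directly.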

Finally, when comparing plastome assembly coverage with the C values of the species concerned, we found a slight negative but non-significant correlation (Fig. 3), which suggests, at least for our sampling, that plastome assembly coverage is not affected by the nuclear genome size of the specimen concerned.
Fig. 3

Plastome coverage versus C value (pg DNA per 1C) of all samples assembled in this study
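
The comparison of plastome coverage with C value can be checked with a correlation coefficient. The Python sketch below is only illustrative: the C values and coverage percentages are hypothetical placeholders rather than the data plotted in Fig. 3, and the use of Pearson's r (rather than, say, a rank correlation) is an assumption, since the test used is not specified here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("t is undefined for n < 3 or |r| >= 1")
    return r * sqrt((n - 2) / (1 - r * r))


# HYPOTHETICAL data: C value (pg DNA per 1C) and plastome coverage (%).
c_values = [1.0, 2.0, 3.0, 4.0]
coverage = [100.0, 99.0, 100.0, 98.0]

r = pearson_r(c_values, coverage)
t = t_statistic(r, len(c_values))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(c_values) - 2}")
```

A negative r whose t statistic does not exceed the critical value for n - 2 degrees of freedom would correspond to the "slight negative but non-significant" pattern reported above.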


Discussion

Sequencing herbarium specimens from low amounts of starting DNA

Our study demonstrated the recovery of plastid genome sequences and rDNA sequences from herbarium specimens, some up to 80 years old. We used small amounts of starting tissue (c. 1 cm²) and extremely low initial amounts (500 pg) of degraded starting DNA. This success with a small amount of starting tissue is important, and demonstrates the practical feasibility of organelle genome and rDNA recovery with minimal impact on specimens. These findings, in the context of studies by others (e.g. Bakker et al. [14]), confirm that genome skimming can be performed with limited sample destruction, enabling relatively straightforward access to high-copy-number DNA in preserved herbarium specimens spanning a wide phylogenetic coverage.

To accommodate the use of only 500 pg of input DNA, we modified the library protocol in three ways: we removed the step of DNA fragmentation by sonication because the DNA was already highly degraded, we did not undertake any size selection, and we increased the number of PCR cycles to enrich the indexed library. After library preparation and Illumina paired-end sequencing, a sufficient number of read pairs (> 15,000,000) was generated for our 25 specimens and 100 libraries. This strategy allowed the generation of complete or near-complete plastid genomes with depths ranging from 459× to 2176×, and nuclear ribosomal units with sequencing depths of 3× to 567×, for 23 and 24 specimens respectively. Despite the low starting concentration, no plant or fungal contaminants were obviously detectable in the assembled plastomes and rDNA sequences. For herbarium plastome assembly, the procedures and parameters for sequence quality control, de novo assembly, BLAST search and genome annotation followed Yang et al. [25]. Processing our 25 specimens (100 libraries) took c. 5 h per specimen on a Linux workstation with 3 TB of RAM and 32 cores, and this did not differ significantly between fresh and herbarium specimens.

Recovery of widely used loci in plant molecular systematics

A benefit of the genome skimming approach is that it can recover loci widely used in previous molecular systematics studies (e.g. Coissac et al. 2016 [12]). Here we recovered the standard rbcL DNA barcode region from 23/25 samples, the standard matK DNA barcode region from 23/25 samples, the standard trnH-psbA DNA barcode region from 23/25 samples, the trnL intron from 23/25 samples, and ITS1 and ITS2 from 20/25 and 19/25 samples, respectively. In addition to these standard DNA barcoding loci, we also recovered many other regions used as supplementary barcode markers (e.g. atpF-H, psbK-I). The data produced with this approach can thus contribute towards standard and extended DNA barcode reference libraries [12], help identify additional regions that are informative for any given clade [28], and provide data for phylogenomic investigations to elucidate the relationships amongst plant groups.

Practical benefits

A primary motivation for this study was our own experience of suboptimal DNA recovery from herbarium specimens using Sanger sequencing, coupled with difficulty in accessing fresh material of some species. The success of this method using only small amounts of starting tissue from herbarium specimens is an important step towards addressing these challenges. It makes sequencing type specimens a realistic proposition, which can further serve to integrate genetic data into the existing taxonomic framework. A second practical benefit is that field work is often not possible in some geographical regions where past collections have been made. Political instability and/or general inaccessibility can preclude current collecting activities, and where habitats have been highly degraded or destroyed, the species concerned may simply no longer be available for collection. Mining herbaria to obtain sequences from previously collected material can circumvent this problem. Thirdly, sequencing plastid genomes and rDNA arrays from specimens that are many decades old enables a baseline to be established for haplotype and ribotype diversity. This baseline can then be used to assess evidence for genetic diversity loss or change due to recent population declines or environmental change.

Conclusions

This study confirms the practical and routine application of genome skimming for recovering sequences from plastid genomes and rDNA from small amounts of starting tissue from preserved herbarium specimens. The ongoing development of new sequencing technologies is creating a fundamental shift in the ease of recovery of nucleotide sequences enabling ‘new uses’ for the hundreds of millions of existing herbarium specimens [1, 10, 14, 16, 29]. This shift from Sanger sequencing to NGS approaches has now firmly moved herbarium specimens into the genomic era.
  24 in total

1.  'Ghost' alleles of the Mauritius kestrel.

Authors:  J J Groombridge; C G Jones; M W Bruford; R A Nichols
Journal:  Nature       Date:  2000-02-10       Impact factor: 49.962

2.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

3.  Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics.

Authors:  Shannon C K Straub; Matthew Parks; Kevin Weitemier; Mark Fishbein; Richard C Cronn; Aaron Liston
Journal:  Am J Bot       Date:  2011-12-14       Impact factor: 3.844

4.  Illumina sequencing library preparation for highly multiplexed target capture and sequencing.

Authors:  Matthias Meyer; Martin Kircher
Journal:  Cold Spring Harb Protoc       Date:  2010-06

5.  Association of enzyme inhibition with methods of museum skin preparation.

Authors:  L M Hall; M S Willcox; D S Jones
Journal:  Biotechniques       Date:  1997-05       Impact factor: 1.993

6.  From museums to genomics: old herbarium specimens shed light on a C3 to C4 transition.

Authors:  Guillaume Besnard; Pascal-Antoine Christin; Pierre-Jean G Malé; Emeline Lhuillier; Christine Lauzeral; Eric Coissac; Maria S Vorontsova
Journal:  J Exp Bot       Date:  2014-09-25       Impact factor: 6.992

7.  From writing to reading the encyclopedia of life.

Authors:  Paul D N Hebert; Peter M Hollingsworth; Mehrdad Hajibabaei
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2016-09-05       Impact factor: 6.237

8.  Mitogenomics of Hesperelaea, an extinct genus of Oleaceae.

Authors:  Céline Van de Paer; Cynthia Hong-Wa; Céline Jeziorski; Guillaume Besnard
Journal:  Gene       Date:  2016-09-04       Impact factor: 3.688

9.  Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

Authors:  Kojun Kanda; James M Pflug; John S Sproul; Mark A Dasenko; David R Maddison
Journal:  PLoS One       Date:  2015-12-30       Impact factor: 3.240

10.  How to open the treasure chest? Optimising DNA extraction from herbarium specimens.

Authors:  Tiina Särkinen; Martijn Staats; James E Richardson; Robyn S Cowan; Freek T Bakker
Journal:  PLoS One       Date:  2012-08-28       Impact factor: 3.240
