Development of Molecular Markers for Determining Continental Origin of Wood from White Oaks (Quercus L. sect. Quercus).

Hilke Schroeder1, Richard Cronn2, Yulai Yanbaev3, Tara Jennings4, Malte Mader1, Bernd Degen1, Birgit Kersten1.   

Abstract

To detect and avoid illegal logging of valuable tree species, identification methods for the origin of timber are necessary. We used next-generation sequencing to identify chloroplast genome regions that differentiate the origin of white oaks from three continents: Asia, Europe, and North America. By using the chloroplast genome of Asian Q. mongolica as a reference, we identified 861 variant sites (672 single nucleotide polymorphisms (SNPs); 189 insertion/deletion (indel) polymorphisms) from representative species of three continents (Q. mongolica from Asia; Q. petraea and Q. robur from Europe; Q. alba from North America), and we identified additional chloroplast polymorphisms in pools of 20 individuals each from Q. mongolica (789 variant sites) and Q. robur (346 variant sites). Genome sequences were screened for indels to develop markers that identify continental origin of oak species, and that can be easily evaluated using a variety of detection methods. We identified five indels and one SNP that reliably identify continent-of-origin, based on evaluations of up to 1078 individuals representing 13 white oak species and three continents. Due to the size of length polymorphisms revealed, this marker set can be visualized using capillary electrophoresis or high resolution gel (acrylamide or agarose) electrophoresis. With these markers, we provide the wood trading market with an instrument to comply with the U.S. and European laws that require timber companies to avoid the trade of illegally harvested timber.

Year:  2016        PMID: 27352242      PMCID: PMC4924829          DOI: 10.1371/journal.pone.0158221

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Illegal logging is a serious issue not only for tropical rainforests and tropical trees, but it is also a concern for tree species in temperate latitude forests. White oaks from the genus Quercus sect. Quercus (Fagaceae) provide a relevant example of illegal logging in a temperate zone tree, and they highlight the challenge facing importers and regulatory agencies responsible for validating the taxonomic and geographic sources of timber products. White oaks account for a significant percentage of the hardwood flooring and furniture trade in Europe and the USA, and they represent one of the most important hardwoods in terms of logs and lumber exports from these regions. The most important trade woods of white oaks derive from the European species Quercus robur L. and Q. petraea (Mattuschka) Liebl., the CITES Appendix III-protected Q. mongolica Fisch. Ex Ledeb. native to East-Asia, and North American oaks, such as Q. alba L. and Q. macrocarpa Michx. [1]. Non-governmental organizations such as the Environmental Investigation Agency (http://eia-global.org/news-media/liquidating-the-forests) have documented increases in the rate of illegal logging for white oak wood, especially in the Russian Far East region. These activities increase the likelihood that international wood trading companies will market illegally harvested wood, an activity that is banned by the U.S. Lacey Act amendment of 2008 and the European Union timber regulation of 2010. Violation of these regulations can result in fines, forfeiture of wood, and additional payments, as was recently demonstrated with improperly documented shipments of white oak flooring in the United States [2]. Under these laws, timber companies are responsible for avoiding the trade of illegally harvested timber, and they are obligated to declare the species name and geographic origin of traded timber in order to reduce the risk that traded timber originated from illegal logging [3]. 
The increased attention to illegal logging has led to an increased demand for methods that can be used to provide precise species identification and geographic origin verification. Wood anatomical methods are widely used for tree species identification [3], but these methods cannot discriminate white oak species, nor identify geographic origin of oaks generally. Over the last decade, worldwide programs have been established using the potential of DNA as a universal tool for identifying organisms (Barcode of Life www.barcodeoflife.org, [4]). In plants, the success of barcoding is highly dependent on several factors, including magnitude of primary divergence, frequency of secondary contact, and mutation rate of the DNA region [5], so the choice of suitable barcode regions in plants can be difficult [6-8]. Barcoding efforts in plants have focused on chloroplast genomes due to their simple pattern of (typically) uniparental inheritance, low effective population size, and useful variation at the scale of geography and taxonomy across a wide range of species (e.g. [9-13]). With advances in next-generation sequencing, chloroplast genomes are affordable to sequence in their entirety by ‘skimming’ methods [14], and whole genome analysis can reveal substantial variation, even in unexpected genomic regions that are included in traditional barcoding efforts (e.g. [15-16]). DNA barcoding already has proven to be appropriate for revealing illegal trading (e.g. [17,18]), and it is increasingly used to identify plant species in commercial trade [19]. The aim of this study is to use chloroplast-genome scale information to develop a cost-efficient, easy-to-use assay that allows the identification of the geographic origin of white oak wood products to hemisphere (Old World vs. New World) and continent (Asia; Europe; North America) to support regulatory and commercial efforts to detect illegal logging of Q. mongolica.

Material and Methods

Plant material

For next-generation sequencing, we sequenced a single oak individual from four species and three continents to produce chloroplast genome references; included are Q. mongolica from Asia (sample QUMO5_CH_1; China), Q. petraea from Europe (sample QUPE2_PO_1; Poland), Q. robur from Europe (sample QURO2_SVT6; Germany), and Q. alba from North America (sample QUAL_VT_1; USA). To develop a panel of polymorphisms for Asia and Europe, we used next-generation sequencing to screen two pooled DNA samples that included 20 individual specimens of Q. mongolica or European Q. robur, respectively. The Q. robur and Q. mongolica specimens in each pool were sampled from 10 geographically widespread populations. To develop a panel of polymorphisms for North American white oaks, we sequenced chloroplast genomes from additional specimens representing the following species: Q. alba, Q. bicolor Willd., Q. garryana Douglas ex Hook., Q. lyrata Walter, Q. macrocarpa Michx., Q. michauxii Nutt., Q. prinoides Willd., and Q. stellata Wangenh. For marker validation, DNAs from 13 Quercus species were screened. Q. mongolica and Q. dentata Thunb. represented Asian oaks (Far East Russia) with 200 and 10 specimens, respectively. Q. robur, Q. petraea and Q. pubescens Willd. represented European oaks, with 360, 210, and 200 specimens, respectively. Finally, eight white oak species were screened from North America (Q. alba, Q. bicolor, Q. garryana, Q. lyrata, Q. macrocarpa, Q. michauxii, Q. prinoides, Q. stellata), with between 5 and 25 specimens per species. None of the oak specimens used in this study was sampled in protected areas that require permission from any authority. Some of the samples were collected on private land, and all landowners gave permission for sampling. The field studies did not involve endangered or protected species.

Next-generation sequencing analyses

Aliquots of DNA (~0.5–1 μg) were sheared to a median length of ~300 bp and converted into sequencing libraries using Illumina TruSeq v.2 kits at the USDA Forest Service (Corvallis, OR; individual samples, each indexed with dual-index adapters) or GATC Biotech AG (Konstanz, Germany; pooled samples). Sequencing was performed using three approaches: (A) using the Illumina MiSeq with 2x150 bp paired-end reads for individual de novo genome reference assemblies; (B) using the Illumina MiSeq with 2x300 bp paired-end reads for pooled samples and reference-guided mapping; and (C) using the Illumina HiSeq with 100 bp single-end reads for individual North American species and reference-guided mapping. All reactions used version 3 sequencing chemistry. Information on raw clusters, sequence yield, and approximate target sequence (chloroplast genome) coverage depth is provided in Table 1.
Table 1

Results of next-generation sequencing for oak reference assembly and polymorphism screening.

Indexed individuals of oaks were sequenced using 150 bp paired-end reads and evaluated using de novo assembly (reference assembly). Pooled individuals were sequenced using 300 bp paired-end reads and evaluated using reference-guided assembly (polymorphism screen).

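Reference Assembly columns: QUMO5, QUPE2, QURO2, QUAL. Polymorphism Screen columns: QUMON Pool, QUROB Pool.

|  | QUMO5 | QUPE2 | QURO2¹ | QUAL | QUMON Pool | QUROB Pool |
|---|---|---|---|---|---|---|
| Sequences | 6,183,339 | 5,281,517 | 6,676,925 | 4,280,049 | 17,370,324 | 28,438,226 |
| Bases sequenced (Gbp) | 1.747 | 1.502 | 0.977 | 1.207 | 10.42 | 17.06 |
| Proportion of reads mapping to cp genome | 3.3 | 1.9 | 7.6 | 0.4 | 0.95 | 2.25 |
| De novo contigs (length sum, Mbp) | 2,085 | 872 | 80 | 7,085 | -- | -- |
| Longest de novo contig (bp) | 135,603 | 60,701 | 27,043 | 57,523 | -- | -- |
| Chloroplast contigs | 4 | 12 | 19 | 20 | -- | -- |
| Variants relative to QUMO5 | 0 | 126 | 124 | 572 | 789 | 346 |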

1 Q. robur was sequenced using 100 bp single-end reads.
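
Table 1 does not list coverage depth directly, but a rough figure can be derived from the values it does give. The following is a minimal sketch, assuming that the "proportion of reads mapping to cp genome" is a percentage and that the target length equals the 135,603 bp alignment reported in the Results. The function name is illustrative and is not part of the study's pipeline.

```python
def approx_cp_coverage(bases_gbp: float, percent_mapped: float,
                       target_bp: int = 135_603) -> float:
    """Rough mean depth: bases assumed to map to the target / target length.

    percent_mapped is ASSUMED to be a percentage (e.g. 3.3 means 3.3 %).
    target_bp defaults to the alignment length reported in the Results.
    """
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    mapped_bases = bases_gbp * 1e9 * (percent_mapped / 100.0)
    return mapped_bases / target_bp


if __name__ == "__main__":
    # Values taken from Table 1 for two of the indexed individuals.
    for name, gbp, pct in [("QUMO5", 1.747, 3.3), ("QUPE2", 1.502, 1.9)]:
        print(f"{name}: ~{approx_cp_coverage(gbp, pct):.0f}x")
```

This estimate ignores read trimming and uneven coverage, so it should be read as an order-of-magnitude check only.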


De novo reference construction

Raw read quality filtering was accomplished using Trimmomatic v0.30 [20], and we removed reads with a mean Phred score less than 33. Reads were digitally normalized to a coverage of 20 using khmer v0.7.1 [21], and kmer-filtered sequences were assembled using Velvet Optimizer v2.2.5 (https://github.com/Victorian-Bioinformatics-Consortium/VelvetOptimiser.git) and Velvet v1.2.10 [22]. K-mer lengths ranging from 21 to 121 were evaluated, with a final k-mer length of 121 selected for assembly. De novo contigs ≥ 100 bp in length were screened for homology to the Quercus rubra chloroplast genome (NCBI NC_020152) using BLAT [23]. Contigs showing high similarity were retained for reference assembly and ordered against the Q. rubra chloroplast genome. Reference sequences were constructed to include the large single-copy (LSC) region, one of two inverted repeats (IR), and the small single-copy (SSC) region, equivalent to positions 1–135,502 from Q. rubra NC_020152.
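
The mean-quality criterion mentioned above can be illustrated with a few lines of code. This is a minimal sketch, assuming Phred+33 encoded FASTQ quality strings. It is not Trimmomatic's implementation, and the function names are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read from its FASTQ quality string.

    offset=33 ASSUMES the common Phred+33 (Sanger/Illumina 1.8+) encoding.
    """
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(quality: str, min_mean: float = 33.0, offset: int = 33) -> bool:
    """True if the read's mean Phred score reaches the threshold."""
    return mean_phred(quality, offset) >= min_mean


if __name__ == "__main__":
    for qual in ("IIIIIIII", "IIII####"):
        print(qual, round(mean_phred(qual), 1), keep_read(qual))
```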

Reference-guided identification of SNP and indel variation

Reference-guided read mapping and polymorphism detection was performed using CLC Genomics Workbench version 7.5.1 (CLC-bio, a Qiagen company; Aarhus, Denmark). The reference chloroplast sequence of the Q. mongolica individual QUMO5_CH_1 generated by de novo reference construction (see above) was used as reference for read mapping. The trimmed Illumina data of the two pools (Q. mongolica, Q. robur) and the trimmed HiSeq data of representative North American individuals were mapped to the reference scaffold using a length fraction of 0.9 and a similarity fraction of 0.94. Variants detected by CLC Genomics Workbench included SNPs and small indels, and these were exported to tab-delimited files and processed using an in-house script (Variant Tools, see below) to identify species-specific polymorphism. Because the aim of this study was to develop markers that differentiate between the continents, the data set was reduced to those variants appearing with a frequency between 95 and 100%. For marker development, we focused on indels due to their simplicity of analysis and their easier handling using DNA extracted from timber. Maximum likelihood (ML) analysis was performed so that variants could be understood within the phylogenetic context of New and Old World oak chloroplast genome evolution. We performed ML analyses using RAxML-HPC2 version 8.2.8 through the Cipres Science Gateway (http://www.phylo.org/) using the GTRGAMMA substitution model for determining the final best ML tree, and the GTRCAT model to conduct 1000 rapid bootstrap replicates [24]. For this analysis, de novo contigs were aligned to the Q. rubra chloroplast genome (NC_020152), Q. rubra was specified as the outgroup, and gaps were treated as missing information.
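
The frequency filter described above can be sketched as follows. This is an illustration only: the record layout, the function names, and the rule that a candidate must be absent from the pool of the reference species are assumptions, not the authors' Variant Tools code.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    position: int      # position on the QUMO5 reference
    kind: str          # "SNP", "insertion" or "deletion"
    ref: str
    alt: str
    frequency: float   # percent of reads supporting the variant


def near_fixed(variants: Iterable[Variant], min_freq: float = 95.0) -> List[Variant]:
    """Keep variants whose frequency lies between min_freq and 100 %."""
    return [v for v in variants if min_freq <= v.frequency <= 100.0]


def candidate_markers(other_pool: Iterable[Variant],
                      reference_species_pool: Iterable[Variant],
                      min_freq: float = 95.0,
                      indels_only: bool = True) -> List[Variant]:
    """Near-fixed variants of one pool that were not called in the other pool.

    ASSUMPTION: "not called at all in the reference-species pool" stands in
    for a fixed difference between the two groups.
    """
    seen = {(v.position, v.ref, v.alt) for v in reference_species_pool}
    hits = [v for v in near_fixed(other_pool, min_freq)
            if (v.position, v.ref, v.alt) not in seen]
    if indels_only:
        hits = [v for v in hits if v.kind in ("insertion", "deletion")]
    return hits


if __name__ == "__main__":
    # Toy data, not taken from the study.
    robur = [Variant(100, "deletion", "ACGT", "-", 99.0),
             Variant(250, "SNP", "A", "G", 97.5),
             Variant(400, "insertion", "-", "TT", 60.0)]
    mongolica = [Variant(250, "SNP", "A", "G", 12.0)]
    print(candidate_markers(robur, mongolica))
```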

Post processing of identified SNPs and indels

To merge the SNP and indel tables and find common variants present in two or more individuals/pools, we developed Variant Tools, a command line program implemented in Ruby. This program merges individual sample SNP and indel tables (CSV format) produced by CLC GWB to create a multi-individual SNP and indel matrix. Required input options include the reference sequence (fasta format), an input directory containing the variant CSV tables, and an option specifying the input data type (SNP or indel). The reference fasta can contain one reference sequence or multiple reference contigs. Optionally, coverage tables (produced by CLC from read mappings to a reference) of every individual can be included in the analysis by specifying a directory containing coverage files (CSV format). Furthermore several filtering options are available to reduce the output according to user-provided thresholds. The output from Variant Tools is stored in a CSV file and contains several data columns: the reference sequence name, the reference position, the variant length, the calling type (SNP, MNP, deletion or insertion), the reference base(s), the alternative base(s) for every individual, the coverage for every individual at the reference position, different summary statistics, and sequences flanking the called variant. The flanking sequences are calculated based on two given distance thresholds. An upper and a lower threshold define minimum distances in base pairs between two called variants on the genomic scale. If a variant occurs within the genomic range of the lower threshold no flanking sequence is created. If a variant resides within the genomic range of the upper threshold the length of the flanking sequence created is defined by the lower threshold. If no variant is found within the range of both thresholds a flanking sequence with the length of the upper threshold is calculated. The default thresholds are 75 bp and 50 bp. 
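
The flanking-sequence rule with its two thresholds can be written compactly. The sketch below is an interpretation of the description above, not an excerpt from Variant Tools (which is implemented in Ruby). It assumes that the distance to the nearest other variant is measured between start positions and that a distance exactly equal to a threshold counts as "within" it.

```python
from typing import Iterable, Tuple


def flank_length(position: int, all_positions: Iterable[int],
                 lower: int = 50, upper: int = 75) -> int:
    """Flank length (bp) for a variant, given the other called positions.

    nearest variant within `lower`          -> 0 (no flanking sequence)
    nearest variant within `upper`          -> `lower`
    no variant within either threshold      -> `upper`
    """
    distances = [abs(position - p) for p in all_positions if p != position]
    if not distances or min(distances) > upper:
        return upper
    if min(distances) > lower:
        return lower
    return 0


def flanks(reference: str, position: int, variant_length: int,
           length: int) -> Tuple[str, str]:
    """Left and right flanks around a variant (0-based start position)."""
    left = reference[max(0, position - length):position]
    end = position + variant_length
    right = reference[end:end + length]
    return left, right


if __name__ == "__main__":
    called = [1000, 1040, 1300, 1360]
    for pos in called:
        print(pos, flank_length(pos, called))
```

Shorter flanks near neighbouring variants keep primer design away from polymorphic sites, which is presumably why two thresholds are used.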
The Variant Tools create different summary statistics while the variant matrix is generated. The number of individual alleles deviating from the reference is a count for all found variants in all individuals at a specific genomic position. The number of alleles matching the reference with minimal coverage is a count for all positions in all individuals where no variant has been called and that are supported by a minimum coverage. The threshold for the minimum coverage is specified by the user. The default threshold is set to a minimum coverage of 8. Critical forward reverse balance is an indicator for systematic sequencing errors and describes how many forward and reverse reads are supporting the called variant. The value is averaged over all individuals showing the variant. The Variant Tools are open source software under ongoing development. They are available under the terms of the ICS license and can be obtained from https://github.com/ThuenenFG/varianttools.
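
Two of the summary statistics described above are simple counts per reference position. The following minimal sketch shows one way to compute them; the data layout (`Call`, a dictionary keyed by individual) is hypothetical and only the default minimum coverage of 8 is taken from the text.

```python
from typing import Dict, NamedTuple, Optional


class Call(NamedTuple):
    alt: Optional[str]   # alternative allele, or None if no variant was called
    coverage: int        # read depth at the reference position


def summarize_position(calls: Dict[str, Call], min_coverage: int = 8) -> Dict[str, int]:
    """Count deviating alleles and well-covered reference matches at one position."""
    deviating = sum(1 for c in calls.values() if c.alt is not None)
    matching = sum(1 for c in calls.values()
                   if c.alt is None and c.coverage >= min_coverage)
    return {"deviating": deviating, "matching_with_min_coverage": matching}


if __name__ == "__main__":
    # Toy calls for four individuals at a single position.
    position_calls = {
        "ind1": Call("T", 40),
        "ind2": Call(None, 25),
        "ind3": Call(None, 3),    # below minimum coverage, not counted
        "ind4": Call("T", 12),
    }
    print(summarize_position(position_calls))
```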

DNA extraction, PCR, restriction, and genotyping

Leaves

One cm2 of a single leaf was ground to powder in liquid nitrogen. Total DNA was extracted, following a modified ATMAB protocol by [25]. PCR reactions for leaf-derived DNA contained ~30 ng template DNA, 10x PCR buffer, 1.5 or 1.75 mM MgCl2, 200 μM dNTPs, 0.4 unit AmpliTaq Gold DNA polymerase (ThermoFisher Scientific, Darmstadt, Germany), and 0.05 to 0.13 μM of each primer in a total volume of 15 μl. PCR was carried out in a Sensoquest Thermocycler (Göttingen, Germany) with a pre-denaturation step at 94°C for 10 min, followed by 25 to 30 cycles of 94°C for 45 sec (30 sec trnCD), suitable annealing temperature for each primer combination (between 52°C and 57°C) (details of primer conditions are given in Table 2) for 45 sec (30 sec trnCD), 72°C for 45 sec (1 min trnLF) and a final elongation at 72°C for 10 min. PCR amplification products were checked relative to a 100 bp ladder (Life Technologies, Martinsried, Germany) on a 1% agarose gel stained with Roti-Safe GelStain (Carl Roth GmbH & Co. KG, Karlsruhe, Germany); afterwards, PCR products were run on an ABI3730 capillary sequencer. Fragment analysis was performed using GeneMarker™ software v. 2.4.0 (Softgenetics, State College, PA, USA).
Table 2

List of primers for the amplification and resequencing of the newly developed markers.

Fluorescent-labeling of the primers is given in column “sequences”: FAM = blue, VIC = green, PET = red. In the last column, the accession numbers of the related markers for the three species Q. robur, Q. mongolica and Q. alba are given. “Length” means sequence length.

| Marker name | Primer | Sequence 5'-3' | Length (bp) | Annealing (°C) | Accession no. |
|---|---|---|---|---|---|
| psaI-ycf4 | psaI | CGTGTGTAAACATGATATATGAG_FAM | 174–178 | 55 | KU201020–22 |
|  | ycf4 | GAGTAATTCATCGAATTGGTTAG |  |  |  |
| psbE-petL | psbE | AAGGAATTGGTTAGTTGTCCAG_VIC | 179–185 | 55 | KU201023–25 |
|  | petL | TTACATATCTTAAATTAGAGAGCC |  |  |  |
| trnLF | trnL1 | CAATACATATCATTTCTTGTACTG_PET | 130–135 | 53 | KU201026–28 |
|  | trnF2 | TAGATAACTTGAGTTTATGTCAATT |  |  |  |
| trnCD | trnC5 | TTGGATAGACGAACGGGGAAT_FAM | 115–123 | 57 | KU201029–31 |
|  | trnD5 | TATCATATTAAATTGATTGCCGG |  |  |  |
| trnDT | trnD3 | GGATAGGGATCAACAAGTTATTG_PET | 187–189 | 52 | KU201032–34 |
|  | trnT4 | CAAGACCGACCCTAATTGAAT_PET |  |  |  |


Timber

For genotyping analysis of timber-derived DNA, we mostly used a special DNA extraction protocol, based on the CTAB method, that has been developed and patented [26]. In exceptional cases, the innuPREP Plant DNA Kit from Analytik Jena (Germany) was used. Due to the small amount of total DNA extracted from timber, the DNA quantity was not measured; rather, a standard dilution of 1:10 (DNA:water) was used for all PCR reactions. PCR conditions were similar to those used for leaf material, but with slight modifications (described in Results). For one marker (trnDT), a restriction enzyme digest was used to reveal a SNP in the amplified fragment. The restriction digestion reaction contained 10 μl PCR product, 2 μl 10x CutSmart® buffer, and 0.5 μl enzyme (FastDigest® HinfI, New England Biolabs, Ipswich, MA) in a final volume of 20 μl. The reaction lasted 15 min at 37°C, followed by inactivation at 80°C for 20 min. Restriction products were visualized either relative to a 50 bp ladder (Life Technologies, Martinsried, Germany) on an 8% polyacrylamide gel stained with ethidium bromide, or on an ABI3730 capillary sequencer.
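
The expected fragment pattern of such a digest can be predicted in silico. The sketch below assumes the standard HinfI recognition sequence G^ANTC and counts fragment lengths on the top strand only, ignoring the 3-nt overhang. The test amplicons are toy sequences, not the trnDT amplicon, and the function names are hypothetical.

```python
import re
from typing import List

# HinfI recognises GANTC and cuts after the G (top strand).
HINFI_SITE = re.compile(r"(?=GA[ACGT]TC)")


def hinfi_fragments(amplicon: str) -> List[int]:
    """Top-strand fragment lengths after complete HinfI digestion."""
    seq = amplicon.upper()
    cuts = [m.start() + 1 for m in HINFI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def labeled_fragments(fragments: List[int]) -> List[int]:
    """Fragments carrying a dye when both primers are labeled.

    Only the two terminal fragments contain a primer; internal
    fragments stay unlabeled. An uncut amplicon is returned as is.
    """
    if len(fragments) <= 1:
        return list(fragments)
    return [fragments[0], fragments[-1]]


if __name__ == "__main__":
    one_site = "AAAA" + "GATTC" + "AAAAAA"
    two_sites = one_site + "GACTC" + "TT"
    for seq in (one_site, two_sites):
        frags = hinfi_fragments(seq)
        print(frags, labeled_fragments(frags))
```

An additional internal site therefore shortens one labeled fragment and produces one unlabeled fragment, which is the pattern reported for the Asian species.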

Probabilities for fixation of gene markers

The Thünen-Institute of Forest Genetics possesses a large collection of reference samples that contains oak species from Europe (Quercus robur, Q. petraea), North America (Q. alba, Q. macrocarpa, and others) and Asia (Q. mongolica, Q. dentata). Based on the numbers in this collection of white oaks from different continents, we computed the maximal potential frequency of variants that were not observed using 95% confidence intervals [27]. This can be described as a method to determine the risk (potential error rate) that a genetic variant (allele/haplotype) assumed to be exclusive to one continent is found in one or more individuals originating from another continent. The calculations were carried out using the online form at http://vassarstats.net/prop1.html based on 962 European, 325 Asian and 61 American white oak individuals for the gene markers psaI-ycf4, psbE-petL, trnLF, and trnCD. For the gene marker trnDT the sample sizes were 115 European, 425 Asian and 19 American white oak individuals.
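
The idea of bounding the frequency of an unobserved variant can be illustrated with a confidence interval for a proportion. The sketch below assumes a Wilson score interval without continuity correction. Whether this is exactly the calculation behind the values reported in the Results is an assumption, so its output need not reproduce them.

```python
import math


def wilson_upper(successes: int, n: int, z: float = 1.959964) -> float:
    """Upper limit of the Wilson score interval for a proportion.

    z = 1.959964 corresponds to a two-sided 95 % interval.
    With successes = 0 this reduces to z**2 / (n + z**2).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    centre = p + z * z / (2 * n)
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + half_width) / (1 + z * z / n)


if __name__ == "__main__":
    # Sample sizes from the text; zero foreign variants observed in each group.
    for group, n in [("Europe", 962), ("Asia", 325), ("America", 61)]:
        print(f"{group}: upper bound {100 * wilson_upper(0, n):.2f} %")
```

Larger reference collections give tighter bounds, which is why the estimated risk is lowest for the continent with the most samples.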

Results

Next-generation sequencing, reference genome assembly, and identification of cpDNA length variants in white oaks

Next-generation sequencing of four indexed oak exemplars (Q. alba, Q. mongolica, Q. petraea, Q. robur) yielded between nearly 1 and 1.75 Gbp per individual (Table 1). De novo contig assembly with Velvet produced between 872 and 7,085 contigs, and the longest contigs from each assembly were from the chloroplast genome. Our best de novo chloroplast assembly derived from Q. mongolica QUMO5_CHI_1, which was represented by a single large contig totalling 135.6 kb, and it spanned the three main chloroplast genome regions (large single copy, inverted repeat, small single copy) in their entirety. Alignment of these contigs against the published Q. rubra chloroplast genome yielded an alignment of 135,603 nucleotides (alignment excludes one inverted repeat). Maximum likelihood analysis of these selected white oaks yielded a topology similar to the topology previously established with chloroplast DNA restriction site analysis [28] (Fig 1), with the Old World white oaks Q. mongolica, Q. petraea and Q. robur resolving as a sister group to the New World Q. alba (Fig 1). The best maximum likelihood tree resolved Q. mongolica and Q. petraea as sister, but bootstrap support for this resolution was low (40%) and supported by a small number of characters. Across white oak chloroplast genomes, we identified 672 single nucleotide variants and 189 indels, the vast majority of which discriminate white oaks from the outgroup Q. rubra, and New World from Old World oaks (Fig 1).
Fig 1

Phylogenetic relationship among chloroplast genomes of white oak species representing Old World and New World lineages.

The best maximum likelihood tree is shown for four white oak chloroplast genomes (Q. mongolica; Q. robur; Q. petraea; Q. alba) and one outgroup genome (Q. rubra). Inferred branch lengths in maximum likelihood substitutions are shown in bold, and bootstrap support values are shown in italics. The phylogenetic resolution of informative indel markers is shown as black inverted triangles, and the resolution of the diagnostic PCR-RFLP marker is shown as a grey triangle.

For production of the best maximum likelihood tree, we used all of these variants, but Fig 1 indicates only the indels and the one SNP that were selected for the marker set, in line with the aim of this study. Nevertheless, the whole dataset is publicly available via NCBI (Data Accessibility). Next-generation sequencing of the Q. mongolica and Q. robur DNA pools produced 17.4 and 28.4 million paired-end 300 bp sequences (10.4 and 17.1 Gbp of sequence data, respectively; Table 1). Mapping of paired-end reads from the Q. mongolica and Q. robur pools against the Q. mongolica QUMO5_CHI_1 chloroplast genome reference revealed 346 variant positions for the Q. robur pool, and 789 variant positions for the Q. mongolica pool (Table 1). After read mapping, the two variant tables were compared to filter those variants that showed fixed differences between the continents. The next step was to reduce the dataset to those variants appearing with a frequency between 95 and 100%. This analysis left five indels and 15 SNPs. For marker development, we focused on indels. Checking of these indels within the mapping revealed a (T)N microsatellite, two indels with a difference in the fragment length of two bp, and one indel with one bp difference. These were removed from further analyses, and only the two longest indels in two spacer regions (psaI-ycf4, psbE-petL), one with four and one with six bp difference, remained.
Short read sequences from eight North American species were also mapped to the QUMO5 reference, and we specifically searched for indels differentiating North American species from the reference QUMO5 with a frequency of 95 to 100%. We identified three indels that consistently differentiated North America from Asia. These length polymorphisms were found in three spacer regions (trnLF, trnCD, and trnDT) and they ranged in length from 2 bp to 8 bp.

Primer design, marker validation and resequencing

For the five indel-including cpDNA regions, primers were designed using the reference QUMO5 to amplify fragments ranging from 110 bp to 190 bp. A preliminary validation performed with three individuals each of Q. robur, Q. petraea, Q. mongolica, Q. alba, and Q. macrocarpa revealed that all five cpDNA regions could successfully be amplified by PCR. Subsequent Sanger sequencing validated the sequence of the intervening region and the repeat type, and BLASTN analysis confirmed the annotations (Table 2). Two indels differentiate European (Q. petraea, Q. robur) and Asian (Q. mongolica) white oaks. They are located in the psaI-ycf4 linker (4 bp difference) and the psbE-petL linker (6 bp difference; Table 2), and they are inferred to be mutations (specifically, deletions) restricted to Asian white oaks. The further validation of these indels was conducted by screening the amplification products of 10 additional individuals of North American species, 50 individuals of two European species (Q. robur, Q. petraea), and two Asian species (Q. mongolica, Q. dentata). This validation revealed that white oaks from North America and Europe showed the same fragment length (Table 3, S1 Fig), and confirmed that the Asian species shared deletions that resulted in shorter, diagnostic fragment lengths.
Table 3

Details for used species, individuals and markers.

Given are number of individuals per species and continent tested with the five markers, and fragment length based on sequencing for each marker and species. Consensus sequences of the five markers are given in S1 Fig.

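| Continent | Species | No. individuals¹ | psaI-ycf4 | psbE-petL | trnLF | trnCD | trnDT² |
|---|---|---|---|---|---|---|---|
| Europe | Q. robur | 531 (103) | 178 | 185 | 135 | 123 | 86/101 |
|  | Q. petraea | 273 (12) | 178 | 185 | 135 | 123 | 86/101 |
|  | Q. pubescens | 158 (0) | 178 | 185 | 135 | 123 |  |
| USA | Q. alba | 15 (4) | 178 | 185 | 130 | 115 | 88/101 |
|  | Q. macrocarpa | 12 (7) | 178 | 185 | 130 | 115 | 88/101 |
|  | Q. bicolor | 7 (4) | 178 | 185 | 130 | 115 | 88/101 |
|  | Q. garryana | 4 (0) | 178 | 185 | 130 | 115 |  |
|  | Q. lyrata | 4 (2) | 178 | 185 | 130 | 115 | 88/101 |
|  | Q. michauxii | 5 (1) | 178 | 185 | 130 | 115 | 88/101 |
|  | Q. stellata | 8 (1) | 178 | 185 | 130 | 115 | 88/101 |
|  | Q. prinoides | 6 (0) | 178 | 185 | 130 | 115 |  |
| Asia | Q. mongolica | 316 (420) | 174 | 179 | 135 | 123 | 86/71 (30) |
|  | Q. dentata | 9 (5) | 174 | 179 | 135 | 123 | 86/71 (30) |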

1 The number of individuals tested includes numbers for all loci except trnDT, which is given in parentheses.

2 Fragment lengths for trnDT show the lengths after restriction digestion with HinfI. Asian species contain an additional internal restriction site that yields one additional unlabeled fragment after HinfI digestion; these are shown in brackets.

Three indels differentiate Old World (Q. petraea, Q. robur, Q. mongolica) and North American white oaks, and these are located in the trnL-trnF (trnLF) linker (5 bp difference), the trnC-petN linker of the broader trnC-trnD (trnCD) linker region (8 bp) and in the trnE-trnT linker of the broader trnD-trnT (trnDT) linker region (2 bp; Table 2). These indels were validated with the same individuals of all above mentioned species. The validation revealed that white oaks from Asia and Europe showed identical fragment lengths (Table 3), and that the North American species shared mutations that resulted in diagnostic fragment lengths. Resequencing (Sanger) of the trnDT region identified a single nucleotide polymorphism (SNP) differentiating Asia from Europe and North America. The SNP lies within a HinfI restriction site, and restriction digestion of this region was predicted to yield three fragments in Asian white oaks and two fragments in European and North American species (all oaks share one HinfI site). By labeling the forward and reverse amplification primer, restriction digestion of the PCR fragment with HinfI allows the visualization of two of the three fragments, one of which is diagnostic for Asian white oaks due to its truncated length. Since this single region offers the possibility to differentiate all three continents with one assay, we decided to include this SNP into the marker set.
In total, the five markers have been evaluated with sample sizes ranging from 559 to 1241 (in detail: trnCD 1241, trnLF 1078, psaI_ycf4 1230, psbE_petL 1183, trnDT 559), with samples representing 13 oak species from the three continents (Table 3, genotype information per specimen is given in S1 File). The nucleotide variations shown to be characteristic for Asian white oaks (Q. mongolica and Q. dentata) are also present in the complete chloroplast genome of one individual of Q. aliena, another Asian white oak species (GenBank accession KP301144.1) [29]. We computed the probabilities of fixation for the gene markers. Since we require a minimum of two independent markers for continent assignment, we performed the calculations for the risk of not identifying a rare variant in the reference samples based on a combination of two markers. By this means the risk of not identifying a rare variant in the reference samples was calculated to be less than 0.022% for Europe, 0.0051% for Asia and less than 0.098% for America. Thus, it is extremely unlikely that studied gene markers are not fixed to just one variant in the different groups.
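
The fragment lengths in Table 3 translate directly into an assignment rule. The following is a minimal sketch for the four length markers, assuming exact sequence-based fragment lengths as input and reading the two-marker requirement as "at least two markers with a known length". The names and the handling of conflicting results are illustrative choices, not the authors' procedure.

```python
from typing import Dict, Optional

ALL = frozenset({"Asia", "Europe", "North America"})
OLD_WORLD = frozenset({"Asia", "Europe"})
NOT_ASIA = frozenset({"Europe", "North America"})

# Sequence-based fragment lengths (bp) from Table 3.
DIAGNOSTIC_LENGTHS = {
    "psaI-ycf4": {174: frozenset({"Asia"}), 178: NOT_ASIA},
    "psbE-petL": {179: frozenset({"Asia"}), 185: NOT_ASIA},
    "trnLF": {135: OLD_WORLD, 130: frozenset({"North America"})},
    "trnCD": {123: OLD_WORLD, 115: frozenset({"North America"})},
}


def assign_continent(lengths: Dict[str, int], min_markers: int = 2) -> Optional[str]:
    """Continent consistent with all observed lengths, or None if unresolved.

    Unknown markers or lengths are skipped. None is returned when fewer than
    min_markers markers were usable, when markers conflict, or when more than
    one continent remains possible.
    """
    candidates = set(ALL)
    usable = 0
    for marker, length in lengths.items():
        allowed = DIAGNOSTIC_LENGTHS.get(marker, {}).get(length)
        if allowed is None:
            continue
        candidates &= allowed
        usable += 1
    if usable < min_markers or len(candidates) != 1:
        return None
    return next(iter(candidates))


if __name__ == "__main__":
    print(assign_continent({"psaI-ycf4": 178, "trnLF": 135}))
    print(assign_continent({"psaI-ycf4": 178, "psbE-petL": 185}))
```

Peak sizes read from a capillary sequencer can deviate slightly from sequence-based lengths, so a real workflow would need a size tolerance or a calibration against reference samples.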

Marker set design and optimization for timber

All above described analyses were performed using single PCR reactions and fragment analyses to optimize each marker separately. Subsequently, the markers were successfully multiplexed for fragment analysis using the fluorescence labeling as given in Table 2 (Fig 2).
Fig 2

Fragment patterns of the five markers for individuals from Asia (top), North America (middle) and Europe (bottom).

The sequence sizes for each peak as given in Table 3 are shown beneath the peaks. The first blue peaks appear smaller (112, 120) than the sequenced length (115, 123) given in Table 3. The color code of the peaks is as described in Table 2.

For the development of the markers and multiplexing, DNA from fresh leaves was used. The protocol was later optimized for DNA from timber. In our experience, DNA from timber is more sensitive to all PCR parameters; thus, all markers were tested singly with DNA from timber, and the PCR was optimized accordingly. Differences in the PCR conditions used for the two different materials are given in Table 4. Due to the sensitivity of timber DNA in PCR, multiplexing of the PCRs of the five markers is not advisable, but multiplexing of the markers on the sequencer worked as well with timber DNA as with DNA from leaves. The only difference is that the PCR product from leaf DNA is diluted 1:50 and the PCR product from timber DNA 1:10 for use on the sequencer.
Table 4

PCR conditions compared for leaf and timber.

Only the differences are shown, all other parameters are as given in material and methods.

Marker     Leaf                                             Timber
           PCR cycles  Conc. MgCl2  Enhancer  Conc. primer  PCR cycles  Conc. MgCl2  Enhancer  Conc. primer
psaI-ycf4  30          1.75 mM      no        0.07 μM       40          2.0 mM       no        0.1 μM
psbE-petL  30          1.75 mM      no        0.05 μM       40          2.0 mM       no        0.05 μM
trnLF      30          1.75 mM      yes       0.2 μM        40          2.0 mM       yes       0.2 μM
trnCD      25–30       1.5 mM       yes       0.13 μM       35          2.5 mM       yes       0.3 μM
trnDT      25          1.75 mM      yes       0.1 μM        40          2.0 mM       yes       0.1 μM
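For laboratories that keep protocols in machine-readable form, the values of Table 4 can be held in a small data structure. The sketch below transcribes the table and reports which parameters change between leaf and timber for a given marker. The names `PcrConditions`, `TABLE_4` and `changed_parameters` are illustrative choices, not part of the published protocol.

```python
from typing import Dict, List, NamedTuple


class PcrConditions(NamedTuple):
    cycles: str       # kept as text because trnCD lists a range (25-30) for leaf
    mgcl2_mM: float   # MgCl2 concentration in mM
    enhancer: bool    # whether a PCR enhancer is used
    primer_uM: float  # primer concentration in micromolar


# Values transcribed from Table 4.
TABLE_4: Dict[str, Dict[str, PcrConditions]] = {
    "psaI-ycf4": {"leaf": PcrConditions("30", 1.75, False, 0.07),
                  "timber": PcrConditions("40", 2.0, False, 0.1)},
    "psbE-petL": {"leaf": PcrConditions("30", 1.75, False, 0.05),
                  "timber": PcrConditions("40", 2.0, False, 0.05)},
    "trnLF": {"leaf": PcrConditions("30", 1.75, True, 0.2),
              "timber": PcrConditions("40", 2.0, True, 0.2)},
    "trnCD": {"leaf": PcrConditions("25-30", 1.5, True, 0.13),
              "timber": PcrConditions("35", 2.5, True, 0.3)},
    "trnDT": {"leaf": PcrConditions("25", 1.75, True, 0.1),
              "timber": PcrConditions("40", 2.0, True, 0.1)},
}


def changed_parameters(marker: str) -> List[str]:
    """Return the names of the parameters that differ between leaf and timber."""
    leaf = TABLE_4[marker]["leaf"]
    timber = TABLE_4[marker]["timber"]
    return [name for name in PcrConditions._fields
            if getattr(leaf, name) != getattr(timber, name)]


if __name__ == "__main__":
    for marker in TABLE_4:
        print(marker, changed_parameters(marker))
```

Read this way, every marker uses more cycles and more MgCl2 for timber, while the primer concentration changes only for psaI-ycf4 and trnCD.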

Our analyses used a capillary sequencer to visualize length polymorphisms of these fragments. However, the size differences of these indels are large enough that all markers can also be distinguished on a polyacrylamide gel, even for differences as small as two base pairs, as shown for the fragment trnDT (Fig 3). In this way, polymorphisms can be screened in laboratories where no sequencer is available.
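Whether fragment lengths are read from a capillary sequencer or from a gel, the continent call reduces to a lookup against reference lengths. The sketch below shows one possible reading of the two-marker requirement: each observed length narrows the set of compatible continents, and a call is made only when at least two markers were informative and exactly one continent remains. The marker names and fragment lengths are placeholders and are not the values of Table 3; the function name and the decision rule are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Optional

CONTINENTS: FrozenSet[str] = frozenset({"Asia", "Europe", "North America"})

# Placeholder reference data: marker -> fragment length (bp) -> compatible continents.
# These lengths are invented for illustration and must be replaced with real
# reference values before any practical use.
REFERENCE: Dict[str, Dict[int, FrozenSet[str]]] = {
    "marker_A": {101: frozenset({"Asia"}),
                 109: frozenset({"Europe", "North America"})},
    "marker_B": {150: frozenset({"Asia", "Europe"}),
                 158: frozenset({"North America"})},
    "marker_C": {120: frozenset({"Europe"}),
                 122: frozenset({"Asia", "North America"})},
}


def assign_continent(observed: Dict[str, int], min_markers: int = 2) -> Optional[str]:
    """Return the single continent compatible with all informative markers,
    or None if fewer than `min_markers` were informative or the call is ambiguous."""
    candidates = set(CONTINENTS)
    informative = 0
    for marker, length in observed.items():
        compatible = REFERENCE.get(marker, {}).get(length)
        if compatible is None:
            # Unknown marker or a length absent from the references: skipped here,
            # although in practice such a result would deserve a closer look.
            continue
        candidates &= compatible
        informative += 1
    if informative >= min_markers and len(candidates) == 1:
        return next(iter(candidates))
    return None


if __name__ == "__main__":
    print(assign_continent({"marker_A": 109, "marker_C": 122}))
```

Conflicting markers leave no compatible continent, so the function returns no call rather than a guess.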
Fig 3

Marker trnDT visualized on a polyacrylamide gel.

Lane 1: 50 bp ladder; lanes 2–7 and 13–14: wood-derived DNA, with origin inferred from the genotypes; lane 8: zero control; lanes 9–12: references from North America (US), Europe (EU) or Asia (AS).

The functionality of the optimized PCR protocols and the multiplexed sequencer runs has been tested on orders processed at the “Thünen Centre of Competence on the Origin of Timber”. Timber from these orders included highly processed wood such as flooring or parquet, treated solid wood such as parts of furniture, barrels or boards, and unused solid wood such as firewood. In total, over 80 processed timber samples and 130 treated solid wood samples have been evaluated (data not shown). In our experience so far, DNA amplification was sufficient for our gene markers in 58% of solid wood samples.

Discussion

A set of five chloroplast markers has been developed and optimized to analyze DNA from timber and identify the continental origin of white oak wood. Small fragment sizes (< 200 bp) were chosen because genotyping success with DNA from timber is highest when fragments under 200 bp are targeted. This has recently also been shown for DNA from old and dried insect specimens in museum collections when using mitochondrial barcoding regions [30]. For the identification of haplotypes within oak species from wood samples, Deguilloux et al. [31] similarly developed chloroplast microsatellite and SNP markers that targeted small DNA fragments. The sequencing data revealed no specific indels to differentiate oak species within the classical barcoding regions matK, rbcL or the linker trnH-psbA [6,11,32-34]. Recently, the barcode regions matK and trnH-psbA were evaluated for their power to discriminate selected species from oak sections Cerris, Heterobalanus (= “Group Ilex”), Lobatae, and Quercus [34]. In that study, the resolution of the matK region proved too low for differentiation within the genus; interestingly, for trnH-psbA, the variability was too high to identify fixed interspecific differences [35]. The intergenic linkers trnLF, trnCD and trnDT, which we found valuable in this work, have been widely tested in population and evolutionary genetic studies of plants, and they show wide variation in their ability to discriminate species and lineages [36]. For example, the intergenic linker trnLF proved to be not variable enough for general barcoding approaches [6]. In attempted phylogenetic reconstructions, the trnLF linker lacked variation among closely related species [37]. Similarly, differentiation within the genus Populus failed using trnLF [38]. Nevertheless, there are other examples of successful use of this marker in molecular systematics ([39,40] and citations therein) and for unraveling the phylogeny of different plant species [41].
Similarly, trnCD and trnDT have been used in comprehensive studies of chloroplast DNA diversity in European white oaks [42-51]. In Japan, four oak species have been differentiated using trnDT among other chloroplast markers [52]. Hence, as for many regions within the chloroplast, the applicability of these spacers to questions of species identity depends on the specific phylogenetic and geographic context in which they are used. Forensic applications of molecular markers are already established with regard to illegal wildlife trade (parrots: [17]; sea turtles: [53]) and for identification of products made of endangered animal species (‘whale meat’: [54]; horn: [55]). The Barcode of Wildlife Project (http://www.barcodeofwildlife.org/) was established specifically for this purpose. Furthermore, control of the seafood market in different countries is well supported by the use of barcoding markers [18,56]. For identification of illegal logging of tropical tree species, the use of molecular markers is already widespread [57-59], but the methods are less established for tree species from temperate zones. Thus, the presented markers should be applied to give commercial vendors of white oak wood the possibility to exercise ‘due diligence’ when placing timber on the European market, and to allow public authorities to control timber imports should questions emerge about the correct declaration of wood.

Alignment for five cp regions with marked indels and SNPs used in the marker set.

For each of the described markers, an alignment was made from a varying number of individuals (between three and 50) of three species from the three continents. Thus, the sequence given for each species in the figure is a consensus sequence. Further details of the markers and the related accession numbers of single individuals are given in Table 2. QUALB: Quercus alba (USA), QUMON: Q. mongolica (Asia), QUROB: Q. robur (Europe). (TIF)

Information details including genotypes of used Quercus individuals as references.

(XLSX)