Evaluating the usefulness of next-generation sequencing for herb authentication.

Anna Delgado-Tejedor, Pimlapas Leekitcharoenphon, Frank M Aarestrup, Saria Otani.

Abstract

Food authentication is a rapidly growing field driven by increasing public awareness of food quality and safety. Foods containing herbs are particularly prone to industrial fraud and adulteration. Several methodologies are currently used to evaluate food authenticity. DNA-based technologies have received increasing attention, with DNA barcoding the most widely used. DNA barcoding is based on the sequencing and comparison of orthologous DNA regions from all species in a sample, but the approach is limited by its low resolution when distinguishing closely related species. Here we developed a customised database and bioinformatics pipeline (Herbs Authenticity - GitHub) to identify herbal ingredients, implemented as a metagenomics approach for authenticity testing of plant-derived products. We evaluated the accuracy of the method using publicly available plant genomes and databases, which allowed us to construct our customised barcode database; the database was also complemented with entries from publicly available resources (iBOL and ENA). The pipeline performance was then tested with 47 new, de novo, partly sequenced whole plant genomes or barcodes as query sequences. Our results show that using our mapping algorithm with the customised barcode database correctly identifies the main components of a wide range of plant-derived samples, albeit with variable additional noise depending on the tested samples and barcodes. Our results also show that, at the current stage, the usefulness of metagenomics is limited by the availability of reference sequences and the required sequencing depth. However, this method shows promise for evaluating the authenticity of different herbal products, provided that it is further refined to increase its qualitative and quantitative accuracy.
© 2021 The Author(s).

Keywords:  Authenticity testing; Barcodes; Food; Herbs; Next generation sequencing

Year:  2021        PMID: 35415645      PMCID: PMC8991511          DOI: 10.1016/j.fochms.2021.100044

Source DB:  PubMed          Journal:  Food Chem (Oxf)        ISSN: 2666-5662


Introduction

There has been a significant increase in food fraud and adulteration for economic advantage over the last decade (Medina, Pereira, Silva, Perestrelo, & Câmara, 2019). As a result, food authentication is a rapidly growing field (Mishra et al., 2016). Foods can be misdescribed through substituting or mixing ingredients with cheaper alternatives, or including undeclared ingredients (Primrose, Woolfe, & Rollinson, 2010). Undeclared compounds can also represent a threat to public health; for example, foods adulterated with nut protein can cause anaphylactic reactions in susceptible individuals (Haynes, Jimenez, Pardo, & Helyar, 2019). There is therefore a need to develop accurate analytical methods to verify the type and quantity of ingredients in food products to verify manufacturer claims (Delia, 2019, Pamela et al., 2018). Plants and herbs are widely used in the food industry. Although often included in relatively small quantities, they are important and often expensive ingredients in many products, making them prone to industrial fraud (Black, Haughey, Chevallier, Christopher, & Elliott, 2016). Four main methods are currently used to evaluate food authenticity: morphology, chromatography/mass spectrometry, immunological assays, and DNA-based methodologies (Drouet et al., 2018). While morphological identification is a low-cost approach, its accuracy depends on human expertise and its low resolution makes it unsuitable for powdered products or to distinguish closely-related species (Yat-Tung & Shaw, 2018). High-resolution chromatographic techniques such as gas chromatography (GC) and high-performance liquid chromatography (HPLC) coupled to mass spectrometry (MS) are also popular in food authentication (Drouet et al., 2018). Authentication is performed by comparing the chemical fingerprints from standards with those obtained from the herbal product. 
However, chromatography coupled to mass spectrometry is expensive and requires specialist equipment and trained analysts for interpretation (Danezis, Tsagkaris, Camin, Brusic, & Georgiou, 2016). Enzyme-linked immunosorbent assays (ELISAs) are the most widely used immunological method in food authentication due to their high sensitivity (Sasikumar, Swetha, Parvathy, & Sheeja, 2016). However, their performance is lower in processed or powdered products, the design and use of specific antibodies can be expensive, and there may be cross-reactivity to proteins from closely related species (Montowska et al., 2019, Walker et al., 2018). Over the last few years, next-generation sequencing (NGS) has transformed genomics (Sara, McPherson, & Richard McCombie, 2016). While NGS has been applied to plant and herb identification, currently the most widely used method is DNA barcoding (Gerard, 2016), a technology based on PCR amplification followed by sequencing and comparison of orthologous DNA regions from all species in a sample (Böhme, Calo-Mata, Barros-Velázquez, & Ortea, 2019). The main limitation of DNA barcoding is its gene bias, which stems from the PCR-based approach. For example, DNA barcoding has insufficient resolution to differentiate Mentha, Ocimum, Origanum, Salvia, and Thymus species in the Lamiaceae family (Drouet et al., 2018). Choosing appropriate barcoding genes for identification purposes is challenging, as they need to be present in a wide range of plants and herbs but harbour sufficient interspecies variability (Böhme et al., 2019). To overcome this limitation in resolution, multi-barcode methodologies have been developed; these improve identification performance over single-locus (single-barcode) approaches, although with a bias that varies depending on the sample taxa (Mishra et al., 2016). Another challenge in DNA barcoding for herb identification is the lack of a complete and accurate reference library (Coissac et al., 2016, Hollingsworth et al., 2016, Tnah et al., 2019).
Several databases include different plant-derived genes based on their location or their usage. The most relevant to species identification is the International Barcode of Life project (iBOL) (Illuminate Biodiversity - International Barcode of Life. url: http://ibol.org/site/) from the Consortium for the Barcode of Life (CBOL) initiative (CBOL - iBOL. url: http://www.ibol.org/phase1/cbol/), which contains 450,581 entries. iBOL, however, represents species and genera unevenly, some annotations are of poor quality, and several sequences are incomplete. No publicly available database includes only the reference barcodes from herbs and species used in Danish and European cuisine. A customised database could therefore increase the accuracy of identification. Here we developed and evaluated a metagenomic approach for plant-derived sample authenticity, covering both single plant species that are used as spices and commercially available herbal products. A customised alignment pipeline and plant-specific barcode database were built, and both were validated using publicly available plant sequences of known origin. Finally, 47 herbal plant species and products of known composition and commercially available food preparations were sequenced at approximately 80 million reads per sample and assessed using our novel metabarcoding analysis pipeline.

Materials and methods

Sample collection and DNA extraction

Forty-seven plants and herb products widely used in Danish and European cuisine were included: 33 plant species (Table S1) to build the barcode database and 14 herbal products (Table S2) for authenticity evaluation. Samples were seeds, powders, or fresh or dried plant tissue obtained both from a farm in Helsinge, Denmark (Fuglebjerggaard) and a Danish supermarket. Total genomic DNA was extracted from all samples with the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) (Peter M Hollingsworth et al., 2011) following the manufacturer’s instructions with the following modifications: 100–200 mg of sample was used as a starting material and mixed with 400 µl lysis buffer. TissueLyser (Qiagen) was used for bead treatment in two cycles of 1 min at 30 Hz. The final step of lysis was the addition of 4 µl RNase A stock solution to the sample followed by incubation for 15 min at 65 °C. After lysing and protein precipitation, the samples were centrifuged for 5 min at 20,000×g, and the supernatant was applied to the QIAshredder Mini spin column (Qiagen) and centrifuged for 5 min at 20,000×g. Then, the flow-through fraction excluding any precipitate was mixed with 1.5 volumes washing buffers AW1 and AW2. Finally, the DNA was eluted in two volumes of 50 µl of pre-heated (65 °C) AE buffer (Table S3).

Library preparation and sequencing

Genomic DNA quality was assessed with the TapeStation Genomic DNA Assay (Agilent Technologies, Santa Clara, CA). Library preparation was performed using the KAPA HyperPrep kit without PCR as per the manufacturer’s recommendations (Kapa Biosystems, Roche, Basel, Switzerland). Library quality and quantity were assessed with the Qubit 2.0 DNA HS Assay (Thermo Fisher Scientific, Waltham, MA) and QuantStudio® 5 (Applied Biosystems, Foster City, CA). Libraries were loaded onto an Illumina HiSeq 2 × 150 bp format to target 80 M total reads (40 M reads each direction) per sample.

Pipeline and database

Pipeline implementation

The HerbsAuthenticate package was constructed. It processes trimmed FASTQ files from either single-end or paired-end reads and maps the sequences against barcode databases. It can also build customised databases based on the user’s needs by extracting specific barcodes from the trimmed reads. The complete algorithm, scripts, and documentation are available online at GitHub (Anna Delgado. Herbs Authenticity - GitHub. 2019. url: https://github.com/ADelgadoT/HerbsAuthenticate.git).

Building a customized database: Barcode generation through the alignment of herbal plant sequences to the barcode backbone database

The script Barcodes.py was constructed and used to extract specific barcodes from trimmed reads to build a customised user database. The script uses KMA (Clausen, 2018) and the main workflow is shown in Fig. 1.
Fig. 1

Barcode extraction algorithm workflow.

To obtain the consensus sequences of a set of specific barcodes (Table S4) for plant taxonomical identification, all trimmed reads from sequencing 33 plants (Table S1) were aligned to a barcode backbone database with KMA (Clausen, 2018). The default barcode backbones present in the database were: matK (iBOL accession number ABCBF144-11), rbcL (iBOL accession number AGOPO45-11), rpoC1 (GenBank accession number DQ886273.1), rpoB (GenBank accession number GU732808.1), trnH-psbA (iBOL accession number ALOAF030-10), trnL-F (GenBank accession number AF292404.1), ycf1 (GenBank accession number JF289072.1), ITS2 (iBOL accession number AGOPO45-11), and COI (GenBank accession number AY490250.1) (Tables S4–S6). The backbone barcodes were selected based on previously published data for plant identification (Anantha and Johnson, 2019, Dong et al., 2015, Li et al., 2015, Yu, 2018). This database contained the sequences of the nine different genes from several species in the Magnoliophyta phylum (Tables S4–S6), and all the included plant species and herbal products belonged to those taxa. Before performing alignment, the database was indexed with the default command and parameters, which can be accessed online (genomicepidemiology / kma - Bitbucket. url: https://bitbucket.org/genomicepidemiology/kma). To investigate the optimal set of parameters for barcode extraction, KMA was executed with different configurations (methodology validation section, Supplementary Information). The constructed plant barcode backbone was then validated using a blastn search against the nt database. Mapping trimmed reads against a specific barcode database was performed using the Mapping.py script (Supplementary Information).
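The indexing and mapping steps described above can be sketched as two KMA command lines. This is a minimal illustration and not the authors' Barcodes.py or Mapping.py: the helper names and file paths are hypothetical, and only KMA's basic options are used (`kma index -i ... -o ...` for indexing, `-i` or `-ipe` for single-end or paired-end reads, `-t_db` for the database, `-o` for the output prefix). The stringency settings evaluated in the Supplementary Information are not reproduced.

```python
def kma_index_cmd(backbone_fasta, db_prefix):
    """Build the command that indexes a FASTA file of backbone barcodes."""
    return ["kma", "index", "-i", str(backbone_fasta), "-o", str(db_prefix)]


def kma_map_cmd(reads, db_prefix, out_prefix):
    """Build the command that maps trimmed reads against an indexed database.

    `reads` holds one FASTQ path (single-end) or two (paired-end).
    """
    reads = [str(r) for r in reads]
    if len(reads) == 1:
        read_args = ["-i", reads[0]]
    elif len(reads) == 2:
        read_args = ["-ipe", reads[0], reads[1]]
    else:
        raise ValueError("expected one or two FASTQ files")
    return ["kma", *read_args, "-o", str(out_prefix), "-t_db", str(db_prefix)]


# Hypothetical file names, used for illustration only.
index_cmd = kma_index_cmd("backbone_barcodes.fasta", "db/backbone")
map_cmd = kma_map_cmd(
    ["basil_R1.trimmed.fq.gz", "basil_R2.trimmed.fq.gz"],
    "db/backbone",
    "results/basil",
)
print(" ".join(index_cmd))
print(" ".join(map_cmd))
```

On a machine where KMA is installed, each list could be passed to `subprocess.run(cmd, check=True)`.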

Pipeline benchmarking procedure using publicly available plant sequence datasets

The performance of all the algorithms used was evaluated using ten publicly available single-species plant datasets downloaded from the European Nucleotide Archive (ENA) (European Nucleotide Archive EMBL-EBI. url: https://www.ebi.ac.uk/ena) (Table S7). Trimmed reads from those ten publicly available plant species (Table S7) were aligned against our barcode database (Tables S4–S6) to obtain all possible barcodes. The alignment was executed at five different levels of stringency to evaluate which parameters led to the best performance (Supplementary Information). To test the performance of the algorithm alone, without the potential effect of our customised barcode database, all ten publicly available plant samples (Table S7) were mapped against the iBOL database, a publicly available resource from The International Barcode of Life Consortium (Illuminate Biodiversity - International Barcode of Life. url: http://ibol.org/site/), using KMA with default alignment parameters.

De novo sequencing and testing

Plant species and herbal product analysis

Sequences from the 33 plant species and 14 herbal products included in this study (Tables S1 and S2) were used to identify the barcode sequences for taxonomical annotation (Tables S4–S6). These sequences, once validated with blastn, formed the customised database that was used in this study. Where barcodes were missing for specific species (Table S12), that is, not detected in our samples using the script Barcodes.py (Anna Delgado. Herbs Authenticity - GitHub. 2019. url: https://github.com/ADelgadoT/HerbsAuthenticate.git) with default user options, they were supplemented with barcodes from public databases such as iBOL and ENA (Table S8). This was done to increase the detection coverage of a barcode where our sequencing depth was not sufficient to build the backbone barcode database.
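The supplementation rule described above (use the barcode extracted from our own reads when available, otherwise fall back to a public entry) can be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the dictionary layout, and the toy sequences are hypothetical and are not taken from Barcodes.py.

```python
def build_database(own, public, species, barcodes):
    """Combine extracted and public barcodes, preferring extracted ones.

    `own` and `public` map (species, barcode) to a sequence string.
    Returns the combined database and the list of entries still missing.
    """
    database = {}
    missing = []
    for sp in species:
        for bc in barcodes:
            key = (sp, bc)
            if key in own:
                database[key] = (own[key], "this study")
            elif key in public:
                database[key] = (public[key], "public")
            else:
                missing.append(key)
    return database, missing


# Toy input: the sequences are placeholders, not real barcodes.
own = {("Ocimum", "rbcL"): "ATGTCACCAC"}
public = {("Ocimum", "rbcL"): "ATGTCACCAA", ("Ocimum", "trnL-F"): "GGTTCAAGTC"}
database, missing = build_database(
    own, public, ["Ocimum"], ["rbcL", "trnL-F", "trnH-psbA"]
)
print(database)
print(missing)
```

Recording the source of each entry keeps it possible to tell later which barcodes came from the de novo sequencing and which were taken from iBOL or ENA.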

Results

Sequencing data

Forty-seven herbal plants and products were subjected to next-generation sequencing. Before trimming, the number of reads per sample ranged from 80,868,466 to 118,252,290, with an average of 97,197,328.69 reads per sample. After trimming, there were 91,881,393 reads per sample on average (94.57%) (Table S9). The raw reads from all 47 samples were submitted to ENA under project number PRJEB44059.

Evaluation of the pipeline, HerbsAuthenticate

To test the efficiency of our mapping pipeline for plant detection, ten publicly available samples (Table S7) were mapped against our customised barcode database with five different levels of stringency. The fraction of validated barcodes in all five stringency-parameter sets was almost constant, suggesting that the number of validated sequences did not increase proportionally with the reduction in stringency of the alignment algorithm (Fig. S1 and Supplementary Information). This suggested that parameter set number three, with a medium level of stringency, had the best performance and was used for the downstream analyses. To test the efficiency of our mapping pipeline solely, without the effect of our customised barcode database, for plant taxonomical identification, the ten publicly available samples (Table S7) were also mapped against the iBOL database. The mapping results are presented in Table 1 (Figure S3).
Table 1

Relative abundances of plant taxa that were identified in each of the ten publicly available plant species when mapped using our pipeline against iBOL database.

Sample name: Genus (relative abundance, %)
Basil - Ocimum: Ocimum 69; Vitex 10; Lycopus 7; Glechoma 3; Lamiales 2; Others 9
Brown mustard - Brassica: Brassica 53; Erucastrum 11; Lepidium 9; Ulva 6; Sinapis 5; Others 16
Cinnamon - Cinnamomum: Cinnamomum 24; Machilus 23; Lindera 9; Laurus 8; Laurales 8; Others 28
Fennel - Daucus: Daucus 52; Foeniculum 33; Osmorhiza 6; Anethum 5; Cryptotaenia 3
Garlic - Allium: Allium 92; Asparagales 5; Phoenix 2
Liquorice - Glycyrrhiza: Glycyrrhiza 75; Wisteria 10; Medicago 7; Millettia 4; Lotus 3; Others 1
Paprika - Capsicum: Capsicum 40; Datura 17; Solanum 15; Nicotiana 10; Calibrachoa 6; Others 12
Thyme - Thymus: Thymus 50; Origanum 17; Mimulus 15; Mentha 4; Lycopus 4; Others 10
Tomato - Solanum: Solanum 69; Nicotiana 12; Calibrachoa 4; Datura 3; Others 12
Vanilla - Vanilla: Vanilla 53; Cypripedium 16; Phoenix 15; Epipactis 10; Nannochloris 2; Others 4

The most abundant taxon in each sample output corresponded to the sample origin (Table 1, Figure S3). For example, the most abundant hit in garlic (genus Allium) (Figure S3-A) corresponded to the expected genus, with a relative read abundance of 92%. Although Asparagales and Phoenix were detected, they were present at much lower abundances than Allium.

Barcode database construction

The plant samples included in this study (Table S1) were mapped to our customised barcode database (Tables S4–S6) to obtain their respective barcodes. All the generated barcodes from this mapping are shown in Fig. 2. Only 34% of the total possible sequences were assigned to the backbone database. No barcode was consistently found in all samples. Their recovery rates were: rbcL 78.8%, trnL-F 51.5%, trnH-psbA 42.4%, ITS2 and rpoC1 36.4%, matK and rpoB 24.2%, ycf1 15.2%, and COI 3%.
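As a consistency check on these figures, the per-barcode recovery rates can be converted back into sample counts out of 33 and then pooled. The counts below are not reported in the text; they are inferred here by inverting the percentages (for example, 26/33 = 78.8%), so they should be read as an assumption.

```python
N_SAMPLES = 33

# Number of samples in which each barcode was recovered.
# Inferred from the reported percentages, not taken from the source tables.
recovered = {
    "rbcL": 26,
    "trnL-F": 17,
    "trnH-psbA": 14,
    "ITS2": 12,
    "rpoC1": 12,
    "matK": 8,
    "rpoB": 8,
    "ycf1": 5,
    "COI": 1,
}

# Per-barcode recovery rate in percent.
rates = {bc: round(100 * n / N_SAMPLES, 1) for bc, n in recovered.items()}

# Share of all possible sample-by-barcode combinations that were recovered.
overall = 100 * sum(recovered.values()) / (N_SAMPLES * len(recovered))

for bc, rate in rates.items():
    print(f"{bc}: {rate}%")
print(f"overall: {overall:.1f}%")
```

With these inferred counts, 103 of the 297 possible sequences (9 barcodes in 33 samples) are recovered, about 34.7%, which is close to the 34% stated above.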
Fig. 2

Heatmap showing the barcodes obtained by mapping sequences from 33 single-species plants using our customised algorithm.

To generate a complete database covering as many plant species as possible, the three barcodes with the highest recovery rates (rbcL, trnL-F, and trnH-psbA) were selected for further database construction and the missing entries were incorporated manually from publicly available databases (iBOL and ENA; Fig. 3, Table S10). The final customised database included 99 sequences (Fig. 3) and contained rbcL, trnL-F, and trnH-psbA data from all 33 plant samples (Table S1).
Fig. 3

Heatmap showing the structure of the customised barcode database that was obtained from 33 single-species plant sequences. The barcode database was also supplemented with barcodes obtained from the publicly available database iBOL.

Taxonomic composition of 33 plant samples using our customised barcode database and pipeline

Our validated and customised barcode database allowed for good quality plant-taxonomic assignments of the 33 single-species plants included in this study. The compositional taxonomic assignments are all shown as relative abundances, which refer to the proportional abundance of a plant taxon out of the entire identified taxa in one sample. Results are shown in Table 2 (Figure S4), and the raw mapped data, before relative abundance calculations, are presented in Table S13.
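The relative abundance defined above can be written as a short calculation: each genus count is divided by the total across all genera identified in that sample. This is a minimal sketch, assuming that the raw values are per-genus mapped-read counts; the function name and the toy counts are hypothetical and do not come from Table S13.

```python
def relative_abundance(mapped_reads):
    """Convert per-genus counts into percentages of the sample total.

    Genera are returned in descending order of abundance.
    """
    total = sum(mapped_reads.values())
    if total == 0:
        raise ValueError("no mapped reads in this sample")
    ranked = sorted(mapped_reads.items(), key=lambda kv: kv[1], reverse=True)
    return {genus: round(100 * n / total, 2) for genus, n in ranked}


# Toy counts for one sample, for illustration only.
profile = relative_abundance({"Artemisia": 200, "Allium": 700, "Elettaria": 100})
print(profile)
```

Because the denominator only contains identified taxa, the values describe the composition of the mapped fraction of a sample, not of all sequenced reads.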
Table 2

Relative abundances of plant taxa that were identified in 33 single plant species when mapped using our pipeline against our customised barcode database.

Sample name: Genus (relative abundance, %)
Basil - Ocimum: Ocimum 65.95; Thymus 9.34; Mentha 8.22; Salvia 5.86; Anethum 4.88
Bay leaf - Laurus: Laurus 54.28; Cymbopogon 16.66; Cinnamomum 13.27; Elettaria 8.42; Papaver 4.2
Birkes - Papaver: Papaver 69.9; Artemisia 8.02; Cymbopogon 7.6; Crocus 5.81; Allium 5.23
Mustard - Brassica: Brassica 82.14; Anethum 17.83; Allium 0.01
Cardamom - Elettaria: Elettaria 63.15; Curcuma 26.53; Zingiber 8.23; Cinnamomum 0.86; Laurus 0.53
Celery - Apium: Apium 71.94; Petroselinum 25.16; Laurus 1.08; Brassica 0.73; Elettaria 0.49
Chili - Capsicum: Capsicum 46.49; Solanum 12.11; Apium 11.25; Coriandrum 6.28; Elettaria 5.38
Chives - Allium: Allium 86.53; Elettaria 7.7; Laurus 2.55; Armoracia 1.43; Brassica 0.87
Cinnamon - Cinnamomum: Cinnamomum 57.42; Laurus 30.18; Elettaria 6.31; Papaver 2.44; Brassica 1.27
Coriander - Coriandrum: Coriandrum 31.1; Anethum 19.26; Armoracia 12.78; Apium 12.36; Artemisia 11.37
Cumin - Anethum: Anethum 40.77; Petroselinum 20.48; Artemisia 16.95; Coriandrum 16.84; Foeniculum 1.48
Dill - Anethum: Anethum 52.26; Petroselinum 21.42; Artemisia 10.97; Cymbopogon 9.24; Crocus 4.41
Estragon - Artemisia: Artemisia 64.52; Anethum 17.5; Zingiber 7.87; Elettaria 5; Crocus 2.18
Fennel - Foeniculum: Foeniculum 49.18; Petroselinum 28.74; Artemisia 14.37; Anethum 3.95; Coriandrum 1.82
Fenugreek - Trigonella: Trigonella 75.07; Artemisia 13.2; Glycyrrhiza 5.15; Elettaria 3.63; Allium 1.05
Garlic - Allium: Allium 70.02; Artemisia 15.78; Elettaria 6.1; Petroselinum 2.85; Zingiber 1.99
Ginger - Zingiber: Zingiber 53.45; Elettaria 35.26; Curcuma 9.7; Petroselinum 0.6; Artemisia 0.42
Horseradish - Armoracia: Armoracia 52.96; Brassica 20.13; Cymbopogon 11.39; Anethum 10.51; Elettaria 2.67
Lavender - Cymbopogon: Cymbopogon 32.22; Coriandrum 17.45; Glycyrrhiza 12.34; Crocus 11.39; Laurus 7.8
Lemongrass - Cymbopogon: Cymbopogon 84.36; Mentha 7.45; Allium 4.3; Glycyrrhiza 1.2; Artemisia 0.76
Liquorice - Glycyrrhiza: Glycyrrhiza 60.62; Anethum 14.84; Cymbopogon 8.87; Crocus 8.22; Artemisia 6.95
Mint - Mentha: Mentha 51.31; Cymbopogon 14.81; Origanum 14.7; Thymus 8.87; Ocimum 3.31
Onion - Allium: Allium 78; Crocus 9.59; Cymbopogon 7.45; Mentha 1.75; Glycyrrhiza 1.13
Oregano - Origanum: Origanum 48.29; Mentha 23.25; Cymbopogon 12.25; Ocimum 3.33; Lavandula 2.73
Parsley - Petroselinum: Petroselinum 72.61; Foeniculum 19.21; Crocus 5.67; Salvia 0.61; Solanum 0.55
Pepper - Piper: Piper 90.88; Crocus 5.54; Ocimum 1.67; Petroselinum 0.73; Anethum 0.46
Rosemary - Salvia: Salvia 52.96; Mentha 17.84; Crocus 10.13; Thymus 5.95; Ocimum 5.78
Saffron - Crocus: Crocus 97.96; Solanum 0.43; Salvia 0.4; Petroselinum 0.27; Mentha 0.26
Salvia - Salvia: Salvia 56.04; Ocimum 13.76; Mentha 8.91; Crocus 8.74; Thymus 5.42
Thyme - Thymus: Thymus 40.78; Mentha 27.69; Origanum 14.8; Crocus 7.58; Salvia 3.74
Tomato - Solanum: Solanum 82.81; Crocus 9.48; Elettaria 5.93; Salvia 1.42; Origanum 0.08
Turmeric - Curcuma: Curcuma 53.92; Elettaria 32.01; Crocus 7.48; Zingiber 5.36; Petroselinum 0.36
Vanilla - Vanilla: Vanilla 96.25; Zingiber 3.74

The most abundant plant taxon found in each sample in this collection corresponded to the expected sample taxon (Table 2, Figure S4). For example, the basil sample (Ocimum) (Table 2, Figure S4) had the highest relative read abundance for Ocimum (66.0%), followed by Thymus (9.3%), Mentha (8.2%), Salvia (5.9%), and Anethum (4.9%). The number of assigned taxa in each sample was between 4 and 6, except for mustard and vanilla, where only 3 and 2 taxa were assigned, respectively (Table 2, Figure S4).

Taxonomic composition of herbal products using our customised barcode database and pipeline

To evaluate the performance of our pipeline and customised barcode database with complex herbal samples, the authenticity of 14 herbal products was tested (Table S2). Results are shown in Table 3 and Figure S5. In the seven dried products where each sample represents mainly one herbal plant species (e.g., dried basil and dried tarragon; Table S2, Table 3, Figure S5), the most abundant plant taxon in each sample corresponded to the expected herbal plant taxon (Table 3, Figure S5). For example, in the dried saffron sample (Crocus), Crocus was the first hit, with a relative read abundance of 95.9%, followed by Zingiber (3.4%) (Table 3, Figure S5).
Table 3

Relative abundances of plant taxa that were identified in 14 herbal products for authenticity testing when mapped using our pipeline against our customised barcode database.

Herbal product name: Genus (relative abundance, %)
Chilli explosion (mixture): Brassica 33.21; Armoracia 24.58; Anethum 16.28; Capsicum 14.23; Solanum 8.78
Cinnamon powder: Cinnamomum 58.47; Laurus 31.33; Elettaria 6.8; Curcuma 0.98; Thymus 0.85
Citronpeppar (mixture): Allium 35.42; Laurus 11.93; Crocus 8.89; Ocimum 8.69; Anethum 7.95
Curry (mixture): Trigonella 29.35; Curcuma 21.39; Elettaria 10.06; Anethum 7.87; Foeniculum 6.76; Coriandrum 5.19; Cuminum 4.49
Dried basil: Ocimum 64.13; Thymus 9.1; Mentha 8.27; Salvia 5.93; Lavandula 4.49
Dried dill: Anethum 59.88; Petroselinum 22.55; Foeniculum 11.72; Crocus 4.46; Glycyrrhiza 0.42
Dried estragon: Artemisia 75.49; Anethum 8.49; Cymbopogon 8.26; Elettaria 4.97; Crocus 1.51
Garlic pepper (mixture): Allium 43.73; Petroselinum 16.85; Artemisia 12.39; Cymbopogon 11.81; Crocus 7.29
Garlic powder: Allium 38.69; Glycyrrhiza 26.36; Crocus 9.9; Anethum 8.06; Artemisia 5.45
Green curry (mixture): Allium and Coriandrum 38.519.42 (two values, fused in the source); Elettaria 8.34; Crocus 8.17; Zingiber 7.76
Paprika powder: Capsicum 53.71; Solanum 27.07; Elettaria 6.86; Coriandrum 6.16; Anethum 2.06
Red curry (mixture): Allium 40.02; Capsicum 15; Armoracia 9.63; Crocus 9.24; Elettaria 6.7
Red garlic and pepper (mixture): Allium 76.52; Anethum 5.89; Elettaria 3.13; Petroselinum 2.09; Curcuma 1.58
Saffron: Crocus 95.86; Zingiber 3.4; Salvia 0.13; Glycyrrhiza 0.1; Mentha 0.08

In the remaining seven herbal products (Table S2), where each sample is a mixture of spices and herbal products, several plant taxa were identified in the mapped sequences when using our customised pipeline and barcode database (Table 3, Figure S5). For example, the red curry sample (Table 3, Figure S5) is a mix of herbal products and, as expected, contained several plant genera with different relative read abundance values: Allium (40.0%), Capsicum (15.0%), Armoracia (9.6%), Crocus (9.2%), and Elettaria (6.7%). Finally, the red garlic and pepper sample (Table 3, Figure S5) had Allium as the genus with the highest relative read abundance (76.5%), followed by Anethum (5.9%), Elettaria (3.1%), Petroselinum (2.1%), and Curcuma (1.6%).

Discussion

Here we present a novel methodology and algorithm-based analysis for the metagenomic identification of edible herb and plant taxa as a product authenticity assay. The method performed well at identifying the most dominant genera in a sample: the first hit matched the expected genus in 93.9% of the single-species samples. The method also allowed us to molecularly identify the main components in herbal species and commercially available plant-derived products. However, our study also highlights the current limitations of using NGS, namely the lack of sufficient high-quality reference genomes and the need for deep sequencing, which at current prices will limit the routine use of NGS for food authenticity testing.

Algorithm validation

Ten publicly available datasets were used to validate the mapping script against the iBOL database alone, without the influence of our customised barcode database. In eight of them, the first hit matched the expected genus, with a relative read abundance higher than those of the remaining identified genera (Table 1, Figure S3). The remaining two samples, cinnamon and fennel, did not fit this description. Thus, taking only the first hits from all samples into consideration, the accuracy of the method was 80%. Nevertheless, sample-dependent noise was apparent in all the tested samples. This might be for two main reasons. First, there may have been contamination from other plants or species in the trimmed reads from each publicly available dataset. Second, closely related genera can have a different number of barcodes in the iBOL database, which may explain the noise within the results (Table 1, Figure S3). As a consequence, a customised database containing the limited number of genera present in species and herbal products could reduce noise. To examine the impact of closely related species, a phylogenetic tree containing all genera described in Table 1 (Table S7, Figure S3) was obtained from NCBI taxonomy. The false positives in each sample were mostly from closely related genera belonging to the same taxonomical order as the expected genus (Figure S7). Therefore, the alignment algorithm might assign reads to the wrong species because of the high similarity between their barcode sequences. If several of our barcodes were available for both genera, the alignment algorithm would ideally map the reads to the correct genus, even though false positives could arise due to the high similarity between the query sequences. However, if the regions were only available for the closely related genus, the trimmed reads would map to these barcode sequences.
Then, the hits assigned to this genus could be higher than the true value and, as a consequence, its relative read abundance would also be higher. These features also support the creation of a customised database in which each genus includes an equal number of sequences representing the same set of barcodes, as it could decrease the number of false positive hits. Therefore a barcode-based database for plant identification was generated here.
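The 80% first-hit accuracy discussed in this section can be retraced with a short calculation on Table 1. The sketch below rests on several assumptions that are not stated in the text: the expected genus of each sample is filled in from the common name, each profile is abbreviated to its largest categories as read from Table 1, and the pooled "Others" category is treated as a competitor, a choice made here because it reproduces the reported figure.

```python
# sample: (expected genus, abbreviated profile in % as read from Table 1)
table1 = {
    "Basil": ("Ocimum", {"Ocimum": 69, "Vitex": 10, "Others": 9}),
    "Brown mustard": ("Brassica", {"Brassica": 53, "Erucastrum": 11, "Others": 16}),
    "Cinnamon": ("Cinnamomum", {"Cinnamomum": 24, "Machilus": 23, "Others": 28}),
    "Fennel": ("Foeniculum", {"Daucus": 52, "Foeniculum": 33}),
    "Garlic": ("Allium", {"Allium": 92, "Asparagales": 5}),
    "Liquorice": ("Glycyrrhiza", {"Glycyrrhiza": 75, "Wisteria": 10, "Others": 1}),
    "Paprika": ("Capsicum", {"Capsicum": 40, "Datura": 17, "Others": 12}),
    "Thyme": ("Thymus", {"Thymus": 50, "Origanum": 17, "Others": 10}),
    "Tomato": ("Solanum", {"Solanum": 69, "Nicotiana": 12, "Others": 12}),
    "Vanilla": ("Vanilla", {"Vanilla": 53, "Cypripedium": 16, "Others": 4}),
}


def first_hit_correct(expected, profile):
    """True when the expected genus is the single largest category."""
    top = max(profile, key=profile.get)
    return top == expected


mismatches = [
    sample
    for sample, (expected, profile) in table1.items()
    if not first_hit_correct(expected, profile)
]
accuracy = 100 * (len(table1) - len(mismatches)) / len(table1)
print(mismatches)
print(f"first-hit accuracy: {accuracy:.0f}%")
```

Under these assumptions the two mismatches are cinnamon, where the pooled "Others" category exceeds Cinnamomum, and fennel, where Daucus exceeds Foeniculum.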

Species and herbal product analysis

The first hit for almost all of the 33 single-species samples (Table 2, Table S1, Figure S4) was the expected genus, with relative abundance values higher than the second hit. Hence, the overall accuracy of our pipeline and customised database was 93.9%. These data suggest that the customised barcode database has sufficient resolution to distinguish all the included genera. Although our methods showed high accuracy rates, signals of mismatches between the sequenced data and taxonomical hits were apparent (for example, authenticity test of Red Curry product Table 3, Figure S5). This is likely due to the low resolution of the barcodes for closely related species. Authenticity analysis was based on comparisons between 14 herbal product compositional profiles (Table 3, Figure S5). Regarding the four herbal products (Basil, Estragon, Dill and Saffron), the two first hits of all samples matched with their respective plant taxonomical hits. Their main component complied with their label specifications along with the presence of other non-specific taxon hits. With respect to the powdered products, we examined three single-species herbal products and three mixes. Cinnamon and paprika powder results were consistent with the data in their labels (Table 3, Figure S5). The garlic powder composition had more than one abundant plant genus in its plant taxonomical placement, even though the first hit was Allium (Table 3, Figure S5). Thus, a number of substitutes or non-declared species may be present in this powdered product. The three types of curry powder contained several plant species representing different genera and, as mentioned above, the overall accuracy of the algorithm is decreased when several genera are included in a single sample. Nevertheless, the main components in each mix should be readily identifiable. Genera such as Trigonella, Curcuma, Coriandrum, and Cuminum were detected in the curry powders, which was consistent with the product label data. 
Furthermore, the other detected genera were closely related to the ones previously mentioned. Thus, the composition was consistent with the product description. The red and green curry samples had different plant taxonomical profiles, with their respective labels supporting these differences as the main products in the red curry were onion, garlic (Allium), and paprika (Capsicum) and the main products in green curry were garlic (Allium), ginger (Zingiber), and coriander (Coriandrum). The inclusion of other genera in their profiles could be due to the low resolution of the mapping algorithm between closely related genera or could indicate adulteration of the product. Finally, we tested four ground products (Table 3, Table S2, Figure S5). The first and second hits from chili explosion were Brassica and Armoracia, which are not closely related to the main components reported on the product label as they belong to taxonomically different families. The main expected components, Capsicum and Solanum, were found in our data, although they were the least abundant genera. Regarding citron pepper, the only genus consistent with the label information was Allium. These results could indicate adulteration in these products. In at least three herbal products (e.g., citron pepper, red garlic and pepper Table 3, Figure S5), all the main components were detected apart from the pepper (Piper nigrum). These patterns support the hypothesis that this genus might not be detectable in the results due to its initial absence in the trimmed reads of the sequenced samples. The DNA extraction, library preparation, or the sequencing technology might be unsuitable for the detection of dry pepper in a mixed product. This is a proof of concept for improving food authentication using sequencing technology. Our mapping algorithm with the customised barcode database correctly identified the components in most of the samples, albeit with noisy signals from mismatched taxa. 
Therefore, our mapping algorithm, based on the alignment of trimmed reads against a barcode database, was able to identify in detail the herbal components of both single plant species and commercial herbal products, which can pave the way for evaluating the authenticity of different species or commercially available herbal products. Sequence resolution must, however, be improved to increase the qualitative and quantitative accuracy of the method, especially for complex samples.
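The two evaluations discussed above (first-hit accuracy on single-species samples, and comparison of a product profile against its label) can be illustrated with a minimal Python sketch. This is not the code from the published pipeline: the function names, the dictionary layout, and the toy abundance values are assumptions made for illustration only, and none of the numbers are taken from Table 2 or Table 3. The final line simply checks that 31 correct first hits out of 33 samples would give the reported 93.9%.

```python
from typing import Dict, List, Set


def first_hit(abundances: Dict[str, float]) -> str:
    """Return the genus with the highest relative abundance."""
    return max(abundances, key=abundances.get)


def first_hit_accuracy(samples: Dict[str, Dict[str, float]],
                       expected: Dict[str, str]) -> float:
    """Fraction of samples whose top-ranked genus is the expected genus."""
    correct = sum(first_hit(profile) == expected[name]
                  for name, profile in samples.items())
    return correct / len(samples)


def compare_to_label(abundances: Dict[str, float],
                     declared: Set[str]) -> Dict[str, List[str]]:
    """Split genera into confirmed, missing, and undeclared relative to a label."""
    detected = set(abundances)
    return {
        "confirmed": sorted(detected & declared),
        "missing": sorted(declared - detected),
        "undeclared": sorted(detected - declared),
    }


# Hypothetical profile and label (illustrative values, not study data).
toy_profile = {"Brassica": 0.40, "Armoracia": 0.30, "Allium": 0.20, "Capsicum": 0.10}
toy_label = {"Capsicum", "Allium", "Piper"}
report = compare_to_label(toy_profile, toy_label)

# 31 of 33 correct first hits corresponds to the reported accuracy.
reported_accuracy = round(31 / 33 * 100, 1)
```

In this toy case the declared genus Piper ends up under "missing" and Brassica and Armoracia under "undeclared". As the discussion notes, an undeclared genus may reflect low barcode resolution rather than adulteration, so such a report is only a starting point for interpretation.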

Conclusions

Here we aimed to design and develop bioinformatics tools to evaluate the authenticity of commercially available herbs and products used in European and Danish cuisine. First, we implemented a metagenomics-based pipeline in Python 3 and UNIX (Anna Delgado. Herbs Authenticity - GitHub. 2019. url: https://github.com/ADelgadoT/HerbsAuthenticate.git). The pipeline was tested with publicly available data to demonstrate its capacity to extract barcodes from trimmed reads and to map them against public or customised barcode databases. The pipeline allowed the construction of a customised barcode database through the extraction of selected marker genes (rbcL, trnH-psbA, and trnL-F) from all single species included in the study. Moreover, missing entries were incorporated from publicly available resources (iBOL and ENA). The final database contained 99 sequences with an average length of 465 bp. All trimmed reads from single plant species were mapped against our customised barcode database to evaluate the method's performance. All expected genera were detected in all the plant samples. Other plant taxa can still appear owing to the low discriminatory power of the barcodes, as short sequences representing the same region of the genome tend to have high similarity in closely related species. Lastly, our barcode mapping method could identify the predominant genera, in agreement with their respective labels, in the commercially available herbal samples whose authenticity was evaluated. Further research is required to improve the qualitative and quantitative accuracy of the approach and to decrease noise, for example by using full chloroplast sequences as queries in the alignment procedure.
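Summary figures such as the number of sequences and their average length can be recomputed from any barcode database stored in FASTA format. The sketch below is a minimal illustration in plain Python and is not part of the published pipeline. The helper names and the header layout of the toy records (`marker|genus`) are hypothetical, and the toy sequences are placeholders rather than real barcodes.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarise(handle: TextIO) -> Tuple[int, float]:
    """Return (number of sequences, mean sequence length in bp)."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    if not lengths:
        return 0, 0.0
    return len(lengths), sum(lengths) / len(lengths)


# Toy database with two placeholder records (hypothetical header layout).
toy_fasta = io.StringIO(">rbcL|Ocimum\nACGTACGT\nACGT\n>trnL-F|Allium\nACGTAC\n")
n_sequences, mean_length = summarise(toy_fasta)
```

For a database on disk, the same function could be called as `summarise(open(path))`, where `path` is a placeholder for the FASTA file location.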

Declaration of Competing Interest

The authors declare no conflict of interest. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.